Pure vector retrieval with OpenSearch: a hands-on walkthrough
1. What is vector retrieval
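AI algorithms can abstract the unstructured data produced by people, objects, and scenes in the physical world (speech, images, video, text, behavior, and so on) into multi-dimensional vectors. These vectors act like coordinates in a mathematical space, identifying entities and the relationships between them. The process of turning unstructured data into vectors is generally called Embedding, and unstructured retrieval is the process of searching over the generated vectors to find the corresponding entities.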
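To make the idea concrete, here is a minimal sketch of retrieval over embeddings: a brute-force search that ranks stored vectors by cosine similarity to a query vector. The document ids and the 3-dimensional vectors are hypothetical values chosen for illustration, and cosine similarity is only one possible distance measure. A production system such as OpenSearch uses a vector index rather than scanning every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=2):
    """Return the k (doc_id, score) pairs most similar to query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 3-dimensional embeddings; real embeddings have far more dimensions.
corpus = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.8, 0.3, 0.1],
    "car_photo": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
results = top_k(query, corpus, k=2)
for doc_id, score in results:
    print(doc_id, round(score, 4))
```

The two vectors pointing in nearly the same direction as the query rank first, which is the sense in which nearby vectors stand for similar entities.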
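Unstructured retrieval is, in essence, vector retrieval. Its main application areas include face recognition, recommender systems, image search, video fingerprinting, speech processing, natural language processing, and file search. With the broad adoption of AI and the continued growth of data volumes, vector retrieval has become an indispensable link in the AI technology chain. It also complements traditional search technology and supports multimodal search.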
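To support more diverse and more complex multimodal retrieval scenarios, OpenSearch provides a vector retrieval feature that lets you build a high-performance vector retrieval system in one place.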
2. Creating an OpenSearch instance
Step 1: Click "Buy Now".
Step 2: Configure the instance specifications.
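Configuration notes:
Product type: pay-as-you-go (suitable during testing)
Region and zone: China East 1 (Hangzhou) (customizable)
Application name: test_vector_opensearch (customizable)
Edition: General-purpose Edition
Specification: select 10 GB, 1000 LCU (exclusive compute type, the minimum configuration), then click "Buy Now".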
Step 3: Confirm the order. Select "I have read and agree", then click "Confirm Activation".
The OpenSearch instance is now created.
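3. Configuring the vector recall service instance
Configure the application in the OpenSearch console in the following order: feature selection --> application schema --> index schema --> data source --> done.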
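3.1. Application schema
In the OpenSearch console, go to Application Management --> Application List, find the application, and click "Configure".
Step 1: Configure the application schema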
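An application schema can be created in four ways: from a data source, manually, by uploading a template, or by uploading a document. This walkthrough uses MaxCompute as the example: click "Create from data source", select MaxCompute, and click "New database".
Fill in the database connection information: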
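Step 2: Select the corresponding table and click "Confirm".
Step 3: Select the primary table and the primary key. If you need to join multiple tables, see the multi-table join reference.
Note: The vector field must be set to the double array type.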
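3.2. Index schema
Index field description
After the application schema is configured, the system automatically generates the index fields together with their analyzers, index tags, and included fields: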
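Note: A vector index must be configured for the vector field (vector_field). Choose the dimension according to your needs; by default, OpenSearch supports 64-, 128-, 256-, 512-, and 1536-dimensional vectors.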
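Because the vector index is tied to a fixed dimension, it can help to check vectors on the client side before importing them. The sketch below is an illustration only and not part of any OpenSearch SDK: the function name `validate_vector` is hypothetical, and the only fact taken from this document is the list of default dimensions.

```python
import math

# Default dimensions listed in the note above.
SUPPORTED_DIMENSIONS = (64, 128, 256, 512, 1536)

def validate_vector(vector):
    """Return the vector as a list of floats, or raise ValueError.

    Hypothetical client-side helper: checks that the dimension is one of
    the defaults named in this document and that every value is finite.
    """
    values = [float(v) for v in vector]
    if len(values) not in SUPPORTED_DIMENSIONS:
        raise ValueError(
            f"dimension {len(values)} is not one of {SUPPORTED_DIMENSIONS}"
        )
    if not all(math.isfinite(v) for v in values):
        raise ValueError("vector contains NaN or infinite values")
    return values

# Example: a 128-dimensional vector passes the check.
checked = validate_vector([0.1] * 128)
print(len(checked))
```

If your application is configured with a different dimension, adjust the tuple accordingly.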
Attribute field and default display field description
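3.3. Data source
If you selected a MaxCompute data source when configuring the application schema, the corresponding project table is mapped here automatically. You only need to fill in the partition import condition as required; if it is left empty, data from all partitions of the table is imported by default:
If a field name in the source table differs from the name in the application schema, click the edit button to change the field mapping manually:
After confirming that everything is correct, click "Finish":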
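3.4. Configuration complete
4. Online query
For the vector query syntax, see the vector query syntax reference.
Querying from the search test page: Extended Features > Search Test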
# A 1536-dimensional vector is used here; it is not shown in full
vector_index:'-0.01786,0.03692,0.03710,0.01668,0.03655,-0.03515,0.02017,-0.00653,-0.01419,-0.01708,-0.00091,-0.03528,0.02821,-0.02194,-0.01609,-0.02045,0.02209,0.06413,0.06233,0.03064,-0.00863,-0.06810,0.00729,0.07912,-0.03948,0.06932,0.02051,-0.00688,-0.01138,0.03207,0.03040,-0.00050,0.06220,-0.03895,0.04575,-0.00259,0.04358,0.02027,0.03342,-0.02916,0.04793,-0.02954,0.04327,0.06156,-0.00230,0.00653,0.01515,-0.00287,0.03546,-0.01551,-0.03049,0.07542,-0.01563,0.00680,0.00598,-0.00396,0.00330,0.00359,-0.03395,-0.00825,-0.02175,0.04479,0.04008,0.03558,-0.03011,-0.00015,0.03086,-0.00941,0.03113,0.00758,-0.04333,0.04607,-0.02520,-0.01260,-0.04726,0.00564,-0.02423,-0.00439,-0.02739,-0.01674,0.06426,-0.05995,0.01762,0.04370,0.02211,-0.03174,0.04465,0.00475,-0.03577,0.01111,-0.00963,0.03510,-0.02533,-0.00444,0.00161,0.00561,0.00066,-0.04074,0.00682,0.03293,-0.01630,-0.02575,0.02834,0.02679,-0.04558,0.02395,0.00531,0.01240,0.04064,0.03599,0.00172,0.00413,-0.06839...&sf=0.8'
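Typing out a long vector by hand is error-prone, so the query clause can be assembled programmatically. The following is a minimal sketch that only mirrors the textual shape of the example above (index name, comma-separated values, then `&sf=...` inside the quotes). The helper name `build_vector_clause` is hypothetical, the five-decimal formatting is an assumption taken from the sample values, and `sf` is passed through as-is; consult the vector query syntax reference for its exact meaning and for other parameters.

```python
def build_vector_clause(index_name, vector, sf=None):
    """Build a clause shaped like: index_name:'v1,v2,...&sf=0.8'.

    Hypothetical helper that follows the format of the example above;
    it does not call any OpenSearch API.
    """
    if not vector:
        raise ValueError("vector must not be empty")
    # Five decimal places, matching the precision of the sample query.
    values = ",".join(f"{v:.5f}" for v in vector)
    suffix = f"&sf={sf}" if sf is not None else ""
    return f"{index_name}:'{values}{suffix}'"

# Example with the first three values of the sample vector.
clause = build_vector_clause("vector_index", [-0.01786, 0.03692, 0.03710], sf=0.8)
print(clause)
```

The resulting string can be pasted into the search test page or passed as the query clause by whatever client you use.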