
Basic features

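Java Low Level REST Client is the official low-level REST client provided by Elasticsearch. Its API does not handle data encoding or decoding. The Lindorm vector engine supports vector data retrieval, is compatible with the Elasticsearch protocol, and supports hybrid retrieval across scalar, vector, and full-text data. If you want to customize how requests and responses are handled, you can access the vector engine through the Java Low Level REST Client.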

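Prerequisites

  • A Java environment is installed. JDK 1.8 or later is required.

  • The vector engine is activated. For details, see "Activate the vector engine".

  • The search engine is activated. For details, see the activation guide.

  • The client IP address has been added to the Lindorm whitelist. For details, see "Configure a whitelist".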

Preparations

Install the Java Low Level REST Client

Taking a Maven project as an example, add the following dependencies to the dependencies section of the pom.xml file:

<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>elasticsearch-rest-client</artifactId>
  <version>7.10.0</version>
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-core</artifactId>
  <version>2.8.2</version>
</dependency>
<dependency>
  <groupId>org.apache.logging.log4j</groupId>
  <artifactId>log4j-api</artifactId>
  <version>2.7</version>
</dependency>

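Connect to the search engine

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

// Elasticsearch-compatible endpoint of the Lindorm search engine
String search_url = "ld-t4n5668xk31ui****-proxy-search-public.lindorm.rds.aliyuncs.com";
int search_port = 30070;

// Configure the username and password
String username = "user";
String password = "test";
final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
credentialsProvider.setCredentials(AuthScope.ANY, new UsernamePasswordCredentials(username, password));
RestClientBuilder restClientBuilder = RestClient.builder(new HttpHost(search_url, search_port));
restClientBuilder.setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
  @Override
  public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
    return httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider);
  }
});
// Build the client; the examples that follow refer to it as restClient
RestClient restClient = restClientBuilder.build();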

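Parameters

Parameter

Description

search_url

The Elasticsearch-compatible endpoint of the Lindorm search engine. To obtain it, see "View endpoints".

Important
  • If your application is deployed on an ECS instance, we recommend accessing the Lindorm instance over a VPC for better security and lower network latency.

  • If your application is deployed locally, you must enable a public endpoint in the console before connecting to the Lindorm instance over the Internet. To do so, choose Database Connections in the left-side navigation pane of the console, click the Search Engine tab, and then click Enable Public Endpoint in the upper-right corner of the tab.

  • When accessing the Lindorm instance over a VPC, set search_url to the VPC address of the Elasticsearch-compatible endpoint. When accessing it over the Internet, set search_url to the public address of the Elasticsearch-compatible endpoint.

search_port

The Elasticsearch-compatible port of the Lindorm search engine. The value is fixed at 30070.

username, password

The username and password used to access the search engine.

To obtain the default username and password, choose Database Connections in the left-side navigation pane of the console and click the Search Engine tab. The credentials are shown on that tab.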
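
The connection parameters above do not have to be hard-coded. The following is a minimal sketch of one way to load them from the environment, assuming hypothetical variable names (LINDORM_SEARCH_URL, LINDORM_USERNAME, LINDORM_PASSWORD) and a hypothetical helper class SearchEndpointConfig; neither is part of Lindorm or the Elasticsearch client.

```java
import java.util.Map;

/** Illustrative holder for the connection parameters; not part of any SDK. */
final class SearchEndpointConfig {
    // The Elasticsearch-compatible port is fixed at 30070 per the parameter description.
    static final int SEARCH_PORT = 30070;

    final String host;
    final String username;
    final String password;

    private SearchEndpointConfig(String host, String username, String password) {
        this.host = host;
        this.username = username;
        this.password = password;
    }

    /**
     * Reads the settings from a map such as System.getenv().
     * The key names are assumptions made for this sketch.
     */
    static SearchEndpointConfig fromEnv(Map<String, String> env) {
        return new SearchEndpointConfig(
            require(env, "LINDORM_SEARCH_URL"),
            require(env, "LINDORM_USERNAME"),
            require(env, "LINDORM_PASSWORD"));
    }

    // Fails fast when a setting is missing or blank, and trims surrounding whitespace.
    private static String require(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing required setting: " + key);
        }
        return value.trim();
    }
}
```

A caller could then pass `cfg.host` and `SearchEndpointConfig.SEARCH_PORT` to `new HttpHost(...)`, and `cfg.username` and `cfg.password` to `UsernamePasswordCredentials`.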

Create a vector index

hnsw index

The following example creates an index named vector_test:

String indexName = "vector_test";

// Create the index
Request indexRequest = new Request("PUT", "/" + indexName);
indexRequest.setJsonEntity("{\n" +
  " \"settings\" : {\n" +
  "    \"index\": {\n" +
  "      \"number_of_shards\": 2,\n" +
  "      \"knn\": true\n" +
  "    }\n" +
  "  },\n" +
  "  \"mappings\": {\n" +
  "    \"_source\": {\n" +
  "      \"excludes\": [\"vector1\"]\n" +
  "    },\n" +
  "    \"properties\": {\n" +
  "      \"vector1\": {\n" +
  "        \"type\": \"knn_vector\",\n" +
  "        \"dimension\": 3,\n" +
  "        \"data_type\": \"float\",\n" +
  "        \"method\": {\n" +
  "          \"engine\": \"lvector\",\n" +
  "          \"name\": \"hnsw\", \n" +
  "          \"space_type\": \"l2\",\n" +
  "          \"parameters\": {\n" +
  "            \"m\": 24,\n" +
  "            \"ef_construction\": 500\n" +
  "         }\n" +
  "       }\n" +
  "      },\n" +
  "      \"field1\": {\n" +
  "        \"type\": \"long\"\n" +
  "      }\n" +
  "    }\n" +
  "  }\n" +
  "}");
Response response = restClient.performRequest(indexRequest);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println("responseBody = " + responseBody);
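
When the dimension or the HNSW parameters vary between indexes, the request body can be generated instead of written by hand. The following is a minimal sketch that reproduces the same settings as the example above (2 shards, lvector engine, l2 space type); the class name HnswIndexBody and its parameter list are hypothetical, and the field name is assumed to be a plain identifier that needs no JSON escaping.

```java
import java.util.Locale;

/** Illustrative builder for the hnsw create-index body shown above. */
final class HnswIndexBody {

    /** Returns the JSON body for PUT /{index} with the given vector field settings. */
    static String build(String vectorField, int dimension, int m, int efConstruction) {
        if (dimension <= 0 || m <= 0 || efConstruction <= 0) {
            throw new IllegalArgumentException("dimension, m and efConstruction must be positive");
        }
        // Locale.ROOT keeps the integer formatting free of locale-specific digits or grouping.
        return String.format(Locale.ROOT,
            "{"
            + "\"settings\":{\"index\":{\"number_of_shards\":2,\"knn\":true}},"
            + "\"mappings\":{"
            +   "\"_source\":{\"excludes\":[\"%1$s\"]},"
            +   "\"properties\":{"
            +     "\"%1$s\":{"
            +       "\"type\":\"knn_vector\",\"dimension\":%2$d,\"data_type\":\"float\","
            +       "\"method\":{\"engine\":\"lvector\",\"name\":\"hnsw\",\"space_type\":\"l2\","
            +         "\"parameters\":{\"m\":%3$d,\"ef_construction\":%4$d}}},"
            +     "\"field1\":{\"type\":\"long\"}"
            +   "}"
            + "}"
            + "}",
            vectorField, dimension, m, efConstruction);
    }
}
```

The result can be passed to `indexRequest.setJsonEntity(HnswIndexBody.build("vector1", 3, 24, 500))` in place of the literal string.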

ivfpq index

The following example creates an index named vector_ivfpq_test:

String indexName = "vector_ivfpq_test";
Request indexRequest = new Request("PUT", "/" + indexName);
int dim = 3;
String createIndexJson = "{\n" +
  "  \"settings\": {\n" +
  "    \"index\": {\n" +
  "      \"number_of_shards\": 4,\n" +
  "      \"knn\": true,\n" +
  "      \"knn.offline.construction\": true\n" +
  "    }\n" +
  "  },\n" +
  "  \"mappings\": {\n" +
  "    \"_source\": {\n" +
  "      \"excludes\": [\"vector1\"]\n" +
  "    },\n" +
  "    \"properties\": {\n" +
  "      \"vector1\": {\n" +
  "        \"type\": \"knn_vector\",\n" +
  "        \"dimension\": %d,\n" +
  "        \"data_type\": \"float\",\n" +
  "        \"method\": {\n" +
  "          \"engine\": \"lvector\",\n" +
  "          \"name\": \"ivfpq\",\n" +
  "          \"space_type\": \"cosinesimil\",\n" +
  "          \"parameters\": {\n" +
  "            \"m\": %d,\n" +
  "            \"nlist\": 10000,\n" +
  "            \"centroids_use_hnsw\": true,\n" +
  "            \"centroids_hnsw_m\": 48,\n" +
  "            \"centroids_hnsw_ef_construct\": 500,\n" +
  "            \"centroids_hnsw_ef_search\": 200\n" +
  "          }\n" +
  "        }\n" +
  "      },\n" +
  "      \"field1\": {\n" +
  "        \"type\": \"long\"\n" +
  "      }\n" +
  "    }\n" +
  "  }\n" +
  "}";

createIndexJson = String.format(createIndexJson, dim, dim);
indexRequest.setJsonEntity(createIndexJson);
Response response = restClient.performRequest(indexRequest);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println("responseBody = " + responseBody);

Sparse vector index

The following example creates an index named vector_sparse_test:

String indexName = "vector_sparse_test";

// Create the index
Request indexRequest = new Request("PUT", "/" + indexName);
indexRequest.setJsonEntity("{\n" +
  " \"settings\" : {\n" +
  "    \"index\": {\n" +
  "      \"number_of_shards\": 2,\n" +
  "      \"knn\": true\n" +
  "    }\n" +
  "  },\n" +
  "  \"mappings\": {\n" +
  "    \"_source\": {\n" +
  "      \"excludes\": [\"vector1\"]\n" +
  "    },\n" +
  "    \"properties\": {\n" +
  "      \"vector1\": {\n" +
  "        \"type\": \"knn_vector\",\n" +
  "        \"data_type\": \"sparse_vector\",\n" +
  "        \"method\": {\n" +
  "          \"engine\": \"lvector\",\n" +
  "          \"name\": \"sparse_hnsw\",\n" +
  "          \"space_type\": \"innerproduct\",\n" +
  "          \"parameters\": {\n" +
  "            \"m\": 24,\n" +
  "            \"ef_construction\": 200\n" +
  "         }\n" +
  "       }\n" +
  "      },\n" +
  "      \"field1\": {\n" +
  "        \"type\": \"long\"\n" +
  "      }\n" +
  "    }\n" +
  "  }\n" +
  "}");
Response response = restClient.performRequest(indexRequest);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println("responseBody = " + responseBody);

Write data

Data is written to an index that contains a vector column in the same way as to a regular index.

Write a single document

The following example writes to the index vector_test:

String indexName = "vector_test";
String documentId = "1";
String jsonString = "{ \"field1\": 1, \"vector1\": [1.2, 1.3, 1.4] }";
Request request = new Request(
  "PUT",  // Use the PUT method when a document ID is specified
  "/" + indexName + "/_doc/" + documentId);
request.setJsonEntity(jsonString);
response = restClient.performRequest(request);
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("writeDoc responseBody = " + responseBody);

Write in bulk

// Write data in bulk
Random random = new Random();
Request bulkRequest = new Request("POST", "/_bulk");
StringBuilder bulkJsonBuilder = new StringBuilder();
for (int i = 2; i < 10; i++) {
  // Replace field and value with your actual business fields and values
  bulkJsonBuilder.append("{\"index\":{\"_index\":\"").append(indexName).append("\",\"_id\":\"").append(i).append("\"}}").append("\n");
  String value = String.valueOf(random.nextInt());
  float[] floatArray = {random.nextFloat(), random.nextFloat(), random.nextFloat()};
  String floatArrayString = Arrays.toString(floatArray);
  System.out.println(i + " " + value + " " + floatArrayString);
  bulkJsonBuilder.append("{\"field1\":\"").append(value).append("\",\"vector1\":\"").append(floatArrayString).append("\"}").append("\n");
}
bulkRequest.setJsonEntity(bulkJsonBuilder.toString());
response = restClient.performRequest(bulkRequest);
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("bulkWriteDoc responseBody = " + responseBody);

// Send a refresh request to make the written data visible
response = restClient.performRequest(new Request("POST", "/" + indexName + "/_refresh"));
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("responseBody = " + responseBody);
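
The bulk body is newline-delimited JSON: one action line followed by one document line per record, each terminated by a newline. The following is a minimal sketch of a small builder for that format. The class BulkBodyBuilder is hypothetical. Unlike the loop above, it writes field1 as a number and vector1 as a JSON array, on the assumption that the bulk endpoint accepts the same document shape as the single-document write.

```java
/** Illustrative builder for an NDJSON _bulk request body. */
final class BulkBodyBuilder {
    private final String indexName;
    private final StringBuilder body = new StringBuilder();

    BulkBodyBuilder(String indexName) {
        this.indexName = indexName;
    }

    /** Appends one "index" action line and its document line. */
    BulkBodyBuilder add(String id, long field1, float[] vector) {
        body.append("{\"index\":{\"_index\":\"").append(indexName)
            .append("\",\"_id\":\"").append(id).append("\"}}\n");
        body.append("{\"field1\":").append(field1).append(",\"vector1\":[");
        for (int i = 0; i < vector.length; i++) {
            // NaN and Infinity have no JSON representation, so reject them early.
            if (Float.isNaN(vector[i]) || Float.isInfinite(vector[i])) {
                throw new IllegalArgumentException("vector contains a non-finite value");
            }
            if (i > 0) {
                body.append(',');
            }
            body.append(vector[i]);
        }
        // Every line, including the last one, must end with a newline.
        body.append("]}\n");
        return this;
    }

    String build() {
        return body.toString();
    }
}
```

Usage would look like `bulkRequest.setJsonEntity(new BulkBodyBuilder(indexName).add("2", 7L, new float[] {1.5f, 2.0f, 0.25f}).build());`.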

Write sparse vectors

Sparse vectors are written in the same way as described above, but the format of vector1 is different.

// Write a single document
String documentId = "1";
String jsonString = "{ \"field1\": 1, \"vector1\": {\"indices\": [10, 12, 16], \"values\": [1.2, 1.3, 1.4]} }";
Request request = new Request(
  "PUT",  // Use the PUT method when a document ID is specified
  "/" + indexName + "/_doc/" + documentId);
request.setJsonEntity(jsonString);
response = restClient.performRequest(request);
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("writeDoc responseBody = " + responseBody);
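
Application code often holds a sparse vector as a map from dimension index to weight. The following is a minimal sketch that converts such a map into the indices/values layout used above. The class SparseVectorJson is hypothetical, and sorting the indices in ascending order is a choice made here for deterministic output, not a documented requirement.

```java
import java.util.Map;
import java.util.TreeMap;

/** Illustrative converter from an index-to-weight map to the sparse vector JSON layout. */
final class SparseVectorJson {

    static String toJson(Map<Integer, Float> weights) {
        // Copy into a TreeMap so that indices and values are emitted in ascending index order.
        Map<Integer, Float> sorted = new TreeMap<Integer, Float>(weights);
        StringBuilder indices = new StringBuilder();
        StringBuilder values = new StringBuilder();
        for (Map.Entry<Integer, Float> entry : sorted.entrySet()) {
            if (indices.length() > 0) {
                indices.append(", ");
                values.append(", ");
            }
            indices.append(entry.getKey());
            values.append(entry.getValue());
        }
        return "{\"indices\": [" + indices + "], \"values\": [" + values + "]}";
    }
}
```

The returned string can be embedded as the value of vector1 when assembling the document JSON.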

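Build the index

Important
  • For all index types except ivfpq, index.knn.offline.construction defaults to false when the index is created. These are online indexes and do not need to be built manually.

  • Before triggering an ivfpq index build, note the following: index.knn.offline.construction must be explicitly set to true when the ivfpq index is created, and enough data must already be written when the build is started. The number of rows must be greater than 256 and more than 30 times the value of nlist.

  • After a manually triggered build completes, you can continue to write and query data as usual. The index does not need to be built again.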
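
The data-volume precondition for an ivfpq build can be checked in code before the build is triggered. The following is a minimal sketch of that check; the class IvfpqBuildCheck is hypothetical, and the document count is assumed to come from a prior call such as GET /{index}/_count.

```java
/** Illustrative check of the row-count precondition for an ivfpq build. */
final class IvfpqBuildCheck {
    // Thresholds taken from the note above: more than 256 rows and more than 30 * nlist rows.
    static final long MIN_ROWS = 256;
    static final long NLIST_FACTOR = 30;

    static boolean hasEnoughRows(long docCount, int nlist) {
        if (nlist <= 0) {
            throw new IllegalArgumentException("nlist must be positive");
        }
        return docCount > MIN_ROWS && docCount > NLIST_FACTOR * nlist;
    }
}
```

With nlist set to 10000 as in the ivfpq example, the check passes only once more than 300,000 rows have been written.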

Trigger a build

The following example builds the index vector_ivfpq_test:

// Build the index
Request buildIndexRequest = new Request("POST", "/_plugins/_vector/index/build");
String jsonString = "{ \"indexName\": \"vector_ivfpq_test\", \"fieldName\": \"vector1\", \"removeOldIndex\": \"true\" }";
// Attach the request body; without this call the build parameters are never sent
buildIndexRequest.setJsonEntity(jsonString);
response = restClient.performRequest(buildIndexRequest);
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("buildIndex responseBody = " + responseBody);

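Parameters

Parameter

Required

Description

indexName

The name of the index (table), for example vector_ivfpq_test.

fieldName

The field on which the index is built, for example vector1.

removeOldIndex

Whether to delete the old index when building. Valid values:

  • true: the old index data is deleted when the build is triggered, and knn queries are available only after the build completes.

    Important

    For production use, we recommend setting this parameter to true.

  • false (default): the old index is retained, but retrieval performance is affected.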

The following result is returned:

{
  "payload": ["default_vector_ivfpq_test_vector1"]
}

The returned value is the taskId generated for the index build.

View the index status

// View the index status
Request buildIndexRequest = new Request("GET", "/_plugins/_vector/index/tasks");
String jsonString = "{ \"indexName\": \"vector_ivfpq_test\", \"fieldName\": \"vector1\", \"taskIds\": \"[default_vector_ivfpq_test_vector1]\" }";
buildIndexRequest.setJsonEntity(jsonString);
Response response = restClient.performRequest(buildIndexRequest);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println("queryBuildIndex responseBody = " + responseBody);

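Here, taskIds is the taskId generated when the build was triggered. You can also pass an empty array, for example \"taskIds\": \"[]\", which has the same effect as specifying taskIds as shown above.

The following result is returned: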

{
  "payload": ["task: default_vector_ivfpq_test_vector1, stage: FINISH, innerTasks: xxx, info: finish building"]
}

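Here, stage indicates the build state. The possible states are START (build started), TRAIN (training phase), BUILDING (build in progress), ABORT (build aborted), FINISH (build completed), and FAIL (build failed).

Note

The ABORT state usually results from calling the /index/abort interface to terminate an index build.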
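
When polling for build completion, the stage has to be pulled out of the payload entry, which is a plain string rather than structured JSON. The following is a minimal sketch that extracts it with a regular expression based on the sample response above. The class BuildTaskStatus is hypothetical, and treating ABORT, FINISH, and FAIL as the states at which polling can stop is an assumption of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative parser for one entry of the "payload" array in the task status response. */
final class BuildTaskStatus {
    enum Stage { START, TRAIN, BUILDING, ABORT, FINISH, FAIL }

    // Matches the "stage: XXX" fragment of an entry such as
    // "task: ..., stage: FINISH, innerTasks: xxx, info: finish building".
    private static final Pattern STAGE = Pattern.compile("stage:\\s*([A-Z]+)");

    static Stage parseStage(String payloadEntry) {
        Matcher matcher = STAGE.matcher(payloadEntry);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No stage found in: " + payloadEntry);
        }
        // valueOf throws IllegalArgumentException for a stage name not listed above.
        return Stage.valueOf(matcher.group(1));
    }

    /** Assumption: these three states mean the task will make no further progress. */
    static boolean isTerminal(Stage stage) {
        return stage == Stage.FINISH || stage == Stage.FAIL || stage == Stage.ABORT;
    }
}
```

A polling loop could call the status endpoint at an interval and stop once `isTerminal` returns true.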

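Abort a build

Aborts the index build process. This operation cannot be called on an index whose state is FINISH.

// Abort the index build
Request buildIndexRequest = new Request("POST", "/_plugins/_vector/index/tasks/abort");
// The inner quotes of taskIds are escaped twice so that the request body remains valid JSON
String jsonString = "{ \"indexName\": \"vector_ivfpq_test\", \"fieldName\": \"vector1\", \"taskIds\": \"[\\\"default_vector_ivfpq_test_vector1\\\"]\" }";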
buildIndexRequest.setJsonEntity(jsonString);
Response response = restClient.performRequest(buildIndexRequest);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println("abortBuildIndex responseBody = " + responseBody);

The following result is returned:

{
  "payload":["Task: default_vector_ivfpq_test_vector1 remove success"]
}

Query data

Pure vector query

Pure vector data can be queried through the knn structure.

// kNN query
Request searchRequest = new Request("GET", "/" + indexName + "/_search");
jsonString = "{"
  + "\"size\": 10,"
  + "\"query\": {"
  +     "\"knn\": {"
  +         "\"vector1\": {"
  +             "\"vector\": [2.2, 2.3, 2.4],"
  +             "\"k\": 10"
  +         "}"
  +     "}"
  + "},"
  + "\"ext\": {\"lvector\": {\"min_score\": \"0.1\"}}"
  + "}";
searchRequest.setJsonEntity(jsonString);
response = restClient.performRequest(searchRequest);
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("search responseBody = " + responseBody);

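Parameters

Structure

Parameter

Required

Description

knn

vector

The vector used for the query.

k

The number of most similar results to return.

Important

In pure vector retrieval scenarios, we recommend setting size and k to the same value.

ext

lvector.min_score

The similarity threshold. Only vectors whose score is greater than this value are returned. Returned scores fall in the range [0,1].

Valid values: [0,+inf]. Default value: 0.

lvector.filter_type

The mode used for hybrid queries. Valid values:

  • pre_filter: filters structured data first, then queries vector data.

  • post_filter: queries vector data first, then filters structured data.

The default value is empty.

lvector.ef_search

In the HNSW algorithm, the length of the dynamic candidate list used during search. Applies only to the HNSW algorithm.

Valid values: [1,1000]. Default value: 100.

lvector.nprobe

The number of cluster units to query. Adjust this value according to your recall requirements. A larger value gives higher recall but lower search performance.

Valid values: [1,method.parameters.nlist]. No default value.

Important

Applies only to the ivfpq algorithm.

lvector.reorder_factor

Re-ranks (reorders) results using the original vectors. The distances computed by the ivfpq algorithm are quantized and therefore lose some precision, so the original vectors are used for re-ranking. The number of re-ranked candidates is k * reorder_factor. This usually improves recall precision at the cost of additional performance overhead.

Valid values: [1,200]. Default value: 10.

Important
  • Applies only to the ivfpq algorithm.

  • When k is small, the value can be set to 5. If k is greater than 100, set it to 1.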

Taking the hnsw index vector_test as an example, the following result is returned:

{
    "took": 65,
    "timed_out": false,
    "terminated_early": false,
    "num_reduce_phases": 0,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10,
            "relation": "eq"
        },
        "max_score": 0.25,
        "hits": [
            {
                "_index": "vector_test",
                "_id": "1",
                "_score": 0.25
            },
            {
                "_index": "vector_test",
                "_id": "32",
                "_score": 0.14561969
            },
            {
                "_index": "vector_test",
                "_id": "122",
                "_score": 0.13761099
            },
            {
                "_index": "vector_test",
                "_id": "80",
                "_score": 0.13138853
            },
            {
                "_index": "vector_test",
                "_id": "12",
                "_score": 0.12602884
            },
            {
                "_index": "vector_test",
                "_id": "120",
                "_score": 0.123480916
            },
            {
                "_index": "vector_test",
                "_id": "39",
                "_score": 0.12126313
            },
            {
                "_index": "vector_test",
                "_id": "27",
                "_score": 0.117812514
            },
            {
                "_index": "vector_test",
                "_id": "29",
                "_score": 0.11756193
            },
            {
                "_index": "vector_test",
                "_id": "81",
                "_score": 0.11755075
            }
        ]
    }
}
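
Because the Low Level REST Client returns the body as a raw string, the application has to decode the hits itself. The following is a minimal sketch that pulls the _id and _score pairs out of a response shaped like the one above using a regular expression. The class HitExtractor is hypothetical, and the pattern assumes that "_score" immediately follows "_id" as in the sample; a JSON library is the more robust choice for real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative extractor of (_id, _score) pairs from a search response body. */
final class HitExtractor {

    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Assumes the layout "_id": "...", "_score": <number> shown in the sample response.
    private static final Pattern HIT = Pattern.compile(
        "\"_id\"\\s*:\\s*\"([^\"]*)\"\\s*,\\s*\"_score\"\\s*:\\s*(-?[0-9.]+(?:[eE][-+]?[0-9]+)?)");

    static List<Hit> extract(String responseBody) {
        List<Hit> hits = new ArrayList<Hit>();
        Matcher matcher = HIT.matcher(responseBody);
        while (matcher.find()) {
            hits.add(new Hit(matcher.group(1), Double.parseDouble(matcher.group(2))));
        }
        return hits;
    }
}
```

The hits are returned in the order in which they appear in the response, which for a knn query is by descending score.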

Return specified fields

To return specific fields in a query, specify "_source": ["field1", "field2"], or use "_source": true to return all non-vector fields. The following example queries the index vector_test:

// kNN query
Request searchRequest = new Request("GET", "/" + indexName + "/_search");
jsonString = "{"
  + "\"size\": 10,"
  + "\"_source\": [\"field1\"],"
  + "\"query\": {"
  +     "\"knn\": {"
  +         "\"vector1\": {"
  +             "\"vector\": [2.2, 2.3, 2.4],"
  +             "\"k\": 10"
  +         "}"
  +     "}"
  + "},"
  + "\"ext\": {\"lvector\": {\"min_score\": \"0.1\"}}"
  + "}";
searchRequest.setJsonEntity(jsonString);
response = restClient.performRequest(searchRequest);
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("search responseBody = " + responseBody);

The following result is returned:

{
  "took": 31,
  "timed_out": false,
  "terminated_early": false,
  "num_reduce_phases": 0,
  "_shards": {
    "total": 2,
    "successful": 2,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10,
      "relation": "eq"
    },
    "max_score": 0.25,
    "hits": [
      {
        "_index": "vector_test",
        "_id": "1",
        "_score": 0.25,
        "_source": {
          "field1": 1
        }
      },
      {
        "_index": "vector_test",
        "_id": "67",
        "_score": 0.15348388,
        "_source": {
          "field1": "-487556052"
        }
      },
      {
        "_index": "vector_test",
        "_id": "83",
        "_score": 0.1416535,
        "_source": {
          "field1": "1733994439"
        }
      },
      {
        "_index": "vector_test",
        "_id": "43",
        "_score": 0.13119161,
        "_source": {
          "field1": "-747555255"
        }
      },
      {
        "_index": "vector_test",
        "_id": "54",
        "_score": 0.1267109,
        "_source": {
          "field1": "-1544683361"
        }
      },
      {
        "_index": "vector_test",
        "_id": "110",
        "_score": 0.12533507,
        "_source": {
          "field1": "882740211"
        }
      },
      {
        "_index": "vector_test",
        "_id": "48",
        "_score": 0.124014825,
        "_source": {
          "field1": "-513152633"
        }
      },
      {
        "_index": "vector_test",
        "_id": "40",
        "_score": 0.12398689,
        "_source": {
          "field1": "1360426997"
        }
      },
      {
        "_index": "vector_test",
        "_id": "60",
        "_score": 0.12019993,
        "_source": {
          "field1": "10377260"
        }
      },
      {
        "_index": "vector_test",
        "_id": "61",
        "_score": 0.12009792,
        "_source": {
          "field1": "-2097991339"
        }
      }
    ]
  }
}

HNSW algorithm query

// kNN query
Request searchRequest = new Request("GET", "/" + indexName + "/_search");
jsonString = "{"
  + "\"size\": 10,"
  + "\"query\": {"
  +     "\"knn\": {"
  +         "\"vector1\": {"
  +             "\"vector\": [2.2, 2.3, 2.4],"
  +             "\"k\": 10"
  +         "}"
  +     "}"
  + "},"
  + "\"ext\": {\"lvector\": {\"ef_search\": \"100\"}}"
  + "}";
searchRequest.setJsonEntity(jsonString);
response = restClient.performRequest(searchRequest);
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("search responseBody = " + responseBody);

ivfpq algorithm query

// kNN query
Request searchRequest = new Request("GET", "/" + indexName + "/_search");
jsonString = "{"
  + "\"size\": 10,"
  + "\"query\": {"
  +     "\"knn\": {"
  +         "\"vector1\": {"
  +             "\"vector\": [2.2, 2.3, 2.4],"
  +             "\"k\": 10"
  +         "}"
  +     "}"
  + "},"
  + "\"ext\": {\"lvector\": {\"nprobe\": \"60\", \"reorder_factor\": \"2\"}}"
  + "}";
searchRequest.setJsonEntity(jsonString);
response = restClient.performRequest(searchRequest);
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("search responseBody = " + responseBody);
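
Important
  • If k is relatively large, for example greater than 100, set reorder_factor to 1.

  • When nlist is 10000, you can start with nprobe set to 60 and check the retrieval quality. To improve recall further, increase nprobe gradually, for example to 80, 100, 120, 140, or 160. The performance cost of nprobe is far smaller than that of reorder_factor, but it should still not be set too high.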
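
The tuning guidance above can be folded into the code that assembles the query. The following is a minimal sketch that builds an ivfpq query body, sets size equal to k as recommended for pure vector retrieval, and forces reorder_factor to 1 when k is greater than 100. The class IvfpqQueryBody and its parameter list are hypothetical; the ext values are written as strings to match the examples in this document.

```java
/** Illustrative builder for an ivfpq knn query body with nprobe and reorder_factor. */
final class IvfpqQueryBody {

    static String build(String field, float[] vector, int k, int nprobe, int reorderFactor) {
        if (k <= 0 || nprobe < 1) {
            throw new IllegalArgumentException("k must be positive and nprobe at least 1");
        }
        if (reorderFactor < 1 || reorderFactor > 200) {
            throw new IllegalArgumentException("reorder_factor must be in [1,200]");
        }
        // Per the note above, a reorder_factor of 1 is enough once k exceeds 100.
        int effectiveFactor = k > 100 ? 1 : reorderFactor;

        StringBuilder vec = new StringBuilder();
        for (int i = 0; i < vector.length; i++) {
            if (i > 0) {
                vec.append(", ");
            }
            vec.append(vector[i]);
        }
        return "{"
            + "\"size\": " + k + ", "
            + "\"query\": {\"knn\": {\"" + field + "\": {\"vector\": [" + vec + "], \"k\": " + k + "}}}, "
            + "\"ext\": {\"lvector\": {\"nprobe\": \"" + nprobe
            + "\", \"reorder_factor\": \"" + effectiveFactor + "\"}}"
            + "}";
    }
}
```

For example, `IvfpqQueryBody.build("vector1", new float[] {2.2f, 2.3f, 2.4f}, 10, 60, 2)` yields a body equivalent to the ivfpq query shown earlier in this section.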

Sparse vector query

Sparse vectors are queried in the same way as described above, but the format of vector1 is different.

// kNN query
Request searchRequest = new Request("GET", "/" + indexName + "/_search");
jsonString = "{"
  + "\"size\": 10,"
  + "\"query\": {"
  +     "\"knn\": {"
  +         "\"vector1\": {"
  +             "\"vector\": {\"indices\": [10, 45, 16], \"values\": [0.5, 0.5, 0.2]},"
  +             "\"k\": 10"
  +         "}"
  +     "}"
  + "}"
  + "}";
searchRequest.setJsonEntity(jsonString);
response = restClient.performRequest(searchRequest);
responseBody = EntityUtils.toString(response.getEntity());
System.out.println("search responseBody = " + responseBody);

Hybrid query

Queries on vector columns can be combined with conditions on regular columns to return a combined result. In practice, Post_Filter approximate queries usually return more similar results.

Pre-Filter approximate query

By adding a filter inside the knn query structure and setting the filter_type parameter to pre_filter, structured data is filtered first and vector data is queried afterwards.

Note

Currently, at most 10,000 rows of structured data can be filtered.

// kNN query
Request searchRequest = new Request("GET", "/" + indexName + "/_search");
String jsonString = "{"
  + "\"size\": 10,"
  + "\"query\": {"
  + "  \"knn\": {"
  + "    \"vector1\": {"
  + "      \"vector\": [2.2, 2.3, 2.4],"
  + "      \"filter\": {"
  + "        \"range\": {"
  + "          \"field1\": {"
  + "            \"gte\": 0"
  + "          }"
  + "        }"
  + "      },"
  + "      \"k\": 10"
  + "    }"
  + "  }"
  + "},"
  + "\"ext\": {\"lvector\": {\"filter_type\": \"pre_filter\"}}"
  + "}";
searchRequest.setJsonEntity(jsonString);
Response response = restClient.performRequest(searchRequest);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println("search responseBody = " + responseBody);

Post-Filter approximate query

By adding a filter inside the knn query structure and setting the filter_type parameter to post_filter, vector data is queried first and structured data is filtered afterwards.

Note

When using Post_Filter approximate queries, you can set k to a somewhat larger value so that more vector data is retrieved before filtering.

// kNN query
Request searchRequest = new Request("GET", "/" + indexName + "/_search");
String jsonString = "{\n" +
  "  \"size\": 10,\n" +
  "  \"query\": {\n" +
  "    \"knn\": {\n" +
  "      \"vector1\": {\n" +
  "        \"vector\": [2.2, 2.3, 2.4],\n" +
  "        \"filter\": {\n" +
  "          \"range\": {\n" +
  "            \"field1\": {\n" +
  "              \"gte\": 0\n" +
  "            }\n" +
  "          }\n" +
  "        },\n" +
  "        \"k\": 1000\n" +
  "      }\n" +
  "    }\n" +
  "  },\n" +
  "  \"ext\": {\n" +
  "    \"lvector\": {\n" +
  "      \"filter_type\": \"post_filter\"\n" +
  "    }\n" +
  "  }\n" +
  "}";
searchRequest.setJsonEntity(jsonString);
Response response = restClient.performRequest(searchRequest);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println("search responseBody = " + responseBody);

When using Post_Filter approximate queries, k should be increased appropriately. If the ivfpq algorithm is used, reorder_factor also needs to be adjusted. For example:

// kNN query
Request searchRequest = new Request("GET", "/" + indexName + "/_search");
String jsonString = "{\n" +
  "  \"size\": 10,\n" +
  "  \"query\": {\n" +
  "    \"knn\": {\n" +
  "      \"vector1\": {\n" +
  "        \"vector\": [2.2, 2.3, 2.4],\n" +
  "        \"filter\": {\n" +
  "          \"range\": {\n" +
  "            \"field1\": {\n" +
  "              \"gte\": 0\n" +
  "            }\n" +
  "          }\n" +
  "        },\n" +
  "        \"k\": 1000\n" +
  "      }\n" +
  "    }\n" +
  "  },\n" +
  "  \"ext\": {\n" +
  "    \"lvector\": {\n" +
  "      \"filter_type\": \"post_filter\",\n" +
  "      \"nprobe\": \"60\",\n" +
  "      \"reorder_factor\": \"1\"\n" +
  "    }\n" +
  "  }\n" +
  "}";
searchRequest.setJsonEntity(jsonString);
Response response = restClient.performRequest(searchRequest);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println("search responseBody = " + responseBody);
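
Important
  • In Post_Filter approximate query scenarios, k can be increased to 10,000 and should be kept within 20,000, which keeps processing latency within the hundred-millisecond range. If k is relatively large, set reorder_factor to 1.

  • When nlist is 10000, you can start with nprobe set to 60 and check the retrieval quality. If the results are not satisfactory, increase nprobe gradually, for example to 80, 100, 120, 140, or 160. The performance cost of nprobe is far smaller than that of reorder_factor, but it should still not be set too high.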
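
The advice on enlarging k for Post_Filter queries can be expressed as a small helper. The following is a minimal sketch that derives k from the requested result size and caps it at 20,000; the class PostFilterK and the notion of an amplification factor are hypothetical and only illustrate the limit stated above.

```java
/** Illustrative helper that enlarges k for Post_Filter queries and caps it at 20,000. */
final class PostFilterK {
    // Upper bound recommended above for k in Post_Filter scenarios.
    static final int MAX_K = 20_000;

    static int amplify(int size, int factor) {
        if (size <= 0 || factor <= 0) {
            throw new IllegalArgumentException("size and factor must be positive");
        }
        // Multiply in long arithmetic so that large inputs cannot overflow int.
        long amplified = (long) size * factor;
        return (int) Math.min(amplified, (long) MAX_K);
    }
}
```

With size 10 and a factor of 100, the helper returns 1000, the k used in the Post_Filter examples above.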

You can also add filter conditions through post_filter to implement a Post-Filter approximate query.

// kNN query
Request searchRequest = new Request("GET", "/" + indexName + "/_search");
String jsonString ="{\n" +
  "  \"size\": 10,\n" +
  "  \"query\": {\n" +
  "    \"knn\": {\n" +
  "      \"vector1\": {\n" +
  "        \"vector\": [2.2, 2.3, 2.4],\n" +
  "        \"k\": 10\n" +
  "      }\n" +
  "    }\n" +
  "  },\n" +
  "  \"post_filter\": {\n" +
  "    \"range\": {\n" +
  "      \"field1\": {\n" +
  "        \"gte\": 0\n" +
  "      }\n" +
  "    }\n" +
  "  }\n" +
  "}";
searchRequest.setJsonEntity(jsonString);
Response response = restClient.performRequest(searchRequest);
String responseBody = EntityUtils.toString(response.getEntity());
System.out.println("search responseBody = " + responseBody);

Common operations

  • Query all indexes and their document counts.

    Request request = new Request("GET", "/_cat/indices?v");
    Response response = restClient.performRequest(request);
    String responseBody = EntityUtils.toString(response.getEntity());
    System.out.println(responseBody);

    The following result is returned:

    health status index        uuid        pri rep docs.count docs.deleted store.size pri.store.size
    green  open   vector_test  vector_test 2   0          2            0      6.8kb          6.8kb
  • Query the document count of a specified index.

    Request request = new Request("GET", "/" + indexName + "/_count");
    Response response = restClient.performRequest(request);
    String responseBody = EntityUtils.toString(response.getEntity());
    System.out.println(responseBody);

    The following result is returned:

    {
      "count" : 2,
      "_shards" : {
        "total" : 2,
        "successful" : 2,
        "skipped" : 0,
        "failed" : 0
      }
    }
  • View the index creation information.

    Request request = new Request("GET", "/" + indexName);
    Response response = restClient.performRequest(request);
    String responseBody = EntityUtils.toString(response.getEntity());
    System.out.println(responseBody);

    The following result is returned:

    {
      "vector_test" : {
        "aliases" : { },
        "mappings" : {
          "_source" : {
            "excludes" : [
              "vector1"
            ]
          },
          "properties" : {
            "field1" : {
              "type" : "long"
            },
            "vector1" : {
              "type" : "knn_vector",
              "dimension" : 3,
              "data_type" : "float",
              "method" : {
                "engine" : "lvector",
                "space_type" : "l2",
                "name" : "hnsw",
                "parameters" : {
                  "ef_construction" : 200,
                  "m" : 24
                }
              }
            }
          }
        },
        "settings" : {
          "index" : {
            "search" : {
              "slowlog" : {
                "level" : "DEBUG",
                "threshold" : {
                  "fetch" : {
                    "warn" : "1s",
                    "trace" : "200ms",
                    "debug" : "500ms",
                    "info" : "800ms"
                  },
                  "query" : {
                    "warn" : "10s",
                    "trace" : "500ms",
                    "debug" : "1s",
                    "info" : "5s"
                  }
                }
              }
            },
            "indexing" : {
              "slowlog" : {
                "level" : "DEBUG",
                "threshold" : {
                  "index" : {
                    "warn" : "10s",
                    "trace" : "500ms",
                    "debug" : "2s",
                    "info" : "5s"
                  }
                }
              }
            },
            "number_of_shards" : "2",
            "provided_name" : "vector_test",
            "knn" : "true",
            "creation_date" : "1727169417350",
            "number_of_replicas" : "0",
            "uuid" : "vector_test",
            "version" : {
              "created" : "136287927"
            }
          }
        }
      }
    }
  • Delete an entire index.

    Request deleteIndexRequest = new Request("DELETE", "/" + indexName);
    Response response = restClient.performRequest(deleteIndexRequest);
    String responseBody = EntityUtils.toString(response.getEntity());
    System.out.println("delIndex responseBody = " + responseBody);
  • Delete by query.

    request = new Request("POST", "/" + indexName + "/_delete_by_query");
    jsonString = "{\n" +
      "    \"query\": {\n" +
      "      \"term\": {\n" +
      "        \"field1\": 1\n" +
      "      }\n" +
      "    }\n" +
      "}";
    request.setJsonEntity(jsonString);
    // Send the delete-by-query request itself, not an earlier search request
    response = restClient.performRequest(request);
    responseBody = EntityUtils.toString(response.getEntity());
    System.out.println("deleteByQuery responseBody = " + responseBody);
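
The _cat/indices output listed earlier in this section is a whitespace-aligned text table rather than JSON. The following is a minimal sketch that reads the document count per index from it by locating the index and docs.count columns in the header row. The class CatIndicesParser is hypothetical, and the sketch assumes that every row has a value in every column, as in the sample output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative parser for the text table returned by GET /_cat/indices?v. */
final class CatIndicesParser {

    static Map<String, Long> docCounts(String catOutput) {
        String[] lines = catOutput.trim().split("\\r?\\n");
        String[] header = lines[0].trim().split("\\s+");
        int indexCol = -1;
        int countCol = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].equals("index")) {
                indexCol = i;
            } else if (header[i].equals("docs.count")) {
                countCol = i;
            }
        }
        if (indexCol < 0 || countCol < 0) {
            throw new IllegalArgumentException("Header must contain index and docs.count columns");
        }
        // LinkedHashMap preserves the row order of the output.
        Map<String, Long> result = new LinkedHashMap<String, Long>();
        for (int i = 1; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] cols = line.split("\\s+");
            if (cols.length <= Math.max(indexCol, countCol)) {
                continue; // Skip rows that do not have all expected columns.
            }
            result.put(cols[indexCol], Long.parseLong(cols[countCol]));
        }
        return result;
    }
}
```

Passing the responseBody of the _cat/indices request to `CatIndicesParser.docCounts` gives a map from index name to document count.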