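Document Extraction
This topic describes how to call the document extraction API. Before calling it, read the API usage guide.
Overview
The document extraction API automatically extracts key information from many types of documents and tables and returns it as generic key-value (KV) structured content.
The document extraction API is asynchronous. First call SubmitDocumentExtractJob, the asynchronous submission operation, to submit a job. Then call GetDocumentExtractResult, the result query operation, to poll for the result. We recommend polling every 10 seconds for at most 120 minutes. If no completed result is returned within 120 minutes, treat the job as timed out.
After a job is submitted, its result can be queried for 24 hours after processing finishes. After 24 hours, the result can no longer be queried.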
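The timing rules above reduce to simple arithmetic. The following Python sketch computes the maximum number of polls and the moment after which a result can no longer be queried. All constant and function names are illustrative and are not part of the SDK.

```python
from datetime import datetime, timedelta, timezone

POLL_INTERVAL_SECONDS = 10    # recommended polling interval
MAX_POLL_MINUTES = 120        # recommended maximum polling time
RESULT_RETENTION_HOURS = 24   # results stay queryable this long after processing ends


def max_poll_attempts() -> int:
    """Number of polls that fit into the recommended polling window."""
    return (MAX_POLL_MINUTES * 60) // POLL_INTERVAL_SECONDS


def result_expiry(finished_at: datetime) -> datetime:
    """Moment after which the result can no longer be queried."""
    return finished_at + timedelta(hours=RESULT_RETENTION_HOURS)


def is_result_available(finished_at: datetime, now: datetime) -> bool:
    """True while the result is still inside the 24-hour retention window."""
    return now <= result_expiry(finished_at)


finished = datetime(2022, 7, 12, 8, 0, tzinfo=timezone.utc)
print(max_poll_attempts())      # 720
print(result_expiry(finished))  # 2022-07-13 08:00:00+00:00
```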
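Procedure
Step 1: Call the SubmitDocumentExtractJob operation to submit an asynchronous job
Request parameters
Name | Type | Required | Description | Example |
FileUrl | string | Yes | URL of a single document. Supported: PDF files of up to 1,000 pages and up to 100 MB, and single images of up to 20 MB. To upload a local file instead, the SDK provides a separate input parameter that accepts a file stream. | https://example.com/example.pdf |
FileName | string | No | File name, including the file type suffix. Specify either FileName or FileNameExtension. | example.pdf |
FileNameExtension | string | No | File type. Specify either FileNameExtension or FileName. Supported types: pdf, jpg, jpeg, png, bmp, gif. | pdf |
Supported document formats: PDF and images. Supported image formats: jpg, jpeg, png, bmp, gif.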
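The file type rules can be checked on the client before submitting a job. The following Python sketch is one way to do that. The function name is hypothetical, and two behaviors are assumptions not stated in this topic: matching is case-insensitive, and FileNameExtension takes precedence when both values are supplied.

```python
import os
from typing import Optional

# File types listed as supported in the request parameter table.
SUPPORTED_EXTENSIONS = {"pdf", "jpg", "jpeg", "png", "bmp", "gif"}


def resolve_extension(file_name: Optional[str] = None,
                      file_name_extension: Optional[str] = None) -> str:
    """Return the lowercase file type implied by FileName or FileNameExtension."""
    if file_name_extension:
        ext = file_name_extension.lstrip(".").lower()
    elif file_name:
        # Take the suffix after the last dot in the file name.
        ext = os.path.splitext(file_name)[1].lstrip(".").lower()
    else:
        raise ValueError("provide FileName or FileNameExtension")
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext!r}")
    return ext


print(resolve_extension(file_name="example.pdf"))   # pdf
print(resolve_extension(file_name_extension="PNG"))  # png
```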
Response parameters
Name | Type | Description | Example |
RequestId | string | Unique ID of the request. | 43A29C77-405E-4CC0-BC55-EE694AD0**** |
Data | object | Returned data. | {"Id": "docmind-20220712-b15f****"} |
Id | string | Job ID (business order number). The unique identifier that the query operation uses to look up this job. | docmind-20220712-b15f**** |
Code | string | Status code. | 200 |
Message | string | Detailed message. | Message |
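Examples
This operation supports two calling methods: uploading a local document and passing a document URL.
Local document upload: the sample code below uploads a local document by calling the submitDocumentExtractJobAdvance method and passing the file stream through the fileUrlObject parameter. Samples are given for Java, Node.js, Python, Go, and C#, in that order.
Note: For how to obtain and use AccessKey information, see the SDK usage guide for each language in the SDK overview.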
import com.aliyun.docmind_api20220711.models.*;
import com.aliyun.teaopenapi.models.Config;
import com.aliyun.docmind_api20220711.Client;
import com.aliyun.teautil.models.RuntimeOptions;
import java.io.File;
import java.io.FileInputStream;

public static void submit() throws Exception {
    // Initialize the Credentials client with the default credentials.
    com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
    Config config = new Config()
        // Read the AccessKey ID from the credentials configuration.
        .setAccessKeyId(credentialClient.getAccessKeyId())
        // Read the AccessKey Secret from the credentials configuration.
        .setAccessKeySecret(credentialClient.getAccessKeySecret());
    // Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
    Client client = new Client(config);
    // Create a RuntimeOptions instance and set runtime parameters.
    RuntimeOptions runtime = new RuntimeOptions();
    SubmitDocumentExtractJobAdvanceRequest advanceRequest = new SubmitDocumentExtractJobAdvanceRequest();
    File file = new File("D:\\example.pdf");
    advanceRequest.fileUrlObject = new FileInputStream(file);
    advanceRequest.fileName = "example.pdf";
    // Send the request and handle the response or exception.
    SubmitDocumentExtractJobResponse response = client.submitDocumentExtractJobAdvance(advanceRequest, runtime);
}
const Client = require('@alicloud/docmind-api20220711');
const Credential = require('@alicloud/credentials');
const Util = require('@alicloud/tea-util');
const fs = require('fs');

const getResult = async () => {
  // Initialize the Credentials client with the default credentials.
  const cred = new Credential.default();
  const client = new Client.default({
    // Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    endpoint: 'docmind-api.cn-hangzhou.aliyuncs.com',
    // Read the AccessKey ID from the credentials configuration.
    accessKeyId: cred.credential.accessKeyId,
    // Read the AccessKey Secret from the credentials configuration.
    accessKeySecret: cred.credential.accessKeySecret,
    type: 'access_key',
    regionId: 'cn-hangzhou',
  });
  const advanceRequest = new Client.SubmitDocumentExtractJobAdvanceRequest();
  const file = fs.createReadStream('./example.pdf');
  advanceRequest.fileUrlObject = file;
  advanceRequest.fileName = 'example.pdf';
  const runtimeObject = new Util.RuntimeOptions({});
  const response = await client.submitDocumentExtractJobAdvance(advanceRequest, runtimeObject);
  return response.body;
};
from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
from alibabacloud_tea_util.client import Client as UtilClient
from alibabacloud_tea_util import models as util_models
from alibabacloud_credentials.client import Client as CredClient

def submit_file():
    cred = CredClient()
    config = open_api_models.Config(
        # Read the AccessKey ID from the credentials configuration.
        access_key_id=cred.get_access_key_id(),
        # Read the AccessKey Secret from the credentials configuration.
        access_key_secret=cred.get_access_key_secret()
    )
    # Endpoint.
    config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
    client = docmind_api20220711Client(config)
    request = docmind_api20220711_models.SubmitDocumentExtractJobAdvanceRequest(
        # file_url_object: local file stream.
        file_url_object=open("./example.pdf", "rb"),
        # file_name: file name; it must include the file type suffix.
        file_name='123.pdf',
        # file_name_extension: file extension; specify either this or file_name.
        file_name_extension='pdf'
    )
    runtime = util_models.RuntimeOptions()
    try:
        # Print the API response yourself when you run this code.
        response = client.submit_document_extract_job_advance(request, runtime)
        # The response hierarchy is body -> data -> attribute. Print what your business needs;
        # this example prints the returned job ID. Attribute names start with a lowercase letter.
        print(response.body.data.id)
    except Exception as error:
        # Print the error if needed.
        UtilClient.assert_as_string(error.message)
import (
    "fmt"
    "os"

    openClient "github.com/alibabacloud-go/darabonba-openapi/v2/client"
    "github.com/alibabacloud-go/docmind-api-20220711/client"
    "github.com/alibabacloud-go/tea-utils/v2/service"
    "github.com/aliyun/credentials-go/credentials"
)

func submit() {
    // Initialize the Credentials client with the default credentials.
    credential, err := credentials.NewCredential(nil)
    // Read the AccessKey ID from the credentials configuration.
    accessKeyId, err := credential.GetAccessKeyId()
    // Read the AccessKey Secret from the credentials configuration.
    accessKeySecret, err := credential.GetAccessKeySecret()
    // Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    var endpoint string = "docmind-api.cn-hangzhou.aliyuncs.com"
    config := openClient.Config{AccessKeyId: accessKeyId, AccessKeySecret: accessKeySecret, Endpoint: &endpoint}
    // Initialize the client.
    cli, err := client.NewClient(&config)
    if err != nil {
        panic(err)
    }
    // Open the local document to upload.
    filename := "D:\\example.pdf"
    f, err := os.Open(filename)
    if err != nil {
        panic(err)
    }
    // Initialize the request.
    request := client.SubmitDocumentExtractJobAdvanceRequest{
        FileName:      &filename,
        FileUrlObject: f,
    }
    // Create a RuntimeOptions instance and set runtime parameters.
    options := service.RuntimeOptions{}
    response, err := cli.SubmitDocumentExtractJobAdvance(&request, &options)
    if err != nil {
        panic(err)
    }
    // Print the result.
    fmt.Println(response.Body.String())
}
using Newtonsoft.Json;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Tea;
using Tea.Utils;

public static void SubmitFile()
{
    // Initialize the Credentials client with the default credentials.
    var akCredential = new Aliyun.Credentials.Client(null);
    AlibabaCloud.OpenApiClient.Models.Config config = new AlibabaCloud.OpenApiClient.Models.Config
    {
        // Read the AccessKey ID from the credentials configuration.
        AccessKeyId = akCredential.GetAccessKeyId(),
        // Read the AccessKey Secret from the credentials configuration.
        AccessKeySecret = akCredential.GetAccessKeySecret(),
    };
    // Endpoint.
    config.Endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
    // Requires an additional dependency: AlibabaCloud.DarabonbaStream
    AlibabaCloud.SDK.Docmind_api20220711.Client client = new AlibabaCloud.SDK.Docmind_api20220711.Client(config);
    Stream bodyStream = AlibabaCloud.DarabonbaStream.StreamUtil.ReadFromFilePath("<YOUR-FILE-PATH>");
    AlibabaCloud.SDK.Docmind_api20220711.Models.SubmitDocumentExtractJobAdvanceRequest request = new AlibabaCloud.SDK.Docmind_api20220711.Models.SubmitDocumentExtractJobAdvanceRequest
    {
        FileUrlObject = bodyStream,
        FileNameExtension = "pdf"
    };
    AlibabaCloud.TeaUtil.Models.RuntimeOptions runtime = new AlibabaCloud.TeaUtil.Models.RuntimeOptions();
    try
    {
        // Print the API response yourself when you run this code.
        client.SubmitDocumentExtractJobAdvance(request, runtime);
    }
    catch (TeaException error)
    {
        // Print the error if needed.
        AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
    }
    catch (Exception _error)
    {
        TeaException error = new TeaException(new Dictionary<string, object>
        {
            { "message", _error.Message }
        });
        // Print the error if needed.
        AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
    }
}
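Passing a document URL: the sample code below passes a document URL by calling the submitDocumentExtractJob method with the fileUrl parameter. Samples are given for Java, Node.js, Python, Go, C#, and PHP, in that order. Note that the URL must be publicly accessible and downloadable, must not be subject to cross-origin restrictions, and must not contain special escape characters.
Note: For how to obtain and use AccessKey information, see the SDK usage guide for each language in the SDK overview.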
import com.aliyun.docmind_api20220711.models.*;
import com.aliyun.teaopenapi.models.Config;
import com.aliyun.docmind_api20220711.Client;

public static void submit() throws Exception {
    // Initialize the Credentials client with the default credentials.
    com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
    Config config = new Config()
        // Read the AccessKey ID from the credentials configuration.
        .setAccessKeyId(credentialClient.getAccessKeyId())
        // Read the AccessKey Secret from the credentials configuration.
        .setAccessKeySecret(credentialClient.getAccessKeySecret());
    // Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
    Client client = new Client(config);
    // Replace with the request parameters and method of the specific asynchronous submission API.
    SubmitDocumentExtractJobRequest request = new SubmitDocumentExtractJobRequest();
    request.fileName = "example.pdf";
    request.fileUrl = "https://example.com/example.pdf";
    SubmitDocumentExtractJobResponse response = client.submitDocumentExtractJob(request);
}
const Client = require('@alicloud/docmind-api20220711');
const Credential = require('@alicloud/credentials');

const getResult = async () => {
  // Initialize the Credentials client with the default credentials.
  const cred = new Credential.default();
  const client = new Client.default({
    // Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    endpoint: 'docmind-api.cn-hangzhou.aliyuncs.com',
    // Read the AccessKey ID from the credentials configuration.
    accessKeyId: cred.credential.accessKeyId,
    // Read the AccessKey Secret from the credentials configuration.
    accessKeySecret: cred.credential.accessKeySecret,
    type: 'access_key',
    regionId: 'cn-hangzhou'
  });
  const request = new Client.SubmitDocumentExtractJobRequest();
  request.fileName = 'example.pdf';
  request.fileUrl = 'https://example.com/example.pdf';
  const response = await client.submitDocumentExtractJob(request);
  return response.body;
};
from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
from alibabacloud_tea_util.client import Client as UtilClient
from alibabacloud_credentials.client import Client as CredClient

def submit_url():
    cred = CredClient()
    config = open_api_models.Config(
        # Read the AccessKey ID from the credentials configuration.
        access_key_id=cred.get_access_key_id(),
        # Read the AccessKey Secret from the credentials configuration.
        access_key_secret=cred.get_access_key_secret()
    )
    # Endpoint.
    config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
    client = docmind_api20220711Client(config)
    request = docmind_api20220711_models.SubmitDocumentExtractJobRequest(
        # file_url: URL of the file.
        file_url='https://example.com/example.pdf',
        # file_name: file name; it must include the file type suffix.
        file_name='123.pdf',
        # file_name_extension: file extension; specify either this or file_name.
        file_name_extension='pdf'
    )
    try:
        # Print the API response yourself when you run this code.
        response = client.submit_document_extract_job(request)
        # The response hierarchy is body -> data -> attribute. Print what your business needs;
        # this example prints the returned job ID. Attribute names start with a lowercase letter.
        print(response.body.data.id)
    except Exception as error:
        # Print the error if needed.
        UtilClient.assert_as_string(error.message)
import (
    "fmt"

    openClient "github.com/alibabacloud-go/darabonba-openapi/v2/client"
    "github.com/alibabacloud-go/docmind-api-20220711/client"
    "github.com/aliyun/credentials-go/credentials"
)

func submit() {
    // Initialize the Credentials client with the default credentials.
    credential, err := credentials.NewCredential(nil)
    // Read the AccessKey ID from the credentials configuration.
    accessKeyId, err := credential.GetAccessKeyId()
    // Read the AccessKey Secret from the credentials configuration.
    accessKeySecret, err := credential.GetAccessKeySecret()
    // Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    var endpoint string = "docmind-api.cn-hangzhou.aliyuncs.com"
    config := openClient.Config{AccessKeyId: accessKeyId, AccessKeySecret: accessKeySecret, Endpoint: &endpoint}
    // Initialize the client.
    cli, err := client.NewClient(&config)
    if err != nil {
        panic(err)
    }
    // File URL.
    fileURL := "https://example.com/example.pdf"
    // File name.
    fileName := "example.pdf"
    // Initialize the request.
    request := client.SubmitDocumentExtractJobRequest{
        FileUrl:  &fileURL,
        FileName: &fileName,
    }
    response, err := cli.SubmitDocumentExtractJob(&request)
    if err != nil {
        panic(err)
    }
    // Print the result.
    fmt.Println(response.Body.String())
}
using Newtonsoft.Json;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Tea;
using Tea.Utils;

public static void SubmitUrl()
{
    // Initialize the Credentials client with the default credentials.
    var akCredential = new Aliyun.Credentials.Client(null);
    AlibabaCloud.OpenApiClient.Models.Config config = new AlibabaCloud.OpenApiClient.Models.Config
    {
        // Read the AccessKey ID from the credentials configuration.
        AccessKeyId = akCredential.GetAccessKeyId(),
        // Read the AccessKey Secret from the credentials configuration.
        AccessKeySecret = akCredential.GetAccessKeySecret(),
    };
    // Endpoint.
    config.Endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
    AlibabaCloud.SDK.Docmind_api20220711.Client client = new AlibabaCloud.SDK.Docmind_api20220711.Client(config);
    AlibabaCloud.SDK.Docmind_api20220711.Models.SubmitDocumentExtractJobRequest request = new AlibabaCloud.SDK.Docmind_api20220711.Models.SubmitDocumentExtractJobRequest
    {
        FileUrl = "https://example.com/example.pdf",
        FileNameExtension = "pdf"
    };
    try
    {
        // Print the API response yourself when you run this code.
        client.SubmitDocumentExtractJob(request);
    }
    catch (TeaException error)
    {
        // Print the error if needed.
        AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
    }
    catch (Exception _error)
    {
        TeaException error = new TeaException(new Dictionary<string, object>
        {
            { "message", _error.Message }
        });
        // Print the error if needed.
        AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
    }
}
use AlibabaCloud\SDK\Docmindapi\V20220711\Docmindapi;
use AlibabaCloud\SDK\Docmindapi\V20220711\Models\SubmitDocumentExtractJobRequest;
use Darabonba\OpenApi\Models\Config;
use AlibabaCloud\Tea\Utils\Utils\RuntimeOptions;
use AlibabaCloud\Tea\Exception\TeaUnableRetryError;
use AlibabaCloud\Credentials\Credential;

// Initialize the Credentials client with the default credentials.
$bearerToken = new Credential();
$config = new Config();
// Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
$config->endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
// Read the AccessKey ID from the credentials configuration.
$config->accessKeyId = $bearerToken->getCredential()->getAccessKeyId();
// Read the AccessKey Secret from the credentials configuration.
$config->accessKeySecret = $bearerToken->getCredential()->getAccessKeySecret();
$config->type = "access_key";
$config->regionId = "cn-hangzhou";
$client = new Docmindapi($config);
$request = new SubmitDocumentExtractJobRequest();
$runtime = new RuntimeOptions();
$runtime->maxIdleConns = 3;
$runtime->connectTimeout = 10000;
$runtime->readTimeout = 10000;
$request->fileName = "example.pdf";
$request->fileUrl = "https://example.com/example.pdf";
try {
    $response = $client->submitDocumentExtractJob($request, $runtime);
    var_dump($response->toMap());
} catch (TeaUnableRetryError $e) {
    var_dump($e->getMessage());
    var_dump($e->getErrorInfo());
    var_dump($e->getLastException());
    var_dump($e->getLastRequest());
}
Sample success response (JSON format):
{
  "RequestId": "43A29C77-405E-4CC0-BC55-EE694AD00655",
  "Data": {
    "Id": "docmind-20220712-b15fe420"
  }
}
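When working with the raw JSON response rather than SDK objects, the job Id has to be read from the Data node before the query operation can be called. The following Python sketch shows one way to do that. The function name is hypothetical, and the error handling is an assumption about what a caller might want.

```python
import json


def extract_job_id(response_text: str) -> str:
    """Return Data.Id from a SubmitDocumentExtractJob JSON response."""
    body = json.loads(response_text)
    data = body.get("Data") or {}
    job_id = data.get("Id")
    if not job_id:
        # Surface Code and Message so the caller can see why no Id came back.
        raise ValueError(
            f"no job Id in response: Code={body.get('Code')}, Message={body.get('Message')}"
        )
    return job_id


sample = (
    '{"RequestId": "43A29C77-405E-4CC0-BC55-EE694AD00655", '
    '"Data": {"Id": "docmind-20220712-b15fe420"}}'
)
print(extract_job_id(sample))  # docmind-20220712-b15fe420
```

With the Python SDK, the same value is available as response.body.data.id, as the SDK sample in this topic shows.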
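Step 2: Poll the GetDocumentExtractResult operation for the extraction result
The Id request parameter of the query operation is the Id returned by the submission operation. A query returns one of three states: processing, succeeded, or failed. We recommend polling every 10 seconds for at most 120 minutes. Stop polling as soon as Completed is returned as true or the maximum polling time is exceeded.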
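The polling rule can be sketched as a small loop. In the Python sketch below, fetch is a hypothetical caller-supplied function that performs one GetDocumentExtractResult call and returns the response body as a dict. The sleep and clock parameters are injection points added only so the loop can be exercised without waiting. None of these names come from the SDK.

```python
import time
from typing import Any, Callable, Dict


def poll_until_completed(fetch: Callable[[], Dict[str, Any]],
                         interval_seconds: float = 10.0,
                         timeout_seconds: float = 120 * 60,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Call fetch() until Completed is true or the polling window is used up."""
    deadline = clock() + timeout_seconds
    while True:
        body = fetch()
        if body.get("Completed") is True:
            return body  # finished: Status tells success from failure
        if clock() + interval_seconds > deadline:
            raise TimeoutError("job did not complete within the polling window")
        sleep(interval_seconds)


# Demonstration with canned responses and no real waiting.
canned = iter([{"Completed": False}, {"Completed": True, "Status": "Success"}])
final = poll_until_completed(lambda: next(canned), sleep=lambda seconds: None)
print(final["Status"])  # Success
```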
Request parameters
Name | Type | Required | Description | Example |
Id | string | Yes | ID of the job to query, taken from the response of the submission operation. | docmind-20220712-b15f**** |
Response parameters
Name | Type | Description | Example |
RequestId | string | Unique ID of the request. | 43A29C77-405E-4CC0-BC55-EE694AD0**** |
Completed | boolean | Whether the asynchronous job has finished. false means the job is still processing. true means the job has finished with a definite result, either success or failure. | true |
Status | string | Final status of the asynchronous job after processing ends. Success means processing succeeded; Fail means processing failed. | Success |
Data | string | Returned data: the generic KV structured content, as a JSON structure. | - |
Code | string | Status code. | 200 |
Message | string | Detailed message. | Message |
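Completed and Status together identify which of the three states a response is in. The following Python sketch maps a response body, given as a dict, to a state label. The function name and the labels are illustrative. Treating every completed response whose Status is not Success as a failure is an assumption.

```python
from typing import Any, Dict


def classify_result(body: Dict[str, Any]) -> str:
    """Map a GetDocumentExtractResult response body to one of three states."""
    if not body.get("Completed"):
        return "processing"  # keep polling
    if body.get("Status") == "Success":
        return "success"     # the extraction result is in Data
    return "failed"          # Code and Message carry the reason


print(classify_result({"Completed": False, "Code": "DocProcessing"}))  # processing
print(classify_result({"Completed": True, "Status": "Fail"}))          # failed
print(classify_result({"Completed": True, "Status": "Success"}))       # success
```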
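Examples
The sample code below queries the result of a document extraction job by calling the getDocumentExtractResult method and passing the job ID in the Id parameter. Samples are given for Java, Node.js, Python, Go, C#, and PHP, in that order.
Note: For how to obtain and use AccessKey information, see the SDK usage guide for each language in the SDK overview.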
import com.aliyun.docmind_api20220711.models.*;
import com.aliyun.teaopenapi.models.Config;
import com.aliyun.docmind_api20220711.Client;

public static void submit() throws Exception {
    // Initialize the Credentials client with the default credentials.
    com.aliyun.credentials.Client credentialClient = new com.aliyun.credentials.Client();
    Config config = new Config()
        // Read the AccessKey ID from the credentials configuration.
        .setAccessKeyId(credentialClient.getAccessKeyId())
        // Read the AccessKey Secret from the credentials configuration.
        .setAccessKeySecret(credentialClient.getAccessKeySecret());
    // Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    config.endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
    Client client = new Client(config);
    GetDocumentExtractResultRequest resultRequest = new GetDocumentExtractResultRequest();
    resultRequest.id = "docmind-20220902-824b****";
    GetDocumentExtractResultResponse response = client.getDocumentExtractResult(resultRequest);
}
const Client = require('@alicloud/docmind-api20220711');
const Credential = require('@alicloud/credentials');

const getResult = async () => {
  // Initialize the Credentials client with the default credentials.
  const cred = new Credential.default();
  const client = new Client.default({
    // Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    endpoint: 'docmind-api.cn-hangzhou.aliyuncs.com',
    // Read the AccessKey ID from the credentials configuration.
    accessKeyId: cred.credential.accessKeyId,
    // Read the AccessKey Secret from the credentials configuration.
    accessKeySecret: cred.credential.accessKeySecret,
    type: 'access_key',
    regionId: 'cn-hangzhou'
  });
  const resultRequest = new Client.GetDocumentExtractResultRequest();
  resultRequest.id = "docmind-20220902-824b****";
  const response = await client.getDocumentExtractResult(resultRequest);
  return response.body;
};
from alibabacloud_docmind_api20220711.client import Client as docmind_api20220711Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_docmind_api20220711 import models as docmind_api20220711_models
from alibabacloud_tea_util.client import Client as UtilClient
from alibabacloud_credentials.client import Client as CredClient

def query():
    cred = CredClient()
    config = open_api_models.Config(
        # Read the AccessKey ID from the credentials configuration.
        access_key_id=cred.get_access_key_id(),
        # Read the AccessKey Secret from the credentials configuration.
        access_key_secret=cred.get_access_key_secret()
    )
    # Endpoint.
    config.endpoint = 'docmind-api.cn-hangzhou.aliyuncs.com'
    client = docmind_api20220711Client(config)
    request = docmind_api20220711_models.GetDocumentExtractResultRequest(
        # id: the Id returned by the submission operation.
        id='docmind-20220902-824b****'
    )
    try:
        # Print the API response yourself when you run this code.
        response = client.get_document_extract_result(request)
        # The response hierarchy is body -> data -> attribute. Print what your business needs.
        # Attribute names start with a lowercase letter.
        # response.body.completed tells you whether to keep polling.
        print(response.body.completed)
        # Result data. We recommend converting response.body.data to JSON first and then reading the values you need.
        print(response.body.data)
    except Exception as error:
        # Print the error if needed.
        UtilClient.assert_as_string(error.message)
import (
    "fmt"

    openClient "github.com/alibabacloud-go/darabonba-openapi/v2/client"
    "github.com/alibabacloud-go/docmind-api-20220711/client"
    "github.com/aliyun/credentials-go/credentials"
)

func submit() {
    // Initialize the Credentials client with the default credentials.
    credential, err := credentials.NewCredential(nil)
    // Read the AccessKey ID from the credentials configuration.
    accessKeyId, err := credential.GetAccessKeyId()
    // Read the AccessKey Secret from the credentials configuration.
    accessKeySecret, err := credential.GetAccessKeySecret()
    // Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
    var endpoint string = "docmind-api.cn-hangzhou.aliyuncs.com"
    config := openClient.Config{AccessKeyId: accessKeyId, AccessKeySecret: accessKeySecret, Endpoint: &endpoint}
    // Initialize the client.
    cli, err := client.NewClient(&config)
    if err != nil {
        panic(err)
    }
    id := "docmind-20220925-76b1****"
    // Call the query operation.
    request := client.GetDocumentExtractResultRequest{Id: &id}
    response, err := cli.GetDocumentExtractResult(&request)
    if err != nil {
        panic(err)
    }
    // Print the query result.
    fmt.Println(response.Body.String())
}
using Newtonsoft.Json;
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Tea;
using Tea.Utils;

public static void GetResult()
{
    // Initialize the Credentials client with the default credentials.
    var akCredential = new Aliyun.Credentials.Client(null);
    AlibabaCloud.OpenApiClient.Models.Config config = new AlibabaCloud.OpenApiClient.Models.Config
    {
        // Read the AccessKey ID from the credentials configuration.
        AccessKeyId = akCredential.GetAccessKeyId(),
        // Read the AccessKey Secret from the credentials configuration.
        AccessKeySecret = akCredential.GetAccessKeySecret(),
    };
    // Endpoint.
    config.Endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
    AlibabaCloud.SDK.Docmind_api20220711.Client client = new AlibabaCloud.SDK.Docmind_api20220711.Client(config);
    AlibabaCloud.SDK.Docmind_api20220711.Models.GetDocumentExtractResultRequest request = new AlibabaCloud.SDK.Docmind_api20220711.Models.GetDocumentExtractResultRequest
    {
        Id = "docmind-20220902-824b****"
    };
    AlibabaCloud.TeaUtil.Models.RuntimeOptions runtime = new AlibabaCloud.TeaUtil.Models.RuntimeOptions();
    try
    {
        // Print the API response yourself when you run this code.
        client.GetDocumentExtractResult(request);
    }
    catch (TeaException error)
    {
        // Print the error if needed.
        AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
    }
    catch (Exception _error)
    {
        TeaException error = new TeaException(new Dictionary<string, object>
        {
            { "message", _error.Message }
        });
        // Print the error if needed.
        AlibabaCloud.TeaUtil.Common.AssertAsString(error.Message);
    }
}
use AlibabaCloud\SDK\Docmindapi\V20220711\Docmindapi;
use AlibabaCloud\SDK\Docmindapi\V20220711\Models\GetDocumentExtractResultRequest;
use Darabonba\OpenApi\Models\Config;
use AlibabaCloud\Tea\Utils\Utils\RuntimeOptions;
use AlibabaCloud\Tea\Exception\TeaUnableRetryError;
use AlibabaCloud\Credentials\Credential;

// Initialize the Credentials client with the default credentials.
$bearerToken = new Credential();
$config = new Config();
// Endpoint. IPv4 and IPv6 are supported; for IPv6 use docmind-api-dualstack.cn-hangzhou.aliyuncs.com
$config->endpoint = "docmind-api.cn-hangzhou.aliyuncs.com";
// Read the AccessKey ID from the credentials configuration.
$config->accessKeyId = $bearerToken->getCredential()->getAccessKeyId();
// Read the AccessKey Secret from the credentials configuration.
$config->accessKeySecret = $bearerToken->getCredential()->getAccessKeySecret();
$config->type = "access_key";
$config->regionId = "cn-hangzhou";
$client = new Docmindapi($config);
$request = new GetDocumentExtractResultRequest();
$request->id = "docmind-20220902-824b****";
$runtime = new RuntimeOptions();
$runtime->maxIdleConns = 3;
$runtime->connectTimeout = 10000;
$runtime->readTimeout = 10000;
try {
    $response = $client->getDocumentExtractResult($request, $runtime);
    var_dump($response->toMap());
} catch (TeaUnableRetryError $e) {
    var_dump($e->getMessage());
    var_dump($e->getErrorInfo());
    var_dump($e->getLastException());
    var_dump($e->getLastRequest());
}
Query results
A query returns one of three states: processing, succeeded, or failed. A sample response for each state follows.
Sample response while the job is still processing:
{
  "RequestId": "2AABD2C2-D24F-12F7-875D-683A27C3****",
  "Completed": false,
  "Code": "DocProcessing",
  "Message": "Document processing",
  "HostId": "ocr-api.cn-hangzhou.aliyuncs.com",
  "Recommend": "https://next.api.aliyun.com/troubleshoot?q=DocProcessing&product=docmind-api"
}
While the job is processing, Completed is false, meaning the job has not finished. Keep polling until Completed is returned as true or the maximum polling time is exceeded.
Sample response when processing failed:
{
  "RequestId": "A8EF3A36-1380-1116-A39E-B377BE27****",
  "Completed": true,
  "Status": "Fail",
  "Code": "UrlNotLegal",
  "Message": "Failed to process the document. The document url you provided is not legal.",
  "HostId": "docmind-api.cn-hangzhou.aliyuncs.com",
  "Recommend": "https://next.api.aliyun.com/troubleshoot?q=IDP.UrlNotLegal&product=docmind-api"
}
When processing fails, Completed is true, meaning the job has finished, and Status is Fail, meaning processing failed. The failure Code and a detailed Message are also returned. See the error codes topic for details on each code.
Sample response when processing succeeded:
{
  "Status": "Success",
  "RequestId": "73134E1A-E281-1B2C-A105-D0ECFE2D****",
  "Completed": true,
  "Data": {
    "status": "success",
    "errorCode": null,
    "errorMessage": null,
    "result": {
      "kvListInfo": [
        [
          [
            { "value": ["019W"], "key": ["Voyage"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } },
            { "value": ["Ningbo"], "key": ["POL"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } },
            { "value": ["2022-05-3110:00"], "key": ["ETD"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } },
            { "value": ["Piraeus"], "key": ["POD"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } },
            { "value": ["2022-06-2007:00"], "key": ["ETA"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } }
          ],
          [
            { "value": [""], "key": ["Voyage"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } },
            { "value": ["Piraeus"], "key": ["POL"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } },
            { "value": [""], "key": ["ETD"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } },
            { "value": ["Algeciras"], "key": ["POD"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } },
            { "value": [""], "key": ["ETA"], "extInfo": { "table_id": "adf1d2f40b208d4923764d2ea6175365" } }
          ]
        ]
      ],
      "kvInfo": [
        {
          "value": ["Ningbo"],
          "key": ["接貨地"],
          "extInfo": {
            "valueLayoutId": "7248c73597b46266b9c84505f2bab8fe",
            "valueConfidence": 0.9994202852249146,
            "keyConfidence": 0.9719930092493693,
            "keyLayoutId": "7248c73597b46266b9c84505f2bab8fe"
          }
        }
      ],
      "pageInfo": [
        {
          "imageWidth": 1917,
          "imageUrl": "http://docmind-api-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/idp/ab4fe775d9dd423182f30db57c62d379/example1.jpg?Expires=1661221931&OSSAccessKeyId=XX&Signature=YY",
          "angle": 0.0,
          "pageIdCurDoc": 1,
          "imageType": "JPEG",
          "imageHeight": 2713,
          "pageIdAllDocs": 1
        },
        {
          "imageWidth": 1917,
          "imageUrl": "http://docmind-api-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/idp/ab4fe775d9dd423182f30db57c62d379/example2.jpg?Expires=1661221931&OSSAccessKeyId=XX&Signature=YY",
          "angle": 0.0,
          "pageIdCurDoc": 2,
          "imageType": "JPEG",
          "imageHeight": 2713,
          "pageIdAllDocs": 2
        },
        {
          "imageWidth": 1917,
          "imageUrl": "http://docmind-api-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/idp/ab4fe775d9dd423182f30db57c62d379/example3.jpg?Expires=1661221931&OSSAccessKeyId=XX&Signature=YY",
          "angle": 0.0,
          "pageIdCurDoc": 3,
          "imageType": "JPEG",
          "imageHeight": 2713,
          "pageIdAllDocs": 3
        }
      ]
    }
  }
}
When processing succeeds, Completed is true, meaning the job has finished, and Status is Success, meaning processing succeeded. The extraction result is in the Data node, whose format is as follows:
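Name | Type | Example | Description |
status | String | init | Status value. One of: init (initializing), processing (in progress), success (succeeded). |
result | JSONObject | - | KV extraction result. |
kvListInfo | array of arrays | - | KV list information, typically produced for table KVs. Note that kvListInfo is itself an array nested in an array: the result may cover several tables, so each table is a collection of kvInfo groups. |
kvInfo | array | - | Paragraph KV information. |
valuePos | array | - | Coordinates of the value; there may be more than one. |
width | int | 863 | Width. |
x | int | 410 | X coordinate. |
y | int | 837 | Y coordinate. |
pageId | int | 0 | Page number. |
height | int | 45 | Height. |
existCorrection | boolean | false | Whether a correction was applied. |
existTranscoding | boolean | true | Whether transcoding was applied. |
originalValue | array | 某公司 (a company) | Original value before processing. |
keyPos | array | - | Coordinates of the key. |
width | int | 863 | Width. |
x | int | 410 | X coordinate. |
y | int | 837 | Y coordinate. |
pageId | int | 0 | Page number. |
height | int | 45 | Height. |
keyDesc | array | 甲方名稱 (Party A name) | Description of the key. |
value | array | 某公司 (a company) | Final extracted value after processing. |
key | array | firstPartyName | English code of the key. |
extInfo | object | - | Extended information. |
valueConfidence | double | 0.9994202852249146 | Confidence of the value. |
keyConfidence | double | 0.9719930092493693 | Confidence of the key. |
extractFrom | String | - | Extraction source; nlp by default. |
pageInfo | array | - | List of document pages. |
imageType | string | JPEG | Image type after the page is converted. |
imageUrl | string | - | URL of the image converted from the page. |
angle | float | 90 | Rotation angle of the converted image, measured counterclockwise. |
imageWidth | int | 1917 | Width of the converted image. |
imageHeight | int | 1917 | Height of the converted image. |
pageIdCurDoc | int | 0 | Index of the page within the current document, starting from 0. |
pageIdAllDocs | int | 0 | Index of the page across all documents, starting from 0. |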
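To use the extraction result, callers usually flatten kvInfo and kvListInfo into simpler structures. The following Python sketch does that under a few stated assumptions. All function names are hypothetical. Only the first element of each key and value array is kept. kvListInfo is walked as tables, then rows, then cells, which is the nesting the sample success response shows. Data is accepted either as a JSON string or as an already parsed dict.

```python
import json
from typing import Any, Dict, List, Union


def load_data(data: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Accept the Data node either as a JSON string or as a parsed dict."""
    return json.loads(data) if isinstance(data, str) else data


def first(values: List[str]) -> str:
    """First element of a key/value array, or an empty string if it is empty."""
    return values[0] if values else ""


def paragraph_kv(data: Union[str, Dict[str, Any]]) -> Dict[str, str]:
    """Flatten result.kvInfo into a {key: value} dict."""
    result = load_data(data).get("result") or {}
    return {first(item.get("key", [])): first(item.get("value", []))
            for item in result.get("kvInfo") or []}


def table_rows(data: Union[str, Dict[str, Any]]) -> List[List[Dict[str, str]]]:
    """Flatten result.kvListInfo into one list of row dicts per table."""
    result = load_data(data).get("result") or {}
    tables = []
    for table in result.get("kvListInfo") or []:
        rows = [{first(cell.get("key", [])): first(cell.get("value", [])) for cell in row}
                for row in table]
        tables.append(rows)
    return tables


sample = {
    "result": {
        "kvInfo": [{"key": ["接貨地"], "value": ["Ningbo"]}],
        "kvListInfo": [[[{"key": ["Voyage"], "value": ["019W"]},
                         {"key": ["POL"], "value": ["Ningbo"]}]]],
    }
}
print(paragraph_kv(sample))  # {'接貨地': 'Ningbo'}
print(table_rows(sample))    # [[{'Voyage': '019W', 'POL': 'Ningbo'}]]
```

Keeping the confidence values from extInfo alongside each pair is a natural extension when low-confidence fields need manual review.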