Call Caching Usage Guide

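Background

  • How do you turn Cromwell's Call Caching feature on and off?

  • In some scenarios you do not want a submitted workflow to use Call Caching and need it to run unconditionally. How do you configure that?

  • After a workflow is resubmitted, some tasks that were not expected to rerun ran anyway, so Call Caching appears not to have taken effect. How do you find out why?

This document describes how to use Call Caching in detail, including how to turn the feature on and off and how to inspect the metadata to determine why Call Caching did not take effect.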

Call Caching Settings

Setting the global Call Caching switch in the configuration file

To use Cromwell's Call Caching feature, set the following in the Server configuration file:

call-caching {
  # Allows re-use of existing results for jobs you have already run
  # (default: false)
  enabled = true
  # Whether to invalidate a cache result forever if we cannot reuse them. Disable this if you expect some cache copies
  # to fail for external reasons which should not invalidate the cache (e.g. auth differences between users):
  # (default: true)
  invalidate-bad-cache-results = true
}

call-caching.enabled is the switch for the Call Caching feature; turn it on or off as needed.

Setting Call Caching for a single Workflow

With Call Caching enabled globally, you can control whether a particular run uses Call Caching by passing the following two options when you submit the workflow:

{
    "write_to_cache": true,
    "read_from_cache": true
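}

  • write_to_cache: whether the results of this workflow run are written to the Cache, in other words whether later workflows can reuse them. Defaults to true.

  • read_from_cache: whether this workflow run reads earlier results from the Cache, in other words whether it reuses previous results. Defaults to true. Setting it to false means this run does not reuse cached results and execution is forced.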
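As a minimal sketch, the options file for a forced run could be generated as follows. The helper name call_caching_options is hypothetical; only the two option keys come from the text above.

```python
import json


def call_caching_options(read_from_cache=True, write_to_cache=True):
    """Return a workflow options dict holding the two Call Caching switches.

    Hypothetical helper for illustration; both defaults mirror the
    documented default of true.
    """
    return {
        "write_to_cache": bool(write_to_cache),
        "read_from_cache": bool(read_from_cache),
    }


# Force execution: ignore earlier results, but still write this run's
# results to the Cache so that later workflows can reuse them.
forced = call_caching_options(read_from_cache=False)
print(json.dumps(forced, indent=4))
```

The printed JSON can be saved to a file and passed as the workflow options at submission time.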

Viewing Metadata

When a workflow runs, every call of every task (each call corresponds to one BatchCompute job) has metadata that records how the step ran, including the details of Call Caching. The following command queries the metadata of a workflow:

widdler query -m [WorkflowId]
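If the Cromwell server's REST API is reachable directly, the same metadata can probably be retrieved from its workflow metadata endpoint. The sketch below assumes a hypothetical server address (http://localhost:8000) and API version v1; the helper names are illustrative, and the workflow ID is the one that appears in the example paths in this document.

```python
import json
import urllib.request


def metadata_url(base_url, workflow_id):
    """Build the URL of the workflow metadata endpoint (API version v1 assumed)."""
    return f"{base_url.rstrip('/')}/api/workflows/v1/{workflow_id}/metadata"


def fetch_metadata(base_url, workflow_id):
    """Fetch and decode the metadata JSON; needs a reachable Cromwell server."""
    with urllib.request.urlopen(metadata_url(base_url, workflow_id)) as resp:
        return json.load(resp)


# Hypothetical server address, used here only to show the resulting URL.
url = metadata_url("http://localhost:8000/", "53cfd3fc-e9d5-4431-83ec-be6c51ab9365")
print(url)
```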

In the metadata, find the details of the task in question, for example:

{
    "callRoot": "oss://gene-test/cromwell_test/GATK4_VariantDiscovery_pipeline_hg38/53cfd3fc-e9d5-4431-83ec-be6c51ab9365/call-HaplotypeCaller/shard-10",
    "inputs": {
        "gatk_path": "/gatk/gatk",
        "ref_fasta": "oss://genomics-public-data-shanghai/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta",
        "cluster_config": "OnDemand ecs.sn2ne.xlarge img-ubuntu-vpc",
        "input_bam_index": "oss://gene-test/cromwell_test/GATK4_VariantDiscovery_pipeline_hg38/cf55a2d1-572c-4490-8edf-07656802a79b/call-GatherBamFiles/NA12878.hg38.ready.bam.bai",
        "output_filename": "NA12878.hg38.vcf.gz",
        "contamination": null,
        "ref_fasta_index": "oss://genomics-public-data-shanghai/broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.fai",
        "ref_dict": "oss://genomics-public-data-shanghai/broad-references/hg38/v0/Homo_sapiens_assembly38.dict",
        "interval_list": "/home/data/GATK_human_genome_resource_bundle/hg38/hg38_wgs_scattered_calling_intervals/temp_0047_of_50/scattered.interval_list",
        "input_bam": "oss://gene-test/cromwell_test/GATK4_VariantDiscovery_pipeline_hg38/cf55a2d1-572c-4490-8edf-07656802a79b/call-GatherBamFiles/NA12878.hg38.ready.bam.bam",
        "docker_image": "registry.cn-shanghai.aliyuncs.com/wgs_poc/poc:4.0.10.1"
    },
    "returnCode": 0,
    "callCaching": {
        "allowResultReuse": true,
        "hashes": {
            "output expression": {
                "File output_vcf_index": "A162250CB6F52CC32CB75F5C5793E8BB",
                "File output_vcf": "7FD061EEA1D3C63912D7B5FB1F3C5218"
            },
            "runtime attribute": {
                "userData": "N/A",
                "docker": "F323AFFA030FBB5B352C60BD7D615255",
                "failOnStderr": "68934A3E9455FA72420237EB05902327",
                "imageId": "N/A",
                "continueOnReturnCode": "CFCD208495D565EF66E7DFF9F98764DA"
            },
            "output count": "C81E728D9D4C2F636F067F89CC14862C",
            "input count": "D3D9446802A44259755D38E6D163E820",
            "command template": "9104DF40289AB292A52C2A753FBF58D2",
            "input": {
                "File interval_list": "04dc2cb895d13a40657d5e2aa7d31e8c",
                "String output_filename": "2B77B986117FC94D088273AD4D592964",
                "File ref_fasta": "9A513FB0533F04ED87AE9CB6281DC19B-400",
                "File input_bam_index": "D7CA83047E1B6B8269DF095F637621FE-1",
                "String gatk_path": "EB83BBB666B0660B076106408FFC0A9B",
                "String docker_image": "0981A914F6271269D58AA49FD18A6C13",
                "String cluster_config": "B4563EC1789E5EB82B3076D362E6D88F",
                "File ref_dict": "3884C62EB0E53FA92459ED9BFF133AE6",
                "File input_bam": "9C0AC9A52F5640AA06A0EBCE6A97DF51-301",
                "File ref_fasta_index": "F76371B113734A56CDE236BC0372DE0A"
            },
            "backend name": "AE9178757DD2A29CF80C1F5B9F34882E"
        },
        "effectiveCallCachingMode": "ReadAndWriteCache",
        "hit": false,
        "result": "Cache Miss"
    },
    "stderr": "oss://gene-test/cromwell_test/GATK4_VariantDiscovery_pipeline_hg38/53cfd3fc-e9d5-4431-83ec-be6c51ab9365/call-HaplotypeCaller/shard-10/stderr",
    "shardIndex": 10,
    "stdout": "oss://gene-test/cromwell_test/GATK4_VariantDiscovery_pipeline_hg38/53cfd3fc-e9d5-4431-83ec-be6c51ab9365/call-HaplotypeCaller/shard-10/stdout",
    "outputs": {
        "output_vcf": "oss://gene-test/cromwell_test/GATK4_VariantDiscovery_pipeline_hg38/53cfd3fc-e9d5-4431-83ec-be6c51ab9365/call-HaplotypeCaller/shard-10/NA12878.hg38.vcf.gz",
        "output_vcf_index": "oss://gene-test/cromwell_test/GATK4_VariantDiscovery_pipeline_hg38/53cfd3fc-e9d5-4431-83ec-be6c51ab9365/call-HaplotypeCaller/shard-10/NA12878.hg38.vcf.gz.tbi"
    },
    "commandLine": "set -e\n\n  /gatk/gatk --java-options \"-Xmx4g -Xmx4g\" \\\n    HaplotypeCaller \\\n    -R /cromwell_inputs/73a7571e/Homo_sapiens_assembly38.fasta \\\n    -I /cromwell_inputs/02f1b5ca/NA12878.hg38.ready.bam.bam \\\n    -L /home/data/GATK_human_genome_resource_bundle/hg38/hg38_wgs_scattered_calling_intervals/temp_0047_of_50/scattered.interval_list \\\n    -O NA12878.hg38.vcf.gz \\\n    -contamination 0",
    "attempt": 1,
    "jobId": "job-000000005DB051A800006F970001CAC8",
    "start": "2019-10-25T02:38:03.522Z",
    "backendStatus": "Finished",
    "runtimeAttributes": {
        "cluster": "Right(AutoClusterConfiguration(OnDemand,ecs.sn2ne.xlarge,img-ubuntu-vpc,None,None,None))",
        "continueOnReturnCode": "0",
        "failOnStderr": "false",
        "vpc": "BcsVpcConfiguration(Some(10.20.200.0/24),Some(vpc-uf61zj30k0ebuen0xi7ci))",
        "mounts": "BcsInputMount(Right(nas://10.20.66.4:/data/ali_yun_test/),Left(/home/data),true)",
        "docker": "BcsDockerWithoutPath(registry.cn-shanghai.aliyuncs.com/wgs_poc/poc:4.0.10.1)",
        "autoReleaseJob": "false",
        "maxRetries": "0"
    },
    "executionStatus": "Done",
    "end": "2019-10-25T03:22:23.481Z",
    "executionEvents": [
        {
            "endTime": "2019-10-25T03:22:21.626Z",
            "description": "RunningJob",
            "startTime": "2019-10-25T02:38:03.645Z"
        },
        {
            "endTime": "2019-10-25T03:22:22.481Z",
            "description": "UpdatingCallCache",
            "startTime": "2019-10-25T03:22:21.626Z"
        },
        {
            "endTime": "2019-10-25T02:38:03.645Z",
            "description": "CallCacheReading",
            "startTime": "2019-10-25T02:38:03.643Z"
        },
        {
            "endTime": "2019-10-25T02:38:03.522Z",
            "description": "Pending",
            "startTime": "2019-10-25T02:38:03.522Z"
        },
        {
            "endTime": "2019-10-25T02:38:03.542Z",
            "description": "WaitingForValueStore",
            "startTime": "2019-10-25T02:38:03.542Z"
        },
        {
            "endTime": "2019-10-25T03:22:23.481Z",
            "description": "UpdatingJobStore",
            "startTime": "2019-10-25T03:22:22.481Z"
        },
        {
            "endTime": "2019-10-25T02:38:03.643Z",
            "description": "PreparingJob",
            "startTime": "2019-10-25T02:38:03.542Z"
        },
        {
            "endTime": "2019-10-25T02:38:03.542Z",
            "description": "RequestingExecutionToken",
            "startTime": "2019-10-25T02:38:03.522Z"
        }
    ],
    "backend": "BCS"
}

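The metadata above contains a callCaching entry, which records the following information:

  • allowResultReuse: whether other workflows are allowed to reuse this result.

    • If the current workflow was set not to write to the Cache, the result cannot be reused.

    • If the current workflow was set to write to the Cache, the result can be reused only if the task succeeded.

  • hashes: hashes of the current task's inputs, outputs, runtime attributes, and other parameters, used to check whether two runs had identical conditions.

  • effectiveCallCachingMode: the Call Caching mode, such as whether results are read from the Cache or written to it.

  • hit: whether the current task hit the Cache.

  • result: details of the Cache hit for the current task, such as which shard of which task in which workflow was matched.

Putting this together, the call in the example is shard 10 of the HaplotypeCaller task in the GATK4_VariantDiscovery_pipeline_hg38 workflow, and its Call Caching status is as follows:

  • It did not hit the Cache, so it ran in full.

  • It succeeded, so later workflows are allowed to reuse its result.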
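The reading above can be automated with a short script. This is a sketch under assumptions: it expects the full workflow metadata to hold a top-level "calls" map keyed by "WorkflowName.TaskName", with one entry per shard and attempt, and the helper names find_call and summarize are hypothetical. The sample data is trimmed from the example call shown earlier.

```python
def find_call(metadata, call_name, shard_index=-1):
    """Return the latest attempt of one call (one shard) or None if absent.

    Assumes metadata["calls"][call_name] is a list of per-attempt dicts;
    shard_index -1 stands for a call that is not scattered.
    """
    attempts = [
        call
        for call in metadata.get("calls", {}).get(call_name, [])
        if call.get("shardIndex", -1) == shard_index
    ]
    if not attempts:
        return None
    return max(attempts, key=lambda call: call.get("attempt", 1))


def summarize(call):
    """Pull the Call Caching fields discussed above out of a call entry."""
    cc = call.get("callCaching", {})
    return {
        "mode": cc.get("effectiveCallCachingMode"),
        "hit": cc.get("hit", False),
        "result": cc.get("result"),
        "reusable": cc.get("allowResultReuse", False),
    }


# Trimmed copy of the example call; the "calls" key name is an assumption.
metadata = {
    "calls": {
        "GATK4_VariantDiscovery_pipeline_hg38.HaplotypeCaller": [
            {
                "shardIndex": 10,
                "attempt": 1,
                "executionStatus": "Done",
                "callCaching": {
                    "allowResultReuse": True,
                    "effectiveCallCachingMode": "ReadAndWriteCache",
                    "hit": False,
                    "result": "Cache Miss",
                },
            }
        ]
    }
}

call = find_call(metadata, "GATK4_VariantDiscovery_pipeline_hg38.HaplotypeCaller", 10)
summary = summarize(call)
print(summary)
```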

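Troubleshooting Call Caching Not Taking Effect

If a task does not behave as expected, use the following steps to find the cause:

  • Check the Call Caching metadata of the task that was rerun in the current workflow.

    • If the task's Call Caching mode is set not to use the Cache (possibly because an option disabling Call Caching was set at submission), earlier results are not used and the task is forcibly rerun. This is expected.

    • If the task missed the Cache, look at the earlier workflow to determine why.

  • Check the Call Caching metadata of the same task in the earlier workflow to confirm whether it succeeded and whether its result can be reused.

    • If the earlier task does not allow reuse, it may have failed, or it may have succeeded with a Cache mode that does not write to the Cache, which also disallows reuse.

    • If the earlier task allows reuse but was still missed, compare the hashes of the two runs. The miss was probably caused by a change in a parameter that Call Caching depends on.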
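The checklist above can be sketched as a small function that takes the callCaching entries of the current and the earlier call. The function names are hypothetical, and the mode test assumes that modes which read from the Cache contain the word "Read" (only "ReadAndWriteCache" appears in this document). In the sample data, the changed hash value is a made-up placeholder.

```python
def flatten(hashes, prefix=""):
    """Flatten the nested hashes dict into {"section: name": hash}."""
    flat = {}
    for key, value in hashes.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + ": "))
        else:
            flat[path] = value
    return flat


def diff_hashes(previous, current):
    """Return {key: (previous_hash, current_hash)} for every key that differs."""
    a, b = flatten(previous), flatten(current)
    keys = sorted(a.keys() | b.keys())
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


def diagnose(current_cc, previous_cc):
    """Walk through the troubleshooting steps and return a short finding."""
    # Assumption: a mode that reads from the Cache has "Read" in its name.
    if "Read" not in current_cc.get("effectiveCallCachingMode", ""):
        return "expected: the current run does not read from the Cache"
    if current_cc.get("hit"):
        return "cache hit"
    if not previous_cc.get("allowResultReuse"):
        return "earlier call does not allow reuse (failed or not written to the Cache)"
    changed = diff_hashes(previous_cc.get("hashes", {}), current_cc.get("hashes", {}))
    if changed:
        return "hashes differ: " + ", ".join(changed)
    return "hashes match; the cause is not visible in these two entries"


previous = {
    "allowResultReuse": True,
    "hashes": {
        "command template": "9104DF40289AB292A52C2A753FBF58D2",
        "input": {"File input_bam": "9C0AC9A52F5640AA06A0EBCE6A97DF51-301"},
    },
}
current = {
    "effectiveCallCachingMode": "ReadAndWriteCache",
    "hit": False,
    "hashes": {
        "command template": "9104DF40289AB292A52C2A753FBF58D2",
        # Placeholder value standing in for a changed input file.
        "input": {"File input_bam": "HYPOTHETICAL-CHANGED-HASH"},
    },
}

finding = diagnose(current, previous)
print(finding)
```

Each key in the reported difference names the section and entry of the hashes record whose value changed between the two runs, which points at the input, runtime attribute, or command change that caused the miss.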