{"id":378,"date":"2024-12-31T19:32:51","date_gmt":"2024-12-31T11:32:51","guid":{"rendered":"https:\/\/www.kz-hub.tech\/?p=378"},"modified":"2025-07-04T15:05:51","modified_gmt":"2025-07-04T07:05:51","slug":"%e4%bd%bf%e7%94%a8annovar%e8%bf%9b%e8%a1%8cvcf%e6%b3%a8%e9%87%8a%e5%b9%b6%e7%bb%98%e5%88%b6%e7%80%91%e5%b8%83%e5%9b%be","status":"publish","type":"post","link":"https:\/\/www.kz-hub.tech\/index.php\/2024\/12\/31\/%e4%bd%bf%e7%94%a8annovar%e8%bf%9b%e8%a1%8cvcf%e6%b3%a8%e9%87%8a%e5%b9%b6%e7%bb%98%e5%88%b6%e7%80%91%e5%b8%83%e5%9b%be\/","title":{"rendered":"Annotating VCF files with Annovar and drawing a waterfall plot"},"content":{"rendered":"<h2>1. Download the Annovar software<\/h2>\n<p><a href=\"https:\/\/annovar.openbioinformatics.org\/en\/latest\/\">https:\/\/annovar.openbioinformatics.org\/en\/latest\/<\/a><\/p>\n<h2>2. Download the annotation files Annovar needs<\/h2>\n<pre><code class=\"language-bash\">annotate_variation.pl -buildver hg38 -downdb -webfrom annovar avdblist humandb_hg38\nannotate_variation.pl -buildver hg38 -downdb -webfrom annovar refGene humandb_hg38\nannotate_variation.pl -buildver hg38 -downdb cytoBand humandb_hg38\nannotate_variation.pl -buildver hg38 -downdb -webfrom annovar avsnp151 humandb_hg38\nannotate_variation.pl -buildver hg38 -downdb -webfrom annovar gnomad41_exome humandb_hg38<\/code><\/pre>\n<pre><code># https:\/\/annovar.openbioinformatics.org\/en\/latest\/user-guide\/filter\/#cosmic-annotations\n# Prepare COSMIC database (COSMIC &gt; 100)\n\ntar xvf Cosmic_GenomeScreensMutant_Vcf_v100_GRCh38.tar\ntar xvf Cosmic_GenomeScreensMutant_Tsv_v100_GRCh38.tar\ngunzip Cosmic_GenomeScreensMutant_v100_GRCh38.vcf.gz\ngunzip Cosmic_GenomeScreensMutant_v100_GRCh38.tsv.gz\ntar xvf Cosmic_NonCodingVariants_Tsv_v100_GRCh38.tar\ntar xvf Cosmic_NonCodingVariants_Vcf_v100_GRCh38.tar\ngunzip Cosmic_NonCodingVariants_v100_GRCh38.vcf.gz\ngunzip Cosmic_NonCodingVariants_v100_GRCh38.tsv.gz\n\n# Run 
in Linux\necho -e &#039;#Chr\\tStart\\tEnd\\tRef\\tAlt\\tCOSMIC100&#039; &gt; hg38_cosmic100_raw.txt\nprepare_annovar_user.pl -dbtype cosmic Cosmic_GenomeScreensMutant_v100_GRCh38.tsv -vcf Cosmic_GenomeScreensMutant_v100_GRCh38.vcf &gt;&gt; hg38_cosmic100_raw.txt \nprepare_annovar_user.pl -dbtype cosmic Cosmic_NonCodingVariants_v100_GRCh38.tsv -vcf Cosmic_NonCodingVariants_v100_GRCh38.vcf &gt;&gt; hg38_cosmic100_raw.txt \nindex_annovar.pl hg38_cosmic100_raw.txt -outfile hg38_cosmic100.txt <\/code><\/pre>\n<pre><code># Note: a manually built COSMIC annotation file can contain duplicate rows, which make annovar fail\n# Reference: https:\/\/github.com\/WGLab\/doc-ANNOVAR\/issues\/121\n# First check whether there are duplicate rows\nawk &#039;{count[$0]++} END {for (line in count) if (count[line] &gt; 1) print count[line], line}&#039; hg38_EAS.sites.2015_08.txt\n\n# The following script removes duplicates, keeping the row with the highest OCCURENCE per variant (untested).\n# It needs gawk (three-argument match) and does not preserve the input row order.\ninput_file=&quot;hg38_EAS.sites.2015_08.txt&quot;\noutput_file=&quot;hg38_EAS.sites.2015_08.unique.txt&quot;\ntemp_file=$(mktemp)\n\ngawk &#039;\nBEGIN {\n    FS = OFS = &quot;\\t&quot;;\n}\n\n{\n    # A variant is identified by Chr, Start, End, Ref and Alt\n    key = $1 &quot;\\t&quot; $2 &quot;\\t&quot; $3 &quot;\\t&quot; $4 &quot;\\t&quot; $5;\n    match($0, \/OCCURENCE=([0-9]+)\/, arr);\n    occ = arr[1] + 0;  # force a numeric comparison\n\n    if (!(key in seen) || occ &gt; seen[key]) {\n        seen[key] = occ;\n        max_line[key] = $0;\n    }\n}\n\nEND {\n    for (k in max_line) {\n        print max_line[k];\n    }\n}\n&#039; &quot;$input_file&quot; &gt; &quot;$temp_file&quot;\nmv &quot;$temp_file&quot; &quot;$output_file&quot;\necho &quot;De-duplication completed, results are saved in $output_file&quot;\n<\/code><\/pre>\n<h2>3. 
Run Annovar on the VCF file<\/h2>\n<pre><code class=\"language-bash\">table_annovar.pl 952T_wes\/952T.purple.somatic.vcf.gz \/home\/zhoukaiwen\/database\/annovar\/humandb_hg38 -buildver hg38 -out 952T -remove -protocol refGene,cytoBand,avsnp151,cosmic70,exac03,gnomad41_exome -operation g,r,f,f,f,f -nastring . -vcfinput<\/code><\/pre>\n<p>Note how -operation is set. It takes one code per entry in -protocol, in the same order:<\/p>\n<ol>\n<li>g: Gene-based annotation (e.g., functional annotation), used with refGene, ensGene<\/li>\n<li>r: Region-based annotation (annotate specific regions), used with cytoBand, genomicSuperDups<\/li>\n<li>f: Filter-based annotation (compare against databases), used with 1000g, avsnp, clinvar<\/li>\n<\/ol>\n<p>Batch run:<\/p>\n<pre><code class=\"language-bash\">cd somatic\n\n# Write one table_annovar.pl command per sample into annovar.sh\nls *somatic.passed.vcf | perl -ne &#039;chomp; my $name = $1 if ($_ =~ \/([^\\\/]+)\\.somatic\\.passed\\.vcf\/); print &quot;table_annovar.pl $name.somatic.passed.vcf \/data02\/zhangmengmeng\/database\/annovar_db\/humandb_hg38 -buildver hg38 -out $name -remove -protocol refGene,cytoBand,avsnp151,cosmic70,exac03,gnomad41_exome -operation g,r,f,f,f,f -nastring . -vcfinput \\n&quot;&#039;&gt;annovar.sh\n<\/code><\/pre>\n<h2>4. Merge the Annovar results of multiple samples<\/h2>\n<pre><code class=\"language-bash\"># Start from an empty file so that reruns do not append duplicate rows\n: &gt; all_sample.txt\n\nfor i in *.hg38_multianno.txt\ndo\n    # The sample name is the part of the file name before the first dot\n    sample=`echo $i | awk -F &#039;.&#039; &#039;{print $1}&#039;`\n    # Keep the first 10 columns, drop the header row, append the sample name as the last column\n    cut -f &#039;1-10&#039; $i | sed &#039;1d&#039; | sed &quot;s\/$\/\\t${sample}\/&quot; &gt;&gt; all_sample.txt\ndone\n\n# Prepend a single header row\nsed -i &#039;1s\/^\/Chr\\tStart\\tEnd\\tRef\\tAlt\\tFunc.refGene\\tGene.refGene\\tGeneDetail.refGene\\tExonicFunc.refGene\\tAAChange.refGene\\tTumor_Sample_Barcode\\n\/&#039; all_sample.txt<\/code><\/pre>\n<h2>5. 
Plot the merged results in R with maftools<\/h2>\n<pre><code class=\"language-R\">library(maftools)\n\n# Convert the merged Annovar table into a MAF object\nvar_maf &lt;- annovarToMaf(annovar = &quot;all_sample.txt&quot;, \n                        Center = &#039;NA&#039;, \n                        refBuild = &#039;hg38&#039;, \n                        tsbCol = &#039;Tumor_Sample_Barcode&#039;, \n                        table = &#039;refGene&#039;,\n                        MAFobj = TRUE,\n                        sep = &quot;\\t&quot;)\n\n# Summary plots, then the waterfall plot (oncoplot)\nplotmafSummary(maf = var_maf, rmOutlier = TRUE, addStat = &#039;median&#039;)\noncoplot(var_maf)<\/code><\/pre>\n","protected":false},"excerpt":{"rendered":"<p>1. Download the Annovar software https:\/\/annovar.openbioinformatics.org&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-378","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/posts\/378","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/comments?post=378"}],"version-history":[{"count":10,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/posts\/378\/revisions"}],"predecessor-version":[{"id":601,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/posts\/378\/revisions\/601"}],"wp:attachment":[{"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/media?parent=378"}],"wp:term":[{"taxonomy":"category","embeddab
le":true,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/categories?post=378"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kz-hub.tech\/index.php\/wp-json\/wp\/v2\/tags?post=378"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}