KevinZ的小窝

Splitting a multi-sample VCF file with vcftools

Posted on April 27, 2025 (updated June 26, 2025) by KevinZhou

vcftools download page:

https://vcftools.github.io/
Installation (run inside the unpacked source directory):

./autogen.sh
./configure --prefix=/data02/zhangmengmeng/software/vcftools-vcftools-d511f46
make
make install

# Add to ~/.bashrc:
export PATH=/data02/zhangmengmeng/software/vcftools-vcftools-d511f46/bin:$PATH
export PERL5LIB=/data02/zhangmengmeng/software/vcftools-vcftools-d511f46/src/perl/:$PERL5LIB

Reference:

https://www.biostars.org/p/108112/#108816

Splitting the file

# Use VCFtools' vcf-query to make a list of all the sample IDs in the multisample VCF:
vcf-query --list-columns allsamples.vcf > sample_ids
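At this point, open the file by hand and delete the normal sample (here, the ID ending in -B). The commands below expect one such list per patient, e.g. FETB01_sample_ids.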
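
The manual deletion can also be scripted. The following is a minimal sketch, assuming every normal sample ID ends in -B as in this dataset. The function name drop_normal_ids is illustrative and not part of vcftools.

```shell
# Hypothetical helper: print every sample ID except the normal sample.
#   $1: file with one sample ID per line
#   $2: suffix that marks the normal sample (assumed default: -B)
drop_normal_ids() {
    # NF skips blank lines; the substr() test drops IDs ending in the suffix.
    awk -v s="${2:--B}" 'NF && substr($0, length($0) - length(s) + 1) != s' "$1"
}

# Usage:
#   vcf-query --list-columns FETB01.somatic.passed.vcf > FETB01_all_ids
#   drop_normal_ids FETB01_all_ids > FETB01_sample_ids
```

Matching on the end of the ID, rather than on -B anywhere in the line, avoids dropping a tumor sample whose name merely contains that string.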

# For each tumor sample ID, run vcf-subset to write a VCF containing that sample and its matched normal (-B) into a subfolder:
mkdir vcf2maf
cat FETB01_sample_ids | perl -ne 'chomp; print `cat FETB01.somatic.passed.vcf | vcf-subset --columns FETB01-B,$_ > vcf2maf/$_.vcf`'
cat FETB02_sample_ids | perl -ne 'chomp; print `cat FETB02.somatic.passed.vcf | vcf-subset --columns FETB02-B,$_ > vcf2maf/$_.vcf`'
cat FETB03_sample_ids | perl -ne 'chomp; print `cat FETB03.somatic.passed.vcf | vcf-subset --columns FETB03-B,$_ > vcf2maf/$_.vcf`'
cat FETB04_sample_ids | perl -ne 'chomp; print `cat FETB04.somatic.passed.vcf | vcf-subset --columns FETB04-B,$_ > vcf2maf/$_.vcf`'
cat FETB05_sample_ids | perl -ne 'chomp; print `cat FETB05.somatic.passed.vcf | vcf-subset --columns FETB05-B,$_ > vcf2maf/$_.vcf`'
cat FETB06_sample_ids | perl -ne 'chomp; print `cat FETB06.somatic.passed.vcf | vcf-subset --columns FETB06-B,$_ > vcf2maf/$_.vcf`'
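
The six near-identical lines above differ only in the patient prefix, so they can be generated by a loop. This is a sketch under the same naming assumptions (<patient>_sample_ids, <patient>.somatic.passed.vcf, normal ID <patient>-B). The function name print_subset_cmds is made up for illustration, and it only prints the commands so they can be checked before running.

```shell
# Hypothetical helper: print one vcf-subset command per tumor sample of a patient.
#   $1: patient prefix, e.g. FETB01
print_subset_cmds() {
    patient=$1
    while IFS= read -r id; do
        [ -n "$id" ] || continue   # skip blank lines in the ID list
        printf 'vcf-subset --columns %s-B,%s %s.somatic.passed.vcf > vcf2maf/%s.vcf\n' \
            "$patient" "$id" "$patient" "$id"
    done < "${patient}_sample_ids"
}

# Usage (review the printed commands, then pipe them to sh):
#   for p in FETB01 FETB02 FETB03 FETB04 FETB05 FETB06; do
#       print_subset_cmds "$p"
#   done | sh
```

Here the VCF is passed to vcf-subset as a file argument instead of through cat, which should be equivalent for an uncompressed VCF.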

# For each VCF, run vcf2maf with the `--tumor-id` specified, to create per-sample MAFs into the subfolder:
perl -ne 'chomp; print "perl vcf2maf.pl --input-vcf vcf2maf/$_.vcf --output-maf vcf2maf/$_.vep.maf --tumor-id $_\n"' *sample_ids >> vcf2maf/vcf2maf.sh
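# or, with explicit tumor/normal pairs read from a tab-separated file (column 1: tumor ID, column 2: normal ID), keeping only the header and PASS lines of each MAF: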
perl -ne 'chomp;next if($_=~/^$/);my @a=split /\t/;print "vcf2maf.pl --input-vcf $a[0].vcf --output-maf $a[0].maf --normal-id $a[1] --tumor-id $a[0] --vcf-tumor-id $a[0] --vcf-normal-id $a[1] --vep-path /home/zhoukaiwen/software/anaconda3/envs/vcf2maf/bin --vep-data /home/zhoukaiwen/database/vep/ --ref-fasta /home/zhoukaiwen/database/GRCh38/genecode_GRCh38.p14.genome.fa --verbose --ncbi-build GRCh38 --cache-version 114 && cat $a[0].maf|grep -E \"version|FILTER|PASS\" > $a[0]_filtered.maf && echo $a[0] vcf2maf ok\n"' ../../somatic/sample_pair.txt >vcf2maf.sh
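
The grep in the command above keeps any line that contains "version", "FILTER" or "PASS" anywhere, so a row could slip through if PASS appears in some other column. A stricter variant is sketched below. It assumes the MAF is tab-separated and has a column named FILTER, as the grep pattern suggests. The function name filter_pass_maf is illustrative only.

```shell
# Hypothetical helper: keep comment lines, the column header, and rows whose
# FILTER column is exactly PASS.
#   $1: MAF file produced by vcf2maf
filter_pass_maf() {
    awk -F '\t' '
        /^#/ { print; next }                  # comment lines such as "#version"
        !seen {                               # first non-comment line is the header
            seen = 1
            for (i = 1; i <= NF; i++) if ($i == "FILTER") col = i
            if (!col) { print "no FILTER column found" > "/dev/stderr"; exit 1 }
            print; next
        }
        $col == "PASS"                        # data rows: exact match on FILTER
    ' "$1"
}

# Usage:
#   filter_pass_maf SAMPLE.maf > SAMPLE_filtered.maf
```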

# Concatenate the per-sample MAFs together, making sure that the MAF header is not duplicated:
cat *_filtered.maf | grep -E "^#|^Hugo_Symbol" | head -2 > AllSamples_filtered.maf
cat *_filtered.maf | grep -E -v "^#|^Hugo_Symbol" >> AllSamples_filtered.maf

cat *M_filtered.maf | grep -E "^#|^Hugo_Symbol" | head -2 > AllSamples_M_filtered.maf
cat *M_filtered.maf | grep -E -v "^#|^Hugo_Symbol" >> AllSamples_M_filtered.maf

cat *E_filtered.maf | grep -E "^#|^Hugo_Symbol" | head -2 > AllSamples_E_filtered.maf
cat *E_filtered.maf | grep -E -v "^#|^Hugo_Symbol" >> AllSamples_E_filtered.maf
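
One caveat with the commands above: the merged files also end in _filtered.maf, so on a second run the globs pick up earlier AllSamples_* output as input. A single-pass alternative is sketched here, assuming each per-sample MAF starts with one "#" line followed by the Hugo_Symbol header. The function name merge_mafs and the merged/ directory are illustrative.

```shell
# Hypothetical helper: concatenate MAF files, printing the "#" comment line
# and the Hugo_Symbol header only the first time they are seen.
#   $@: per-sample MAF files; the merged MAF is written to stdout
merge_mafs() {
    awk '
        /^#/           { if (!seen_comment++) print; next }
        /^Hugo_Symbol/ { if (!seen_header++)  print; next }
        { print }                             # every data row from every file
    ' "$@"
}

# Usage (write outside the glob so reruns do not re-read the merged file):
#   mkdir -p merged
#   merge_mafs *M_filtered.maf > merged/AllSamples_M_filtered.maf
#   merge_mafs *E_filtered.maf > merged/AllSamples_E_filtered.maf
```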