1. 下载UPhyloplot2软件
UPhyloplot2 Github官网: https://github.com/harbourlab/UPhyloplot2
参考文档: https://www.jianshu.com/p/38280bda882a
2. 运行 InferCNV
# 构建infercnv_obj软件报的提示:
# Please use "options(scipen = 100)" before running infercnv if you are using the analysis_mode="subclusters" option or you may encounter an error while the hclust is being generated.
# 一定要设定为random_trees运行,否则自动选择Leiden clustering,但是这样运行巨慢,跑1个树的分支就要10h,应用后台R nohup运行
options(scipen = 100)
infercnv_obj = infercnv::run(infercnv_obj,
cutoff=0.1, # cutoff=1 works well for Smart-seq2, and cutoff=0.1 works well for 10x Genomics
out_dir="InferCNV_MPT01_UPhyloplot2",
cluster_by_groups=FALSE,
analysis_mode="subclusters", #默认是"samples"
denoise=TRUE,
write_expr_matrix = T,
HMM=TRUE,
plot_steps=T,
scale_data=T,
noise_filter=0.12,
HMM_type='i6',
num_threads=8,
hclust_method='ward.D2',
tumor_subcluster_partition_method='random_trees',
tumor_subcluster_pval=0.05)
3. InferCNV跑完后的数据预处理
输出结果有几个文件在后面画进化树会用到:
- 画进化树需要: 17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.cell_groupings(包含了根据CNV分类的结果,一共两列,一列是类别名称, 共8类,但是有一类是参考细胞,所以要去掉参考,剩下7类;另一列是细胞编号。)
- 注释进化树的分支需要:
① HMM_CNV_predictions.HMMi6.rand_trees.hmm_mode-subclusters.Pnorm_0.5.pred_cnv_regions.dat
② HMM_CNV_predictions.HMMi6.rand_trees.hmm_mode-subclusters.Pnorm_0.5.pred_cnv_genes.dat - 注意这两个文件里state的意义:
State 1: 0x: complete loss
State 2: 0.5x: loss of one copy
State 3: 1x: neutral
State 4: 1.5x: addition of one copy
State 5: 2x: addition of two copies
State 6: 3x: essentially a placeholder for >2x copies but modeled as 3x
# 去掉画进化树需要的文件的参考细胞的行
sed '/^all_references/d' < 17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.cell_groupings > trimmed_infercnv.cell_groupings
# 处理后添加header: cell_group_name cell
cat 17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.cell_groupings| grep "all_observations" > trimmed_infercnv.cell_groupings
# 去掉注释进化树需要的文件的参考细胞的行
# genes.dat文件处理后添加header: cell_group_name gene_region_name state gene chr start end
# regions.dat文件处理后添加header: cell_group_name cnv_name state chr start end
cat 17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.pred_cnv_genes.dat|grep "all_observations" > trimmed_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.pred_cnv_genes.dat
cat 17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.pred_cnv_regions.dat|grep "all_observations" > trimmed_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.pred_cnv_regions.dat
4. 绘图
使用的时候,将主程序uphyloplot2.py和文件夹Inputs放在一起,上面提到cell_groupings文件放到Inputs文件夹里面。UPhyloplot2 将生成一个“output.svg”矢量图形图。此外,它将生成一个名为“CNV_files”的新文件夹,其中包含每个输入的 CNV 文件,其中包含第 1 列中由 inferCNV 标识的亚克隆 ID、第 2 列中每个亚克隆的细胞百分比以及标记亚克隆的字母第 3 列中的 output.svg 文件。
# 在文件夹下直接运行即可
python uphyloplot2.py
5. 注释图片
筛选每个克隆独特的CNV区域
# unique_cnv_region_for_cellgroup.py
import pandas as pd
# Load the data into a pandas DataFrame
file_path = './trimmed_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.pred_cnv_regions.dat'
df = pd.read_csv(file_path, sep='\t')
# Extract unique CNV regions for each cell group
unique_cnv_regions = df.groupby('cell_group_name')['cnv_name'].unique().reset_index()
# Convert the unique CNV regions to a dictionary for easy viewing
unique_cnv_dict = unique_cnv_regions.set_index('cell_group_name').to_dict()['cnv_name']
# Print the unique CNV regions for each cell group
for cell_group, cnv_regions in unique_cnv_dict.items():
print(f"Cell Group: {cell_group}")
print(f"Unique CNV Regions: {', '.join(cnv_regions)}")
print()
筛选每个克隆独特的CNV 基因
# unique_cnv_gene_for_cellgroup.py
import pandas as pd
# Load the data into a pandas DataFrame
file_path = './trimmed_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.pred_cnv_genes.dat'
df = pd.read_csv(file_path, sep='\t')
# Extract unique CNV regions for each cell group
unique_cnv_genes = df.groupby('cell_group_name')['gene'].unique().reset_index()
# Convert the unique CNV regions to a dictionary for easy viewing
unique_cnv_dict = unique_cnv_genes.set_index('cell_group_name').to_dict()['gene']
# Print the unique CNV regions for each cell group
for cell_group, cnv_genes in unique_cnv_dict.items():
print(f"Cell Group: {cell_group}")
print(f"Unique CNV Genes: {', '.join(cnv_genes)}")
print()
# 运行并输出结果:
python unique_cnv_region_for_cellgroup.py > unique_cnv_region_for_cellgroup.txt
python unique_cnv_gene_for_cellgroup.py > unique_cnv_gene_for_cellgroup.txt