官网链接:https://tuftsbcb.github.io/RegDiffusion/downstream_with_pyscenic.html
1. Gene Regulatory Network (GRN) Inference (python3.12)
注意:低版本python的scanpy无法读取SeuratV5转换后的h5ad,以下代码仅在python3.12的scanpy运行测试过
import numpy as np
import pandas as pd
import scanpy as sc
import regdiffusion as rd
import loompy as lp
h5ad_path = f'16samples_Merged_AllCells_Annotated_Final.h5ad'
cisdb_path = f'/home/zhoukaiwen/database/cisTarget/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
# Load data
adata = sc.read_h5ad(h5ad_path)
# Filter the gene not present in the cisTargetDB
cisdb = pd.read_feather(cisdb_path)
adata = adata[:, adata.var_names.isin(cisdb.columns)]
adata
# .X has been normalized so we need to get from the counts layer.
# Counts is stored in a sparse matrix so we need to turn it to an array and
# then perform log + 1 transformation.
x = adata.layers["counts"].toarray()
x = np.log(x+1.0)
# run RegDiffusionTrainer
rd_trainer = rd.RegDiffusionTrainer(x)
rd_trainer.train()
# Extract edges from GRN
## Now we focus on edges with weight > 50 percentile.
grn = rd_trainer.get_grn(adata.var_names, top_gene_percentile=50)
## Here for each gene, we are going to extract all edges
edgelist = grn.extract_edgelist(k=-1, workers=4) # By default, k=20. If you want to extract all edges, you can set k=-1
edgelist.columns = ['TF', 'target', 'importance']
edgelist.to_csv(f'grn_naive.tsv', sep='\t', index=False)
## check edgelist.
edgelist
lp.create(
filename=f'mtx.loom',
layers={"": x.transpose()},
row_attrs={"Gene": list(adata.var_names)},
col_attrs={"CellID": list(adata.obs_names)}
)
由于服务器没有开放显卡节点,本数据11w细胞,10核cpu运行需约30h左右
2. Prunning and AUCell Calculation in PySCENIC (python3.7)
此处需要安装在conda中新建python3.7环境安装pySCENIC
#!/bin/bash
#SBATCH --ntasks 4
#SBATCH -N 1
#SBATCH -o RunpySCENIC2.o
#SBATCH -e RunpySCENIC2.o
#SBATCH --job-name=pySCENIC2
source /home/zhoukaiwen/software/anaconda3/bin/activate
conda activate pyscenic
dir=/home/zhoukaiwen/database/cisTarget
tfs=$dir/hs_hgnc_tfs.txt
feather=$dir/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
tbl=$dir/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
ls $tfs $feather $tbl
# 此处导入上一步的结果
grn_fp=grn_naive.tsv
exp_mtx_fp=mtx.loom
pyscenic ctx \
$grn_fp $feather \
--annotations_fname $tbl \
--expression_mtx_fname $exp_mtx_fp \
--mode "dask_multiprocessing" \
--output ctx.csv \
--num_workers 10 \
--mask_dropouts
echo pyscenic step3.2 ctx done
pyscenic aucell \
$exp_mtx_fp \
ctx.csv \
--output 16samples_Merged_AllCells_Annotated_Final_pyscenic_Result.loom \
--num_workers 10
echo pyscenic step3.3 aucell done
后续选择用scanpy或seurat可视化