使用RegDiffusion替代pySCENIC的 GRNBoost2 & GENIE3 进行基因调控网络分析(GRN)

官网链接：https://tuftsbcb.github.io/RegDiffusion/downstream_with_pyscenic.html

1. Gene Regulatory Network (GRN) Inference (python3.12)

注意：低版本python的scanpy无法读取SeuratV5转换后的h5ad，以下代码仅在python3.12的scanpy运行测试过

import numpy as np
import pandas as pd
import scanpy as sc
import regdiffusion as rd
import loompy as lp

h5ad_path = f'16samples_Merged_AllCells_Annotated_Final.h5ad'
cisdb_path = f'/home/zhoukaiwen/database/cisTarget/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'

# Load data
adata = sc.read_h5ad(h5ad_path)

# Filter the gene not present in the cisTargetDB
cisdb = pd.read_feather(cisdb_path)
adata = adata[:, adata.var_names.isin(cisdb.columns)]
adata

# .X has been normalized so we need to get from the counts layer.
# Counts is stored in a sparse matrix so we need to turn it to an array and 
# then perform log + 1 transformation.
x = adata.layers["counts"].toarray()
x = np.log(x+1.0)

# run RegDiffusionTrainer
rd_trainer = rd.RegDiffusionTrainer(x)
rd_trainer.train()

# Extract edges from GRN
## Now we focus on edges with weight > 50 percentile. 
grn = rd_trainer.get_grn(adata.var_names, top_gene_percentile=50)

## Here for each gene, we are going to extract all edges
edgelist = grn.extract_edgelist(k=-1, workers=4) # By default, k=20. If you want to extract all edges, you can set k=-1
edgelist.columns = ['TF', 'target', 'importance']
edgelist.to_csv(f'grn_naive.tsv', sep='\t', index=False)

## check edgelist.  
edgelist

lp.create(
    filename=f'mtx.loom',
    layers={"": x.transpose()},
    row_attrs={"Gene": list(adata.var_names)},
    col_attrs={"CellID": list(adata.obs_names)}
)

由于服务器没有开放显卡节点，本数据11w细胞，10核cpu运行需约30h左右

2. Prunning and AUCell Calculation in PySCENIC (python3.7)

此处需要安装在conda中新建python3.7环境安装pySCENIC

#!/bin/bash
#SBATCH --ntasks 4
#SBATCH -N 1
#SBATCH -o RunpySCENIC2.o
#SBATCH -e RunpySCENIC2.o
#SBATCH --job-name=pySCENIC2
source /home/zhoukaiwen/software/anaconda3/bin/activate
conda activate pyscenic

dir=/home/zhoukaiwen/database/cisTarget
tfs=$dir/hs_hgnc_tfs.txt
feather=$dir/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
tbl=$dir/motifs-v10nr_clust-nr.hgnc-m0.001-o0.0.tbl
ls $tfs  $feather  $tbl

# 此处导入上一步的结果
grn_fp=grn_naive.tsv
exp_mtx_fp=mtx.loom

pyscenic ctx \
$grn_fp $feather \
--annotations_fname $tbl \
--expression_mtx_fname $exp_mtx_fp  \
--mode "dask_multiprocessing" \
--output ctx.csv \
--num_workers 10 \
--mask_dropouts

echo pyscenic step3.2 ctx done

pyscenic aucell \
$exp_mtx_fp \
ctx.csv \
--output 16samples_Merged_AllCells_Annotated_Final_pyscenic_Result.loom \
--num_workers 10

echo pyscenic step3.3 aucell done

后续选择用scanpy或seurat可视化

一	二	三	四	五	六	日
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28	29	30