Add cell cycle status prediction per cell based on marker genes to notebook 2
We need a function that adds a cell cycle prediction to every cell in an AnnData object. The function should accept the AnnData object and a list (or file) of cell cycle genes, and store the prediction as new .obs columns. These columns should be the only change to the AnnData object; in particular, the expression data must not be scaled when the object is returned.
def predict_cell_cycle(anndata, gene_set, scale_kw, cc_kw, inplace):
    # load gene set if file
    # scale data
    # add scale keyword arguments
    # predict cell cycle
    # add cell cycle keyword arguments
    # add cell cycle columns to anndata.obs
    # depending on inplace
    # return anndata or None
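One way the skeleton above could be filled in, as a rough sketch rather than the final implementation. It assumes the gene set file has the layout used by the legacy code (gene name, tab, phase label `s_genes` or `g2m_genes`, no header). Because scoring needs the S and G2M genes separately, it also assumes that an in-memory gene set is passed as a dict with those two keys instead of a flat list. The helper name `load_cell_cycle_genes` and the defaults for `scale_kw`, `cc_kw` and `inplace` are illustrative choices, not part of the request.

```python
import csv
from pathlib import Path


def load_cell_cycle_genes(gene_set):
    """Return (s_genes, g2m_genes) from a dict or a two-column TSV file.

    Assumed file layout (taken from the legacy code): gene name, tab,
    phase label ('s_genes' or 'g2m_genes'), no header line.
    """
    if isinstance(gene_set, dict):
        return list(gene_set["s_genes"]), list(gene_set["g2m_genes"])

    phases = {"s_genes": [], "g2m_genes": []}
    with open(Path(gene_set), newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Skip empty lines and rows with an unknown phase label.
            if len(row) >= 2 and row[1] in phases:
                phases[row[1]].append(row[0])
    return phases["s_genes"], phases["g2m_genes"]


def predict_cell_cycle(adata, gene_set, scale_kw=None, cc_kw=None, inplace=True):
    """Add 'S_score', 'G2M_score' and 'phase' to adata.obs."""
    # Imported here so the loader above can be used without scanpy installed.
    import scanpy as sc

    s_genes, g2m_genes = load_cell_cycle_genes(gene_set)
    target = adata if inplace else adata.copy()

    # Scale and score a throwaway copy so the returned data stay unscaled.
    scaled = target.copy()
    sc.pp.scale(scaled, **(scale_kw or {}))
    sc.tl.score_genes_cell_cycle(
        scaled, s_genes=s_genes, g2m_genes=g2m_genes, **(cc_kw or {})
    )

    # Copy only the three result columns back; nothing else is changed.
    for column in ("S_score", "G2M_score", "phase"):
        target.obs[column] = scaled.obs[column]

    return None if inplace else target
```

Scoring on a copy costs extra memory for large datasets, but it is the simplest way to guarantee that the returned object differs from the input only by the added .obs columns.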
Bonus
We already store relevant gene sets within the sctoolbox. Please find a way to make them available for this function.
P.S.: Look here for an example
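A possible approach for the bonus, sketched with the standard library's `importlib.resources`. The subdirectory `data/gene_lists`, the `<species>_cellcycle_genes.txt` naming (borrowed from the legacy file name) and the function name `load_packaged_cell_cycle_genes` are assumptions for illustration; the actual location of the gene sets inside sctoolbox may differ.

```python
from importlib import resources


def load_packaged_cell_cycle_genes(species, package="sctoolbox", subdir="data/gene_lists"):
    """Read a cell cycle gene list that is shipped inside an installed package.

    The package layout and file naming are hypothetical. The file is expected
    to hold one gene per line: gene name, tab, 's_genes' or 'g2m_genes'.
    """
    # Walk from the package root to the gene list file.
    resource = resources.files(package)
    for part in subdir.split("/"):
        resource = resource / part
    resource = resource / f"{species}_cellcycle_genes.txt"

    if not resource.is_file():
        raise FileNotFoundError(f"No packaged cell cycle gene list for '{species}'")

    phases = {"s_genes": [], "g2m_genes": []}
    for line in resource.read_text().splitlines():
        fields = line.split("\t")
        # Ignore blank lines and rows with an unknown phase label.
        if len(fields) >= 2 and fields[1] in phases:
            phases[fields[1]].append(fields[0])
    return phases
```

With something like this, the function could accept a species name in place of a file path and fall back to the packaged list, so users do not need to know where the files are installed.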
Legacy code
# calculate cell cycle phase
if cellcycle == "True":
    filecellcycle = '/mnt/agnerds/stefan.guenther/' + species + '_cellcycle_genes.txt'
    if os.path.isfile(filecellcycle):
        # Read cell cycle genes
        cell_cycle_genes = pd.read_csv(filecellcycle, sep='\t', header=None, index_col=0, names=['phase'])
        # Work on a copy so the original adata is not scaled
        adata_cc = adata.copy()
        # Scale the data before scoring
        scanpy.pp.scale(adata_cc)
        # Score the cells by S phase or G2M phase
        s_genes = cell_cycle_genes[cell_cycle_genes['phase'].isin(['s_genes'])].index.tolist()
        g2m_genes = cell_cycle_genes[cell_cycle_genes['phase'].isin(['g2m_genes'])].index.tolist()
        scanpy.tl.score_genes_cell_cycle(adata_cc, s_genes=s_genes, g2m_genes=g2m_genes)
        # Copy the results back to the unscaled object
        adata.obs['S_score'] = adata_cc.obs['S_score']
        adata.obs['G2M_score'] = adata_cc.obs['G2M_score']
        adata.obs['phase'] = adata_cc.obs['phase']
Edited by Hendrik Schultheis