How to extrapolate cross tabulation analysis?
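First, for gene G, I want to create a pandas DataFrame for the control and experimental conditions, in which each cell is 1 with probability 10% and 20%, respectively (and 0 otherwise).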
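import numpy as np
import pandas as pd

n = 5000  # number of cells
df = pd.DataFrame.from_dict(
    {
        "Cells": [f"Cell{x}" for x in range(1, n + 1)],
        # 1 with probability 0.1, 0 with probability 0.9
        "Control": np.random.choice([1, 0], p=[0.1, 0.9], size=n),
        # 1 with probability 0.2, 0 with probability 0.8
        "Experimental": np.random.choice([1, 0], p=[0.1 + 0.1, 0.9 - 0.1], size=n),
    },
    orient="columns",
)
df = df.set_index("Cells")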
Second, I ran a contingency analysis for gene G:
# Contingency table/array
table = pd.crosstab(df.index, [df["Control"], df["Experimental"]])
table
Now I would like to extrapolate steps 1 and 2 to 1000 genes under both conditions and then cross-tabulate the result. How can I do that?
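There are several ways to handle this. One of the simplest is:
- Wrap the simulation procedure in a function and run it once per gene to build a list of per-gene results.
- Concatenate all the per-gene simulations into a single DataFrame and run the crosstab function on it to get the contingency table.
Something like: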
import numpy as np
import pandas as pd

n = 100  # number of records (cells) per simulated gene
nog = 2  # number of genes
gene_list = ["gene_" + str(i) for i in range(nog)]

# simulation procedure
def generate_gene_df(gene, n):
    df = pd.DataFrame.from_dict(
        {
            "Gene": gene,  # scalar, broadcast to every row
            "Cells": [f"Cell{x}" for x in range(1, n + 1)],
            "Control": np.random.choice([1, 0], p=[0.1, 0.9], size=n),
            "Experimental": np.random.choice([1, 0], p=[0.1 + 0.1, 0.9 - 0.1], size=n),
        },
        orient="columns",
    )
    return df.set_index(["Gene", "Cells"])

# create a list of gene-based simulations generated using your procedure
gene_df_list = [generate_gene_df(gene, n) for gene in gene_list]
single_genes_df = pd.concat(gene_df_list).reset_index()

# rows: (Gene, Cells); columns: (Control, Experimental) combinations
table = pd.crosstab(
    [single_genes_df["Gene"], single_genes_df["Cells"]],
    [single_genes_df["Control"], single_genes_df["Experimental"]],
)
table
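For 1000 genes, the table above has one row per (Gene, Cells) pair, which gets long quickly. If what you want is one row of counts per gene, a possible variation is to cross-tabulate on Gene alone. The sketch below is only an illustration under assumptions that are not in the original code: the function name simulate_genes, its parameters, the fixed seed, and the use of NumPy's Generator.binomial in place of np.random.choice are all my own choices.

```python
import numpy as np
import pandas as pd


def simulate_genes(n_genes=1000, n_cells=100, p_control=0.1,
                   p_experimental=0.2, seed=0):
    """Hypothetical helper: simulate all genes at once in a long DataFrame."""
    rng = np.random.default_rng(seed)  # seeded only to make the sketch reproducible
    total = n_genes * n_cells
    return pd.DataFrame(
        {
            # each gene label repeated once per cell
            "Gene": np.repeat([f"gene_{i}" for i in range(n_genes)], n_cells),
            # the same cell labels reused for every gene
            "Cells": np.tile([f"Cell{x}" for x in range(1, n_cells + 1)], n_genes),
            # Bernoulli draws: 1 with the given probability, else 0
            "Control": rng.binomial(1, p_control, size=total),
            "Experimental": rng.binomial(1, p_experimental, size=total),
        }
    )


long_df = simulate_genes()

# One row per gene; columns are the observed (Control, Experimental) combinations.
per_gene_table = pd.crosstab(
    long_df["Gene"], [long_df["Control"], long_df["Experimental"]]
)
print(per_gene_table.head())
```

Each row of per_gene_table sums to the number of cells simulated for that gene, so it can be read as that gene's flattened 2x2 contingency table.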