I have 300,000 columns in a TSV file, but I only need 10,000 of them.
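They all begin with "rsid_set(variable)". I have almost no coding experience, but I have been trying to use R and Python. Is there a quick way to get just the columns I want?
Follow-up: is there a way to turn the mean of each column into a normal distribution of 10,000 values?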
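# Read the tab-separated file. Base R has no read.tsv(); read.delim() defaults to sep = "\t".
# check.names = FALSE keeps names such as rsid_set(variable) unmodified.
df <- read.delim("path/to/your/file", check.names = FALSE)
# Keep only the columns whose names begin with rsid_set.
# The index goes after the comma: df[i, ] would select rows, df[, j] selects columns.
df <- df[, grep("^rsid_set", colnames(df)), drop = FALSE]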
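Since you mentioned Python as well, here is a minimal sketch of the same column selection using only the standard library. It streams the file row by row instead of loading all 300,000 columns into memory. The function name `select_columns` and the output path are made up for illustration, and it assumes the first line of the file is the header.

```python
import csv


def select_columns(src_path, dst_path, prefix="rsid_set"):
    """Copy only the columns whose header starts with `prefix` to a new TSV.

    Hypothetical helper: assumes the first row of src_path is the header.
    Returns the number of columns kept.
    """
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")

        header = next(reader)
        # Positions of the columns we want to keep.
        keep = [i for i, name in enumerate(header) if name.startswith(prefix)]

        writer.writerow([header[i] for i in keep])
        for row in reader:
            writer.writerow([row[i] for i in keep])
    return len(keep)
```

Calling `select_columns("path/to/your/file", "subset.tsv")` would write the reduced table to `subset.tsv`, which is then small enough to read comfortably in either R or Python.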
I don't understand your follow-up. You'll have to clarify what you want, but here are the two pieces on their own:
# Take the means of each column:
means <- colMeans(df)
# normal distribution with 10k values
norms <- rnorm(10e3)
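One possible reading of the follow-up, and it is only a guess, is: for each column, draw 10,000 values from a normal distribution centred on that column's mean. The sketch below does that in plain Python. Using the column's sample standard deviation as the spread is my assumption, since the question does not say what the spread should be, and `normal_draws_per_column` is a made-up name.

```python
import random
import statistics


def normal_draws_per_column(columns, n=10_000, seed=0):
    """For each column, draw n values from Normal(column mean, column sample SD).

    `columns` maps a column name to a list of numbers (at least two per column).
    The choice of SD is an assumption, not something the question specifies.
    """
    rng = random.Random(seed)  # fixed seed so the draws are reproducible
    draws = {}
    for name, values in columns.items():
        mu = statistics.mean(values)
        sigma = statistics.stdev(values)
        draws[name] = [rng.gauss(mu, sigma) for _ in range(n)]
    return draws
```

If what you actually want is something else, such as a single normal distribution fitted to the 10,000 column means, the same two building blocks (a mean per column, then a normal draw) still apply, just combined differently.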