How to edit row names in a data frame built from a list of lists?
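I'm new to R (I did try searching; apologies if this duplicates a question elsewhere!) and would appreciate some help. I'm trying to edit the row names of a data.frame.
I started from several vcf files, used lapply() to build a list of lists, then used unlist() to flatten the list and combined the extracted metrics into a data frame, but I ended up with the following: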
> row.names(mydataframe)
[1] "1_S1_annotated_filtered.vcf.gz1" "1_S1_annotated_filtered.vcf.gz2" "1_S1_annotated_filtered.vcf.gz3" "1_S1_annotated_filtered.vcf.gz6"
[5] "1_S1_annotated_filtered.vcf.gz7" "1_S1_annotated_filtered.vcf.gz8"
...
[457] "6_S6_annotated_filtered.vcf.gz877" "6_S6_annotated_filtered.vcf.gz888" "6_S6_annotated_filtered.vcf.gz907" "7_S7_annotated_filtered.vcf.gz309"
[461] "7_S7_annotated_filtered.vcf.gz354" "7_S7_annotated_filtered.vcf.gz477" "7_S7_annotated_filtered.vcf.gz485" "7_S7_annotated_filtered.vcf.gz537"
[465] "7_S7_annotated_filtered.vcf.gz569" "7_S7_annotated_filtered.vcf.gz575" "7_S7_annotated_filtered.vcf.gz721" "7_S7_annotated_filtered.vcf.gz871"
[469] "7_S7_annotated_filtered.vcf.gz892" "8_S8_annotated_filtered.vcf.gz136" "8_S8_annotated_filtered.vcf.gz191" "8_S8_annotated_filtered.vcf.gz967"
whereas what I need is:
> row.names(mydataframe)
[1] "S1" "S1" "S1" "S1"
[5] "S1" "S1" "S1" "S1"
....
[469] "S7" "S8" "S8" "S8"
Any suggestions? Thanks in advance!
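I would use:
library(stringr)
# "+" so that multi-digit sample IDs such as S10 are captured in full
str_extract(row.names(mydataframe), "S[0-9]+")
Alternatively, use gsub with a capture group:
a <- c("6_S6_annotated_filtered.vcf.gz877", "7_S7_annotated_filtered.vcf.gz569")
# the backreference must be written "\\1" inside an R string
gsub('[0-9]+_(S[0-9]+)_annotated_filtered.vcf.*', "\\1", a)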
#[1] "S6" "S7"
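One caveat with the substitution approach: when the pattern does not match, sub() and gsub() return the input unchanged rather than NA, so a malformed name can pass through unnoticed. The following is a minimal sketch of one way to flag such cases, assuming the file names follow the `<number>_S<number>_...` layout shown in the question. The vector `nms` is made up for illustration, and its last entry deliberately breaks the pattern.

```r
# Hypothetical names; the last one deliberately does not follow the pattern
nms <- c("6_S6_annotated_filtered.vcf.gz877",
         "7_S7_annotated_filtered.vcf.gz569",
         "unexpected_name")

# sub() leaves non-matching input untouched
ids_sub <- sub("^[0-9]+_(S[0-9]+)_.*$", "\\1", nms)

# Mark entries that do not match the expected layout as NA instead
matched <- grepl("^[0-9]+_S[0-9]+_", nms)
ids <- ifelse(matched, ids_sub, NA_character_)
print(ids)
```

str_extract() already returns NA for non-matching input, so this extra check only matters for the sub()/gsub() variant.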
My advice: store that information in an additional variable. A data frame cannot have non-unique row names:
df <- data.frame(
A = 1:3,
B = 3:1
)
rownames(df) <- c("D","E","D")
which gives:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘D’
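So you can do this:
# backslashes are doubled because "\d" and "\1" are not valid escapes in an R string
mydataframe$origin <- gsub("\\d+_(S\\d+)_.+", "\\1", rownames(mydataframe))
but you cannot set it as the row names.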
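To make the extra-column approach concrete, here is a small sketch on a toy data frame. The name `toy`, the column `metric`, and its values are invented stand-ins for `mydataframe`; only the row-name layout is taken from the question.

```r
# Toy stand-in for mydataframe; `metric` and its values are made up
toy <- data.frame(
  metric = c(0.1, 0.2, 0.3, 0.4),
  row.names = c("1_S1_annotated_filtered.vcf.gz1",
                "1_S1_annotated_filtered.vcf.gz2",
                "7_S7_annotated_filtered.vcf.gz309",
                "8_S8_annotated_filtered.vcf.gz136")
)

# Keep the unique row names and store the sample ID in its own column
toy$origin <- sub("^[0-9]+_(S[0-9]+)_.*$", "\\1", rownames(toy))

# The column can then be used for grouping, e.g. counting rows per sample
counts <- table(toy$origin)
print(counts)

# Optionally reset the long row names to 1..n once the ID is stored
rownames(toy) <- NULL
```

Keeping the sample ID as a regular column also means it survives operations such as merge() or rbind(), which may rewrite row names.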