Split a data frame based on an ordered multi-level factor column
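I want to split a data frame into a list of data frames. The reason for splitting is that the rows always come in the same order: father, followed by mother, then offspring. However, each of these family members may span more than one row (the rows are always consecutive; for example, father number 1 occupies rows 1 and 2). In my example below I have two families, so I am trying to get a list containing two data frames.
My input: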
df <- 'Chr Start End Family
1 187546286 187552094 father
3 108028534 108032021 father
1 4864403 4878685 mother
1 18898657 18904908 mother
2 460238 461771 offspring
3 108028534 108032021 offspring
1 71481449 71532983 father
2 74507242 74511395 father
2 181864092 181864690 mother
1 71481449 71532983 offspring
2 181864092 181864690 offspring
3 160057791 160113642 offspring'
df <- read.table(text=df, header=TRUE)
So my expected output dfout[[1]] would look like this:
dfout <- 'Chr Start End Family
1 187546286 187552094 father
3 108028534 108032021 father
1 4864403 4878685 mother
1 18898657 18904908 mother
2 460238 461771 offspring
3 108028534 108032021 offspring'
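dfout <- read.table(text=dfout, header=TRUE)
To split each family into a separate data frame, you need an index that indicates where one family ends and the next begins. For the index, I use "father" as the change point. But we can't simply use indx <- df$Family == "father", because there can be several 'father' entries in a row. Instead, we take the difference between consecutive values of that logical vector and look for the positions where it equals 1, which is exactly where the data switches from 'offspring' to 'father'.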
indx <- cumsum(c(1L, diff(df$Family == "father")) == 1L)
split(df, indx)
# $`1`
# Chr Start End Family
# 1 1 187546286 187552094 father
# 2 3 108028534 108032021 father
# 3 1 4864403 4878685 mother
# 4 1 18898657 18904908 mother
# 5 2 460238 461771 offspring
# 6 3 108028534 108032021 offspring
#
# $`2`
# Chr Start End Family
# 7 1 71481449 71532983 father
# 8 2 74507242 74511395 father
# 9 2 181864092 181864690 mother
# 10 1 71481449 71532983 offspring
# 11 2 181864092 181864690 offspring
# 12 3 160057791 160113642 offspring
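To make the one-liner easier to follow, here is a minimal sketch that computes the same index step by step. The names family, is_father, step and starts are made up for illustration, and the Family column is written out as a plain character vector so the snippet runs on its own.

```r
# Family column from the question, as a plain character vector
family <- c("father", "father", "mother", "mother", "offspring", "offspring",
            "father", "father", "mother", "offspring", "offspring", "offspring")

# TRUE on every father row, FALSE elsewhere
is_father <- family == "father"

# Difference between neighbours: +1 where a non-father row is followed by a
# father row, -1 where a father row is followed by a non-father row, else 0
step <- diff(is_father)

# Prepend 1 so the very first row also counts as the start of a family,
# then keep only the +1 positions (the first row of each family)
starts <- c(1L, step) == 1L

# Running count of family starts gives one group number per row
indx <- cumsum(starts)
indx
```

Passing this indx to split() as the grouping vector should give the same two groups as shown above, provided the data really does begin with a father row.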
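It would be more helpful if you posted the code used to generate your actual data frame. I don't have time to redo everything, but I'll show you how it works in general.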
gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple","banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)
> df
gender values fruit
1 M 20 apple
2 M 22 pear
3 F 24 mango
4 F 19 mango
5 F 9 mango
6 F 17 apple
7 M 18 banana
8 M 22 banana
9 M 12 banana
10 M 14 mango
11 F 7 apple
12 F 8 apple
split(df, df$gender)
$F
gender values fruit
3 F 24 mango
4 F 19 mango
5 F 9 mango
6 F 17 apple
11 F 7 apple
12 F 8 apple
$M
gender values fruit
1 M 20 apple
2 M 22 pear
7 M 18 banana
8 M 22 banana
9 M 12 banana
10 M 14 mango
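Note that split() on the column itself groups rows by value, so on the question's data split(df, df$Family) would presumably return three groups (all fathers, all mothers, all offspring) rather than one data frame per family. As a rough sketch of how the two ideas could be combined, the hypothetical helper below (split_by_run_start is not an existing function, and the small fam data frame is invented for the example) starts a new group at the first row of every run of a chosen label.

```r
# Hypothetical helper: start a new group at the first row of each run of
# `start_label` in `column`, then split the data frame on that group number
split_by_run_start <- function(data, column, start_label) {
  is_start <- data[[column]] == start_label
  # Same vector shifted down by one row; the first row has no predecessor
  previous <- c(FALSE, is_start[-length(is_start)])
  # TRUE only where a run of start_label begins
  new_group <- is_start & !previous
  split(data, cumsum(new_group))
}

# Made-up example: two families, the first father spans two rows
fam <- data.frame(
  id = 1:6,
  Family = c("father", "father", "mother", "offspring", "father", "offspring")
)
out <- split_by_run_start(fam, "Family", "father")
names(out)
```

One caveat of this sketch: rows that come before the first start label end up in a group named "0", so it assumes the data begins with the start label.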