根据有序的多因子列拆分数据框

Split a data-frame based in ordered multi factorial column

我想在数据帧列表中拆分一个数据帧。拆分它的原因是我们总是 father 后跟 mother,然后 offspring。但是,这些家庭成员可能有不止一行(它们总是后续的。例如 father 数字 1 在第 1 行和第 2 行中)。在我下面的例子中,我有两个家庭,然后我试图获得一个包含两个数据框的列表。

我的输入:

df <- 'Chr  Start   End Family
1   187546286   187552094   father
3   108028534   108032021   father
1   4864403 4878685 mother
1   18898657    18904908    mother
2   460238  461771  offspring
3   108028534   108032021   offspring
1   71481449    71532983    father
2   74507242    74511395    father
2   181864092   181864690   mother
1   71481449    71532983    offspring
2   181864092   181864690   offspring
3   160057791   160113642   offspring'

df <- read.table(text=df, header=T)

因此,我的预期输出 dfout[[1]] 将如下所示:

dfout <- 'Chr   Start   End Family
1   187546286   187552094   father
3   108028534   108032021   father
1   4864403 4878685 mother
1   18898657    18904908    mother
2   460238  461771  offspring
3   108028534   108032021   offspring'

dfout - read.table(text=dfout, header=TRUE)

要将每个家庭拆分成一个单独的数据框,您需要一个索引来指示一个家庭在哪里结束,另一个家庭在哪里开始。对于索引,我使用 "father" 作为变化点。但是我们不能简单地使用 indx <- df$Family == "father",因为一行中可以有多个 'father' 条目。相反,我们通过搜索等于 1.

的位置来测试从 'offspring' 到 'father' 的切换位置
indx <- cumsum(c(1L, diff(df$Family == "father")) == 1L)
split(df, indx)
# $`1`
#   Chr     Start       End    Family
# 1   1 187546286 187552094    father
# 2   3 108028534 108032021    father
# 3   1   4864403   4878685    mother
# 4   1  18898657  18904908    mother
# 5   2    460238    461771 offspring
# 6   3 108028534 108032021 offspring
# 
# $`2`
#    Chr     Start       End    Family
# 7    1  71481449  71532983    father
# 8    2  74507242  74511395    father
# 9    2 181864092 181864690    mother
# 10   1  71481449  71532983 offspring
# 11   2 181864092 181864690 offspring
# 12   3 160057791 160113642 offspring

如果您发布用于生成实际数据框的代码会更有帮助。我没有时间重做所有内容,但我会向您展示一般情况下它是如何工作的。

gender <- c("M","M","F","F","F","F","M","M","M","M","F","F")
values <- c(20,22,24,19,9,17,18,22,12,14,7,8)
fruit <- c("apple","pear","mango","mango","mango","apple","banana","banana","banana","mango","apple","apple")
df <- data.frame(gender, values, fruit)


> df
   gender values  fruit
1       M     20  apple
2       M     22   pear
3       F     24  mango
4       F     19  mango
5       F      9  mango
6       F     17  apple
7       M     18 banana
8       M     22 banana
9       M     12 banana
10      M     14  mango
11      F      7  apple
12      F      8  apple

split(df, df$gender)

$F
   gender values fruit
3       F     24 mango
4       F     19 mango
5       F      9 mango
6       F     17 apple
11      F      7 apple
12      F      8 apple

$M
   gender values  fruit
1       M     20  apple
2       M     22   pear
7       M     18 banana
8       M     22 banana
9       M     12 banana
10      M     14  mango