How to populate values of one row conditional on another row in R?
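I inherited a data set that is coded in an unusual way, and I would like to learn a less verbose way of reshaping it. The data frame looks like this: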
# Input.
participant = c(rep("John",6), rep("Mary",6))
day = c(rep(1,3), rep(2,3), rep(1,3), rep(2,3))
likes = c("apples", "apples", "18", "apples", "apples", "7", "bananas", "bananas", "24", "bananas", "bananas", "3")
question = rep(c(1,1,0),4)
number = c(rep(18,3), rep(7,3), rep(24,3), rep(3,3))
df = data.frame(participant, day, question, likes)
participant day question likes
1 John 1 1 apples
2 John 1 1 apples
3 John 1 0 18
4 John 2 1 apples
5 John 2 1 apples
6 John 2 0 7
7 Mary 1 1 bananas
8 Mary 1 1 bananas
9 Mary 1 0 24
10 Mary 2 1 bananas
11 Mary 2 1 bananas
12 Mary 2 0 3
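As you can see, the likes column is heterogeneous. When question equals 0, likes holds a number chosen by the participant rather than the fruit they like. So I would like to recode that number into a new column, as follows: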
participant day question likes number
1 John 1 1 apples 18
2 John 1 1 apples 18
3 John 1 0 18 18
4 John 2 1 apples 7
5 John 2 1 apples 7
6 John 2 0 7 7
7 Mary 1 1 bananas 24
8 Mary 1 1 bananas 24
9 Mary 1 0 24 24
10 Mary 2 1 bananas 3
11 Mary 2 1 bananas 3
12 Mary 2 0 3 3
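My current solution in base R involves subsetting the initial data frame, creating a lookup table, renaming its columns, and then merging the lookup table back into the original data frame. That takes several steps, and I suspect there is a simpler solution. I think tidyr might be the answer, but I am not sure how to use it to spread the values of one column (likes) conditional on other columns (day and question).
Do you have any suggestions? Thanks a lot!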
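The asker did not post the code for that multi-step approach. As a rough sketch only, it might look something like the following in base R. The names lookup and merged are illustrative, and stringsAsFactors = FALSE is an assumption made here so that likes stays a character column.

```r
# Rebuild the example data from the question.
participant <- c(rep("John", 6), rep("Mary", 6))
day <- c(rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 3))
likes <- c("apples", "apples", "18", "apples", "apples", "7",
           "bananas", "bananas", "24", "bananas", "bananas", "3")
question <- rep(c(1, 1, 0), 4)
df <- data.frame(participant, day, question, likes, stringsAsFactors = FALSE)

# 1. Subset the rows where likes holds the chosen number.
lookup <- df[df$question == 0, c("participant", "day", "likes")]

# 2. Rename the value column and convert it to numeric.
names(lookup)[names(lookup) == "likes"] <- "number"
lookup$number <- as.numeric(lookup$number)

# 3. Merge the lookup table back onto the original data by group keys.
merged <- merge(df, lookup, by = c("participant", "day"))
```

Note that merge() sorts the result by the key columns by default, so the row order is not guaranteed to match the input exactly.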
Working with the new example data:
# create a key column, overwrite it later
df$number <- paste0(df$participant, df$day) # use as a key
# create lookup table
lookup <- df[!is.na(as.numeric(as.character(df$likes))), c("number", "likes")]
# use lookup to overwrite df$number with the appropriate number
df$number <- lookup$likes[match(df$number, lookup$number)]
# participant day question likes number
#1 John 1 1 apples 18
#2 John 1 1 apples 18
#3 John 1 0 18 18
#4 John 2 1 apples 7
#5 John 2 1 apples 7
#6 John 2 0 7 7
#7 Mary 1 1 bananas 24
#8 Mary 1 1 bananas 24
#9 Mary 1 0 24 24
#10 Mary 2 1 bananas 3
#11 Mary 2 1 bananas 3
#12 Mary 2 0 3 3
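Expect a warning about NAs introduced by coercion, because the character values are converted to numeric (as.numeric(as.character(df$likes))) and the fruit names cannot be parsed as numbers.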
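If the warning is unwanted, one possible variation is to pick the lookup rows with question == 0 instead of testing which values convert to numbers. This is a sketch, not the answerer's code: key and is_number_row are illustrative names, and it assumes every participant/day group has exactly one row with question == 0.

```r
participant <- c(rep("John", 6), rep("Mary", 6))
day <- c(rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 3))
likes <- c("apples", "apples", "18", "apples", "apples", "7",
           "bananas", "bananas", "24", "bananas", "bananas", "3")
question <- rep(c(1, 1, 0), 4)
df <- data.frame(participant, day, question, likes, stringsAsFactors = FALSE)

# Build a group key; the separator keeps keys such as "A1" + 1 and "A" + 11 apart.
key <- paste(df$participant, df$day, sep = "_")

# Rows whose likes value is the chosen number (assumed: one per group).
is_number_row <- df$question == 0

# Only the numeric-looking strings are converted, so no coercion warning.
values <- as.numeric(as.character(df$likes[is_number_row]))
df$number <- values[match(key, key[is_number_row])]
```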
If your data is sorted as in the example, with the numeric row last in each group, you can use na.locf from the zoo package:
library(zoo)
df$age <- na.locf(as.numeric(as.character(df$likes)), fromLast = TRUE)
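To make clear what fromLast = TRUE does here, the same backward fill can be sketched in base R without zoo. This is only an illustration of the idea, not how zoo implements it; x, idx, and filled are illustrative names. Each position takes the value of the first non-missing entry at or after it, which is why the approach depends on the row order.

```r
likes <- c("apples", "apples", "18", "apples", "apples", "7",
           "bananas", "bananas", "24", "bananas", "bananas", "3")

# Fruit names become NA; the warning is expected and suppressed here.
x <- suppressWarnings(as.numeric(likes))

# Positions that hold a number.
idx <- which(!is.na(x))

# For each position i, count the numeric positions before i and step to the next one.
filled <- x[idx[findInterval(seq_along(x) - 1, idx) + 1]]
```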
With the data set above, you could try the following. You group the data by participant and day, and within each group look up the row where question == 0.
library(dplyr)
group_by(df, participant, day) %>%
mutate(age = as.numeric(as.character(likes[which(question == 0)])))
Alternatively, as alistaire suggested, you can use grep().
group_by(df, participant, day) %>%
mutate(age = as.numeric(grep('\\d+', likes, value = TRUE)))
# participant day question likes age
# (fctr) (dbl) (dbl) (fctr) (dbl)
#1 John 1 1 apples 18
#2 John 1 1 apples 18
#3 John 1 0 18 18
#4 John 2 1 apples 7
#5 John 2 1 apples 7
#6 John 2 0 7 7
#7 Mary 1 1 bananas 24
#8 Mary 1 1 bananas 24
#9 Mary 1 0 24 24
#10 Mary 2 1 bananas 3
#11 Mary 2 1 bananas 3
#12 Mary 2 0 3 3
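As a small illustration of what the grep() call does inside each group, the sketch below applies a digit pattern to a plain character vector. The vectors likes_group and mixed are made up for this example. It also shows that an unanchored pattern matches any string containing a digit, so anchoring may be safer if the column can hold mixed values.

```r
# One group's worth of values from the example data.
likes_group <- c("apples", "apples", "18")
only_number <- grep("\\d+", likes_group, value = TRUE)

# Hypothetical messier input: "7up" contains a digit but is not a number.
mixed <- c("7up", "apples", "18")
unanchored <- grep("\\d+", mixed, value = TRUE)    # matches "7up" and "18"
anchored <- grep("^\\d+$", mixed, value = TRUE)    # matches only "18"
```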
If you want to use data.table, you can do this:
library(data.table)
setDT(df)[, age := as.numeric(as.character(likes[which(question == 0)])),
by = list(participant, day)]
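For readers who prefer to avoid extra packages, the same grouped lookup can probably be written in base R with ave(). This is a sketch rather than part of the original answers: idx is an illustrative name, the column is called number to match the question, and it assumes exactly one row with question == 0 per participant/day group.

```r
participant <- c(rep("John", 6), rep("Mary", 6))
day <- c(rep(1, 3), rep(2, 3), rep(1, 3), rep(2, 3))
likes <- c("apples", "apples", "18", "apples", "apples", "7",
           "bananas", "bananas", "24", "bananas", "bananas", "3")
question <- rep(c(1, 1, 0), 4)
df <- data.frame(participant, day, question, likes, stringsAsFactors = FALSE)

# For every row, find the row index of the question == 0 row in its group.
idx <- ave(seq_len(nrow(df)), df$participant, df$day,
           FUN = function(i) i[df$question[i] == 0])

# Pull the number from that row; only numeric strings are converted.
df$number <- as.numeric(as.character(df$likes[idx]))
```

Working on row indices rather than on likes itself keeps the group function independent of the column's type.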
Note
The current data set is new. Jota's answer applies to the deleted data set.