Comparing unordered sets across multiple rows in R
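I want to compare unordered sets of data across multiple rows and identify the values that do not match in new columns. For example, my data is structured as follows: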
org_id <- c("1234", "1234", "1234", "1234", "2345","2345", "2345")
original_value <- c("food", "dental care", "diapers", " ", "care", "housing", "utilities")
new_value <- c("dental care", "emergency food", "diapers", "dental care", "housing", "utilities", "care")
date_change <- c("2018-01-31", "2018-01-31", "2018-01-31", "2018-01-31","2018-01-31", "2018-01-31", "2018-01-31")
df <- data.frame(org_id, original_value,new_value, date_change)
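Each row represents a change to an organization's services, and "date_change" is the date the change occurred. Looking at the changes for the first organization, you will notice that some of them only reflect a change in the order in which the services are listed, not a change in the services themselves (for example, "dental care" for organization "1234"). I would like an output that identifies, in new columns, the values that were actually removed and the values that were actually added, as in this example: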
org_id2 <- c("1234", "1234")
removed_value <- c("food", " ")
added_value <- c("emergency food","housing")
date_change2 <- c("2018-01-31","2018-01-31")
df2 <- data.frame(org_id2, removed_value, added_value, date_change2)
Any ideas on how to approach this? Thanks!
Perhaps something like this:
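library(dplyr)
library(tidyr)

df %>%
  pivot_longer(2:3) %>%                          # stack original_value and new_value
  group_by(org_id, date_change, value) %>%
  filter(n() == 1 & trimws(value) != "") %>%     # keep values seen only once, drop blanks
  ungroup() %>%
  pivot_wider(id_cols = org_id:date_change, names_from = name, values_from = value)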
Output:
  org_id date_change original_value new_value
  <chr>  <chr>       <chr>          <chr>
1 1234   2018-01-31  food           emergency food
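To see why the `n() == 1` filter picks out the unmatched values, it may help to count them by hand. Here is a minimal base R sketch for organization "1234" only, using the values from the question; the names `counts` and `once` are just mine for illustration.

```r
# Values for organization "1234", copied from the question's data
original_value <- c("food", "dental care", "diapers", " ")
new_value <- c("dental care", "emergency food", "diapers", "dental care")

# Count how often each value occurs across both columns combined
counts <- table(c(original_value, new_value))

# Values that occur exactly once have no counterpart in the other column;
# these are the candidates that filter(n() == 1) keeps
once <- names(counts)[counts == 1]
print(once)
```

"dental care" and "diapers" occur more than once, so they drop out; "food", "emergency food" and the blank " " occur once and survive, and the `trimws()` condition then removes the blank.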
If you want a long format, and want to keep blank strings and the like, you can do this:
df %>%
  pivot_longer(2:3, names_to = "action") %>%
  group_by(org_id, date_change, value) %>%
  filter(n() == 1) %>%
  # relabel the source column as the kind of change it represents
  mutate(action = if_else(action == "original_value", "removed", "added"))
Output:
  org_id date_change action  value
  <chr>  <chr>       <chr>   <chr>
1 1234   2018-01-31  removed "food"
2 1234   2018-01-31  added   "emergency food"
3 1234   2018-01-31  removed " "
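One caveat with counting: a value that is listed twice in one column and never in the other would have a count of 2 and be dropped. If that can happen in your data, a set-based comparison might be safer. The following is only a sketch in base R, assuming the comparison should be made per `org_id` and `date_change`; the names `groups` and `changes` are my own.

```r
# Data copied from the question
org_id <- c("1234", "1234", "1234", "1234", "2345", "2345", "2345")
original_value <- c("food", "dental care", "diapers", " ", "care", "housing", "utilities")
new_value <- c("dental care", "emergency food", "diapers", "dental care", "housing", "utilities", "care")
date_change <- rep("2018-01-31", 7)
df <- data.frame(org_id, original_value, new_value, date_change,
                 stringsAsFactors = FALSE)

# Split the rows by organization and change date
groups <- split(df, list(df$org_id, df$date_change), drop = TRUE)

# For each group, compare the two columns as sets
changes <- do.call(rbind, lapply(groups, function(g) {
  removed <- setdiff(g$original_value, g$new_value)  # only in original_value
  added <- setdiff(g$new_value, g$original_value)    # only in new_value
  n <- length(removed) + length(added)
  data.frame(
    org_id = rep(g$org_id[1], n),
    date_change = rep(g$date_change[1], n),
    action = rep(c("removed", "added"), c(length(removed), length(added))),
    value = c(removed, added),
    stringsAsFactors = FALSE
  )
}))
rownames(changes) <- NULL
print(changes)
```

Organization "2345" contributes no rows because its two columns contain the same set of services in a different order. Blank strings are kept here; filter on `trimws(value) != ""` afterwards if you do not want them.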