Spread columns and count by ID in R
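I have a factor column called Lead_DataSource__c. I want to spread each level into its own column and fill those columns with the count of that level per Id, repeated on every row of that Id.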
Here is the head of my data frame:
head(df)
Id Lead_DataSource__c numberoflead leadduration lasttouch firsttouch
<chr> <chr> <int> <drtn> <chr> <chr>
1 0010I000026fxp6QAA NA 1 NA days NA NA
2 0010I000026frM6QAI Walk in 1 0.0000 days Walk in Walk in
3 0010I000026frOQQAY Walk in 1 0.0000 days Walk in Walk in
4 0010I000026frsUQAQ Walk in 3 243.9656 days Walk in Facebook
5 0010I000026frsUQAQ Facebook 3 243.9656 days Walk in Facebook
6 0010I000026frsUQAQ Facebook 3 243.9656 days Walk in Facebook
This is what I need:
Id lastcreateddateoflead lasttouch firsttouch Facebook Walk.in <NA>
1 0010I000026frM6QAI 43575 Walk in Walk in 0 1 0
2 0010I000026frOQQAY 43843 Walk in Walk in 0 1 0
3 0010I000026frsUQAQ 43794 Walk in Facebook 2 1 0
4 0010I000026frsUQAQ 43794 Walk in Facebook 2 1 0
5 0010I000026frsUQAQ 43794 Walk in Facebook 2 1 0
6 0010I000026fsBrQAI 43699 Facebook Facebook 1 0 0
So far I have tried the following with dplyr, but I don't get the result shown above:
df %>%
  group_by(Id, Lead_DataSource__c) %>%
  mutate(numberofleadsource = n()) %>%
  spread(Lead_DataSource__c, numberofleadsource, fill = 0)
This is the output of my code:
Id lastcreateddateoflead lasttouch firsttouch Facebook Walk.in <NA>
1 0010I000026frM6QAI 43575 Walk in Walk in 0 1 0
2 0010I000026frOQQAY 43843 Walk in Walk in 0 1 0
3 0010I000026frsUQAQ 43794 Walk in Facebook 2 0 0
4 0010I000026frsUQAQ 43794 Walk in Facebook 2 0 0
5 0010I000026frsUQAQ 43794 Walk in Facebook 0 1 0
6 0010I000026fsBrQAI 43699 Facebook Facebook 1 0 0
Can anyone help me figure out what I am missing here?
Input data:
structure(list(Id = c("0010I000026fxp6QAA", "0010I000026frM6QAI",
"0010I000026frOQQAY", "0010I000026frsUQAQ", "0010I000026frsUQAQ",
"0010I000026frsUQAQ"), Lead_DataSource__c = c(NA, "Walk in",
"Walk in", "Walk in", "Facebook", "Facebook"), numberoflead = c(1L,
1L, 1L, 3L, 3L, 3L), leadduration = structure(c(NA, 0, 0, 243.9656,
243.9656, 243.9656), class = "difftime", units = "days"), lasttouch = c(NA,
"Walk in", "Walk in", "Walk in", "Walk in", "Walk in"), firsttouch = c(NA,
"Walk in", "Walk in", "Facebook", "Facebook", "Facebook")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
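A minimal sketch of why the attempt falls short, on a trimmed copy of the input (only Id and Lead_DataSource__c kept; assumes dplyr >= 1.0 and tidyr >= 1.1.0). Grouping by both Id and Lead_DataSource__c stores each count only on the rows of that source, so after widening each row has a non-zero count in just one column. Here pivot_wider() with a helper row id stands in for spread(), so the two identical Facebook rows stay separate. Taking the per-Id maximum of each widened column then copies the counts to every row of that Id.

```r
library(dplyr)
library(tidyr)

# Trimmed copy of the question's input (numberoflead, leadduration, etc. dropped)
df <- tibble(
  Id = c("0010I000026fxp6QAA", "0010I000026frM6QAI", "0010I000026frOQQAY",
         "0010I000026frsUQAQ", "0010I000026frsUQAQ", "0010I000026frsUQAQ"),
  Lead_DataSource__c = c(NA, "Walk in", "Walk in", "Walk in", "Facebook", "Facebook")
)

# Same idea as the attempt: count per Id AND source, then widen.
# The helper column `row` keeps every original row distinct during widening.
attempt <- df %>%
  group_by(Id, Lead_DataSource__c) %>%
  mutate(numberofleadsource = n()) %>%
  ungroup() %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from = Lead_DataSource__c,
              values_from = numberofleadsource, values_fill = 0)

# Each row only carries the count for its own source:
# row 4 (Walk in) has Facebook = 0, rows 5-6 (Facebook) have `Walk in` = 0.
print(attempt)

# Fix: within each Id, every source column holds either that source's count
# or 0, so the per-Id maximum is the count we want on every row.
fixed <- attempt %>%
  group_by(Id) %>%
  mutate(across(c(`NA`, `Walk in`, Facebook), max)) %>%
  ungroup() %>%
  select(-row)

print(fixed)
```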
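Here I use add_count() to count how many times each Id/lead-source combination occurs, then pivot_wider() to spread the sources into columns. The last line fills the gaps left by the pivot: within each Id, it copies the non-missing count to every row, or uses 0 if that Id has no lead from that source.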
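library(dplyr)
library(tidyr)
df %>%
  # count rows per Id / lead-source combination (new column n)
  add_count(Id, Lead_DataSource__c) %>%
  # unique row id so pivot_wider() keeps every original row
  mutate(tmp = 1:nrow(.)) %>%
  # one column per lead source, holding n
  pivot_wider(names_from = Lead_DataSource__c, values_from = n) %>%
  select(-tmp) %>%
  group_by(Id) %>%
  # per Id: first non-missing count on every row, or 0 if there is none
  mutate_at(c("NA", "Walk in", "Facebook"), ~ifelse(any(!is.na(.)), .[!is.na(.)][1], 0))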
# A tibble: 6 x 8
# Groups: Id [4]
Id numberoflead leadduration lasttouch firsttouch `NA` `Walk in` Facebook
<chr> <int> <drtn> <chr> <chr> <dbl> <dbl> <dbl>
1 0010I000026fxp6QAA 1 NA days NA NA 1 0 0
2 0010I000026frM6QAI 1 0.0000 days Walk in Walk in 0 1 0
3 0010I000026frOQQAY 1 0.0000 days Walk in Walk in 0 1 0
4 0010I000026frsUQAQ 3 243.9656 days Walk in Facebook 0 1 2
5 0010I000026frsUQAQ 3 243.9656 days Walk in Facebook 0 1 2
6 0010I000026frsUQAQ 3 243.9656 days Walk in Facebook 0 1 2
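An alternative sketch (not the method above): build the per-Id count table once with count() and pivot_wider(), then join it back onto the original rows. It assumes dplyr >= 1.0 and tidyr >= 1.1.0 (for a scalar values_fill), and uses a trimmed copy of the input without numberoflead and leadduration.

```r
library(dplyr)
library(tidyr)

# Trimmed copy of the question's input
df <- tibble(
  Id = c("0010I000026fxp6QAA", "0010I000026frM6QAI", "0010I000026frOQQAY",
         "0010I000026frsUQAQ", "0010I000026frsUQAQ", "0010I000026frsUQAQ"),
  Lead_DataSource__c = c(NA, "Walk in", "Walk in", "Walk in", "Facebook", "Facebook"),
  lasttouch = c(NA, "Walk in", "Walk in", "Walk in", "Walk in", "Walk in"),
  firsttouch = c(NA, "Walk in", "Walk in", "Facebook", "Facebook", "Facebook")
)

# One row per Id, one column per lead source; absent sources become 0
counts <- df %>%
  count(Id, Lead_DataSource__c) %>%
  pivot_wider(names_from = Lead_DataSource__c, values_from = n, values_fill = 0)

# Attach the per-Id counts to every original row of that Id
result <- df %>%
  select(-Lead_DataSource__c) %>%
  left_join(counts, by = "Id")

print(result)
```

Because each Id appears exactly once in counts, left_join() keeps the original six rows in their original order, and no helper row id or fill step is needed. A missing lead source again becomes a column named NA.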