使用 reshape() 和 pivot_long() 将宽数据转换为长数据时出错
Error when transforming wide data to long data using reshape () and pivot_long()
我正在尝试将我的数据从宽数据转换为长数据,但不断出现一些错误:
当前数据结构
Country Month New_users Returning_users Active_users Sum_byyear Sum_bycoun
Angola 2020-06-01 5 0 5 17 47
Angola 2020-09-01 6 2 8 17 47
Angola 2020-10-01 3 1 4 17 47
Angola 2021-01-01 9 7 16 30 47
Angola 2021-03-01 5 3 8 30 47
Angola 2021-05-01 5 1 6 30 47
South Africa 2020-06-01 40 9 49 157 814
South Africa 2020-09-01 65 13 78 157 814
South Africa 2020-10-01 20 10 30 157 814
South Africa 2021-01-01 47 37 84 657 814
South Africa 2021-03-01 224 66 290 657 814
South Africa 2021-05-01 206 77 283 657 814
所需的数据结构
Country Month Users Count Sum_by_year Sum_by_country
Angola 2020-06-01 New_users 5 17 47
Angola 2020-06-01 Returning_users 0 17 47
Angola 2020-06-01 Active_users 5 17 47
Angola 2020-09-01 New_users 6 17 47
Angola 2020-09-01 Returning_users 2 17 47
Angola 2020-09-01 Active_users 8 17 47
Angola 2020-10-01 New_users 3 17 47
Angola 2020-10-01 Returning_users 1 17 47
Angola 2020-10-01 Active_users 4 17 47
Angola 2021-01-01 New_users 9 30 47
Angola 2021-01-01 Returning_users 7 30 47
Angola 2021-01-01 Active_users 16 30 47
Angola 2021-03-01 New_users 5 30 47
Angola 2021-03-01 Returning_users 3 30 47
Angola 2021-03-01 Active_users 8 30 47
Angola 2021-05-01 New_users 5 30 47
Angola 2021-05-01 Returning_users 1 30 47
Angola 2021-05-01 Active_users 6 30 47
Ghana 2020-06-01 New_users 40 157 814
Ghana 2020-06-01 Returning_users 9 157 814
Ghana 2020-06-01 Active_users 49 157 814
Ghana 2020-09-01 New_users 65 157 814
Ghana 2020-09-01 Returning_users 13 157 814
Ghana 2020-09-01 Active_users 78 157 814
Ghana 2020-10-01 New_users 20 157 814
Ghana 2020-10-01 Returning_users 10 157 814
Ghana 2020-10-01 Active_users 30 157 814
Ghana 2020-09-01 Active_users 78 157 814
Ghana 2021-01-01 New_users 47 657 814
Ghana 2021-01-01 Returning_users 37 657 814
Ghana 2021-01-01 Active_users 84 657 814
Ghana 2021-03-01 New_users 224 657 814
Ghana 2021-03-01 Returning_users 66 657 814
Ghana 2021-03-01 Active_users 283 657 814
Ghana 2021-05-01 New_users 206 657 814
Ghana 2021-05-01 Returning_users 77 657 814
Ghana 2021-05-01 Active_users 283 657 814
本质上,它应该转换为 long,因国家和日期而异。我已经转换了日期并且所有其他变量的格式都正确。
我尝试过的:
reshape (df, dir = "long", sep = ","
idvar = "Country",
varying= c("Month"))
Error in guess (varying):
Failed to guess time-varying variables from their names
另一种方法
df%>%pivot_longer (Country, names_to = c(".value", "User"), names_pattern = "(.*)(\d+)")
Error in 'stop_vctrs()':
!Can't recycle '..1' (size 656) to match '...3' (size 0)
请协助。
df <- read.table(text = "Country Month New_users Returning_users Active_users Sum_byyear Sum_bycoun
Angola 2020-06-01 5 0 5 17 47
Angola 2020-09-01 6 2 8 17 47
Angola 2020-10-01 3 1 4 17 47
Angola 2021-01-01 9 7 16 30 47
Angola 2021-03-01 5 3 8 30 47
Angola 2021-05-01 5 1 6 30 47
SouthAfrica 2020-06-01 40 9 49 157 814
SouthAfrica 2020-09-01 65 13 78 157 814
SouthAfrica 2020-10-01 20 10 30 157 814
SouthAfrica 2021-01-01 47 37 84 657 814
SouthAfrica 2021-03-01 224 66 290 657 814
SouthAfrica 2021-05-01 206 77 283 657 814", h = T)
library(tidyverse)
df %>%
pivot_longer(c(New_users, Returning_users, Active_users), names_to = "Users", values_to = "Count") %>%
select(Country, Month, Users, Count, Sum_byyear, Sum_bycoun)
# A tibble: 36 x 6
Country Month Users Count Sum_byyear Sum_bycoun
<chr> <chr> <chr> <int> <int> <int>
1 Angola 2020-06-01 New_users 5 17 47
2 Angola 2020-06-01 Returning_users 0 17 47
3 Angola 2020-06-01 Active_users 5 17 47
4 Angola 2020-09-01 New_users 6 17 47
5 Angola 2020-09-01 Returning_users 2 17 47
6 Angola 2020-09-01 Active_users 8 17 47
7 Angola 2020-10-01 New_users 3 17 47
8 Angola 2020-10-01 Returning_users 1 17 47
9 Angola 2020-10-01 Active_users 4 17 47
10 Angola 2021-01-01 New_users 9 30 47
# ... with 26 more rows
我正在尝试将我的数据从宽数据转换为长数据,但不断出现一些错误:
当前数据结构
Country Month New_users Returning_users Active_users Sum_byyear Sum_bycoun
Angola 2020-06-01 5 0 5 17 47
Angola 2020-09-01 6 2 8 17 47
Angola 2020-10-01 3 1 4 17 47
Angola 2021-01-01 9 7 16 30 47
Angola 2021-03-01 5 3 8 30 47
Angola 2021-05-01 5 1 6 30 47
South Africa 2020-06-01 40 9 49 157 814
South Africa 2020-09-01 65 13 78 157 814
South Africa 2020-10-01 20 10 30 157 814
South Africa 2021-01-01 47 37 84 657 814
South Africa 2021-03-01 224 66 290 657 814
South Africa 2021-05-01 206 77 283 657 814
所需的数据结构
Country Month Users Count Sum_by_year Sum_by_country
Angola 2020-06-01 New_users 5 17 47
Angola 2020-06-01 Returning_users 0 17 47
Angola 2020-06-01 Active_users 5 17 47
Angola 2020-09-01 New_users 6 17 47
Angola 2020-09-01 Returning_users 2 17 47
Angola 2020-09-01 Active_users 8 17 47
Angola 2020-10-01 New_users 3 17 47
Angola 2020-10-01 Returning_users 1 17 47
Angola 2020-10-01 Active_users 4 17 47
Angola 2021-01-01 New_users 9 30 47
Angola 2021-01-01 Returning_users 7 30 47
Angola 2021-01-01 Active_users 16 30 47
Angola 2021-03-01 New_users 5 30 47
Angola 2021-03-01 Returning_users 3 30 47
Angola 2021-03-01 Active_users 8 30 47
Angola 2021-05-01 New_users 5 30 47
Angola 2021-05-01 Returning_users 1 30 47
Angola 2021-05-01 Active_users 6 30 47
Ghana 2020-06-01 New_users 40 157 814
Ghana 2020-06-01 Returning_users 9 157 814
Ghana 2020-06-01 Active_users 49 157 814
Ghana 2020-09-01 New_users 65 157 814
Ghana 2020-09-01 Returning_users 13 157 814
Ghana 2020-09-01 Active_users 78 157 814
Ghana 2020-10-01 New_users 20 157 814
Ghana 2020-10-01 Returning_users 10 157 814
Ghana 2020-10-01 Active_users 30 157 814
Ghana 2020-09-01 Active_users 78 157 814
Ghana 2021-01-01 New_users 47 657 814
Ghana 2021-01-01 Returning_users 37 657 814
Ghana 2021-01-01 Active_users 84 657 814
Ghana 2021-03-01 New_users 224 657 814
Ghana 2021-03-01 Returning_users 66 657 814
Ghana 2021-03-01 Active_users 283 657 814
Ghana 2021-05-01 New_users 206 657 814
Ghana 2021-05-01 Returning_users 77 657 814
Ghana 2021-05-01 Active_users 283 657 814
本质上,它应该转换为 long,因国家和日期而异。我已经转换了日期并且所有其他变量的格式都正确。
我尝试过的:
reshape (df, dir = "long", sep = ","
idvar = "Country",
varying= c("Month"))
Error in guess (varying):
Failed to guess time-varying variables from their names
另一种方法
df%>%pivot_longer (Country, names_to = c(".value", "User"), names_pattern = "(.*)(\d+)")
Error in 'stop_vctrs()':
!Can't recycle '..1' (size 656) to match '...3' (size 0)
请协助。
df <- read.table(text = "Country Month New_users Returning_users Active_users Sum_byyear Sum_bycoun
Angola 2020-06-01 5 0 5 17 47
Angola 2020-09-01 6 2 8 17 47
Angola 2020-10-01 3 1 4 17 47
Angola 2021-01-01 9 7 16 30 47
Angola 2021-03-01 5 3 8 30 47
Angola 2021-05-01 5 1 6 30 47
SouthAfrica 2020-06-01 40 9 49 157 814
SouthAfrica 2020-09-01 65 13 78 157 814
SouthAfrica 2020-10-01 20 10 30 157 814
SouthAfrica 2021-01-01 47 37 84 657 814
SouthAfrica 2021-03-01 224 66 290 657 814
SouthAfrica 2021-05-01 206 77 283 657 814", h = T)
library(tidyverse)
df %>%
pivot_longer(c(New_users, Returning_users, Active_users), names_to = "Users", values_to = "Count") %>%
select(Country, Month, Users, Count, Sum_byyear, Sum_bycoun)
# A tibble: 36 x 6
Country Month Users Count Sum_byyear Sum_bycoun
<chr> <chr> <chr> <int> <int> <int>
1 Angola 2020-06-01 New_users 5 17 47
2 Angola 2020-06-01 Returning_users 0 17 47
3 Angola 2020-06-01 Active_users 5 17 47
4 Angola 2020-09-01 New_users 6 17 47
5 Angola 2020-09-01 Returning_users 2 17 47
6 Angola 2020-09-01 Active_users 8 17 47
7 Angola 2020-10-01 New_users 3 17 47
8 Angola 2020-10-01 Returning_users 1 17 47
9 Angola 2020-10-01 Active_users 4 17 47
10 Angola 2021-01-01 New_users 9 30 47
# ... with 26 more rows