Restructure data frame R - gather years in one column
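I have a large data frame that looks like this:

| location | type | 2005 | | | 2006 | | | 2007 | | |
|---|---|---|---|---|---|---|---|---|---|---|
| | | Sentenced | Female | College | Sentenced | Female | College | Sentenced | Female | College |
| Paris | 1 | Yes | No | Yes | No | No | Yes | No | Yes | No |
| Paris | 2 | No | No | No | Yes | No | Yes | No | No | Yes |
| Paris | 3 | No | Yes | No | Yes | No | Yes | Yes | No | Yes |
| Madrid | 1 | Yes | No | No | No | Yes | No | No | Yes | No |
| Madrid | 2 | No | Yes | No | No | Yes | No | Yes | No | Yes |
| Miami | 1 | Yes | No | Yes | Yes | No | Yes | Yes | No | Yes |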
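I would like to restructure it so that it looks like this:

| year | location | Type | Sentenced | Female | College |
|---|---|---|---|---|---|
| 2005 | Paris | 1 | Yes | No | Yes |
| 2005 | Paris | 2 | Yes | No | Yes |
| 2005 | Paris | 3 | Yes | No | Yes |
| 2005 | Madrid | 1 | Yes | No | Yes |
| 2005 | Madrid | 2 | Yes | No | Yes |
| 2005 | Miami | 1 | Yes | No | Yes |
| 2006 | Paris | 1 | Yes | No | Yes |
| 2006 | Paris | 2 | Yes | No | Yes |
| 2006 | Paris | 3 | Yes | No | Yes |
| 2006 | Madrid | 1 | Yes | No | Yes |
| 2006 | Madrid | 2 | Yes | No | Yes |
| 2006 | Miami | 3 | Yes | No | Yes |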
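Please don't pay attention to the internal validity of the two tables; the values are for illustration only.
I tried the gather function in R, but it failed because it seems to expect only one variable per year rather than three (in my example: Sentenced, Female, College).
Any suggestions?
Thanks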
I tried to reproduce your example:

    test <- structure(list(location = c(NA, "Paris", "Paris", "Paris", "Madrid",
    "Madrid", "Miami"), type = c(NA, 1, 2, 3, 1, 2, 1), `2005...3` = c("Sentenced",
    "Yes", "No", "No", "Yes", "No", "Yes"), `2005...4` = c("Female",
    "No", "No", "Yes", "No", "Yes", "No"), `2005...5` = c("College",
    "Yes", "No", "No", "No", "No", "Yes"), `2006...6` = c("Sentenced",
    "No", "Yes", "Yes", "No", "No", "Yes"), `2006...7` = c("Female",
    "No", "No", "No", "Yes", "Yes", "No"), `2006...8` = c("College",
    "Yes", "Yes", "Yes", "No", "No", "Yes"), `2007...9` = c("Sentenced",
    "No", "No", "Yes", "No", "Yes", "Yes"), `2007...10` = c("Female",
    "Yes", "No", "No", "Yes", "No", "No"), `2007...11` = c("College",
    "No", "Yes", "Yes", "No", "Yes", "Yes")), row.names = c(NA, -7L
    ), class = c("tbl_df", "tbl", "data.frame"))
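You basically need to merge the first two rows into a single header, then reshape with the following code:

    library(tidyr)
    library(dplyr)

    # Merge each year header with the sub-header stored in row 1, e.g. "2005...3_Sentenced"
    names(test)[3:11] <- paste(names(test)[3:11], unlist(test[1, 3:11]), sep = "_")
    test <- test[-1, ]  # drop the sub-header row now that it is part of the names

    test <- test %>%
      gather("key", "value", 3:11) %>%                                        # one row per location/type/column
      separate(key, c("Year", "Key"), sep = "_") %>%                          # "2005...3" and "Sentenced"
      separate(Year, c("Year", "Garbage"), sep = "[.]", extra = "drop") %>%   # keep "2005", discard the suffix
      select(-Garbage) %>%
      spread(Key, value)                                                      # Sentenced/Female/College as columns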
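As a possible alternative to the gather/separate/spread chain above, newer versions of tidyr (1.0.0 and later) provide `pivot_longer()`, whose `.value` sentinel can split each column name into a year and a variable in one step. The sketch below is only an illustration under assumptions: `wide` is a hypothetical two-row miniature of the data, and it assumes the column names have already been merged into the form `<year>_<variable>` (that is, the `...N` suffixes have been stripped and the sub-header row removed).

```r
library(tidyr)
library(dplyr)

# Hypothetical miniature of the wide data (not the full table from the question).
# Assumes the names are already in "<year>_<variable>" form.
wide <- tibble(
  location = c("Paris", "Madrid"),
  type = c(1, 1),
  `2005_Sentenced` = c("Yes", "Yes"),
  `2005_Female` = c("No", "No"),
  `2005_College` = c("Yes", "No"),
  `2006_Sentenced` = c("No", "No"),
  `2006_Female` = c("No", "Yes"),
  `2006_College` = c("Yes", "No")
)

long <- wide %>%
  pivot_longer(
    cols = -c(location, type),       # reshape every column except the identifiers
    names_to = c("year", ".value"),  # first name part becomes `year`; second part names the output columns
    names_sep = "_"                  # split the column names at the underscore
  )

print(long)
```

Here `year` comes out as a character column; convert it with `as.integer()` afterwards if a numeric year is needed.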