How to split columns into multiple columns and find frequency in R?
ODK responses:
1
1; 2
1; 2; 3
1; 2; 3; 5
1; 2; 4
1; 2; 4; 5; 6
1; 2; 4; 6
1; 2; 4; 7
1 is Crop failure
2 is Water shortage
3 is Land degradation
4 is Lack of HH Labor
5 is Lack of income from agriculture
6 is Lack of manure / fertilizer
7 is Others
I want a table like this:
Crop failure- 8
Water shortage- 7
Land degradation- 6
Lack of HH Labor- 1
Lack of income from agriculture- 2
Lack of manure / fertilizer- 2
Others- 1
I tried the dplyr approach from 'Split single column with multiple values into multiple colums' in R, but it didn't help.
Using plyr, you can get:
Condition = c("Crop failure", "Water shortage", "Land degradation", "Lack of HH Labor", "Lack of income from agriculture", "Lack of manure / fertilizer", "Others")
Type = 1:7
df = data.frame(Condition, Type)
# All selected codes, flattened into one vector
responses = c(1,1,2,1,2,3,1,2,3,5,1,2,4,1,2,4,5,6,1,2,4,6,1,2,4,7)
counts = plyr::count(responses)   # data frame with columns x and freq
colnames(counts) = c("Type", "Freq")
df = merge(df, counts)            # joins on the shared Type column
You get:
> df
Type Condition Freq
1 1 Crop failure 8
2 2 Water shortage 7
3 3 Land degradation 2
4 4 Lack of HH Labor 4
5 5 Lack of income from agriculture 2
6 6 Lack of manure / fertilizer 2
7 7 Others 1
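The answers here start from a hand-typed vector of codes. A minimal base R sketch of producing that vector directly from the raw ODK strings, assuming each answer is stored as text such as "1; 2; 4" (the names odk, codes and freq are illustrative):

```r
# Raw ODK answers, one string per respondent (values from the question)
odk <- c("1", "1; 2", "1; 2; 3", "1; 2; 3; 5", "1; 2; 4",
         "1; 2; 4; 5; 6", "1; 2; 4; 6", "1; 2; 4; 7")

# Split each string on ";" plus any following spaces, then flatten to one vector
codes <- as.integer(unlist(strsplit(odk, ";\\s*")))

# Count occurrences of codes 1..7; nbins = 7 keeps unused codes as 0
freq <- tabulate(codes, nbins = 7)
freq
```

codes can then be passed to plyr::count() or table() in place of the typed-in vector.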
- If you want to use base R, the following solution may help.
Assuming your input is
response <- c(1, 1, 2, 1, 2, 3, 1, 2, 3, 5, 1, 2, 4, 1, 2, 4, 5, 6, 1, 2, 4, 6, 1, 2, 4, 7)
then
status <- c("Crop failure", "Water shortage", "Land degradation", "Lack of HH Labor", "Lack of income from agriculture", "Lack of manure / fertilizer", "Others")
df <- as.data.frame(table(factor(response, labels = status), dnn = list("Status")))
gives you this output:
> df
Status Freq
1 Crop failure 8
2 Water shortage 7
3 Land degradation 2
4 Lack of HH Labor 4
5 Lack of income from agriculture 2
6 Lack of manure / fertilizer 2
7 Others 1
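One caveat: factor(response, labels = status) attaches the labels to the codes that actually occur, so it fails if any of the seven codes is never chosen (the number of labels no longer matches the number of observed levels). Passing levels = 1:7 explicitly avoids that and keeps zero counts. A sketch using a small, made-up subset of responses (response_subset is illustrative only):

```r
status <- c("Crop failure", "Water shortage", "Land degradation", "Lack of HH Labor",
            "Lack of income from agriculture", "Lack of manure / fertilizer", "Others")

# Illustrative subset only: codes 4 to 7 never occur here
response_subset <- c(1, 1, 2, 3)

# factor(response_subset, labels = status) would raise an error here,
# because only 3 levels are observed but 7 labels are supplied.
f <- factor(response_subset, levels = 1:7, labels = status)

# Naming the argument sets the column name to "Status"
df_subset <- as.data.frame(table(Status = f))
df_subset
```

The unused causes appear with Freq equal to 0 instead of stopping the script.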
- If you want a detailed table (one row per respondent), assuming your input is:
r <- list(1, c(1, 2), c(1, 2, 3), c(1, 2, 3, 5), c(1, 2, 4),
          c(1, 2, 4, 5, 6), c(1, 2, 4, 6), c(1, 2, 4, 7))
type = seq(1, 7)
# For each respondent v, count how many times each code k appears in v
dt <- as.data.frame(t(sapply(r, function(v) sapply(type, function(k) sum(k == v)))))
colnames(dt) <- paste0("type", type)
This gives
> dt
type1 type2 type3 type4 type5 type6 type7
1 1 0 0 0 0 0 0
2 1 1 0 0 0 0 0
3 1 1 1 0 0 0 0
4 1 1 1 0 1 0 0
5 1 1 0 1 0 0 0
6 1 1 0 1 1 1 0
7 1 1 0 1 0 1 0
8 1 1 0 1 0 0 1
In addition, the total for each type can be computed with colSums:
> colSums(dt)
type1 type2 type3 type4 type5 type6 type7
8 7 2 4 2 2 1
Alternatively, you can use match(), i.e.
# TRUE where code k appears in respondent v's answer
dt <- as.data.frame(t(sapply(r, function(v) !is.na(match(type, v)))))
colnames(dt) <- paste0("type", type)
> dt
type1 type2 type3 type4 type5 type6 type7
1 TRUE FALSE FALSE FALSE FALSE FALSE FALSE
2 TRUE TRUE FALSE FALSE FALSE FALSE FALSE
3 TRUE TRUE TRUE FALSE FALSE FALSE FALSE
4 TRUE TRUE TRUE FALSE TRUE FALSE FALSE
5 TRUE TRUE FALSE TRUE FALSE FALSE FALSE
6 TRUE TRUE FALSE TRUE TRUE TRUE FALSE
7 TRUE TRUE FALSE TRUE FALSE TRUE FALSE
8 TRUE TRUE FALSE TRUE FALSE FALSE TRUE
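The same respondent-by-type table can also be built in a single table() call, by pairing every selected code with the index of the respondent who chose it. A sketch, assuming r holds one numeric vector per respondent as in this answer (respondent, code and m are illustrative names):

```r
r <- list(1, c(1, 2), c(1, 2, 3), c(1, 2, 3, 5), c(1, 2, 4),
          c(1, 2, 4, 5, 6), c(1, 2, 4, 6), c(1, 2, 4, 7))

# Repeat each respondent's index once per code they selected
respondent <- rep(seq_along(r), lengths(r))
# Fix the levels so that a type nobody chose would still get a column
code <- factor(unlist(r), levels = 1:7)

m <- table(respondent, code)  # 8 x 7 contingency table
m
colSums(m)                    # totals per type
```

This avoids the nested sapply() calls; like sum(k == v), it would count a code twice if a respondent listed it twice.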
Here is my take on this problem. I use tidyverse because it loads stringr and tidyr for me.
library(tidyverse)
id <- data.frame(Code = 1:7, #Make a coding data frame so you can label the results
Cause = c("Crop failure", "Water shortage", "Land degradation", "Lack of HH Labor", "Lack of income from agriculture", "Lack of manure / fertilizer", "Others"), stringsAsFactors = FALSE)
data <- Book1 %>%
separate(X1, into = paste0("X", 1:7), sep = ";") %>% # split on ";"; rows with fewer than 7 codes get NAs, removed later
gather(key = "drop", value = "Code") %>% #put it into 1 column to exploit R's vectorization
mutate(Code = as.integer(Code)) %>% #Make the code an integer for the join later
filter(!is.na(Code)) %>% #remove those previous NAs
group_by(Code) %>%
count() %>% # Counts
left_join(., id) #labels
colnames(data) <- c("Code", "Count", "Cause")
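separate() will give a warning, but it is only letting you know that it is filling the extra cells with NA, which we remove later. The only things you may need to change are the data frame name (Book1) and the column name X1, depending on what you named your objects.
Here is my result: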
Code Count Cause
<int> <int> <chr>
1 1 8 Crop failure
2 2 7 Water shortage
3 3 2 Land degradation
4 4 4 Lack of HH Labor
5 5 2 Lack of income from agriculture
6 6 2 Lack of manure / fertilizer
7 7 1 Others
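For comparison, a base R sketch of the same split, count and label pipeline without any packages. Book1 and its column X1 are stand-ins built here from the question's data (your imported sheet may differ), and result is an illustrative name:

```r
# Hypothetical stand-in for the imported sheet: one text column X1
Book1 <- data.frame(X1 = c("1", "1; 2", "1; 2; 3", "1; 2; 3; 5", "1; 2; 4",
                           "1; 2; 4; 5; 6", "1; 2; 4; 6", "1; 2; 4; 7"),
                    stringsAsFactors = FALSE)

id <- data.frame(Code = 1:7,
                 Cause = c("Crop failure", "Water shortage", "Land degradation",
                           "Lack of HH Labor", "Lack of income from agriculture",
                           "Lack of manure / fertilizer", "Others"),
                 stringsAsFactors = FALSE)

# Split and flatten (the role of separate() + gather() + filter())
codes <- as.integer(unlist(strsplit(Book1$X1, ";\\s*")))

# Count each code (the role of group_by() + count())
counts <- as.data.frame(table(Code = codes), responseName = "Count",
                        stringsAsFactors = FALSE)
counts$Code <- as.integer(counts$Code)

# Attach the labels (the role of left_join())
result <- merge(counts, id, by = "Code", all.x = TRUE)
result
```

Because strsplit() only produces as many pieces as each row has, no NA padding is created and no warning is raised.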
Hope this helps!