R: Split delimited strings in a column and insert as new columns (in binary)
My data frame looks like this:
+---+-----------+
|lot|Combination|
+---+-----------+
|A01|A,B,C,D,E,F|
|A01|A,B,C      |
|A02|B,C,D,E    |
|A03|A,B,D,F    |
|A04|A,C,D,E,F  |
+---+-----------+
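Each letter is one item in a comma-separated string. I want to split 'Combination' at every comma and insert the resulting strings as new columns in binary form (1 if the letter is present in that row, 0 if not). For example, the desired output would be: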
+---+-+-+-+-+-+-+
|lot|A|B|C|D|E|F|
+---+-+-+-+-+-+-+
|A01|1|1|1|1|1|1|
|A01|1|1|1|0|0|0|
|A02|0|1|1|1|1|0|
|A03|1|1|0|1|0|1|
|A04|1|0|1|1|1|1|
+---+-+-+-+-+-+-+
Any help would be appreciated :)
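Please provide your sample input data in a form that answerers can use directly as input. I have added comparable sample data here myself. Hope this helps.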
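library(tidyr)
library(dplyr)

# Sample data (each lot appears only once here)
lot <- c("A01", "A02", "A03", "A04")
Combination <- c("A,B,C,D,E,F", "A,B,C", "B,C,D,E", "A,C")
df <- data.frame(lot, Combination, stringsAsFactors = FALSE)
df

# fill = "right" pads short rows with NA without a warning
separate(df, Combination, into = paste0("V", 1:6), sep = ",", fill = "right") %>%
  gather(key, value, -lot) %>%      # long format: one row per lot and position
  filter(!is.na(value)) %>%         # drop the NA padding
  select(-key) %>%                  # drop the position column so spread() returns one row per lot
  mutate(yesno = 1) %>%
  distinct() %>%
  spread(value, yesno, fill = 0)    # one 0/1 column per letter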
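To understand what is happening here, run each step on its own, starting with separate(). %>% is the pipe operator, shorthand for passing the result of the previous line as the first argument of the next line.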
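As a rough sketch of that advice, the first two steps can be run on their own and the piped call compared with the plain nested call. The two-row data frame and the names step1, step2_nested and step2_piped are made up here for illustration and are not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row example, smaller than the data above
df_small <- data.frame(lot = c("A01", "A02"),
                       Combination = c("A,B,C", "B,C"),
                       stringsAsFactors = FALSE)

# Step 1 on its own: one column per position, short rows padded with NA
step1 <- separate(df_small, Combination, into = paste0("V", 1:3),
                  sep = ",", fill = "right")
step1

# Step 2, written once as a plain call and once with the pipe.
# The pipe passes its left-hand side as the first argument,
# so both forms should give the same result.
step2_nested <- gather(step1, key, value, -lot)
step2_piped  <- step1 %>% gather(key, value, -lot)
identical(step2_nested, step2_piped)
```

Printing step1 and step2_piped shows where the NA padding comes from and why the later filter(!is.na(value)) step is needed.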
A solution using dplyr and tidyr. dt2 is the final output.
# Load packages
library(dplyr)
library(tidyr)

# Create example data frame
lot <- c("A01", "A01", "A02", "A03", "A04")
Combination <- c("A,B,C,D,E,F", "A,B,C", "B,C,D,E", "A,B,D,F", "A,C,D,E,F")
dt <- tibble(lot, Combination)  # data_frame() is deprecated in favor of tibble()

# Process the data
dt2 <- dt %>%
  mutate(ID = 1:n()) %>%                                        # row ID keeps the two A01 rows apart
  mutate(Combination = strsplit(Combination, split = ",")) %>%  # list column of letters
  unnest(Combination) %>%                                       # one row per letter
  mutate(Value = 1) %>%
  spread(Combination, Value, fill = 0) %>%                      # one 0/1 column per letter
  select(-ID)
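In newer versions of tidyr, spread() is superseded by pivot_wider(). A minimal sketch of the same pipeline with pivot_wider() follows, assuming tidyr 1.1.0 or later (for a scalar values_fill and for names_sort). The name dt3 is only used here for illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the answer above
dt <- tibble(
  lot = c("A01", "A01", "A02", "A03", "A04"),
  Combination = c("A,B,C,D,E,F", "A,B,C", "B,C,D,E", "A,B,D,F", "A,C,D,E,F")
)

dt3 <- dt %>%
  mutate(ID = row_number(),                                # row ID keeps the two A01 rows apart
         Combination = strsplit(Combination, split = ",")) %>%
  unnest(Combination) %>%                                  # one row per letter
  mutate(Value = 1) %>%
  pivot_wider(names_from = Combination, values_from = Value,
              values_fill = 0,                             # absent letters become 0
              names_sort = TRUE) %>%                       # sort the new columns alphabetically
  select(-ID)
dt3
```

Without names_sort = TRUE, pivot_wider() orders the new columns by first appearance rather than alphabetically, which can differ from what spread() returns.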
Another option, using the handy separate_rows() function:
df <- read.table(text = "lot|Combination
A01|A,B,C,D,E,F
A01|A,B,C
A02|B,C,D,E
A03|A,B,D,F
A04|A,C,D,E,F", sep = "|", header = TRUE)
library(tidyverse)
df %>%
  mutate(id = row_number(), flg = 1) %>%
  separate_rows(Combination, sep = ",") %>%
  spread(Combination, flg)
This gives:
  lot id  A  B  C  D  E  F
1 A01  1  1  1  1  1  1  1
2 A01  2  1  1  1 NA NA NA
3 A02  3 NA  1  1  1  1 NA
4 A03  4  1  1 NA  1 NA  1
5 A04  5  1 NA  1  1  1  1
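The output above still contains NA where the question asks for 0, and it keeps the helper column id. A small variation, sketched here as one possible adjustment rather than as the answerer's own code, uses the fill argument of spread() and drops id at the end. The name res is only used for illustration.

```r
library(dplyr)
library(tidyr)

df <- read.table(text = "lot|Combination
A01|A,B,C,D,E,F
A01|A,B,C
A02|B,C,D,E
A03|A,B,D,F
A04|A,C,D,E,F", sep = "|", header = TRUE, stringsAsFactors = FALSE)

res <- df %>%
  mutate(id = row_number(), flg = 1) %>%     # id keeps the two A01 rows apart
  separate_rows(Combination, sep = ",") %>%  # one row per letter
  spread(Combination, flg, fill = 0) %>%     # fill = 0 replaces the NA cells
  select(-id)                                # drop the helper column
res
```

The id column is still needed during spread(), because the two A01 rows would otherwise be duplicate identifiers.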