R: reshape a data frame every other column
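I'm having trouble tidying some data that was collected in an odd layout. It has patient identifiers, then a column with a test date, then a column with the corresponding measurement. The same test is repeated over time, and each repeat adds another date/result pair in the following columns.
The data frame looks like this: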
df1 <- data.frame(id = c("A","B"),
                  test1 = c("10-12-16", "12-10-17"),
                  test1_result = c("20", "3"),
                  test2 = c("10-01-17", "11-12-17"),
                  test2_result = c("18", "4"),
                  test3 = c("12-03-18", "NA"),
                  test3_result = c("300", "NA"))
I would like to get something like this:
df2 <- data.frame(id = c("A", "A", "A", "B", "B", "B"),
                  tests = c("10-12-16", "10-01-17", "12-03-18", "12-10-17", "11-12-17", "NA"),
                  results = c("20", "18", "300", "3", "4", "NA")
)
I can't figure out how to transform it; any help would be much appreciated.
Thanks!
You can try melt from data.table:
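library(data.table)
setDT(df1)  # convert df1 to a data.table by reference
# Two measure patterns: date columns (test1, test2, ...) and result columns.
# The backslash must be doubled inside an R string literal.
df2 <- melt(df1, id = 'id', measure = patterns('test\\d$', '_result'))[
  , .(id, tests = value1, results = value2)]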
#    id    tests results
# 1:  A 10-12-16      20
# 2:  B 12-10-17       3
# 3:  A 10-01-17      18
# 4:  B 11-12-17       4
# 5:  A 12-03-18     300
# 6:  B       NA      NA
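The melt result above comes out grouped by test rather than by patient. If the row order from the question matters, here is a minimal sketch, assuming the same df1 (rebuilt here so the snippet runs on its own; `long` is just an illustrative name). It names the value columns directly with value.name and then sorts by id and by the `variable` index that melt creates:

```r
library(data.table)

# Same data as in the question; "NA" is a literal string there, not a missing value
df1 <- data.table(id = c("A", "B"),
                  test1 = c("10-12-16", "12-10-17"),
                  test1_result = c("20", "3"),
                  test2 = c("10-01-17", "11-12-17"),
                  test2_result = c("18", "4"),
                  test3 = c("12-03-18", "NA"),
                  test3_result = c("300", "NA"))

# One pattern per output column: dates first, results second
long <- melt(df1, id.vars = "id",
             measure.vars = patterns("test\\d$", "_result"),
             value.name = c("tests", "results"))

# 'variable' holds the index of the test (1, 2, 3); sort by patient, then test
setorder(long, id, variable)

# Drop the helper column once the rows are ordered
long[, variable := NULL]
print(long)
```

Sorting on `variable` before dropping it keeps each patient's tests in their original column order.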
Here is one possibility using the tidyverse (gather from tidyr plus dplyr verbs):
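library(tidyverse)
df1 %>%
    # stack the result columns; k1 holds the original column name
    gather(k1, results, contains("_result")) %>%
    # strip the suffix so k1 matches the date column names (test1, test2, ...)
    mutate(k1 = gsub("_result", "", k1)) %>%
    # stack the remaining date columns; k2 holds their names
    gather(k2, tests, contains("test")) %>%
    # keep only rows where the date and the result belong to the same test
    filter(k1 == k2) %>%
    select(id, tests, results)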
#  id    tests results
#1  A 10-12-16      20
#2  B 12-10-17       3
#3  A 10-01-17      18
#4  B 11-12-17       4
#5  A 12-03-18     300
#6  B       NA      NA
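For what it's worth, in tidyr 1.0.0 and later gather() is superseded by pivot_longer(), which can do the paired reshape in one step through the ".value" placeholder. This is only a sketch under a few assumptions: the renaming step, the helper column `test`, and the name `long` are mine, not part of the answers above. The date columns are first renamed so that every column has the form test<N>_<field>:

```r
library(tidyr)

# Same data as in the question; "NA" is a literal string there, not a missing value
df1 <- data.frame(id = c("A", "B"),
                  test1 = c("10-12-16", "12-10-17"),
                  test1_result = c("20", "3"),
                  test2 = c("10-01-17", "11-12-17"),
                  test2_result = c("18", "4"),
                  test3 = c("12-03-18", "NA"),
                  test3_result = c("300", "NA"),
                  stringsAsFactors = FALSE)

# Give both column types a suffix: test1 -> test1_tests, test1_result -> test1_results
names(df1) <- sub("^(test[0-9]+)$", "\\1_tests", names(df1))
names(df1) <- sub("_result$", "_results", names(df1))

# Split each name at "_": the first part goes to 'test',
# the second part (".value") becomes the output column name
long <- pivot_longer(df1, cols = -id,
                     names_to = c("test", ".value"),
                     names_sep = "_")

# Drop the helper column to match the layout asked for in the question
long$test <- NULL
print(long)
```

Because pivot_longer() works through the input row by row, the rows come out grouped by id, which matches the order of the desired df2.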