Gather / melt wide data into different value columns
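I have data in wide format with two different sets of value columns: columns containing masses (Mass1, Mass2, etc.) and columns containing the corresponding dates (Mass1_date, Mass2_date, etc.).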
library(tidyr)
library(dplyr)
library(lubridate)
df <- structure(list(Year = 2004, Nest_no = 21, Mass1 = 2325, Mass1_date = structure(1081987200, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Mass2 = 2000, Mass2_date = structure(1082851200, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Mass3 = 1750, Mass3_date = structure(1083715200, class = c("POSIXct",
"POSIXt"), tzone = "UTC")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -1L), .Names = c("Year", "Nest_no", "Mass1",
"Mass1_date", "Mass2", "Mass2_date", "Mass3", "Mass3_date"))
df
## Source: local data frame [1 x 8]
##
## Year Nest_no Mass1 Mass1_date Mass2 Mass2_date Mass3 Mass3_date
## (dbl) (dbl) (dbl) (time) (dbl) (time) (dbl) (time)
## 1 2004 21 2325 2004-04-15 2000 2004-04-25 1750 2004-05-05
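I would like to "tidy" the data into long format, where the two sets of value columns are gathered (melted) into two different value columns, one containing the values from the 'Mass columns' and one containing the values from the 'date columns':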
## Source: local data frame [3 x 5]
##
## Year Nest_no capture date weight
## (dbl) (dbl) (dbl) (date) (dbl)
## 1 2004 21 1 2004-04-15 2325
## 2 2004 21 2 2004-04-25 2000
## 3 2004 21 3 2004-05-05 1750
At first, I thought I could do it with tidyr in two steps:
gather(df, capture, date, contains("Date")) %>%
gather(capture2, weight, contains("Mass"))
## Source: local data frame [9 x 6]
##
## Year Nest_no capture date capture2 weight
## (dbl) (dbl) (chr) (time) (chr) (dbl)
## 1 2004 21 Mass1_date 2004-04-15 Mass1 2325
## 2 2004 21 Mass2_date 2004-04-25 Mass1 2325
## 3 2004 21 Mass3_date 2004-05-05 Mass1 2325
## 4 2004 21 Mass1_date 2004-04-15 Mass2 2000
## 5 2004 21 Mass2_date 2004-04-25 Mass2 2000
## 6 2004 21 Mass3_date 2004-05-05 Mass2 2000
## 7 2004 21 Mass1_date 2004-04-15 Mass3 1750
## 8 2004 21 Mass2_date 2004-04-25 Mass3 1750
## 9 2004 21 Mass3_date 2004-05-05 Mass3 1750
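However, it did not work as expected: every date is paired with every mass, giving 9 rows instead of 3. After several attempts, I came up with this solution: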
df <- gather(df, capture2, weight, contains("Mass"), convert = T) %>%
mutate(capture = extract_numeric(capture2))
## Warning: attributes are not identical across measure variables; they will
## be dropped
df$capture2 <- ifelse(grepl("date", df$capture2), "date", "weight")
df <- spread(df, capture2, weight) %>%
mutate(date = as.Date(as.POSIXct(date, origin = "1970-01-01")))
df
## Source: local data frame [3 x 5]
##
## Year Nest_no capture date weight
## (dbl) (dbl) (dbl) (date) (dbl)
## 1 2004 21 1 2004-04-15 2325
## 2 2004 21 2 2004-04-25 2000
## 3 2004 21 3 2004-05-05 1750
I wonder if there is a better way to achieve this?
Thank you, Philip
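We can do this easily with melt from data.table. Its measure argument can take multiple patterns of column names (via patterns()) and convert the 'wide' format to 'long', producing one value column per pattern.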
library(data.table)
# '\\d$' matches Mass1, Mass2, ...; 'date$' matches the *_date columns
melt(as.data.table(df), measure = patterns('\\d$', 'date$'),
     variable.name = 'capture', value.name = c('weight', 'date'))
# Year Nest_no capture weight date
#1: 2004 21 1 2325 2004-04-15
#2: 2004 21 2 2000 2004-04-25
#3: 2004 21 3 1750 2004-05-05
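For anyone who prefers to stay within tidyr, here is a minimal sketch of the same reshape using pivot_longer and its ".value" sentinel instead of data.table. This is not part of the answer above; it assumes tidyr >= 1.1.0 and dplyr >= 1.0.0, and the object names wide and long as well as the "_weight" suffix are only illustrative choices. The idea is to rename Mass1 to Mass1_weight (and so on) so that every column name has the form Mass<capture>_<value>, which lets pivot_longer split the names in one step.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the example is self-contained
wide <- data.frame(
  Year = 2004, Nest_no = 21,
  Mass1 = 2325, Mass1_date = as.POSIXct("2004-04-15", tz = "UTC"),
  Mass2 = 2000, Mass2_date = as.POSIXct("2004-04-25", tz = "UTC"),
  Mass3 = 1750, Mass3_date = as.POSIXct("2004-05-05", tz = "UTC")
)

long <- wide %>%
  # Give the mass columns an explicit suffix: Mass1 -> Mass1_weight
  rename_with(~ paste0(.x, "_weight"), matches("^Mass\\d+$")) %>%
  pivot_longer(
    cols = starts_with("Mass"),
    # First capture group becomes the 'capture' column; the second group
    # (".value") names the output value columns: weight and date
    names_to = c("capture", ".value"),
    names_pattern = "^Mass(\\d+)_(.*)$",
    names_transform = list(capture = as.integer)
  ) %>%
  # Dates arrive as POSIXct; convert to Date as in the desired output
  mutate(date = as.Date(date))

long
```

Because weight and date are kept in separate columns throughout, this route should avoid the "attributes are not identical" warning and the round trip through numeric timestamps seen in the gather/spread attempt.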