R reshape wide to long, using objects for the inputs
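I have a list of data frames in wide format, with a factor variable in column 1 and yearly data from column 2 onward. I want to plot these data, which requires reshaping them to long format. Here is an example of one of the data frames: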
# SAMPLE DATA
x <- structure(list(State = structure(1:3, .Label = c("Alabama", "Alaska", "Arizona", "Arkansas"), class = "factor"), Green.And.Blue.Score.2001 = c(0L, 40L, 65L), Green.And.Blue.Score.2002 = c(20L, 5L, 60L), Green.And.Blue.Score.2003 = c(35L, 15L, 30L)), .Names = c("State", "Green.And.Blue.Score.2001", "Green.And.Blue.Score.2002", "Green.And.Blue.Score.2003"), row.names = c(NA, 3L), class = "data.frame")
x
# State Green.And.Blue.Score.2001 Green.And.Blue.Score.2002 Green.And.Blue.Score.2003
#1 Alabama 0 20 35
#2 Alaska 40 5 15
#3 Arizona 65 60 30
I usually do this with reshape(). For example, this works fine:
# RESHAPE WIDE TO LONG (MANUALLY)
y <- reshape(x,
idvar = 'State',
varying = c('Green.And.Blue.Score.2001', 'Green.And.Blue.Score.2002', 'Green.And.Blue.Score.2003'),
v.names = 'Green.And.Blue.Score.',
times = c('2001', '2002', '2003'),
direction = 'long')
y
# State time Green.And.Blue.Score.
# Alabama 2001 0
# Alaska 2001 40
# Arizona 2001 65
# Alabama 2002 20
# Alaska 2002 5
# Arizona 2002 60
# Alabama 2003 35
# Alaska 2003 15
# Arizona 2003 30
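However, I don't want to type out the idvar, varying, v.names and times arguments by hand for each of my dozens of data frames. Some of the column names are also long and complex and differ a lot from one data frame to the next, so a plain reshape() call can't parse them automatically. My idea is to write a function that pulls these inputs from the data frame itself; here is a first attempt: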
# RESHAPE WIDE TO LONG (FUNCTIONALIZED)
id <- noquote(paste("'", names(x[1]), "'", sep = ""))
va <- noquote(paste("c('", paste(names(x)[2:length(x)], collapse = "', '"), "')", sep = ""))
vn <- noquote(paste("'", sub("(\\.\\d{4})$", ".", names(x)[2]) , "'", sep = ""))
ti <- noquote(paste("c('", paste(sub(".*(\\d{4})$", "\\1", names(x[2:length(x)])), collapse = "', '"), "')", sep = ""))
The output of each of these matches the idvar, varying, v.names and times inputs in #RESHAPE WIDE TO LONG (MANUALLY) above:
id
# 'State'
va
# c('Green.And.Blue.Score.2001', 'Green.And.Blue.Score.2002', 'Green.And.Blue.Score.2003')
vn
# 'Green.And.Blue.Score.'
ti
# c('2001', '2002', '2003')
However, when I try to use these objects in reshape(), I get an error:
y <- reshape(x,
idvar = id,
varying = va,
v.names = vn,
times = ti,
direction = 'long')
Error in `[.data.frame`(data, , varying[[i]][1L]) : undefined columns selected
I'm sure my attempt to 'functionalize' reshape() is not ideal. How should I go about this?
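A quick way to see why reshape() reports undefined columns is to compare what the paste()-built object contains with what reshape() expects. A minimal check, assuming the sample x above (rebuilt with data.frame() so the snippet runs on its own; va_text is an illustrative name):

```r
# Rebuild the three-row sample data so the snippet is self-contained
x <- data.frame(
  State = factor(c("Alabama", "Alaska", "Arizona")),
  Green.And.Blue.Score.2001 = c(0L, 40L, 65L),
  Green.And.Blue.Score.2002 = c(20L, 5L, 60L),
  Green.And.Blue.Score.2003 = c(35L, 15L, 30L)
)

# Pasting together R source text gives a single string ...
va_text <- paste0("c('", paste(names(x)[-1], collapse = "', '"), "')")
print(length(va_text))          # 1: one element, not three
print(va_text %in% names(x))    # FALSE: no column has that literal name

# ... whereas reshape() wants the column names themselves
va <- names(x)[-1]
print(length(va))               # 3
print(all(va %in% names(x)))    # TRUE
```

noquote() only changes how a string is printed; the quote characters and the c( ... ) text added by paste() are still part of the value that reshape() receives.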
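Your effort to put quotes around the material extracted from the names is what caused the error: reshape() needs character vectors of column names, not strings that spell out R code. Here is a simplification of that code. Note that I removed v.names and times, because reshape() computes them automatically when the column names are cleanly separated by ".". (The output below is for the earlier version of your example, whose columns were named Score.2001, Score.2002 and Score.2003; with the current Green.And.Blue.Score.* names, splitting at every "." gives more than two pieces and reshape() cannot guess the variables.)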
y <- reshape(x,
idvar = names(x)[1],
varying = names(x)[-1],
direction = 'long')
y
#-----
State time Score
Alabama.2001 Alabama 2001 0
Alaska.2001 Alaska 2001 40
Arizona.2001 Arizona 2001 65
Alabama.2002 Alabama 2002 20
Alaska.2002 Alaska 2002 5
Arizona.2002 Arizona 2002 60
Alabama.2003 Alabama 2003 35
Alaska.2003 Alaska 2003 15
Arizona.2003 Arizona 2003 30
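For dozens of data frames, the same idea can be wrapped in a function that derives every argument from names(). The sketch below is one way to do that, not the only one; it assumes (hypothetically) that column 1 is always the id column and that every other column shares one stem followed by a four-digit year. The name wide_to_long is made up for illustration, and list(x, x) stands in for the real list of data frames.

```r
# Hypothetical helper: derive every reshape() argument from the column names.
# Assumes column 1 is the id and every other column is <stem><4-digit year>.
wide_to_long <- function(df) {
  value_cols <- names(df)[-1]
  # Drop the trailing year to get the stem, e.g. "Green.And.Blue.Score."
  stem <- sub("[0-9]{4}$", "", value_cols[1])
  # Keep only the trailing year, e.g. 2001L
  years <- as.integer(sub(".*([0-9]{4})$", "\\1", value_cols))
  reshape(df,
          idvar = names(df)[1],
          varying = value_cols,
          v.names = stem,
          timevar = "year",
          times = years,
          direction = "long")
}

x <- data.frame(
  State = factor(c("Alabama", "Alaska", "Arizona")),
  Green.And.Blue.Score.2001 = c(0L, 40L, 65L),
  Green.And.Blue.Score.2002 = c(20L, 5L, 60L),
  Green.And.Blue.Score.2003 = c(35L, 15L, 30L)
)

y <- wide_to_long(x)
print(y)

# Apply the helper to every data frame in a list
long_list <- lapply(list(x, x), wide_to_long)
```

Because v.names and times are passed explicitly, reshape() skips its name-guessing step, so the length or punctuation of the stem does not matter. Data frames that break the stem-plus-year assumption would need a different pattern.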
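If we use this on your new example, we can split at ".S", which gives a sensible result. The part of each column name before the split (Green.And.Blue.) becomes the name of the value column, the rest (Score.2001 and so on) becomes the time value, and the state name and the time value are pasted together to form the row names: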
y <- reshape(x,
idvar = names(x)[1],
varying = names(x)[-1],
split = list(regexp = "\\.S", include = TRUE),
direction = 'long')
y
State time Green.And.Blue.
Alabama.Score.2001 Alabama Score.2001 0
Alaska.Score.2001 Alaska Score.2001 40
Arizona.Score.2001 Arizona Score.2001 65
Alabama.Score.2002 Alabama Score.2002 20
Alaska.Score.2002 Alaska Score.2002 5
Arizona.Score.2002 Arizona Score.2002 60
Alabama.Score.2003 Alabama Score.2003 35
Alaska.Score.2003 Alaska Score.2003 15
Arizona.Score.2003 Arizona Score.2003 30
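Because the split leaves time values such as Score.2001 as text, a plotting function would treat them as categories rather than years. One way to recover a numeric year, assuming every time label ends in four digits (the year column name is an illustrative choice):

```r
x <- data.frame(
  State = factor(c("Alabama", "Alaska", "Arizona")),
  Green.And.Blue.Score.2001 = c(0L, 40L, 65L),
  Green.And.Blue.Score.2002 = c(20L, 5L, 60L),
  Green.And.Blue.Score.2003 = c(35L, 15L, 30L)
)

# Same call as above: split each name at ".S", keeping the "S" in the time part
y <- reshape(x,
             idvar = names(x)[1],
             varying = names(x)[-1],
             split = list(regexp = "\\.S", include = TRUE),
             direction = "long")

# time holds labels such as "Score.2001"; keep the trailing four digits as a number
y$year <- as.integer(sub(".*([0-9]{4})$", "\\1", y$time))
print(y)
```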