Convert rows to columns in R
My data frame:
> head(scotland_weather)
JAN Year.1 FEB Year.2 MAR Year.3 APR Year.4 MAY Year.5 JUN Year.6 JUL Year.7 AUG Year.8 SEP Year.9 OCT Year.10
1 293.8 1993 278.1 1993 238.5 1993 191.1 1947 191.4 2011 155.0 1938 185.6 1940 216.5 1985 267.6 1950 258.1 1935
2 292.2 1928 258.8 1997 233.4 1990 149.0 1910 168.7 1986 137.9 2002 181.4 1988 211.9 1992 221.2 1981 254.0 1954
3 275.6 2008 244.7 2002 201.3 1992 146.8 1934 155.9 1925 137.8 1948 170.1 1939 202.3 2009 193.9 1982 248.8 2014
4 252.3 2015 227.9 1989 200.2 1967 142.1 1949 149.5 2015 137.7 1931 165.8 2010 191.4 1962 189.7 2011 247.7 1938
5 246.2 1974 224.9 2014 180.2 1979 133.5 1950 137.4 2003 135.0 1966 162.9 1956 190.3 2014 189.7 1927 242.3 1983
6 245.0 1975 195.6 1995 180.0 1989 132.9 1932 129.7 2007 131.7 2004 159.9 1985 189.1 2004 189.6 1985 240.9 2001
NOV Year.11 DEC Year.12 WIN Year.13 SPR Year.14 SUM Year.15 AUT Year.16 ANN Year.17
1 262.0 2009 300.7 2013 743.6 2014 409.5 1986 455.6 1985 661.2 1981 1886.4 2011
2 244.8 1938 268.5 1986 649.5 1995 401.3 2015 435.6 1948 633.8 1954 1828.1 1990
3 242.2 2006 267.2 1929 645.4 2000 393.7 1994 427.8 2009 615.8 1938 1756.8 2014
4 231.3 1917 265.4 2011 638.3 2007 393.2 1967 422.6 1956 594.5 1935 1735.8 1938
5 229.9 1981 264.0 2006 608.9 1990 391.7 1992 397.0 2004 590.6 1982 1720.0 2008
6 224.9 1951 261.0 1912 592.8 2015 389.1 1913 390.1 1938 589.2 2006 1716.5 1954
The Year.X columns are not ordered. I would like to convert the data to the following format:
month year rainfall_mm
Jan 1993 293.8
Feb 1993 278.1
Mar 1993 238.5
...
Nov 2015 230.0
I tried t(), but it keeps the year columns separate.
I also tried recast(data, formula, ..., id.var, measure.var) from reshape2, but something is missing, since the month columns are numeric and the Year.X columns are int:
> str(scotland_weather)
'data.frame': 106 obs. of 34 variables:
$ JAN : num 294 292 276 252 246 ...
$ Year.1 : int 1993 1928 2008 2015 1974 1975 2005 2007 1990 1983 ...
$ FEB : num 278 259 245 228 225 ...
$ Year.2 : int 1990 1997 2002 1989 2014 1995 1998 2000 1920 1918 ...
$ MAR : num 238 233 201 200 180 ...
$ Year.3 : int 1994 1990 1992 1967 1979 1989 1921 1913 2015 1978 ...
$ APR : num 191 149 147 142 134 ...
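Given the alternating pattern of the 'Year.X' columns in 'scotland_weather', one approach is to subset every other column by recycling the logical index c(TRUE, FALSE), which selects the same columns as seq(1, ncol(scotland_weather), by = 2). Using c(FALSE, TRUE) instead selects the same columns as seq(2, ncol(scotland_weather), by = 2). With each subset extracted, we take the transpose (t) and concatenate (c) the result into a vector, so the values are read row by row. The next step is to extract the column names that are not 'Year' columns, for which grepl can be used. Finally, data.frame binds the vectors into a single data.frame; the month names are recycled to the length of the other two columns.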
res <- data.frame(
  month = names(scotland_weather)[!grepl('Year', names(scotland_weather))],
  year = c(t(scotland_weather[c(FALSE, TRUE)])),
  rainfall_mm = c(t(scotland_weather[c(TRUE, FALSE)])))
head(res,4)
# month year rainfall_mm
#1 JAN 1993 293.8
#2 FEB 1993 278.1
#3 MAR 1993 238.5
#4 APR 1947 191.1
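To make the recycling and the c(t(...)) step easier to follow, here is a minimal sketch on a made-up two-row data frame. The name `toy` and its values are assumptions for illustration only, not part of the original data.

```r
# Toy data with the same alternating layout (values are invented)
toy <- data.frame(JAN = c(10.5, 20.5), Year.1 = c(2001L, 2002L),
                  FEB = c(30.5, 40.5), Year.2 = c(2003L, 2004L))

# A logical index shorter than the number of columns is recycled, so
# c(TRUE, FALSE) keeps columns 1, 3, ... and c(FALSE, TRUE) keeps 2, 4, ...
value_cols <- names(toy[c(TRUE, FALSE)])
year_cols  <- names(toy[c(FALSE, TRUE)])
stopifnot(identical(value_cols, names(toy)[seq(1, ncol(toy), by = 2)]))
stopifnot(identical(year_cols,  names(toy)[seq(2, ncol(toy), by = 2)]))

# t() gives a matrix with one column per original row; c() then reads
# that matrix column by column, i.e. row by row of the original data
rain  <- c(t(toy[c(TRUE, FALSE)]))   # 10.5 30.5 20.5 40.5
years <- c(t(toy[c(FALSE, TRUE)]))   # 2001 2003 2002 2004

# The two month names are recycled to match the four values
res_toy <- data.frame(month = value_cols, year = years, rainfall_mm = rain)
print(res_toy)
```

Because the values come out row by row, the month names cycle JAN, FEB, JAN, FEB, which is why recycling the short name vector lines up correctly.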
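The problem is not only that you need to reshape the data: the year for the first column is in the second column, the year for the third column is in the fourth, and so on...
Here is a solution using tidyr.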
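library(tidyr)
library(dplyr)  # provides %>%, filter() and select()
# TRUE when the year column sits immediately after the month column
is_pair <- function(month, yearname) {
  cols <- names(scotland_weather)
  match(yearname, cols) - match(month, cols) == 1
}
years <- grep("Year", names(scotland_weather))
scotland_weather %>%
  gather("month", "rainfall_mm", -all_of(years)) %>%
  gather("yearname", "year", -c(month, rainfall_mm)) %>%
  filter(is_pair(month, yearname)) %>%
  select(month, year, rainfall_mm)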
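Since each year column sits immediately after its month column, the pairing can also be done by position in base R, without gathering twice and filtering. This is only a sketch of an alternative, not the approach used above. The `toy` data frame and its values are invented for illustration.

```r
# Toy data with the same alternating layout (values are invented)
toy <- data.frame(JAN = c(10.5, 20.5), Year.1 = c(2001L, 2002L),
                  FEB = c(30.5, 40.5), Year.2 = c(2003L, 2004L))

value_idx <- seq(1, ncol(toy), by = 2)   # positions of the month columns

# Build one small data frame per (month, year) pair, then stack them
pieces <- lapply(value_idx, function(i) {
  data.frame(month = names(toy)[i],
             year = toy[[i + 1]],        # the year column follows its month
             rainfall_mm = toy[[i]])
})
long_toy <- do.call(rbind, pieces)
print(long_toy)
```

Note that this version orders the result month by month (all JAN rows, then all FEB rows) rather than row by row, so the rows come out in a different order than with c(t(...)).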