Rearrange dataframe using dcast using a dummy
I want to reshape a data frame using the dcast function (reshape2 package), but it does not work. In my example:
library(reshape2) # provides dcast
# Data set
X<-c(804519.4,804519.6,804519.6,804519.4,804519.4,804519.4,804519.6,804519.6,804519.4,804519.4)
Y<-c(7673833,7673833,7673833,7673833,7673833,7673833,7673833,7673833,7673833,7673833)
band<-c("band1","band1","band1","band1","band1","band2","band2","band2","band2","band2") # My original data set has 31 bands
reflec<-c(9.608848,10.504454,8.648237,9.935091,11.282750,9.608848,10.504454,8.648237,9.935091,11.282750)
dummy<-1:10
RES3<-data.frame(X,Y,band,reflec,dummy)
RES3
X Y band reflec dummy
1 804519.4 7673833 band1 9.608848 1
2 804519.6 7673833 band1 10.504454 2
3 804519.6 7673833 band1 8.648237 3
4 804519.4 7673833 band1 9.935091 4
5 804519.4 7673833 band1 11.282750 5
6 804519.4 7673833 band2 9.608848 6
7 804519.6 7673833 band2 10.504454 7
8 804519.6 7673833 band2 8.648237 8
9 804519.4 7673833 band2 9.935091 9
10 804519.4 7673833 band2 11.282750 10
RES3<-as.data.frame(RES3)
colnames(RES3)<-c("X","Y","band","reflec","dummy")
dcast(RES3, X + Y + dummy ~ band,
fun.aggregate = length,
value.var="reflec")
It does not work; my output is:
X Y dummy band1 band2
1 804519.4 7673833 1 1 0
2 804519.4 7673833 4 1 0
3 804519.4 7673833 5 1 0
4 804519.4 7673833 6 0 1
5 804519.4 7673833 9 0 1
6 804519.4 7673833 10 0 1
7 804519.6 7673833 2 1 0
8 804519.6 7673833 3 1 0
9 804519.6 7673833 7 0 1
10 804519.6 7673833 8 0 1
I expected:
X Y band1 band2
1 804519.4 7673833 9.608848 9.608848
2 804519.6 7673833 10.504454 10.504454
3 804519.6 7673833 8.648237 8.648237
4 804519.4 7673833 9.935091 9.935091
5 804519.4 7673833 11.282750 11.282750
Can anyone help me? My original data set has 31 bands as levels, and I want to convert them into columns. Thanks!
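value.var should be a string specifying the column name. According to ?dcast:
value.var - name of column which stores values
The column name has to be given as a string. When it is specified without quotes, dcast searches for the column name among the values instead.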
dcast(RES3, X + Y + dummy ~ band,
fun.aggregate = length,
value.var="reflec")
RES4<-dcast(RES3, ... ~ band,
value.var="reflec")
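To make the cause concrete: with dummy on the left-hand side of the formula, every row is its own group, so fun.aggregate = length can only return 0 or 1. With dummy dropped, the X/Y pairs repeat within a band, so dcast has to aggregate (and falls back to length when no function is given). Below is a minimal sketch of one way around this with reshape2, assuming rows should be paired by their order within each band. The row helper column and the use of ave() are illustrative choices, not part of the original posts.

```r
library(reshape2)

# Data from the question, without the dummy column
X <- c(804519.4, 804519.6, 804519.6, 804519.4, 804519.4,
       804519.4, 804519.6, 804519.6, 804519.4, 804519.4)
Y <- rep(7673833, 10)
band <- rep(c("band1", "band2"), each = 5)
reflec <- rep(c(9.608848, 10.504454, 8.648237, 9.935091, 11.282750), 2)
RES3 <- data.frame(X, Y, band, reflec)

# Hypothetical helper: position of each row within its band (1..5 per band)
RES3$row <- ave(seq_along(RES3$band), RES3$band, FUN = seq_along)

# row + X + Y now identifies each output row uniquely, so no aggregation is needed
RES4 <- dcast(RES3, row + X + Y ~ band, value.var = "reflec")
RES4$row <- NULL # drop the helper once the reshape is done
RES4
```

Because every combination of row, X and Y occurs once per band, each cell receives exactly one reflec value. With 31 bands the same call should produce 31 value columns, provided every band has the same number of rows in the same order.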
As I mentioned, reshape2 has been deprecated in favor of tidyr within the tidyverse packages. In my opinion (and that of the packages' authors), tidyr's spread and gather are a bit clearer than reshape2's cast and melt: no formula notation, cleaner ways to specify values. (Some context for that is here.)
Also, as I mentioned, you have a couple of extra lines: data.frame will add column names based on the names of the vectors that go into it.
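A quick check of that point, as a minimal base R sketch using the question's vectors (the before name is only for illustration): data.frame() already takes its column names from the vectors, so the as.data.frame() and colnames() calls from the question leave the names unchanged.

```r
# Data from the question
X <- c(804519.4, 804519.6, 804519.6, 804519.4, 804519.4,
       804519.4, 804519.6, 804519.6, 804519.4, 804519.4)
Y <- rep(7673833, 10)
band <- rep(c("band1", "band2"), each = 5)
reflec <- rep(c(9.608848, 10.504454, 8.648237, 9.935091, 11.282750), 2)
dummy <- 1:10

# Column names come from the vector names
RES3 <- data.frame(X, Y, band, reflec, dummy)
before <- RES3

# The two extra lines from the question
RES3 <- as.data.frame(RES3)
colnames(RES3) <- c("X", "Y", "band", "reflec", "dummy")

names(before)
identical(names(before), names(RES3))
```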
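I updated this answer to match the new data you posted. My original solution worked for your original data, but your new data needs a few more steps, for which I'm using dplyr functions.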
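At this point I don't fully understand the dummy column, since it isn't in your expected output. I drop it with dplyr::select(-dummy). One tricky thing about tidyr::spread is that you need some way to uniquely identify rows. That is annoying, but it also prevents mistakes when reshaping data. So I group by band and then add row numbers like this: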
library(tidyr)
library(dplyr)
res3 <- data.frame(X, Y, band, reflec, dummy)
res3 %>%
select(-dummy) %>%
group_by(band) %>%
mutate(row = row_number())
#> # A tibble: 10 x 5
#> # Groups: band [2]
#> X Y band reflec row
#> <dbl> <dbl> <fct> <dbl> <int>
#> 1 804519. 7673833 band1 9.61 1
#> 2 804520. 7673833 band1 10.5 2
#> 3 804520. 7673833 band1 8.65 3
#> 4 804519. 7673833 band1 9.94 4
#> 5 804519. 7673833 band1 11.3 5
#> 6 804519. 7673833 band2 9.61 1
#> 7 804520. 7673833 band2 10.5 2
#> 8 804520. 7673833 band2 8.65 3
#> 9 804519. 7673833 band2 9.94 4
#> 10 804519. 7673833 band2 11.3 5
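This way, row 1 will have a band 1 value and a band 2 value, and so on. Then I call spread with band as the key that becomes the columns and reflec as the values that fill those columns, and finally drop the row number column.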
res3 %>%
select(-dummy) %>%
group_by(band) %>%
mutate(row = row_number()) %>%
spread(key = band, value = reflec) %>%
select(-row)
#> # A tibble: 5 x 4
#> X Y band1 band2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 804519. 7673833 9.61 9.61
#> 2 804519. 7673833 9.94 9.94
#> 3 804519. 7673833 11.3 11.3
#> 4 804520. 7673833 10.5 10.5
#> 5 804520. 7673833 8.65 8.65
Created on 2019-01-28 by the reprex package (v0.2.1)
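For readers on tidyr 1.0.0 or later, spread() has been superseded by pivot_wider(). Here is a minimal sketch of the same reshape with that function, assuming the question's vectors and the same per-band row index as above. The wide name is illustrative, and the explicit ungroup() is a cautious addition rather than something the original answer used.

```r
library(dplyr)
library(tidyr)

# Data from the question
X <- c(804519.4, 804519.6, 804519.6, 804519.4, 804519.4,
       804519.4, 804519.6, 804519.6, 804519.4, 804519.4)
Y <- rep(7673833, 10)
band <- rep(c("band1", "band2"), each = 5)
reflec <- rep(c(9.608848, 10.504454, 8.648237, 9.935091, 11.282750), 2)
dummy <- 1:10
res3 <- data.frame(X, Y, band, reflec, dummy)

wide <- res3 %>%
  select(-dummy) %>%                 # dummy is not in the expected output
  group_by(band) %>%
  mutate(row = row_number()) %>%     # position of each row within its band
  ungroup() %>%
  pivot_wider(names_from = band, values_from = reflec) %>%
  select(-row)                       # drop the helper index
wide
```

Here names_from plays the role of spread's key and values_from the role of its value.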