Report datasets with different row names
I have two datasets, say df1 and df2, that differ only in their row names. How can I print them side by side efficiently? Many thanks.
df1 <- mtcars[1:6, 1:3]; rownames(df1)
df2 <- df1; rownames(df2) <- c("Mazda RX4","Mazda RX4 Wag","Datsun 710","Hornet 4 Drive",
"Hornet Sportabout","NEW.NAME"); rownames(df2)
df3 <- cbind(df1,df2); df3
Expected result:
                   mpg cyl disp  mpg cyl disp
Mazda RX4         21.0   6  160 21.0   6  160
Mazda RX4 Wag     21.0   6  160 21.0   6  160
Datsun 710        22.8   4  108 22.8   4  108
Hornet 4 Drive    21.4   6  258 21.4   6  258
Hornet Sportabout 18.7   8  360 18.7   8  360
Valiant           18.1   6  225    \   \    \
NEW.NAME             \   \    \ 18.1   6  225
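I'm not a big fan of row names (I would even call them bad practice, or evil).
With data.table there is an easy way to extract the row names into a new column.
In your case, I would go with: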
library(data.table)
library(hablar)
setDT(df1, keep.rownames = TRUE)
setDT(df2, keep.rownames = TRUE)
# Bind and keep unique rows
df3 <- unique(rbind(df1, df2))
df3
#>                   rn  mpg cyl disp
#> 1:         Mazda RX4 21.0   6  160
#> 2:     Mazda RX4 Wag 21.0   6  160
#> 3:        Datsun 710 22.8   4  108
#> 4:    Hornet 4 Drive 21.4   6  258
#> 5: Hornet Sportabout 18.7   8  360
#> 6:           Valiant 18.1   6  225
#> 7:          NEW.NAME 18.1   6  225
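One thing worth knowing about the step above: setDT() converts its argument in place, so df1 and df2 stop being plain data frames. If you would rather leave the inputs untouched, a minimal sketch using as.data.table() could look like this. The names `cars_a` and `dt_a` are made up for this illustration so that the df1/df2 used in the answer are not overwritten.

```r
library(data.table)

# Hypothetical input, built the same way as df1 in the question
cars_a <- mtcars[1:6, 1:3]

# as.data.table() returns a new object; keep.rownames = "rn" copies the
# row names into a regular column named "rn"
dt_a <- as.data.table(cars_a, keep.rownames = "rn")

is.data.table(cars_a)  # the original is still a plain data.frame
names(dt_a)            # the row names now live in the first column
```

The rest of the answer works the same on objects created this way, since only the "rn" column matters for the binding step.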
If you want to keep track of which dataset each value came from, I would do this:
# Suffix the value columns with the name of their source data frame
old <- setdiff(names(df1), "rn")
new <- paste0(old, "_df1")
setnames(df1, old, new)
new <- paste0(old, "_df2")
setnames(df2, old, new)
# Column names now differ, so fill = TRUE pads the missing ones with NA
df3 <- unique(rbind(df1, df2, fill = TRUE))
# hablar::sum_ collapses each group to one row and keeps NA when a group is all NA
df3 <-
df3[, lapply(lapply(.SD, hablar::sum_), as.numeric), by = "rn"]
df3
#>                   rn mpg_df1 cyl_df1 disp_df1 mpg_df2 cyl_df2 disp_df2
#> 1:         Mazda RX4    21.0       6      160    21.0       6      160
#> 2:     Mazda RX4 Wag    21.0       6      160    21.0       6      160
#> 3:        Datsun 710    22.8       4      108    22.8       4      108
#> 4:    Hornet 4 Drive    21.4       6      258    21.4       6      258
#> 5: Hornet Sportabout    18.7       8      360    18.7       8      360
#> 6:           Valiant    18.1       6      225      NA      NA       NA
#> 7:          NEW.NAME      NA      NA       NA    18.1       6      225
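As an alternative to the approach above (not part of the original answer), the same side-by-side layout can probably be reached with a full outer join on the row-name column, which avoids the extra hablar dependency. This is only a sketch: the names `a`, `b`, and `wide` are made up here, and the inputs are rebuilt from mtcars so the block runs on its own.

```r
library(data.table)

# Rebuild the two example inputs from the question (hypothetical names a, b)
a <- mtcars[1:6, 1:3]
b <- a
rownames(b)[6] <- "NEW.NAME"

# Move the row names into a regular "rn" column without touching a and b
dt_a <- as.data.table(a, keep.rownames = "rn")
dt_b <- as.data.table(b, keep.rownames = "rn")

# Full outer join on "rn": rows present in only one input get NA on the
# other side, and suffixes mark which input each column came from
wide <- merge(dt_a, dt_b, by = "rn", all = TRUE,
              suffixes = c("_df1", "_df2"))
print(wide)
```

Note that merge() sorts the result by "rn" by default, so the row order will differ from the output shown above; pass sort = FALSE if that matters.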