R - Combine non-numeric and numeric data in the same cell of a data frame
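I want to combine numeric and character data in the same cell.
Specifically, I'm trying to combine each named column with its 'SEM' column, with a ± symbol in between, so that I end up with a table I can publish with LaTeX.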
# A tibble: 4 x 10
Variety n Probes Probes SEM Walks Walks SEM Cleans Cleans SEM Off_Leaf Off_SEM
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 10LB mean 41 1.40 0.140 0.710 0.170 0.460 0.140 0.120 0.0520
2 3LB mean 48 1.50 0.130 0.880 0.170 0.310 0.0900 0.190 0.0710
3 4LB mean 43 1.80 0.160 1.10 0.190 0.370 0.120 0.280 0.190
4 RB mean 44 2.80 0.390 1.50 0.260 0.180 0.0750 0.0910 0.0440
Is there a way to turn the table above into something like this:
# A tibble: 4 x 6
Variety n Probes Walks Cleans Off Leaf
<chr> <int> <chr> <chr> <chr> <chr>
1 10LB 41 1.4 ± 0.1 0.7 ± 0.2 0.5 ± 0.1 0.1 ± 0.05
2 3LB 48 1.5 ± 0.1 0.9 ± 0.2 0.3 ± 0.09 0.2 ± 0.07
3 4LB 43 1.8 ± 0.2 1.1 ± 0.2 0.4 ± 0.1 0.3 ± 0.2
4 RB 44 2.8 ± 0.4 1.5 ± 0.3 0.2 ± 0.07 0.09 ± 0.04
while staying in R?
Using the following dataset.
ds <- tibble::tribble(
~Variety, ~n, ~Probes, ~`Probes SEM`, ~`Walks`, ~`Walks SEM`, ~`Cleans`, ~`Cleans SEM`, ~`Off_Leaf`, ~`Off_SEM`,
"10LB mean" , 41L, 1.40, 0.140, 0.710, 0.170, 0.460, 0.140 , 0.120 , 0.0520,
"3LB mean" , 48L, 1.50, 0.130, 0.880, 0.170, 0.310, 0.0900 , 0.190 , 0.0710,
"4LB mean" , 43L, 1.80, 0.160, 1.10 , 0.190, 0.370, 0.120 , 0.280 , 0.190 ,
"RB mean" , 44L, 2.80, 0.390, 1.50 , 0.260, 0.180, 0.0750 , 0.0910, 0.0440
)
Producing the plus/minus sign through its Unicode escape (\u00B1) should be portable across files with different encodings.
library(magrittr)
ds %>%
dplyr::mutate(
Probes = sprintf("%2.1f \u00B1 %3.3f", .data$Probes , .data$`Probes SEM`),
Walks = sprintf("%3.2f \u00B1 %3.3f", .data$Walks , .data$`Walks SEM` ),
Cleans = sprintf("%3.2f \u00B1 %3.3f", .data$Cleans , .data$`Cleans SEM`),
Off_Leaf = sprintf("%3.2f \u00B1 %2.2f", .data$Off_Leaf, .data$Off_SEM )
) %>%
dplyr::select(
-`Probes SEM`, -`Walks SEM`, -`Cleans SEM`, -Off_SEM
)
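Although it isn't part of your question, I recommend using something like sprintf() so that every element in a column has the same number of digits. The zero padding looks better, and it takes some of the burden of aligning the column off LaTeX.
Output: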
# A tibble: 4 x 6
Variety n Probes Walks Cleans Off_Leaf
<chr> <int> <chr> <chr> <chr> <chr>
1 10LB mean 41 1.4 ± 0.140 0.71 ± 0.170 0.46 ± 0.140 0.12 ± 0.05
2 3LB mean 48 1.5 ± 0.130 0.88 ± 0.170 0.31 ± 0.090 0.19 ± 0.07
3 4LB mean 43 1.8 ± 0.160 1.10 ± 0.190 0.37 ± 0.120 0.28 ± 0.19
4 RB mean 44 2.8 ± 0.390 1.50 ± 0.260 0.18 ± 0.075 0.09 ± 0.04
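To make the point about consistent digits concrete, here is a minimal sketch comparing paste() with sprintf() on the Walks values from the question. The variable names (means, sems, pasted, padded) are only illustrative.

```r
# Walks means and SEMs taken from the question's data
means <- c(0.71, 0.88, 1.1, 1.5)
sems  <- c(0.17, 0.17, 0.19, 0.26)

# paste() drops trailing zeros, so the cells end up with different widths
pasted <- paste(means, sems, sep = " \u00B1 ")

# sprintf() with a fixed precision pads with zeros, so every cell has the same width
padded <- sprintf("%.2f \u00B1 %.2f", means, sems)

nchar(pasted)  # widths differ between rows
nchar(padded)  # widths are identical
```

With a fixed precision, 1.1 is written as 1.10, which is what keeps the column aligned.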
Also be aware of the LaTeX \pm command: a table-formatting package like kableExtra or xtable might handle Unicode differently, but it lets you pass an escaped \pm through instead.
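As a rough sketch of the \pm route, the cells can be built with the LaTeX command in place of the Unicode character. This example uses base R only and writes the table rows by hand. If you pass such strings to a table package, you would need to turn off its escaping of special characters (for example escape = FALSE in knitr::kable(), or a custom sanitize.text.function in print.xtable()). Whether that suits your setup is an assumption to check against the package documentation.

```r
# Probes means and SEMs taken from the question's data
variety <- c("10LB mean", "3LB mean", "4LB mean", "RB mean")
means   <- c(1.4, 1.5, 1.8, 2.8)
sems    <- c(0.14, 0.13, 0.16, 0.39)

# "$\\pm$" in R source becomes the literal text $\pm$ in the resulting string
cells <- sprintf("%.1f $\\pm$ %.2f", means, sems)

# Assemble LaTeX tabular rows: columns separated by &, rows ended by \\
rows <- paste0(variety, " & ", cells, " \\\\")
writeLines(rows)
```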
We can do this with data.table: melt into 'long' format by specifying the measure argument with patterns, then paste the paired values together by group.
library(data.table)
melt(setDT(df1), measure = patterns("Probes", "Walks", "Cleans", "Off"),
value.name = c("Probes", "Walks", "Cleans", "Off"))[,
lapply(.SD, function(x) paste(round(x[variable == 1], 1),
round(x[variable ==2], 2), sep=" ± ")) ,
by = .(Variety, n), .SDcols = Probes:Off]
# Variety n Probes Walks Cleans Off
#1: 10LB mean 41 1.4 ± 0.14 0.7 ± 0.17 0.5 ± 0.14 0.1 ± 0.05
#2: 3LB mean 48 1.5 ± 0.13 0.9 ± 0.17 0.3 ± 0.09 0.2 ± 0.07
#3: 4LB mean 43 1.8 ± 0.16 1.1 ± 0.19 0.4 ± 0.12 0.3 ± 0.19
#4: RB mean 44 2.8 ± 0.39 1.5 ± 0.26 0.2 ± 0.08 0.1 ± 0.04
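With the tidyverse, a similar approach (which needs no patterns, because the values are all of numeric type) is to gather into 'long' format, paste the values within each group, and then spread back to 'wide' format.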
library(tidyverse)
df1 %>%
gather(key, val, Probes:Off_SEM) %>%
separate(key, into = c('key1', 'key2')) %>%
group_by(Variety, n, key1) %>%
summarise(val = paste(first(val), last(val), sep= " ± ")) %>%
spread(key1, val)
# A tibble: 4 x 6
# Groups: Variety, n [4]
# Variety n Cleans Off Probes Walks
#* <chr> <int> <chr> <chr> <chr> <chr>
#1 10LB mean 41 0.46 ± 0.14 0.12 ± 0.052 1.4 ± 0.14 0.71 ± 0.17
#2 3LB mean 48 0.31 ± 0.09 0.19 ± 0.071 1.5 ± 0.13 0.88 ± 0.17
#3 4LB mean 43 0.37 ± 0.12 0.28 ± 0.19 1.8 ± 0.16 1.1 ± 0.19
#4 RB mean 44 0.18 ± 0.075 0.091 ± 0.044 2.8 ± 0.39 1.5 ± 0.26
Data
df1 <- structure(list(Variety = c("10LB mean", "3LB mean", "4LB mean",
"RB mean"), n = c(41L, 48L, 43L, 44L), Probes = c(1.4, 1.5, 1.8,
2.8), Probes_SEM = c(0.14, 0.13, 0.16, 0.39), Walks = c(0.71,
0.88, 1.1, 1.5), Walks_SEM = c(0.17, 0.17, 0.19, 0.26), Cleans = c(0.46,
0.31, 0.37, 0.18), Cleans_SEM = c(0.14, 0.09, 0.12, 0.075), Off_Leaf = c(0.12,
0.19, 0.28, 0.091), Off_SEM = c(0.052, 0.071, 0.19, 0.044)), .Names = c("Variety",
"n", "Probes", "Probes_SEM", "Walks", "Walks_SEM", "Cleans",
"Cleans_SEM", "Off_Leaf", "Off_SEM"), class = "data.frame", row.names = c("1",
"2", "3", "4"))
We can also use base R:
df2 <- df1  # work on a copy so the original is left untouched
u <- grep("SEM", names(df1))  # positions of the SEM columns
df2[, u] <- round(df2[, u], 1)  # round the SEM columns
# paste each row into one string, then join every mean/SEM pair with ±
m <- gsub("(\\d+\\S+)\\s(\\d+\\S+)?", "\\1±\\2", do.call(paste, c(df2[-(1:2)])))
cbind(df2[1:2], read.table(text = m))  # read the pairs back as columns and bind them to df2[1:2]
Variety n V1 V2 V3 V4
1 10LB mean 41 1.4±0.1 0.71±0.2 0.46±0.1 0.12±0.1
2 3LB mean 48 1.5±0.1 0.88±0.2 0.31±0.1 0.19±0.1
3 4LB mean 43 1.8±0.2 1.1±0.2 0.37±0.1 0.28±0.2
4 RB mean 44 2.8±0.4 1.5±0.3 0.18±0.1 0.091±0
You can also set the column names.
setNames(cbind(df2[1:2],read.table(text=m)),names(df1[-u]))
Variety n Probes Walks Cleans Off_Leaf
1 10LB mean 41 1.4±0.1 0.71±0.2 0.46±0.1 0.12±0.1
2 3LB mean 48 1.5±0.1 0.88±0.2 0.31±0.1 0.19±0.1
3 4LB mean 43 1.8±0.2 1.1±0.2 0.37±0.1 0.28±0.2
4 RB mean 44 2.8±0.4 1.5±0.3 0.18±0.1 0.091±0
If you don't round, you may also want spaces around the symbol:
u=grep("SEM",names(df1))
m=gsub("(\\d+[.]\\d+):(\\d+[.]\\d+)","\\1 ± \\2",do.call(paste,c(df1[-(1:2)],sep=":")))
setNames(cbind(df1[1:2],read.table(text=m,sep=":")),names(df1[-u]))
Variety n Probes Walks Cleans Off_Leaf
1 10LB mean 41 1.4 ± 0.14 0.71 ± 0.17 0.46 ± 0.14 0.12 ± 0.052
2 3LB mean 48 1.5 ± 0.13 0.88 ± 0.17 0.31 ± 0.09 0.19 ± 0.071
3 4LB mean 43 1.8 ± 0.16 1.1 ± 0.19 0.37 ± 0.12 0.28 ± 0.19
4 RB mean 44 2.8 ± 0.39 1.5 ± 0.26 0.18 ± 0.075 0.091 ± 0.044
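As an alternative to the regex round trip, here is a sketch of a base R version that pairs the columns directly. It assumes each SEM column sits immediately after its mean column, as in df1. The names sem_cols, mean_cols, combined and result are only illustrative.

```r
# Same values as df1 above, rebuilt here so the example is self-contained
df1 <- data.frame(
  Variety = c("10LB mean", "3LB mean", "4LB mean", "RB mean"),
  n = c(41L, 48L, 43L, 44L),
  Probes = c(1.4, 1.5, 1.8, 2.8),       Probes_SEM = c(0.14, 0.13, 0.16, 0.39),
  Walks = c(0.71, 0.88, 1.1, 1.5),      Walks_SEM = c(0.17, 0.17, 0.19, 0.26),
  Cleans = c(0.46, 0.31, 0.37, 0.18),   Cleans_SEM = c(0.14, 0.09, 0.12, 0.075),
  Off_Leaf = c(0.12, 0.19, 0.28, 0.091), Off_SEM = c(0.052, 0.071, 0.19, 0.044),
  stringsAsFactors = FALSE
)

sem_cols  <- grep("SEM", names(df1))  # positions of the SEM columns
mean_cols <- sem_cols - 1             # assumption: each mean directly precedes its SEM

# Format each mean/SEM pair column by column; names come from the mean columns
combined <- Map(function(m, s) sprintf("%.2f \u00B1 %.2f", m, s),
                df1[mean_cols], df1[sem_cols])

result <- cbind(df1[1:2], as.data.frame(combined, stringsAsFactors = FALSE))
result
```

Because the pairing is done by position rather than by parsing text, the numbers never pass through read.table(), and the precision is set in one place in the sprintf() format.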