How to get a data frame output using mapply
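I have a for loop that I am trying to convert to mapply, because I read that it is faster than for (the loop takes about 2 minutes).
The loop does this: for each unique value in the column "OrdenFab" it creates a subset filtered to that value, then keeps only the rows whose value in the column "Valor" is not a duplicate.
It then appends that filtered subset to a new data frame and keeps appending as the loop proceeds, so the result is a filtered data frame with no repeated "Valor" values within each unique value of "OrdenFab".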
i <- unique(datapesomolde$OrdenFab)
datapesomoldefiltered <- data.frame()
for (j in i) {
  datapesomoldetemp <- datapesomolde %>%
    filter(OrdenFab == j) %>%
    filter(!duplicated(Valor))
  datapesomoldefiltered <- rbind(datapesomoldefiltered, datapesomoldetemp)
}
The original data frame is this (first 20 rows; it has 20626 in total):
> datapesomolde
PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
1 11012501 226549204 14.50000 2022-04-25 07:18:00 12.65 14.71 13.68
2 11012501 226549204 14.50000 2022-04-25 07:18:00 12.65 14.71 13.68
3 11013610 226548648 47.30000 2022-04-25 05:52:00 42.38 49.26 45.82
4 11013047 226548234 15.20000 2022-04-23 02:47:00 14.43 16.77 15.60
5 11013047 226548234 15.20000 2022-04-23 02:47:00 14.43 16.77 15.60
6 11013047 226548234 15.20000 2022-04-23 02:48:00 14.43 16.77 15.60
7 11013047 226548234 15.20000 2022-04-23 02:48:00 14.43 16.77 15.60
8 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
9 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
10 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
11 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
12 11012501 226548204 14.70000 2022-04-23 01:44:00 12.65 14.71 13.68
13 11012501 226548204 14.70000 2022-04-23 01:44:00 12.65 14.71 13.68
14 11012501 226548200 14.55000 2022-04-23 01:43:00 12.65 14.71 13.68
15 11012501 226548200 14.55000 2022-04-23 01:43:00 12.65 14.71 13.68
16 11012501 226548201 14.65000 2022-04-23 01:42:00 12.65 14.71 13.68
17 11012501 226548201 14.65000 2022-04-23 01:42:00 12.65 14.71 13.68
18 11013943 226548154 134.00000 2022-04-23 00:07:00 131.76 153.13 142.44
19 11013943 226547066 144.00000 2022-04-22 23:31:00 131.76 153.13 142.44
20 11013050 226547200 15.10000 2022-04-22 23:27:00 14.34 16.66 15.50
The filtered result is this (first 10 rows):
> datapesomoldefiltered
PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
1 11012501 226549204 14.50000 2022-04-25 07:18:00 12.65 14.71 13.68
2 11013610 226548648 47.30000 2022-04-25 05:52:00 42.38 49.26 45.82
3 11013047 226548234 15.20000 2022-04-23 02:47:00 14.43 16.77 15.60
4 11013052 226548332 16.30000 2022-04-23 01:49:00 15.63 18.17 16.90
5 11012501 226548204 14.70000 2022-04-23 01:44:00 12.65 14.71 13.68
6 11012501 226548200 14.55000 2022-04-23 01:43:00 12.65 14.71 13.68
7 11012501 226548201 14.65000 2022-04-23 01:42:00 12.65 14.71 13.68
8 11013943 226548154 134.00000 2022-04-23 00:07:00 131.76 153.13 142.44
9 11013943 226547066 144.00000 2022-04-22 23:31:00 131.76 153.13 142.44
10 11013050 226547200 15.10000 2022-04-22 23:27:00 14.34 16.66 15.50
I am struggling to convert this to mapply: I get a matrix instead of a data frame.
This is what I tried:
i <- unique(datapesomolde$OrdenFab)
datapesomoldefiltered <- data.frame()
limpiarof <- function(i) {
  subset <- filter(datapesomolde, OrdenFab == i)
  datapesomoldetemp <- filter(subset, !duplicated(subset$Valor))
  return(datapesomoldefiltered <- rbind(datapesomoldefiltered, datapesomoldetemp))
}
datapesomoldefiltered <- mapply(limpiarof, i)
With this attempt I get a 2.2 GB matrix that only contains, for each unique value of "OrdenFab", the values of all the columns.
Can you help me? Thanks in advance.
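On why the attempt above returns a matrix: mapply simplifies its result by default (SIMPLIFY = TRUE), and because every call returns a data frame with the same number of columns, the pieces are most likely being packed into a list-matrix with one column per "OrdenFab" value. In addition, the assignment inside limpiarof only creates a local copy, so nothing accumulates across calls. A minimal sketch of one way around both issues, using base R only and a small made-up data frame (df, ids and the column values are assumptions for illustration, not the real data):

```r
# Hypothetical toy data standing in for datapesomolde
df <- data.frame(
  OrdenFab = c(1, 1, 2, 2, 2),
  Valor    = c(14.5, 14.5, 15.2, 15.2, 15.3),
  LimInf   = c(12.65, 12.65, 14.43, 14.43, 14.43)
)

# Return the de-duplicated rows for a single OrdenFab value;
# no global accumulator is touched inside the function
limpiarof <- function(of) {
  sub <- df[df$OrdenFab == of, ]
  sub[!duplicated(sub$Valor), ]
}

ids <- unique(df$OrdenFab)

# Default mapply: equal-length results are simplified into a list-matrix
as_matrix <- mapply(limpiarof, ids)
is.matrix(as_matrix)

# lapply keeps a plain list of data frames; bind them once at the end
pieces <- lapply(ids, limpiarof)
result <- do.call(rbind, pieces)
result
```

Calling mapply(limpiarof, ids, SIMPLIFY = FALSE) also returns a list of data frames that can be bound the same way. Binding once at the end avoids growing a data frame with rbind on every iteration, which is usually what makes this kind of loop slow.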
I would suggest a more abstract approach to this problem, for example using the tidyverse.
This should be both faster and clearer:
library(tidyverse)
datapesomoldefiltered <-
datapesomolde |>
group_by(OrdenFab) |>
distinct(Valor, .keep_all = TRUE) |>
ungroup()
datapesomoldefiltered
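Keeping distinct "Valor" values within each "OrdenFab" group amounts to keeping the first row for every (OrdenFab, Valor) pair, so the same filter can also be sketched in base R without any packages. The data frame below is a small made-up sample (df, keep and filtered are illustrative names, not part of the original code):

```r
# Hypothetical sample with the same column names as the question
df <- data.frame(
  OrdenFab = c(226549204, 226549204, 226548648, 226548234, 226548234),
  Valor    = c(14.5, 14.5, 47.3, 15.2, 15.2),
  Nominal  = c(13.68, 13.68, 45.82, 15.60, 15.60)
)

# duplicated() on a two-column data frame flags every row whose
# (OrdenFab, Valor) combination has already appeared further up
keep <- !duplicated(df[c("OrdenFab", "Valor")])

# Keep the first occurrence of each pair; original row order is preserved
filtered <- df[keep, ]
filtered
```

This is a single vectorised pass over the data, so it should not need a loop or an apply-style function at all.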
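Here are two ways. The difference is that the first solution preserves the original row order in the final result. If that does not matter, the second solution skips creating the temporary list sp.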
x <- " PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
1 11012501 226549204 14.50000 '2022-04-25 07:18:00' 12.65 14.71 13.68
2 11012501 226549204 14.50000 '2022-04-25 07:18:00' 12.65 14.71 13.68
3 11013610 226548648 47.30000 '2022-04-25 05:52:00' 42.38 49.26 45.82
4 11013047 226548234 15.20000 '2022-04-23 02:47:00' 14.43 16.77 15.60
5 11013047 226548234 15.20000 '2022-04-23 02:47:00' 14.43 16.77 15.60
6 11013047 226548234 15.20000 '2022-04-23 02:48:00' 14.43 16.77 15.60
7 11013047 226548234 15.20000 '2022-04-23 02:48:00' 14.43 16.77 15.60
8 11013052 226548332 16.30000 '2022-04-23 01:49:00' 15.63 18.17 16.90
9 11013052 226548332 16.30000 '2022-04-23 01:49:00' 15.63 18.17 16.90
10 11013052 226548332 16.30000 '2022-04-23 01:49:00' 15.63 18.17 16.90
11 11013052 226548332 16.30000 '2022-04-23 01:49:00' 15.63 18.17 16.90
12 11012501 226548204 14.70000 '2022-04-23 01:44:00' 12.65 14.71 13.68
13 11012501 226548204 14.70000 '2022-04-23 01:44:00' 12.65 14.71 13.68
14 11012501 226548200 14.55000 '2022-04-23 01:43:00' 12.65 14.71 13.68
15 11012501 226548200 14.55000 '2022-04-23 01:43:00' 12.65 14.71 13.68
16 11012501 226548201 14.65000 '2022-04-23 01:42:00' 12.65 14.71 13.68
17 11012501 226548201 14.65000 '2022-04-23 01:42:00' 12.65 14.71 13.68
18 11013943 226548154 134.00000 '2022-04-23 00:07:00' 131.76 153.13 142.44
19 11013943 226547066 144.00000 '2022-04-22 23:31:00' 131.76 153.13 142.44
20 11013050 226547200 15.10000 '2022-04-22 23:27:00' 14.34 16.66 15.50"
datapesomolde <- read.table(textConnection(x), header = TRUE)
suppressPackageStartupMessages({
library(dplyr)
library(purrr)
})
datapesomolde$Fecha_Registro <- as.POSIXct(datapesomolde$Fecha_Registro)
sp <- split(datapesomolde, datapesomolde$OrdenFab)
sp %>%
map_dfr( ~ .x %>% filter(!duplicated(Valor))) %>%
arrange(as.integer(row.names(.)))
#> PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
#> 1 11012501 226549204 14.50 2022-04-25 07:18:00 12.65 14.71 13.68
#> 3 11013610 226548648 47.30 2022-04-25 05:52:00 42.38 49.26 45.82
#> 4 11013047 226548234 15.20 2022-04-23 02:47:00 14.43 16.77 15.60
#> 8 11013052 226548332 16.30 2022-04-23 01:49:00 15.63 18.17 16.90
#> 12 11012501 226548204 14.70 2022-04-23 01:44:00 12.65 14.71 13.68
#> 14 11012501 226548200 14.55 2022-04-23 01:43:00 12.65 14.71 13.68
#> 16 11012501 226548201 14.65 2022-04-23 01:42:00 12.65 14.71 13.68
#> 18 11013943 226548154 134.00 2022-04-23 00:07:00 131.76 153.13 142.44
#> 19 11013943 226547066 144.00 2022-04-22 23:31:00 131.76 153.13 142.44
#> 20 11013050 226547200 15.10 2022-04-22 23:27:00 14.34 16.66 15.50
rm(sp) # tidy up
Created on 2022-06-01 by the reprex package (v2.0.1)
datapesomolde %>%
group_split(OrdenFab) %>%
map_dfr( ~ .x %>% filter(!duplicated(Valor)))
#> # A tibble: 10 × 7
#> PartNumber OrdenFab Valor Fecha_Registro LimInf LimSup Nominal
#> <int> <int> <dbl> <dttm> <dbl> <dbl> <dbl>
#> 1 11013943 226547066 144 2022-04-22 23:31:00 132. 153. 142.
#> 2 11013050 226547200 15.1 2022-04-22 23:27:00 14.3 16.7 15.5
#> 3 11013943 226548154 134 2022-04-23 00:07:00 132. 153. 142.
#> 4 11012501 226548200 14.6 2022-04-23 01:43:00 12.6 14.7 13.7
#> 5 11012501 226548201 14.6 2022-04-23 01:42:00 12.6 14.7 13.7
#> 6 11012501 226548204 14.7 2022-04-23 01:44:00 12.6 14.7 13.7
#> 7 11013047 226548234 15.2 2022-04-23 02:47:00 14.4 16.8 15.6
#> 8 11013052 226548332 16.3 2022-04-23 01:49:00 15.6 18.2 16.9
#> 9 11013610 226548648 47.3 2022-04-25 05:52:00 42.4 49.3 45.8
#> 10 11012501 226549204 14.5 2022-04-25 07:18:00 12.6 14.7 13.7