I have a working mapply call: how do I convert it to mlply?
I have written the following R code, which assigns the closest postcode to a set of northing/easting coordinates:
# Set of northing / easting coordinates that I need to assign a postcode to
x1 <- c(1,2,4,6,7)
y1 <- c(5,2,4,7,8)
# Postcode with northing / easting coordinates
postcode <- c("Postcode A", "Postcode B", "Postcode C", "Postcode D")
x2 <- c(5,3,4,2)
y2 <- c(8,1,2,4)
# Function that returns the index of the postcode closest to (x, y)
algo <- function(x, y)
{
  which.min(sqrt(((x2 - x)^2) + ((y2 - y)^2)))
}
# mapply to run the function and find the closest postcode for each point
postcode[mapply(algo, x1, y1, SIMPLIFY = TRUE)]
[1] "Postcode D" "Postcode B" "Postcode C" "Postcode A" "Postcode A"
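Since I have more than 500,000 (x1, y1) coordinates and more than 1,000,000 (x2, y2) coordinates, this mapply call takes a very long time to run, and I would like to monitor its progress. I know that mlply has a progress bar option, but I can't get it to run. This is what I tried: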
# Using mlply to run the function, and find the closest coordinates with progress bar
library(plyr)
postcode[mlply(cbind(x1, y1), .fun = algo, .progress = "tk")]
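What am I doing wrong? I would like the correct R code for mlply (or another m*ply function), along with an explanation of why the code above is incorrect.
Thank you very much for your time and attention.
I found at least two problems.
First, the column names of the matrix built by cbind(x1, y1), which are x1 and y1, do not match the argument names of algo, which are x and y. mlply passes each row to the function as named arguments, so algo receives arguments it does not have. With the columns renamed, the following code runs without warnings: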
mlply(cbind(x= x1, y =y1), .fun = algo, .progress = "tk")
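To see why the names matter, the sketch below imitates the "one row becomes named arguments" call pattern in base R with do.call. It is an illustration of the behaviour described above, not plyr's actual internals; the helper call_with_row is hypothetical.

```r
# Reference postcodes, as in the question
x2 <- c(5, 3, 4, 2)
y2 <- c(8, 1, 2, 4)

algo <- function(x, y) {
  which.min(sqrt((x2 - x)^2 + (y2 - y)^2))
}

# Hypothetical helper: call fun with one row of a matrix as named arguments
call_with_row <- function(row, fun) do.call(fun, as.list(row))

bad  <- cbind(x1 = 1, y1 = 5)   # column names do not match algo(x, y)
good <- cbind(x = 1, y = 5)     # column names match algo(x, y)

# Mismatched names produce an "unused arguments" error
bad_result <- tryCatch(call_with_row(bad[1, ], algo),
                       error = function(e) conditionMessage(e))
# Matching names bind to x and y, giving the index of the closest postcode
good_result <- call_with_row(good[1, ], algo)

print(bad_result)
print(good_result)
```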
Second, mlply returns a list, and a list cannot be used directly to subset your postcode vector:
mlply(.data = cbind(x = x1, y = y1), .fun = algo, .progress = "tk")
$`1`
[1] 4

$`2`
[1] 2

$`3`
[1] 3

$`4`
[1] 1

$`5`
[1] 1

attr(,"split_type")
[1] "array"
attr(,"split_labels")
  x y
1 1 5
2 2 2
3 4 4
4 6 7
5 7 8
To fix this, I suggest flattening the list with unlist() before subsetting:
postcode[unlist(mlply(.data = cbind(x = x1, y = y1),
.fun = algo, .progress = "tk"))[1:length(x1)]]
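If the progress bar is the only reason to switch to plyr, base R's utils::txtProgressBar can also be wrapped around a plain loop. The following is a minimal sketch, assuming a text progress bar in the console is acceptable. The helper nearest_with_progress is hypothetical, and it should give the same result as the original mapply call.

```r
# Points to label and reference postcodes, from the question
x1 <- c(1, 2, 4, 6, 7)
y1 <- c(5, 2, 4, 7, 8)
postcode <- c("Postcode A", "Postcode B", "Postcode C", "Postcode D")
x2 <- c(5, 3, 4, 2)
y2 <- c(8, 1, 2, 4)

algo <- function(x, y) {
  which.min(sqrt((x2 - x)^2 + (y2 - y)^2))
}

# Hypothetical helper: like mapply(algo, xs, ys), but with a text progress bar
nearest_with_progress <- function(xs, ys, fun) {
  n <- length(xs)
  idx <- integer(n)
  if (n == 0) return(idx)   # txtProgressBar needs max > min
  pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb))        # close the bar even if fun() fails
  for (i in seq_len(n)) {
    idx[i] <- fun(xs[i], ys[i])
    utils::setTxtProgressBar(pb, i)
  }
  idx
}

result <- postcode[nearest_with_progress(x1, y1, algo)]
print(result)
```

With 500,000 points, you may prefer to update the bar only every few hundred iterations, because each update has a small cost of its own.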
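Lastly, because you only need the position of the minimum distance, consider minimizing the squared distance directly. The square root is monotonically increasing, so which.min returns the same index. This avoids computing a million square roots for every point, which should reduce the running time.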
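A sketch of that change follows; algo_sq is a hypothetical name. On the example data it returns the same indices as the original algo. In general, the two versions agree except possibly when rounding in sqrt turns two nearly equal distances into an exact tie.

```r
x1 <- c(1, 2, 4, 6, 7)
y1 <- c(5, 2, 4, 7, 8)
x2 <- c(5, 3, 4, 2)
y2 <- c(8, 1, 2, 4)

# Original version: Euclidean distance
algo <- function(x, y) {
  which.min(sqrt((x2 - x)^2 + (y2 - y)^2))
}

# Squared distance: no square roots, same position of the minimum
algo_sq <- function(x, y) {
  which.min((x2 - x)^2 + (y2 - y)^2)
}

same <- identical(mapply(algo, x1, y1), mapply(algo_sq, x1, y1))
print(same)
```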