Reshaping EPA wind speed & direction data with dcast in R
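I am trying to convert long-format wind data to wide format. Both wind speed and wind direction are listed in the Parameter.Name column. These values need to be cast by the Local.Site.Name and Date.Local variables.
If there are multiple observations for a unique Local.Site.Name + Date.Local row, I want the mean of those observations. The built-in argument 'fun.aggregate = mean' works fine for wind speed, but mean wind direction cannot be computed this way because the values are in degrees. For example, averaging two wind directions near north (350 and 10) gives south: (350 + 10)/2 = 180, even though the polar mean is 360 or 0.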
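A minimal base-R sketch of the problem, using made-up variable names and only the two example directions above. It compares the arithmetic mean with a vector mean, which averages the unit-vector components of each direction and converts back to an angle:

```r
# Two directions just either side of north, in degrees.
dirs <- c(350, 10)

# Arithmetic mean: treats degrees as ordinary numbers.
arith <- mean(dirs)

# Vector mean: average the sine and cosine components, then convert back.
rad <- dirs * pi / 180
vec <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi
vec <- round(vec, 6)  # round off floating-point noise

arith  # 180, i.e. south
vec    # 0, i.e. north
```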
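The 'circular' package lets us compute a mean wind direction without doing any trigonometry ourselves, but I am having trouble nesting this extra function inside the 'fun.aggregate' argument. I thought a simple else if statement would do the trick, but I am running into the following error: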
Error in vaggregate(.value = value, .group = overall, .fun = fun.aggregate, : could not find function ".fun"
In addition: Warning messages:
1: In if (wind$Parameter.Name == "Wind Direction - Resultant") { :
the condition has length > 1 and only the first element will be used
2: In if (wind$Parameter.Name == "Wind Speed - Resultant") { :
the condition has length > 1 and only the first element will be used
3: In mean.default(wind$"Wind Speed - Resultant") :
argument is not numeric or logical: returning NA
The goal is to be able to use fun.aggregate = mean for wind speed, but mean(circular(Wind Direction, units = 'degrees')) for wind direction.
Here is the original data (>100 MB):
https://drive.google.com/open?id=0By6o_bZ8CGwuUUhGdk9ONTgtT0E
Here is a subset of the data (first 100 rows):
https://drive.google.com/open?id=0By6o_bZ8CGwucVZGT0pBQlFzT2M
Here is my script:
library(reshape2)
library(dplyr)
library(circular)
# read in the long format data:
wind <- read.csv("<INSERT_FILE_PATH_HERE>", header = TRUE)
# cast into wide format:
wind.w <- dcast(wind,
                Local.Site.Name + Date.Local ~ Parameter.Name,
                value.var = "Arithmetic.Mean",
                fun.aggregate = (
                  if (wind$Parameter.Name == "Wind Direction - Resultant") {
                    mean(circular(wind$"Wind Direction - Resultant", units = 'degrees'))
                  }
                  else if (wind$Parameter.Name == "Wind Speed - Resultant") {
                    mean(wind$"Wind Speed - Resultant")
                  }),
                na.rm = TRUE)
Any help would be greatly appreciated!
-spacedSparking
EDIT: Here is the solution:
library(reshape2)
library(plyr)      # provides .(), which dcast's subset argument relies on
library(SDMTools)
library(dplyr)
# read in the EPA wind data:
# This data is publicly accessible, and can be found here: https://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download_files.html
wind <- read.csv("daily_WIND_2016.csv", sep = ',', header = TRUE, stringsAsFactors = FALSE)
# convert long format wind speed data into wide format by date and local site id:
wind_speed <- dcast(wind,
                    Local.Site.Name + Date.Local ~ Parameter.Name,
                    value.var = "Arithmetic.Mean",
                    fun.aggregate = function(x) {
                      mean(x, na.rm = TRUE)
                    },
                    subset = .(Parameter.Name == "Wind Speed - Resultant")
)
# convert long format wind direction data into wide format by date and local site id:
wind_direction <- dcast(wind,
                        Local.Site.Name + Date.Local ~ Parameter.Name,
                        value.var = "Arithmetic.Mean",
                        fun.aggregate = function(x) {
                          # vector-average the directions; -1 marks an empty cell
                          if (length(x) > 0)
                            circular.averaging(x, deg = TRUE)
                          else
                            -1
                        },
                        subset = .(Parameter.Name == "Wind Direction - Resultant")
)
# join the wide format wind_speed and wind_direction data frames
wind.w <- merge(wind_speed, wind_direction)
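You are using wind.w in the code that defines wind.w - this is not going to work!
You are also using slanted quotes (`) instead of straight quotes ('). Straight quotes should be used to delimit strings.
You can use subset in dcast to apply the two functions separately, get two separate data frames, and then merge them: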
library(reshape2)
library(plyr)      # provides .(), which dcast's subset argument relies on
library(dplyr)
library(circular)
# cast wind speed into wide format:
wind_speed <- dcast(wind,
                    Local.Site.Name + Date.Local ~ Parameter.Name,
                    value.var = "Arithmetic.Mean",
                    fun.aggregate = function(x) {
                      mean(x, na.rm = TRUE)
                    },
                    subset = .(Parameter.Name == "Wind Speed - Resultant")
)
# cast wind direction into wide format, using a circular mean:
wind_direction <- dcast(wind,
                        Local.Site.Name + Date.Local ~ Parameter.Name,
                        value.var = "Arithmetic.Mean",
                        fun.aggregate = function(x) {
                          # -1 marks an empty cell
                          if (length(x) > 0)
                            mean(circular(c(x), units = "degrees"), na.rm = TRUE)
                          else
                            -1
                        },
                        subset = .(Parameter.Name == "Wind Direction - Resultant")
)
# join the two wide data frames on their shared key columns
wind.w <- merge(wind_speed, wind_direction)
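The same split, aggregate and merge pattern can be tried on a toy data frame without the EPA file or any packages. The sketch below is only an illustration under stated assumptions: it uses base R's aggregate() in place of dcast, the data values are invented, and the helpers dir_mean and agg_one are hypothetical names, not part of reshape2 or circular.

```r
# Toy long-format data (values invented for illustration only).
wind <- data.frame(
  Local.Site.Name = rep("A", 4),
  Date.Local      = rep("2016-01-01", 4),
  Parameter.Name  = c("Wind Speed - Resultant", "Wind Speed - Resultant",
                      "Wind Direction - Resultant", "Wind Direction - Resultant"),
  Arithmetic.Mean = c(2, 4, 350, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: vector mean of directions in degrees, wrapped to [0, 360).
dir_mean <- function(x) {
  rad <- x * pi / 180
  round(atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi, 6) %% 360
}

# Hypothetical helper: aggregate one parameter with its own function.
agg_one <- function(param, fun, out_name) {
  rows <- wind[wind$Parameter.Name == param, ]
  res <- aggregate(Arithmetic.Mean ~ Local.Site.Name + Date.Local,
                   data = rows, FUN = fun)
  names(res)[names(res) == "Arithmetic.Mean"] <- out_name
  res
}

speed     <- agg_one("Wind Speed - Resultant", mean, "speed")
direction <- agg_one("Wind Direction - Resultant", dir_mean, "direction")

# Join on the shared key columns (Local.Site.Name, Date.Local).
wide <- merge(speed, direction)
wide
```

The point of the pattern is that each aggregation function only ever receives the vector of values for one cell, so the choice between the two functions has to be made outside the function, by subsetting on Parameter.Name first.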
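OK, thanks to your help I managed to solve this pesky wind direction problem. Sometimes solving a problem is just a matter of knowing the right question to ask. In my case, learning the term 'vector averaging' was all I needed! The SDMTools package has a built-in vector-averaging function called circular.averaging(), which averages wind directions and produces output that is still between 0 and 359 degrees! What I ended up doing was adapting tjjjohnson's script: I changed the fun.aggregate argument from mean(circular(c(x), units = "degrees"), na.rm = TRUE) to circular.averaging(x, deg = TRUE).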
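The 0 to 359 range matters because an angle recovered with atan2() lies between -180 and 180 degrees, so a mean direction west of north comes back negative unless it is wrapped. A small base-R sketch of that wrapping step follows; it is not SDMTools' implementation, and wrap360 is a made-up helper name.

```r
# Two north-westerly directions, in degrees.
dirs <- c(340, 350)
rad  <- dirs * pi / 180

# Raw vector mean straight from atan2(): a negative angle, about -15.
raw <- atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi

# Hypothetical helper: wrap any angle in degrees into [0, 360).
wrap360 <- function(deg) deg %% 360

wrapped <- wrap360(raw)  # about 345
```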
Here is a histogram of the raw and aggregated data! Everything looks good, thanks everyone!