Aggregating data from daily to monthly, where each column contains data for a day, and each row represents coordinates
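I have a large data frame in which the first two columns are longitude and latitude values (so each row represents a point at a specific longitude and latitude), and the remaining columns hold daily data over a long period (a couple of years). As a result, the data frame is very large.
It looks like this:
(screenshot: large dataframe)
Is there a way to convert this data frame into a monthly one?
- First, I want the sum for each month of each year,
- then I want the average of those monthly totals across years, ending up with a data frame of 14 columns: longitude, latitude, January, February, ..., December.
Is this possible?
When my data spanned only one year, I could do this with the following code: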
mymonthlydata$`1`=apply(mydailydata[3:33],1,sum)
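I did this for each month (for example, columns 3 to 33 are the days of January), but now that I have many years of data I'm not sure how to proceed. Any help would be appreciated, even if it just points me in the right direction. Thanks.
Some sample data mimicking the OP's screenshot can be created with: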
dt <- data.table::data.table(lon = (0:9)+abs(rnorm(10)), lat = -89.5)[, as.character(seq.Date(as.Date("2015-01-01"), as.Date("2021-12-31"), by = "1 days")) := abs(rnorm(10)/10^8)]
str(dt)
# Classes ‘data.table’ and 'data.frame': 10 obs. of 2559 variables:
# $ lon : num 0.702 1.77 3.55 4.294 4.712 ...
# $ lat : num -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5 -89.5
# $ 2015-01-01: num 0.00000000485 0.00000000474 0.0000000101 0.00000001026 0.00000000282 ...
# $ 2015-01-02: num 0.00000000485 0.00000000474 0.0000000101 0.00000001026 0.00000000282 ...
# $ 2015-01-03: num 0.00000000485 0.00000000474 0.0000000101 0.00000001026 0.00000000282 ...
Sample data mimicking the OP's screenshot
dt <- data.table::data.table(lon = (0:9)+abs(rnorm(10)), lat = -89.5)[, as.character(seq.Date(as.Date("2015-01-01"), as.Date("2021-12-31"), by = "1 days")) := abs(rnorm(10)/10^8)]
data.table solution
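library(data.table)
library(lubridate)
# setDT(dt) is needed only if you have a data.frame (my sample is already a data.table)
# reshape to long format: one row per point and day
dt_long <- melt(dt, id.vars = c("lon", "lat"), variable.name = "date")
# derive year and month once per distinct date (month labels follow the session locale)
dt_long[, `:=` (year = year(date), month = month(date, label = TRUE, abbr = FALSE)), by = date]
# total per point, year and month
dt_long_sum <- dt_long[, .(sum = sum(value)), by = .(lon, lat, year, month)]
# average of the monthly totals across years
dt_long_mean <- dt_long_sum[, .(mean = mean(sum)), by = .(lon, lat, month)]
# back to wide format: one column per month
dcast(dt_long_mean, lon + lat ~ month, value.var = "mean")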
Result
# lon lat January February March April May June July August September October
# 1: 0.7016 -89.5 0.000000150444 0.000000137272 0.000000150444 0.000000145591 0.000000150444 0.000000145591 0.000000150444 0.000000150444 0.000000145591 0.000000150444
# 2: 1.7702 -89.5 0.000000147013 0.000000134141 0.000000147013 0.000000142270 0.000000147013 0.000000142270 0.000000147013 0.000000147013 0.000000142270 0.000000147013
# 3: 3.5504 -89.5 0.000000313048 0.000000285639 0.000000313048 0.000000302950 0.000000313048 0.000000302950 0.000000313048 0.000000313048 0.000000302950 0.000000313048
# 4: 4.2941 -89.5 0.000000318180 0.000000290321 0.000000318180 0.000000307916 0.000000318180 0.000000307916 0.000000318180 0.000000318180 0.000000307916 0.000000318180
# 5: 4.7121 -89.5 0.000000087354 0.000000079705 0.000000087354 0.000000084536 0.000000087354 0.000000084536 0.000000087354 0.000000087354 0.000000084536 0.000000087354
# 6: 5.6789 -89.5 0.000000202693 0.000000184945 0.000000202693 0.000000196154 0.000000202693 0.000000196154 0.000000202693 0.000000202693 0.000000196154 0.000000202693
# 7: 6.4409 -89.5 0.000000168332 0.000000153593 0.000000168332 0.000000162902 0.000000168332 0.000000162902 0.000000168332 0.000000168332 0.000000162902 0.000000168332
# 8: 8.3095 -89.5 0.000000037445 0.000000034166 0.000000037445 0.000000036237 0.000000037445 0.000000036237 0.000000037445 0.000000037445 0.000000036237 0.000000037445
# 9: 8.8061 -89.5 0.000000051242 0.000000046756 0.000000051242 0.000000049589 0.000000051242 0.000000049589 0.000000051242 0.000000051242 0.000000049589 0.000000051242
# 10: 9.2667 -89.5 0.000000338410 0.000000308779 0.000000338410 0.000000327493 0.000000338410 0.000000327493 0.000000338410 0.000000338410 0.000000327493 0.000000338410
# November December
# 1: 0.000000145591 0.000000150444
# 2: 0.000000142270 0.000000147013
# 3: 0.000000302950 0.000000313048
# 4: 0.000000307916 0.000000318180
# 5: 0.000000084536 0.000000087354
# 6: 0.000000196154 0.000000202693
# 7: 0.000000162902 0.000000168332
# 8: 0.000000036237 0.000000037445
# 9: 0.000000049589 0.000000051242
# 10: 0.000000327493 0.000000338410
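Sanity check on toy data

Because the sample data above is random, it is hard to see at a glance whether the sum-then-average logic does what was asked. The following is a minimal sketch, not part of the original answer, that runs the same melt / sum / mean / dcast steps on a hypothetical table `toy` in which every daily value is 1. Each monthly total then equals the number of days in that month, so the averages are easy to verify by hand. The names `toy`, `toy_long`, `toy_sum`, `toy_mean` and `toy_wide` are made up for illustration, and numeric months are used instead of lubridate labels so the result does not depend on the locale.

```r
library(data.table)

# Hypothetical toy input: two points, every daily value equal to 1,
# covering 2015 (not a leap year) and 2016 (a leap year).
days <- seq(as.Date("2015-01-01"), as.Date("2016-12-31"), by = "day")
toy <- data.table(lon = c(0.5, 1.5), lat = -89.5)
toy[, (as.character(days)) := 1]

# Same steps as in the answer: long format, sum per year and month,
# then the mean of those sums per month.
toy_long <- melt(toy, id.vars = c("lon", "lat"), variable.name = "date")
toy_long[, date := as.Date(as.character(date))]
toy_long[, `:=`(year = year(date), month = month(date))]
toy_sum <- toy_long[, .(sum = sum(value)), by = .(lon, lat, year, month)]
toy_mean <- toy_sum[, .(mean = mean(sum)), by = .(lon, lat, month)]

# Wide format with one column per month, renamed from "1".."12" to month names.
toy_wide <- dcast(toy_mean, lon + lat ~ month, value.var = "mean")
setnames(toy_wide, as.character(1:12), month.name)

# January is 31 for both points; February is 28.5, the mean of 28 and 29.
toy_wide[, .(lon, lat, January, February)]
```

If the February column shows 28.5, the pipeline is summing within each year first and only then averaging across years, which is the order the question asks for.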