Plot monthly Time series from a data frame with daily data
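I have a dataset of the motor vehicle accidents that occurred each day in New York City from January 1, 2014 to December 31, 2014. I want to plot the time series of the number of cyclists injured and motorists injured in a single plot.
My data looks like this: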
Date      Time   Location   Cyclists injured  Motorists injured
2014-1-1  12:05  Bronx      0                 1
2014-1-1  12:34  Bronx      1                 2
2014-1-2  6:05   Bronx      0                 0
2014-1-3  8:01   Bronx      1                 2
2014-1-3  12:05  Manhattan  0                 1
2014-1-3  12:56  Manhattan  0                 2
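And so on, until December 31, 2014.
To plot a monthly time series, I know I first need to compute the totals for each month and then plot those monthly totals, but I don't know how to do that.
I tried the aggregate function with the code below, but it gives daily sums instead of monthly sums. Please help.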
cyclist <- aggregate(NUMBER.OF.CYCLIST.INJURED ~ DATE, data = final_data,sum)
Thanks :)
Mannat, here is an answer that uses the data.table package to do the aggregation. Install it first with install.packages("data.table") to get it into your R.
library(data.table)
# For others
# I copied your data into a csv file, Mannat you will not need this step,
# other helpers look at data in DATA section below
final_data <- as.data.table(read.csv(file.path(mypath, "SOaccidents.csv"),
                                     header = TRUE,
                                     stringsAsFactors = FALSE))
# For Mannat
# Mannat you will need to convert your existing data.frame to data.table
final_data <- as.data.table(final_data)
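# check the column types: the dates are read in as strings,
# and the field is called Date, not DATE
str(final_data)
# convert the strings to Date; "%m/%d/%Y" matches dates like "01/01/2014"
# (for dates written like "2014-1-1", as in the question, use "%Y-%m-%d")
final_data[, Date := as.Date(Date, "%m/%d/%Y")]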
# use data.table to aggregate by month
# first, add a PlotDate field holding year and month as a number YYYYMM, e.g. 201401
final_data[, PlotDate := as.numeric(format(Date, "%Y%m"))]
# key by this plot date
setkeyv(final_data, "PlotDate")
# second, sum each injury column by PlotDate and name the output columns
plotdata <- final_data[, .(Cyclists.monthly = sum(Cyclists.injured),
                           Motorists.monthly = sum(Motorists.injured)), by = PlotDate]
#    PlotDate Cyclists.monthly Motorists.monthly
# 1:   201401                2                 8
# You can then plot this (makes more sense with more data)
# for example, for cyclists
plot(plotdata$PlotDate, plotdata$Cyclists.monthly)
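The PlotDate key built above is a plain number such as 201401, which sorts correctly and works for one year of data. A minimal base R sketch (the 2015 date is a hypothetical value chosen only to cross a year boundary) shows how to parse both date layouts used in this thread and why a first-of-month Date can be a safer month key when the data spans several years:

```r
# Dates as written in the question ("YYYY-M-D") and in the answer's sample ("M/D/YYYY")
q_dates <- as.Date(c("2014-1-1", "2014-12-31"), format = "%Y-%m-%d")
a_dates <- as.Date(c("01/01/2014", "1/19/2014"), format = "%m/%d/%Y")
print(q_dates)
print(a_dates)

# Two hypothetical dates on either side of a year boundary
dates <- as.Date(c("2014-12-15", "2015-01-03"))

# Numeric YYYYMM key, built the same way as PlotDate above
yyyymm <- as.numeric(format(dates, "%Y%m"))
# 201501 - 201412 = 89: adjacent months are far apart on a numeric axis
gap_numeric <- diff(yyyymm)

# Alternative key: the first day of each month, kept as a Date
month_start <- as.Date(format(dates, "%Y-%m-01"))
# 2015-01-01 minus 2014-12-01 is 31 days, i.e. exactly one calendar month
gap_days <- as.numeric(diff(month_start))

print(gap_numeric)
print(month_start)
print(gap_days)
```

In the data.table code above, the same key could be added with final_data[, Month := as.Date(format(Date, "%Y-%m-01"))] and then used with by = Month.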
Mannat, if you are not familiar with data.table, have a look at the cheatsheet.
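For comparison, the question's own aggregate() call returned daily sums because it grouped by the date itself. A minimal base R sketch, assuming the column names from the question's sample (Date, Cyclists.injured, Motorists.injured rather than DATE and NUMBER.OF.CYCLIST.INJURED) and dates written like "2014-1-1", groups by a month key instead:

```r
# The six sample rows from the question
final_data <- data.frame(
  Date = c("2014-1-1", "2014-1-1", "2014-1-2", "2014-1-3", "2014-1-3", "2014-1-3"),
  Location = c("Bronx", "Bronx", "Bronx", "Bronx", "Manhattan", "Manhattan"),
  Cyclists.injured  = c(0L, 1L, 0L, 1L, 0L, 0L),
  Motorists.injured = c(1L, 2L, 0L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)
final_data$Date <- as.Date(final_data$Date, format = "%Y-%m-%d")

# Grouping by the date itself gives one row per day (what the question observed)
daily <- aggregate(Cyclists.injured ~ Date, data = final_data, FUN = sum)

# Grouping by a month key (first day of the month, kept as a Date) gives one row per month
final_data$Month <- as.Date(format(final_data$Date, "%Y-%m-01"))
monthly <- aggregate(cbind(Cyclists.injured, Motorists.injured) ~ Month,
                     data = final_data, FUN = sum)

print(daily)
print(monthly)
```

The only change from the question's attempt is the grouping variable on the right of the ~; cbind() on the left sums both injury columns in one call.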
Data
For anyone else who wants to work on this, here is the dput result:
final_data <- data.table(Date = c("01/01/2014", "01/01/2014", "01/01/2014",
                                  "01/01/2014", "1/19/2014", "1/19/2014"),
                         Time = c("12:05", "12:34", "06:05", "08:01", "12:05", "12:56"),
                         Location = c("Bronx", "Bronx", "Bronx", "Bronx",
                                      "Manhattan", "Manhattan"),
                         Cyclists.injured = c(0L, 1L, 0L, 1L, 0L, 0L),
                         Motorists.injured = c(1L, 2L, 0L, 2L, 1L, 2L))
Plot
Either use the ggplot2 package or, for base R graphics, see Plot multiple lines (data series) each with unique color in R for plotting help.
# I do not have your full data, and a line chart of a single point draws no line,
# so for testing I added a fake February
testfeb <- data.table(PlotDate = 201402, Cyclists.monthly = 4,
                      Motorists.monthly = 10)
plotdata <- rbindlist(list(plotdata, testfeb))
# PlotDate Cyclists.monthly Motorists.monthly
#1 201401 2 8
#2 201402 4 10
# Plot code; adjust the limits as you see fit
# (ylim covers both series so neither line is clipped)
plot(1, type = "n",
     xlim = c(201401, 201412),
     ylim = c(0, max(plotdata$Motorists.monthly, plotdata$Cyclists.monthly)),
     ylab = "monthly injuries",
     xlab = "months")
lines(plotdata$PlotDate, plotdata$Motorists.monthly, col = "blue", lwd = 2.5)
lines(plotdata$PlotDate, plotdata$Cyclists.monthly, col = "red", lwd = 2.5)
# add a legend whose line styles match the plotted lines
legend(x = "topright", legend = c("Motorists", "Cyclists"),
       lty = c(1, 1), lwd = c(2.5, 2.5),
       col = c("blue", "red"))
# or set x to another position, e.g. "bottom" or "bottomleft"
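Instead of a numeric YYYYMM axis with a hard-coded xlim, the monthly totals can be plotted against a first-of-month Date, so the x-axis shows date labels and months stay evenly spaced across years. A minimal base R sketch; the figures are the same illustrative values used above (the January totals from the sample data plus the fake February), not real data, and the output file name is an assumption:

```r
# Illustrative monthly totals: January from the sample data, February is the fake test month
plotdata <- data.frame(
  Month = as.Date(c("2014-01-01", "2014-02-01")),
  Cyclists.monthly  = c(2, 4),
  Motorists.monthly = c(8, 10)
)

# Draw into a PDF in the temp directory so the sketch also runs without a screen;
# drop the pdf() and dev.off() lines to draw in an interactive window instead
out_file <- file.path(tempdir(), "monthly_injuries.pdf")
pdf(out_file)

# With Date x values, plot() labels the x-axis with dates
plot(plotdata$Month, plotdata$Motorists.monthly, type = "l", col = "blue", lwd = 2.5,
     ylim = c(0, max(plotdata$Motorists.monthly, plotdata$Cyclists.monthly)),
     xlab = "month", ylab = "monthly injuries")
lines(plotdata$Month, plotdata$Cyclists.monthly, col = "red", lwd = 2.5)
legend("topleft", legend = c("Motorists", "Cyclists"),
       col = c("blue", "red"), lty = 1, lwd = 2.5)

invisible(dev.off())
```

Because the x values are real dates, no xlim is needed: the axis range follows the data, whatever months it covers.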