The output of dput(head(data, 20)) for a data frame in R
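I have a dataset like the one below (name: data) that contains several countries with multiple event types on different dates, in 3 columns and 251453 rows. I would like to count the monthly events for each country. For instance, I want to see how many "Battles" there were in "Yemen" in "August". In total I have 6 different event types and 8 different countries.
Despite spending several hours on it, I could not make any progress. Any guidance is appreciated.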
event_date      event_type                   country
12 March 2021   Explosions/Remote violence   Yemen
12 March 2021   Explosions/Remote violence   Yemen
12 March 2021   Battles                      Afghanistan
12 March 2021   Battles                      Afghanistan
12 March 2021   Protests                     Yemen
12 March 2021   Protests                     Yemen
Output of dput(sample):
dput(head(data, 20))
structure(list(event_date = structure(c(420L, 420L, 420L, 420L,
420L, 420L, 420L, 420L, 420L, 420L, 420L, 420L, 420L, 420L, 420L,
420L, 420L, 420L, 420L, 420L), .Label = c("01 April 2018", "01 April 2019",
"01 April 2020", "01 August 2018", "01 August 2019", "01 August 2020",
"01 December 2018", "01 December 2019", "01 December 2020", "01 February 2019",
event_type = structure(c(2L, 2L, 1L, 1L, 3L, 3L, 3L, 3L,
4L, 1L, 1L, 3L, 4L, 3L, 1L, 1L, 4L, 6L, 6L, 3L), .Label = c("Battles",
"Explosions/Remote violence", "Protests", "Riots", "Strategic developments",
"Violence against civilians"), class = "factor"), country = structure(c(8L,
8L, 1L, 1L, 8L, 8L, 3L, 5L, 8L, 8L, 8L, 5L, 5L, 5L, 1L, 1L,
5L, 8L, 8L, 4L), .Label = c("Afghanistan", "Colombia", "India",
"Iraq", "Lebanon", "Libya", "Mali", "Yemen"), class = "factor")), row.names = c(NA,
20L), class = "data.frame")
This can be done with aggregate, as long as the dates are actual dates.
First, coerce the column event_date to class "Date".
data$event_date <- as.Date(data$event_date, format = "%d %B %Y")
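As a quick sanity check of that conversion, here is a minimal sketch on a made-up three-value factor (the values are illustrative and not taken from the real data). One caveat: `%B` matches full month names in the current locale, so on a non-English system the result may be all `NA`. Setting `LC_TIME` to `"C"` below is an assumption about what your setup needs.

```r
# Toy factor standing in for data$event_date (values are made up)
event_date <- factor(c("12 March 2021", "01 August 2020", "01 April 2018"))

# %B parses month names in the current locale; the C locale uses
# English month names, which is what the strings above contain
Sys.setlocale("LC_TIME", "C")

# as.Date() converts the factor via its character labels
parsed <- as.Date(event_date, format = "%d %B %Y")

class(parsed)   # "Date"
anyNA(parsed)   # FALSE when every string matched the format
format(parsed, "%Y-%m-%d")
```

If `anyNA(parsed)` returns `TRUE` on the real data, some strings did not match `"%d %B %Y"` and are worth inspecting before aggregating.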
Now there are two ways: the first counts by month regardless of the year, the second counts by year and month.
# Count by month name only, ignoring the year
month <- format(data$event_date, "%B")
aggregate(event_type ~ month + country, data, length)
# Count by year and month together
yearmonth <- format(data$event_date, "%Y %B")
aggregate(event_type ~ yearmonth + country, data, length)
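The two aggregate calls above count every event per month and country. To answer the specific example in the question (how many "Battles" in "Yemen" in "August"), event_type would need to be a grouping variable as well. The following is a minimal sketch on made-up data, not the answerer's code. The helper column `n` and the five sample rows are assumptions for illustration only.

```r
# Made-up sample with the same three columns as the question's data
data <- data.frame(
  event_date = as.Date(c("2020-08-01", "2020-08-15", "2020-08-20",
                         "2021-03-12", "2021-03-12")),
  event_type = c("Battles", "Battles", "Protests", "Battles", "Protests"),
  country    = c("Yemen", "Yemen", "Yemen", "Afghanistan", "Yemen")
)

# Two-digit month number; avoids locale-dependent month names
data$month <- format(data$event_date, "%m")
# Helper column of ones, so that summing it counts rows (name is arbitrary)
data$n <- 1L

# One row per observed month / country / event type combination
counts <- aggregate(n ~ month + country + event_type, data, sum)
counts

# Battles in Yemen in August
subset(counts, month == "08" & country == "Yemen" & event_type == "Battles")
```

Combinations that never occur do not appear in `counts`. If zero counts are needed, `table(data$month, data$country, data$event_type)` is one alternative that keeps them.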