Creating time series columns in R from long to wide format, taking the date range into account
First of all, I have successfully converted my data from long format to wide format.
The data looks like this:
+======+==========+======+======+
| Name | Date | Val1 | Val2 |
+======+==========+======+======+
| A | 1/1/2018 | 1 | 2 |
+------+----------+------+------+
| B | 1/1/2018 | 2 | 3 |
+------+----------+------+------+
| C | 1/1/2018 | 3 | 4 |
+------+----------+------+------+
| D | 1/4/2018 | 4 | 5 |
+------+----------+------+------+
| A | 1/4/2018 | 5 | 6 |
+------+----------+------+------+
| B | 1/4/2018 | 6 | 7 |
+------+----------+------+------+
| C | 1/4/2018 | 7 | 8 |
+------+----------+------+------+
To convert the table above from long to wide format, I used the following line of code:
test_wide <- reshape(test_data, idvar = 'Name', timevar = 'Date', direction = "wide" )
The result of the code above is:
+======+===============+===============+===============+===============+
| Name | Val1.1/1/2018 | Val2.1/1/2018 | Val1.1/4/2018 | Val2.1/4/2018 |
+======+===============+===============+===============+===============+
| A | 1 | 2 | 5 | 6 |
+------+---------------+---------------+---------------+---------------+
| B | 2 | 3 | 6 | 7 |
+------+---------------+---------------+---------------+---------------+
| C | 3 | 4 | 7 | 8 |
+------+---------------+---------------+---------------+---------------+
| D | NA | NA | 4 | 5 |
+------+---------------+---------------+---------------+---------------+
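The problem I am facing is that I need R to treat the Date column as actual dates. The Date column ranges from 1/1/2018 to 1/4/2018, but because there are no values for 1/2/2018 and 1/3/2018, the result has no columns such as Val1.1/2/2018, Val2.1/2/2018, Val1.1/3/2018 and Val2.1/3/2018.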
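I would like the conversion to wide format to also produce columns for the dates 1/2/2018 and 1/3/2018, even if those columns contain only NULLs (NA in R).
The reason is that I need to use the data as a time series.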
Edit:
Initial data, ready to copy and paste:
test_data <- read.table(text="
Name Date Val1 Val2
A 1/1/2018 1 2
B 1/1/2018 2 3
C 1/1/2018 3 4
D 1/4/2018 4 5
A 1/4/2018 5 6
B 1/4/2018 6 7
C 1/4/2018 7 8
", header=TRUE)
Transformed data, ready to copy and paste:
Name,Val1.1/1/2018,Val2.1/1/2018,Val1.1/4/2018,Val2.1/4/2018
A,1,2,5,6
B,2,3,6,7
C,3,4,7,8
D,NA,NA,4,5
Result of dput(test_data):
structure(list(Name = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L), .Label = c("A",
"B ", "C", "D"), class = "factor"), Date = structure(c(1L, 1L,
1L, 2L, 2L, 2L, 2L), .Label = c("1/1/2018", "1/4/2018"), class = "factor"),
Val1 = 1:7, Val2 = 2:8), class = "data.frame", row.names = c(NA,
-7L))
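The dput output shows that Date is stored as a factor with only two levels, so reshape() has no way of knowing that 1/2/2018 and 1/3/2018 lie in between. A minimal sketch of that check, assuming the dates are month/day/year (the names dates and all_days are only illustrative, not part of the original post):

```r
# Rebuild just the Date column from the dput output
Date <- factor(c("1/1/2018", "1/1/2018", "1/1/2018",
                 "1/4/2018", "1/4/2018", "1/4/2018", "1/4/2018"))

class(Date)    # a factor: only the observed labels are known
levels(Date)   # the two dates that actually occur in the data

# Parse as real dates (assumption: month/day/year), then list every day in the range
dates    <- as.Date(as.character(Date), format = "%m/%d/%Y")
all_days <- seq(min(dates), max(dates), by = "day")
all_days       # includes the days that have no observations
```

Once the column is a Date, the full sequence of days can be generated and used to fill the gaps before reshaping.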
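library(dplyr)
library(tidyr)      # complete() and full_seq()
library(data.table) # dcast() and setDT()
# df is the long-format data (test_data in the question)
df %>% mutate(Date=as.Date(Date,'%m/%d/%Y')) %>%              # parse Date as a real date
  complete(Name, nesting(Date=full_seq(Date,1))) %>%          # one row per Name and day; gaps become NA
  setDT(.) %>% dcast(Name ~ Date, value.var=c('Val2','Val1')) # one column per value and date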
A tidyverse option:
library(lubridate)
library(tidyverse)
df %>%
mutate(Date=mdy(Date)) %>%
#Or you can do as.Date(Date,'%m/%d/%Y') to avoid loading `lubridate`
complete(Name, Date = seq(min(Date), max(Date), 1)) %>%
gather(key, value, -Name, -Date) %>%
unite(Date, key, Date, sep = ".") %>%
spread(Date, value)
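For completeness, the same idea can probably be kept in base R by completing the long data before calling the reshape() from the question. This is only a sketch, assuming month/day/year dates; grid and full_long are illustrative names, and the resulting columns use ISO dates (for example Val1.2018-01-02) rather than the original 1/2/2018 labels:

```r
# Data from the question, with Date parsed as a real date
test_data <- data.frame(
  Name = c("A", "B", "C", "D", "A", "B", "C"),
  Date = c("1/1/2018", "1/1/2018", "1/1/2018", "1/4/2018",
           "1/4/2018", "1/4/2018", "1/4/2018"),
  Val1 = 1:7, Val2 = 2:8,
  stringsAsFactors = FALSE
)
test_data$Date <- as.Date(test_data$Date, format = "%m/%d/%Y")

# Every Name paired with every day in the observed range
grid <- expand.grid(
  Name = unique(test_data$Name),
  Date = seq(min(test_data$Date), max(test_data$Date), by = "day"),
  stringsAsFactors = FALSE
)

# Left join: days without observations get NA in Val1 and Val2
full_long <- merge(grid, test_data, by = c("Name", "Date"), all.x = TRUE)
full_long <- full_long[order(full_long$Date, full_long$Name), ]

# Same reshape() call as in the question, now on the completed data;
# the date is turned into text first so it becomes part of the column names
full_long$Date <- format(full_long$Date, "%Y-%m-%d")
test_wide <- reshape(full_long, idvar = "Name", timevar = "Date",
                     direction = "wide")
test_wide
```

The wide result then has a Val1 and a Val2 column for each of the four days, with the two unobserved days filled entirely with NA.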