Nesting specific elements into another list in R
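My dataset has 5 IDs and spans from 01-01-2010 to 12-31-2013. I first split() the data by ID, which gives me a list object. I then create another list of 10-day intervals, arranged by ID.
I want to nest these intervals into the first ID list, based on the ID labeled in each interval element.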
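For example:
The main list is made up of the ID elements. [1], [2], [3] are the intervals nested under each ID: the intervals in [A] all belong to ID A, those in [B] to ID B, those in [C] to ID C, and so on.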
[A]
  [1]
  [2]
  [3]
[B]
  [1]
  [2]
  [3]
[C]
  [1]
  [2]
  [3]
[D]
  [1]
  [2]
  [3]
[E]
  [1]
  [2]
  [3]
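The code below nests the intervals into the ID list, but it nests all of the IDs rather than only the specific intervals that belong under each ID.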
set.seed(12345)
library(lubridate)
library(tidyverse)

date <- rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500)
ID <- rep(c("A", "B", "C", "D", "E"), 100)

df <- data.frame(date = date,
                 x = runif(length(date), min = 60000, max = 80000),
                 y = runif(length(date), min = 800000, max = 900000),
                 ID)

df_ID <- split(df, df$ID)

df_nested <- lapply(df_ID, function(x){
  x %>%
    arrange(ID) %>%
    # Creates a new column holding the first day of the 10-day interval that
    # the date falls in (e.g., 01-01-2010 is in the first 10-day interval,
    # so the `floor_date` assigned to it is 01-01-2010)
    mutate(new = floor_date(date, "10 days")) %>%
    # For any month that has 31 days, the 31st day would normally be assigned
    # its own interval. The code below takes the 31st day and joins it with
    # the previous interval.
    mutate(new = if_else(day(new) == 31, new - days(10), new)) %>%
    group_by(new, .add = TRUE) %>%
    group_split()
})
I would do it like this:
set.seed(12345)
library(lubridate)
library(tidyverse)

# Adds the start date of the 10-day interval each row falls in
f <- function(data){
  data %>% mutate(
    new = floor_date(date, "10 days"),
    # Fold the 31st into the interval that starts on the 21st
    new = if_else(day(new) == 31, new - days(10), new)
  )
}

tibble(
  ID = rep(c("A", "B", "C", "D", "E"), 100),
  date = rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500),
  x = runif(length(date), min = 60000, max = 80000),
  y = runif(length(date), min = 800000, max = 900000)
) %>%
  group_by(ID) %>%
  nest() %>%
  mutate(data = map(data, f)) %>%
  unnest(data)
Output
# A tibble: 500 x 5
# Groups:   ID [5]
   ID    date            x       y new
   <chr> <date>      <dbl>   <dbl> <date>
 1 A     2010-01-01 74418. 820935. 2010-01-01
 2 A     2010-01-06 63327. 885896. 2010-01-01
 3 A     2010-01-11 60691. 873949. 2010-01-11
 4 A     2010-01-16 69250. 868411. 2010-01-11
 5 A     2010-01-21 69075. 876142. 2010-01-21
 6 A     2010-01-26 67797. 829892. 2010-01-21
 7 A     2010-01-31 75860. 843542. 2010-01-21
 8 A     2010-02-05 67233. 882318. 2010-02-01
 9 A     2010-02-10 75644. 826283. 2010-02-01
10 A     2010-02-15 66424. 853789. 2010-02-11
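Simple and straightforward, isn't it?
Everything you want to do to the data is contained in the function f. You can extend it as needed.
The rest follows one simple scheme:
tibble %>% group_by %>% nest %>% mutate %>% unnest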
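If what you need in the end is the nested list from the question (IDs at the top level, 10-day intervals underneath), one possible sketch is to add the interval column first and then split twice. The helper name add_interval and the use of base split() for both levels are my own choices for illustration, not part of the code above. It assumes the same simulated data and the same rule of folding the 31st into the previous interval.

```r
library(dplyr)
library(lubridate)

set.seed(12345)

# Same simulated data as above: 500 consecutive days cycled over 5 IDs
df <- tibble(
  ID   = rep(c("A", "B", "C", "D", "E"), 100),
  date = rep_len(seq(dmy("01-01-2010"), dmy("31-12-2013"), by = "days"), 500),
  x    = runif(500, min = 60000, max = 80000),
  y    = runif(500, min = 800000, max = 900000)
)

# Hypothetical helper: tags each row with the start of its 10-day interval
add_interval <- function(data) {
  data %>%
    mutate(
      new = floor_date(date, "10 days"),
      # Fold the 31st into the interval that starts on the 21st
      new = if_else(day(new) == 31, new - days(10), new)
    )
}

# Outer split: one element per ID.
# Inner split: one data frame per 10-day interval within that ID.
df_nested <- lapply(split(df, df$ID), function(d) {
  d <- add_interval(d)
  split(d, d$new)
})

names(df_nested)              # the ID level
names(df_nested[["A"]])[1:3]  # first interval start dates under ID "A"
```

Each df_nested[["A"]][[i]] is then a data frame holding only the rows of ID A that fall in the i-th interval, so no interval ends up under the wrong ID.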