Nested foreach loop changing values in a data frame in R
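I am trying to convert two nested for loops into two nested foreach loops that set values in a data frame wherever a matching condition holds. I believe this can speed up the process considerably. Below is an example of my code: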
library(foreach) # foreach loops, which can run in parallel
library(doMC)    # multicore backend for %dopar%
# set the number of cores to use
registerDoMC(22) # number of CPU cores
file_list <- c("a", "b", "c")
# c() on data frames returns a plain list of their Date columns
ldf <- c(data.frame(Date = c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04")),
         data.frame(Date = c("2016-10-07", "2016-10-08", "2016-10-09")),
         data.frame(Date = c("2016-10-15", "2016-10-16", "2016-10-17", "2016-10-18", "2016-10-19")))
DF <- data.frame(Date = seq(as.POSIXct("2016-10-01", tz = "UTC"), as.POSIXct("2016-10-31", tz = "UTC"), by = 'day'),
                 A = 0,
                 B = 0,
                 C = 0)
DF2 <- DF # DF2 is used to compare my attempt result

# Plain nested for loops: this fills DF as intended
for (i in 1:length(file_list))
{
  Date <- ldf[[i]]
  Date <- as.POSIXct(Date, tz = "UTC")
  for (j in 1:length(Date))
  {
    ROW <- which(DF$Date == Date[j])
    DF[ROW,i+1] <- 1
  }
}

# Nested foreach attempt: DF2 is left unchanged
throwaway <- foreach (i = 1:length(file_list)) %dopar%
{
  Date <- ldf[[i]]
  Date <- as.POSIXct(Date, tz = "UTC")
  foreach (j = 1:length(Date)) %do%
  {
    ROW <- which(DF2$Date == Date[j])
    DF2[ROW,i+1] <- 1
    return(NULL)
  }
}
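file_list is the list of files I am reading in, and ldf is the variable that holds the files once they have been read. Both variables are made up here just to make the example reproducible.

DF is where I want to store the value changes made by the loops, and DF2 is where my foreach attempt stores them. The output I am looking for is the output of DF, but DF2 stays unchanged. I know that foreach loops are designed around their return values, but how can I make those return values line up with the positions in the data frame whose values should change? Those positions are wherever the dates of a file read in from file_list match the dates in DF2. If they match, a 1 is placed at that row (date) and column (file name). Thanks in advance for your help!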
The desired output is:
> DF
Date A B C
1 2016-10-01 1 0 0
2 2016-10-02 1 0 0
3 2016-10-03 1 0 0
4 2016-10-04 1 0 0
5 2016-10-05 0 0 0
6 2016-10-06 0 0 0
7 2016-10-07 0 1 0
8 2016-10-08 0 1 0
9 2016-10-09 0 1 0
10 2016-10-10 0 0 0
11 2016-10-11 0 0 0
12 2016-10-12 0 0 0
13 2016-10-13 0 0 0
14 2016-10-14 0 0 0
15 2016-10-15 0 0 1
16 2016-10-16 0 0 1
17 2016-10-17 0 0 1
18 2016-10-18 0 0 1
19 2016-10-19 0 0 1
20 2016-10-20 0 0 0
21 2016-10-21 0 0 0
22 2016-10-22 0 0 0
23 2016-10-23 0 0 0
24 2016-10-24 0 0 0
25 2016-10-25 0 0 0
26 2016-10-26 0 0 0
27 2016-10-27 0 0 0
28 2016-10-28 0 0 0
29 2016-10-29 0 0 0
30 2016-10-30 0 0 0
31 2016-10-31 0 0 0
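A likely reason DF2 stays unchanged is that each %dopar% iteration runs in a worker (with doMC, a forked child process). An assignment such as DF2[ROW, i+1] <- 1 therefore changes only that worker's copy, which is discarded, and only the value the loop body returns comes back. The sketch below shows the return-then-assemble pattern under a few assumptions. It swaps foreach/doMC for base R's parallel::mclapply, which also forks, so that no extra packages are needed. It also rebuilds ldf as a plain list of date strings and uses an arbitrary two-core setting.

```r
library(parallel)  # ships with R; mclapply() forks worker processes, as doMC does

file_list <- c("a", "b", "c")
# Assumption: ldf as a plain list of date strings, one element per file
ldf <- list(c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04"),
            c("2016-10-07", "2016-10-08", "2016-10-09"),
            c("2016-10-15", "2016-10-16", "2016-10-17", "2016-10-18", "2016-10-19"))

DF2 <- data.frame(Date = seq(as.POSIXct("2016-10-01", tz = "UTC"),
                             as.POSIXct("2016-10-31", tz = "UTC"), by = "day"),
                  A = 0, B = 0, C = 0)

# Forking is not available on Windows, so fall back to a single core there
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker RETURNS one 0/1 column for its file; an assignment such as
# DF2[ROW, i + 1] <- 1 inside a worker would only change the worker's copy
cols <- mclapply(seq_along(file_list), function(i) {
  dates <- as.POSIXct(ldf[[i]], tz = "UTC")
  as.numeric(DF2$Date %in% dates)
}, mc.cores = n_cores)

# Back in the parent process, write the returned columns into place
DF2[, 1 + seq_along(file_list)] <- do.call(cbind, cols)
```

With foreach, the same idea would be to return the 0/1 vector from the loop body, pass .combine = cbind to foreach(), and assign the combined matrix into DF2 outside the loop. Replacing the inner which() loop with a single %in% test also removes most of the per-date work, which may matter more than the number of cores.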
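Consider using no loops at all, and instead apply Reduce() with merge() across all the data frames in a list. However, you need to set up your data frames and list slightly differently.

First, add the sequential Date data frame as the first element of the list. Then, in each file you read in, add a second column corresponding to A, B or C, equal to 1. This can be done inside the lapply or for loop used during the read-in; it is hard-coded below for demonstration. Altogether, all.equal shows that the result matches the original DF exactly: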
# INITIALIZE LIST WITH DATE SEQUENCE DF
newldf <- list(data.frame(Date = as.factor(seq(as.POSIXct("2016-10-01", tz = "UTC"),
as.POSIXct("2016-10-31", tz = "UTC"),
by = 'day'))))
# APPEND LIST OF DATA FRAMES THAT ARE READ IN, EACH WITH SECOND COL = 1
newldf <- append(newldf,
list(data.frame(Date = c("2016-10-01", "2016-10-02",
"2016-10-03", "2016-10-04"), A = 1),
data.frame(Date = c("2016-10-07", "2016-10-08",
"2016-10-09"), B = 1),
data.frame(Date = c("2016-10-15", "2016-10-16",
"2016-10-17", "2016-10-18", "2016-10-19"), C=1)))
# MERGE ALL DATA FRAMES TOGETHER (all = TRUE KEEPS EVERY DATE)
newDF <- Reduce(function(...) merge(..., by = "Date", all = TRUE), newldf)
newDF[is.na(newDF)] <- 0 # CONVERT NAs TO ZEROs
newDF$Date <- as.POSIXct(newDF$Date, tz = "UTC") # CONVERT DATE TO POSIXct
str(newDF)
# 'data.frame': 31 obs. of 4 variables:
# $ Date: POSIXct, format: "2016-10-01" "2016-10-02" ...
# $ A : num 1 1 1 0 0 0 0 0 0 0 ...
# $ B : num 0 0 0 0 0 0 1 1 1 0 ...
# $ C : num 0 0 0 0 0 0 0 0 0 0 ...
str(DF)
# 'data.frame': 31 obs. of 4 variables:
# $ Date: POSIXct, format: "2016-10-01" "2016-10-02" ...
# $ A : num 1 1 1 0 0 0 0 0 0 0 ...
# $ B : num 0 0 0 0 0 0 1 1 1 0 ...
# $ C : num 0 0 0 0 0 0 0 0 0 0 ...
all.equal(DF, newDF)
# [1] TRUE
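The answer says the flag column can be added during the read-in loop but does not show that step. Here is a minimal sketch of it. It assumes the read-in yields a list of data frames with a character Date column, and that each flag column should be named after the upper-cased file name. raw_ldf is a hypothetical stand-in for data that would come from read.csv or similar. The calendar is built from Date values instead of a factor, and Map() tags each data frame before the same Reduce()/merge() step.

```r
file_list <- c("a", "b", "c")
# Hypothetical stand-in for the data frames read from file_list
raw_ldf <- list(
  data.frame(Date = c("2016-10-01", "2016-10-02", "2016-10-03", "2016-10-04")),
  data.frame(Date = c("2016-10-07", "2016-10-08", "2016-10-09")),
  data.frame(Date = c("2016-10-15", "2016-10-16", "2016-10-17", "2016-10-18", "2016-10-19"))
)

# Add a flag column named after each file (A, B, C), equal to 1
flagged <- Map(function(d, nm) { d[[nm]] <- 1; d }, raw_ldf, toupper(file_list))

# Put the full daily calendar first so merge(all = TRUE) keeps every date
calendar <- data.frame(Date = format(seq(as.Date("2016-10-01"),
                                         as.Date("2016-10-31"), by = "day")))
newldf <- c(list(calendar), flagged)

# Merge pairwise; dates missing from a file get NA, which is then set to 0
newDF <- Reduce(function(x, y) merge(x, y, by = "Date", all = TRUE), newldf)
newDF[is.na(newDF)] <- 0
newDF$Date <- as.POSIXct(newDF$Date, tz = "UTC")
```

Because merge() sorts on the key by default and the dates are zero-padded ISO strings, the rows come out in chronological order.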