How to add additional rows to an imported csv?
I am currently loading several csv files into R in the following form:
read.csv("Cashflows2.csv", header = FALSE)
V1 V2
1 Date Payments
2 18/08/2017 -20495*
3 18/04/2018 639.76*
4 18/05/2018 639.76
5 18/06/2018 639.76
6 18/07/2018 639.76
7 18/08/2018 639.76
8 18/09/2018 639.76
9 18/10/2018 639.76
10 18/11/2018 639.76*
11 18/05/2019 639.76*
12 18/06/2019 639.76
13 18/07/2019 639.76
14 18/08/2019 639.76
15 18/09/2019 639.76
16 18/10/2019 639.76
17 18/11/2019 639.76
18 18/12/2019 639.76
19 18/01/2020 639.76
20 18/02/2020 639.76
21 18/03/2020 639.76
22 18/04/2020 639.76
23 18/05/2020 639.76
24 18/06/2020 639.76
25 18/07/2020 639.76
26 18/08/2020 639.76
27 18/09/2020 639.76
28 18/10/2020 639.76
29 18/11/2020 639.76
30 18/12/2020 639.76
31 18/01/2021 639.76
32 18/02/2021 639.76
33 18/03/2021 639.76
34 18/04/2021 639.76
35 18/05/2021 639.76
36 18/06/2021 639.76
37 18/07/2021 734.76
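However, as the asterisks show (they are not in the csv file), there are two periods with no payments. Is there a function in R that can turn this csv file into the following form: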
read.csv("Cashflows2.csv", header = FALSE)
V1 V2
1 Date Payment
2 18/08/2017 -20495
3 18/09/2017 0
4 18/10/2017 0
5 18/11/2017 0
6 18/12/2017 0
7 18/01/2018 0
8 18/02/2018 0
9 18/03/2018 0
10 18/04/2018 639.76
11 18/05/2018 639.76
12 18/06/2018 639.76
13 18/07/2018 639.76
14 18/08/2018 639.76
15 18/09/2018 639.76
16 18/10/2018 639.76
17 18/11/2018 639.76
18 18/12/2018 0
19 18/01/2019 0
20 18/02/2019 0
21 18/03/2019 0
22 18/04/2019 0
23 18/05/2019 639.76
24 18/06/2019 639.76
25 18/07/2019 639.76
26 18/08/2019 639.76
27 18/09/2019 639.76
28 18/10/2019 639.76
29 18/11/2019 639.76
30 18/12/2019 639.76
31 18/01/2020 639.76
32 18/02/2020 639.76
33 18/03/2020 639.76
34 18/04/2020 639.76
35 18/05/2020 639.76
36 18/06/2020 639.76
37 18/07/2020 639.76
38 18/08/2020 639.76
39 18/09/2020 639.76
40 18/10/2020 639.76
41 18/11/2020 639.76
42 18/12/2020 639.76
43 18/01/2021 639.76
44 18/02/2021 639.76
45 18/03/2021 639.76
46 18/04/2021 639.76
47 18/05/2021 639.76
48 18/06/2021 639.76
49 18/07/2021 734.76
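Not all of the csv files have this problem, so ideally the function would work on several similar csv files, not all of which have zero-payment periods.
Any help would be appreciated.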
dput(df)
structure(list(V1 = structure(c(37L, 22L, 7L, 10L, 14L, 18L,
23L, 26L, 29L, 32L, 11L, 15L, 19L, 24L, 27L, 30L, 33L, 35L, 1L,
3L, 5L, 8L, 12L, 16L, 20L, 25L, 28L, 31L, 34L, 36L, 2L, 4L, 6L,
9L, 13L, 17L, 21L), .Label = c("18/01/2020", "18/01/2021", "18/02/2020",
"18/02/2021", "18/03/2020", "18/03/2021", "18/04/2018", "18/04/2020",
"18/04/2021", "18/05/2018", "18/05/2019", "18/05/2020", "18/05/2021",
"18/06/2018", "18/06/2019", "18/06/2020", "18/06/2021", "18/07/2018",
"18/07/2019", "18/07/2020", "18/07/2021", "18/08/2017", "18/08/2018",
"18/08/2019", "18/08/2020", "18/09/2018", "18/09/2019", "18/09/2020",
"18/10/2018", "18/10/2019", "18/10/2020", "18/11/2018", "18/11/2019",
"18/11/2020", "18/12/2019", "18/12/2020", "Date"), class = "factor"),
V2 = structure(c(4L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L), .Label = c("-20495",
"639.76", "734.76", "Payment"), class = "factor")), class = "data.frame", row.names = c(NA,
-37L))
We can read the data with header = TRUE, convert the Date column to actual Date objects, and then use tidyr::complete.
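df <- read.csv("Cashflows2.csv", header = TRUE)
library(dplyr)

df %>%
  # parse the dd/mm/yyyy text into real Date objects
  mutate(Date = as.Date(Date, "%d/%m/%Y")) %>%
  # add every missing month between the first and last date, with Payments = 0
  tidyr::complete(Date = seq(min(Date), max(Date), by = "1 month"),
                  fill = list(Payments = 0))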
# A tibble: 48 x 2
# Date Payments
# <date> <dbl>
# 1 2017-08-18 -20495
# 2 2017-09-18 0
# 3 2017-10-18 0
# 4 2017-11-18 0
# 5 2017-12-18 0
# 6 2018-01-18 0
# 7 2018-02-18 0
# 8 2018-03-18 0
# 9 2018-04-18 640.
#10 2018-05-18 640.
# … with 38 more rows
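Both answers hinge on the monthly grid built with seq(min(Date), max(Date), by = "1 month"). A minimal base-R sketch, using the first and last dates from the question, of what that grid looks like:

```r
# First and last payment dates from the question (dd/mm/yyyy in the csv)
first <- as.Date("18/08/2017", format = "%d/%m/%Y")
last  <- as.Date("18/07/2021", format = "%d/%m/%Y")

# One date per month, keeping the day of the month fixed
grid <- seq(first, last, by = "1 month")

length(grid)                       # 48 monthly periods
unique(format(grid, "%d"))         # "18": every date stays on the 18th
head(format(grid, "%d/%m/%Y"), 3)  # "18/08/2017" "18/09/2017" "18/10/2017"
```

Because the payments in the question fall on the 18th, every grid date lines up with a date in the file, so complete() or merge() only adds the missing months. R's seq.Date documentation notes that stepping by month from a day that does not exist in the target month (for example the 31st) is counted forward into the next month, so files with payments on the 29th to 31st may need a different grid.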
In base R, you can create a new data frame with one row per month between the min and max of Date, merge it with the data on Date, and replace the NAs with 0.
df$Date <- as.Date(df$Date, "%d/%m/%Y")
# one row per month from the first to the last date
compare_df <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "1 month"))
# left join: months without a payment get NA
df1 <- merge(compare_df, df, by = "Date", all.x = TRUE)
df1$Payments[is.na(df1$Payments)] <- 0
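To check the base R approach end to end without the csv file, here is a sketch that rebuilds the question's 36 data rows in code (the helper name monthly() is made up for this example) and then runs the same four steps:

```r
# Hypothetical helper: consecutive monthly dates between two ISO dates
monthly <- function(from, to) seq(as.Date(from), as.Date(to), by = "1 month")

# The question's data: an up-front payment, then two runs of monthly payments
# with gaps from Sep 2017 to Mar 2018 and from Dec 2018 to Apr 2019
df <- data.frame(
  Date = format(c(as.Date("2017-08-18"),
                  monthly("2018-04-18", "2018-11-18"),
                  monthly("2019-05-18", "2021-07-18")), "%d/%m/%Y"),
  Payments = c(-20495, rep(639.76, 34), 734.76)
)

# Same steps as above
df$Date <- as.Date(df$Date, "%d/%m/%Y")
compare_df <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "1 month"))
df1 <- merge(compare_df, df, by = "Date", all.x = TRUE)
df1$Payments[is.na(df1$Payments)] <- 0

nrow(df1)               # 48 rows, one per month
sum(df1$Payments == 0)  # 12 months filled with 0
```

merge() sorts the result by the key column by default, so df1 comes back in date order, matching the layout the question asks for.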
To apply this to several csv files, we can turn it into a function and use lapply to apply it to a list of data frames:
read_fun <- function(df) {
  df$Date <- as.Date(df$Date, "%d/%m/%Y")
  compare_df <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "1 month"))
  df1 <- merge(compare_df, df, by = "Date", all.x = TRUE)
  df1$Payments[is.na(df1$Payments)] <- 0
  df1
}

# assumes all the csv files are in the working directory
list_df <- lapply(list.files(pattern = "\\.csv$"), read.csv, header = TRUE)
list_df <- lapply(list_df, read_fun)
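A self-contained sketch of the multi-file case, assuming all the csv files sit in one folder. Two small hypothetical files are written to a temporary folder, one with a gap and one without, to show that a file without missing months keeps the same rows:

```r
# Hypothetical setup: two example files in a temporary folder
dir <- file.path(tempdir(), "cashflows")
dir.create(dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("18/01/2020", "18/04/2020"), Payments = c(100, 100)),
          file.path(dir, "with_gap.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("18/01/2020", "18/02/2020"), Payments = c(50, 50)),
          file.path(dir, "no_gap.csv"), row.names = FALSE)

# Same helper as in the answer: fill every missing month with a zero payment
read_fun <- function(df) {
  df$Date <- as.Date(df$Date, "%d/%m/%Y")
  compare_df <- data.frame(Date = seq(min(df$Date), max(df$Date), by = "1 month"))
  df1 <- merge(compare_df, df, by = "Date", all.x = TRUE)
  df1$Payments[is.na(df1$Payments)] <- 0
  df1
}

# Read every csv in the folder with header = TRUE, keep the file names,
# then apply the helper to each data frame
files   <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
list_df <- lapply(files, read.csv, header = TRUE)
names(list_df) <- basename(files)
list_df <- lapply(list_df, read_fun)

sapply(list_df, nrow)  # no_gap.csv: 2 rows, with_gap.csv: 4 rows
```

The file names and values are made up; in practice, point list.files() at the folder that holds your real csv files.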
You should read the data with header = TRUE in read.csv, since your file has column names.
my_data <- read.csv("Cashflows2.csv", header = TRUE)
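To see why header = TRUE matters, a small sketch that reads two rows of the question's file from an inline string (read.csv(text = ...) stands in for the real file here):

```r
# Two rows of Cashflows2.csv, inlined for the example
csv_text <- "Date,Payments
18/08/2017,-20495
18/04/2018,639.76"

no_header   <- read.csv(text = csv_text, header = FALSE)
with_header <- read.csv(text = csv_text, header = TRUE)

nrow(no_header)              # 3: the column-name line is read as a data row
class(no_header$V2)          # text ("character", or "factor" before R 4.0)
names(with_header)           # "Date" "Payments"
class(with_header$Payments)  # "numeric"
```

With header = FALSE the payment column contains the word Payments, so it cannot be read as numbers; that is why the dput() output in the question shows both columns as factors.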
Then you can convert the date column into a proper Date column with
my_data$Date <- as.Date(my_data$Date, format = "%d/%m/%Y")
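Converting to Date is not only cosmetic: min() and max(), which both answers use to bound the monthly sequence, compare plain text character by character. A short base-R sketch with three dates from the question:

```r
dates_chr <- c("18/08/2017", "18/04/2018", "18/01/2020")

# As text, "18/01/2020" comes first because only the characters are compared
min(dates_chr)

# As Date objects, the comparison is chronological
dates <- as.Date(dates_chr, format = "%d/%m/%Y")
min(dates)  # "2017-08-18"
```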
Then, I think the simplest way to solve your task is the following. It requires the tidyr package (install it with install.packages("tidyr")):
# assign the result so the completed data is what gets saved later
my_data <- tidyr::complete(my_data, Date = seq.Date(min(Date), max(Date), by = "month"),
                           fill = list(Payments = 0))
my_data
# A tibble: 48 x 2
# Date Payments
# <date> <dbl>
# 1 2017-08-18 -20495
# 2 2017-09-18 0
# 3 2017-10-18 0
# 4 2017-11-18 0
# 5 2017-12-18 0
# 6 2018-01-18 0
# 7 2018-02-18 0
# 8 2018-03-18 0
# 9 2018-04-18 640.
# 10 2018-05-18 640.
# ... with 38 more rows
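This call takes your data and expands the date sequence, starting at the earliest date in your data and ending at the latest, in steps of one month. The Payments column is filled with zeros for the months that were missing.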
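If you also want to know which months a given file is missing (for example, to see which of your files actually have zero-payment periods), the same monthly grid can be compared against the dates in the file. A base-R sketch; missing_months() is a hypothetical helper name:

```r
# Hypothetical helper: months in the full monthly grid that are absent from `dates`
missing_months <- function(dates) {
  grid <- seq(min(dates), max(dates), by = "1 month")
  grid[!grid %in% dates]
}

with_gap <- as.Date(c("18/08/2017", "18/04/2018", "18/05/2018"), format = "%d/%m/%Y")
no_gap   <- as.Date(c("18/01/2020", "18/02/2020", "18/03/2020"), format = "%d/%m/%Y")

format(missing_months(with_gap), "%d/%m/%Y")  # seven months, 18/09/2017 to 18/03/2018
length(missing_months(no_gap))                # 0: nothing would be added
```

These are exactly the rows that complete() adds with Payments = 0, so an empty result means the file passes through with the same rows.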
You can save the updated data with
write.csv(my_data, "Cashflows2_updated.csv", row.names = FALSE)
If you want to restore the previous date format, run
my_data$Date <- format(my_data$Date, format = "%d/%m/%Y")
before saving the file.
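A sketch of that last step as a round trip, using a small hypothetical my_data and a file in tempdir() instead of your working directory. Passing row.names = FALSE to write.csv() keeps R's row numbers out of the file; without it they are written as an extra unnamed first column, which read.csv() reads back as a column called X:

```r
# Hypothetical completed data in the shape produced above (Date column of class Date)
my_data <- data.frame(
  Date = seq(as.Date("2017-08-18"), by = "1 month", length.out = 3),
  Payments = c(-20495, 0, 0)
)

# Restore the dd/mm/yyyy text format, then write without the row-name column
my_data$Date <- format(my_data$Date, format = "%d/%m/%Y")
out_file <- file.path(tempdir(), "Cashflows2_updated.csv")
write.csv(my_data, out_file, row.names = FALSE)

# Read the file back to check the columns and the date format
back <- read.csv(out_file)
names(back)    # "Date" "Payments"
back$Date[1]   # "18/08/2017"
```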