Merge multiple columns with X repeating attributes into X columns
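I have a data frame like the one below, with columns grouped by month (enero, febrero, marzo, etc.). Each row corresponds to a value that I need to extract from the time series. The length of each Month/Caudal pair depends on the number of days in that month.
In addition, as in the original dataset, each Month/Caudal pair is separated by an empty column of NAs.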
           enero Caudal  X        febrero Caudal.1 X.1          marzo Caudal.2 X.2
1 1/1/2003 00:15      - NA 1/2/2003 00:15        -  NA 1/3/2003 00:15     1.68  NA
2 1/1/2003 00:30      - NA 1/2/2003 00:30        -  NA 1/3/2003 00:30     1.69  NA
3 1/1/2003 00:45      - NA 1/2/2003 00:45        -  NA 1/3/2003 00:45     1.68  NA
4 1/1/2003 01:00      - NA 1/2/2003 01:00        -  NA 1/3/2003 01:00     1.68  NA
5 1/1/2003 01:15      - NA 1/2/2003 01:15        -  NA 1/3/2003 01:15     1.68  NA
6 1/1/2003 01:30      - NA 1/2/2003 01:30        -  NA 1/3/2003 01:30     1.68  NA
The result I want is a time series with only two columns: Date and Caudal.
             Date Caudal
 1 1/1/2003 00:15      -
 2 1/1/2003 00:30      -
 3 1/1/2003 00:45      -
 4 1/1/2003 01:00      -
 5 1/1/2003 01:15      -
 6 1/1/2003 01:30      -
 7 1/2/2003 00:15      -
 8 1/2/2003 00:30      -
 9 1/2/2003 00:45      -
10 1/2/2003 01:00      -
11 1/2/2003 01:15      -
12 1/2/2003 01:30      -
13 1/3/2003 00:15   1.68
14 1/3/2003 00:30   1.69
15 1/3/2003 00:45   1.68
16 1/3/2003 01:00   1.68
17 1/3/2003 01:15   1.68
18 1/3/2003 01:30   1.68
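I need to do this for 40 .txt files that all have exactly the same format. How can I set this up so that all of my files end up combined into one continuous df?
Sample data: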
df <- structure(list(enero = c("1/1/2003 00:15", "1/1/2003 00:30",
"1/1/2003 00:45", "1/1/2003 01:00", "1/1/2003 01:15", "1/1/2003 01:30"
), Caudal = c(" - ", " - ", " - ", " - ", " - ", " - "
), X = c(NA, NA, NA, NA, NA, NA), febrero = c("1/2/2003 00:15",
"1/2/2003 00:30", "1/2/2003 00:45", "1/2/2003 01:00", "1/2/2003 01:15",
"1/2/2003 01:30"), Caudal.1 = c(" - ", " - ", " - ", " - ",
" - ", " - "), X.1 = c(NA, NA, NA, NA, NA, NA), marzo = c("1/3/2003 00:15",
"1/3/2003 00:30", "1/3/2003 00:45", "1/3/2003 01:00", "1/3/2003 01:15",
"1/3/2003 01:30"), Caudal.2 = c(" 1.68 ", " 1.69 ", " 1.68 ",
" 1.68 ", " 1.68 ", " 1.68 "), X.2 = c(NA, NA, NA, NA, NA, NA
)), row.names = c(NA, 6L), class = "data.frame")
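The simplest approach is to drop the empty columns first and then rename the two sets of columns (i.e., Date and Caudal). After that, we can pivot to long format, using _ as the names separator.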
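library(tidyverse)

df %>%
  # Drop the empty NA separator columns (X, X.1, X.2, ...)
  select(-starts_with("X")) %>%
  # Rename the month columns to Date_1, Date_2, ...
  rename_with(~paste0("Date_", seq_along(.)),
              -starts_with("Caudal")) %>%
  # Rename the flow columns to Caudal_1, Caudal_2, ...
  rename_with(~paste0("Caudal_", seq_along(.)),
              starts_with("Caudal")) %>%
  # Stack each Date_i/Caudal_i pair; ".value" keeps Date and Caudal as columns
  pivot_longer(everything(),
               names_to = c(".value", "time"),
               names_sep = "_",
               values_drop_na = TRUE) %>%
  select(-time) %>%
  # Date is still character here, so this sorts as text, not chronologically
  arrange(Date)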
Output
   Date           Caudal
   <chr>          <chr>
 1 1/1/2003 00:15 " - "
 2 1/1/2003 00:30 " - "
 3 1/1/2003 00:45 " - "
 4 1/1/2003 01:00 " - "
 5 1/1/2003 01:15 " - "
 6 1/1/2003 01:30 " - "
 7 1/2/2003 00:15 " - "
 8 1/2/2003 00:30 " - "
 9 1/2/2003 00:45 " - "
10 1/2/2003 01:00 " - "
11 1/2/2003 01:15 " - "
12 1/2/2003 01:30 " - "
13 1/3/2003 00:15 " 1.68 "
14 1/3/2003 00:30 " 1.69 "
15 1/3/2003 00:45 " 1.68 "
16 1/3/2003 01:00 " 1.68 "
17 1/3/2003 01:15 " 1.68 "
18 1/3/2003 01:30 " 1.68 "
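For the 40 files, one possible way to extend this is to wrap the reshaping in a function and apply it to every file. The following is only a sketch under several assumptions that are not in the question: that the files are tab-separated with a header row and can be read with read.delim(), that the dates are day/month/year, and that a "-" in Caudal means a missing value. The names reshape_caudal, read_caudal_file, sample_df and demo_dir are made up for illustration, and the demo writes two small temporary files so that it can run on its own.

```r
library(tidyverse)

# Hypothetical helper: reshape one wide data frame into Date/Caudal rows.
reshape_caudal <- function(df) {
  df %>%
    select(-starts_with("X")) %>%
    # Make every column character so numeric and text Caudal columns can be stacked
    mutate(across(everything(), as.character)) %>%
    rename_with(~ paste0("Date_", seq_along(.)), -starts_with("Caudal")) %>%
    rename_with(~ paste0("Caudal_", seq_along(.)), starts_with("Caudal")) %>%
    pivot_longer(everything(),
                 names_to = c(".value", "time"),
                 names_sep = "_",
                 values_drop_na = TRUE) %>%
    select(-time) %>%
    mutate(
      Date   = lubridate::dmy_hm(Date),          # assumption: day/month/year hour:minute
      Caudal = na_if(str_trim(Caudal), "-"),     # assumption: "-" marks a missing value
      Caudal = as.numeric(Caudal)
    ) %>%
    arrange(Date)                                # now sorts chronologically
}

# Hypothetical reader. Assumption: tab-separated text with a header row;
# adjust sep, skip, encoding, etc. to match the real files.
read_caudal_file <- function(path) {
  read.delim(path, stringsAsFactors = FALSE, na.strings = c("NA", "")) %>%
    reshape_caudal()
}

# Demo data: two small files in a temporary folder (stand-ins for the 40 real files)
sample_df <- data.frame(
  enero    = c("1/1/2003 00:15", "1/1/2003 00:30"),
  Caudal   = c(" - ", " - "),
  X        = c(NA, NA),
  febrero  = c("1/2/2003 00:15", "1/2/2003 00:30"),
  Caudal.1 = c(" 1.68 ", " 1.69 "),
  X.1      = c(NA, NA)
)
sample_2004 <- sample_df %>%
  mutate(across(c(enero, febrero), ~ sub("2003", "2004", .x)))

demo_dir <- file.path(tempdir(), "caudal_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.table(sample_df,   file.path(demo_dir, "a.txt"), sep = "\t", row.names = FALSE)
write.table(sample_2004, file.path(demo_dir, "b.txt"), sep = "\t", row.names = FALSE)

# Read every .txt file, reshape each one, and stack them into a single df
files <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
all_caudal <- files %>%
  map(read_caudal_file) %>%
  bind_rows() %>%
  arrange(Date)

print(all_caudal)
```

With the real data, pointing list.files() at the folder that holds the 40 files should be the only change needed, provided the read.delim() assumptions hold. Converting Date with dmy_hm() before arrange() matters once a month has more than nine days, because as text "10/1/2003" sorts before "2/1/2003".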