R语言将数据框转换成特定格式
R Language to convert the data frame into specific format
我有以下数据框,我正在尝试将其转换为特定格式。 新项目及其数量 每天添加为列。我想自动检测 R 语言 中的所有项目及其数量,并转换为提到的预期输出。
我的数据框:(每天添加数量的新项目作为新列)
id date a1_item a1_quantity a2_item a2_quantity
z1 2022-02-28 name1 10 name11 20
z1 2021-10-31 name2 20 name21 30
z2 2021-12-31 name3 10 name31 40
r3 2021-10-31 name4 40 name41 10
r4 2021-06-30 name5 30 name51 05
r5 2021-08-31 name6 10 name61 40
我的预期输出:
id date item quanity
z1 2022-02-28 name1 10
z1 2022-02-28 name11 20
z1 2021-10-31 name2 20
z1 2021-10-31 name21 30
z2 2021-12-31 name3 10
z2 2021-12-31 name31 40
r3 2021-10-31 name4 40
r3 2021-10-31 name41 10
r4 2021-06-30 name5 30
r4 2021-06-30 name51 05
r5 2021-08-31 name6 10
r5 2021-08-31 name61 40
我们可以使用 pivot_longer
和 names_pattern
来捕获子串作为一个组
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(cols = contains("_"), names_to = c(".value"),
names_pattern = ".*_(.*)")
-输出
# A tibble: 12 × 4
id date item quantity
<chr> <chr> <chr> <int>
1 z1 2022-02-28 name1 10
2 z1 2022-02-28 name11 20
3 z1 2021-10-31 name2 20
4 z1 2021-10-31 name21 30
5 z2 2021-12-31 name3 10
6 z2 2021-12-31 name31 40
7 r3 2021-10-31 name4 40
8 r3 2021-10-31 name41 10
9 r4 2021-06-30 name5 30
10 r4 2021-06-30 name51 5
11 r5 2021-08-31 name6 10
12 r5 2021-08-31 name61 40
数据
df1 <- structure(list(id = c("z1", "z1", "z2", "r3", "r4", "r5"),
date = c("2022-02-28",
"2021-10-31", "2021-12-31", "2021-10-31", "2021-06-30", "2021-08-31"
), a1_item = c("name1", "name2", "name3", "name4", "name5", "name6"
), a1_quantity = c(10L, 20L, 10L, 40L, 30L, 10L), a2_item = c("name11",
"name21", "name31", "name41", "name51", "name61"), a2_quantity = c(20L,
30L, 40L, 10L, 5L, 40L)), class = "data.frame", row.names = c(NA,
-6L))
我有以下数据框,我正在尝试将其转换为特定格式。 新项目及其数量 每天添加为列。我想自动检测 R 语言 中的所有项目及其数量,并转换为提到的预期输出。
我的数据框:(每天添加数量的新项目作为新列)
id date a1_item a1_quantity a2_item a2_quantity
z1 2022-02-28 name1 10 name11 20
z1 2021-10-31 name2 20 name21 30
z2 2021-12-31 name3 10 name31 40
r3 2021-10-31 name4 40 name41 10
r4 2021-06-30 name5 30 name51 05
r5 2021-08-31 name6 10 name61 40
我的预期输出:
id date item quanity
z1 2022-02-28 name1 10
z1 2022-02-28 name11 20
z1 2021-10-31 name2 20
z1 2021-10-31 name21 30
z2 2021-12-31 name3 10
z2 2021-12-31 name31 40
r3 2021-10-31 name4 40
r3 2021-10-31 name41 10
r4 2021-06-30 name5 30
r4 2021-06-30 name51 05
r5 2021-08-31 name6 10
r5 2021-08-31 name61 40
我们可以使用 pivot_longer
和 names_pattern
来捕获子串作为一个组
library(dplyr)
library(tidyr)
df1 %>%
pivot_longer(cols = contains("_"), names_to = c(".value"),
names_pattern = ".*_(.*)")
-输出
# A tibble: 12 × 4
id date item quantity
<chr> <chr> <chr> <int>
1 z1 2022-02-28 name1 10
2 z1 2022-02-28 name11 20
3 z1 2021-10-31 name2 20
4 z1 2021-10-31 name21 30
5 z2 2021-12-31 name3 10
6 z2 2021-12-31 name31 40
7 r3 2021-10-31 name4 40
8 r3 2021-10-31 name41 10
9 r4 2021-06-30 name5 30
10 r4 2021-06-30 name51 5
11 r5 2021-08-31 name6 10
12 r5 2021-08-31 name61 40
数据
df1 <- structure(list(id = c("z1", "z1", "z2", "r3", "r4", "r5"),
date = c("2022-02-28",
"2021-10-31", "2021-12-31", "2021-10-31", "2021-06-30", "2021-08-31"
), a1_item = c("name1", "name2", "name3", "name4", "name5", "name6"
), a1_quantity = c(10L, 20L, 10L, 40L, 30L, 10L), a2_item = c("name11",
"name21", "name31", "name41", "name51", "name61"), a2_quantity = c(20L,
30L, 40L, 10L, 5L, 40L)), class = "data.frame", row.names = c(NA,
-6L))