Pivot_wider with multiple (parallel) columns
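Suppose I have the following data (df), which contains company data broken down by region. df is in wide format: WC19600 and WC19610 describe the location, while WC19601 and WC19611 contain the sales data. The second-to-last digit of these variable names indicates the segment level.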
# A tibble: 2 x 6
  NAME      ISIN         WC19600       WC19610           WC19601   WC19611
  <chr>     <chr>        <chr>         <chr>               <dbl>     <dbl>
1 APPLE     US0378331005 United States Other Foreign   109197000 125010000
2 MICROSOFT US5949181045 United States Other countries  83953000  84135000
My goal is to get the data into a longer format, such as
# A tibble: 4 x 5
  NAME      ISIN          segm region              sales
  <chr>     <chr>        <dbl> <chr>               <dbl>
1 APPLE     US0378331005     0 United States   109197000
2 APPLE     US0378331005     1 Other Foreign   125010000
3 MICROSOFT US5949181045     0 United States    83953000
4 MICROSOFT US5949181045     1 Other countries  84135000
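What I tried
I tried the following lines, but what I actually need is to pivot only once, merging the output on segment level and ending up with two columns for desc.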
df %>%
  tidyr::pivot_longer(
    c(WC19600, WC19610),
    names_pattern = "WC196(\\d)0",
    names_to = "segm",
    values_to = "region"
  ) %>%
  tidyr::pivot_longer(
    c(WC19601, WC19611),
    names_pattern = "WC196(\\d)1",
    names_to = "segm",
    values_to = "sales",
    names_repair = "minimal"
  )
# A tibble: 8 x 6
  NAME      ISIN         segm  region          segm      sales
  <chr>     <chr>        <chr> <chr>           <chr>     <dbl>
1 APPLE     US0378331005 0     United States   0     109197000
2 APPLE     US0378331005 0     United States   1     125010000
3 APPLE     US0378331005 1     Other Foreign   0     109197000
4 APPLE     US0378331005 1     Other Foreign   1     125010000
5 MICROSOFT US5949181045 0     United States   0      83953000
6 MICROSOFT US5949181045 0     United States   1      84135000
7 MICROSOFT US5949181045 1     Other countries 0      83953000
8 MICROSOFT US5949181045 1     Other countries 1      84135000
Data
# input data
df <- tibble::tribble(
  ~NAME, ~ISIN, ~WC19600, ~WC19610, ~WC19601, ~WC19611,
  "APPLE", "US0378331005", "United States", "Other Foreign", 109197000, 125010000,
  "MICROSOFT", "US5949181045", "United States", "Other countries", 83953000, 84135000
)
# desired result
expected <- tibble::tribble(
  ~NAME, ~ISIN, ~segm, ~region, ~sales,
  "APPLE", "US0378331005", 0, "United States", 109197000,
  "APPLE", "US0378331005", 1, "Other Foreign", 125010000,
  "MICROSOFT", "US5949181045", 0, "United States", 83953000,
  "MICROSOFT", "US5949181045", 1, "Other countries", 84135000
)
library(dplyr)
library(tidyr)
pivot_longer(
  df, -c(NAME, ISIN),
  names_pattern = "(.*)([0-9])$", names_to = c("segm", ".value")
) %>%
  rename(region = "0", sales = "1")
# # A tibble: 4 x 5
#   NAME      ISIN         segm   region              sales
#   <chr>     <chr>        <chr>  <chr>               <dbl>
# 1 APPLE     US0378331005 WC1960 United States   109197000
# 2 APPLE     US0378331005 WC1961 Other Foreign   125010000
# 3 MICROSOFT US5949181045 WC1960 United States    83953000
# 4 MICROSOFT US5949181045 WC1961 Other countries  84135000
(If you need or want to, you can add %>% mutate(segm = gsub(".*(.)$", "\\1", segm)) to clean up segm.)
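As a variation on the approach above, the segment digit can be captured directly by the regular expression, so that no gsub() clean-up step is needed afterwards. This is only a sketch: it assumes every pivoted column follows the WC196<segment><type> naming seen in df, that tidyr is version 1.1.0 or later (for names_transform), and that an integer segm is acceptable. The name result is introduced here purely for illustration.

```r
library(dplyr)
library(tidyr)

# Same input data as in the question
df <- tibble::tribble(
  ~NAME, ~ISIN, ~WC19600, ~WC19610, ~WC19601, ~WC19611,
  "APPLE", "US0378331005", "United States", "Other Foreign", 109197000, 125010000,
  "MICROSOFT", "US5949181045", "United States", "Other countries", 83953000, 84135000
)

result <- df %>%
  pivot_longer(
    -c(NAME, ISIN),
    # first group: segment level; second group: 0 = region, 1 = sales
    names_pattern = "WC196(\\d)(\\d)",
    names_to = c("segm", ".value"),
    # convert the captured segment digit from character to integer
    names_transform = list(segm = as.integer)
  ) %>%
  # the ".value" columns are named after the last digit, so rename them
  rename(region = `0`, sales = `1`)

print(result)
```

If segm has to be a double, as in the desired output, as.numeric can be used in place of as.integer.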