Long to wide data
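I've been trying to reshape some data from long to wide format, with one row per unique ID. To mimic my requirements, I've created a sample input and the desired output, shown below:
Input: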
id date size category name type
124 3.1 1 fruit apple royalGala
327 1.1 0 veg chilli green
124 2.1 2 fruit apple green
124 1.2 1 fruit apple jazz
124 2.2 2 fruit apple eve
124 2.1 3 fruit apple pinkLady
327 1.2 1 veg chilli red
327 1.2 2 veg chilli Jalapeño
327 1.2 3 veg chilli bananaPepper
327 3.3 1 veg chilli fresnoPepper
Desired output:
id fruit_apple_royalGala_date fruit_apple_royalGala_size fruit_apple_green_date fruit_apple_green_size fruit_apple_jazz_date fruit_apple_jazz_size fruit_apple_eve_date fruit_apple_eve_size fruit_apple_pinkLady_date fruit_apple_pinkLady_size veg_chilli_green_date veg_chilli_green_size veg_chilli_red_date veg_chilli_red_size veg_chilli_Jalapeño_date veg_chilli_Jalapeño_size veg_chilli_bananaPepper_date veg_chilli_bananaPepper_size veg_chilli_fresnoPepper_date veg_chilli_fresnoPepper_size
124 3.1 1 2.1 2 1.2 1 2.2 2 2.1 3 NA NA NA NA NA NA NA NA NA NA
327 NA NA NA NA NA NA NA NA NA NA 1.1 0 1.2 1 1.2 2 1.2 3 3.3 1
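I'm not sure how to get the desired output. I've looked at some related questions on Stack Overflow, but none of them helped me solve this, e.g. "Convert data from long format to wide format with multiple measure columns" and "Reshape multiple value columns to wide format".
I've been working on this since yesterday, but with little experience of gather(), spread() and the like, I haven't been able to solve it. Any help would be greatly appreciated.
Thanks!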
dput()
For convenience, I've also included the dput() output:
structure(list(
id = c(124L, 327L, 124L, 124L, 124L, 124L, 327L, 327L, 327L, 327L),
date = c(3.1, 1.1, 2.1, 1.2, 2.2, 2.1, 1.2, 1.2, 1.2, 3.3),
size = c(1L, 0L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L),
category = c("fruit", "veg", "fruit", "fruit", "fruit", "fruit", "veg", "veg", "veg", "veg"),
name = c("apple", "chilli", "apple", "apple", "apple", "apple", "chilli", "chilli", "chilli", "chilli"),
type = c("royalGala", "green", "green", "jazz", "eve", "pinkLady", "red", "Jalapeño", "bananaPepper", "fresnoPepper")),
class = "data.frame", row.names = c(NA, -10L))
Partial solution
I have an approach to this, but it fails when I run it on my original data.
# Packages: tidyr provides unite() and re-exports %>%; data.table provides setDT() and dcast()
library(tidyr)
library(data.table)
# Read the csv file
df <- read.csv("C:/Desktop/test.csv")
# Unite multiple columns into one
df_unite <- df %>%
  unite("info", category:type, remove = TRUE)
# Conversion from long into wide format
setDT(df_unite) # coerce to data.table
df_wide <- dcast(df_unite, id ~ info,
                 value.var = c("date", "size"))
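The post doesn't say how the code fails on the original data. One frequent cause, and only an assumption here, is that some id/category/name/type combination occurs more than once, so a single output cell would need several values (dcast() then falls back to counting with length(), and pivot_wider() warns and produces list-columns). A minimal base-R sketch to list such rows; `find_duplicate_keys` is a hypothetical helper name, shown on the sample data and on a copy with one deliberately duplicated key:

```r
# Sample data from the question (data.frame() instead of dput() for brevity)
df <- data.frame(
  id = c(124L, 327L, 124L, 124L, 124L, 124L, 327L, 327L, 327L, 327L),
  date = c(3.1, 1.1, 2.1, 1.2, 2.2, 2.1, 1.2, 1.2, 1.2, 3.3),
  size = c(1L, 0L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L),
  category = c("fruit", "veg", "fruit", "fruit", "fruit", "fruit", "veg", "veg", "veg", "veg"),
  name = c("apple", "chilli", "apple", "apple", "apple", "apple", "chilli", "chilli", "chilli", "chilli"),
  type = c("royalGala", "green", "green", "jazz", "eve", "pinkLady", "red", "Jalapeño", "bananaPepper", "fresnoPepper"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return every row whose key columns are not unique.
# duplicated() marks the 2nd, 3rd, ... copies; fromLast = TRUE also marks the
# first copy, so OR-ing both flags selects all rows sharing a duplicated key.
find_duplicate_keys <- function(data, keys = c("id", "category", "name", "type")) {
  k <- data[keys]
  data[duplicated(k) | duplicated(k, fromLast = TRUE), , drop = FALSE]
}

find_duplicate_keys(df)        # 0 rows: the sample data is safe to widen

# Append a copy of row 1 with a different date to simulate a clash
df_clash <- rbind(df, transform(df[1, ], date = 9.9))
find_duplicate_keys(df_clash)  # the original royalGala row and its copy
```

If this returns rows on the real data, they need to be resolved (dropped, or aggregated with an explicit function) before widening.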
Answer from @Onyambu
library(tidyr)
df %>%
  pivot_wider(id_cols = id,
              names_from = c(category, name, type),
              values_from = c(date, size),
              names_glue = '{category}_{name}_{type}_{.value}')
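For readers who would rather avoid extra packages, base R's stats::reshape() can produce the same one-row-per-id layout; this is a swapped-in technique, not part of the answer above. The sketch pastes the three descriptor columns into one key, widens on it, and renames `date_<key>` to `<key>_date`. Because reshape() creates the new columns in order of first appearance of the key, ordering the rows by category first makes the columns come out in the same order as the desired output:

```r
# Sample data from the question
df <- data.frame(
  id = c(124L, 327L, 124L, 124L, 124L, 124L, 327L, 327L, 327L, 327L),
  date = c(3.1, 1.1, 2.1, 1.2, 2.2, 2.1, 1.2, 1.2, 1.2, 3.3),
  size = c(1L, 0L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L),
  category = c("fruit", "veg", "fruit", "fruit", "fruit", "fruit", "veg", "veg", "veg", "veg"),
  name = c("apple", "chilli", "apple", "apple", "apple", "apple", "chilli", "chilli", "chilli", "chilli"),
  type = c("royalGala", "green", "green", "jazz", "eve", "pinkLady", "red", "Jalapeño", "bananaPepper", "fresnoPepper"),
  stringsAsFactors = FALSE
)

# Put fruit rows before veg rows; order() is stable, so the original row
# order is kept within each category.
df <- df[order(df$category), ]

# One key per measurement, e.g. "fruit_apple_jazz"
df$info <- paste(df$category, df$name, df$type, sep = "_")

# Widen: one row per id, a date_<info> and size_<info> column per key.
# Keys an id never has (e.g. 124 has no chilli rows) become NA.
wide <- reshape(
  df[c("id", "info", "date", "size")],
  idvar = "id", timevar = "info",
  v.names = c("date", "size"),
  direction = "wide", sep = "_"
)

# Rename "date_fruit_apple_jazz" -> "fruit_apple_jazz_date" (likewise for size)
names(wide) <- sub("^(date|size)_(.*)$", "\\2_\\1", names(wide))
rownames(wide) <- NULL
wide
```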
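Yes, you can sort the widened columns so that all the dates come first, followed by all the sizes. In fact pivot_wider() already groups them that way by default; adding names_sort = TRUE to @Onyambu's call above additionally orders the columns within each group alphabetically by category/name/type, rather than by order of first appearance. It looks like this: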
df %>%
  pivot_wider(id_cols = id,
              names_from = c(category, name, type),
              values_from = c(date, size),
              names_glue = '{category}_{name}_{type}_{.value}',
              names_sort = TRUE)
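The desired output in the question actually interleaves each combination's date and size columns (fruit_apple_royalGala_date, fruit_apple_royalGala_size, ...), whereas pivot_wider() puts all date columns before all size columns by default. A sketch, assuming tidyr 1.2.0 or later (the release that added the `names_vary` argument): `names_vary = "slowest"` switches to the interleaved order, and sorting the input rows by category first reproduces the question's column order, since without names_sort the columns follow order of first appearance:

```r
# Assumes tidyr >= 1.2.0 for the names_vary argument
library(tidyr)

# Sample data from the question
df <- data.frame(
  id = c(124L, 327L, 124L, 124L, 124L, 124L, 327L, 327L, 327L, 327L),
  date = c(3.1, 1.1, 2.1, 1.2, 2.2, 2.1, 1.2, 1.2, 1.2, 3.3),
  size = c(1L, 0L, 2L, 1L, 2L, 3L, 1L, 2L, 3L, 1L),
  category = c("fruit", "veg", "fruit", "fruit", "fruit", "fruit", "veg", "veg", "veg", "veg"),
  name = c("apple", "chilli", "apple", "apple", "apple", "apple", "chilli", "chilli", "chilli", "chilli"),
  type = c("royalGala", "green", "green", "jazz", "eve", "pinkLady", "red", "Jalapeño", "bananaPepper", "fresnoPepper"),
  stringsAsFactors = FALSE
)

# Fruit rows first (order() is stable), so the new columns appear in the
# same order as in the desired output.
df <- df[order(df$category), ]

wide <- pivot_wider(
  df,
  id_cols = id,
  names_from = c(category, name, type),
  values_from = c(date, size),
  names_glue = "{category}_{name}_{type}_{.value}",
  names_vary = "slowest"  # <combo>_date, <combo>_size, then the next combo
)
wide
```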