How to combine multiple data frame date columns into one stacked date column in R
I have the following data frame, which records individuals' names, their jobs, and how many hours they worked on a given day:
test_df <- data.frame(Name = c("Mark", "Stacy", "Anthony", "Colette"),
                      Job = c("Bartender", "Bartender", "Host", "Server"),
                      "01-Jan" = c(4, 5, 0, 6),
                      "03-Jan" = c(3, 7, 7, 8),
                      "04-Jan" = c(8, 0, 5, 4),
                      "07-Jan" = c(5, 6, 6, 7),
                      "08-Jan" = c(6, 8, 4, 0))
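The data frame above has a single row per person, followed by one column per day recording how many hours that person worked on that day. I want to switch this around so that each person's name appears in 5 rows (one for each of the 5 days represented in the data frame), followed by a "Date" column and an "Hours" column showing how many hours a person worked on which day, one row at a time. Like this: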
test_df_2 <- data.frame(Name = c("Mark", "Mark", "Mark", "Mark", "Mark",
                                 "Stacy", "Stacy", "Stacy", "Stacy", "Stacy",
                                 "Anthony", "Anthony", "Anthony", "Anthony", "Anthony",
                                 "Colette", "Colette", "Colette", "Colette", "Colette"),
                        Job = c("Bartender", "Bartender", "Bartender", "Bartender", "Bartender",
                                "Bartender", "Bartender", "Bartender", "Bartender", "Bartender",
                                "Host", "Host", "Host", "Host", "Host",
                                "Server", "Server", "Server", "Server", "Server"),
                        Date = c("01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020",
                                 "01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020",
                                 "01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020",
                                 "01-01-2020", "01-03-2020", "01-04-2020", "01-07-2020", "01-08-2020"),
                        Hours = c(4, 3, 8, 5, 6,
                                  5, 7, 0, 6, 8,
                                  0, 7, 5, 6, 4,
                                  6, 8, 4, 7, 0))
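I would like to write a script in R that converts the first data frame into the second, but I'm not sure how to do it. I've looked at other Stack Overflow answers about combining multiple data frame columns into one, but I can't see how to get the rows to repeat the right number of times alongside the corresponding dates.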
You can use tidyr::pivot_longer():
library(tidyr)
pivot_longer(test_df, 3:7, names_to = "Date", values_to = "Hours")
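This creates the output structure you are looking for. After that, you only need to recode the values in the Date column. Because data.frame() converts the original column names to syntactic names by default, those values will read "X01.Jan", "X03.Jan", and so on.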
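The recoding step is not spelled out above, so here is a minimal sketch of one way to do it. It assumes every date falls in 2020 and that the target format is month-day-year, both inferred from the desired output in the question. Because the default check.names = TRUE turns "01-Jan" into "X01.Jan", the day and month are picked out of that string by position. The names long_df, day, and mon are purely illustrative.

```r
library(tidyr)

# Same data as in the question. With the default check.names = TRUE,
# data.frame() rewrites "01-Jan" to the syntactic name "X01.Jan".
test_df <- data.frame(Name = c("Mark", "Stacy", "Anthony", "Colette"),
                      Job = c("Bartender", "Bartender", "Host", "Server"),
                      "01-Jan" = c(4, 5, 0, 6),
                      "03-Jan" = c(3, 7, 7, 8),
                      "04-Jan" = c(8, 0, 5, 4),
                      "07-Jan" = c(5, 6, 6, 7),
                      "08-Jan" = c(6, 8, 4, 0))

# Stack the five day columns into Date/Hours pairs, one row per person per day.
long_df <- pivot_longer(test_df, 3:7, names_to = "Date", values_to = "Hours")

# Recode "X01.Jan" -> "01-01-2020".
# Assumption: the year is 2020 and the output format is MM-DD-YYYY.
day <- substr(long_df$Date, 2, 3)                    # "X01.Jan" -> "01"
mon <- match(substr(long_df$Date, 5, 7), month.abb)  # "Jan" -> 1
long_df$Date <- sprintf("%02d-%s-2020", mon, day)

long_df
```

Matching the month abbreviation against the built-in month.abb constant, rather than parsing with as.Date() and "%b", keeps the result independent of the session's locale. If real Date objects are preferred over strings, the Date column could be converted afterwards with as.Date(long_df$Date, format = "%m-%d-%Y").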
Another option is to use reshape2::melt():
library(reshape2)
melt(test_df, id.vars = c("Name", "Job"), variable.name = "Date", value.name = "Hours")
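One difference worth noting: melt() stacks the measured columns one after another, so its rows come out grouped by date rather than by person. The sketch below reorders them to match the layout in the question. The reordering step is an addition for illustration, not part of the answer above, and the name melted is hypothetical. The Date values still read "X01.Jan" and so on, and would need the same kind of recoding as with pivot_longer().

```r
library(reshape2)

# Same data as in the question.
test_df <- data.frame(Name = c("Mark", "Stacy", "Anthony", "Colette"),
                      Job = c("Bartender", "Bartender", "Host", "Server"),
                      "01-Jan" = c(4, 5, 0, 6),
                      "03-Jan" = c(3, 7, 7, 8),
                      "04-Jan" = c(8, 0, 5, 4),
                      "07-Jan" = c(5, 6, 6, 7),
                      "08-Jan" = c(6, 8, 4, 0))

# melt() keeps Name and Job as identifiers and stacks every other column.
melted <- melt(test_df, id.vars = c("Name", "Job"),
               variable.name = "Date", value.name = "Hours")

# Reorder so each person's five days sit together, people in their original
# order. Date is a factor whose levels follow the original column order,
# so sorting on it keeps the days in sequence.
melted <- melted[order(match(melted$Name, test_df$Name), melted$Date), ]
rownames(melted) <- NULL

melted
```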
That said, tidyr's pivot_longer() appears to be the newer of the two approaches.
This article explains the concepts at work nicely.