How to use the second row in a wide data format as an extra variable name

Question

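I have a data frame in which the first row holds the column names (of course), but the second row holds extra information: a variable I want to use when computing statistics. For example, the first row is the plant ID, the second row is the location, and the remaining rows are the dependent variable over a time series. Note that the first column is my x axis and represents time. Here is some of my data: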

Days L-FCS L-DRC    L-PCH   S-PCH   S-PCH   S-SSV   M-SSV   L-SSV   S-DRC   L-MNS   L-DRC
Room  1-BR-SW 1-BR-SW   1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 2-BR-SE 2-BR-SE 2-BR-SE 
0.00    0   0   0   0   0   0   0   0   0   0   0
0.04    0   0   1   0   0   0   1   0   0   0   0
0.08    0   0   1   0   0   1   2   0   0   0   0
0.13    0   0   -1  0   0   2   3   0   0   0   0
0.17    0   0   -1  0   0   3   4   0   0   0   0
0.21    0   0   -1  0   0   4   5   0   0   0   0
0.25    0   0   -1  0   0   4   6   0   0   0   0
0.29    0   0   -2  0   0   4   6   0   0   0   0
0.33    0   0   -1  0   0   4   6   0   0   0   0
0.38    -1  0   -1  0   0   4   6   0   0   0   0
0.42    -2  0   -1  0   0   4   6   0   0   0   0
0.46    -5  0   -1  0   0   4   6   0   0   0   0
0.50    -5  0   -2  0   0   4   6   0   0   -1  0
0.54    -5  0   -2  0   0   4   6   0   0   -2  0
0.58    -6  0   -3  0   0   4   7   0   0   -3  0
0.63    -8  0   -3  0   0   4   8   0   0   -3  0
0.67    -9  0   -3  0   0   4   8   0   0   -3  0
0.71    -9  0   -3  0   0   4   11  0   -1  -3  0
0.75    -9  0   -3  0   0   4   11  0   -1  -4  0
0.79    -9  0   -3  0   0   4   13  0   -1  -5  0
0.83    -10 0   -3  0   0   4   13  0   -1  -5  0
0.88    -12 0   -3  0   0   4   13  0   -1  -5  0
0.92    -13 0   -4  0   0   4   13  0   -1  -6  0
0.96    -14 0   -5  0   -1  4   13  0   -1  -6  0
1.00    -14 0   -5  0   -1  4   13  0   -1  -6  0
1.04    -15 0   -5  0   -1  4   13  0   -2  -6  0
1.08    -16 0   -5  0   -1  4   13  0   -2  -6  0

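I left out some of the header entries because they do not fit the columns here (the names in the first and second rows are much wider than the numbers).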

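For future use, I would love to know how to use any number of rows as variables for my data. I tried to reshape it to long format (I use the long format of this data for other purposes), but I could not work out how to reshape it so that this extra information ends up in columns as well.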

What I have done so far is drop the second row entirely, so that the first value in the Days column is 0.00.

Here is the dput() output:

structure(list(Days = c("Room", "0.00", "0.04", "0.08", "0.13", 
"0.17"), L.FCS = c("1-BR-SW", "0", "0", "0", "0", "0"), L.DRC = c("1-BR-SW", 
"0", "0", "0", "0", "0"), L.PCH = c("1-BR-SW", "0", "0", "0", 
"0", "0"), S.PCH = c("1-BR-SW", "0", "0", "0", "0", "0"), S.PCH.1 = c("1-BR-SW", 
"0", "0", "0", "0", "0"), S.SSV = c("1-BR-SW", "0", "0", "0", 
"0", "0"), Hodaya_M = c("1-BR-SW", "0", "1", "1", "3", "3"), 
    L.SSV = c("2-BR-SE", "0", "-1", "-1", "-2", "-2"), S.DRC = c("2-BR-SE", 
    "0", "0", "-1", "-1", "-1")), row.names = c(NA, 6L), class = "data.frame")

Answer 1

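I can think of two options for this. First, what sits in the first data row of the data frame is really a label, so it can be stored as one. Second, and less elegantly, you can paste the column names and the first-row text together. Both options are shown below: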

library(tidyverse)

# this is an example dataframe (easier version of what you posted)
df <- tibble(
  time = c('text1', 1:4),
  var1 = c('text2', 2:5),
  var2 = c('text3', 3:6)
)
df

#######################
# OPTION 1: Labelling #
#######################

# load package for label function
library(Hmisc)

# set first row as label
label(df) <- df[1,]

# remove first row with label information (since now set as label)
df <- df[-1,]

print(df)
label(df)

######################################################
# OPTION 2: Pasting together column names and labels #
######################################################

# get text from column names + first row
# NOTE: start from the original df; re-create it if Option 1 already dropped the label row
text1 <- names(df)
text2 <- as.character(df[1,])

# paste together both strings
new_col_header <- paste(text1, text2, sep = "__")

# remove first row with label information
df <- df[-1,]

# rename columns with new info
colnames(df) <- new_col_header

print(df)
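
If you go with Option 2, the pasted headers can be split apart again when reshaping to long format, so the label becomes its own column. This is only a minimal sketch: `df_pasted` is a hypothetical stand-in for the result of Option 2, and the column names `variable`, `label` and `value` are my own choice.

```r
library(tidyr)

# Toy data shaped like the result of Option 2: headers are "<name>__<label>"
df_pasted <- data.frame(
  time__text1 = c(1, 2, 3, 4),
  var1__text2 = c(2, 3, 4, 5),
  var2__text3 = c(3, 4, 5, 6)
)

# Pivot every column except the first (time) and split each header at "__"
df_long <- pivot_longer(
  df_pasted,
  cols = -1,
  names_to = c("variable", "label"),
  names_sep = "__",
  values_to = "value"
)

# Give the time column a plain name again
names(df_long)[1] <- "time"

df_long
```

This relies on the separator ("__" here) not appearing inside the original names or labels; pick a different `sep` in `paste()` if it does.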

Answer 2

I suggest extracting the header information and the data separately, then joining the two.

library(tidyverse)
df1_headers <- df1 %>%
  janitor::clean_names() %>%   # Reformats names and makes them unique
  slice(1) %>%   # only keep the first data row, your "location"
  mutate(across(everything(), as.character)) %>%  # make everything character data
  pivot_longer(everything(), values_to = "location")

df1_body <- df1 %>% 
  janitor::clean_names() %>%
  slice(-1) %>%   # remove only the first row
  mutate(across(everything(), as.numeric),       # make numeric
         row = row_number()) %>%    # add row number in case days not unique / ordered
  pivot_longer(-c(row, days))

df_long <- df1_body %>% left_join(df1_headers, by = "name")

This format should be very flexible for further analysis and visualization, for example:

ggplot(df_long, aes(days, value, color = name)) +
  geom_line() +
  facet_wrap(~location)
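
Since the question mentions applying statistics by location, the same long layout also works with grouped summaries. A rough sketch, assuming a table with the same columns as `df_long` above; `df_long_demo` and its values are made up for illustration, and the summary column names are my own.

```r
library(dplyr)

# Toy stand-in for df_long (hypothetical values, same column layout)
df_long_demo <- tibble(
  days     = rep(c(0, 0.04), times = 3),
  name     = rep(c("l_fcs", "l_pch", "l_ssv"), each = 2),
  value    = c(0, 0, 0, 1, 0, 0),
  location = rep(c("1-BR-SW", "1-BR-SW", "2-BR-SE"), each = 2)
)

# One row per location and time point, averaging over the plants in that location
location_summary <- df_long_demo %>%
  group_by(location, days) %>%
  summarise(
    mean_value = mean(value),
    n_plants   = n(),
    .groups    = "drop"
  )

location_summary
```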

Source data

df1 <- data.frame(
  stringsAsFactors = FALSE,
       check.names = FALSE,
              Days = c("Room","0","0.04","0.08",
                       "0.13","0.17","0.21","0.25","0.29","0.33","0.38",
                       "0.42","0.46","0.5","0.54","0.58","0.63","0.67",
                       "0.71","0.75","0.79","0.83","0.88","0.92","0.96","1",
                       "1.04","1.08"),
           `L-FCS` = c("1-BR-SW","0","0","0","0",
                       "0","0","0","0","0","-1","-2","-5","-5","-5",
                       "-6","-8","-9","-9","-9","-9","-10","-12","-13",
                       "-14","-14","-15","-16"),
           `L-DRC` = c("1-BR-SW","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0","0"),
           `L-PCH` = c("1-BR-SW","0","1","1","-1",
                       "-1","-1","-1","-2","-1","-1","-1","-1","-2",
                       "-2","-3","-3","-3","-3","-3","-3","-3","-3","-4",
                       "-5","-5","-5","-5"),
           `S-PCH` = c("1-BR-SW","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0","0"),
           `S-PCH` = c("1-BR-SW","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","0","0","0","0","0","0","-1","-1","-1",
                       "-1"),
           `S-SSV` = c("1-BR-SW","0","0","1","2",
                       "3","4","4","4","4","4","4","4","4","4","4",
                       "4","4","4","4","4","4","4","4","4","4","4","4"),
           `M-SSV` = c("1-BR-SW","0","1","2","3",
                       "4","5","6","6","6","6","6","6","6","6","7",
                       "8","8","11","11","13","13","13","13","13","13",
                       "13","13"),
           `L-SSV` = c("2-BR-SE","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0","0"),
           `S-DRC` = c("2-BR-SE","0","0","0","0",
                       "0","0","0","0","0","0","0","0","0","0","0",
                       "0","0","-1","-1","-1","-1","-1","-1","-1","-1",
                       "-2","-2"),
           `L-MNS` = c("2-BR-SE","0","0","0","0",
                       "0","0","0","0","0","0","0","0","-1","-2","-3",
                       "-3","-3","-3","-4","-5","-5","-5","-6","-6",
                       "-6","-6","-6"),
           `L-DRC` = c(NA,0L,0L,0L,0L,0L,0L,0L,
                       0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,
                       0L,0L,0L,0L,0L,0L)
)

