如何使用宽数据格式的第二行作为额外的变量名
how to use the second row in a wide data format as an extra variable name
我有一个数据框,其中第一行是列名称(当然),但第二行是一个额外的信息和一个我想对其应用统计信息的变量/例如,第一行是植物的 ID,第二行是位置,其余行是时间序列上的因变量。重要的是要注意第一个列是我的 x 轴,它代表时间。这是我的一些数据:
Days L-FCS L-DRC L-PCH S-PCH S-PCH S-SSV M-SSV L-SSV S-DRC L-MNS L-DRC
Room 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 2-BR-SE 2-BR-SE 2-BR-SE
0.00 0 0 0 0 0 0 0 0 0 0 0
0.04 0 0 1 0 0 0 1 0 0 0 0
0.08 0 0 1 0 0 1 2 0 0 0 0
0.13 0 0 -1 0 0 2 3 0 0 0 0
0.17 0 0 -1 0 0 3 4 0 0 0 0
0.21 0 0 -1 0 0 4 5 0 0 0 0
0.25 0 0 -1 0 0 4 6 0 0 0 0
0.29 0 0 -2 0 0 4 6 0 0 0 0
0.33 0 0 -1 0 0 4 6 0 0 0 0
0.38 -1 0 -1 0 0 4 6 0 0 0 0
0.42 -2 0 -1 0 0 4 6 0 0 0 0
0.46 -5 0 -1 0 0 4 6 0 0 0 0
0.50 -5 0 -2 0 0 4 6 0 0 -1 0
0.54 -5 0 -2 0 0 4 6 0 0 -2 0
0.58 -6 0 -3 0 0 4 7 0 0 -3 0
0.63 -8 0 -3 0 0 4 8 0 0 -3 0
0.67 -9 0 -3 0 0 4 8 0 0 -3 0
0.71 -9 0 -3 0 0 4 11 0 -1 -3 0
0.75 -9 0 -3 0 0 4 11 0 -1 -4 0
0.79 -9 0 -3 0 0 4 13 0 -1 -5 0
0.83 -10 0 -3 0 0 4 13 0 -1 -5 0
0.88 -12 0 -3 0 0 4 13 0 -1 -5 0
0.92 -13 0 -4 0 0 4 13 0 -1 -6 0
0.96 -14 0 -5 0 -1 4 13 0 -1 -6 0
1.00 -14 0 -5 0 -1 4 13 0 -1 -6 0
1.04 -15 0 -5 0 -1 4 13 0 -2 -6 0
1.08 -16 0 -5 0 -1 4 13 0 -2 -6 0
我省略了一些行名称,因为它不适合这里的列(第一行和第二行名称对于数字来说太宽了)
为了将来使用,我很想知道如何使用任何多行作为我的数据的变量。我试图将它重塑为长格式(我将长格式与此数据一起用于其他目的)但我找不到如何重塑它所以我也有这些列。
到目前为止我所做的是完全省略第二行,所以我只有 Days 列,它的第一个调用是 0.00
再次附上dput():
structure(list(Days = c("Room", "0.00", "0.04", "0.08", "0.13",
"0.17"), L.FCS = c("1-BR-SW", "0", "0", "0", "0", "0"), L.DRC = c("1-BR-SW",
"0", "0", "0", "0", "0"), L.PCH = c("1-BR-SW", "0", "0", "0",
"0", "0"), S.PCH = c("1-BR-SW", "0", "0", "0", "0", "0"), S.PCH.1 = c("1-BR-SW",
"0", "0", "0", "0", "0"), S.SSV = c("1-BR-SW", "0", "0", "0",
"0", "0"), Hodaya_M = c("1-BR-SW", "0", "1", "1", "3", "3"),
L.SSV = c("2-BR-SE", "0", "-1", "-1", "-2", "-2"), S.DRC = c("2-BR-SE",
"0", "0", "-1", "-1", "-1")), row.names = c(NA, 6L), class = "data.frame")
我想到了解决此问题的两个选项:首先,数据框第一行中的内容是标签。因此,可以将此信息视为标签。其次,这是不太优雅的解决方案,将第一行的列名和文本标签粘贴在一起。我在下面提供了两个选项:
library(tidyverse)
# this is an example dataframe (easier version of what you posted)
df <- tibble(
time = c('text1', 1:4),
var1 = c('text2', 2:5),
var2 = c('text3', 3:6)
)
df
#######################
# OPTION 1: Labelling #
#######################
# load package for lable function
library(Hmisc)
# set first row as label
label(df) <- df[1,]
# remove first row with label information (since now set as label)
df <- df[-1,]
print(df)
label(df)
######################################################
# OPTION 2: Pasting together column names and labels #
######################################################
# get text from column names + first row
text1 <- names(df)
text2 <- as.character(df[1,])
# paste together both strings
new_col_header <- paste(text1, text2, sep = "__")
# remove first row with label information
df <- df[-1,]
# rename columns with new info
colnames(df) <- new_col_header
print(df)
我建议分别提取 header 信息和数据,然后将两者合并。
library(tidyverse)
df1_headers <- df1 %>%
janitor::clean_names() %>% # Reformats names and makes them unique
slice(1) %>% # only keep the first data row, your "location"
mutate(across(,as.character)) %>% # make everything character data
pivot_longer(everything(), values_to = "location")
df1_body <- df1 %>%
janitor::clean_names() %>%
slice(-1) %>% # remove only the first row
mutate(across(,as.numeric), # make numeric
row = row_number()) %>% # add row number in case days not unique / ordered
pivot_longer(-c(row, days))
df_long <- df1_body %>% left_join(df1_headers)
这种格式对于进一步的分析和可视化应该非常灵活,例如
ggplot(df_long, aes(days, value, color = name)) +
geom_line() +
facet_wrap(~location)
源数据
df1 <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
Days = c("Room","0","0.04","0.08",
"0.13","0.17","0.21","0.25","0.29","0.33","0.38",
"0.42","0.46","0.5","0.54","0.58","0.63","0.67",
"0.71","0.75","0.79","0.83","0.88","0.92","0.96","1",
"1.04","1.08"),
`L-FCS` = c("1-BR-SW","0","0","0","0",
"0","0","0","0","0","-1","-2","-5","-5","-5",
"-6","-8","-9","-9","-9","-9","-10","-12","-13",
"-14","-14","-15","-16"),
`L-DRC` = c("1-BR-SW","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0","0"),
`L-PCH` = c("1-BR-SW","0","1","1","-1",
"-1","-1","-1","-2","-1","-1","-1","-1","-2",
"-2","-3","-3","-3","-3","-3","-3","-3","-3","-4",
"-5","-5","-5","-5"),
`S-PCH` = c("1-BR-SW","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0","0"),
`S-PCH` = c("1-BR-SW","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","0","0","0","0","0","0","-1","-1","-1",
"-1"),
`S-SSV` = c("1-BR-SW","0","0","1","2",
"3","4","4","4","4","4","4","4","4","4","4",
"4","4","4","4","4","4","4","4","4","4","4","4"),
`M-SSV` = c("1-BR-SW","0","1","2","3",
"4","5","6","6","6","6","6","6","6","6","7",
"8","8","11","11","13","13","13","13","13","13",
"13","13"),
`L-SSV` = c("2-BR-SE","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0","0"),
`S-DRC` = c("2-BR-SE","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","-1","-1","-1","-1","-1","-1","-1","-1",
"-2","-2"),
`L-MNS` = c("2-BR-SE","0","0","0","0",
"0","0","0","0","0","0","0","0","-1","-2","-3",
"-3","-3","-3","-4","-5","-5","-5","-6","-6",
"-6","-6","-6"),
`L-DRC` = c(NA,0L,0L,0L,0L,0L,0L,0L,
0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,
0L,0L,0L,0L,0L,0L)
我有一个数据框,其中第一行是列名称(当然),但第二行是一个额外的信息和一个我想对其应用统计信息的变量/例如,第一行是植物的 ID,第二行是位置,其余行是时间序列上的因变量。重要的是要注意第一个列是我的 x 轴,它代表时间。这是我的一些数据:
Days L-FCS L-DRC L-PCH S-PCH S-PCH S-SSV M-SSV L-SSV S-DRC L-MNS L-DRC
Room 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 1-BR-SW 2-BR-SE 2-BR-SE 2-BR-SE
0.00 0 0 0 0 0 0 0 0 0 0 0
0.04 0 0 1 0 0 0 1 0 0 0 0
0.08 0 0 1 0 0 1 2 0 0 0 0
0.13 0 0 -1 0 0 2 3 0 0 0 0
0.17 0 0 -1 0 0 3 4 0 0 0 0
0.21 0 0 -1 0 0 4 5 0 0 0 0
0.25 0 0 -1 0 0 4 6 0 0 0 0
0.29 0 0 -2 0 0 4 6 0 0 0 0
0.33 0 0 -1 0 0 4 6 0 0 0 0
0.38 -1 0 -1 0 0 4 6 0 0 0 0
0.42 -2 0 -1 0 0 4 6 0 0 0 0
0.46 -5 0 -1 0 0 4 6 0 0 0 0
0.50 -5 0 -2 0 0 4 6 0 0 -1 0
0.54 -5 0 -2 0 0 4 6 0 0 -2 0
0.58 -6 0 -3 0 0 4 7 0 0 -3 0
0.63 -8 0 -3 0 0 4 8 0 0 -3 0
0.67 -9 0 -3 0 0 4 8 0 0 -3 0
0.71 -9 0 -3 0 0 4 11 0 -1 -3 0
0.75 -9 0 -3 0 0 4 11 0 -1 -4 0
0.79 -9 0 -3 0 0 4 13 0 -1 -5 0
0.83 -10 0 -3 0 0 4 13 0 -1 -5 0
0.88 -12 0 -3 0 0 4 13 0 -1 -5 0
0.92 -13 0 -4 0 0 4 13 0 -1 -6 0
0.96 -14 0 -5 0 -1 4 13 0 -1 -6 0
1.00 -14 0 -5 0 -1 4 13 0 -1 -6 0
1.04 -15 0 -5 0 -1 4 13 0 -2 -6 0
1.08 -16 0 -5 0 -1 4 13 0 -2 -6 0
我省略了一些行名称,因为它不适合这里的列(第一行和第二行名称对于数字来说太宽了)
为了将来使用,我很想知道如何使用任何多行作为我的数据的变量。我试图将它重塑为长格式(我将长格式与此数据一起用于其他目的)但我找不到如何重塑它所以我也有这些列。
到目前为止我所做的是完全省略第二行,所以我只有 Days 列,它的第一个调用是 0.00
再次附上dput():
structure(list(Days = c("Room", "0.00", "0.04", "0.08", "0.13",
"0.17"), L.FCS = c("1-BR-SW", "0", "0", "0", "0", "0"), L.DRC = c("1-BR-SW",
"0", "0", "0", "0", "0"), L.PCH = c("1-BR-SW", "0", "0", "0",
"0", "0"), S.PCH = c("1-BR-SW", "0", "0", "0", "0", "0"), S.PCH.1 = c("1-BR-SW",
"0", "0", "0", "0", "0"), S.SSV = c("1-BR-SW", "0", "0", "0",
"0", "0"), Hodaya_M = c("1-BR-SW", "0", "1", "1", "3", "3"),
L.SSV = c("2-BR-SE", "0", "-1", "-1", "-2", "-2"), S.DRC = c("2-BR-SE",
"0", "0", "-1", "-1", "-1")), row.names = c(NA, 6L), class = "data.frame")
我想到了解决此问题的两个选项:首先,数据框第一行中的内容是标签。因此,可以将此信息视为标签。其次,这是不太优雅的解决方案,将第一行的列名和文本标签粘贴在一起。我在下面提供了两个选项:
library(tidyverse)
# this is an example dataframe (easier version of what you posted)
df <- tibble(
time = c('text1', 1:4),
var1 = c('text2', 2:5),
var2 = c('text3', 3:6)
)
df
#######################
# OPTION 1: Labelling #
#######################
# load package for lable function
library(Hmisc)
# set first row as label
label(df) <- df[1,]
# remove first row with label information (since now set as label)
df <- df[-1,]
print(df)
label(df)
######################################################
# OPTION 2: Pasting together column names and labels #
######################################################
# get text from column names + first row
text1 <- names(df)
text2 <- as.character(df[1,])
# paste together both strings
new_col_header <- paste(text1, text2, sep = "__")
# remove first row with label information
df <- df[-1,]
# rename columns with new info
colnames(df) <- new_col_header
print(df)
我建议分别提取 header 信息和数据,然后将两者合并。
library(tidyverse)
df1_headers <- df1 %>%
janitor::clean_names() %>% # Reformats names and makes them unique
slice(1) %>% # only keep the first data row, your "location"
mutate(across(,as.character)) %>% # make everything character data
pivot_longer(everything(), values_to = "location")
df1_body <- df1 %>%
janitor::clean_names() %>%
slice(-1) %>% # remove only the first row
mutate(across(,as.numeric), # make numeric
row = row_number()) %>% # add row number in case days not unique / ordered
pivot_longer(-c(row, days))
df_long <- df1_body %>% left_join(df1_headers)
这种格式对于进一步的分析和可视化应该非常灵活,例如
ggplot(df_long, aes(days, value, color = name)) +
geom_line() +
facet_wrap(~location)
源数据
df1 <- data.frame(
stringsAsFactors = FALSE,
check.names = FALSE,
Days = c("Room","0","0.04","0.08",
"0.13","0.17","0.21","0.25","0.29","0.33","0.38",
"0.42","0.46","0.5","0.54","0.58","0.63","0.67",
"0.71","0.75","0.79","0.83","0.88","0.92","0.96","1",
"1.04","1.08"),
`L-FCS` = c("1-BR-SW","0","0","0","0",
"0","0","0","0","0","-1","-2","-5","-5","-5",
"-6","-8","-9","-9","-9","-9","-10","-12","-13",
"-14","-14","-15","-16"),
`L-DRC` = c("1-BR-SW","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0","0"),
`L-PCH` = c("1-BR-SW","0","1","1","-1",
"-1","-1","-1","-2","-1","-1","-1","-1","-2",
"-2","-3","-3","-3","-3","-3","-3","-3","-3","-4",
"-5","-5","-5","-5"),
`S-PCH` = c("1-BR-SW","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0","0"),
`S-PCH` = c("1-BR-SW","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","0","0","0","0","0","0","-1","-1","-1",
"-1"),
`S-SSV` = c("1-BR-SW","0","0","1","2",
"3","4","4","4","4","4","4","4","4","4","4",
"4","4","4","4","4","4","4","4","4","4","4","4"),
`M-SSV` = c("1-BR-SW","0","1","2","3",
"4","5","6","6","6","6","6","6","6","6","7",
"8","8","11","11","13","13","13","13","13","13",
"13","13"),
`L-SSV` = c("2-BR-SE","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0","0"),
`S-DRC` = c("2-BR-SE","0","0","0","0",
"0","0","0","0","0","0","0","0","0","0","0",
"0","0","-1","-1","-1","-1","-1","-1","-1","-1",
"-2","-2"),
`L-MNS` = c("2-BR-SE","0","0","0","0",
"0","0","0","0","0","0","0","0","-1","-2","-3",
"-3","-3","-3","-4","-5","-5","-5","-6","-6",
"-6","-6","-6"),
`L-DRC` = c(NA,0L,0L,0L,0L,0L,0L,0L,
0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,0L,
0L,0L,0L,0L,0L,0L)