如何将长格式的数据帧转换为适当格式的列表?
How to convert a dataframe in long format into a list of an appropriate format?
我有以下长格式的数据框:
我需要将其转换为如下所示的列表:
其中,列表的每个主要元素都是“实例编号”。及其子元素应包含所有相应的参数和值对 - 格式为“参数 X”=“abc”,如您在第二张图片中看到的那样,一个接一个地列出。
是否有任何现有功能可以做到这一点?我真的找不到任何东西。非常感谢任何帮助。
谢谢。
require(data.table)
your_dt <- data.table(your_df)
dt_long <- melt.data.table(your_dt, id.vars='Instance No.')
class(dt_long) # for debugging
dt_long[, strVal:=paste(variable,value, sep = '=')]
result_list <- list()
for (i in unique(dt_long[['Instance No.']])){
result_list[[as.character(i)]] <- dt_long[`Instance No.`==i, strVal]
}
dplyr 解决方案
require(dplyr)
df_original <- data.frame("Instance No." = c(3,3,3,3,5,5,5,2,2,2,2),
"Parameter" = c("age", "workclass", "education", "occupation",
"age", "workclass", "education",
"age", "workclass", "education", "income"),
"Value" = c("Senior", "Private", "HS-grad", "Sales",
"Middle-aged", "Gov", "Hs-grad",
"Middle-aged", "Private", "Masters", "Large"),
check.names = FALSE)
# the split function requires a factor to use as the grouping variable.
# Param_Value will be the properly formated vector
df_modified <- mutate(df_original,
Param_Value = paste0(Parameter, "=", Value))
# drop the parameter and value columns now that the data is contained in Param_Value
df_modified <- select(df_modified,
`Instance No.`,
Param_Value)
# there is now a list containing dataframes with rows grouped by Instance No.
list_format <- split(df_modified,
df_modified$`Instance No.`)
# The Instance No. is still in each dataframe. Loop through each and strip the column.
list_simplified <- lapply(list_format,
select, -`Instance No.`)
# unlist the remaining Param_Value column and drop the names.
list_out <- lapply(list_simplified ,
unlist, use.names = F)
现在应该有一个按要求格式化的向量列表。
$`2`
[1] "age=Middle-aged" "workclass=Private" "education=Masters" "income=Large"
$`3`
[1] "age=Senior" "workclass=Private" "education=HS-grad" "occupation=Sales"
$`5`
[1] "age=Middle-aged" "workclass=Gov" "education=Hs-grad"
发布的data.table解决方案更快,但我认为这更容易理解。
仅供参考。这是执行此操作的 R base oneliner。 df
是你的数据框。
l <- lapply(split(df, list(df["Instance No."])),
function(x) paste0(x$Parameter, "=", x$Value))
我有以下长格式的数据框:
我需要将其转换为如下所示的列表:
其中,列表的每个主要元素都是“实例编号”。及其子元素应包含所有相应的参数和值对 - 格式为“参数 X”=“abc”,如您在第二张图片中看到的那样,一个接一个地列出。
是否有任何现有功能可以做到这一点?我真的找不到任何东西。非常感谢任何帮助。
谢谢。
require(data.table)
your_dt <- data.table(your_df)
dt_long <- melt.data.table(your_dt, id.vars='Instance No.')
class(dt_long) # for debugging
dt_long[, strVal:=paste(variable,value, sep = '=')]
result_list <- list()
for (i in unique(dt_long[['Instance No.']])){
result_list[[as.character(i)]] <- dt_long[`Instance No.`==i, strVal]
}
dplyr 解决方案
require(dplyr)
df_original <- data.frame("Instance No." = c(3,3,3,3,5,5,5,2,2,2,2),
"Parameter" = c("age", "workclass", "education", "occupation",
"age", "workclass", "education",
"age", "workclass", "education", "income"),
"Value" = c("Senior", "Private", "HS-grad", "Sales",
"Middle-aged", "Gov", "Hs-grad",
"Middle-aged", "Private", "Masters", "Large"),
check.names = FALSE)
# the split function requires a factor to use as the grouping variable.
# Param_Value will be the properly formated vector
df_modified <- mutate(df_original,
Param_Value = paste0(Parameter, "=", Value))
# drop the parameter and value columns now that the data is contained in Param_Value
df_modified <- select(df_modified,
`Instance No.`,
Param_Value)
# there is now a list containing dataframes with rows grouped by Instance No.
list_format <- split(df_modified,
df_modified$`Instance No.`)
# The Instance No. is still in each dataframe. Loop through each and strip the column.
list_simplified <- lapply(list_format,
select, -`Instance No.`)
# unlist the remaining Param_Value column and drop the names.
list_out <- lapply(list_simplified ,
unlist, use.names = F)
现在应该有一个按要求格式化的向量列表。
$`2`
[1] "age=Middle-aged" "workclass=Private" "education=Masters" "income=Large"
$`3`
[1] "age=Senior" "workclass=Private" "education=HS-grad" "occupation=Sales"
$`5`
[1] "age=Middle-aged" "workclass=Gov" "education=Hs-grad"
发布的data.table解决方案更快,但我认为这更容易理解。
仅供参考。这是执行此操作的 R base oneliner。 df
是你的数据框。
l <- lapply(split(df, list(df["Instance No."])),
function(x) paste0(x$Parameter, "=", x$Value))