Attach the name of each dataframe as a column after looping over a list of dataframes in R
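I created a list of dataframes. I need to loop over them, filter out what I need, and save the result as a single file.
However, I need to know which file each value came from.
Each dataframe has a name such as Plastic Chair 1111, Wooden Chair 3950, Table 6909, etc., and they are stored in a list called "listed" with the following structure: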
listed[1]
Material_ID ABC Key.Figure W01 W02 W03
46548970 A Actuals 1048 564 548
46548970 A Forecasted 848 500 590
18969856 A Actuals 358 1500 900
18969856 A Forecasted 460 1602 1000
listed[2]
Material_ID ABC Key.Figure W01 W02 W03
24564897 A Actuals 1258 444 798
26548970 A Forecasted 1345 500 850
34879856 A Actuals 985 1020 980
15486856 A Forecasted 846 1064 1100
What I would like to obtain is:
Group name Group Code Material_ID ABC Key.Figure W01 W02 W03
Plastic Chair 1111 46548970 A Actuals 1048 564 548
Plastic Chair 1111 18969856 A Actuals 358 1500 900
Wooden Chair 3950 24564897 A Actuals 1258 444 798
Wooden Chair 3950 34879856 A Actuals 985 1020 980
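Is it possible to create these two columns on the left using the dataframe names?
Thank you very much for your help!
In case you need a better picture of the situation, here is my code.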
library(openxlsx)
library(dplyr)
library(purrr)
# read the data
filename = 'Dataset.xlsx'
wb <- loadWorkbook(filename)
# get a list of the spreadsheets in the excel file
sheetNames <- sheets(wb)
sheetNames <- make.names(sheetNames)
# create an empty list
listed <- list()
# assign each spreadsheet as a dataframe inside a list
for(i in 1:length(sheetNames))
{
listed[[i]] <- assign(sheetNames[i],readWorkbook(wb,sheet = i))
print(paste0("read the ", i," file")) # here it says what it's doing
}
# remove variable Sales.Org.ID (assign the result, otherwise it is discarded)
listed <- map(listed, ~ .x %>% select(-Sales.Org.ID))
# filter the dataframes to only show rows with Key.Figure = "Actual Totals"
list_actuals <- lapply(listed, function(x) x %>%
  filter(Key.Figure == "Actual Totals"))
# put the result in a single dataframe
result_actuals = do.call(rbind,list_actuals)
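I think it would help to simplify the code a bit. For example, don't change the sheet names with make.names first and then loop over the sheet numbers to import. Instead, import the data using the unchanged sheet names, and change the names afterwards if you need to. You could also try map_df instead of lapply followed by rbind. It's not as specialized as the purrr::map_dfr suggested in the comments, but it's easier to see what's going on. In the example code below, I use mutate inside map_df to insert the name into each dataframe, and map_df then combines them.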
library(openxlsx)
library(dplyr)
library(purrr)
# read the data
filename = 'Dataset.xlsx'
wb <- loadWorkbook(filename)
wb %>%
sheets() %>%
# read all of the sheets, put the sheet name in a new column
map_df(~readWorkbook(wb, sheet = .x) %>% mutate(group_name = .x)) %>%
# remove variable Sales.Org.ID
select(-Sales.Org.ID) %>%
# filter the dataframes to only show rows with Key.Figure = "Actual Totals"
filter( Key.Figure == "Actual Totals") %>%
# if you still want to change the names taken from the sheet names
mutate(group_name = make.names(group_name))
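The question asks for two columns (a group name and a group code) rather than one. Here is a minimal sketch of how that split might be done, assuming every sheet name ends in a space followed by a numeric code, as in "Plastic Chair 1111". The small in-memory list stands in for the sheets read from the workbook, and the column names Group_name and Group_code are made up for illustration. It uses "Actuals" as the filter value because that is what the sample data contains.

```r
library(dplyr)

# Hypothetical stand-ins for two imported sheets; the list names play the
# role of the sheet names ("<group name> <numeric code>").
listed <- list(
  "Plastic Chair 1111" = data.frame(
    Material_ID = c(46548970, 46548970),
    Key.Figure  = c("Actuals", "Forecasted"),
    W01         = c(1048, 848)
  ),
  "Wooden Chair 3950" = data.frame(
    Material_ID = c(24564897, 26548970),
    Key.Figure  = c("Actuals", "Forecasted"),
    W01         = c(1258, 1345)
  )
)

result <- listed %>%
  # stack the dataframes; .id stores each list name in a new column
  bind_rows(.id = "sheet_name") %>%
  filter(Key.Figure == "Actuals") %>%
  mutate(
    # everything before the trailing number
    Group_name = sub("\\s+\\d+$", "", sheet_name),
    # the trailing number itself (kept as character)
    Group_code = sub("^.*\\s+(\\d+)$", "\\1", sheet_name)
  ) %>%
  select(-sheet_name) %>%
  # move the two new columns to the left
  select(Group_name, Group_code, everything())

print(result)
```

If the list has no names (as in the for loop from the question), something like names(listed) <- sheets(wb) before bind_rows should give the same starting point. Sheet names that don't follow the "name code" pattern would need a different regular expression.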