R: Conditional formatting across Excel files
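I am trying to highlight rows of an Excel file based on a match with a column in a separate Excel file. Essentially, I want to highlight a row in file1 if a cell in that row matches a cell in file2.
I saw that the R package "conditionalFormatting" has some of this functionality, but I can't figure out how to use it.
I think the pseudocode would look something like this: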
file1 <- read_excel("file1")
file2 <- read_excel("file2")
conditionalFormatting(file1, sheet = 1, cols = 1:end, rows = 1:22,
rule = "number in file1 is found in a specific column of file 2")
Please let me know if this makes sense or if I need to clarify anything.
Thanks!
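The conditionalFormatting() function (from openxlsx) embeds live conditional formatting rules in the Excel document, which is probably more complex than you need for a one-time highlight. I suggest loading each file into a data frame, determining which rows contain a matching cell, creating a highlight style (yellow background), loading the file as a workbook object, applying the highlight style to the appropriate rows, and saving the updated workbook object.
The following function determines which rows have a match. The magrittr package provides the %>% pipe, and the data.table package provides the transpose() function.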
library(magrittr)  # provides the %>% pipe used below

find_matched_rows <- function(df1, df2) {
  # A data frame is stored as a list of columns, which makes searching by
  # column much easier and faster than searching by row. Transpose the file1
  # data frame so that each of its rows becomes a column.
  df1_transposed <- data.table::transpose(df1)
  # Assuming the location of the match in the second file is irrelevant,
  # unlist the file2 data frame so that each value in file1 can be looked up
  # in a single vector.
  df2_as_vector <- unlist(df2, use.names = FALSE)
  # Drop NA so that a blank cell in file1 is not treated as matching a blank
  # cell in file2.
  df2_as_vector <- df2_as_vector[!is.na(df2_as_vector)]
  # Determine which transposed columns (original rows) contain a match. A row
  # with one or more matches is marked TRUE in the output vector.
  match_map <- lapply(df1_transposed, FUN = `%in%`, df2_as_vector) %>%
    as.data.frame(stringsAsFactors = FALSE) %>%
    sapply(any)
  # Use the logical match_map vector to pick out the matching row numbers.
  matched_rows <- seq_len(nrow(df1))[match_map]
  return(matched_rows)
}
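For comparison, the same lookup can be sketched in base R without transposing, by testing each column of the first data frame and then combining the results row by row. The helper name find_matched_rows_by_column() and its df2_cols argument are hypothetical and not part of the answer above; df2_cols is only an assumption about how the search could be limited to a specific column of file2, as the question's pseudocode describes.

```r
# Hypothetical helper: returns the row numbers of df1 that contain at least
# one value found in df2, optionally limited to the df2 columns in df2_cols.
find_matched_rows_by_column <- function(df1, df2, df2_cols = names(df2)) {
  # Collect the lookup values as character so numbers and text compare alike.
  lookup <- unlist(lapply(df2[df2_cols], as.character), use.names = FALSE)
  # Ignore blank cells, which are read in as NA.
  lookup <- lookup[!is.na(lookup)]
  # One logical vector per df1 column: TRUE where the cell is in the lookup.
  hits <- lapply(df1, function(col) as.character(col) %in% lookup)
  # A row matches when any of its columns matched.
  which(Reduce(`|`, hits, logical(nrow(df1))))
}

# Small example data, shaped like the test data used in this answer.
example_df1 <- data.frame(fname = c("John", "Bob", "Bill"),
                          lname = c("Smith", "Johnson", "Samson"),
                          wage = c(10, 15.23, 137.38),
                          stringsAsFactors = FALSE)
example_df2 <- data.frame(a = c(10, 34, 284.2),
                          b = c("Billy", "Bill", "Billy-Bob"),
                          c = c("Samson", "Johansson", NA),
                          stringsAsFactors = FALSE)

rows_any_column <- find_matched_rows_by_column(example_df1, example_df2)
rows_column_a <- find_matched_rows_by_column(example_df1, example_df2,
                                             df2_cols = "a")
```

Here rows_any_column holds rows 1 and 3, while rows_column_a holds only row 1, because the only value shared with column a is the wage of 10.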
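The following code loads the data, finds the matching rows, applies the highlighting, and saves over the original file1.xlsx. The second pair of tst_df1 and tst_df2 assignments provides a simple way to test the find_matched_rows() function. As expected, it finds that the first and third rows of the first data frame each contain a cell that matches a cell in the second data frame.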
# Excel row that holds the header in each file. Used to make sure the correct
# rows are highlighted: unlike Excel, a data frame does not count the header
# as a row of its own.
file1_header_row <- 1
file2_header_row <- 1
tst_df1 <- openxlsx::read.xlsx("./file1.xlsx",
                               startRow = file1_header_row)
tst_df2 <- openxlsx::read.xlsx("./file2.xlsx",
                               startRow = file2_header_row)
# Example data for testing. These assignments replace the data frames read
# above, so remove them when working with the real files.
tst_df1 <- data.frame(fname = c("John", "Bob", "Bill"),
                      lname = c("Smith", "Johnson", "Samson"),
                      wage = c(10, 15.23, 137.38),
                      stringsAsFactors = FALSE)
tst_df2 <- data.frame(a = c(10, 34, 284.2),
                      b = c("Billy", "Bill", "Billy-Bob"),
                      c = c("Samson", "Johansson", NA),
                      stringsAsFactors = FALSE)
df_matched_rows <- find_matched_rows(tst_df1, tst_df2)
# Any color name listed by colours() can be used here, or a hex color
# beginning with "#".
highlight_style <- openxlsx::createStyle(fgFill = "yellow")
file1_wb <- openxlsx::loadWorkbook(file = "./file1.xlsx")
openxlsx::addStyle(wb = file1_wb,
                   sheet = 1,
                   style = highlight_style,
                   rows = file1_header_row + df_matched_rows,
                   cols = seq_len(ncol(tst_df1)),
                   stack = TRUE,
                   gridExpand = TRUE)
openxlsx::saveWorkbook(wb = file1_wb,
                       file = "./file1.xlsx",
                       overwrite = TRUE)
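The row arithmetic passed to addStyle() is easy to get wrong when the header is not on the first sheet row, so here is a minimal sketch of the mapping. The helper to_excel_rows() is hypothetical and not part of openxlsx. It assumes the data rows follow the header row contiguously. That assumption can fail because read.xlsx() skips empty rows by default (skipEmptyRows = TRUE), which shifts the numbering.

```r
# Hypothetical helper: converts data frame row numbers into sheet row numbers,
# given the sheet row that holds the header.
to_excel_rows <- function(df_rows, header_row = 1) {
  stopifnot(header_row >= 1, all(df_rows >= 1))
  # Data frame row 1 sits directly below the header row on the sheet.
  as.integer(header_row + df_rows)
}

rows_header_on_1 <- to_excel_rows(c(1, 3))                  # sheet rows 2 and 4
rows_header_on_3 <- to_excel_rows(c(1, 3), header_row = 3)  # sheet rows 4 and 6
```

If the sheet may contain blank rows, reading it with skipEmptyRows = FALSE should keep the data frame rows aligned with the sheet rows.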