Filter data highlighted in Excel by cell fill color using openxlsx
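I have a large Excel table (18k rows and 400 columns) in which some rows are highlighted in different colors. Is there a way to filter rows by color using openxlsx?
I started by loading the workbook: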
library(openxlsx)
wb <- loadWorkbook(file = "Items Comparison.xlsx")
getStyles(wb)
df <- read.xlsx(wb, sheet = 1)
I can see the styles used in the workbook with getStyles(wb), but I am not sure how to use that information to filter the cells in each column by color.
[[1]]
A custom cell style.
Cell formatting: GENERAL
Font name: Tahoma
Font size: 9
Font colour: #FFFFFF
Font decoration: BOLD
Cell borders: Top: thin, Bottom: thin, Left: thin, Right: thin
Cell border colours: #4E648A, #4E648A, #4E648A, #4E648A
Cell vert. align: top
Cell fill foreground: rgb: #384C70
Cell fill background: rgb: #384C70
wraptext: TRUE
[[2]]
A custom cell style.
Cell formatting: GENERAL
Font name: Tahoma
Font size: 9
Font colour: #FFFFFF
Font decoration: BOLD
Cell borders: Top: thin, Bottom: thin, Left: thin, Right: thin
Cell border colours: #4E648A, #4E648A, #4E648A, #4E648A
Cell vert. align: top
Cell fill foreground: rgb: #384C70
Cell fill background: rgb: #384C70
wraptext: TRUE
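How can I filter the data by fill color?
Update
Following @Henrik's solution, I tried his code but kept getting errors. To understand what was happening, I printed the output of x$style$fill$fillFg: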
rgb
"FF384C70"
rgb
"FF384C70"
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
rgb
"FF384C70"
NULL
NULL
NULL
rgb
"FFFFFF00"
rgb
"FFFFFF00"
theme
"0"
theme
"0"
rgb
"FFFFFF00"
NULL
theme
"2"
theme tint
"4" "0.79998168889431442"
theme
"8"
theme
"8"
rgb
"FFFFC000"
rgb
"FFFFC000"
theme tint
"5" "0.39997558519241921"
theme tint
"5" "0.39997558519241921"
theme tint
"9" "0.39997558519241921"
theme tint
"5" "0.79998168889431442"
rgb
"FFFFFF00"
rgb
"FF384C70"
NULL
NULL
NULL
rgb
"FF384C70"
rgb
"FF384C70"
[[1]]
rgb
"FF384C70"
[[2]]
rgb
"FF384C70"
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
[[7]]
NULL
[[8]]
NULL
[[9]]
NULL
[[10]]
NULL
[[11]]
NULL
[[12]]
NULL
[[13]]
rgb
"FF384C70"
[[14]]
NULL
[[15]]
NULL
[[16]]
NULL
[[17]]
rgb
"FFFFFF00"
[[18]]
rgb
"FFFFFF00"
[[19]]
theme
"0"
[[20]]
theme
"0"
[[21]]
rgb
"FFFFFF00"
[[22]]
NULL
[[23]]
theme
"2"
[[24]]
theme tint
"4" "0.79998168889431442"
[[25]]
theme
"8"
[[26]]
theme
"8"
[[27]]
rgb
"FFFFC000"
[[28]]
rgb
"FFFFC000"
[[29]]
theme tint
"5" "0.39997558519241921"
[[30]]
theme tint
"5" "0.39997558519241921"
[[31]]
theme tint
"9" "0.39997558519241921"
[[32]]
theme tint
"5" "0.79998168889431442"
[[33]]
rgb
"FFFFFF00"
[[34]]
rgb
"FF384C70"
[[35]]
NULL
[[36]]
NULL
[[37]]
NULL
[[38]]
rgb
"FF384C70"
[[39]]
rgb
"FF384C70"
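I am still confused about why there are only 39 items. The total number of rows varies, but it is not 39. I also do not understand the operation: does it go by row or by column?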
library(tidyxl)
formats <- xlsx_formats( "./temp/test_file.xlsx" )
cells <- xlsx_cells( "./temp/test_file.xlsx" )
#what colors are used?
formats$local$fill$patternFill$fgColor$rgb
# [1] NA "FFC00000" "FF00B0F0" NA
#find rows of cells with red background
cells[ cells$local_format_id %in%
which( formats$local$fill$patternFill$fgColor$rgb == "FFC00000"),
"row" ]
# [1] 1
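The lookup above works because local_format_id is an index into the vectors returned by xlsx_formats. A minimal base-R sketch of that index mapping follows; fill_rgb and the cells data frame are made-up stand-ins for the tidyxl output, not data from the real file.

```r
# Stand-in for formats$local$fill$patternFill$fgColor$rgb (hypothetical values).
fill_rgb <- c(NA, "FFC00000", "FF00B0F0", NA)

# Stand-in for the relevant columns of xlsx_cells() output (hypothetical values).
cells <- data.frame(
  row = c(1L, 1L, 2L, 3L),
  col = c(1L, 2L, 1L, 1L),
  local_format_id = c(2L, 2L, 1L, 3L)
)

# which() drops the NA comparisons, leaving the ids of formats with a red fill.
red_ids <- which(fill_rgb == "FFC00000")

# Keep each sheet row once, even if several of its cells are red.
red_rows <- unique(cells$row[cells$local_format_id %in% red_ids])
print(red_rows)
```

Taking unique() of the result avoids repeated row numbers when more than one cell in a row carries the color.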
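In your workbook object, you will find the styleObjects element. There you can find the fill color (style$fill$fillFg) and the rows element. Loop over the style objects (lapply), check whether the color is the desired one (e.g. red, "FFFF0000"; x$style$fill$fillFg == "FFFF0000"), and grab the row index (x$rows[1]).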
wb <- loadWorkbook(file = "foo.xlsx")
unlist(lapply(wb$styleObjects, function(x){
x$rows[1][x$style$fill$fillFg == "FFFF0000"]}))
# [1] 3
If the colored cells are not contiguous, you may need to grab both the rows and the columns:
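l = lapply(wb$styleObjects, function(x){
  # identical() is FALSE for NULL and theme-based fills, so if() never gets a zero-length condition
  if(identical(unname(x$style$fill$fillFg), "FFFF0000")){
    data.frame(ri = x$rows, ci = x$cols, col = "FFFF0000")}})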
l[lengths(l) > 0]
# [[1]]
# ri ci col
# 1 1 2 FFFF0000
# 2 2 3 FFFF0000
# 3 3 1 FFFF0000
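As the use of x$rows and x$cols above suggests, each element of styleObjects appears to describe one style together with the cells it applies to, so the length of the list would count styles rather than rows. The sketch below shows one way to turn the matching cells into a row filter. The helper cells_with_fill, the mock_styles list (plain lists standing in for wb$styleObjects), and the toy data frame are all hypothetical, and the row offset assumes the header sits in sheet row 1 and that read.xlsx skipped no rows.

```r
# Hypothetical helper: (row, col) pairs of cells whose fill equals an ARGB code.
cells_with_fill <- function(style_objects, argb) {
  hits <- lapply(style_objects, function(x) {
    fg <- x$style$fill$fillFg
    # identical() is FALSE for NULL fills and for theme/tint fills
    if (identical(unname(fg), argb)) data.frame(ri = x$rows, ci = x$cols)
  })
  do.call(rbind, Filter(Negate(is.null), hits))
}

# Mock of wb$styleObjects with rgb, missing, and theme-based fills.
mock_styles <- list(
  list(style = list(fill = list(fillFg = c(rgb = "FF384C70"))),
       rows = c(1L, 1L), cols = c(1L, 2L)),
  list(style = list(fill = list(fillFg = NULL)),
       rows = 2L, cols = 1L),
  list(style = list(fill = list(fillFg = c(theme = "4", tint = "0.79998168889431442"))),
       rows = 3L, cols = 1L),
  list(style = list(fill = list(fillFg = c(rgb = "FFFFFF00"))),
       rows = c(4L, 6L), cols = c(1L, 1L))
)

# Toy stand-in for df <- read.xlsx(wb, sheet = 1).
df <- data.frame(item = c("a", "b", "c", "d", "e"), stringsAsFactors = FALSE)

hits <- cells_with_fill(mock_styles, "FFFFFF00")
# Sheet row r maps to data frame row r - 1 because row 1 holds the header.
yellow <- df[sort(unique(hits$ri)) - 1L, , drop = FALSE]
print(yellow)
```

With a real workbook, wb$styleObjects would be passed in place of mock_styles; if the sheet has blank rows or a header elsewhere, the offset would need adjusting.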
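A solution using the openxlsx package
The example below looks for the color "FFC000" in columns 1 and 6.
The approach first identifies which of the defined styles have the font color of interest, then looks through the style objects to see which cells those styles have been applied to, and returns the indices of the rows that match both the color and the predefined columns to search. The result is every row in which at least one cell in the searched columns has the specified color.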
excelwb <- openxlsx::loadWorkbook(excel_file)
strikestyles <- getStyles(excelwb)
goldcolors <- which(sapply(strikestyles,'[[','fontColour')=="FFFFC000")
goldcols <- c(1,6) #these are the columns that have the gold color of interest -- could also be 1:ncol
goldrows <- lapply(excelwb$styleObjects[goldcolors],
function(x) {
value_cols <- which(x$cols %in% goldcols)
if (length(value_cols)==0) return(NULL)
else return (x$rows[value_cols])
})
goldrows <- as.numeric(unlist(goldrows))