Filter data highlighted in Excel by cell fill color using openxlsx

I have a large Excel table (18k rows and 400 columns) in which some rows are highlighted in different colors. Is there a way to filter rows by color using openxlsx?

I first loaded the workbook:

library(openxlsx)
wb <- loadWorkbook(file = "Items Comparison.xlsx")
getStyles(wb)
df <- read.xlsx(wb, sheet = 1)

With getStyles(wb) I can see the styles used in the workbook, but I'm not sure how to use that information to filter all cells of each column by color.

[[1]]
A custom cell style. 

 Cell formatting: GENERAL 
 Font name: Tahoma 
 Font size: 9 
 Font colour: #FFFFFF 
 Font decoration: BOLD 
 Cell borders: Top: thin, Bottom: thin, Left: thin, Right: thin 
 Cell border colours: #4E648A, #4E648A, #4E648A, #4E648A 
 Cell vert. align: top 
 Cell fill foreground:  rgb: #384C70 
 Cell fill background:  rgb: #384C70 
 wraptext: TRUE 


[[2]]
A custom cell style. 

 Cell formatting: GENERAL 
 Font name: Tahoma 
 Font size: 9 
 Font colour: #FFFFFF 
 Font decoration: BOLD 
 Cell borders: Top: thin, Bottom: thin, Left: thin, Right: thin 
 Cell border colours: #4E648A, #4E648A, #4E648A, #4E648A 
 Cell vert. align: top 
 Cell fill foreground:  rgb: #384C70 
 Cell fill background:  rgb: #384C70 
 wraptext: TRUE 

How can I filter the data by fill color?

Update

Following @Henrik's solution, I tried his code, but I keep getting errors. To see what was happening, I printed x$style$fill$fillFg and got the following output:
       rgb 
"FF384C70" 
       rgb 
"FF384C70" 
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
       rgb 
"FF384C70" 
NULL
NULL
NULL
       rgb 
"FFFFFF00" 
       rgb 
"FFFFFF00" 
 theme 
   "0" 
 theme 
   "0" 
       rgb 
"FFFFFF00" 
NULL
 theme 
   "2" 
                theme                  tint 
                  "4" "0.79998168889431442" 
 theme 
   "8" 
 theme 
   "8" 
       rgb 
"FFFFC000" 
       rgb 
"FFFFC000" 
                theme                  tint 
                  "5" "0.39997558519241921" 
                theme                  tint 
                  "5" "0.39997558519241921" 
                theme                  tint 
                  "9" "0.39997558519241921" 
                theme                  tint 
                  "5" "0.79998168889431442" 
       rgb 
"FFFFFF00" 
       rgb 
"FF384C70" 
NULL
NULL
NULL
       rgb 
"FF384C70" 
       rgb 
"FF384C70" 
[[1]]
       rgb 
"FF384C70" 

[[2]]
       rgb 
"FF384C70" 

[[3]]
NULL

[[4]]
NULL

[[5]]
NULL

[[6]]
NULL

[[7]]
NULL

[[8]]
NULL

[[9]]
NULL

[[10]]
NULL

[[11]]
NULL

[[12]]
NULL

[[13]]
       rgb 
"FF384C70" 

[[14]]
NULL

[[15]]
NULL

[[16]]
NULL

[[17]]
       rgb 
"FFFFFF00" 

[[18]]
       rgb 
"FFFFFF00" 

[[19]]
 theme 
   "0" 

[[20]]
 theme 
   "0" 

[[21]]
       rgb 
"FFFFFF00" 

[[22]]
NULL

[[23]]
 theme 
   "2" 

[[24]]
                theme                  tint 
                  "4" "0.79998168889431442" 

[[25]]
 theme 
   "8" 

[[26]]
 theme 
   "8" 

[[27]]
       rgb 
"FFFFC000" 

[[28]]
       rgb 
"FFFFC000" 

[[29]]
                theme                  tint 
                  "5" "0.39997558519241921" 

[[30]]
                theme                  tint 
                  "5" "0.39997558519241921" 

[[31]]
                theme                  tint 
                  "9" "0.39997558519241921" 

[[32]]
                theme                  tint 
                  "5" "0.79998168889431442" 

[[33]]
       rgb 
"FFFFFF00" 

[[34]]
       rgb 
"FF384C70" 

[[35]]
NULL

[[36]]
NULL

[[37]]
NULL

[[38]]
       rgb 
"FF384C70" 

[[39]]
       rgb 
"FF384C70" 

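I'm still confused about why there are only 39 items. The total number of rows varies, but it is never 39. I also don't understand this operation: does it work by row or by column?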
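A likely explanation: wb$styleObjects holds one element per style applied to a group of cells (roughly, one per style in use on each sheet), not one per row, so its length has nothing to do with the row count. Each element is a list with style, sheet, rows and cols, where rows and cols are parallel vectors: cell i is (rows[i], cols[i]), so it is neither purely by row nor by column. The printout also shows why a plain `if (x$style$fill$fillFg == ...)` can fail: fillFg is NULL for styles without a fill and is a theme (plus tint) reference with no "rgb" entry for theme colors. The sketch below flattens style objects into one row per styled cell. It runs on hand-built lists that mimic the printed structure; fill_rgb() and style_cells() are hypothetical helpers, not part of openxlsx, and theme colors are simply reported as NA.

```r
# Hypothetical helper: ARGB string of a fill foreground, or NA when the style has
# no fill (NULL) or uses a theme/tint colour instead of an explicit "rgb" value.
fill_rgb <- function(fg) {
  if (is.null(fg)) return(NA_character_)
  rgb_value <- unname(unlist(fg)["rgb"])  # handles c(rgb = ...) and list(rgb = ...)
  if (is.na(rgb_value)) NA_character_ else rgb_value
}

# Hypothetical helper: one row per styled cell.
# rows and cols are parallel vectors, so cell i is (rows[i], cols[i]).
style_cells <- function(style_objects) {
  parts <- lapply(style_objects, function(x) {
    if (length(x$rows) == 0) return(NULL)
    data.frame(sheet = x$sheet, row = x$rows, col = x$cols,
               fill = fill_rgb(x$style$fill$fillFg),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, parts)
}

# Hand-built stand-in for wb$styleObjects, mimicking the fillFg values printed above
mock <- list(
  list(style = list(fill = list(fillFg = c(rgb = "FF384C70"))),
       sheet = "Sheet1", rows = c(1L, 1L), cols = c(1L, 2L)),
  list(style = list(fill = list(fillFg = NULL)),
       sheet = "Sheet1", rows = 2L, cols = 1L),
  list(style = list(fill = list(fillFg = c(theme = "5", tint = "0.39997558519241921"))),
       sheet = "Sheet1", rows = 3L, cols = 2L),
  list(style = list(fill = list(fillFg = c(rgb = "FFFFFF00"))),
       sheet = "Sheet1", rows = c(4L, 6L), cols = c(3L, 1L))
)

cells <- style_cells(mock)  # with a real workbook: style_cells(wb$styleObjects)
yellow_rows <- sort(unique(cells$row[cells$fill %in% "FFFFFF00"]))
yellow_rows  # 4 6
```

Using `%in%` rather than `==` for the final comparison keeps the NA fills (no fill or theme colors) from propagating into the row index.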

library(tidyxl)

formats <- xlsx_formats( "./temp/test_file.xlsx" )
cells <- xlsx_cells( "./temp/test_file.xlsx" )

#what colors are used?
formats$local$fill$patternFill$fgColor$rgb
# [1] NA         "FFC00000" "FF00B0F0" NA  

#find rows of cells with red background
cells[ cells$local_format_id %in%
         which( formats$local$fill$patternFill$fgColor$rgb == "FFC00000"), 
       "row" ]

# [1] 1

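In your workbook object you will find the styleObjects element. There you can find the fill color (style$fill$fillFg) and the rows element. Loop over the style objects (lapply), check whether the color is the one you want (e.g. red, "FFFF0000": x$style$fill$fillFg == "FFFF0000"), and then grab the row index (x$rows[1], the first row that style was applied to).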

library(openxlsx)
wb <- loadWorkbook(file = "foo.xlsx")
unlist(lapply(wb$styleObjects, function(x){
  x$rows[1][x$style$fill$fillFg == "FFFF0000"]}))

# [1] 3

If the colored cells are not contiguous, you may need to grab both the rows and the columns:

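l <- lapply(wb$styleObjects, function(x) {
  fg <- x$style$fill$fillFg
  # fillFg is NULL for unfilled styles and has no "rgb" entry for theme colours,
  # so a plain `if (fg == ...)` errors on those; isTRUE() returns FALSE instead
  if (isTRUE(unlist(fg)["rgb"] == "FFFF0000")) {
    data.frame(ri = x$rows, ci = x$cols, col = "FFFF0000")
  }
})
l[lengths(l) > 0]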

# [[1]]
#   ri ci      col
# 1  1  2 FFFF0000
# 2  2  3 FFFF0000
# 3  3  1 FFFF0000
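The ri values above (like x$rows) are spreadsheet row numbers, whereas df from read.xlsx() has the header row removed. A minimal sketch for mapping them back to rows of df, assuming the column names come from the first sheet row and that read.xlsx() dropped no rows (its default skipEmptyRows = TRUE can shift the numbering, and startRow changes the offset). filter_by_sheet_rows() is a hypothetical helper, not an openxlsx function.

```r
# Hypothetical helper: keep the rows of `df` whose spreadsheet row number is in
# `sheet_rows`. Assumes the column names were read from sheet row `header_row` and
# no rows were dropped while reading (e.g. read.xlsx(..., skipEmptyRows = FALSE)),
# so data-frame row k sits on sheet row k + header_row.
filter_by_sheet_rows <- function(df, sheet_rows, header_row = 1L) {
  idx <- sort(unique(sheet_rows)) - header_row  # sheet row -> data-frame row
  idx <- idx[idx >= 1L & idx <= nrow(df)]       # drop header/out-of-range hits
  df[idx, , drop = FALSE]
}

# Toy data standing in for read.xlsx(wb, sheet = 1)
df <- data.frame(id = 1:5, value = c("a", "b", "c", "d", "e"),
                 stringsAsFactors = FALSE)
out <- filter_by_sheet_rows(df, c(3L, 5L, 1L, 3L))  # e.g. the ri values found above
out$id  # 2 4
```

Deduplicating first matters because one row usually has several colored cells, so it appears several times in ri.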

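A solution using the openxlsx package. The example below looks for the color "FFC000" in columns 1 and 6. The approach first identifies which of the defined styles have the font color of interest, then looks through the style objects to see which cells those styles have been applied to, and returns the indices of the rows that match the color within the predefined columns being searched. The result gives every row in which at least one cell in the searched columns has the specified color. Note that this matches the font color (fontColour); to match the cell fill instead, compare against the style's fill$fillFg.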

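library(openxlsx)

excelwb <- loadWorkbook(excel_file)  # excel_file: path to the .xlsx file
strikestyles <- getStyles(excelwb)   # one Style per element of excelwb$styleObjects
# TRUE for styles whose font colour is the ARGB value "FFFFC000";
# missing (NULL) and theme-based colours give FALSE instead of an error
goldcolors <- which(vapply(strikestyles, function(s) {
  isTRUE(unlist(s$fontColour)["rgb"] == "FFFFC000")
}, logical(1)))
goldcols <- c(1, 6) # these are the columns that have the gold color of interest -- could also be 1:ncol
goldrows <- lapply(excelwb$styleObjects[goldcolors],
                   function(x) {
                     # rows and cols are parallel: cell i is (rows[i], cols[i])
                     value_cols <- which(x$cols %in% goldcols)
                     if (length(value_cols) == 0) return(NULL)
                     x$rows[value_cols]
                   })
goldrows <- as.numeric(unlist(goldrows))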
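The same idea can be wrapped into one function that matches either the fill or the font color and optionally restricts the search to certain columns. A minimal sketch, run on hand-built lists that mimic the shape of wb$styleObjects (elements with style, rows and cols); rows_with_colour() and its arguments are hypothetical. It assumes the color is stored as an explicit "rgb" value, so theme/tint colors never match, and it ignores the sheet field, so filter on x$sheet first if the workbook has several sheets.

```r
# Hypothetical helper: sorted spreadsheet rows where at least one cell in `cols`
# has the ARGB colour `argb`. what = "fill" reads style$fill$fillFg,
# what = "font" reads style$fontColour. Missing or theme colours never match.
rows_with_colour <- function(style_objects, argb, cols = NULL, what = c("fill", "font")) {
  what <- match.arg(what)
  hits <- lapply(style_objects, function(x) {
    colour <- if (what == "fill") x$style$fill$fillFg else x$style$fontColour
    if (is.null(colour) || !isTRUE(unlist(colour)["rgb"] == argb)) return(NULL)
    keep <- if (is.null(cols)) rep(TRUE, length(x$rows)) else x$cols %in% cols
    x$rows[keep]
  })
  sort(unique(as.integer(unlist(hits))))
}

# Hand-built stand-in for wb$styleObjects
mock <- list(
  list(style = list(fontColour = c(rgb = "FFFFC000"), fill = list(fillFg = NULL)),
       sheet = "Sheet1", rows = c(2L, 3L, 5L), cols = c(1L, 4L, 6L)),
  list(style = list(fontColour = NULL, fill = list(fillFg = c(rgb = "FFFFC000"))),
       sheet = "Sheet1", rows = c(7L, 8L), cols = c(1L, 2L)),
  list(style = list(fontColour = c(theme = "1"),
                    fill = list(fillFg = c(theme = "5", tint = "0.4"))),
       sheet = "Sheet1", rows = 9L, cols = 1L)
)

rows_with_colour(mock, "FFFFC000", cols = c(1, 6), what = "font")  # 2 5
rows_with_colour(mock, "FFFFC000", what = "fill")                  # 7 8
```

With a real workbook the call would be, for example, rows_with_colour(excelwb$styleObjects, "FFFFC000", cols = c(1, 6), what = "font").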