Why isn't rownames() working on my dataframe in R?

Question

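I want to create a vector from the row names of certain rows of my data frame, but I keep failing, and I feel I am missing something obvious. My data frame is very large, but I created an example that gives me exactly the same problem.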

resmakeup <- data.frame("example" = c(4, -3, 2, 1), 
                         row.names = c("number1", "number2", "number3", "number4")
                        )
selection <- rownames(resmakeup[abs(resmakeup$example) >= 2,])

So, if my table looks like this:

        example
number1       4
number2      -3
number3       2
number4       1

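I want the vector "selection" to contain number1, number2 and number3, but this doesn't work. Instead, I get an empty vector. I checked with has_rownames() whether the data frame has row names, and it returns TRUE. I also checked whether my selection resmakeup[abs(resmakeup$example) >= 2,] works, and it does.

What am I doing wrong, and how can I fix it?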

sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252
[4] LC_NUMERIC=C                       LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] writexl_1.3.1               forcats_0.5.0               stringr_1.4.0               purrr_0.3.4                
 [5] readr_1.4.0                 tidyr_1.1.2                 tibble_3.0.4                tidyverse_1.3.0            
 [9] RColorBrewer_1.1-2          readxl_1.3.1                pheatmap_1.0.12             ggthemes_4.2.4             
[13] ggrepel_0.9.1               ggplot2_3.3.3               GEOquery_2.58.0             edgeR_3.32.1               
[17] limma_3.46.0                dplyr_1.0.2                 DESeq2_1.30.0               SummarizedExperiment_1.20.0
[21] Biobase_2.50.0              MatrixGenerics_1.2.0        matrixStats_0.57.0          GenomicRanges_1.42.0       
[25] GenomeInfoDb_1.26.2         IRanges_2.24.1              S4Vectors_0.28.1            BiocGenerics_0.36.0        
[29] ashr_2.2-47                

loaded via a namespace (and not attached):
 [1] fs_1.5.0               bitops_1.0-6           lubridate_1.7.9.2      bit64_4.0.5            httr_1.4.2            
 [6] tools_4.0.2            backports_1.2.1        R6_2.5.0               irlba_2.3.3            DBI_1.1.1             
[11] colorspace_2.0-0       withr_2.4.0            tidyselect_1.1.0       bit_4.0.4              compiler_4.0.2        
[16] cli_2.2.0              rvest_0.3.6            xml2_1.3.2             DelayedArray_0.16.0    labeling_0.4.2        
[21] scales_1.1.1           SQUAREM_2021.1         genefilter_1.72.0      mixsqp_0.3-43          digest_0.6.27         
[26] XVector_0.30.0         pkgconfig_2.0.3        dbplyr_2.0.0           invgamma_1.1           rlang_0.4.10          
[31] rstudioapi_0.13        RSQLite_2.2.1          farver_2.0.3           generics_0.1.0         jsonlite_1.7.2        
[36] BiocParallel_1.24.1    RCurl_1.98-1.2         magrittr_2.0.1         GenomeInfoDbData_1.2.4 Matrix_1.2-18         
[41] fansi_0.4.2            Rcpp_1.0.5             munsell_0.5.0          lifecycle_0.2.0        stringi_1.5.3         
[46] zlibbioc_1.36.0        grid_4.0.2             blob_1.2.1             crayon_1.3.4           lattice_0.20-41       
[51] haven_2.3.1            splines_4.0.2          annotate_1.68.0        hms_1.0.0              locfit_1.5-9.4        
[56] pillar_1.4.7           geneplotter_1.68.0     reprex_0.3.0           XML_3.99-0.5           glue_1.4.2            
[61] modelr_0.1.8           vctrs_0.3.6            cellranger_1.1.0       gtable_0.3.0           assertthat_0.2.1      
[66] xfun_0.20              xtable_1.8-4           broom_0.7.3            survival_3.1-12        truncnorm_1.0-8       
[71] tinytex_0.29           AnnotationDbi_1.52.0   memoise_1.1.0          ellipsis_0.3.1

Answer 1

This is an issue with subsetting a data.frame (see the help file for more information). You need to specify drop = FALSE when subsetting:

rownames(resmakeup[abs(resmakeup$example) >= 2,,drop = FALSE])
# [1] "number1" "number2" "number3"

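If you check what resmakeup[abs(resmakeup$example) >= 2,] returns, you will notice that it is a vector rather than a data.frame (the result is coerced to the lowest possible dimension). Using drop = FALSE keeps the data.frame type after subsetting.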
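A quick way to confirm this, as a minimal sketch that reuses the resmakeup example from the question (the names keep, dropped and kept are only illustrative), is to compare the two results directly:

```r
# Example data from the question: a one-column data frame with row names
resmakeup <- data.frame(example = c(4, -3, 2, 1),
                        row.names = c("number1", "number2", "number3", "number4"))
keep <- abs(resmakeup$example) >= 2

dropped <- resmakeup[keep, ]                # default: the single column collapses to a vector
kept    <- resmakeup[keep, , drop = FALSE]  # stays a data.frame

is.data.frame(dropped)  # FALSE
is.data.frame(kept)     # TRUE
rownames(dropped)       # NULL
rownames(kept)          # "number1" "number2" "number3"
```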

Answer 2

When you run into a problem like this, start evaluating the expression from the outside in to find where it starts to go wrong.

rownames(resmakeup[abs(resmakeup$example) >= 2,])
# NULL
resmakeup[abs(resmakeup$example) >= 2,]
# [1]  4 -3  2

Okay, you can't get row names from a plain numeric vector.

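The culprit here is R's default behavior of reducing the dimensionality of a data.frame when a selection leaves only a single column, which is always the case for this one-column data frame. (FYI, dplyr and data.table both choose not to follow this frustrating behavior.) You can use drop=FALSE to fix this.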

resmakeup[abs(resmakeup$example) >= 2,, drop = FALSE]
#         example
# number1       4
# number2      -3
# number3       2

So:

rownames(resmakeup[abs(resmakeup$example) >= 2,, drop = FALSE])
# [1] "number1" "number2" "number3"
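As an aside on when the default drop applies: for data frames, the result is dropped when only one column is left, but not when only one row is left. A small sketch with a hypothetical two-column data frame (df2 and its contents are made up purely for illustration) shows the difference:

```r
# Hypothetical two-column data frame, for illustration only
df2 <- data.frame(a = 1:3, b = c("x", "y", "z"),
                  row.names = c("r1", "r2", "r3"))

one_row    <- df2[1, ]                  # one row, two columns: not dropped
one_col    <- df2[, "a"]                # one column left: dropped to a vector
one_col_df <- df2[, "a", drop = FALSE]  # one column, kept as a data.frame

is.data.frame(one_row)     # TRUE
is.data.frame(one_col)     # FALSE
is.data.frame(one_col_df)  # TRUE
rownames(one_col_df)       # "r1" "r2" "r3"
```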

I'll take this opportunity to soap-box about a base R function that also makes this easier to read and doesn't exhibit the drop= "feature": subset.

rownames(subset(resmakeup, abs(example) >= 2))
# [1] "number1" "number2" "number3"

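It uses non-standard evaluation (i.e., you can use column names without the leading df$, as with example above), which makes it simpler to read, and it does not drop dimensions by default.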
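Not part of the answers above, but a related sketch: since only the row names are wanted, the row-name vector can also be indexed directly with the same logical condition, so no data frame subsetting (and no drop) is involved. The names via_subset and via_index are illustrative only.

```r
resmakeup <- data.frame(example = c(4, -3, 2, 1),
                        row.names = c("number1", "number2", "number3", "number4"))

# subset() returns a data.frame, so the row names survive
via_subset <- rownames(subset(resmakeup, abs(example) >= 2))

# Alternative: index the row-name vector itself with the logical condition
via_index <- rownames(resmakeup)[abs(resmakeup$example) >= 2]

identical(via_subset, via_index)  # TRUE
```

One design difference to keep in mind: subset() treats NA in the condition as FALSE, whereas plain logical indexing yields NA entries, so the two can diverge on data with missing values.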
