How to retrieve a specific value from a list of data frames based on a given condition?

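I have a list of data frames (sample below), where the data is a list of hospitals for each state.

The second element has a rank variable, so I tried to access that element and match a specified rank. I am a beginner, and I think I am confusing "==" and "=".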

    > outcome_split[[2]][, "hospital name"]["rank"==2]
    character(0)
    > outcome_split[[2]][, "hospital name"]["rank"=7]
    [1] "BIBB MEDICAL CENTER"

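I want to return the hospital name that matches the specified rank, but I don't know how to do it. As mentioned, I am confusing '==' and '=': '==' returns character(0), while '=' returns a hospital name from the second element. That result is based on the row position rather than on the rank variable: the hospital shown is in row 7, but it is not ranked 7th.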

> outcome_split[[2]][, c("hospital name","rank")]
                                       hospital name rank
1                        ANDALUSIA REGIONAL HOSPITAL   52
2                          ATHENS-LIMESTONE HOSPITAL    9
3                          ATMORE COMMUNITY HOSPITAL   53
4                        BAPTIST MEDICAL CENTER EAST    2
5                       BAPTIST MEDICAL CENTER SOUTH   46
6                   BAPTIST MEDICAL CENTER-PRINCETON    8
7                                BIBB MEDICAL CENTER   54
8                       BIRMINGHAM VA MEDICAL CENTER   26
9                           BROOKWOOD MEDICAL CENTER   30
10                    BRYAN W WHITFIELD MEM HOSP INC   55

Sample data:

outcome_split <- structure(list(AK = structure(list(`hospital name` = c("PROVIDENCE ALASKA MEDICAL CENTER", 
"MAT-SU REGIONAL MEDICAL CENTER", "BARTLETT REGIONAL HOSPITAL", 
"FAIRBANKS MEMORIAL HOSPITAL", "ALASKA REGIONAL HOSPITAL", "YUKON KUSKOKWIM DELTA REG HOSPITAL", 
"CENTRAL PENINSULA GENERAL HOSPITAL", "ALASKA NATIVE MEDICAL CENTER", 
"MT EDGECUMBE HOSPITAL", "PROVIDENCE VALDEZ MEDICAL CENTER", 
"PROVIDENCE SEWARD HOSPITAL", "SITKA COMMUNITY HOSPITAL", "PROVIDENCE KODIAK ISLAND MEDICAL CTR", 
"CORDOVA COMMUNITY MEDICAL CENTER", "NORTON SOUND REGIONAL HOSPITAL", 
"PEACEHEALTH KETCHIKAN MEDICAL             CENTER", "SOUTH PENINSULA HOSPITAL"
), state = c("AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", 
"AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK", "AK"), `heart attack` = c("13.4", 
"17.7", "Not Available", "15.5", "14.5", "Not Available", "Not Available", 
"15.7", "Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available"), `heart failure` = c("12.4", "11.4", "11.6", 
"15.6", "13.4", "11.2", "11.6", "11.6", "Not Available", "Not Available", 
"Not Available", "Not Available", "Not Available", "Not Available", 
"Not Available", "11.4", "10.8"), pneumonia = c("10.5", "12.1", 
"11.6", "13.4", "12.5", "9.7", "13.8", "15.5", "14.2", "Not Available", 
"Not Available", "11.5", "12.0", "Not Available", "11.6", "11.3", 
"12.2")), .Names = c("hospital name", "state", "heart attack", 
"heart failure", "pneumonia"), row.names = 99:115, class = "data.frame"), 
    AL = structure(list(`hospital name` = c("ANDALUSIA REGIONAL HOSPITAL", 
    "ATHENS-LIMESTONE HOSPITAL", "ATMORE COMMUNITY HOSPITAL", 
    "BAPTIST MEDICAL CENTER EAST", "BAPTIST MEDICAL CENTER SOUTH", 
    "BAPTIST MEDICAL CENTER-PRINCETON", "BIBB MEDICAL CENTER", 
    "BIRMINGHAM VA MEDICAL CENTER", "BROOKWOOD MEDICAL CENTER", 
    "BRYAN W WHITFIELD MEM HOSP INC", "BULLOCK COUNTY HOSPITAL", 
    "CALLAHAN EYE FOUNDATION HOSPITAL", "CHEROKEE MEDICAL CENTER", 
    "CHILTON MEDICAL CENTER", "CITIZENS BAPTIST MEDICAL CENTER", 
    "CLAY COUNTY HOSPITAL", "COMMUNITY HOSPITAL INC", "COOPER GREEN MERCY HOSPITAL", 
    "COOSA VALLEY MEDICAL CENTER", "CRENSHAW COMMUNITY HOSPITAL", 
    "CRESTWOOD MEDICAL CENTER", "CULLMAN REGIONAL MEDICAL CENTER", 
    "D C H REGIONAL MEDICAL CENTER", "D W MCMILLAN MEMORIAL HOSPITAL", 
    "DALE MEDICAL CENTER", "DECATUR GENERAL HOSPITAL", "DEKALB REGIONAL MEDICAL CENTER", 
    "EAST ALABAMA MEDICAL CENTER AND SNF", "ELBA GENERAL HOSPITAL", 
    "ELIZA COFFEE MEMORIAL HOSPITAL", "ELMORE COMMUNITY HOSPITAL", 
    "EVERGREEN MEDICAL CENTER", "FAYETTE MEDICAL CENTER", "FLORALA MEMORIAL HOSPITAL", 
    "FLOWERS HOSPITAL", "GADSDEN REGIONAL MEDICAL CENTER", "GEORGE H. LANIER MEMORIAL HOSPITAL", 
    "GEORGIANA HOSPITAL", "GREENE COUNTY HOSPITAL", "GROVE HILL MEMORIAL HOSPITAL", 
    "HALE COUNTY HOSPITAL", "HELEN KELLER MEMORIAL HOSPITAL", 
    "HIGHLANDS MEDICAL CENTER", "HILL HOSPITAL OF SUMTER COUNTY", 
    "HUNTSVILLE HOSPITAL", "INFIRMARY WEST", "J PAUL JONES HOSPITAL", 
    "JACK HUGHSTON MEMORIAL HOSPITAL", "JACKSON HOSPITAL & CLINIC INC", 
    "JACKSON MEDICAL CENTER", "JACKSONVILLE MEDICAL CENTER", 
    "L V STABLER MEMORIAL HOSPITAL", "LAKE MARTIN COMMUNITY HOSPITAL", 
    "LAKELAND COMMUNITY HOSPITAL", "LAWRENCE MEDICAL CENTER", 
    "MARION REGIONAL MEDICAL CENTER", "MARSHALL MEDICAL CENTER NORTH", 
    "MARSHALL MEDICAL CENTER SOUTH", "MEDICAL CENTER BARBOUR", 
    "MEDICAL CENTER ENTERPRISE", "MEDICAL WEST, AN AFFILIATE OF UAB HEALTH SYSTEM", 
    "MIZELL MEMORIAL HOSPITAL", "MOBILE INFIRMARY", "MONROE COUNTY HOSPITAL", 
    "NORTH BALDWIN INFIRMARY", "NORTHEAST ALABAMA REGIONAL MED CENTER", 
    "NORTHWEST MEDICAL CENTER", "PARKWAY MEDICAL CENTER", "PICKENS COUNTY MEDICAL CENTER", 
    "PRATTVILLE BAPTIST HOSPITAL", "PROVIDENCE HOSPITAL", "RED BAY HOSPITAL", 
    "RIVERVIEW REGIONAL MEDICAL CENTER", "RUSSELL HOSPITAL", 
    "RUSSELLVILLE HOSPITAL", "SHELBY BAPTIST MEDICAL CENTER", 
    "SHOALS HOSPITAL", "SOUTH BALDWIN REGIONAL MEDICAL CENTER", 
    "SOUTHEAST ALABAMA MEDICAL CENTER", "SPRINGHILL MEDICAL CENTER", 
    "ST VINCENT'S BIRMINGHAM", "ST VINCENT'S EAST", "ST VINCENT'S ST CLAIR", 
    "ST VINCENTS BLOUNT", "STRINGFELLOW MEMORIAL HOSPITAL", "THOMAS HOSPITAL", 
    "TRINITY MEDICAL CENTER", "TROY REGIONAL MEDICAL CENTER", 
    "TUSCALOOSA VA MEDICAL CENTER", "UNIV OF S AL CHILDREN'S & WOMEN'S HOS", 
    "UNIV OF SOUTH ALABAMA MEDICAL CENTER", "UNIVERSITY OF ALABAMA HOSPITAL", 
    "VA CENTRAL ALABAMA HEALTHCARE SYSTEM - MONTGOMERY", "VAUGHAN REG MED CENTER PARKWAY CAMPUS", 
    "WALKER BAPTIST MEDICAL CENTER", "WASHINGTON COUNTY HOSPITAL", 
    "WEDOWEE HOSPITAL", "WIREGRASS MEDICAL CENTER"), state = c("AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", 
    "AL", "AL", "AL", "AL", "AL", "AL", "AL"), `heart attack` = c("Not Available", 
    "15.0", "Not Available", "14.2", "17.8", "14.9", "Not Available", 
    "16.1", "16.5", "Not Available", "Not Available", "Not Available", 
    "Not Available", "Not Available", "17.3", "16.7", "17.1", 
    "Not Available", "15.2", "Not Available", "13.3", "17.1", 
    "15.8", "15.7", "17.3", "16.8", "18.0", "16.3", "Not Available", 
    "18.1", "Not Available", "Not Available", "16.7", "Not Available", 
    "15.2", "16.7", "15.4", "14.5", "Not Available", "Not Available", 
    "Not Available", "19.6", "15.0", "Not Available", "15.2", 
    "Not Available", "Not Available", "Not Available", "17.5", 
    "Not Available", "Not Available", "Not Available", "Not Available", 
    "Not Available", "15.6", "Not Available", "Not Available", 
    "18.5", "Not Available", "16.6", "15.3", "Not Available", 
    "19.3", "Not Available", "Not Available", "15.6", "Not Available", 
    "15.8", "Not Available", "14.6", "15.2", "Not Available", 
    "16.9", "17.1", "Not Available", "15.9", "Not Available", 
    "15.8", "14.3", "16.0", "16.2", "17.7", "Not Available", 
    "Not Available", "16.4", "14.7", "16.8", "Not Available", 
    "Not Available", "Not Available", "Not Available", "15.0", 
    "Not Available", "14.7", "17.0", "Not Available", "Not Available", 
    "Not Available"), `heart failure` = c("10.1", "11.7", "10.8", 
    "9.6", "11.8", "11.4", "14.0", "10.4", "13.5", "11.7", "12.3", 
    "Not Available", "12.1", "11.5", "14.9", "12.6", "12.3", 
    "Not Available", "11.7", "13.8", "13.8", "12.1", "11.2", 
    "14.8", "11.8", "10.9", "16.6", "12.9", "Not Available", 
    "11.3", "11.3", "9.1", "11.7", "10.4", "12.0", "10.7", "8.8", 
    "10.8", "11.2", "10.4", "10.7", "12.6", "13.4", "Not Available", 
    "12.4", "12.5", "Not Available", "10.8", "10.2", "12.3", 
    "16.4", "11.1", "10.9", "13.6", "9.9", "11.5", "12.5", "15.2", 
    "13.5", "12.9", "11.4", "13.6", "10.7", "13.0", "11.5", "11.2", 
    "11.8", "10.5", "12.6", "14.8", "13.5", "12.6", "10.8", "11.6", 
    "14.8", "13.6", "13.6", "15.1", "11.4", "10.4", "10.6", "10.9", 
    "10.8", "13.0", "12.0", "12.8", "12.9", "11.2", "Not Available", 
    "Not Available", "12.5", "12.5", "12.2", "12.0", "10.8", 
    "Not Available", "10.4", "10.6"), pneumonia = c("11.1", "12.1", 
    "13.0", "10.2", "14.3", "11.6", "13.6", "11.0", "13.0", "9.1", 
    "12.1", "Not Available", "14.7", "11.2", "12.1", "11.8", 
    "11.6", "Not Available", "11.4", "15.8", "10.4", "12.1", 
    "11.3", "12.6", "9.9", "11.9", "15.8", "12.1", "12.0", "13.4", 
    "11.2", "12.0", "12.9", "12.1", "11.3", "14.6", "10.3", "11.3", 
    "11.5", "12.1", "11.5", "15.0", "12.9", "Not Available", 
    "14.1", "13.1", "11.4", "10.9", "14.7", "9.3", "19.2", "13.0", 
    "10.8", "10.7", "9.8", "10.0", "8.7", "13.9", "15.0", "12.9", 
    "12.1", "14.9", "12.5", "15.6", "14.6", "13.2", "13.1", "11.9", 
    "12.4", "14.2", "10.6", "11.6", "12.7", "14.9", "11.5", "10.7", 
    "12.8", "9.8", "10.9", "13.8", "12.6", "16.2", "11.4", "15.3", 
    "12.0", "13.1", "13.9", "11.1", "Not Available", "Not Available", 
    "Not Available", "12.7", "11.3", "14.0", "11.9", "Not Available", 
    "13.9", "12.3"), rank = c(52L, 9L, 53L, 2L, 46L, 8L, 54L, 
    26L, 30L, 55L, 56L, 57L, 58L, 59L, 42L, 32L, 39L, 60L, 12L, 
    61L, 1L, 40L, 21L, 20L, 43L, 35L, 47L, 28L, 62L, 48L, 63L, 
    64L, 33L, 65L, 13L, 34L, 17L, 4L, 66L, 67L, 68L, 51L, 10L, 
    69L, 14L, 70L, 71L, 72L, 44L, 73L, 74L, 75L, 76L, 77L, 18L, 
    78L, 79L, 49L, 80L, 31L, 16L, 81L, 50L, 82L, 83L, 19L, 84L, 
    22L, 85L, 5L, 15L, 86L, 37L, 41L, 87L, 24L, 88L, 23L, 3L, 
    25L, 27L, 45L, 89L, 90L, 29L, 6L, 36L, 91L, 92L, 93L, 94L, 
    11L, 95L, 7L, 38L, 96L, 97L, 98L)), class = "data.frame", .Names = c("hospital name", 
    "state", "heart attack", "heart failure", "pneumonia", "rank"
    ), row.names = c(NA, -98L))), .Names = c("AK", "AL"))

If you want to select ranks 2 and 7 from the second list element, try:

outcome_split[[2]][outcome_split[[2]]$rank == 2, c("hospital name", "rank")]
                hospital name rank
4 BAPTIST MEDICAL CENTER EAST    2

outcome_split[[2]][outcome_split[[2]]$rank == 7, c("hospital name", "rank")]
                           hospital name rank
94 VAUGHAN REG MED CENTER PARKWAY CAMPUS    7
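To see why the two attempts in the question behaved as they did, here is a minimal sketch on made-up data. The data frame toy and its hospital names are hypothetical; only the subsetting pattern mirrors the question.

```r
# Toy data (hypothetical): row order deliberately differs from rank order.
toy <- data.frame(
  `hospital name` = c("HOSPITAL A", "HOSPITAL B", "HOSPITAL C"),
  rank = c(3L, 1L, 2L),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

hospital_names <- toy[, "hospital name"]

# "rank" == 2 compares the literal string "rank" with 2, which is FALSE,
# so nothing is selected and the result is character(0).
by_comparison <- hospital_names["rank" == 2]

# "rank" = 3 is parsed as an argument named rank with value 3. For a plain
# vector the name is ignored, so this is hospital_names[3]: the third row,
# not the hospital ranked third.
by_position <- hospital_names["rank" = 3]

# Comparing the rank column itself yields one TRUE/FALSE per row,
# which selects the row whose rank is 3.
by_rank <- toy[toy$rank == 3, "hospital name"]

print(by_comparison)
print(by_position)
print(by_rank)
```

The key point is that the condition has to refer to the column (toy$rank), not to the quoted column name.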

I suggest collapsing your list into a single data.frame, since that makes filtering easier. Try searching for dplyr::bind_rows or do.call("rbind").
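As a rough sketch of that suggestion, assuming dplyr is installed: the list toy_split below is hypothetical and only imitates the shape of outcome_split, including the fact that one element lacks a rank column.

```r
library(dplyr)

# Hypothetical stand-in for outcome_split: the AK element has no rank column.
toy_split <- list(
  AK = data.frame(`hospital name` = c("HOSPITAL A", "HOSPITAL B"),
                  state = "AK",
                  check.names = FALSE, stringsAsFactors = FALSE),
  AL = data.frame(`hospital name` = c("HOSPITAL C", "HOSPITAL D"),
                  state = "AL",
                  rank = c(2L, 1L),
                  check.names = FALSE, stringsAsFactors = FALSE)
)

# bind_rows() stacks the data frames and fills missing columns with NA,
# so the AK rows end up with rank = NA.
combined <- bind_rows(toy_split)

# which() drops the NA comparisons that come from the AK rows.
hit <- combined[which(combined$state == "AL" & combined$rank == 2),
                "hospital name"]
print(hit)
```

With do.call("rbind", ...) the columns must match across elements first, which is why bind_rows is often the more convenient of the two here.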

Your rank column is not in sorted order; see below, where I arrange the rows by rank.

Selecting is a one-liner with dplyr (or data.table):

library(dplyr)

outcome_split[[2]] %>% filter(rank == 2) %>% select('hospital name')

                hospital name
1 BAPTIST MEDICAL CENTER EAST

outcome_split[[2]] %>% filter(rank == 7) %>% select('hospital name')
                          hospital name
1 VAUGHAN REG MED CENTER PARKWAY CAMPUS

# Here's the hospital order when we arrange by 'rank':
outcome_split[[2]] %>% arrange(rank) %>% select('hospital name', 'rank') %>% head(7)
                          hospital name rank
1              CRESTWOOD MEDICAL CENTER    1
2           BAPTIST MEDICAL CENTER EAST    2
3      SOUTHEAST ALABAMA MEDICAL CENTER    3
4                    GEORGIANA HOSPITAL    4
5           PRATTVILLE BAPTIST HOSPITAL    5
6                       THOMAS HOSPITAL    6
7 VAUGHAN REG MED CENTER PARKWAY CAMPUS    7

# ... and here was your original order
outcome_split[[2]] %>% select('hospital name', 'rank') %>% head(7)
                     hospital name rank
1      ANDALUSIA REGIONAL HOSPITAL   52
2        ATHENS-LIMESTONE HOSPITAL    9
3        ATMORE COMMUNITY HOSPITAL   53
4      BAPTIST MEDICAL CENTER EAST    2
5     BAPTIST MEDICAL CENTER SOUTH   46
6 BAPTIST MEDICAL CENTER-PRINCETON    8
7              BIBB MEDICAL CENTER   54

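By the way, to save yourself trouble, use underscores instead of spaces in column names, so that names such as hospital_name need no quoting.

names(outcome_split[[2]]) <- gsub(' ', '_', names(outcome_split[[2]]))

This renames the columns to "hospital_name" "state" "heart_attack" "heart_failure" "pneumonia" "rank".

Alternatively, you can use make.names(), which replaces any character other than letters, digits, underscores, and dots with a dot. If you want finer control, there is gsub().

You can collapse the list of data frames into one big data frame: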
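A small comparison of the two renaming approaches, using the column names from the sample data. The variable names here are only for illustration.

```r
original_names <- c("hospital name", "state", "heart attack",
                    "heart failure", "pneumonia", "rank")

# make.names() turns each space into a dot to produce syntactically valid names.
dotted <- make.names(original_names)

# gsub() lets you choose the replacement yourself, here an underscore.
underscored <- gsub(" ", "_", original_names, fixed = TRUE)

print(dotted)
print(underscored)
```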

outcome_split[[1]]$rank <- NA
do.call(function(...) rbind(..., make.row.names=FALSE), outcome_split)

That does it. Now your dplyr selection is simply %>% filter(state=='AL', rank==2) %>% select('hospital name')
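If you would rather keep the list as it is, the lookup can also be wrapped in a small base R function. This is only a sketch: hospital_at_rank is a hypothetical helper, not part of any package, and toy_list is made-up data shaped like outcome_split.

```r
# Hypothetical helper: return the hospital name(s) with the given rank in the
# given state, or NA if the state is missing, has no rank column, or no match.
hospital_at_rank <- function(split_list, state, rank) {
  df <- split_list[[state]]
  if (is.null(df) || !("rank" %in% names(df))) {
    return(NA_character_)
  }
  # which() ignores NA ranks instead of returning NA rows.
  hit <- df[["hospital name"]][which(df[["rank"]] == rank)]
  if (length(hit) == 0) NA_character_ else hit
}

# Made-up data: AK deliberately has no rank column, as in the question.
toy_list <- list(
  AK = data.frame(`hospital name` = c("HOSPITAL A", "HOSPITAL B"),
                  check.names = FALSE, stringsAsFactors = FALSE),
  AL = data.frame(`hospital name` = c("HOSPITAL C", "HOSPITAL D"),
                  rank = c(2L, 1L),
                  check.names = FALSE, stringsAsFactors = FALSE)
)

al_first   <- hospital_at_rank(toy_list, "AL", 1)
ak_first   <- hospital_at_rank(toy_list, "AK", 1)
al_missing <- hospital_at_rank(toy_list, "AL", 99)

print(al_first)
print(ak_first)
print(al_missing)
```

Returning NA rather than character(0) is a design choice for this sketch; it makes a missing result easier to spot when the function is called in a loop over states.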