How do you apply a function to each cell of a column?

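I'm using the following function, which works perfectly, to parse text data and find the percentage of arterial stenosis in patients' medical records.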

library(dplyr)  # provides tibble(), %>% and pull(); tidytext must also be installed

txt <- "Small caliber RCA with 50% proximal and 70% mid stenoses."

coronary_anatomy <- function(x) {
    
    # Check that the input is a character string
    if(!is.character(x)) {stop("Requires character string", call. = FALSE)}
    
    # Establish variables
    epicardial <- c("LM", "LAD", "LCX", "RCA")
    mods <- c("proximal", "mid", "distal", "ostial")
    
    sentence <-
        tibble(line = 1, sentence = x) %>%
        tidytext::unnest_tokens(input = sentence, output = word, to_lower = FALSE) %>%
        pull(word)
    
    # Identify number/locations of disease
    artery <- sentence[which(sentence %in% epicardial)]
    locs <- grep("\\d+", sentence)  # positions of numeric (stenosis %) tokens; "\\d" is the R string for the regex \d
    mlocs <- which(sentence %in% mods)
    
    # Find the nearest neighbors to identify which modifier goes with which location
    space <- combn(mlocs, length(locs))
    dist <- apply(space, 2, function(x) {sum(abs(locs - x))})
    matched <- space[, which.min(dist)]
    
    tbl <- 
        tibble(
            anatomy = paste(sentence[matched], artery),
            stenosis = as.numeric(sentence[locs])
        )
    
    # Return
    return(tbl)
}

# Test it out

coronary_anatomy(txt)

Output:

# A tibble: 2 x 2
anatomy      stenosis
<chr>           <dbl>
1 proximal RCA       50
2 mid RCA            70
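The nearest-neighbour step inside coronary_anatomy() is the least obvious part of the function. The snippet below isolates it using hypothetical token positions (one percentage at position 7 and modifiers at positions 5 and 8, not taken from the records here) to show how combn() enumerates candidate modifier sets and which.min() keeps the closest one.

```r
# Hypothetical token positions, for illustration only:
# one number (stenosis %) at position 7, modifiers at positions 5 and 8.
locs  <- c(7)
mlocs <- c(5, 8)

# Every way of choosing length(locs) modifiers; one candidate set per column
space <- combn(mlocs, length(locs))

# Total distance between each candidate set and the number positions
dist <- apply(space, 2, function(x) sum(abs(locs - x)))

# Keep the candidate set with the smallest total distance
matched <- space[, which.min(dist)]
matched
#> [1] 8
```

One caveat that may matter once this runs over many records: when `x` is a single positive number, `combn(x, m)` treats it as `seq_len(x)`. So if a record contains exactly one modifier, the candidates become 1, 2, ..., `mlocs` rather than the modifier's position, and the match can land on a non-modifier token. Enumerating indices instead, e.g. `combn(seq_along(mlocs), length(locs))` and then indexing `mlocs` with the winning column, avoids this.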

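The code works well. But now I'm working on applying it at a larger scale: I want to run it over a data frame containing a whole column of patient records. A simplified version of that data frame looks like this: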

# A tibble: 2 x 2
PatientID      Records
<chr>           <chr>
1 1234            Small caliber RCA with 50% proximal and 70% mid stenoses
2 1235            Small caliber LCX with 40% proximal and 70% mid stenoses
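To make the example reproducible, the simplified data frame can be built directly. This is a sketch that stores PatientID as character to match the `<chr>` header above; the name `your_data` is just a placeholder.

```r
library(tibble)

# The simplified example data, as reproducible code.
# PatientID is stored as character to match the <chr> column type shown above.
your_data <- tibble(
  PatientID = c("1234", "1235"),
  Records   = c(
    "Small caliber RCA with 50% proximal and 70% mid stenoses",
    "Small caliber LCX with 40% proximal and 70% mid stenoses"
  )
)
```

Integer IDs would work just as well with the approaches in the answer below.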

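So here's the problem: I'd like to somehow run this function over the entire Records column. However, running the function (as shown above) outputs a tibble whose size varies depending on how much information is available to parse.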

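Does anyone smarter than me know how to run this function over every cell of the column containing the medical records and output the results in an organized way, given that each output is a tibble?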

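If speed isn't a concern, you can use lapply or purrr::map (or even a for loop) to iterate over each row of the data, save each tibble result in a list, and then combine the list of tibbles into one nice big tibble to work with. For example,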

# dplyr and lapply
result_list = lapply(your_data$Records, coronary_anatomy)
names(result_list) = your_data$PatientID
result_tbl = bind_rows(result_list, .id = "PatientID")
result_tbl
# # A tibble: 4 x 3
#   PatientID anatomy      stenosis
#   <chr>     <chr>           <dbl>
# 1 1234      proximal RCA       50
# 2 1234      mid RCA            70
# 3 1235      proximal LCX       40
# 4 1235      mid LCX            70
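The purrr and for-loop variants can be sketched the same way. To keep the block runnable on its own, `parse_record()` below is a hypothetical, simplified stand-in for coronary_anatomy() (it only pairs each "NN%" with the modifier that follows it and ignores the artery name), and `your_data` is rebuilt from the question's example. In practice you would pass coronary_anatomy and your real data instead.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for coronary_anatomy(), for illustration only:
# returns one row per "<number>% <modifier>" pair, so the row count
# varies from record to record, like the real function's output.
parse_record <- function(x) {
  hits <- regmatches(x, gregexpr("\\d+% (proximal|mid|distal|ostial)", x))[[1]]
  tibble(
    anatomy  = sub("^\\d+% ", "", hits),
    stenosis = as.numeric(sub("%.*$", "", hits))
  )
}

your_data <- tibble(
  PatientID = c("1234", "1235"),
  Records   = c("Small caliber RCA with 50% proximal and 70% mid stenoses",
                "Small caliber LCX with 40% proximal and 70% mid stenoses")
)

# purrr: name each record by its patient, map, then row-bind with an id column.
# (map_dfr() is superseded in purrr >= 1.0.0 by map() + list_rbind(), but still works.)
map_tbl <- your_data$Records %>%
  set_names(your_data$PatientID) %>%
  map_dfr(parse_record, .id = "PatientID")

# Plain for loop: collect each tibble in a preallocated list, then bind once.
out <- vector("list", nrow(your_data))
for (i in seq_len(nrow(your_data))) {
  out[[i]] <- parse_record(your_data$Records[[i]])
}
names(out) <- your_data$PatientID
loop_tbl <- bind_rows(out, .id = "PatientID")
```

Collecting the pieces in a list and binding once at the end avoids the repeated copying you would get from calling bind_rows() inside the loop.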

If you're using dplyr 1.0 or later, you can also simply use group_by and summarize:

your_data %>% 
  group_by(PatientID) %>% 
  summarize(coronary_anatomy(Records))
# `summarise()` regrouping output by 'PatientID' (override with `.groups` argument)
# # A tibble: 4 x 3
# # Groups:   PatientID [2]
#   PatientID anatomy      stenosis
#       <int> <chr>           <dbl>
# 1      1234 proximal RCA       50
# 2      1234 mid RCA            70
# 3      1235 proximal LCX       40
# 4      1235 mid LCX            70
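Since dplyr 1.1.0, returning more than one row per group from summarize() is deprecated in favour of reframe(), so on current dplyr versions the same idea may be better written as below. The block also shows a row-wise alternative using a list-column and tidyr::unnest(), which does not assume one record per PatientID. As a sketch under stated assumptions: `parse_record()` is a hypothetical, simplified stand-in for coronary_anatomy() (it ignores the artery name), and `your_data` is rebuilt from the question's example so the block runs on its own.

```r
library(dplyr)  # reframe() needs dplyr >= 1.1.0
library(purrr)
library(tidyr)

# Hypothetical stand-in for coronary_anatomy(), for illustration only
parse_record <- function(x) {
  hits <- regmatches(x, gregexpr("\\d+% (proximal|mid|distal|ostial)", x))[[1]]
  tibble(
    anatomy  = sub("^\\d+% ", "", hits),
    stenosis = as.numeric(sub("%.*$", "", hits))
  )
}

your_data <- tibble(
  PatientID = c("1234", "1235"),
  Records   = c("Small caliber RCA with 50% proximal and 70% mid stenoses",
                "Small caliber LCX with 40% proximal and 70% mid stenoses")
)

# reframe(): each group may return any number of rows; an unnamed tibble
# result is spliced into columns, and the output is ungrouped.
reframe_tbl <- your_data %>%
  group_by(PatientID) %>%
  reframe(parse_record(Records))

# Row-wise alternative: store each result in a list-column, then unnest it
# into ordinary rows (works even if a patient has several records).
unnest_tbl <- your_data %>%
  mutate(parsed = map(Records, parse_record)) %>%
  select(-Records) %>%
  unnest(parsed)
```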