How can I calculate the mean variable importance of the elements in a list?

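I am training a random forest three times and saving the variable importance from each run in a list (using the caret package). How can I calculate the mean for each feature, where it exists? For example, how can I calculate the mean of the three Overall values for "ESR"? (I eventually want to train this algorithm a thousand times.) Here is my example: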

[[1]]
rf variable importance


  only 20 most important variables shown (out of 119)

                 Overall
Albumin           100.00
age                97.36
PR                 60.18
RR                 42.41
Weight             35.26
SystolicBP         32.14
Cancers1           29.79
ESR                27.66
Neutrophyl         26.98
CPK                25.68
EjectionFraction   25.59
BMI                24.42
Calcium            23.87
WBC                22.36
Urea               22.01
LDH                21.23
FBS                20.21
Ddimer             19.32
HB                 18.99
Lymphocyte         18.78

[[2]]
rf variable importance

  only 20 most important variables shown (out of 119)

                 Overall
age               100.00
FBS                57.80
WBC                53.88
PR                 53.84
Neutrophyl         53.52
Weight             52.31
HB                 51.69
LDH                50.15
Urea               49.31
Albumin            47.05
Lymphocyte         46.87
CPK                46.54
SystolicBP         45.64
Calcium            44.87
ESR                43.54
Ferritin           43.03
CRP                43.00
PLT                42.83
Creatinine         42.53
EjectionFraction   41.43

[[3]]
rf variable importance

  only 20 most important variables shown (out of 119)

                 Overall
age               100.00
Albumin            43.41
Weight             24.88
FBS                24.63
BS                 23.31
PR                 21.47
LDH                21.06
Neutrophyl         20.68
BMI                17.94
EjectionFraction   17.29
CPK                16.49
WBC                16.11
ALP                15.72
RR                 15.28
Lymphocyte         14.94
Cancers1           14.68
CRP                14.50
ESR                14.38
Ddimer             13.05
Ferritin           12.96

Can I create a data frame that stores the features and their Overall values? Thanks for your help. This is my code:

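library(caret)    # train(), varImp()
library(caTools)  # sample.split() (presumably from caTools)
library(pROC)     # auc() (presumably from pROC)

prediction_value_rf <- list()
importance_rf <- list()
auc_rf <- list()
weight_rf <- list()

for (i in 1:1000) {
  # Draw a balanced resample of 300 deaths and 300 survivors
  resample_death <- death[sample(nrow(death), size = 300), ]
  resample_alive <- alive[sample(nrow(alive), size = 300), ]
  f_dataset <- rbind(resample_alive, resample_death)

  # Rows flagged TRUE by sample.split() become the test set
  inx <- sample.split(seq_len(nrow(f_dataset)), 0.25)
  trainData <- f_dataset[!inx, ]
  testData <- f_dataset[inx, ]

  rf_fit <- train(vital_status ~ .,
                  data = trainData,
                  method = "rf")

  pred <- predict(rf_fit, testData[, -109])
  pred1 <- predict(rf_fit, testData[, -109], type = "prob")
  prediction_value_rf[[i]] <- pred1[2]

  # Store the AUC under a name that does not shadow the auc() function
  auc_value <- auc(testData$vital_status, as.numeric(pred1[[2]]),
                   direction = "<", levels = levels(testData$vital_status))
  auc_rf[[i]] <- auc_value

  # Scaled variable importance and best resampled accuracy for this run
  importance_rf[[i]] <- varImp(rf_fit, scale = TRUE)
  weight_rf[[i]] <- max(rf_fit$results$Accuracy)
}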

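Finally, I want to calculate the mean of the Overall values for every feature (I want to build an ensemble model). My dataset contains 109 features and 4200 samples.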

> dput(importance_rf)
list(structure(list(importance = structure(list(Overall = c(100, 
32.9191368970689, 0, 29.4889011862606, 24.8664587940577, 21.8746288172869, 
21.7051171149606, 20.0868919191658, 20.3678665772965, 20.2873319598582, 
33.7597621482843, 42.1891066454062, 22.7027798691687, 17.0766042463516, 
39.4559095867264, 17.9431725056776, 23.2881573588367, 5.04721532342669, 
22.3290849893345, 20.7266835722104, 21.5723519894789, 19.5211504808207, 
21.2794742178794, 20.1624361665348, 13.7420140365184, 31.7941409073075, 
20.9409991203303, 30.4229311296897, 11.5187371425859, 12.8487688047673, 
9.40749461290917, 10.361793419014, 32.5677389075859, 26.5411449178312, 
23.3996095888034, 2.84823906954271, 10.0257295515002, 2.27406632480383, 
0.221285401034356, 0.844517489791465, 1.97286969198767, 0.0909347758420391, 
0.541007254389242, 0.359718315763083, 1.26912866459011, 0.158954429130366, 
0.245159217854806, 1.43768928047267, 0.796627703857018, 0.0731764363395144, 
1.72357935713514, 0.424562470997031, 3.38312715168264, 1.88770244332681, 
0.0314985706869475, 0, 0.65427952713802, 0, 0.0171557103229226, 
0.709743254593806, 1.13539938842206, 0.0367104133426984, 2.95211595985093, 
0, 0.582868854914444, 0.393813676879418, 1.15732422255054, 2.24940561099934, 
1.73472209382337, 1.34428847541862, 1.15486784386305, 0, 0.689216959226089, 
0.625678629482648, 1.81161997423301, 0.433030827900777, 10.9106578268112, 
2.24295278032112, 18.176936900799, 1.74711580562318, 1.45310012173878, 
0.952143653091356, 1.16652405720194, 1.11866015943186, 2.68527336222893, 
1.12853921993574, 5.10727247259446, 1.93994049536545, 1.36475795626174, 
2.95717137358439, 0.115367165512589, 0, 1.45815337045876, 0, 
1.78943634306828, 5.71749991297189, 2.43536004133198, 1.27231795918686, 
11.4771984230702, 3.0971032186365, 0.708058471655881, 0.170261025718881, 
3.37435307537382, 1.56044494248123, 1.09294450754124, 0, 2.25592933845801, 
2.30276525800757, 1.86149986210819, 1.46145976307003, 1.26858067553346, 
2.11041986636824, 0.0902116364175813, 1.54299863875175, 0, 0.269632340125967, 
1.88548693593634, 4.47233507072462, 0.66752451890319)), class = "data.frame", row.names = c("age", 
"Weight", "HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP", 
"ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium", 
"Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS", "Ferritin", 
"HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl", "PLT", "PR", "PhosphorP", 
"PotassiumK", "SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction", 
"TotalLungInvolvementRank", "TotalLungInvolvementPercent", "sex2", 
"Type.of.heart.disease1", "Type.of.heart.disease2", "Type.of.heart.disease9", 
"Unilateral.paralysis1", "Ulcers1", "Obesity.BMI.above.351", 
"Peripheral.artery.disease1", "organ.involment.from.diabetes1", 
"organ.involment.from.diabetes2", "organ.involment.from.diabetes3", 
"UsingDrugHistory1", "UsingAlcoholHistory1", "Transplantation1", 
"SeverityofKidneyDisease1", "SeverityofKidneyDisease2", "SeverityofKidneyDisease3", 
"SeverityChronicliverdisease1", "SeverityChronicliverdisease2", 
"SeverityChronicliverdisease3", "SeverityChronicliverdisease4", 
"SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1", 
"Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1", 
"KidneyTransplantation1", "Immunedeficiencydisease1", "Hypothyroidism1", 
"Hypertention1", "Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1", 
"HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1", 
"Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1", 
"Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1", 
"WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1", 
"Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1", 
"Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1", 
"Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1", 
"LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1", 
"Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1", "Dyspnea1", 
"DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1", "CardiacArrhythmia1", 
"Body_Pain1", "Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1", 
"PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"), 
    structure(list(importance = structure(list(Overall = c(100, 
    36.8463357663146, 0, 20.5921448468941, 35.0980630859042, 
    15.7098956910968, 27.5542325637653, 22.3935810225052, 25.6062709809081, 
    18.9072078537409, 30.5428709528983, 26.4061314161858, 27.2933977255992, 
    18.3744993875278, 57.5115149169245, 14.4361277134982, 49.9265957132235, 
    6.10831602661626, 28.2527379885906, 23.0147565449908, 32.7997892888894, 
    22.7055707536584, 36.9763807158356, 28.9941599048441, 17.8186386653819, 
    31.2682240107287, 26.2894098494535, 41.1751827476675, 22.6316241605114, 
    16.9314172346857, 14.4927913128733, 13.1792980470757, 44.2836496383372, 
    32.7246002717468, 30.3912750391576, 10.0409713536124, 9.83444013035946, 
    2.50470824612248, 1.72055335723373, 1.05083165735798, 1.56193393834476, 
    0.233521622728958, 1.08064736921506, 0.555709266569136, 2.40106539585553, 
    0.291833555475466, 0.380999891346632, 2.56592221397732, 1.62107348934456, 
    0.504647559430998, 1.19859835755469, 0, 1.4382135880929, 
    1.94514657535966, 0, 0.0569205442253742, 0.44589056596685, 
    0.0539230755197555, 0, 0.055077983652405, 1.24527213390211, 
    0, 1.36267778294481, 0.151259347248717, 0.499919817645286, 
    0, 2.79981213016671, 2.72663427247346, 1.93725253183476, 
    2.70715099933653, 1.99722906280419, 0, 0.111342938271961, 
    1.2426657762317, 2.15186257620788, 0.584084013981451, 9.87542370836023, 
    3.21493418783175, 14.6556614893423, 0.67462103889104, 0.787088521176588, 
    2.61946726039402, 2.8099384934716, 0.377053883833586, 2.2824838493133, 
    1.12217532020233, 3.44210364347885, 2.61343827037804, 9.58864870521531, 
    1.77823199575717, 0, 0, 0.828679129518211, 0, 2.73842874693014, 
    14.5506870851474, 0.390367251047195, 0.811902694072225, 15.5803912323052, 
    4.18258978600944, 2.13546475796113, 2.66088800284236, 2.97761832225233, 
    3.54039994200135, 2.44519084017892, 0.737528372419208, 2.20708600548186, 
    4.12502178170407, 3.1835668678093, 7.61195991815971, 2.35303302862437, 
    5.70342032074721, 0.409606955773683, 2.4977310780031, 0.0107020031498121, 
    0.268000372472171, 2.32396173268619, 1.64515893404575, 0.868523484401606
    )), class = "data.frame", row.names = c("age", "Weight", 
    "HookhConsumption", "BMI", "SystolicBP", "RR", "DiastolicBP", 
    "ALP", "ALT", "AST", "Albumin", "BS", "CPK", "CRP", "Calcium", 
    "Creatinine", "Ddimer", "Directbilirubin", "ESR", "FBS", 
    "Ferritin", "HB", "LDH", "Lymphocyte", "Mg", "Neutrophyl", 
    "PLT", "PR", "PhosphorP", "PotassiumK", "SodiumNA", "Totalbilirubin", 
    "Urea", "WBC", "EjectionFraction", "TotalLungInvolvementRank", 
    "TotalLungInvolvementPercent", "sex2", "Type.of.heart.disease1", 
    "Type.of.heart.disease2", "Type.of.heart.disease9", "Unilateral.paralysis1", 
    "Ulcers1", "Obesity.BMI.above.351", "Peripheral.artery.disease1", 
    "organ.involment.from.diabetes1", "organ.involment.from.diabetes2", 
    "organ.involment.from.diabetes3", "UsingDrugHistory1", "UsingAlcoholHistory1", 
    "Transplantation1", "SeverityofKidneyDisease1", "SeverityofKidneyDisease2", 
    "SeverityofKidneyDisease3", "SeverityChronicliverdisease1", 
    "SeverityChronicliverdisease2", "SeverityChronicliverdisease3", 
    "SeverityChronicliverdisease4", "SeverityChronicliverdisease9", 
    "Schizophrenia1", "Rheumatologicaldiseases1", "Pregnant1", 
    "Neurologicaldiseases1", "LiverTransplantation1", "KidneyTransplantation1", 
    "Immunedeficiencydisease1", "Hypothyroidism1", "Hypertention1", 
    "Hyperlipidemia1", "Historyofsmoking1", "HistoryofHookah1", 
    "HeartTransplantation1", "HIV1", "FattyLiver1", "Diabetes1", 
    "Chronicliverdisease1", "Chronickidneydisease1", "CardiovascularDisease1", 
    "Cancers1", "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1", 
    "WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1", 
    "Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1", 
    "Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1", 
    "Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1", 
    "LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", "Headace1", 
    "Fever1", "Fatigue1", "EyeConjunctivitis1", "Epigastric1", 
    "Dyspnea1", "DryCough1", "Dizziness1", "Diarrhea1", "Chestpain1", 
    "CardiacArrhythmia1", "Body_Pain1", "Bleeding1", "Ataxia1", 
    "Anorexia1", "PCRCOVID19Test1", "PCRCOVID19Test2")), model = "rf", 
        calledFrom = "varImp"), class = "varImp.train"), structure(list(
        importance = structure(list(Overall = c(100, 36.4519408382731, 
        0.0121282468302786, 27.9982404793903, 19.4487163883379, 
        24.6079653972917, 14.1539998143239, 18.684018340339, 
        20.1182663550791, 17.4200861293186, 46.6309831468223, 
        52.2217679510578, 28.5910698857479, 16.845796014194, 
        31.6509235655573, 17.1000574614637, 27.8424176478161, 
        5.69845064904499, 21.3838903337718, 20.217605303817, 
        19.8702958841878, 22.3737582989512, 33.0788664305301, 
        20.6035947546629, 16.3220426343042, 23.4809287675538, 
        23.1749036748423, 57.122094059206, 12.2409421568247, 
        11.234114301956, 15.7946508155502, 8.80563729211453, 
        20.2205078755919, 20.3091908316546, 27.7497357152039, 
        3.8622908315769, 12.8894291926347, 5.96701805516155, 
        0.761922263853243, 1.41991036581607, 1.54560737492769, 
        0.825161722105208, 0.0172016746252156, 0.693982409239905, 
        0, 0.358366468201754, 1.74812586771487, 2.2746344067366, 
        0.745595100629448, 0.465199425668223, 0.408092232849501, 
        0.115358703965213, 0.0358338604150282, 2.88640197248697, 
        0, 0.288302498762889, 0.332551323637155, 0.0121282468302786, 
        0, 1.03515126482736, 1.1213600137207, 0.329413397366096, 
        2.0612368962315, 0, 0.610994615626186, 1.0215655608971, 
        3.90651448858199, 1.73374217783332, 1.47244358073369, 
        2.20534241559288, 0.173681720638885, 0, 0.631950099628902, 
        0.132328128708788, 2.92435478031454, 1.03537122788376, 
        4.74067414123091, 1.77981701502525, 13.1150432121738, 
        0.720556880972878, 1.20366662244445, 1.19169376389038, 
        1.86442992849398, 0.518200723424615, 2.278501378269, 
        1.23638371282217, 3.66947066761794, 2.03933409738165, 
        1.25289331603719, 1.01627904400807, 0.0324453169731015, 
        0, 2.29817177168672, 0, 1.53194610140319, 7.15322639329996, 
        0.759542631415349, 1.53353473284619, 4.77390474517756, 
        1.05656481042379, 0.699450154375729, 1.16224285818854, 
        3.65223350861514, 1.93274707207956, 1.57589588221639, 
        0.449432695377871, 1.36863730886437, 2.11275137384133, 
        3.29450357362525, 1.08676677214028, 2.18565092410049, 
        1.15456248328987, 0.492245547306216, 1.59592156033113, 
        0.0129367966189638, 0.514499765305734, 1.58591810753971, 
        1.84832826238423, 0.807564130566264)), class = "data.frame", row.names = c("age", 
        "Weight", "HookhConsumption", "BMI", "SystolicBP", "RR", 
        "DiastolicBP", "ALP", "ALT", "AST", "Albumin", "BS", 
        "CPK", "CRP", "Calcium", "Creatinine", "Ddimer", "Directbilirubin", 
        "ESR", "FBS", "Ferritin", "HB", "LDH", "Lymphocyte", 
        "Mg", "Neutrophyl", "PLT", "PR", "PhosphorP", "PotassiumK", 
        "SodiumNA", "Totalbilirubin", "Urea", "WBC", "EjectionFraction", 
        "TotalLungInvolvementRank", "TotalLungInvolvementPercent", 
        "sex2", "Type.of.heart.disease1", "Type.of.heart.disease2", 
        "Type.of.heart.disease9", "Unilateral.paralysis1", "Ulcers1", 
        "Obesity.BMI.above.351", "Peripheral.artery.disease1", 
        "organ.involment.from.diabetes1", "organ.involment.from.diabetes2", 
        "organ.involment.from.diabetes3", "UsingDrugHistory1", 
        "UsingAlcoholHistory1", "Transplantation1", "SeverityofKidneyDisease1", 
        "SeverityofKidneyDisease2", "SeverityofKidneyDisease3", 
        "SeverityChronicliverdisease1", "SeverityChronicliverdisease2", 
        "SeverityChronicliverdisease3", "SeverityChronicliverdisease4", 
        "SeverityChronicliverdisease9", "Schizophrenia1", "Rheumatologicaldiseases1", 
        "Pregnant1", "Neurologicaldiseases1", "LiverTransplantation1", 
        "KidneyTransplantation1", "Immunedeficiencydisease1", 
        "Hypothyroidism1", "Hypertention1", "Hyperlipidemia1", 
        "Historyofsmoking1", "HistoryofHookah1", "HeartTransplantation1", 
        "HIV1", "FattyLiver1", "Diabetes1", "Chronicliverdisease1", 
        "Chronickidneydisease1", "CardiovascularDisease1", "Cancers1", 
        "CVAStrokeCVDTIA1", "COPD1", "Asthma1", "WetCough1", 
        "WeightLoss1", "WeaknessandLethargy1", "Vomit1", "Trembling1", 
        "Sweating1", "Sputum1", "Sorethroat1", "SkinRush1", "Rush1", 
        "Rhinorrhea1", "PharynxExoda1", "Nausea1", "Muscle_Painmyalgia1", 
        "Lossofsenseoftaste1", "Lossofsenseofsmell1", "LossofConsciousness1", 
        "LimbEdema1", "Jointpain_Arthralgia1", "Hemoptysis1", 
        "Headace1", "Fever1", "Fatigue1", "EyeConjunctivitis1", 
        "Epigastric1", "Dyspnea1", "DryCough1", "Dizziness1", 
        "Diarrhea1", "Chestpain1", "CardiacArrhythmia1", "Body_Pain1", 
        "Bleeding1", "Ataxia1", "Anorexia1", "PCRCOVID19Test1", 
        "PCRCOVID19Test2")), model = "rf", calledFrom = "varImp"), class = "varImp.train"))

Regarding this part:

how can I calculate the mean of each feature if it exists? for example, how can I calculate the mean of three overall "ESR"?

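Because you have already generated the list, you can write a function that selects the row containing the feature name, apply that function to each element of the list, flatten the result, and then compute the mean. If the feature is absent from some elements, you can use na.rm to exclude those elements from the mean.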

For example, this is similar to your list:

mylist <- list(structure(list(Overall = c(100, 97.36, 60.18, 42.41, 35.26, 
32.14, 29.79, 27.66, 26.98, 25.68, 25.59, 24.42, 23.87, 22.36, 
22.01, 21.23, 20.21, 19.32, 18.99, 18.78)), class = "data.frame", row.names = c("Albumin", 
"age", "PR", "RR", "Weight", "SystolicBP", "Cancers1", "ESR", 
"Neutrophyl", "CPK", "EjectionFraction", "BMI", "Calcium", "WBC", 
"Urea", "LDH", "FBS", "Ddimer", "HB", "Lymphocyte")), structure(list(
    Overall = c(100, 57.8, 53.88, 53.84, 53.52, 52.31, 51.69, 
    50.15, 49.31, 47.05, 46.87, 46.54, 45.64, 44.87, 43.54, 43.03, 
    43, 42.83, 42.53, 41.43)), class = "data.frame", row.names = c("age", 
"FBS", "WBC", "PR", "Neutrophyl", "Weight", "HB", "LDH", "Urea", 
"Albumin", "Lymphocyte", "CPK", "SystolicBP", "Calcium", "ESR", 
"Ferritin", "CRP", "PLT", "Creatinine", "EjectionFraction")), 
    structure(list(Overall = c(100, 43.41, 24.88, 24.63, 23.31, 
    21.47, 21.06, 20.68, 17.94, 17.29, 16.49, 16.11, 15.72, 15.28, 
    14.94, 14.68, 14.5, 14.38, 13.05, 12.96)), class = "data.frame", row.names = c("age", 
    "Albumin", "Weight", "FBS", "BS", "PR", "LDH", "Neutrophyl", 
    "BMI", "EjectionFraction", "CPK", "WBC", "ALP", "RR", "Lymphocyte", 
    "Cancers1", "CRP", "ESR", "Ddimer", "Ferritin")))

Here is how to compute the mean of ESR, which is present in all elements, and of CRP, which is absent from one element:

mylist |> lapply(function(dat) dat["ESR", "Overall"]) |> unlist() |> mean(na.rm = TRUE)
#[1] 28.52667

mylist |> lapply(function(dat) dat["CRP", "Overall"]) |> unlist() |> mean(na.rm = TRUE)
#[1] 28.75
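
To see why na.rm = TRUE matters here, the following minimal sketch uses two small made-up data frames (run1, run2 and toy_list are hypothetical names, not part of your data). Indexing a data frame by a row name that does not exist returns NA rather than raising an error, so the run without CRP simply drops out of the mean:

```r
# Two toy importance tables; CRP is absent from the first one (made-up values)
run1 <- data.frame(Overall = c(100, 27.66), row.names = c("age", "ESR"))
run2 <- data.frame(Overall = c(100, 43.54, 43), row.names = c("age", "ESR", "CRP"))
toy_list <- list(run1, run2)

# Indexing by a missing row name yields NA instead of an error
crp_values <- vapply(toy_list, function(dat) dat["CRP", "Overall"], numeric(1))
crp_values

mean(crp_values)                 # NA, because one run has no CRP
mean(crp_values, na.rm = TRUE)   # mean over the runs where CRP exists
```

Using vapply() with numeric(1) instead of lapply() plus unlist() is a design choice in this sketch: it fails loudly if an element ever returns something other than a single number.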

Because you have many features, you can write another function that applies this step to each feature. For example:

features <- c("ESR", "CRP", "CPK", "WBC", "LDH")
feature_mean <- function(feature_name){
    out <- lapply(mylist, function(dat) dat[feature_name, "Overall"])|> 
        unlist() |> mean(na.rm = TRUE) |> 
        setNames(paste0("mean_",feature_name))
    return(out)
     }

features |> lapply(feature_mean) |> unlist()

#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH 
#28.52667 28.75000 29.57000 30.78333 30.81333 
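
Since you also asked for a data frame holding the features and their values, here is one possible sketch. It assumes a list of data frames shaped like mylist; the toy list below and the names toy_list, all_features, imp_matrix and n_runs are illustrative only. It collects every feature name that occurs in any run and also records in how many runs each feature was actually present:

```r
# Toy stand-in for mylist: each element has an Overall column and feature row names
toy_list <- list(
  data.frame(Overall = c(100, 27.66),       row.names = c("age", "ESR")),
  data.frame(Overall = c(100, 43.54, 43),   row.names = c("age", "ESR", "CRP")),
  data.frame(Overall = c(100, 14.38, 14.5), row.names = c("age", "ESR", "CRP"))
)

# Every feature that appears in at least one run
all_features <- unique(unlist(lapply(toy_list, rownames)))

# Matrix with one row per feature and one column per run (NA where a feature is absent)
imp_matrix <- sapply(toy_list, function(dat) dat[all_features, "Overall"])
rownames(imp_matrix) <- all_features

mean_importance <- data.frame(
  feature      = all_features,
  mean_overall = rowMeans(imp_matrix, na.rm = TRUE),
  n_runs       = rowSums(!is.na(imp_matrix)),
  row.names    = NULL
)
mean_importance
```

Keeping n_runs next to the mean makes it visible when a mean is based on only a few runs, which can happen if the tables are truncated to the top 20 variables.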

Edit

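The synthetic data mylist used in the previous example holds only a single "Overall" data frame in each element, so the feature extraction can be applied to it directly with lapply. The real data importance_rf that you provided in the updated question, however, holds several objects in each element, with the "Overall" data frame as the first one. That difference is the cause of the error you showed in the comments. To apply the extraction, first pull out the "Overall" data frame with lapply(function(list) list[[1]]), then apply the previous steps.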

# Extract mean ESR 
importance_rf |> 
 lapply(function(list) list[[1]]) |> 
 lapply(function(dat) dat["ESR", "Overall"]) |> 
 unlist() |> 
 mean(na.rm = TRUE)
#[1] 23.98857

# Extract mean CRP
importance_rf |> 
 lapply(function(list) list[[1]]) |> 
 lapply(function(dat) dat["CRP", "Overall"]) |> 
 unlist() |> 
 mean(na.rm = TRUE)
#[1] 17.4323

A {base R} way

The previous steps can be applied to a vector of features as follows:

features <- c("ESR", "CRP", "CPK", "WBC", "LDH")

feature_mean <- function(feature_name){
     out <- importance_rf |> 
         lapply(function(list) list[[1]]) |>
         lapply(function(dat) dat[feature_name, "Overall"])|> 
         unlist() |> mean(na.rm = TRUE) |> 
         setNames(paste0("mean_",feature_name))
     return(out)
}

# Extract the mean values

features |> lapply(feature_mean) |> unlist()

#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH 
#23.98857 17.43230 26.19575 26.52498 30.44491 

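A brief explanation of the code:

  • lapply(function(list) list[[1]]) extracts the first element of each element of the importance_rf list, which is the data frame holding the feature data.
  • dat[feature_name, "Overall"] extracts the value of the target feature feature_name from each extracted data frame. Each call extracts only one feature from each data frame.
  • unlist() converts the extracted values from a list to a numeric vector.
  • setNames gives the result a name, so it is easy to see which feature the mean belongs to.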
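
The same idea extends to all features at once. The sketch below is only an illustration under stated assumptions: toy_importance is a hypothetical stand-in that mimics the structure shown in your dput() output (a list whose elements each hold an importance data frame with an Overall column), and it reaches that data frame by name with $importance rather than by position with [[1]]. The names make_run, feature_names and toy_feature_mean are made up for this example.

```r
# Hypothetical stand-in for importance_rf: each element holds an `importance` data frame
make_run <- function(values) {
  list(
    importance = data.frame(Overall = values, row.names = c("age", "ESR", "CRP")),
    model = "rf",
    calledFrom = "varImp"
  )
}
toy_importance <- list(
  make_run(c(100, 22, 17)),
  make_run(c(100, 28, 18)),
  make_run(c(100, 22, 16))
)

# Feature names of the first run; this assumes every run lists the same features
feature_names <- rownames(toy_importance[[1]]$importance)

# Mean of one feature across all runs, ignoring runs where it is missing
toy_feature_mean <- function(feature_name) {
  values <- vapply(
    toy_importance,
    function(run) run$importance[feature_name, "Overall"],
    numeric(1)
  )
  mean(values, na.rm = TRUE)
}

# vapply() over a character vector names the result by feature automatically
all_means <- vapply(feature_names, toy_feature_mean, numeric(1))
all_means
```

Accessing the element by name is slightly more robust than [[1]] if the order of the components ever changes.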

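All the functions used this way belong to base R, so you do not need to install any external package to use them. Another option is to combine base R functions with functions from the purrr package.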

A {purrr} way

library(purrr)

importance_rf |> 
  map(pluck(1,1)) |> 
  map(function(dat) set_names(dat[features,], features)) |>
  as.data.frame() |> 
  rowMeans() |> 
  set_names(paste0("mean_", features))

#mean_ESR mean_CRP mean_CPK mean_WBC mean_LDH 
#23.98857 17.43230 26.19575 26.52498 30.44491

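These steps are much shorter than the base R version above, but what each step does may be less obvious.

Note that map is similar to lapply, and that pluck(x, 1, 1) is equivalent to x[[1]][[1]]. In the code above, pluck(1, 1) is called on the number 1 and simply evaluates to 1, so map(pluck(1,1)) behaves like map(1) and extracts the first element of each list entry.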

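A brief explanation of the code:

  • map(pluck(1,1)) extracts the data frames, similar to lapply(function(list) list[[1]]) above.
  • map(function(dat) set_names(dat[features,], features)) extracts the listed features, similar to dat[feature_name, "Overall"] above.

There is one difference:

In the base R way above, each feature is extracted from all data frames and its mean is computed, and then the next feature is processed in the same way.

In the purrr way, all target features are extracted from each data frame in the list, and as.data.frame then combines them into a new data frame in which each row represents one feature. rowMeans then computes the mean of all values of each feature.

Note that you can inspect the result of each step by running the pipeline only up to the corresponding |> pipe. For example, importance_rf shows all the objects in each element, while importance_rf |> map(pluck(1,1)) shows only the data frame objects.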
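
One caveat with the row-wise approach, sketched below with made-up numbers: if a feature is missing from one run, indexing produces NA for that run, and rowMeans() returns NA for that feature unless na.rm = TRUE is set. The example uses only base R (cbind instead of as.data.frame), and toy_runs, per_run and imp_matrix are hypothetical names:

```r
# Made-up importance tables; CRP is missing from the first run
toy_runs <- list(
  data.frame(Overall = c(27.66, 25.68),        row.names = c("ESR", "CPK")),
  data.frame(Overall = c(43.54, 46.54, 43.00), row.names = c("ESR", "CPK", "CRP"))
)
features <- c("ESR", "CRP", "CPK")

# One named vector per run, then bind the runs side by side (features in rows)
per_run <- lapply(toy_runs, function(dat) setNames(dat[features, "Overall"], features))
imp_matrix <- do.call(cbind, per_run)

rowMeans(imp_matrix)                 # CRP is NA because one run lacks it
rowMeans(imp_matrix, na.rm = TRUE)   # CRP averaged over the runs that have it
```

With your importance_rf every run lists all features, so the issue does not arise there, but it would with tables truncated to the top 20 variables.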

Update to include weighted means

Here is a simple example of how to compute the weighted mean of each feature in a list. Suppose you have this list:

some.list <- list(L1 = c(a = 2, b = 4, c = 7), 
                  L2 = c(a = 5, b = 5, c = 2), 
                  L3 = c(a = 3, b = 3, c = 6))
some.list
$L1
a b c 
2 4 7 

$L2
a b c 
5 5 2 

$L3
a b c 
3 3 6 

Suppose L1, L2 and L3 in the list have the following weights:

weight <- c(w.L1 = 0.5, w.L2=0.6, w.L3 = 0.9)
weight
w.L1 w.L2 w.L3 
 0.5  0.6  0.9 

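The weighted mean of a, for example, is computed as:

weighted mean of a = (w1*a1 + w2*a2 + w3*a3) / (w1 + w2 + w3) = (0.5*2 + 0.6*5 + 0.9*3) / (0.5 + 0.6 + 0.9) = 3.35

You can get this value by multiplying each value of a in the list by the corresponding normalized weight and summing the products. In this case, the normalized weight for w1 is w1/(w1+w2+w3).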

To carry out these steps in R:

norm.weight <- weight/sum(weight)
norm.weight
w.L1 w.L2 w.L3 
0.25 0.30 0.45 

# weighted means of a,b, and c
some.list |> map2(norm.weight, `*`) |> as.data.frame() |> rowSums()
   a    b    c 
3.35 3.85 5.05 
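
As a cross-check, base R's weighted.mean() performs the same calculation and normalizes the weights itself. The sketch below re-enters the numbers of some.list as a matrix with one row per feature; the name values is chosen here only for illustration:

```r
# Values of a, b and c across L1, L2 and L3 (same numbers as some.list above)
values <- rbind(a = c(2, 5, 3), b = c(4, 5, 3), c = c(7, 2, 6))
weight <- c(0.5, 0.6, 0.9)

# weighted.mean() divides by sum(weight) itself, so raw weights can be passed
weighted_result <- apply(values, 1, weighted.mean, w = weight)
weighted_result
```

weighted.mean() also accepts na.rm = TRUE, which can be useful when a feature is missing from some runs.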

Applying these simulated weight values to your importance_rf list and the features from the example, we get:

importance_rf |> 
    map(pluck(1,1)) |> 
    map(function(dat) set_names(dat[features,], features)) |>
    map2(norm.weight, `*`) |> 
    as.data.frame() |> 
    rowSums()
    
     ESR      CRP      CPK      WBC      LDH 
23.68084 17.36211 26.72970 25.59180 31.29827
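
In your loop, weight_rf stores the best accuracy of each model, so those accuracies could serve as the weights. The following is only a sketch under that assumption, with made-up importance values and accuracies (toy_imp and toy_accuracy are hypothetical); whether accuracy is a sensible weighting scheme for your ensemble is a modelling decision.

```r
# Rows are features, columns are runs (made-up importance values)
toy_imp <- cbind(
  run1 = c(ESR = 20, CRP = 10),
  run2 = c(ESR = 30, CRP = 20),
  run3 = c(ESR = 40, CRP = 30)
)

# Made-up accuracies standing in for unlist(weight_rf)
toy_accuracy <- c(0.5, 0.6, 0.9)
norm_weight <- toy_accuracy / sum(toy_accuracy)

# Matrix product: each feature's weighted mean across the runs
weighted_means <- drop(toy_imp %*% norm_weight)
weighted_means
```

Holding the importances in a feature-by-run matrix keeps the weighted mean a single matrix product, which scales comfortably to a thousand runs.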