R: How to modify a CreateTable() function for reiterated observations and with the wrong index?

Question

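I am trying to create a table from the following dataset, of which I report the first fifty observations here. The dataset I am working with is reported below.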

The age and gender variables contain some typos, which I propose to fix as follows:

colnames(d)[8] <- 'COND'
d$gender = ifelse(tolower(substr(d$gender,1,1)) == "k", "F", "M") 
library(libr)
d <- datastep(d, {
  if (is.na(age)) {
    age <- 21
  }
})

I am trying to create a summary table with the following code:

CreateTableOne(
  vars = c('TASK', 'COND', 't1.key', 'T1.response', 'age', 'T1.ACC'), 
  strata = c('ID'),
  factorVars = c('gender'), 
  argsApprox = list(correct = FALSE), 
  smd = TRUE, 
  addOverall = TRUE, 
  test = TRUE) %>% 
  na.omit() %>% 
  kableone()

and I obtain this table.

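However, as you can see, this function counts every row, and I have many observations for the same subject. There are only 54 unique IDs, so the reported numbers of males and females are incorrect.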

length(unique(d$ID)) 
[1] 54

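Does anyone know how to solve this? Also, since 'age' and 'T1.ACC' are not normally distributed, does anyone know how I can report them with the median, Q1 and Q3 instead?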

Answer 1

I would like to help you. However, the data you provided has the following problems:

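The variable COND is missing.
The TASK variable has only one unique value (the CreateTableOne function does not accept variables with a single unique value).
The age variable has only one unique value.
The ID variable is repeated several times.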

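However, even without changing your data you can see where the problem lies. With data in this form you cannot use CreateTableOne as it stands! It counts every occurrence of the value m and every occurrence of the value k. Since each person has multiple entries, CreateTableOne counts each entry separately.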
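To make the point concrete, here is a minimal sketch on made-up data (the `toy` tibble and its values are hypothetical, not taken from your file). It only illustrates the difference between counting rows and counting people; it is not meant as the final solution.

```r
library(dplyr)

# Hypothetical toy data: three subjects with several trials each
toy <- tibble(
  ID     = c("P1", "P1", "P1", "P2", "P2", "P3"),
  gender = c("k",  "k",  "k",  "m",  "m",  "k")
)

# Counting rows counts trials, not people
per_row <- toy %>% count(gender)

# Keeping one row per subject first gives the number of people
per_subject <- toy %>% distinct(ID, gender) %>% count(gender)

per_row
per_subject
```

The second count is the one a "Table 1" usually needs, which is why the data should be reduced to one row per ID before any per-person summary.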

Please take a look at the solution I proposed here.

Update 1

OK. Let's try to tackle your data. You have 54 patients with distinct IDs.

data_Confidence_in_Action %>% distinct(ID) %>% nrow()
#[1] 54

Note, however, that one ID appears to be incorrect.

data_Confidence_in_Action %>% distinct(ID) %>%
  mutate(lenID = str_length(ID)) %>% filter(lenID!=5)
#  A tibble: 1 x 2
#  ID         lenID
#  <chr>      <int>
#1 P1419 dots    10

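Still, we can leave it as it is. Correct it yourself if you need to. But keep in mind that you have as many as 8 different genders. Be careful, because gender ideology is not popular in our country ;-)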

data_Confidence_in_Action %>% distinct(gender)
#  A tibble: 8 x 1
#  gender     
#  <chr>      
#1 k          
#2 kobieta    
#3 M          
#4 K          
#5 m¦Ö+-czyzna
#6 21         
#7 m          
#8 M¦Ö+-czyzna

Unfortunately, this needs fixing. Worse, patient P1440 has an age entered in the gender column. So what is P1440's gender?

data_Confidence_in_Action %>% filter(gender==21) %>% distinct(ID, gender, age)
#  A tibble: 1 x 3
#  ID    gender   age
#  <chr> <chr>  <dbl>
#1 P1440 21        NA

data_Confidence_in_Action %>% distinct(ID, gender) %>% 
  group_by(gender) %>% summarise(n = n())
#  A tibble: 8 x 2
#  gender          n
#  <chr>       <int>
#1 21              1
#2 k              36
#3 K               3
#4 kobieta         9
#5 m               1
#6 M               1
#7 m¦Ö+-czyzna     2
#8 M¦Ö+-czyzna     1

As you can see, you have far more women. So let P1440 be a woman. Is that OK?

Finally, note that two of the variables have inconvenient names: "Condition (whether a person responded)" and "Go/Nogo (whether a person should respond)".

Let's fix everything in one go.

data_Confidence_in_Action = data_Confidence_in_Action %>% 
  mutate(
    # entries starting with k or K (k, K, kobieta) and the stray "21" become "k"
    gender = ifelse(str_detect(gender, "^[kK]|^21$"), "k", "m"),
    age = ifelse(is.na(age), 21, age)
  ) %>% rename(Condition=`Condition (whether a person responded)`, 
               Go.Nogo = `Go/Nogo (whether a person should respond)`)
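If you would rather not guess the gender of the stray "21" entry, a more explicit recoding is possible. This is only a sketch on a hypothetical sample of raw entries (the `raw` vector is made up for illustration); unlike the code above, it leaves unrecognised values as NA for manual review.

```r
library(dplyr)
library(stringr)

# Hypothetical sample of raw gender entries
raw <- c("k", "kobieta", "K", "M", "m", "21")

clean <- case_when(
  str_detect(raw, "^[kK]") ~ "k",           # k, K, kobieta
  str_detect(raw, "^[mM]") ~ "m",           # m, M and spellings starting with m
  TRUE                     ~ NA_character_  # anything else is left for manual review
)

clean
```

Whether an NA or an imputed "k" is preferable depends on how you plan to report missing data, so treat this as one option rather than a recommendation.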

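Finally, let's convert some variables from chr to factor, keeping the existing values as levels and only choosing their order. I hope I chose sensibly.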

data_Confidence_in_Action = data_Confidence_in_Action %>% 
  mutate(
    ID = ID %>% fct_inorder(),
    gender = gender %>% fct_infreq(),
    t1.key = t1.key %>% fct_infreq(),
    Condition = Condition %>% fct_infreq(),
    CR.key = CR.key %>% fct_infreq(),
    TASK = TASK %>% fct_infreq(),
    Go.Nogo = Go.Nogo %>% fct_infreq(),
    difficulty = difficulty %>% factor(c("easy", "medium", "hard"))
  )
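For reference, the three conversions used above differ only in how the levels are ordered. A small sketch on a made-up vector (`x` is hypothetical) shows the difference:

```r
library(forcats)

# Hypothetical character vector
x <- c("hard", "easy", "easy", "medium", "easy", "hard")

lv_inorder <- levels(fct_inorder(x))  # order of first appearance
lv_infreq  <- levels(fct_infreq(x))   # most frequent value first
lv_manual  <- levels(factor(x, levels = c("easy", "medium", "hard")))  # order given by hand

lv_inorder
lv_infreq
lv_manual
```

The level order matters for tables because the first level is typically treated as the reference category, so for an ordered concept such as difficulty the manual order is the safest choice.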

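With the data organised like this, let's get to the heart of the matter: what exactly do you want to analyse? Note that for variables such as TASK, Condition and t1.key, every patient has two distinct values.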

data_Confidence_in_Action %>% group_by(ID) %>% summarise(
  nunique.TASK = length(unique(TASK)),
  nunique.Condition = length(unique(Condition)),
  nunique.t1.key = length(unique(t1.key))
) %>% distinct(nunique.TASK, nunique.Condition, nunique.t1.key)
#  A tibble: 1 x 3
#  nunique.TASK nunique.Condition nunique.t1.key
#         <int>             <int>          <int>
#1            2                 2              2

However, if we look at the proportions in which the different values of these variables occur, they differ from patient to patient.

data_Confidence_in_Action %>% group_by(ID) %>% summarise(
  prop.TASK = sum(TASK=="left")/sum(TASK=="right")) %>% 
  distinct()

data_Confidence_in_Action %>% group_by(ID) %>% summarise(
  prop.Condition = sum(Condition=="NR")/sum(Condition=="R"))%>% 
  distinct()

data_Confidence_in_Action %>% group_by(ID) %>% summarise(
  prop.t1.key = sum(t1.key=="None")/sum(t1.key=="space"))%>% 
  distinct()
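The ratios above become Inf or NaN for a patient who never has one of the two values. A slightly more robust variant, sketched here on hypothetical data (the `toy` tibble is made up), is to compute the share of one value per patient with mean() on a logical vector:

```r
library(dplyr)

# Hypothetical toy data: two patients, four trials each
toy <- tibble(
  ID   = c("P1", "P1", "P1", "P1", "P2", "P2", "P2", "P2"),
  TASK = c("left", "right", "left", "right", "right", "right", "right", "left")
)

# Share of "right" trials per patient; mean() of a logical is the proportion of TRUE
shares <- toy %>%
  group_by(ID) %>%
  summarise(share.right = mean(TASK == "right"), n.trials = n())

shares
```

A per-patient share like this is also a natural candidate for a one-row-per-patient summary, if that is what you are after.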

So please state clearly what you want to summarise and how, because it is not clear to me what you want to get.

Update 2

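OK. I can see you are starting to understand a few things. However, I still don't know what you want to summarise. See below. First, let's collect all the code that prepares the data.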

library(tidyverse)
library(readxl)
library(tableone)
data_Confidence_in_Action <- read_excel("data_Confidence in Action.xlsx")

data_Confidence_in_Action = data_Confidence_in_Action %>%
  mutate(
    # entries starting with k or K (k, K, kobieta) and the stray "21" become "k"
    gender = ifelse(str_detect(gender, "^[kK]|^21$"), "k", "m"),
    age = ifelse(is.na(age), 21, age)
  ) %>% rename(Condition=`Condition (whether a person responded)`,
               Go.Nogo = `Go/Nogo (whether a person should respond)`)

data_Confidence_in_Action = data_Confidence_in_Action %>%
  mutate(
    ID = ID %>% fct_inorder(),
    gender = gender %>% fct_infreq(),
    t1.key = t1.key %>% fct_infreq(),
    Condition = Condition %>% fct_infreq(),
    CR.key = CR.key %>% fct_infreq(),
    TASK = TASK %>% fct_infreq(),
    Go.Nogo = Go.Nogo %>% fct_infreq(),
    difficulty = difficulty %>% factor(c("easy", "medium", "hard"))
  )

Now the summary. If we do this:

CreateTableOne(
  data = data_Confidence_in_Action,
  vars = c('TASK', 'Condition', 't1.key', 'T1.response', 'age', 'T1.ACC'), 
  strata = 'gender',
  factorVars = c('TASK', 'Condition', 't1.key'), 
  argsApprox = list(correct = FALSE), 
  smd = TRUE, 
  addOverall = TRUE, 
  test = TRUE) %>% 
  kableone()

Output

|                        |Overall      |k            |m            |p      |test |
|:-----------------------|:------------|:------------|:------------|:------|:----|
|n                       |41713        |37823        |3890         |       |     |
|TASK = right (%)        |20832 (49.9) |18889 (49.9) |1943 (49.9)  |0.992  |     |
|Condition = R (%)       |20033 (48.0) |18130 (47.9) |1903 (48.9)  |0.241  |     |
|t1.key = space (%)      |20033 (48.0) |18130 (47.9) |1903 (48.9)  |0.241  |     |
|T1.response (mean (SD)) |0.48 (0.50)  |0.48 (0.50)  |0.49 (0.50)  |0.241  |     |
|age (mean (SD))         |20.74 (2.67) |20.75 (2.70) |20.60 (2.33) |0.001  |     |
|T1.ACC (mean (SD))      |0.70 (0.46)  |0.70 (0.46)  |0.73 (0.45)  |<0.001 |     |

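we get a summary over all observations, n == 41713. Since each patient has many observations, such a summary is of little use. At least that is what I think. However, we can produce the summary for a few selected patients.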

CreateTableOne(
  data = data_Confidence_in_Action %>% 
    filter(ID %in% c('P1323', 'P1403', 'P1404')) %>% 
    mutate(ID = ID %>% fct_drop()),
  vars = c('TASK', 'Condition', 't1.key', 'T1.response', 'age', 'T1.ACC'), 
  strata = c('ID'),
  factorVars = c('TASK', 'Condition', 't1.key'), 
  argsApprox = list(correct = FALSE), 
  smd = TRUE, 
  addOverall = TRUE, 
  test = TRUE) %>% 
  kableone()

Output

|                        |Overall      |P1323        |P1403        |P1404        |p      |test |
|:-----------------------|:------------|:------------|:------------|:------------|:------|:----|
|n                       |2323         |775          |776          |772          |       |     |
|TASK = right (%)        |1164 (50.1)  |390 (50.3)   |386 (49.7)   |388 (50.3)   |0.969  |     |
|Condition = R (%)       |1168 (50.3)  |385 (49.7)   |435 (56.1)   |348 (45.1)   |<0.001 |     |
|t1.key = space (%)      |1168 (50.3)  |385 (49.7)   |435 (56.1)   |348 (45.1)   |<0.001 |     |
|T1.response (mean (SD)) |0.50 (0.50)  |0.50 (0.50)  |0.56 (0.50)  |0.45 (0.50)  |<0.001 |     |
|age (mean (SD))         |19.66 (0.94) |19.00 (0.00) |19.00 (0.00) |21.00 (0.00) |<0.001 |     |
|T1.ACC (mean (SD))      |0.70 (0.46)  |0.67 (0.47)  |0.77 (0.42)  |0.65 (0.48)  |<0.001 |     |

Now this makes more sense, but it is separate for each patient.
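If what you want is a table with one entry per person, one possible approach is to collapse the trials to one row per ID first and only then call CreateTableOne. The sketch below uses hypothetical toy data, assumes that gender and age are constant within an ID, and assumes that the per-patient mean of T1.ACC is the quantity of interest. The `nonnormal` argument of tableone's print method requests median [IQR] instead of mean (SD).

```r
library(dplyr)
library(tableone)

# Hypothetical toy data: six patients, four trials each
trials <- tibble(
  ID     = rep(c("P1", "P2", "P3", "P4", "P5", "P6"), each = 4),
  gender = rep(c("k", "k", "k", "m", "m", "m"), each = 4),
  age    = rep(c(19, 20, 32, 19, 21, 25), each = 4),
  T1.ACC = c(1, 0, 1, 1,  1, 1, 0, 0,  1, 1, 1, 1,
             0, 0, 1, 0,  1, 0, 1, 1,  1, 1, 1, 0)
)

# One row per patient (assumes gender and age do not vary within an ID)
subjects <- trials %>%
  group_by(ID, gender, age) %>%
  summarise(T1.ACC = mean(T1.ACC), .groups = "drop")

# Tests are switched off here only because the toy groups are tiny
tab <- CreateTableOne(
  data   = subjects,
  vars   = c("age", "T1.ACC"),
  strata = "gender",
  test   = FALSE
)

# Report the listed variables as median [IQR]
print(tab, nonnormal = c("age", "T1.ACC"))
```

With the real data, the n row would then show the number of patients per gender rather than the number of trials.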

Alternatively, you can produce this summary without CreateTableOne, for example like this:

data_Confidence_in_Action %>% group_by(gender, ID) %>% 
  summarise(
    age = min(age)) %>% group_by(gender) %>% 
  summarise(
    n = n(),
    Min = min(age),
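    # The third positional argument of quantile() is na.rm, so it is named here;
    # pass type = by name if a non-default quantile definition is wanted.
    Q1 = quantile(age, 1/4, na.rm = TRUE),
    mean = mean(age),
    median = median(age),
    Q3 = quantile(age, 3/4, na.rm = TRUE),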
    Max = max(age),
    IQR = IQR(age),
    Kurt = e1071::kurtosis(age),
    skew = e1071::skewness(age),
    SD = sd(age))

Output

# A tibble: 2 x 12
  gender     n   Min    Q1  mean median    Q3   Max   IQR  Kurt  skew    SD
  <fct>  <int> <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 k         49    19    19  20.8     20    21    32     2  7.47 2.79   2.73
2 m          5    19    19  20.6     19    21    25     2 -1.29 0.823  2.61
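If you only need the median with Q1 and Q3 in a single cell, as asked in the question, the same idea can be condensed. This is a sketch on hypothetical one-row-per-patient data (the `subjects` tibble is made up), using the default quantile definition of quantile():

```r
library(dplyr)

# Hypothetical one-row-per-patient data
subjects <- tibble(
  gender = c("k", "k", "k", "k", "k", "m", "m", "m"),
  age    = c(19, 20, 21, 22, 32, 19, 19, 25)
)

# One formatted "median [Q1, Q3]" string per gender
age_summary <- subjects %>%
  group_by(gender) %>%
  summarise(
    n   = n(),
    age = sprintf("%.1f [%.1f, %.1f]",
                  median(age),
                  quantile(age, 0.25, names = FALSE),
                  quantile(age, 0.75, names = FALSE))
  )

age_summary
```

The resulting tibble can be passed to knitr::kable() if you want the same look as the kableone output.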

Think it over carefully and write down what you really expect. That is, of course, if you are still interested in this topic.
