Ridit transform of an ordinal variable in R
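Ridit scoring (https://en.wikipedia.org/wiki/Ridit_scoring) is often used to transform an ordered categorical variable into relative frequencies (the proportion of cases below a given value plus half the proportion at that value).
How would you do this in R?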
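Written out, the definition in parentheses reads as follows. The notation is chosen here for illustration and is not taken from the linked article: n_1, ..., n_K are the counts in the K ordered categories, N is their total, and r_j is the ridit of category j.

```latex
r_j = \frac{\sum_{k<j} n_k + \tfrac{1}{2}\, n_j}{N},
\qquad N = \sum_{k=1}^{K} n_k,
\qquad j = 1, \dots, K
```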
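The following package may solve your problem. In particular, the function Ridit::ridit is useful; its documentation describes it as follows:
An extension of the Kruskal-Wallis test that allows specifying an arbitrary reference group. Also provides the mean ridit for each group. The mean ridit of a group is an estimate of the probability that a random observation from that group will be greater than or equal to a random observation from the reference group.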
https://cran.r-project.org/web/packages/Ridit/Ridit.pdf
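For reference, the mean ridit is usually written as below. The notation is mine, and I have not checked that the package computes it in exactly this form: n_{gj} is the count of group g in category j, and r_j^{ref} is the ridit of category j computed from the reference group.

```latex
\bar{r}_g
  = \frac{\sum_{j} n_{gj}\, r_j^{\mathrm{ref}}}{\sum_{j} n_{gj}}
  = \widehat{P}\left(X_g > X_{\mathrm{ref}}\right)
    + \tfrac{1}{2}\, \widehat{P}\left(X_g = X_{\mathrm{ref}}\right)
```

Read this way, "greater than or equal" in the description counts ties with weight one half, so a group compared against itself has a mean ridit of 0.5.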
Another approach is to use a binary choice model such as probit, logit, or exact logit and extract the predicted values of the dependent variable, i.e. 0 or 1.
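A minimal sketch of what that could look like with base R's glm(). Everything here is an assumption made for illustration: the data are simulated, the ordinal predictor is coded 1 to 5, and the 0.5 cutoff for turning fitted probabilities into 0/1 is just a common default. Exact logit is not covered.

```r
# Simulated toy data (not from the question): an ordinal predictor coded 1..5
# and a binary outcome whose probability increases with it.
set.seed(42)
x <- sample(1:5, 200, replace = TRUE)
y <- rbinom(200, size = 1, prob = plogis(-2 + 0.7 * x))

# Logit model; use binomial(link = "probit") for a probit fit instead.
fit <- glm(y ~ x, family = binomial(link = "logit"))

# Fitted probabilities, then a hard 0/1 prediction at an assumed 0.5 cutoff.
p.hat <- predict(fit, type = "response")
y.hat <- as.integer(p.hat >= 0.5)

table(observed = y, predicted = y.hat)
```

Note that this yields a predicted class for a binary outcome rather than a ridit score, so it answers a somewhat different question.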
Further update
These and several other functions are now available in the CRAN package ridittools, maintained by yours truly.
Update
Removed some rather silly code that built a transformation matrix; I had forgotten about cumsum().
# Convert vector of counts to ridits
to.ridit <- function(v) {
  (cumsum(v) - .5 * v) / sum(v)
}

# Calculate mean ridit for vector of counts relative to reference group
mean.ridit <- function(v, ref) {
  sum(to.ridit(ref) * v) / sum(v)
}

# Calculate mean ridits for several groups
# x is matrix of counts
# margin is 1 for groups in rows, 2 for groups in columns
# If ref is omitted, totals across groups are used as reference group
# If ref is a vector of counts, it's used as reference group
# Otherwise, ref is the number (or name if it exists) of the group to use as reference
ridits <- function(x, margin, ref = NULL) {
  if (length(ref) > 1) {
    refgroup <- ref
  } else if (length(ref) == 1) {
    if (margin == 1) {
      refgroup <- x[ref, ]
    } else {
      refgroup <- x[, ref]
    }
  } else {
    refgroup <- apply(x, 3 - margin, sum)
  }
  apply(x, margin, mean.ridit, refgroup)
}
Example (Fleiss, 1981: severity of car accidents):
to.ridit(c(17, 54, 60, 19, 9, 6, 14))
[1] 0.04748603 0.24581006 0.56424581 0.78491620 0.86312849 0.90502793 0.96089385
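The function above works on counts, while the question starts from an ordinal variable with one value per case. A minimal sketch of the extra step, assuming the variable is stored as an ordered factor; the variable name, labels, and values below are made up for illustration:

```r
# Same formula as to.ridit() in the answer, repeated so this block runs on its own.
to.ridit <- function(v) {
  (cumsum(v) - 0.5 * v) / sum(v)
}

# Hypothetical ordinal variable, one entry per case.
severity <- factor(
  c("minor", "none", "severe", "minor", "moderate", "none", "minor"),
  levels = c("none", "minor", "moderate", "severe"),
  ordered = TRUE
)

# Counts per level, in level order, as a plain numeric vector.
counts <- as.vector(table(severity))

# One ridit per level, then look up each case's ridit by its level code.
level.ridits <- to.ridit(counts)
names(level.ridits) <- levels(severity)
obs.ridits <- unname(level.ridits[as.integer(severity)])

level.ridits
obs.ridits
mean(obs.ridits)  # ridits of a sample relative to itself average 0.5
```

Using table() on the factor keeps empty levels in the counts, so the lookup by integer code stays aligned with levels(severity).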
Remark
Although my code is less flexible than Ridit::ridit from the package mentioned in another answer, it seems to be a great deal faster:
# Influenza subtypes by age as of week ending 2/24/18 (US CDC)
> flu.age
        BY  BV  BU   H3   H1
0-4    274  91  92 1808  500
5-24  1504 274 698 5090  951
25-64 1665 101 567 7538 1493
65+   1476  35 330 9541  515
# Using CRAN package
> system.time(ridit(flu.age,2))
   user  system elapsed
  3.746   0.007   3.756
# Using my code
> system.time(ridits(flu.age,2))
   user  system elapsed
  0.001   0.000   0.000
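To rerun the pure-R side of this comparison, the table can be typed in as a matrix. The sketch below repeats the three functions so it runs on its own and also shows the three ways of specifying ref. The names r.total, r.h3, and r.vec are just labels chosen here, and timings will differ by machine.

```r
# Functions from the answer, repeated so the block is self-contained.
to.ridit <- function(v) (cumsum(v) - 0.5 * v) / sum(v)
mean.ridit <- function(v, ref) sum(to.ridit(ref) * v) / sum(v)
ridits <- function(x, margin, ref = NULL) {
  if (length(ref) > 1) {
    refgroup <- ref
  } else if (length(ref) == 1) {
    refgroup <- if (margin == 1) x[ref, ] else x[, ref]
  } else {
    refgroup <- apply(x, 3 - margin, sum)
  }
  apply(x, margin, mean.ridit, refgroup)
}

# The flu.age table shown above: age groups in rows, subtypes in columns.
flu.age <- matrix(
  c( 274,  91,  92, 1808,  500,
    1504, 274, 698, 5090,  951,
    1665, 101, 567, 7538, 1493,
    1476,  35, 330, 9541,  515),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("0-4", "5-24", "25-64", "65+"),
                  c("BY", "BV", "BU", "H3", "H1"))
)

r.total <- ridits(flu.age, 2)                        # reference: all subtypes pooled
r.h3    <- ridits(flu.age, 2, ref = "H3")            # reference: one named group
r.vec   <- ridits(flu.age, 2, ref = rowSums(flu.age))  # reference: explicit counts

r.total
r.h3
system.time(ridits(flu.age, 2))
```

Two quick sanity checks: the reference group's own mean ridit is 0.5 (here r.h3["H3"]), and passing rowSums(flu.age) explicitly gives the same result as omitting ref.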