How to sequence through a vector and only use two adjacent values based on a conditional

Question

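Some background: I'm a virologist who only dabbles in R, so I want to sharpen my skills by writing an R TCID50 (tissue-culture infectious dose 50% end-point) calculator based on the Reed-Muench method (the link may be paywalled). Basically, it calculates the proportional distance across a matrix of scored (positive/negative) values and finds the point at which 50% of the values would be positive. The data look a bit like this (note the 1/0 transition in columns 3/4):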

1	2	3	4
1	1	1	0
1	1	1	1
1	1	1	0
1	1	0	0
1	1	1	0

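where each column is one step in a dilution series of the infectious agent, and 1/0 is positive/negative. I also have an Excel sheet that does this, but I like playing with R.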

Actual data and packages:

library(tidyverse)
library(FSA)

Scored_Data <- structure(list(X1 = c(0, 0, 0, 0, 0, 0, 0, 0), X2 = c(0, 0, 0, 
0, 0, 0, 0, 0), X3 = c(0, 0, 0, 0, 0, 0, 0, 0), X4 = c(0, 0, 
0, 0, 0, 0, 0, 0), X5 = c(0, 0, 0, 0, 0, 0, 0, 0), X6 = c(0, 
0, 0, 0, 0, 0, 0, 0), X7 = c(0, 0, 0, 0, 0, 0, 0, 0), X8 = c(0, 
0, 0, 0, 0, 0, 1, 0), X9 = c(1, 0, 0, 0, 0, 0, 1, 1), X10 = c(1, 
1, 1, 1, 1, 1, 0, 1), X11 = c(1, 1, 1, 1, 1, 1, 1, 1), X12 = c(1, 
1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, -8L), class = c("tbl_df", 
"tbl", "data.frame"))

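The first step is to calculate the infection rate of each column from cumulative sums of the positives and negatives, which run in opposite directions across the transition from positive to negative. For this I used the function rcumsum from the package FSA.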

The equation is:

Infection rate = cumulative positive / (cumulative positive + cumulative negative)

I have (a bit verbose, and I think it could definitely be shortened):

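Scored_Data <- as_tibble(Scored_Data)       # as.tibble() is deprecated
# Note: this code counts the 1s as negatives and the 0s as positives
Neg_Col <- colSums(Scored_Data)
Pos_Col <- colSums(Scored_Data == 0)
Cum_Neg_Col <- cumsum(Neg_Col)              # accumulates left to right
Cum_Pos_Col <- rcumsum(Pos_Col)             # accumulates right to left

Inf_Rate <- Cum_Pos_Col/(Cum_Pos_Col + Cum_Neg_Col)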

This gives me

Inf_Rate
        X1         X2         X3         X4         X5         X6         X7         X8         X9        X10        X11        X12 
1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 1.00000000 0.92857143 0.60000000 0.08333333 0.00000000 0.00000000
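As an aside on the "could be shortened" remark, here is a minimal base R sketch of the same infection-rate calculation. It assumes the reverse cumulative sum can be written as rev(cumsum(rev(x))) in place of FSA's rcumsum, and it keeps the question's convention of counting 0s as positives and 1s as negatives. The helper name infection_rate and the matrix scores are hypothetical and introduced only for illustration.

```r
# Same scores as Scored_Data, rebuilt as a plain matrix
# (8 rows x 12 dilution steps) so the sketch needs no packages.
scores <- cbind(
  matrix(0, nrow = 8, ncol = 7),
  c(0, 0, 0, 0, 0, 0, 1, 0),
  c(1, 0, 0, 0, 0, 0, 1, 1),
  c(1, 1, 1, 1, 1, 1, 0, 1),
  matrix(1, nrow = 8, ncol = 2)
)
colnames(scores) <- paste0("X", 1:12)

# Hypothetical helper: infection rate per column.
# Follows the question's code: 0s are summed right to left,
# 1s are summed left to right.
infection_rate <- function(scores) {
  ones  <- colSums(scores == 1)
  zeros <- colSums(scores == 0)
  cum_fwd <- cumsum(ones)               # forward cumulative sum
  cum_rev <- rev(cumsum(rev(zeros)))    # reverse cumulative sum
  cum_rev / (cum_rev + cum_fwd)
}

Inf_Rate <- infection_rate(scores)
round(Inf_Rate, 4)
```

If every column were empty the denominator could be zero, so a real calculator would probably want a guard for that case.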

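The difficult part is the next step. For that I need to find the values just above and just below 50% to calculate the proportional distance (PD). The equation is: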

PD = (Value above 0.5 - 0.5)/(Value above 0.5 - Value below 0.5)

In my data, the value above 50% is X9 and the value below 50% is X10.

I have:

PD <- for (i in Inf_Rate){
  for (j in Inf_Rate){
    if_else(i >= 0.5 & j < 0.5, (i-0.5)/(i-j), NULL)
    }
  }

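which (to my naive, non-mathematical mind) should work like this: look through the data, find the values that meet these conditions, and return the PD or nothing. But it only returns NULL.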

I'm sure there's a way to make this work, but what is it?

Answer 1

I'm not sure how you want to handle the case where one of the numbers is exactly 0.5. But is this what you want?

Inf_Rate <- c(X1 = 1, X2 = 1, X3 = 1, X4 = 1, X5 = 1, X6 = 1, X7 = 1, X8 = 0.928571428571429, 
              X9 = 0.6, X10 = 0.0833333333333333, X11 = 0, X12 = 0)

value_above <- min(Inf_Rate[Inf_Rate > 0.5])
value_below <- max(Inf_Rate[Inf_Rate < 0.5])

PD <- (value_above - 0.5)/(value_above - value_below)
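A possible variant, if the two values must be adjacent in the vector as the title asks: the sketch below looks for the single position where one element is at or above the cutoff and its right-hand neighbour is below it. It borrows the `>= 0.5` / `< 0.5` condition from the question's loop, so a value of exactly 0.5 is treated as "above". The function name proportional_distance and its cutoff argument are hypothetical, and the sketch assumes the rates cross the cutoff exactly once. It also shows why the original attempt produced NULL: in R a for loop itself always evaluates to NULL, so assigning the loop to PD cannot capture anything computed inside it.

```r
Inf_Rate <- c(X1 = 1, X2 = 1, X3 = 1, X4 = 1, X5 = 1, X6 = 1, X7 = 1,
              X8 = 0.928571428571429, X9 = 0.6,
              X10 = 0.0833333333333333, X11 = 0, X12 = 0)

# A for loop evaluates to NULL, whatever its body computes.
loop_value <- for (r in Inf_Rate) r
is.null(loop_value)

# Hypothetical helper: proportional distance from the adjacent pair
# that brackets the cutoff. Assumes exactly one crossing.
proportional_distance <- function(rate, cutoff = 0.5) {
  # TRUE at position k when rate[k] >= cutoff and rate[k + 1] < cutoff
  crosses <- head(rate, -1) >= cutoff & tail(rate, -1) < cutoff
  k <- unname(which(crosses))
  if (length(k) != 1) {
    stop("expected exactly one crossing of the cutoff")
  }
  above <- rate[[k]]
  below <- rate[[k + 1]]
  (above - cutoff) / (above - below)
}

PD <- proportional_distance(Inf_Rate)
PD
```

For this data the pair found is X9 and X10, the same two values that min() and max() pick out above. The vectorised comparison of head(rate, -1) with tail(rate, -1) replaces the nested loops, which compared every element with every other element instead of only neighbours.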

r

sequence