Approximate proportions preserving sum (1 = 100%) in R

Question

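I have a large table in which I calculated the counts per subcategory, countsperc (subcategory names not shown), for each category (id), then the total number of observations per category (id) in column sumofcounts, and the proportion of each subcategory in the total (countsperc/sumofcounts) in column apppropor (approximate proportion). The approximation is required (3 decimal places).
The problem is that the sum of the approximate proportions (old_sum) per category (id) must be 1.000, not 0.999 or 1.001.
So I am looking for a way to add or subtract 0.001 on any one of the sub-items in column apppropor so that the sum is always 1.000. For example, in row 1 the number could be 0.334 instead of 0.333.
Edit: the goal of this task is not merely to produce an exact sum of 1, which would be of no use in itself, but to produce input for another program that takes column apppropor as is (it requires the sum to be 1.000 per id; see the error message below).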

text1<-"
id    countsperc sumofcounts   apppropor     
item1          1           3       0.333     
item1          1           3       0.333     
item1          1           3       0.333     
item2          1         121       0.008     
item2        119         121       0.983     
item2          1         121       0.008     
item3          1          44       0.023    
item3          1          44       0.023     
item3         41          44       0.932     
item3          1          44       0.023     
item4          1          29       0.034     
item4          3          29       0.103      
item4          1          29       0.034   
item4         24          29       0.828"
table1<-read.table(text=text1,header=T)
library(data.table)
sums<-as.data.frame(setDT(table1)[, sum(`apppropor`), by = .(id)][,.(id, old_sum = V1)])
table1<-merge(table1,sums)
table1

chromEvol Version: 2.0. Last updated December 2013

The count probabilities for taxa Ad_mic not sum to 1.0 chromEvol: errorMsg.cpp:41: static void errorMsg::reportError(const string&, int): Assertion `0' failed. Aborted (core dumped)

Answer 1

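If you need sum_of_prop to equal 1 in every row, then you are calculating it the wrong way. You don't add 0.333 + 0.333 + 0.333 and then force that sum to be 1. You add (1/3) + (1/3) + (1/3), and then the sum really is 1.

Assuming no other column can be changed, try calculating sum_of_prop like this: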

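n <- length(table1$id)
new_sum_of_prop <- rep(0, n)
for (i in seq_len(n)) {
  tempitem <- table1$id[i]
  # total of the raw counts for the id of row i
  tempsum <- sum(table1$countsperc[(table1$id == tempitem)])
  # ratio of the stored total to the recomputed total (1 when they agree)
  new_sum_of_prop[i] <- table1$sumofcounts[i] / tempsum
}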

table2 <- as.data.frame(cbind(table1, new_sum_of_prop))
table2
      id countsperc sumofcounts apppropor sum_of_prop new_sum_of_prop
1  item1          1           3     0.333       0.999               1
2  item1          1           3     0.333       0.999               1
3  item1          1           3     0.333       0.999               1
4  item2          1         121     0.008       0.999               1
5  item2        119         121     0.983       0.999               1
6  item2          1         121     0.008       0.999               1
7  item3          1          44     0.023       1.001               1
8  item3          1          44     0.023       1.001               1
9  item3         41          44     0.932       1.001               1
10 item3          1          44     0.023       1.001               1
11 item4          1          29     0.034       0.999               1
12 item4          3          29     0.103       0.999               1
13 item4          1          29     0.034       0.999               1
14 item4         24          29     0.828       0.999               1

I know this is not exactly what you asked for, but in the long run your results will always be healthier if you don't cut corners on the math.
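The point about exact versus rounded proportions can be checked without a loop. The following is only a sketch: the data frame `df_check` is a hand-typed copy of the counts from the question, and `exact`, `rounded`, `exact_sums` and `rounded_sums` are names made up for this illustration.

```r
# Assumed input: the counts from the question, typed in by hand.
df_check <- data.frame(
  id = c(rep("item1", 3), rep("item2", 3), rep("item3", 4), rep("item4", 4)),
  countsperc = c(1, 1, 1, 1, 119, 1, 1, 1, 41, 1, 1, 3, 1, 24)
)

# Per-id totals, repeated on every row of the group (base R, no loop).
df_check$sumofcounts <- ave(df_check$countsperc, df_check$id, FUN = sum)

# Exact proportions and their 3-decimal roundings.
df_check$exact   <- df_check$countsperc / df_check$sumofcounts
df_check$rounded <- round(df_check$exact, 3)

# Per-id sums: the exact ones are 1 up to floating-point error,
# the rounded ones drift by 0.001 in either direction.
exact_sums   <- tapply(df_check$exact, df_check$id, sum)
rounded_sums <- tapply(df_check$rounded, df_check$id, sum)
print(rounded_sums)
```

This reproduces the drift shown in the question, but it does not by itself give a 3-decimal column that sums to 1.000, which is what the downstream program asks for.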

Answer 2

I found a way.

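# gap between 1 and the current rounded sum of each id
table1$dif <- 1 - table1$old_sum
# sort so that rows with the same id are contiguous
table1 <- table1[order(table1$id), ]
# run length of each id; cumsum(len) is the index of the last row of each id
len <- rle(as.vector(table1$id))[[1]]
last <- cumsum(len)
# add the gap to the last row of each id
table1$apppropor[last] <- table1$apppropor[last] + table1$dif[last]
# verify
library(data.table)
sums <- as.data.frame(setDT(table1)[, sum(`apppropor`), by = .(id)][, .(id, new_sum = V1)])
table1 <- merge(table1, sums)
table1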
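As an alternative to always adjusting the last row of each id, the rounding itself can be done so that the sum is preserved. The sketch below uses the largest remainder method, which is a different technique from the one in this answer: every share is rounded down to 3 decimals, and the missing thousandths go to the entries with the largest discarded fractions. The helper `round_preserve_sum` and the data frame `df_lr` are hypothetical names introduced here, and the counts are a hand-typed copy of the question's data.

```r
# Hypothetical helper (not from the original thread): round the shares of
# `counts` to `digits` decimals so that they still sum to 1.
round_preserve_sum <- function(counts, digits = 3) {
  scale <- 10^digits
  raw   <- counts / sum(counts) * scale   # exact shares in units of 10^-digits
  base  <- floor(raw + 1e-9)              # round down (small tolerance for FP noise)
  short <- as.integer(round(scale - sum(base)))  # units still to hand out
  if (short > 0) {
    # entries with the largest discarded fraction receive one extra unit
    idx <- order(raw - base, decreasing = TRUE)[seq_len(short)]
    base[idx] <- base[idx] + 1
  }
  base / scale
}

# Assumed input: the counts from the question, typed in by hand.
df_lr <- data.frame(
  id = c(rep("item1", 3), rep("item2", 3), rep("item3", 4), rep("item4", 4)),
  countsperc = c(1, 1, 1, 1, 119, 1, 1, 1, 41, 1, 1, 3, 1, 24)
)

# Apply the helper within each id; ave() puts the results back in row order.
df_lr$apppropor <- ave(df_lr$countsperc, df_lr$id, FUN = round_preserve_sum)

# Per-id sums for verification.
lr_sums <- tapply(df_lr$apppropor, df_lr$id, sum)
print(df_lr)
```

With this data, one of the three item1 rows becomes 0.334 and the other two stay at 0.333. When writing the column to a file for another program, formatting it explicitly, for example with `sprintf("%.3f", df_lr$apppropor)`, avoids stray floating-point digits.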
