Monte Carlo Simulation (Triangular Distribution) across rows of CSV cost data

Question

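I have a CSV list of cost estimates. Each row holds the lower (l), centre (c) and upper (u) range estimates for one line item; the estimates are prepared in Excel by non-R users. A sample of the CSV data, as read into R: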

         Item     l     c     u
        <chr> <int> <int> <int>
1 “CostItem1”  1500  1900  2600
2 “CostItem2”  2400  3200  4400
3 “CostItem3”   500  1000  1500

Each row is then used in a triangular distribution function (library(triangle)) over a number of iterations (runs = 10000 in this example), as follows:

CostItem1 <- rtriangle(runs, l, u, c)

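I currently enter the range estimates for each cost item (CostItem1, CostItem2, etc.) into the rtriangle function by hand.

My question is:

How can I create a loop function, or use some other method, to do this directly once the CSV file has been read into R? As a novice I don't know how to approach this, and my Google searches have turned up nothing.

The cost item data is then combined into a new data frame (TotalCostEstimate) that holds the 10000 simulations, with each row summed to give the modelled total cost (TotalCost):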

TotalCostEstimate <- data.frame(CostItem1, CostItem2, TotalCost = rowSums(cbind(CostItem1, CostItem2)))

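From here the data can be plotted and presented for analysis and decision making. Entering the data by hand is fine for a small number of cost items, but sometimes I have more than 50 rows and I don't want to do this 50+ times!!

Thanks very much for taking the time to look at this.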

Answer 1

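Rather than working directly from the CSV, read the CSV into a data frame, create a matrix for the total costs, and then run a for loop to simulate the values.

For example: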

library(triangle)  # provides rtriangle()
runs <- 1000  # Set number of runs
Info_costs <- read.csv("Your_file_name.csv")  # Read in the information
# Create an empty matrix to contain your simulations
Total_cost_items <- matrix(NA_real_, nrow = runs, ncol = length(Info_costs$Item))
# Fill the matrix, one column per cost item
for (i in seq_along(Info_costs$Item)) {
  Total_cost_items[, i] <- rtriangle(n = runs, Info_costs$l[i], Info_costs$u[i], Info_costs$c[i])
}
# Append the row sums to the matrix
Total_cost_items <- data.frame(Total_cost_items, rowSums(Total_cost_items))

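You may need to adjust the read.csv call with options, and of course use the correct file name, so that it reads your file correctly. You can also rename the data frame's columns to something more useful afterwards.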
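For what it's worth, the same simulation can also be written without an explicit loop. The sketch below is only an illustration under a few assumptions: it reads the three sample rows from the question through read.csv(text = ...) instead of a real file, it checks that l <= c <= u on every row before simulating, and the names csv_text and sims are made up for the example.

```r
library(triangle)  # provides rtriangle(n, a, b, c)

set.seed(42)  # reproducible draws
runs <- 1000

# Stand-in for read.csv("Your_file_name.csv"): the sample rows from the question
csv_text <- "Item,l,c,u
CostItem1,1500,1900,2600
CostItem2,2400,3200,4400
CostItem3,500,1000,1500"
Info_costs <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The mode (c) must lie between the lower (l) and upper (u) limits
stopifnot(all(Info_costs$l <= Info_costs$c), all(Info_costs$c <= Info_costs$u))

# One column of draws per cost item; vapply returns a runs x items matrix
sims <- vapply(
  seq_len(nrow(Info_costs)),
  function(i) rtriangle(runs, a = Info_costs$l[i], b = Info_costs$u[i], c = Info_costs$c[i]),
  numeric(runs)
)
colnames(sims) <- Info_costs$Item

# Each row is one simulation run; TotalCost is the sum across items
TotalCostEstimate <- data.frame(sims, TotalCost = rowSums(sims))
head(TotalCostEstimate)
```

Naming the matrix columns before calling data.frame means the item names carry over without a separate renaming step.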

Answer 2

You can read the data with read.csv and store it as a data.frame. Here is some dummy data:

df <- data.frame(Item=letters[1:3], l=1:3, c=2:4, u=3:5)
df

  Item l c u
1    a 1 2 3
2    b 2 3 4
3    c 3 4 5

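You can use foreach and dplyr to do what you want:

library(triangle)  # provides rtriangle()
library(foreach)
library(dplyr)

# rtriangle(n, a, b, c) takes the lower limit a, the upper limit b, then the mode c
df <- foreach(I=1:nrow(df), .combine=rbind) %do% rtriangle(10,df$l[I],df$u[I],df$c[I]) %>%
as.data.frame() %>%
mutate( sum = rowSums(.))

This iterates over each row of df, runs rtriangle, binds the results into a matrix, and converts the matrix to a data.frame so that rowSums can be computed. With .combine=rbind each row holds the draws for one item; use .combine=cbind instead to get one row per run, so that rowSums gives the total cost of each run.

Argument order matters. Called with the mode before the upper limit, rtriangle(10,df$l[I],df$c[I],df$u[I]), every draw is NaN. My output from that call:

   V1  V2  V3  V4  V5  V6  V7  V8  V9 V10 sum
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN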
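An all-NaN result like this is what you get when the mode does not sit between the two limits. As a rough illustration, here is a base-R sketch of inverse-transform sampling from a triangular distribution. This is not the triangle package's code: rtri_sketch is a hypothetical name, and the choice to return NaN for invalid parameters (including a >= b) is an assumption made for the example.

```r
# Hypothetical helper: draw n values from a triangular distribution
# with lower limit a, upper limit b and mode c (a <= c <= b, a < b).
rtri_sketch <- function(n, a, b, c) {
  # A mode outside [a, b] does not define a valid triangular distribution
  if (a > c || c > b || a >= b) return(rep(NaN, n))
  u <- runif(n)
  Fc <- (c - a) / (b - a)  # CDF value at the mode
  ifelse(u < Fc,
         a + sqrt(u * (b - a) * (c - a)),        # left branch, a..c
         b - sqrt((1 - u) * (b - a) * (b - c)))  # right branch, c..b
}

set.seed(1)
ok  <- rtri_sketch(10, a = 1, b = 3, c = 2)  # lower, upper, mode
bad <- rtri_sketch(10, a = 1, b = 2, c = 3)  # mode and upper limit swapped

range(ok)          # stays inside [1, 3]
all(is.nan(bad))   # every draw is NaN
```

With the dummy data above, passing (l, c, u) in that order puts the mode at u, which is above the upper limit c, so the second case applies to every row.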

Answer 3

Solved - thanks @Maarten Punt!

Thought I would post the final working solution:

# Create an empty matrix to contain your simulations
TotalCostEstimate <- matrix(NA_real_, nrow = runs, ncol = length(basedata$Item))
# Fill the matrix using the distribution type selected for each item (1 [triangle] or 2 [discrete])
for (i in seq_along(basedata$Item)) {
  if (basedata$DistType[i] == 1) {
    TotalCostEstimate[, i] <- rtriangle(n = runs, basedata$l[i], basedata$u[i], basedata$c[i])
  } else {
    # Discrete item: each run is either 0 or the upper value
    TotalCostEstimate[, i] <- sample(c(0, basedata$u[i]), runs, replace = TRUE)
  }
}
# Append the row sums to the matrix
TotalCostEstimate <- data.frame(TotalCostEstimate, rowSums(TotalCostEstimate))
# Rename the columns to the cost items from the base data
for (i in seq_along(basedata$Item)) {
  colnames(TotalCostEstimate)[i] <- as.character(basedata$Item[i])
}
# Rename the last column based on the number of cost items
i <- length(basedata$Item)
colnames(TotalCostEstimate)[i + 1] <- "TotalCost"

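It is important to note that I modified the CSV to include a new field, 'DistType', which lets the user select the type of distribution used in the simulation - discrete (on or off) or triangular: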

             Item     l     c     u DistType
            <chr> <int> <int> <int>    <int>
1     “CostItem1”  1500  1900  2600        1
2     “CostItem2”  2400  3200  4400        1
3     “CostItem3”   500  1000  1500        1
4 “DiscCostItem4”     0     0  1500        2

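I also modified the loop to take the cost item names from the CSV file and assign them to the columns of the output, and the final summed column [i+1] is named 'TotalCost'. This allows outputs/plots to be named automatically from the column names (again using a loop).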
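As a rough sketch of that last step, the loop below draws one histogram per column and titles each with its column name, after printing a few percentiles of the total. The two-item data frame is a cut-down stand-in built inline so the example runs on its own, and the output file name cost_histograms.pdf and the chosen percentiles are assumptions for illustration only.

```r
library(triangle)  # provides rtriangle(n, a, b, c)

set.seed(7)
runs <- 10000

# Cut-down stand-in for the simulated output: one triangular and one discrete item
TotalCostEstimate <- data.frame(
  CostItem1     = rtriangle(runs, a = 1500, b = 2600, c = 1900),
  DiscCostItem4 = sample(c(0, 1500), runs, replace = TRUE)
)
TotalCostEstimate$TotalCost <- rowSums(TotalCostEstimate)

# Percentiles of the simulated total cost
p <- quantile(TotalCostEstimate$TotalCost, probs = c(0.1, 0.5, 0.9))
print(p)

# One histogram per column, each titled with its column name
pdf("cost_histograms.pdf")  # hypothetical output file, one page per plot
for (nm in colnames(TotalCostEstimate)) {
  hist(TotalCostEstimate[[nm]], main = nm, xlab = "Cost", breaks = 50)
}
dev.off()
```

Looping over colnames means a new cost item in the CSV gets its own plot without any change to the plotting code.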
