Create data frame by iteratively adding rows

Question

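I'm trying to create a data frame (BOS.df) so I can explore the structure of a future analysis before I receive the actual data. In this scenario, say there are 4 restaurants looking to run ad campaigns (the "Restaurant" variable). The total number of days a campaign lasts is cmp.lngth. I want random numbers to represent how much they are billed for the ads (ra.num). The ad campaigns start on StartDate. Ultimately, I want to build a data frame that cycles through each restaurant and, by adding rows, records a random billing number for each day of the ad campaign.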

#Create Data Placeholders
set.seed(123)
Restaurant <- c('B1', 'B2', 'B3', 'B4')
cmp.lngth <- 42
ra.num <- rnorm(cmp.lngth, mean = 100, sd = 10)
StartDate <- as.Date("2017-07-14")


BOS.df <- data.frame(matrix(NA, nrow =0, ncol = 3))
colnames(BOS.df) <- c("Restaurant", "Billings", "Date")


for(i in 1:length(Restaurant)){
  for(z in 1:cmp.lngth){
    BOS.row <- c(as.character(Restaurant[i]),ra.num[z],StartDate + 
    cmp.lngth[z]-1)
    BOS.df <- rbind(BOS.df, BOS.row)
  }
}

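My code currently isn't working right. The column names are incorrect, and the data isn't being placed properly, if at all. The output looks like this: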

  X.B1. X.94.3952435344779. X.17402.
1    B1    94.3952435344779    17402
2    B1                <NA>     <NA>
3    B1                <NA>     <NA>
4    B1                <NA>     <NA>
5    B1                <NA>     <NA>
6    B1                <NA>     <NA>

How can I get the correct output? Is there a more efficient way than using a for loop?

Answer 1

You can use rbind, but here is another way to do it.
Also, the data frame should have cmp.lngth*length(Restaurant) rows, not cmp.lngth.

#Create Data Placeholders
set.seed(123)
Restaurant <- c('B1', 'B2', 'B3', 'B4')
cmp.lngth <- 42
ra.num <- rnorm(cmp.lngth, mean = 100, sd = 10)
StartDate <- as.Date("2017-07-14")


BOS.df <- data.frame(matrix(NA, nrow = cmp.lngth*length(Restaurant), ncol = 3))
colnames(BOS.df) <- c("Restaurant", "Billings", "Date")

count <- 1
for(name in Restaurant){
    for(z in 1:cmp.lngth){
        # Use list() rather than c() so Billings stays numeric instead of being coerced to character
        BOS.row <- list(name, ra.num[z], as.character(StartDate + z - 1))
        BOS.df[count,] <- BOS.row
        count <- count + 1
    }
}
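
For comparison, here is a minimal sketch of the rbind route mentioned at the top of this answer. It assumes the placeholders from the question; `rows` is a name introduced here purely for illustration. Each row is built as a one-row data.frame rather than with c(), because c() coerces its arguments to a single common type, which is why the date in the question's output showed up as 17402. Note also that cmp.lngth[z] in the question is NA for z > 1, since cmp.lngth has length 1.

```r
# Placeholders as defined in the question
set.seed(123)
Restaurant <- c("B1", "B2", "B3", "B4")
cmp.lngth <- 42
ra.num <- rnorm(cmp.lngth, mean = 100, sd = 10)
StartDate <- as.Date("2017-07-14")

# Collect one-row data frames in a list (hypothetical name: rows);
# each column keeps its own type (character, numeric, Date)
rows <- list()
for (name in Restaurant) {
  for (z in seq_len(cmp.lngth)) {
    rows[[length(rows) + 1]] <- data.frame(
      Restaurant = name,
      Billings = ra.num[z],
      Date = StartDate + z - 1,
      stringsAsFactors = FALSE
    )
  }
}

# Bind all rows in a single call instead of growing BOS.df inside the loop
BOS.df <- do.call(rbind, rows)
str(BOS.df)
```

Collecting the rows in a list and binding once avoids copying the growing data frame on every iteration, which is what makes repeated rbind inside a loop slow.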

I'd also suggest looking at the tidyverse package and using add_row with a tibble instead of a data frame. Here is some sample code:

library(tidyverse)
BOS.tb <- tibble(Restaurant = character(),
                 Billings = numeric(),
                 Date = character())

for(name in Restaurant){
    for(z in 1:cmp.lngth){
        # add_row() takes the column values directly, so no intermediate vector is needed
        BOS.tb <- add_row(BOS.tb,
                          Restaurant = name,
                          Billings = ra.num[z],
                          Date = as.character(StartDate + z - 1))
    }
}

Answer 2

Using expand.grid:

# Restaurant is the vector defined in the question
cmp.lngth <- 2
StartDate <- as.Date("2017-07-14")

set.seed(1)
# Subtract 1 so the first campaign day is StartDate itself
df1 <- data.frame(expand.grid(Restaurant, StartDate + seq(cmp.lngth) - 1))
colnames(df1) <- c("Restaurant", "Date")
df1$Billings <- rnorm(nrow(df1), mean = 100, sd = 10)
df1 <- df1[ order(df1$Restaurant, df1$Date), ]

df1
#   Restaurant       Date  Billings
# 1         B1 2017-07-14  93.73546
# 5         B1 2017-07-15 103.29508
# 2         B2 2017-07-14 101.83643
# 6         B2 2017-07-15  91.79532
# 3         B3 2017-07-14  91.64371
# 7         B3 2017-07-15 104.87429
# 4         B4 2017-07-14 115.95281
# 8         B4 2017-07-15 107.38325
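
On the question of avoiding the for loop altogether, a loop-free base R version could look like the sketch below. It assumes the placeholders from the question and, like the question's loop, reuses the same ra.num values for every restaurant; `days` and `BOS.vec` are names introduced here for illustration only.

```r
# Placeholders as defined in the question
set.seed(123)
Restaurant <- c("B1", "B2", "B3", "B4")
cmp.lngth <- 42
ra.num <- rnorm(cmp.lngth, mean = 100, sd = 10)
StartDate <- as.Date("2017-07-14")

# All campaign days, starting on StartDate itself
days <- StartDate + seq_len(cmp.lngth) - 1

# rep() repeats whole vectors at once, so no row-by-row loop is needed
BOS.vec <- data.frame(
  Restaurant = rep(Restaurant, each = cmp.lngth),      # B1 x 42, then B2 x 42, ...
  Billings = rep(ra.num, times = length(Restaurant)),  # same 42 values per restaurant
  Date = rep(days, times = length(Restaurant)),        # same 42 days per restaurant
  stringsAsFactors = FALSE
)

head(BOS.vec)
```

If each restaurant should get its own random billings instead, drawing rnorm(cmp.lngth * length(Restaurant), mean = 100, sd = 10) for the Billings column would do that.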
