Loop over data frame and create new rows
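I have a data set with two columns, account name and account number. It has 35 rows. I want to create a new data frame with AccountName, AccountNumber, and LocationNumber. LocationNumber is stored in another data frame that has 1 column and 350 rows.
So basically, for each account name and number, and for each location number, I want to add a row containing account name + number + location number. With 35 account numbers and 350 locations, the end goal is 12,250 rows. I have tried using for loops, to no avail.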
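As a quick sanity check of that row count, here is a minimal sketch that builds every (account, location) index pair with `expand.grid()`. The variable names `n_accounts`, `n_locations`, and `pairs` are placeholders, and only the sizes come from the question.

```r
# Sizes taken from the question: 35 accounts, 350 locations.
n_accounts <- 35
n_locations <- 350

# expand.grid() returns one row per combination of its arguments.
pairs <- expand.grid(location = seq_len(n_locations),
                     account = seq_len(n_accounts))

nrow(pairs)  # 12250, i.e. 35 * 350
```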
Accounts (Name | Number)
STR EXP-VACATION ESTIMATE-0200900 200900
STR EXP-HOLIDAY PAY-0200920 200920
STR EXP-SICK PAY-0200930 200930
STR EXP-MISC TIME PAID,NOT WORKED-0200990 200990
Locations:
Lo.702-002
Lo.702-003
Lo.702-004
Lo.702-005
End result for each account number:
STR EXP-VACATION ESTIMATE-0200900 200900 Lo.702-002
STR EXP-VACATION ESTIMATE-0200900 200900 Lo.702-003
STR EXP-VACATION ESTIMATE-0200900 200900 Lo.702-004
STR EXP-VACATION ESTIMATE-0200900 200900 Lo.702-005
PHP code that would produce the result I want:
foreach ($accounts as $name => $number) {
    foreach ($locations as $location) {
        echo sprintf("%s,%s,%s\n", $name, $number, $location);
    }
}
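For comparison, the same nested-loop ordering can be sketched in R without writing a loop, by repeating row indices with `rep()`. This is only an illustration: the real CSV files are not shown, so the data frame below is rebuilt from the sample rows in the question, and the column names are assumptions based on the question's wording.

```r
# Sample data copied from the question; column names are assumptions.
accounts <- data.frame(
  AccountName = c("STR EXP-VACATION ESTIMATE-0200900",
                  "STR EXP-HOLIDAY PAY-0200920",
                  "STR EXP-SICK PAY-0200930",
                  "STR EXP-MISC TIME PAID,NOT WORKED-0200990"),
  AccountNumber = c(200900, 200920, 200930, 200990),
  stringsAsFactors = FALSE
)
locations <- c("Lo.702-002", "Lo.702-003", "Lo.702-004", "Lo.702-005")

# Outer loop: repeat each account row once per location.
acc_idx <- rep(seq_len(nrow(accounts)), each = length(locations))
# Inner loop: cycle through all locations once per account.
loc_rep <- rep(locations, times = nrow(accounts))

result <- cbind(accounts[acc_idx, ], LocationNumber = loc_rep)
rownames(result) <- NULL  # drop the duplicated-index row names

head(result, 4)
```

The `each` argument plays the role of the outer `foreach`, and `times` plays the role of the inner one, so the rows come out grouped by account, as in the PHP output.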
My solution:
acc.run <- function() {
  locFileName <- 'location-list.csv'
  accFileName <- 'account-list.csv'
  locations <- read.csv(locFileName, sep=',', quote='\"', header=TRUE)
  accounts <- read.csv(accFileName, sep=',', quote='\"', header=TRUE)
  # Add row numbers so the original account order can be restored after merging
  accounts$rowNum <- 1:nrow(accounts)
  # With no column names in common, merge() returns every account/location pair
  merged <- merge(accounts, locations)
  sorted <- merged[order(merged$rowNum), ]
  final <- sorted[, !(names(sorted) %in% c('rowNum'))]
  # Random numeric suffix in the file name to avoid overwriting earlier output
  rExt <- paste(round(runif(6,10,100)), sep='', collapse='')
  write.csv(final, paste('accounts-concat', rExt, '.csv', sep='', collapse=''), row.names=FALSE)
}
Can you tell me how to improve it?
This is an edited version of my original answer, modified to include your test data. Does this fit your needs?
# Generate some usable test data
accounts <- read.csv(text = "
AccountName|AccountNumber
STR EXP-VACATION ESTIMATE-0200900|200900
STR EXP-HOLIDAY PAY-0200920|200920
STR EXP-SICK PAY-0200930|200930
STR EXP-MISC TIME PAID,NOT WORKED-0200990|200990
", sep = "|")
locations <- read.table(header = TRUE, text = "
Location
Lo.702-002
Lo.702-003
Lo.702-004
Lo.702-005
")$Location
# Combine the data into wide format
df <- cbind(accounts, locations = t(locations))
# Restructure the data in long format
reshape(df, varying = grep("locations", names(df)), direction = "long" )
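On the request to improve the original function, here is one possible sketch that stays with `merge()`. It assumes both inputs are data frames with no column names in common (unlike the vector used above). `cross_join` is a hypothetical helper name, and the sample data is rebuilt from the question. Passing `by = NULL` asks for the Cartesian product explicitly, and putting `locations` first makes the locations vary fastest, so the `rowNum` helper column and the sort are no longer needed. `tempfile()` returns a file name that is not currently in use, which can stand in for the random suffix.

```r
# Hypothetical helper: every account paired with every location,
# grouped by account in the original account order.
cross_join <- function(accounts, locations) {
  # by = NULL requests the Cartesian product; rows of the first
  # argument (locations) vary fastest in the result.
  merged <- merge(locations, accounts, by = NULL)
  # Put the account columns first, as in the desired output.
  merged <- merged[, c(names(accounts), names(locations)), drop = FALSE]
  rownames(merged) <- NULL
  merged
}

# Sample data from the question; column names are assumptions.
accounts <- data.frame(
  AccountName = c("STR EXP-VACATION ESTIMATE-0200900",
                  "STR EXP-HOLIDAY PAY-0200920",
                  "STR EXP-SICK PAY-0200930",
                  "STR EXP-MISC TIME PAID,NOT WORKED-0200990"),
  AccountNumber = c(200900, 200920, 200930, 200990),
  stringsAsFactors = FALSE
)
locations <- data.frame(
  Location = c("Lo.702-002", "Lo.702-003", "Lo.702-004", "Lo.702-005"),
  stringsAsFactors = FALSE
)

final <- cross_join(accounts, locations)

# tempfile() picks a name not already in use. It defaults to the session
# temp directory; pass tmpdir = "." to write to the working directory.
out_file <- tempfile(pattern = "accounts-concat-", fileext = ".csv")
write.csv(final, out_file, row.names = FALSE)
```

Unlike the `reshape()` approach, this returns only the three requested columns, without extra `time` and `id` columns.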