How to loop over JSON array in a JSON object
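I've been trying to learn R. I have a JSON file full of single-line JSON objects, and each object contains an array of account data. What I want to do is parse each line, get the JSON array out of the parsed object, and extract the account type and amount. My problem is that I don't know the best way to extract those two attributes.
I tried using the dplyr package to pull "accountHistory" out of each JSON line, but I get errors in the console when I try: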
select(JsonAcctData, "accountHistory.type", "accountHistory.amount")
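What actually happens is that my code only returns the type and amount of the last account in each row.
Right now my code writes a CSV file in which I can see all the data I need, but I only want to strip out the extra data.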
library("rjson")
library("dplyr")
parseJsonData <- function (sourceFile, outputFile)
{
  #Get all total lines in the source file provided
  totalLines <- readLines(sourceFile)
  #Clean up old output file
  if(file.exists(outputFile)){
    file.remove(outputFile)
  }
  #Loop over each line in the sourceFile,
  #parse the JSON and append to DataFrame
  JsonAcctData <- NULL
  for(i in 1:length(totalLines)){
    jsonValue <- fromJSON(totalLines[[i]])
    frame <- data.frame(jsonValue)
    JsonAcctData <- rbind(JsonAcctData, frame)
  }
  #Try to get filtered data
  filteredColumns <-
    select(JsonAcctData, "accountHistory.type", "accountHistory.amount")
  print(filteredColumns)
  #Write the DataFrame to the output file in CSV format
  write.csv(JsonAcctData, file = outputFile)
}
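A likely reason for getting a single account per row: `data.frame()` on one parsed line spreads every element of the `accountHistory` array into its own set of columns, so each JSON line becomes one wide row instead of one row per account. The minimal base-R sketch below uses a hand-written, hypothetical `parsed` list as a stand-in for what `rjson::fromJSON()` returns (trimmed to two fields per account) so the resulting shape can be inspected without a JSON file.

```r
# Hypothetical stand-in for rjson::fromJSON() output for one line:
# a named list whose accountHistory element is a list of per-account lists.
parsed <- list(
  name = "Test1",
  accountHistory = list(
    list(amount = "107.62", type = "payment"),
    list(amount = "650.88", type = "payment"),
    list(amount = "878.63", type = "deposit")
  )
)

# data.frame() flattens each array element into its own columns,
# giving one wide row: name + 3 accounts x {amount, type} = 7 columns.
wide <- data.frame(parsed)
dim(wide)

# Repeated column names are made unique with .1 / .2 suffixes,
# e.g. accountHistory.amount, accountHistory.amount.1, accountHistory.amount.2
names(wide)
```

Because the repeated names get `.1`, `.2` suffixes, a single name such as `accountHistory.type` refers to just one of the accounts, so selecting it cannot return all of them.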
Test JSON file data:
{"name":"Test1", "accountHistory":[{"amount":"107.62","date":"2012-02-02T06:00:00.000Z","business":"CompanyA","name":"Home Loan Account 6220","type":"payment","account":"11111111"},{"amount":"650.88","date":"2012-02-02T06:00:00.000Z","business":"CompanyF","name":"Checking Account 9001","type":"payment","account":"123123123"},{"amount":"878.63","date":"2012-02-02T06:00:00.000Z","business":"CompanyG","name":"Money Market Account 8743","type":"deposit","account":"123123123"}]}
{"name":"Test2", "accountHistory":[{"amount":"199.29","date":"2012-02-02T06:00:00.000Z","business":"CompanyB","name":"Savings Account 3580","type":"invoice","account":"12312312"},{"amount":"841.48","date":"2012-02-02T06:00:00.000Z","business":"Company","name":"Home Loan Account 5988","type":"payment","account":"123123123"},{"amount":"116.55","date":"2012-02-02T06:00:00.000Z","business":"Company","name":"Auto Loan Account 1794","type":"withdrawal","account":"12312313"}]}
I'd like to end up with a CSV that contains only the account type and the amount held in each account.
Here is a way to do it with regex (in base R):
# read json
json <- readLines('test.json', warn = FALSE)
# extract with regex; gregexpr() finds every match on a line, so this works
# whether each JSON object sits on one line or is spread across several
amount <- unlist(regmatches(json, gregexpr('"amount":"\\d+\\.\\d+"', json)))
amount <- as.numeric(gsub('"amount":"(\\d+\\.\\d+)"', '\\1', amount))
type <- unlist(regmatches(json, gregexpr('"type":"\\w+"', json)))
type <- gsub('"type":"(\\w+)"', '\\1', type)
# output
data.frame(type, amount)
#         type amount
# 1    payment 107.62
# 2    payment 650.88
# 3    deposit 878.63
# 4    invoice 199.29
# 5    payment 841.48
# 6 withdrawal 116.55
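The regex approach pairs the nth amount with the nth type purely by position, so an account missing either field would shift every later row. A sketch that parses the JSON instead, assuming the rjson package used in the question is installed: each line is parsed with `rjson::fromJSON()`, the `accountHistory` array is looped over with `vapply()`, and the per-line results are stacked. The function name `extract_accounts` is hypothetical.

```r
# Hypothetical helper: read a file with one JSON object per line and return
# one row per element of each object's "accountHistory" array.
extract_accounts <- function(sourceFile) {
  lines <- readLines(sourceFile, warn = FALSE)
  lines <- lines[nzchar(trimws(lines))]            # skip blank lines
  rows <- lapply(lines, function(line) {
    obj <- rjson::fromJSON(line)                    # JSON text -> nested R list
    history <- obj$accountHistory                   # list of per-account lists
    data.frame(
      # loop over the array, pulling one field from each account
      type   = vapply(history, function(acct) acct$type, character(1)),
      amount = as.numeric(vapply(history, function(acct) acct$amount, character(1))),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)                              # stack the per-line frames
}
```

For the question's file, something like `write.csv(extract_accounts("test.json"), "accounts.csv", row.names = FALSE)` would write only the two wanted columns; `accounts.csv` is a placeholder name. Building one small data frame per line and binding once at the end also avoids growing `JsonAcctData` with `rbind()` inside the loop.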