How to parse specific entities in R?
I have JSON files that contain events and the logs recorded within those events. A sample looks like this:
{
  "sessionEvents": [
    {
      "u": "BC0F6A3A2840B6F48386BABC5F34B480BA4F9929",
      "v": "0.1.0",
      "dv": "Unidentified",
      "t": 1462924115818,
      "uid": "",
      "len": 148012,
      "by": 0,
      "g": "U",
      "cy": "PH",
      "cr": "Unknown",
      "dm": "O+ Xfinit",
      "lat": 0.0,
      "lon": 0.0,
      "l": [
        {
          "e": "100_SESSION_START",
          "o": 24,
          "d": 147988,
          "p": {
            "User_Timezone": "-08:00",
            "Session_nb": "0",
            "Energy_Balance": "89",
            "Global_Playtime": "0",
            "Device_id": "75e64b654c01949",
            "Game_Language": "en",
            "Connection_Type": "WIFI",
            "User_Country": "US",
            "Push_Impact": "None"
          }
        },
        {
          "e": "008_TUTORIAL_STEP_OTHER",
          "o": 7561,
          "d": 0,
          "p": {
            "Screen_id": "scene_screen",
            "Misclicks": "0",
            "Tutorial_Step": "19",
            "Average_Time_Per_Frame": "0",
            "Total_Time": "0"
          }
        }
      ]
    },
    {
      "u": "C950FC733D883E11E36E15A705E05A3CC7748C3A",
      "v": "0.1.0",
      "dv": "OPPO Mirror 5",
      "t": 1462908916463,
      "uid": "",
      "len": 5368,
      "by": 0,
      "g": "U",
      "cy": "PH",
      "cr": "Unknown",
      "dm": "A51w",
      "lat": 0.0,
      "lon": 0.0,
      "l": [
        {
          "e": "100_SESSION_START",
          "o": 169,
          "d": 5199,
          "p": {
            "User_Timezone": "-08:00",
            "Session_nb": "0",
            "Energy_Balance": "0",
            "Global_Playtime": "0",
            "Device_id": "d0de71513e48fba",
            "Game_Language": "en",
            "Connection_Type": "WIFI",
            "User_Country": "US",
            "Push_Impact": "None"
          }
        }
      ]
    }
  ]
}
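As you can see, there is a second-level object "l" holding the event logs and a third-level object "p" holding the parameters, and this is what is giving me trouble. I am trying to convert this into a data frame, but I only need the "100_SESSION_START" log values in the table (the parameter names under "l" and "p" are the same for all such logs). I also need to add all the parameters from the higher-level object, the event ('u', 'v', 'dv', 't', ...). Does anyone know how to do this using R?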
upd: It would be fine if the result were a table like this (click).
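Assuming you have loaded the JSON file into the variable data (the column-style access used below matches what fromJSON from the jsonlite package returns by default):
library(jsonlite)
data <- fromJSON("/home/joel/tmp/input.json")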
You can then iterate over each event and over each event's logs as needed:
sessions <- data$sessionEvents
for (i in seq_len(nrow(sessions))) {   # Iterate over events
  print(sessions$u[i])
  print(sessions$v[i])
  print(sessions$dv[i])
  print(sessions$t[i])
  logs <- sessions$l[[i]]              # Logs of event i (a data frame)
  for (j in seq_len(nrow(logs))) {     # Iterate over logs; seq_len() is safe when there are none
    print(logs$e[j])
  }
}
Hope this helps.
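Building on the loop above, here is a minimal sketch that collects the "100_SESSION_START" rows into a data frame instead of printing them. It assumes jsonlite's default simplification (sessionEvents becomes a data frame whose l column is a list of data frames) and that all "100_SESSION_START" logs carry the same parameters, as the question states. The inline json_text sample (values shortened) and the names top_cols, rows and result are illustrative and not part of the original answer.

```r
library(jsonlite)

# Shortened sample with the same shape as the question's file (illustrative values).
json_text <- '{
  "sessionEvents": [
    {"u": "BC0F", "v": "0.1.0", "dv": "Unidentified", "t": 1462924115818,
     "l": [
       {"e": "100_SESSION_START", "o": 24, "d": 147988,
        "p": {"User_Timezone": "-08:00", "Session_nb": "0"}},
       {"e": "008_TUTORIAL_STEP_OTHER", "o": 7561, "d": 0,
        "p": {"Screen_id": "scene_screen", "Misclicks": "0"}}
     ]},
    {"u": "C950", "v": "0.1.0", "dv": "OPPO Mirror 5", "t": 1462908916463,
     "l": [
       {"e": "100_SESSION_START", "o": 169, "d": 5199,
        "p": {"User_Timezone": "-08:00", "Session_nb": "0"}}
     ]}
  ]
}'

data <- fromJSON(json_text)
sessions <- data$sessionEvents
top_cols <- c("u", "v", "dv", "t")   # event-level fields to repeat on every log row

rows <- lapply(seq_len(nrow(sessions)), function(i) {
  # Turn the nested "p" data frame into plain columns named p.<parameter>.
  logs <- jsonlite::flatten(sessions$l[[i]])
  logs <- logs[logs$e == "100_SESSION_START", , drop = FALSE]
  if (nrow(logs) == 0) return(NULL)   # rbind() below skips NULL entries
  # Drop parameters that only belong to other event types (all NA here).
  logs <- logs[, colSums(!is.na(logs)) > 0, drop = FALSE]
  top <- sessions[rep(i, nrow(logs)), top_cols]
  rownames(top) <- NULL
  rownames(logs) <- NULL
  cbind(top, logs)
})

result <- do.call(rbind, rows)
print(result)
```

Each row of result holds one "100_SESSION_START" log together with the fields of the event it belongs to. If real data has session-start logs with differing parameter sets, the final rbind() would need explicit column selection first.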
You can do something similar with lapply.
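# li is not defined in the original answer; it is assumed here to be the JSON
# parsed into nested lists, e.g. jsonlite::fromJSON(path, simplifyVector = FALSE).
topLevel <- c("u", "v", "dv", "t")
midLevel <- c("e", "o", "d")
botLevel <- c("User_Timezone", "Session_nb", "Energy_Balance", "Global_Playtime")
do.call(rbind, lapply(li[[1]], function(x) {        # x: one session event
  do.call(rbind, lapply(x$l, function(y) {          # y: one log entry
    if (y$e == "100_SESSION_START") {
      c(y[midLevel], y$p[botLevel], x[topLevel])    # other logs yield NULL, which rbind drops
    }
  }))
}))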
     e                   o   d      User_Timezone Session_nb Energy_Balance Global_Playtime
[1,] "100_SESSION_START" 24  147988 "-08:00"      "0"        "89"           "0"
[2,] "100_SESSION_START" 169 5199   "-08:00"      "0"        "0"            "0"
     u                                          v       dv              t
[1,] "BC0F6A3A2840B6F48386BABC5F34B480BA4F9929" "0.1.0" "Unidentified"  1.462924e+12
[2,] "C950FC733D883E11E36E15A705E05A3CC7748C3A" "0.1.0" "OPPO Mirror 5" 1.462909e+12
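The result printed above is a list-mode matrix rather than a data frame. If a regular data frame with typed columns is wanted, one possible variant is sketched below. It assumes the JSON is parsed into nested lists with jsonlite::fromJSON(..., simplifyVector = FALSE); the inline json_text sample (values shortened) and the names top_level, mid_level, bot_level, rows and result are illustrative, not taken from the original answer.

```r
library(jsonlite)

# Shortened sample with the same shape as the question's file (illustrative values).
json_text <- '{
  "sessionEvents": [
    {"u": "BC0F", "v": "0.1.0", "dv": "Unidentified", "t": 1462924115818,
     "l": [
       {"e": "100_SESSION_START", "o": 24, "d": 147988,
        "p": {"User_Timezone": "-08:00", "Session_nb": "0"}},
       {"e": "008_TUTORIAL_STEP_OTHER", "o": 7561, "d": 0,
        "p": {"Screen_id": "scene_screen", "Misclicks": "0"}}
     ]},
    {"u": "C950", "v": "0.1.0", "dv": "OPPO Mirror 5", "t": 1462908916463,
     "l": [
       {"e": "100_SESSION_START", "o": 169, "d": 5199,
        "p": {"User_Timezone": "-08:00", "Session_nb": "0"}}
     ]}
  ]
}'

# Keep the JSON as nested lists so every event and log is a plain named list.
li <- fromJSON(json_text, simplifyVector = FALSE)

top_level <- c("u", "v", "dv", "t")
mid_level <- c("e", "o", "d")
bot_level <- c("User_Timezone", "Session_nb")

rows <- list()
for (x in li$sessionEvents) {          # x: one session event
  for (y in x$l) {                     # y: one log entry of that event
    if (identical(y$e, "100_SESSION_START")) {
      # Merge the three levels into one named list, then into a one-row data frame.
      fields <- c(x[top_level], y[mid_level], y$p[bot_level])
      rows[[length(rows) + 1]] <- as.data.frame(fields, stringsAsFactors = FALSE)
    }
  }
}

result <- do.call(rbind, rows)
str(result)
```

Because each row is built as a one-row data frame, numeric fields such as o, d and t stay numeric and the identifiers stay character, which the matrix form does not guarantee.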