How to flatten a column of nested JSON objects in an already flattened DataFrame
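I have a JSON file containing nested objects, which I have flattened into a pandas DataFrame. One column still contains nested JSON objects, and I am finding it hard to flatten.
I have tried many approaches; the one below is what got me furthest.
Any help is appreciated, thanks.
Unfortunately, I could not find a jsfiddle-like alternative for Python to provide a working example.
I know that the meta parameter of json_normalize lets me add columns to my DataFrame. But that approach does not work for the column that is still nested: I got json_normalize working in my setup by setting record_path to 'markets', which is the main object in my JSON file. So in this setup I cannot point record_path at 'marketStats' and add any related columns through the meta parameter.
Goal
The goal is to turn one or all of the JSON objects inside marketStats into columns of the DataFrame.
Code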
import json
import pandas as pd

# Load the raw JSON file
with open('Data/20012022.json') as file:
    data = json.load(file)

# Flatten data: one row per entry in 'markets'
df0 = pd.json_normalize(
    data,
    record_path=['markets']
)
df0.head(3)
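Since no runnable example accompanies the question, here is a minimal sketch of the same setup using an inline sample instead of the file. The sample is trimmed from the data snippet in this post (only a few keys are kept) and assumes pandas is installed. It suggests that json_normalize expands nested dicts such as latestBase into dotted columns, but leaves the marketStats list as a single column of lists.

```python
import pandas as pd

# Trimmed inline stand-in for the JSON file (values copied from the post's snippet).
data = {
    "markets": [
        {
            "id": 335,
            "symbol": "ETHBTC",
            "latestBase": {"id": 161774475, "isLowest": False},
            "marketStats": [
                {"algorithm": "original", "ratio": "50.0", "crackedCount": 2},
                {"algorithm": "day_trade", "ratio": "100.0", "crackedCount": 1},
            ],
        },
        {
            "id": 337,
            "symbol": "LTCBTC",
            "latestBase": {"id": 163982984, "isLowest": False},
            "marketStats": [
                {"algorithm": "original", "ratio": "57.14", "crackedCount": 7},
                {"algorithm": "day_trade", "ratio": "0.0", "crackedCount": 1},
            ],
        },
    ]
}

# Same call as in the question: one row per entry in 'markets'.
df0 = pd.json_normalize(data, record_path=["markets"])

# Nested dicts become dotted columns; lists of dicts are left untouched.
print(df0.columns.tolist())
print(type(df0.loc[0, "marketStats"]))
```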
Screenshot
Here is a screenshot of the table as it currently looks; the marketStats column contains the nested JSON.
Data
Here is a snippet from the JSON file.
{
"markets": [
{
"id": 335,
"baseCurrency": "eth",
"quoteCurrency": "btc",
"exchangeName": "Binance",
"exchangeCode": "BINA",
"longName": "BTC-ETH",
"marketName": "btc-eth",
"symbol": "ETHBTC",
"volume": "40624.5823",
"quoteVolume": "3026.13646935",
"btcVolume": "3026.13646935",
"usdVolume": "127009429.050524367",
"currentPrice": 0.074681,
"latestBase": {
"id": 161774475,
"time": 1639576800,
"date": "2021-12-15T14:00:00.000+00:00",
"price": "0.077653",
"lowestPrice": "0.0729",
"bounce": "6.283",
"currentDrop": "-3.8272829124438206",
"crackedAt": "2022-01-07T03:00:00.000Z",
"respectedAt": "2022-01-15T15:00:00.000Z",
"isLowest": false
},
"marketStats": [
{
"algorithm": "original",
"ratio": "50.0",
"medianDrop": "-4.08",
"medianBounce": "5.51",
"hoursToRespected": 106,
"crackedCount": 2,
"respectedCount": 1
},
{
"algorithm": "day_trade",
"ratio": "100.0",
"medianDrop": "-6.12",
"medianBounce": "6.28",
"hoursToRespected": 204,
"crackedCount": 1,
"respectedCount": 1
},
{
"algorithm": "conservative",
"ratio": "100.0",
"medianDrop": "-6.12",
"medianBounce": "8.38",
"hoursToRespected": 204,
"crackedCount": 1,
"respectedCount": 1
},
{
"algorithm": "position",
"ratio": "50.0",
"medianDrop": "-6.12",
"medianBounce": "6.19",
"hoursToRespected": 204,
"crackedCount": 2,
"respectedCount": 1
},
{
"algorithm": "hodloo",
"ratio": "50.0",
"medianDrop": "-3.29",
"medianBounce": "0.0",
"hoursToRespected": 225,
"crackedCount": 4,
"respectedCount": 2
}
]
},
{
"id": 337,
"baseCurrency": "ltc",
"quoteCurrency": "btc",
"exchangeName": "Binance",
"exchangeCode": "BINA",
"longName": "BTC-LTC",
"marketName": "btc-ltc",
"symbol": "LTCBTC",
"volume": "68309.637",
"quoteVolume": "223.79294524",
"btcVolume": "223.79294524",
"usdVolume": "9392773.4219378968",
"currentPrice": 0.003275,
"latestBase": {
"id": 163982984,
"time": 1642374000,
"date": "2022-01-16T23:00:00.000+00:00",
"price": "0.003346",
"lowestPrice": "0.00322",
"bounce": "3.839",
"currentDrop": "-2.1219366407650926",
"crackedAt": "2022-01-18T23:00:00.000Z",
"respectedAt": null,
"isLowest": false
},
"marketStats": [
{
"algorithm": "original",
"ratio": "57.14",
"medianDrop": "-3.28",
"medianBounce": "3.84",
"hoursToRespected": 186,
"crackedCount": 7,
"respectedCount": 4
},
{
"algorithm": "day_trade",
"ratio": "0.0",
"medianDrop": "0.0",
"medianBounce": "5.68",
"hoursToRespected": 0,
"crackedCount": 1,
"respectedCount": 0
},
{
"algorithm": "conservative",
"ratio": "0.0",
"medianDrop": "0.0",
"medianBounce": "5.68",
"hoursToRespected": 0,
"crackedCount": 1,
"respectedCount": 0
},
{
"algorithm": "position",
"ratio": "0.0",
"medianDrop": "0.0",
"medianBounce": "8.16",
"hoursToRespected": 0,
"crackedCount": 1,
"respectedCount": 0
},
{
"algorithm": "hodloo",
"ratio": "75.0",
"medianDrop": "-3.7",
"medianBounce": "0.0",
"hoursToRespected": 35,
"crackedCount": 4,
"respectedCount": 3
}
]
},
{
"id": 339,
"baseCurrency": "bnb",
"quoteCurrency": "btc",
"exchangeName": "Binance",
"exchangeCode": "BINA",
"longName": "BTC-BNB",
"marketName": "btc-bnb",
"symbol": "BNBBTC",
"volume": "154576.177",
"quoteVolume": "1724.66664804",
"btcVolume": "1724.66664804",
"usdVolume": "72385673.4448901928",
"currentPrice": 0.01099,
"latestBase": {
"id": 163753765,
"time": 1642068000,
"date": "2022-01-13T10:00:00.000+00:00",
"price": "0.01093",
"lowestPrice": "0.01093",
"bounce": "3.102",
"currentDrop": "0.5489478499542543",
"crackedAt": null,
"respectedAt": null,
"isLowest": false
},
"marketStats": [
{
"algorithm": "original",
"ratio": "100.0",
"medianDrop": "-7.18",
"medianBounce": "4.34",
"hoursToRespected": 62,
"crackedCount": 2,
"respectedCount": 2
},
{
"algorithm": "day_trade",
"ratio": "100.0",
"medianDrop": "-6.19",
"medianBounce": "4.3",
"hoursToRespected": 63,
"crackedCount": 1,
"respectedCount": 1
},
{
"algorithm": "conservative",
"ratio": "66.67",
"medianDrop": "-3.15",
"medianBounce": "4.05",
"hoursToRespected": 62,
"crackedCount": 3,
"respectedCount": 2
},
{
"algorithm": "position",
"ratio": "100.0",
"medianDrop": "-3.15",
"medianBounce": "4.46",
"hoursToRespected": 60,
"crackedCount": 2,
"respectedCount": 2
},
{
"algorithm": "hodloo",
"ratio": "100.0",
"medianDrop": "-7.46",
"medianBounce": "0.0",
"hoursToRespected": 62,
"crackedCount": 5,
"respectedCount": 5
}
]
}
]
}
You can apply some post-processing to df0 to get what you want. Here, you can apply explode and then apply(pd.Series) to the 'marketStats' column:
df1 = df0.explode('marketStats')['marketStats'].apply(pd.Series)
df1
which looks like this:
    algorithm       ratio    medianDrop    medianBounce    hoursToRespected    crackedCount    respectedCount
--  ------------  -------  ------------  --------------  ------------------  --------------  ----------------
 0  original           50         -4.08            5.51                 106               2                 1
 0  day_trade         100         -6.12            6.28                 204               1                 1
 0  conservative      100         -6.12            8.38                 204               1                 1
 0  position           50         -6.12            6.19                 204               2                 1
 0  hodloo             50         -3.29               0                 225               4                 2
 1  original        57.14         -3.28            3.84                 186               7                 4
 1  day_trade           0             0            5.68                   0               1                 0
 1  conservative        0             0            5.68                   0               1                 0
 1  position            0             0            8.16                   0               1                 0
 1  hodloo             75          -3.7               0                  35               4                 3
 2  original          100         -7.18            4.34                  62               2                 2
 2  day_trade         100         -6.19             4.3                  63               1                 1
 2  conservative    66.67         -3.15            4.05                  62               3                 2
 2  position          100         -3.15            4.46                  60               2                 2
 2  hodloo            100         -7.46               0                  62               5                 5
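To make the two steps easier to follow, here is a small self-contained sketch. The two-row df0 below is a hypothetical stand-in trimmed from the question's data, not the real frame. It shows that explode produces one row per list element while repeating the original index label, and that apply(pd.Series) then turns each dict's keys into columns.

```python
import pandas as pd

# Hypothetical two-row stand-in for df0, trimmed from the question's data.
df0 = pd.DataFrame(
    {
        "id": [335, 337],
        "marketStats": [
            [
                {"algorithm": "original", "ratio": "50.0"},
                {"algorithm": "hodloo", "ratio": "50.0"},
            ],
            [
                {"algorithm": "original", "ratio": "57.14"},
                {"algorithm": "hodloo", "ratio": "75.0"},
            ],
        ],
    }
)

# explode: one row per list element; the original index label is repeated.
exploded = df0.explode("marketStats")["marketStats"]

# apply(pd.Series): each dict becomes a row, its keys become columns.
df1 = exploded.apply(pd.Series)

print(df1.index.tolist())    # [0, 0, 1, 1]
print(df1.columns.tolist())  # ['algorithm', 'ratio']
```

Note that values such as ratio remain strings here, because they are strings in the JSON.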
If you want it combined with all the other columns, you can use join:
df0.join(df1)
I won't post the output of this command, since it is rather large.
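For a version small enough to print, here is a sketch of the join on an inline sample trimmed from the question's data. Dropping the original marketStats column and resetting the index are optional additions, not part of the command above. The last call is a possible alternative, under the assumption that passing the list under 'markets' directly to json_normalize is acceptable: record_path can then point at 'marketStats', and meta carries the market-level fields along.

```python
import pandas as pd

# Trimmed inline stand-in for the parsed JSON file.
data = {
    "markets": [
        {"id": 335, "symbol": "ETHBTC", "marketStats": [
            {"algorithm": "original", "ratio": "50.0"},
            {"algorithm": "hodloo", "ratio": "50.0"}]},
        {"id": 337, "symbol": "LTCBTC", "marketStats": [
            {"algorithm": "original", "ratio": "57.14"},
            {"algorithm": "hodloo", "ratio": "75.0"}]},
    ]
}

df0 = pd.json_normalize(data, record_path=["markets"])
df1 = df0.explode("marketStats")["marketStats"].apply(pd.Series)

# join aligns on the index, so each market row repeats once per stats entry.
# The list column is dropped because its content now lives in df1's columns.
joined = df0.drop(columns="marketStats").join(df1).reset_index(drop=True)

# Alternative: start from the list of markets, descend into marketStats,
# and carry selected market-level fields along as meta columns.
stats = pd.json_normalize(
    data["markets"],
    record_path="marketStats",
    meta=["id", "symbol"],
)

print(joined)
print(stats)
```

Both frames contain one row per market and algorithm; they differ only in column order and in the dtype of the meta columns.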