如何将 pandas 系列转换为所需的 JSON 格式?
How to convert pandas Series to desired JSON format?
我有以下数据,我需要在这些数据上应用聚合函数,然后进行 groupby。
我的数据如下:data.csv
id,category,sub_category,count
0,x,sub1,10
1,x,sub2,20
2,x,sub2,10
3,y,sub3,30
4,y,sub3,5
5,y,sub4,15
6,z,sub5,20
在这里,我试图通过子类别明智地获得计数。之后我需要以 JSON 格式存储结果。下面的一段代码帮助我实现了这一点。 test.py
import pandas as pd
df = pd.read_csv('data.csv')
sub_category_total = df['count'].groupby([df['category'], df['sub_category']]).sum()
print sub_category_total.reset_index().to_json(orient = "records")
以上代码给出了以下格式。
[{"category":"x","sub_category":"sub1","count":10},{"category":"x","sub_category":"sub2","count":30},{"category":"y","sub_category":"sub3","count":35},{"category":"y","sub_category":"sub4","count":15},{"category":"z","sub_category":"sub5","count":20}]
但是,我想要的格式如下:
{
"x":[{
"sub_category":"sub1",
"count":10
},
{
"sub_category":"sub2",
"count":30}],
"y":[{
"sub_category":"sub3",
"count":35
},
{
"sub_category":"sub4",
"count":15}],
"z":[{
"sub_category":"sub5",
"count":20}]
}
通过关注@ 的讨论,我将 test.py
的最后两行替换为
g = df.groupby('category')[["sub_category","count"]].apply(lambda x: x.to_dict(orient='records'))
print g.to_json()
它给了我以下输出。
{"x":[{"count":10,"sub_category":"sub1"},{"count":20,"sub_category":"sub2"},{"count":10,"sub_category":"sub2"}],"y":[{"count":30,"sub_category":"sub3"},{"count":5,"sub_category":"sub3"},{"count":15,"sub_category":"sub4"}],"z":[{"count":20,"sub_category":"sub5"}]}
虽然上面的结果与我想要的格式有点相似,但我无法在这里执行任何聚合函数,因为它会抛出错误 'numpy.int64' object has no attribute 'to_dict'
。因此,我最终得到了数据文件中的所有行。
有人可以帮助我实现上述 JSON 格式吗?
我想你可以先用sum
, parameter as_index=False
was added to groupby
, so output is Dataframe
df1
and then use 聚合:
df1 = (df.groupby(['category','sub_category'], as_index=False)['count'].sum())
print (df1)
category sub_category count
0 x sub1 10
1 x sub2 30
2 y sub3 35
3 y sub4 15
4 z sub5 20
g = df1.groupby('category')[["sub_category","count"]]
.apply(lambda x: x.to_dict(orient='records'))
print (g.to_json())
{
"x": [{
"sub_category": "sub1",
"count": 10
}, {
"sub_category": "sub2",
"count": 30
}],
"y": [{
"sub_category": "sub3",
"count": 35
}, {
"sub_category": "sub4",
"count": 15
}],
"z": [{
"sub_category": "sub5",
"count": 20
}]
}
我有以下数据,我需要在这些数据上应用聚合函数,然后进行 groupby。
我的数据如下:data.csv
id,category,sub_category,count
0,x,sub1,10
1,x,sub2,20
2,x,sub2,10
3,y,sub3,30
4,y,sub3,5
5,y,sub4,15
6,z,sub5,20
在这里,我试图通过子类别明智地获得计数。之后我需要以 JSON 格式存储结果。下面的一段代码帮助我实现了这一点。 test.py
import pandas as pd
df = pd.read_csv('data.csv')
sub_category_total = df['count'].groupby([df['category'], df['sub_category']]).sum()
print sub_category_total.reset_index().to_json(orient = "records")
以上代码给出了以下格式。
[{"category":"x","sub_category":"sub1","count":10},{"category":"x","sub_category":"sub2","count":30},{"category":"y","sub_category":"sub3","count":35},{"category":"y","sub_category":"sub4","count":15},{"category":"z","sub_category":"sub5","count":20}]
但是,我想要的格式如下:
{
"x":[{
"sub_category":"sub1",
"count":10
},
{
"sub_category":"sub2",
"count":30}],
"y":[{
"sub_category":"sub3",
"count":35
},
{
"sub_category":"sub4",
"count":15}],
"z":[{
"sub_category":"sub5",
"count":20}]
}
通过关注@test.py
的最后两行替换为
g = df.groupby('category')[["sub_category","count"]].apply(lambda x: x.to_dict(orient='records'))
print g.to_json()
它给了我以下输出。
{"x":[{"count":10,"sub_category":"sub1"},{"count":20,"sub_category":"sub2"},{"count":10,"sub_category":"sub2"}],"y":[{"count":30,"sub_category":"sub3"},{"count":5,"sub_category":"sub3"},{"count":15,"sub_category":"sub4"}],"z":[{"count":20,"sub_category":"sub5"}]}
虽然上面的结果与我想要的格式有点相似,但我无法在这里执行任何聚合函数,因为它会抛出错误 'numpy.int64' object has no attribute 'to_dict'
。因此,我最终得到了数据文件中的所有行。
有人可以帮助我实现上述 JSON 格式吗?
我想你可以先用sum
, parameter as_index=False
was added to groupby
, so output is Dataframe
df1
and then use
df1 = (df.groupby(['category','sub_category'], as_index=False)['count'].sum())
print (df1)
category sub_category count
0 x sub1 10
1 x sub2 30
2 y sub3 35
3 y sub4 15
4 z sub5 20
g = df1.groupby('category')[["sub_category","count"]]
.apply(lambda x: x.to_dict(orient='records'))
print (g.to_json())
{
"x": [{
"sub_category": "sub1",
"count": 10
}, {
"sub_category": "sub2",
"count": 30
}],
"y": [{
"sub_category": "sub3",
"count": 35
}, {
"sub_category": "sub4",
"count": 15
}],
"z": [{
"sub_category": "sub5",
"count": 20
}]
}