Dumping DataFrame to JSON records

I have a dataframe df like this:

   task_count  task   date
0  82586       foo     2015-10-31
1  57417       foo     2016-08-31
2  47800       bar     2016-12-31
3  62331       foo     2016-02-29
4  45852       bar     2017-07-31

I want to produce the following output:

[
  {
    "task": "foo",
    "task_count": [82586,57417,62331],
    "date": ["2015-10-31","2016-08-31","2016-02-29"]
  },
  {
    "task": "bar",
    "task_count": [47800,45852],
    "date": ["2016-12-31","2017-07-31"]
  }
]

This is what I have so far, but I can't get the groupby to aggregate more than one column.

import json

result = df.groupby('task')['task_count'].apply(list).reset_index().to_json(orient='records')
print(json.dumps(json.loads(result), indent=2))

What approach should I take to get the desired output?

You can use groupby + agg + to_dict:

df.groupby('task', as_index=False).agg(lambda x: x.tolist()).to_dict(orient='records')
[
    {
        "date": [
            "2016-12-31",
            "2017-07-31"
        ],
        "task_count": [
            47800,
            45852
        ],
        "task": "bar"
    },
    {
        "date": [
            "2015-10-31",
            "2016-08-31",
            "2016-02-29"
        ],
        "task_count": [
            82586,
            57417,
            62331
        ],
        "task": "foo"
    }
]
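Note that bar comes first above because groupby sorts the group keys by default. If you want the groups in the order they first appear (foo before bar, as in the desired output), pass sort=False. Here is a minimal self-contained sketch, assuming the date column holds plain strings as in the sample frame; json.dumps is only used to pretty-print. x.tolist() is used rather than list so the counts become plain Python ints that json.dumps can serialize.

```python
import json

import pandas as pd

# Sample frame from the question; dates are assumed to be plain strings
df = pd.DataFrame({
    "task_count": [82586, 57417, 47800, 62331, 45852],
    "task": ["foo", "foo", "bar", "foo", "bar"],
    "date": ["2015-10-31", "2016-08-31", "2016-12-31", "2016-02-29", "2017-07-31"],
})

records = (
    df.groupby("task", as_index=False, sort=False)  # sort=False: keep first-appearance order
      .agg(lambda x: x.tolist())                     # collect each remaining column into a list
      .to_dict(orient="records")                     # one dict per task
)
print(json.dumps(records, indent=2))
```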

If you want to produce JSON and write the result to a file, use to_json instead of to_dict:

df.groupby('task', as_index=False)\
  .agg(lambda x: x.tolist())\
  .to_json('file.json', orient='records')

This creates file.json containing:

[{"task":"bar","task_count":[47800,45852],"date":["2016-12-31","2017-07-31"]},{"task":"foo","task_count":[82586,57417,62331],"date":["2015-10-31","2016-08-31","2016-02-29"]}]
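If date is actually a datetime64 column rather than strings, the tolist() lists will hold Timestamp objects. json.dumps cannot serialize those, and relying on how to_json encodes datetimes nested inside lists may not give you plain YYYY-MM-DD strings. One way around that, sketched here under the assumption that date was parsed with pd.to_datetime, is to format the dates before grouping. Calling to_json without a path returns the JSON as a string instead of writing a file.

```python
import json

import pandas as pd

# Assumption: "date" is a real datetime64 column here, not strings
df = pd.DataFrame({
    "task_count": [82586, 57417, 47800, 62331, 45852],
    "task": ["foo", "foo", "bar", "foo", "bar"],
    "date": pd.to_datetime(["2015-10-31", "2016-08-31", "2016-12-31", "2016-02-29", "2017-07-31"]),
})

out = (
    df.assign(date=df["date"].dt.strftime("%Y-%m-%d"))  # datetimes -> "YYYY-MM-DD" strings
      .groupby("task", as_index=False, sort=False)
      .agg(lambda x: x.tolist())
      .to_json(orient="records")                        # no path given: returns a str
)
print(json.dumps(json.loads(out), indent=2))
```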