Convert dataframe to nested jsonl file

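I need to convert a dataframe to a nested jsonl file in a specific way. I have the dataframe below. I built the "quantity details" column myself, meaning it was previously two separate columns.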

      id    price     quantity details
0     12     11.00    "quantity" : 4.0, "locationId" : 1234567
1     34     22.00    "quantity" : 7.0, "locationId" : 1234567
2     56     33.00    "quantity" : 13.0, "locationId" : 1234567
3     78     44.00    "quantity" : 2.0, "locationId" : 1234567
4     90     55.00    "quantity" : 3.0, "locationId" : 1234567

Thanks to this thread, I used the code below to wrap each row under an "input" key while converting it to jsonl:

json_as_str=df.to_json(orient="index")
json_value=json.loads(json_as_str)
string_formatted=[]
for key,val in json_value.items():
    string_formatted.append("{'input':%s}" %val)
with open("file_name_here.jsonl","a") as fh:
    for i in string_formatted:
        i=i.replace("'",'"')
        fh.write(f"{i}\n")

The jsonl file I get:

{"input":{"id": "12", "price": 11, "quantity details": ""availableQuantity": 23.0, "locationId": 1234567"}}
{"input":{"id": "34", "price": 22, "quantity details": ""availableQuantity": 15.0, "locationId": 1234567"}}
{"input":{"id": "56", "price": 33, "quantity details": ""availableQuantity": 23.0, "locationId": 1234567"}}
{"input":{"id": "78", "price": 44, "quantity details": ""availableQuantity": 14.0, "locationId": 1234567"}}
{"input":{"id": "90", "price": 55, "quantity details": ""availableQuantity": 10.0, "locationId": 1234567"}}

This is the desired output for the jsonl file:

{"input":{"id": "12", "price": 11, "quantity details": {"availableQuantity": 23.0, "locationId": 1234567}}}
{"input":{"id": "34", "price": 22, "quantity details": {"availableQuantity": 15.0, "locationId": 1234567}}}
{"input":{"id": "56", "price": 33, "quantity details": {"availableQuantity": 23.0, "locationId": 1234567}}}
{"input":{"id": "78", "price": 44, "quantity details": {"availableQuantity": 14.0, "locationId": 1234567}}}
{"input":{"id": "90", "price": 55, "quantity details": {"availableQuantity": 10.0, "locationId": 1234567}}}

Any help is much appreciated. Thank you for reading.

Convert each value in the "quantity details" column to a dictionary, then write each row to the file, as follows:

import pandas as pd
import json

# toy data
df = pd.DataFrame.from_dict(
    {'id': {0: 12, 1: 34, 2: 56, 3: 78, 4: 90}, 'price': {0: 11.0, 1: 22.0, 2: 33.0, 3: 44.0, 4: 55.0},
     'quantity details': {0: '"quantity" : 4.0, "locationId" : 1234567', 1: '"quantity" : 7.0, "locationId" : 1234567',
                          2: '"quantity" : 13.0, "locationId" : 1234567', 3: '"quantity" : 2.0, "locationId" : 1234567',
                          4: '"quantity" : 3.0, "locationId" : 1234567'}})

# wrap each string in braces so it becomes valid JSON, then parse it into a dict
df["quantity details"] = df["quantity details"].apply("{{{}}}".format).apply(json.loads)

with open("file_name_here.jsonl", "a") as fh:
    for value in df.to_dict(orient="index").values():
        json.dump({"input": value}, fh)
        fh.write("\n")

Output (file_name_here.jsonl)

{"input": {"id": 12, "price": 11.0, "quantity details": {"quantity": 4.0, "locationId": 1234567}}}
{"input": {"id": 34, "price": 22.0, "quantity details": {"quantity": 7.0, "locationId": 1234567}}}
{"input": {"id": 56, "price": 33.0, "quantity details": {"quantity": 13.0, "locationId": 1234567}}}
{"input": {"id": 78, "price": 44.0, "quantity details": {"quantity": 2.0, "locationId": 1234567}}}
{"input": {"id": 90, "price": 55.0, "quantity details": {"quantity": 3.0, "locationId": 1234567}}}
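Since the question mentions that "quantity details" was originally two separate columns, it may be simpler to skip the string-building step and nest those columns directly. The following is only a sketch under assumptions: the column names `quantity` and `locationId` and the helper `to_jsonl_lines` are hypothetical, chosen to match the keys in the toy data, and the frame is shortened to three rows.

```python
import json
import pandas as pd

# Hypothetical starting point: the two original columns before they were merged.
# The column names "quantity" and "locationId" are assumptions.
df = pd.DataFrame({
    "id": [12, 34, 56],
    "price": [11.0, 22.0, 33.0],
    "quantity": [4.0, 7.0, 13.0],
    "locationId": [1234567, 1234567, 1234567],
})

def to_jsonl_lines(frame, nested_cols=("quantity", "locationId"),
                   nested_key="quantity details"):
    """Return one JSON string per row, with nested_cols grouped under nested_key."""
    lines = []
    for record in frame.to_dict(orient="records"):
        # Move the chosen columns out of the flat record into a nested dict.
        nested = {col: record.pop(col) for col in nested_cols}
        record[nested_key] = nested
        lines.append(json.dumps({"input": record}))
    return lines

lines = to_jsonl_lines(df)

# "w" overwrites the file on each run; use "a" instead to append.
with open("file_name_here.jsonl", "w") as fh:
    fh.write("\n".join(lines) + "\n")
```

Because the nested value is built as a real dict and serialized with `json.dumps`, there is no need to replace quotes by hand, which is what produced the invalid lines in the question.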