Convert a DataFrame to a nested JSONL file
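I need to convert a DataFrame to a nested JSONL file in a specific way. I have the DataFrame below. I built the "quantity details" column myself, meaning it was previously two separate columns.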
id price quantity details
0 12 11.00 "quantity" : 4.0, "locationId" : 1234567
1 34 22.00 "quantity" : 7.0, "locationId" : 1234567
2 56 33.00 "quantity" : 13.0, "locationId" : 1234567
3 78 44.00 "quantity" : 2.0, "locationId" : 1234567
4 90 55.00 "quantity" : 3.0, "locationId" : 1234567
I use the code below, based on another thread, to add "input" at the front while converting the data to JSONL.
json_as_str=df.to_json(orient="index")
json_value=json.loads(json_as_str)
string_formatted=[]
for key,val in json_value.items():
    string_formatted.append("{'input':%s}" %val)
with open("file_name_here.jsonl","a") as fh:
    for i in string_formatted:
        i=i.replace("'",'"')
        fh.write(f"{i}\n")
The JSONL file I get:
{"input":{"id": "12", "price": 11, "quantity details": ""availableQuantity": 23.0, "locationId": 1234567"}}
{"input":{"id": "34", "price": 22, "quantity details": ""availableQuantity": 15.0, "locationId": 1234567"}}
{"input":{"id": "56", "price": 33, "quantity details": ""availableQuantity": 23.0, "locationId": 1234567"}}
{"input":{"id": "78", "price": 44, "quantity details": ""availableQuantity": 14.0, "locationId": 1234567"}}
{"input":{"id": "90", "price": 55, "quantity details": ""availableQuantity": 10.0, "locationId": 1234567"}}
This is the desired output for the JSONL file:
{"input":{"id": "12", "price": 11, "quantity details": {"availableQuantity": 23.0, "locationId": 1234567}}}
{"input":{"id": "34", "price": 22, "quantity details": {"availableQuantity": 15.0, "locationId": 1234567}}}
{"input":{"id": "56", "price": 33, "quantity details": {"availableQuantity": 23.0, "locationId": 1234567}}}
{"input":{"id": "78", "price": 44, "quantity details": {"availableQuantity": 14.0, "locationId": 1234567}}}
{"input":{"id": "90", "price": 55, "quantity details": {"availableQuantity": 10.0, "locationId": 1234567}}}
Any help is much appreciated. Thanks for reading.
Convert each value in the "quantity details" column to a dictionary, then write each row to the file, as follows:
import pandas as pd
import json
# toy data
df = pd.DataFrame.from_dict(
    {'id': {0: 12, 1: 34, 2: 56, 3: 78, 4: 90}, 'price': {0: 11.0, 1: 22.0, 2: 33.0, 3: 44.0, 4: 55.0},
     'quantity details': {0: '"quantity" : 4.0, "locationId" : 1234567', 1: '"quantity" : 7.0, "locationId" : 1234567',
                          2: '"quantity" : 13.0, "locationId" : 1234567', 3: '"quantity" : 2.0, "locationId" : 1234567',
                          4: '"quantity" : 3.0, "locationId" : 1234567'}})
# wrap each string in braces so it is valid JSON, then parse it into a dict
df["quantity details"] = df["quantity details"].apply("{{{}}}".format).apply(json.loads)
# mode "a" appends to the file if it already exists
with open("file_name_here.jsonl", "a") as fh:
    for value in df.to_dict(orient="index").values():
        # one JSON object per line, nested under "input"
        json.dump({"input": value}, fh)
        fh.write("\n")
Output (file_name_here.jsonl):
{"input": {"id": 12, "price": 11.0, "quantity details": {"quantity": 4.0, "locationId": 1234567}}}
{"input": {"id": 34, "price": 22.0, "quantity details": {"quantity": 7.0, "locationId": 1234567}}}
{"input": {"id": 56, "price": 33.0, "quantity details": {"quantity": 13.0, "locationId": 1234567}}}
{"input": {"id": 78, "price": 44.0, "quantity details": {"quantity": 2.0, "locationId": 1234567}}}
{"input": {"id": 90, "price": 55.0, "quantity details": {"quantity": 3.0, "locationId": 1234567}}}
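Since the question mentions that "quantity details" was originally two separate columns, another option may be to skip the string-building step and nest the values directly. The following is only a minimal sketch, not the method from the answer above: the column names `quantity` and `locationId`, the helper `nest_record`, and the sample records are assumptions for illustration. The records stand in for what `df.to_dict(orient="records")` would be expected to return on a DataFrame that still has the two separate columns.

```python
import json

# Hypothetical sample rows, assumed to look like the output of
# df.to_dict(orient="records") before the two columns were merged.
records = [
    {"id": 12, "price": 11.0, "quantity": 4.0, "locationId": 1234567},
    {"id": 34, "price": 22.0, "quantity": 7.0, "locationId": 1234567},
]


def nest_record(record, detail_keys, nested_key):
    """Move the keys in detail_keys into a sub-dict stored under nested_key."""
    # keep every column that is not part of the nested details
    result = {k: v for k, v in record.items() if k not in detail_keys}
    # build the nested dict from the original, separate columns
    result[nested_key] = {k: record[k] for k in detail_keys}
    return result


# one JSON string per row, each wrapped under "input"
lines = [
    json.dumps({"input": nest_record(r, ["quantity", "locationId"], "quantity details")})
    for r in records
]

for line in lines:
    print(line)
```

Each string in `lines` can then be written to the file followed by a newline. Because `json.dumps` serializes a real dict, no quote replacement is needed and the nested object is not emitted as a string.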