Importing a file with individual JSON lines to a list
I need some help.
I have a JSON file that is the result of an Auth0 export data dump. The lines are NOT separated by commas.
Below is the file, named OUTPUT_USER_DUMP.json:
{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "jsmith@company.com"}
{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "banderson@company.com"}
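What I want to do is use a Python script to open this JSON dump file and assign its contents to a list variable (this is what the list variable should look like when printed):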
[{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "jsmith@company.com"},
{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "banderson@company.com"}]
Any help?
Given:
bad_json='''
{
"user_id": "auth0|5f9886ee8e36ac0069e8fc3a",
"name": "John Smith",
"email": "jsmith@company.com"
}
{
"user_id": "auth0|5fa43f699e937f0068c40d8e",
"name": "Bob Anderson",
"email": "banderson@company.com"
}'''
You can use a regular expression to insert the missing commas and then wrap the result in brackets:
import re
import json
t=re.sub(r"\}\s*\{", "},\n{", bad_json)
new_json=rf'[{t}]'
>>> json.loads(new_json)
[{'user_id': 'auth0|5f9886ee8e36ac0069e8fc3a', 'name': 'John Smith', 'email': 'jsmith@company.com'}, {'user_id': 'auth0|5fa43f699e937f0068c40d8e', 'name': 'Bob Anderson', 'email': 'banderson@company.com'}]
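One caveat with the regex: it also rewrites a `}` followed by `{` that happens to sit inside a string value, which can corrupt the data. A possible alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, parses one value at a time and resumes where the previous value ended. The helper name `parse_concatenated_json` and the sample text (including the odd name containing braces) are made up for illustration.

```python
import json

def parse_concatenated_json(text):
    """Parse JSON values written back to back with no separator (hypothetical helper)."""
    decoder = json.JSONDecoder()
    objects = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the decoded value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        objects.append(obj)
    return objects

# Illustrative input; the braces inside the first name would trip up the regex.
sample = '''
{
"user_id": "auth0|5f9886ee8e36ac0069e8fc3a",
"name": "John } { Smith"
}
{
"user_id": "auth0|5fa43f699e937f0068c40d8e",
"name": "Bob Anderson"
}'''
users = parse_concatenated_json(sample)
print(len(users))
```

Because the decoder tracks string boundaries itself, braces inside values are left alone.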
Edit
Your file appears to consist of individual LINES of JSON.
Given:
cat file
{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "jsmith@company.com"}
{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "banderson@company.com"}
You can iterate over the file line by line and decode each line as you go:
import json
with open('/tmp/file') as f:
    data=[json.loads(line) for line in f]
>>> data
[{'user_id': 'auth0|5f9886ee8e36ac0069e8fc3a', 'name': 'John Smith', 'email': 'jsmith@company.com'}, {'user_id': 'auth0|5fa43f699e937f0068c40d8e', 'name': 'Bob Anderson', 'email': 'banderson@company.com'}]
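The comprehension above raises on the first blank or malformed line without saying which line it was. A minimal sketch of a more forgiving variant follows, assuming you want blank lines skipped and the line number included in the error. The name `load_json_lines` and the in-memory `sample_lines` are illustrative only; in practice you would pass the open file object.

```python
import json

def load_json_lines(lines):
    """Decode one JSON value per non-blank line (hypothetical helper)."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. an empty line at the end of the file
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            # Re-raise with the line number so the bad record is easy to find.
            raise ValueError(f"line {lineno}: invalid JSON ({exc.msg})") from exc
    return records

# A list of strings stands in for the file; any iterable of lines works.
sample_lines = [
    '{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a", "name": "John Smith"}\n',
    '\n',
    '{"user_id": "auth0|5fa43f699e937f0068c40d8e", "name": "Bob Anderson"}\n',
]
data = load_json_lines(sample_lines)
print(len(data))
```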
You can read a newline-delimited JSON file directly with pandas. You can then use the to_dict function on the DataFrame to convert it to the format you requested.
Code
import pandas as pd
# lines=True tells pandas to treat each line as a separate JSON object
df = pd.read_json('./OUTPUT_USER_DUMP.json', lines=True)
print(df.to_dict('records'))
Output
[
{'user_id': 'auth0|5f9886ee8e36ac0069e8fc3a', 'name': 'John Smith', 'email': 'jsmith@company.com'},
{'user_id': 'auth0|5fa43f699e937f0068c40d8e', 'name': 'Bob Anderson', 'email': 'banderson@company.com'}
]
You can read the file line by line and load each line as JSON data:
from json import loads
with open("OUTPUT_USER_DUMP.json", "r") as f2r:
    data = [loads(each_line) for each_line in f2r]
print(data)
Because file objects in Python are iterable and yield their lines, you can write a function that handles each line as a JSON object:
from json import dumps, loads
from typing import Iterable, Iterator

FILENAME = 'OUTPUT_USER_DUMP.json'

def read_json_objects(file: Iterable[str]) -> Iterator[dict]:
    """Yield JSON objects from each line of a given file."""
    for line in file:
        # Skip blank lines; the walrus operator requires Python 3.8+
        if line := line.strip():
            yield loads(line)

def main():
    """Run the script."""
    with open(FILENAME, 'r', encoding='utf-8') as file:
        json = list(read_json_objects(file))
    print(dumps(json, indent=2))

if __name__ == '__main__':
    main()
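Since the function is a generator, it does not have to be turned into a list. As a rough sketch, assuming you only need one record out of a large dump, you could stop reading as soon as it is found. The helper `find_by_email` is hypothetical, and `io.StringIO` stands in for the open file so the example runs without OUTPUT_USER_DUMP.json.

```python
import io
from json import loads
from typing import Iterable, Iterator, Optional

def read_json_objects(file: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per non-blank line."""
    for line in file:
        if line := line.strip():
            yield loads(line)

def find_by_email(file: Iterable[str], email: str) -> Optional[dict]:
    """Return the first record whose "email" matches, or None (hypothetical helper)."""
    # next() pulls records lazily, so lines after the match are never decoded.
    return next(
        (rec for rec in read_json_objects(file) if rec.get("email") == email),
        None,
    )

# In-memory stand-in for the file, using the two records from the question.
fake_file = io.StringIO(
    '{"user_id": "auth0|5f9886ee8e36ac0069e8fc3a","name": "John Smith","email": "jsmith@company.com"}\n'
    '{"user_id": "auth0|5fa43f699e937f0068c40d8e","name": "Bob Anderson","email": "banderson@company.com"}\n'
)
match = find_by_email(fake_file, "banderson@company.com")
print(match["name"] if match else "not found")
```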