How to write JSON null value as an empty line in new file (converting JSON-based log into column format, i.e., one file per column)

Example log file:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}

It will generate 5 files:

1. timestamp.column
2. Field1.column
3. Field_Doc.f1.column
4. Field_Doc.f2.column
5. Field_Doc.f3.column

Example content of timestamp.column:

2022-01-14T00:12:21.000
2022-01-18T00:15:51.000

I'm facing an issue when the value of a key is null or undefined, for example:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}

Can someone help me out here?

Note that the input file is actually NDJSON (newline-delimited JSON). See the docs.

That being said, since furas already gave an answer on how to process the NDJSON logfile, I'm going to skip that part. Do note that there's a library to deal with NDJSON files. See PyPI.

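His code needs only a minimal adjustment to handle the undefined edge case. A null value is valid JSON, so his code doesn't break on it; undefined, however, is not valid JSON, and json.loads() will raise an error when it encounters it.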
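To illustrate the difference, here is a minimal sketch using only the standard json module. The two input strings are made-up fragments modeled on the log lines in the question, and the variable names are just for illustration.

```python
import json

# null is a valid JSON token and is parsed into Python's None.
ok = json.loads('{"Field1": null}')
print(ok)

# undefined is not a JSON token, so the parser raises JSONDecodeError.
try:
    json.loads('{"f1": undefined}')
    parsed_undefined = True
except json.JSONDecodeError as exc:
    parsed_undefined = False
    print("invalid JSON:", exc)
```

So only the undefined case needs special handling before parsing.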

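You can fix this easily by calling the string's replace() method on each line before passing it to json.loads(), so that the line becomes valid JSON. Then, while writing, you can check whether the value is None and replace it with an empty string. Note that None is the Python equivalent of JSON's null.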

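Please note that the search string passed to replace() includes the leading ": " (colon and space). This is to prevent false positives, such as the word undefined appearing inside a string value.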
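A small sketch of why the ": " prefix matters. The "note" field below is a hypothetical addition that is not part of the original log; it is there only to show a string value that contains the word undefined.

```python
import json

# Hypothetical line: "note" is a made-up field for illustration only.
line = '{"note": "undefined behaviour", "Field_Doc": {"f1": undefined}}'

# Replacing the bare word also rewrites the text inside the string value.
naive = line.replace('undefined', 'null')

# Anchoring on ': ' only matches the bare token that directly follows a colon.
anchored = line.replace(': undefined', ': null')

data = json.loads(anchored)
print(json.loads(naive)["note"])
print(data)
```

This is still a plain text substitution, not a parser: a string value that literally contains ": undefined" would be rewritten as well, and the match assumes exactly one space after the colon, as in the sample log.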

Main loop logic

for line in file_obj:
    # the replace function makes it valid JSON
    data = json.loads(line.replace(': undefined', ': null'))
    print(data)
    process_dict(data, write_func)

write_func() function adjustment

def write_func(key, value):
    with open(key + '.column', "a") as f:
        # If the value is None (JSON null), write an empty line instead.
        if value is None:
            value = ''
        f.write(str(value) + "\n")

I used the following as the input string:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}