Processing a .log file of dictionaries and lists with pandas to make a DataFrame?

Can I convert this file to a pandas DataFrame? The file has a .log extension and contains many lines like this one (never mind the values):

{
    "asdasd":"1831a-12123",
    "id1":"23x.abc212.4566",
    "id2":"456a.2412.16348x5_def",
    "id3":"sdaw-p-2323",
    "abcd":"xyz",
    "asdsadas":"\"sdasdsad\": sadasd",
    "xasda":0.8,
    "id4":"409cc2e",
    "dictionary":{"sadasd":"xdasd","zxczxc":"asdsa","xczczxczx":"sdsdsadas.xyz"},
    "zxczxcz":["xczczxc"],
    "xczczxcz":"dqwdqwd",
    "dadsdsd":["sdsd"],
    "asdasdasdaxcz":true,
    "xczxczxczxc":"sdadsa.xcxcxc.ab",
    "bgfbgb":["dsvsdvsdv"],
    "cascasas":["asxsaasx"],
    "xsxasxas":[],
    "xasxasxas":"wewewe",
    "sdasdasd":"xzczxc",
    "id5":"VB 9",
    "id6":"5134132451",
    "id7":"8989898",
    "sdasdasdsadsa":[],
    "xcascxassaxa":1234,
    "sadasdadasdsad":4567
}

My attempt below is wrong:

import pandas as pd
data = open('/Users/sadsad.log')
df = pd.DataFrame([data])
df
1 rows × 259800 columns

You can open the file, convert it to a Python dictionary with json.loads, and then use pd.DataFrame to read it:

import json
import pandas as pd

with open('/Users/sadsad.log', 'r', encoding='utf-8') as f:
    data = json.loads(f.read())

df = pd.DataFrame([data])
print(df)

        asdasd              id1                    id2          id3 abcd  \
0  1831a-12123  23x.abc212.4566  456a.2412.16348x5_def  sdaw-p-2323  xyz

             asdsadas  xasda      id4  \
0  "sdasdsad": sadasd    0.8  409cc2e

                                                             dictionary  \
0  {'sadasd': 'xdasd', 'zxczxc': 'asdsa', 'xczczxczx': 'sdsdsadas.xyz'}

     zxczxcz xczczxcz dadsdsd  asdasdasdaxcz       xczxczxczxc       bgfbgb  \
0  [xczczxc]  dqwdqwd  [sdsd]           True  sdadsa.xcxcxc.ab  [dsvsdvsdv]

     cascasas xsxasxas xasxasxas sdasdasd   id5         id6      id7  \
0  [asxsaasx]       []    wewewe   xzczxc  VB 9  5134132451  8989898

  sdasdasdsadsa  xcascxassaxa  sadasdadasdsad
0            []          1234            4567
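
One caveat: json.loads(f.read()) expects the whole file to be a single JSON document. If the log holds several objects one after another, it raises a JSONDecodeError ("Extra data"). Below is a minimal sketch, using only the standard library, that pulls objects out one at a time with json.JSONDecoder.raw_decode. The helper name iter_json_objects and the sample text are made up for illustration; they are not from the question.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found back to back in `text`."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it first
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # Returns the parsed value and the index where it ended
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Hypothetical sample: two records, one of them spread over several lines
sample = '{"id1": "a", "xasda": 0.8}\n{\n  "id1": "b",\n  "xasda": 1.5\n}\n'
records = list(iter_json_objects(sample))
print(records)
```

The resulting list of dictionaries can then be passed to pd.DataFrame(records), one row per object. This works whether each record sits on one line or is pretty-printed across many.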

Your data contains a nested dictionary, so you can also try pd.json_normalize:

df = pd.json_normalize(data)
print(df)

        asdasd              id1                    id2          id3 abcd  \
0  1831a-12123  23x.abc212.4566  456a.2412.16348x5_def  sdaw-p-2323  xyz

             asdsadas  xasda      id4    zxczxcz xczczxcz dadsdsd  \
0  "sdasdsad": sadasd    0.8  409cc2e  [xczczxc]  dqwdqwd  [sdsd]

   asdasdasdaxcz       xczxczxczxc       bgfbgb    cascasas xsxasxas  \
0           True  sdadsa.xcxcxc.ab  [dsvsdvsdv]  [asxsaasx]       []

  xasxasxas sdasdasd   id5         id6      id7 sdasdasdsadsa  xcascxassaxa  \
0    wewewe   xzczxc  VB 9  5134132451  8989898            []          1234

   sadasdadasdsad dictionary.sadasd dictionary.zxczxc dictionary.xczczxczx
0            4567             xdasd             asdsa        sdsdsadas.xyz
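
pd.json_normalize also accepts a list of dictionaries, so it should work the same way once the file has been parsed into many records. A small sketch follows, with invented records that only mimic the shape of the log entries. The sep argument replaces the default "." in the flattened column names, which can be handy if you want to use attribute access later.

```python
import pandas as pd

# Hypothetical records shaped like the log entries in the question
records = [
    {"id1": "a", "dictionary": {"sadasd": "x1", "zxczxc": "y1"}, "zxczxcz": ["p"]},
    {"id1": "b", "dictionary": {"sadasd": "x2", "zxczxc": "y2"}, "zxczxcz": []},
]

# Nested dict keys become "parent_child" columns; lists are left untouched
df = pd.json_normalize(records, sep="_")
print(df.columns.tolist())
```

Note that only nested dictionaries are flattened this way. List-valued fields such as "zxczxcz" stay as Python lists inside a single column.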
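
If the file holds one JSON object per line, parse it line by line:

import json
import pandas as pd

logs = []
# The with block closes the file even if a line fails to parse
with open('/Users/sadsad.log', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # skip blank lines
            logs.append(json.loads(line))

df = pd.DataFrame(logs)

df.sample(n=50)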
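
As an alternative to the explicit loop, pandas can read line-delimited JSON directly with pd.read_json(..., lines=True). The sketch below uses an in-memory io.StringIO with two invented lines in place of the real .log file, so the sample data is an assumption, not the asker's file.

```python
import io
import pandas as pd

# Hypothetical two-line sample standing in for the .log file
sample = io.StringIO(
    '{"id1": "a", "xasda": 0.8, "dictionary": {"sadasd": "x"}}\n'
    '{"id1": "b", "xasda": 1.5, "dictionary": {"sadasd": "y"}}\n'
)

# lines=True tells pandas to treat every line as its own JSON object
df = pd.read_json(sample, lines=True)
print(df)
```

With a real file you would pass the path instead of the StringIO object. Be aware that read_json infers dtypes by default, so digit-only strings such as "id6" may come back as numbers; passing dtype=False should keep them as parsed.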

For the dictionary column, I looked at: Split / Explode a column of dictionaries into separate columns with pandas

df = pd.concat([df.drop(['dictionary'], axis=1), df['dictionary'].apply(pd.Series)], axis=1)

It worked, although it is a bit slow.
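
The slowness most likely comes from apply(pd.Series), which builds a separate Series object for every row. A possible faster variant, sketched here on a tiny invented frame, builds all the new columns in one DataFrame call. It assumes every cell in the column is a dictionary (no missing values).

```python
import pandas as pd

# Hypothetical frame with a column of dicts, like "dictionary" in the question
df = pd.DataFrame({
    "id1": ["a", "b"],
    "dictionary": [
        {"sadasd": "x1", "zxczxc": "y1"},
        {"sadasd": "x2", "zxczxc": "y2"},
    ],
})

# Build the new columns in one call instead of one pd.Series per row
expanded = pd.DataFrame(df["dictionary"].tolist(), index=df.index)
df = pd.concat([df.drop(columns=["dictionary"]), expanded], axis=1)
print(df)
```

Reusing df.index keeps the expanded columns aligned with the original rows, which matters if the frame was filtered or sampled beforehand.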