Processing a .log file containing dictionaries and lists into a pandas DataFrame?
Can I convert this file into a pandas DataFrame? The file has a .log extension and contains many entries like this one (never mind the values):
{
"asdasd":"1831a-12123",
"id1":"23x.abc212.4566",
"id2":"456a.2412.16348x5_def",
"id3":"sdaw-p-2323",
"abcd":"xyz",
"asdsadas":"\"sdasdsad\": sadasd",
"xasda":0.8,
"id4":"409cc2e",
"dictionary":{"sadasd":"xdasd","zxczxc":"asdsa","xczczxczx":"sdsdsadas.xyz"},
"zxczxcz":["xczczxc"],
"xczczxcz":"dqwdqwd",
"dadsdsd":["sdsd"],
"asdasdasdaxcz":true,
"xczxczxczxc":"sdadsa.xcxcxc.ab",
"bgfbgb":["dsvsdvsdv"],
"cascasas":["asxsaasx"],
"xsxasxas":[],
"xasxasxas":"wewewe",
"sdasdasd":"xzczxc",
"id5":"VB 9",
"id6":"5134132451",
"id7":"8989898",
"sdasdasdsadsa":[],
"xcascxassaxa":1234,
"sadasdadasdsad":4567
}
This doesn't work:
import pandas as pd
data = open('/Users/sadsad.log')
df = pd.DataFrame([data])
df
1 rows × 259800 columns
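That shape is consistent with how pandas handles the open file object: `pd.DataFrame([data])` receives a list with a single element (the file), treats it as one row, and iterates it, so every line of the log becomes its own column. A minimal sketch reproducing the effect, with `io.StringIO` standing in for the real file and made-up contents:

```python
import io

import pandas as pd

# Hypothetical stand-in for the .log file: three lines of text
fake_file = io.StringIO('{"a": 1}\n{"a": 2}\n{"a": 3}\n')

# The list holds one iterable (the file), so pandas builds one row
# and spreads the lines it yields across columns
df = pd.DataFrame([fake_file])
print(df.shape)  # (1, 3)
```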
You can open the file, turn it into a Python dictionary with json.loads, and then read it with pd.DataFrame:
import json
import pandas as pd

with open('/Users/sadsad.log', 'r', encoding='utf-8') as f:
    # Parses the whole file as a single JSON object
    data = json.load(f)

df = pd.DataFrame([data])
print(df)
asdasd id1 id2 id3 abcd \
0 1831a-12123 23x.abc212.4566 456a.2412.16348x5_def sdaw-p-2323 xyz
asdsadas xasda id4 \
0 "sdasdsad": sadasd 0.8 409cc2e
dictionary \
0 {'sadasd': 'xdasd', 'zxczxc': 'asdsa', 'xczczxczx': 'sdsdsadas.xyz'}
zxczxcz xczczxcz dadsdsd asdasdasdaxcz xczxczxczxc bgfbgb \
0 [xczczxc] dqwdqwd [sdsd] True sdadsa.xcxcxc.ab [dsvsdvsdv]
cascasas xsxasxas xasxasxas sdasdasd id5 id6 id7 \
0 [asxsaasx] [] wewewe xzczxc VB 9 5134132451 8989898
sdasdasdsadsa xcascxassaxa sadasdadasdsad
0 [] 1234 4567
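The snippet above treats the whole file as one JSON document. If the log holds many records back to back, `json.loads(f.read())` stops after the first object and raises `json.JSONDecodeError` ("Extra data"). When the records are pretty-printed across several lines, as in the sample, one option is to walk through them with `json.JSONDecoder.raw_decode`. This is a sketch under that assumption; `read_concatenated_json` is a hypothetical helper name and the sample values are made up.

```python
import json

import pandas as pd


def read_concatenated_json(text):
    """Parse a string holding several JSON objects one after another."""
    decoder = json.JSONDecoder()
    records = []
    idx = 0
    while idx < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        if text[idx].isspace():
            idx += 1
            continue
        # raw_decode returns the parsed object and the index where it ended
        obj, idx = decoder.raw_decode(text, idx)
        records.append(obj)
    return records


sample = """
{
"id1":"23x.abc212.4566",
"dictionary":{"sadasd":"xdasd"},
"xasda":0.8
}
{
"id1":"another-id",
"dictionary":{"sadasd":"other"},
"xasda":0.5
}
"""

records = read_concatenated_json(sample)
df = pd.DataFrame(records)
print(df.shape)  # (2, 3)
```

With the real file you would pass the output of `f.read()` to the helper. Note that this reads the whole log into memory first.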
Your data contains a nested dictionary, so you can also try pd.json_normalize, which flattens it into separate dotted columns:
df = pd.json_normalize(data)
print(df)
asdasd id1 id2 id3 abcd \
0 1831a-12123 23x.abc212.4566 456a.2412.16348x5_def sdaw-p-2323 xyz
asdsadas xasda id4 zxczxcz xczczxcz dadsdsd \
0 "sdasdsad": sadasd 0.8 409cc2e [xczczxc] dqwdqwd [sdsd]
asdasdasdaxcz xczxczxczxc bgfbgb cascasas xsxasxas \
0 True sdadsa.xcxcxc.ab [dsvsdvsdv] [asxsaasx] []
xasxasxas sdasdasd id5 id6 id7 sdasdasdsadsa xcascxassaxa \
0 wewewe xzczxc VB 9 5134132451 8989898 [] 1234
sadasdadasdsad dictionary.sadasd dictionary.zxczxc dictionary.xczczxczx
0 4567 xdasd asdsa sdsdsadas.xyz
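`pd.json_normalize` also accepts a list of dictionaries, which is what you end up with after parsing every entry in the log, and its `sep` parameter sets the string used to join nested keys (the default is `"."`). A small sketch with made-up records:

```python
import pandas as pd

# Hypothetical records shaped like the log entries
records = [
    {"id1": "a", "dictionary": {"sadasd": "xdasd", "zxczxc": "asdsa"}},
    {"id1": "b", "dictionary": {"sadasd": "other", "zxczxc": "more"}},
]

# One row per record; nested keys become "dictionary_<key>" columns
df = pd.json_normalize(records, sep="_")
print(df.columns.tolist())  # ['id1', 'dictionary_sadasd', 'dictionary_zxczxc']
```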
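With one JSON object per line, reading the file line by line works:
import json
import pandas as pd

logs = []
with open('/Users/sadsad.log', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():  # skip blank lines
            logs.append(json.loads(line))

df = pd.DataFrame(logs)
df.sample(n=50)  # needs at least 50 rows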
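If every record really sits on its own line (JSON Lines), pandas can run that loop itself via `pd.read_json(..., lines=True)`. By default `read_json` infers dtypes and may convert numeric-looking strings such as `id6` or `id7` into numbers, so this sketch passes `dtype=False` to keep them as strings. `io.StringIO` and the two records are stand-ins; with the real file you would pass `'/Users/sadsad.log'` instead. For very large files, `read_json` also accepts `chunksize` together with `lines=True` and then returns an iterator of DataFrames.

```python
import io

import pandas as pd

# Hypothetical JSON Lines content standing in for the .log file
log_text = (
    '{"id1":"23x.abc212.4566","id6":"5134132451","xasda":0.8}\n'
    '{"id1":"another-id","id6":"0000000001","xasda":0.5}\n'
)

# lines=True parses one JSON object per line;
# dtype=False leaves string values such as id6 untouched
df = pd.read_json(io.StringIO(log_text), lines=True, dtype=False)
print(df.shape)  # (2, 3)
```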
For the dictionary column, I was looking at:
Split / Explode a column of dictionaries into separate columns with pandas
df = pd.concat([df.drop(['dictionary'], axis=1), df['dictionary'].apply(pd.Series)], axis=1)
This worked, although it is a bit slow.
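`apply(pd.Series)` builds a separate Series for every row, which is usually what makes it slow on many rows. A common alternative is to pass the whole column as a list of dicts to a single `pd.DataFrame` call and join the result back. This sketch assumes every cell in the column is a dict (a missing value such as NaN would need handling first); `expand_dict_column` is a hypothetical helper name and the data are made up.

```python
import pandas as pd


def expand_dict_column(df, col):
    """Spread a column of dicts into separate columns (every cell must be a dict)."""
    # One constructor call for all rows instead of one pd.Series per row
    expanded = pd.DataFrame(df[col].tolist(), index=df.index)
    # join aligns on the index; it raises if new column names clash with existing ones
    return df.drop(columns=[col]).join(expanded)


df = pd.DataFrame({
    "id1": ["a", "b"],
    "dictionary": [
        {"sadasd": "xdasd", "zxczxc": "asdsa"},
        {"sadasd": "other", "zxczxc": "more"},
    ],
})

out = expand_dict_column(df, "dictionary")
print(out.columns.tolist())  # ['id1', 'sadasd', 'zxczxc']
```

If a key inside the dictionary could collide with a top-level column name, calling `.add_prefix("dictionary.")` on `expanded` before the join is one way to keep the names apart.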