Add a 'filepath' column to a pandas DataFrame
I have a list of about 100 JSON files that are being read, filtered, and appended to a pandas DataFrame:
import pandas as pd
import glob

dfOutput = pd.DataFrame()

for filepath in glob.iglob('/Users/vinceparis/dev/dfyb/dataset/cucumber_test/out/*.json'):
    dfRead = pd.read_json(filepath, orient='columns')
    dfFiltered = dfRead.filter(items=['label', 'confidence'])
    dfOutput = dfOutput.append(dfFiltered)

print(dfOutput)

dfOutput = dfOutput.to_csv('/Users/vinceparis/dev/dfyb/growlog2.csv')
The output is a nice single DataFrame:
         label  confidence
0     seedling        0.33
0     cucumber        0.35
1   cotyledons        0.38
0     seedling        0.36
1   cotyledons        0.31
2      flowers        0.38
3      flowers        0.34
0     cucumber        0.48
..         ...         ...
0   cotyledons        0.41
1   cotyledons        0.42
0     cucumber        0.36
0   cotyledons        0.43
1   cotyledons        0.34
0      flowers        0.36
1      flowers        0.40
How can I add a 'filename' column that contains the path of the original JSON file each row came from?
Rather than calling append inside the loop, use concat (DataFrame.append was deprecated in pandas 1.4 and removed in 2.0). You can build the combined DataFrame from an iterable of per-file DataFrames, using assign to attach the file path to each one, and follow the advice in the docs:
fps = glob.iglob('/Users/vinceparis/dev/dfyb/dataset/cucumber_test/out/*.json')
cols = ['label', 'confidence']

# One filtered DataFrame per file, each tagged with its source path in a 'file' column
dfs = (pd.read_json(fp, orient='columns').filter(items=cols).assign(file=fp) for fp in fps)
dfOutput = pd.concat(dfs, ignore_index=True)
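As a follow-up, if you want the new column to hold just the file name rather than the full path, and to keep writing the result to CSV as in the question, a minimal variation would look like the sketch below (the os.path.basename call and the 'filename' column name are my own choices, not part of the original answer):

import glob
import os

import pandas as pd

cols = ['label', 'confidence']
fps = glob.iglob('/Users/vinceparis/dev/dfyb/dataset/cucumber_test/out/*.json')

# Build one filtered DataFrame per file, tag it with just the file name
# (assumption: the basename is enough to identify the source file),
# then concatenate everything into a single frame with a fresh index.
dfs = (
    pd.read_json(fp, orient='columns')
      .filter(items=cols)
      .assign(filename=os.path.basename(fp))
    for fp in fps
)
dfOutput = pd.concat(dfs, ignore_index=True)
dfOutput.to_csv('/Users/vinceparis/dev/dfyb/growlog2.csv', index=False)

Here index=False keeps the row index out of the CSV; with ignore_index=True the repeated per-file indices (the 0, 1, 2 values in the output above) have already been replaced by a single running index.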