Formatting List to Columns for CSV

I'm trying to use a computed list as two additional columns in an existing CSV, but I'm struggling to shape the results into two columns.

MWE:

import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification, pipeline

df = pd.read_csv('original.csv')
dtype_before = type(df["text"])
text_list = df["text"].tolist()
tokenizer = BertTokenizer.from_pretrained("daigo/bert-base-japanese-sentiment")
model = BertForSequenceClassification.from_pretrained("daigo/bert-base-japanese-sentiment")
sentiment_analyzer = pipeline("sentiment-analysis",model=model, tokenizer=tokenizer)
print(list(map(sentiment_analyzer, text_list)))

Printing the list gives:

[[{'label': 'ポジティブ', 'score': 0.7804045081138611}], [{'label': 'ポジティブ', 'score': 0.9542087912559509}], [{'label': 'ポジティブ', 'score': 0.8557115793228149}], [{'label': 'ポジティブ', 'score': 0.9135494232177734}], [{'label': 'ポジティブ', 'score': 0.86244797706604}], [{'label': 'ネガティブ', 'score': 0.8266600370407104}], [{'label': 'ポジティブ', 'score': 0.9198371767997742}], [{'label': 'ポジティブ', 'score': 0.9033421874046326}], [{'label': 'ポジティブ', 'score': 0.7705154418945312}], [{'label': 'ポジティブ', 'score': 0.8205435872077942}], [{'label': 'ポジティブ', 'score': 0.8045720458030701}], [{'label': 'ネガティブ', 'score': 0.5160148739814758}], [{'label': 'ポジティブ', 'score': 0.8745550513267517}], [{'label': 'ポジティブ', 'score': 0.941367506980896}], [{'label': 'ポジティブ', 'score': 0.899341344833374}], [{'label': 'ポジティブ', 'score': 0.9200822710990906}], [{'label': 'ポジティブ', 'score': 0.6254457235336304}], [{'label': 'ポジティブ', 'score': 0.8494048714637756}], [{'label': 'ポジティブ', 'score': 0.6723847389221191}], [{'label': 'ポジティブ', 'score': 0.9329613447189331}], [{'label': 'ポジティブ', 'score': 0.9084392786026001}], [{'label': 'ポジティブ', 'score': 0.7804917693138123}], [{'label': 'ポジティブ', 'score': 0.6737139225006104}], [{'label': 'ネガティブ', 'score': 0.5254362225532532}], [{'label': 'ネガティブ', 'score': 0.7653219103813171}], [{'label': 'ネガティブ', 'score': 0.7342881560325623}], [{'label': 'ポジティブ', 'score': 0.8476402163505554}]]

What I want is 'label' as one column header and 'score' as the second column header, so that the last two columns look something like this:

label         score
ポジティブ      0.7804045081138611
ポジティブ      0.9542087912559509
ポジティブ      0.8557115793228149
...
ネガティブ      0.5160148739814758

I assume that once I have that, I can add these columns to the CSV with pandas, right? So adding:

import csv
import re
import pandas as pd
from transformers import BertTokenizer, BertForSequenceClassification, pipeline
df = pd.read_csv('original.csv')
dtype_before = type(df["text"])
text_list = df["text"].tolist()
tokenizer = BertTokenizer.from_pretrained("daigo/bert-base-japanese-sentiment")
model = BertForSequenceClassification.from_pretrained("daigo/bert-base-japanese-sentiment")
sentiment_analyzer = pipeline("sentiment-analysis",model=model, tokenizer=tokenizer)
list(map(sentiment_analyzer, text_list))
<some magic to prepare results to a proper list>

df['label','score'] = <some magic to prepare results to a proper list>
df.to_csv("filepath.csv", index=False) 

Can you try this:

label_score_flat = [dc for ls in map(sentiment_analyzer, text_list) for dc in ls]

df['label'] = [dc['label'] for dc in label_score_flat ]
df['score'] = [dc['score'] for dc in label_score_flat ]

 
df.to_csv("filepath.csv", index=False) 

I haven't tested this, so it may contain errors.
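As a variation on the same idea, the flattened dicts can be passed to pandas as records, so both columns are built in one step and then joined onto the original frame. This is only a sketch under assumptions: the `results` list is a hard-coded stand-in for the pipeline output shown in the question, and the small `df` stands in for `pd.read_csv('original.csv')`. The names `records` and `scores` are made up for illustration.

```python
from itertools import chain

import pandas as pd

# Stand-in for list(map(sentiment_analyzer, text_list)): each input text
# yields a one-element list holding a {'label', 'score'} dict.
results = [
    [{'label': 'ポジティブ', 'score': 0.7804045081138611}],
    [{'label': 'ポジティブ', 'score': 0.9542087912559509}],
    [{'label': 'ネガティブ', 'score': 0.5160148739814758}],
]

# Hypothetical stand-in for pd.read_csv('original.csv').
df = pd.DataFrame({'text': ['text 1', 'text 2', 'text 3']})

# Flatten the nested lists into a flat list of dicts (records).
records = list(chain.from_iterable(results))

# One dict per row; reuse df's index so the rows line up on join.
scores = pd.DataFrame(records, index=df.index)
df = df.join(scores[['label', 'score']])

print(df)
```

As far as I can tell, `df['label','score'] = ...` from the question would not create two columns, because pandas treats `'label','score'` as a single tuple key. Joining a two-column frame avoids that.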

Please try this and see if it helps:

import pandas as pd

x = [[{'label': 'ポジティブ', 'score': 0.7804045081138611}], [{'label': 'ポジティブ', 'score': 0.9542087912559509}], [{'label': 'ポジティブ', 'score': 0.8557115793228149}],
     [{'label': 'ポジティブ', 'score': 0.9135494232177734}], [{'label': 'ポジティブ', 'score': 0.86244797706604}], [{'label': 'ネガティブ', 'score': 0.8266600370407104}],
     [{'label': 'ポジティブ', 'score': 0.9198371767997742}], [{'label': 'ポジティブ', 'score': 0.9033421874046326}], [{'label': 'ポジティブ', 'score': 0.7705154418945312}],
     [{'label': 'ポジティブ', 'score': 0.8205435872077942}], [{'label': 'ポジティブ', 'score': 0.8045720458030701}], [{'label': 'ネガティブ', 'score': 0.5160148739814758}],
     [{'label': 'ポジティブ', 'score': 0.8745550513267517}], [{'label': 'ポジティブ', 'score': 0.941367506980896}], [{'label': 'ポジティブ', 'score': 0.899341344833374}],
     [{'label': 'ポジティブ', 'score': 0.9200822710990906}], [{'label': 'ポジティブ', 'score': 0.6254457235336304}], [{'label': 'ポジティブ', 'score': 0.8494048714637756}],
     [{'label': 'ポジティブ', 'score': 0.6723847389221191}], [{'label': 'ポジティブ', 'score': 0.9329613447189331}], [{'label': 'ポジティブ', 'score': 0.9084392786026001}],
     [{'label': 'ポジティブ', 'score': 0.7804917693138123}], [{'label': 'ポジティブ', 'score': 0.6737139225006104}], [{'label': 'ネガティブ', 'score': 0.5254362225532532}],
     [{'label': 'ネガティブ', 'score': 0.7653219103813171}], [{'label': 'ネガティブ', 'score': 0.7342881560325623}], [{'label': 'ポジティブ', 'score': 0.8476402163505554}]]

# Each element of x is a one-item list, so take its first dict and read the key directly.
_label_vals = [item[0]['label'] for item in x]
_score_vals = [item[0]['score'] for item in x]

df1 = pd.DataFrame(list(zip(_label_vals, _score_vals)))
df1.columns = ['label', 'score']
print(df1)
df1.to_csv('stackoverflow.csv', index=False)
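Note that `df1` only holds the two new columns, while the question asks for them to be appended to the existing CSV. A minimal sketch of that last step might look like the following, assuming the original frame and `df1` have the same number of rows in the same order. The tiny `df` and `df1` here are hypothetical stand-ins, and the result is written to an in-memory buffer rather than a real file so it can be read back and checked.

```python
import io

import pandas as pd

# Hypothetical stand-ins: `df` for the frame read from original.csv,
# `df1` for a label/score frame like the one built in this answer.
df = pd.DataFrame({'text': ['text 1', 'text 2']})
df1 = pd.DataFrame({'label': ['ポジティブ', 'ネガティブ'],
                    'score': [0.7804045081138611, 0.8266600370407104]})

# Column-wise concat aligns on the index, so reset df's index to 0..n-1
# to match df1's default index.
combined = pd.concat([df.reset_index(drop=True), df1], axis=1)

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)

print(round_trip)
```

For a real run, replacing the buffer with a path such as the "filepath.csv" from the question should be enough.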