IBM Tone Analyzer output in pandas DataFrame has repeated values
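I am running sentiment analysis on newsapi data, followed by tone analysis. I can display the output of both the sentiment analysis and the tone analyzer in a pandas DataFrame. The problem is that the IBM Tone Analyzer output has repeated values, whereas I expect each row to have its own values. Here is the code and the output: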
from ibm_watson import ToneAnalyzerV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# apikey and url hold the service credentials (defined elsewhere)
authenticator = IAMAuthenticator(apikey)
ta = ToneAnalyzerV3(version='2017-09-21', authenticator=authenticator)
ta.set_service_url(url)

result = []
for i in new_df['description']:
    tone_analysis = ta.tone(
        {'text': i},
        # 'application/json'
    ).get_result()
    result.append(tone_analysis)
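If I run print(result), the output looks like this: [{'document_tone': {'tones': [{'score': 0.677676, 'tone_id': 'analytical', 'tone_name': 'Analytical'}]}}, followed by many more entries of the same form.
If I just enter result on its own, I get similar output but in a different format. Something seems to differ between result and print(result).
Next I tried to put the values into a pandas DataFrame with the following code: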
def f(x):
    x = ta.tone({'text': i}).get_result()['document_tone']['tones']
    return pd.Series(x[0])

new_df = new_df.join(new_df['description'].apply(f))
The last three columns, 'score', 'tone_id' and 'tone_name', hold the same values in every row, which is the problem. The repeated values are those of the last entry shown by print(result).
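The values repeat because f(x) passes i to ta.tone rather than its own argument, and i is still bound to the last description from the earlier for loop. Each row can also contain a list of several dictionaries, so change the function to flatten them with a comprehension over enumerate, which produces new column names with a numeric suffix: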
# change f(x) to f(i) so the function analyzes its own argument
def f(i):
    x = ta.tone({'text': i}).get_result()['document_tone']['tones']
    # one key per tone and field: score_0, tone_id_0, tone_name_0, score_1, ...
    return pd.Series({f'{k}_{i}': v for i, y in enumerate(x)
                      for k, v in y.items()}, dtype=object)

new_df = new_df['description'].apply(f)
print(new_df)
     score_0   tone_id_0  tone_name_0   score_1   tone_id_1  tone_name_1
 0  0.677676  analytical   Analytical       NaN         NaN          NaN
 1  0.620279  analytical   Analytical       NaN         NaN          NaN
 2  0.683108     sadness      Sadness       NaN         NaN          NaN
 3  0.920855  analytical   Analytical       NaN         NaN          NaN
 4  0.825035   confident    Confident       NaN         NaN          NaN
 5  0.632229         joy          Joy  0.527569   tentative    Tentative
 6       NaN         NaN          NaN       NaN         NaN          NaN
 7  0.574650     sadness      Sadness       NaN         NaN          NaN
 8       NaN         NaN          NaN       NaN         NaN          NaN
 9  0.751512   confident    Confident       NaN         NaN          NaN
10  0.618451   confident    Confident       NaN         NaN          NaN
11  0.672469  analytical   Analytical  0.912588   confident    Confident
12  0.764412   tentative    Tentative  0.840583  analytical   Analytical
13  0.660207   confident    Confident       NaN         NaN          NaN
14  0.840583  analytical   Analytical  0.764412   tentative    Tentative
15  0.786991   tentative    Tentative       NaN         NaN          NaN
16  0.753348     sadness      Sadness       NaN         NaN          NaN
17  0.672469  analytical   Analytical  0.912588   confident    Confident
18  0.590326     sadness      Sadness  0.877080   tentative    Tentative
19  0.560098  analytical   Analytical       NaN         NaN          NaN
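The shape of this result can be tried out without calling the service. The sketch below is only an illustration: FAKE_TONES and flatten_tones are hypothetical names, and the canned lists stand in for what ta.tone(...).get_result()['document_tone']['tones'] would return. It shows how one, two, or zero tones per row end up as suffixed columns padded with NaN.

```python
import pandas as pd

# Hypothetical canned responses standing in for
# ta.tone({'text': ...}).get_result()['document_tone']['tones'].
FAKE_TONES = {
    "first": [
        {"score": 0.677676, "tone_id": "analytical", "tone_name": "Analytical"},
    ],
    "second": [
        {"score": 0.632229, "tone_id": "joy", "tone_name": "Joy"},
        {"score": 0.527569, "tone_id": "tentative", "tone_name": "Tentative"},
    ],
    "third": [],  # no tones detected for this text
}


def flatten_tones(text):
    """Flatten a list of tone dicts into one Series with suffixed keys."""
    tones = FAKE_TONES[text]  # assumption: replaces the real API call
    return pd.Series(
        {
            f"{key}_{n}": value
            for n, tone in enumerate(tones)
            for key, value in tone.items()
        },
        dtype=object,
    )


df = pd.DataFrame({"description": ["first", "second", "third"]})

# apply() with a function that returns a Series yields a DataFrame
tones_df = df["description"].apply(flatten_tones)

# keep the original columns next to the flattened tone columns
joined = df.join(tones_df)
print(joined)
```

Rows with fewer tones than the widest row are filled with NaN, which matches the all-NaN rows 6 and 8 in the output above.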
To add the new columns to the original DataFrame instead, use join:
new_df = new_df.join(new_df['description'].apply(f))
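For anyone wondering why the first version of f(x) returned the same tone for every row: Python keeps the loop variable bound after a for loop ends, so a function that reads i instead of its own parameter always sees the last item. A minimal sketch of that effect, where analyze is a hypothetical stand-in for the ta.tone call and the data is made up:

```python
import pandas as pd


def analyze(text):
    # Hypothetical stand-in for the remote call; it only upper-cases the text.
    return text.upper()


df = pd.DataFrame({"description": ["a", "b", "c"]})

for i in df["description"]:
    pass  # once the loop ends, i stays bound to the last value, "c"


def buggy(x):
    return analyze(i)  # ignores x and reads the leftover loop variable


def fixed(x):
    return analyze(x)  # uses the value that apply() passes in


buggy_out = df["description"].apply(buggy).tolist()
fixed_out = df["description"].apply(fixed).tolist()
print(buggy_out)  # ['C', 'C', 'C']
print(fixed_out)  # ['A', 'B', 'C']
```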