IBM tone analyzer output in pandas frame has repeated values

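I am running sentiment analysis on newsapi data, followed by tone analysis. I can display the output of both the sentiment analysis and the tone analyzer in a pandas DataFrame. The problem is that the IBM Tone Analyzer output has repeated values. I expect the values in each row to be different. Here is the code and the output: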

from ibm_watson import ToneAnalyzerV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator(apikey)
ta = ToneAnalyzerV3(version='2017-09-21', authenticator=authenticator)
ta.set_service_url(url)

result = []
for i in new_df['description']:
    # Analyze each description and keep the parsed JSON response
    tone_analysis = ta.tone({'text': i}).get_result()
    result.append(tone_analysis)

If I run print(result), I get output like this: [{'document_tone': {'tones': [{'score': 0.677676, 'tone_id': 'analytical', 'tone_name': 'Analytical'}]}}. There are many more values like it.

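If I just enter result, I get similar output in a different format, as shown below:

[Screenshots: output of result and of print(result)]

Something seems to be wrong.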

Next, I tried to put the values into a pandas DataFrame with the following code:

def f(x):
    x = ta.tone({'text': i}).get_result()['document_tone']['tones']
    return pd.Series(x[0])

new_df = new_df.join(new_df['description'].apply(f))

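The last three columns, 'score', 'tone_id', and 'tone_name', contain the same values in every row, and that is the problem. Moreover, the repeated value is the last value shown by print(result). A screenshot of the output is shown below:

[Screenshot of the output DataFrame]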

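The values repeat because f(x) never uses its argument x: it reads the global i, which still holds the last description from the earlier for loop, so every row gets the tones of that last text. Rename the parameter so the function analyzes the text it is given. Also, each row can contain several tone dictionaries in the list, so to get new column names with numeric suffixes, flatten the list with enumerate in a dict comprehension: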

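# change f(x) to f(i) so the function uses the value passed in by apply
def f(i):
    x = ta.tone({'text': i}).get_result()['document_tone']['tones']
    # j is the position of each tone in the list; it becomes the column suffix
    return pd.Series({f'{k}_{j}': v for j, tone in enumerate(x)
                      for k, v in tone.items()}, dtype=object)

# keep new_df intact so it can still be joined with the result later
tones_df = new_df['description'].apply(f)
print(tones_df)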
     score_0   tone_id_0 tone_name_0   score_1   tone_id_1 tone_name_1
0   0.677676  analytical  Analytical       NaN         NaN         NaN
1   0.620279  analytical  Analytical       NaN         NaN         NaN
2   0.683108     sadness     Sadness       NaN         NaN         NaN
3   0.920855  analytical  Analytical       NaN         NaN         NaN
4   0.825035   confident   Confident       NaN         NaN         NaN
5   0.632229         joy         Joy  0.527569   tentative   Tentative
6        NaN         NaN         NaN       NaN         NaN         NaN
7   0.574650     sadness     Sadness       NaN         NaN         NaN
8        NaN         NaN         NaN       NaN         NaN         NaN
9   0.751512   confident   Confident       NaN         NaN         NaN
10  0.618451   confident   Confident       NaN         NaN         NaN
11  0.672469  analytical  Analytical  0.912588   confident   Confident
12  0.764412   tentative   Tentative  0.840583  analytical  Analytical
13  0.660207   confident   Confident       NaN         NaN         NaN
14  0.840583  analytical  Analytical  0.764412   tentative   Tentative
15  0.786991   tentative   Tentative       NaN         NaN         NaN
16  0.753348     sadness     Sadness       NaN         NaN         NaN
17  0.672469  analytical  Analytical  0.912588   confident   Confident
18  0.590326     sadness     Sadness  0.877080   tentative   Tentative
19  0.560098  analytical  Analytical       NaN         NaN         NaN
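
If the NaN-padded wide layout is awkward to work with, one alternative (not part of the answer above) is a long format with one row per detected tone. The following is only a minimal sketch, assuming responses shaped like the `result` list in the question. The sample data copies three values from the output above, and the names `rows`, `long_df`, and `doc` are made up for illustration.

```python
import pandas as pd

# Sample responses shaped like the entries of `result` in the question;
# the values are copied from rows 0, 5 and 6 of the output above.
result = [
    {'document_tone': {'tones': [
        {'score': 0.677676, 'tone_id': 'analytical', 'tone_name': 'Analytical'}]}},
    {'document_tone': {'tones': [
        {'score': 0.632229, 'tone_id': 'joy', 'tone_name': 'Joy'},
        {'score': 0.527569, 'tone_id': 'tentative', 'tone_name': 'Tentative'}]}},
    {'document_tone': {'tones': []}},  # no tone detected for this text
]

# One output row per (document, tone) pair; `doc` is the position of the
# response in `result`, so it can be matched back to the source row.
rows = [
    {'doc': doc_idx, **tone}
    for doc_idx, response in enumerate(result)
    for tone in response['document_tone']['tones']
]

long_df = pd.DataFrame(rows, columns=['doc', 'score', 'tone_id', 'tone_name'])
print(long_df)
```

Texts with no detected tone simply contribute no rows here, instead of a row full of NaN.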

To add the tone columns to the original DataFrame:

new_df = new_df.join(new_df['description'].apply(f))
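
To check the behavior without calling the service, here is a rough sketch that swaps the Watson call for a hypothetical stub, `fake_tone`, which is not part of `ibm_watson`. It assumes only that the response has the `document_tone` / `tones` shape shown in the question. The names `f_buggy`, `f_fixed`, and the sample descriptions are invented for the demonstration.

```python
import pandas as pd

def fake_tone(text):
    # Hypothetical stand-in for ta.tone({'text': text}).get_result():
    # returns a single tone whose score depends on the text length.
    return {'document_tone': {'tones': [
        {'score': len(text) / 10, 'tone_id': 'stub', 'tone_name': 'Stub'}]}}

df = pd.DataFrame({'description': ['aa', 'bbbb', 'cccccc']})

# Like the collection loop in the question, this leaves `i` bound to the
# last description once the loop has finished.
for i in df['description']:
    pass

def f_buggy(x):
    # Ignores x and reads the leftover i, so every row gets the same tones.
    tones = fake_tone(i)['document_tone']['tones']
    return pd.Series(tones[0])

def f_fixed(text):
    # Uses the value passed in by apply and suffixes columns by tone position.
    tones = fake_tone(text)['document_tone']['tones']
    return pd.Series({f'{k}_{j}': v for j, tone in enumerate(tones)
                      for k, v in tone.items()}, dtype=object)

buggy = df['description'].apply(f_buggy)
fixed = df['description'].apply(f_fixed)

print(buggy['score'].nunique())    # number of distinct scores with the bug
print(fixed['score_0'].nunique())  # number of distinct scores after the fix

# Joining keeps the original column next to the new tone columns.
out = df.join(fixed)
print(out)
```

With the stub, the buggy version yields a single distinct score for all rows, while the fixed version yields one per description, which mirrors the repeated values described in the question.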