I am trying to parse a website and generate positive, neutral, or negative sentiment analysis
I am trying to get a very basic sentiment analysis from the CNBC website. I put this code together and it works fine.
from bs4 import BeautifulSoup
import urllib.request
from pandas import DataFrame

resp = urllib.request.urlopen("https://www.cnbc.com/finance/")
# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(resp, 'html.parser', from_encoding=resp.info().get_param('charset'))
substring = 'https://www.cnbc.com/'

# NOTE: this seed string becomes row 0 of the data frame;
# start from an empty list if that row is not wanted
df = ['review']
for link in soup.find_all('a', href=True):
    print(link['href'])
    # keep only absolute links that point back to cnbc.com
    if link['href'].startswith(substring):
        df.append(link['href'])

# convert list to data frame
df = DataFrame(df)
# add column name
df.columns = ['review']

# clean up: strip digits (regex=True is required on pandas 2.0 and later)
df['review'] = df['review'].str.replace(r'\d+', '', regex=True)
# Get rid of special characters
df['review'] = df['review'].str.replace(r'[^\w\s]+', '', regex=True)

# VADER needs its lexicon once: nltk.download('vader_lexicon')
from nltk.sentiment.vader import SentimentIntensityAnalyzer
sid = SentimentIntensityAnalyzer()
df['sentiment'] = df['review'].apply(sid.polarity_scores)

def convert(x):
    # map the compound score to a label
    if x < 0:
        return "negative"
    elif x > .2:
        return "positive"
    else:
        return "neutral"

df['result'] = df['sentiment'].apply(lambda x: convert(x['compound']))
df['result']
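When I run the code above, I get positives and negatives, but they are not mapped back to the original 'review'. How can I show each sentiment in the data frame next to the text of each link? Thanks!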
Oh man, I totally lost it! This is just a simple merge!!
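import pandas as pd  # the question only imported DataFrame, so pd must be imported here

# join the two columns on their shared row index
df_final = pd.merge(df['review'], df['result'], left_index=True, right_index=True)
df_final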
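Since 'review' and 'result' already live in the same data frame, selecting both columns with a list of names should give the same table without a merge. A minimal sketch, assuming a small hand-made data frame as a stand-in for the scraped one (the rows and the name `df_selected` are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the data frame built in the question
df = pd.DataFrame({
    "review": ["https://www.cnbc.com/business/", "https://www.cnbc.com/finance/"],
    "sentiment": [{"compound": 0.0}, {"compound": 0.0}],
    "result": ["neutral", "neutral"],
})

# Selecting with a list of column names keeps each row's values together
df_selected = df[["review", "result"]]

# The index-based merge from the answer, for comparison
df_merged = pd.merge(df["review"], df["result"], left_index=True, right_index=True)

# Compare the two tables cell by cell
print(df_selected.values.tolist() == df_merged.values.tolist())
```

The merge is still handy when the two columns come from different data frames that only share an index.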
Result:
    review                                             result
0   review                                             neutral
1   https://www.cnbc.com/business/                     neutral
2   https://www.cnbc.com/2020/09/15/stocks-making-...  neutral
3   https://www.cnbc.com/2020/09/15/stocks-making-...  neutral
4   https://www.cnbc.com/maggie-fitzgerald/            neutral
..  ...                                                ...
90  https://www.cnbc.com/finance/                      neutral
91  https://www.cnbc.com/2020/09/10/citi-ceo-micha...  neutral
92  https://www.cnbc.com/central-banks/                neutral
93  https://www.cnbc.com/2020/09/10/watch-ecb-pres...  neutral
94  https://www.cnbc.com/finance/?page=2               neutral
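Every row in the result comes out neutral. One likely reason is that the text being scored is the raw URL, which a lexicon-based scorer sees as a single unknown token rather than as words. A rough sketch of one way around that, assuming the last path segment of each link is a hyphen-separated headline slug; the helper `slug_to_text` and the example URL are made up for illustration and are not part of the original post:

```python
import re
from urllib.parse import urlparse

def slug_to_text(url):
    """Return the last path segment of a URL as space-separated words."""
    path = urlparse(url).path.rstrip("/")
    slug = path.rsplit("/", 1)[-1]
    slug = re.sub(r"\.html?$", "", slug)        # drop a trailing .html or .htm
    return re.sub(r"[-_]+", " ", slug).strip()  # hyphens/underscores become spaces

# Hypothetical URL in the same shape as the links above
example = "https://www.cnbc.com/2020/09/15/example-headline-words.html"
print(slug_to_text(example))
```

Applied to the data frame before the clean-up step, something like df['text'] = df['review'].apply(slug_to_text) would give the analyzer word-like input to score, while 'review' keeps the original link. Whether the slugs carry enough sentiment to move the scores is not something this sketch checks.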