How to decode ASCII from a stream for analysis
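I am trying to run text from the Twitter API through the sentiment analysis in the textblob library. When I run my code, it prints one or two sentiment values and then fails with the following error: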
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 31: ordinal not in range(128)
I don't understand why the code has to deal with this when it is only analyzing text. I tried encoding the script as UTF-8. Here is the code:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import json
import sys
import csv
from textblob import TextBlob
# Variables that contain the user credentials to access the Twitter API
access_token = ""
access_token_secret = ""
consumer_key = ""
consumer_secret = ""
# This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        json_load = json.loads(data)
        texts = json_load['text']
        coded = texts.encode('utf-8')
        s = str(coded)
        content = s.decode('utf-8')
        #print(s[2:-1])
        wiki = TextBlob(s[2:-1])
        r = wiki.sentiment.polarity
        print r
        return True
    def on_error(self, status):
        print(status)
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, StdOutListener())
# Filter the Twitter stream to capture tweets matching the keywords 'dollar' and 'euro'
stream.filter(track=['dollar', 'euro'], languages=['en'])
Can anyone help me solve this problem?
Thanks in advance.
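You are mixing too many things together. As the error says, you are trying to decode a byte type. json.loads returns the data as a string, which you then need to encode: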
texts = json_load['text'] # string
coded = texts.encode('utf-8') # byte
print(coded[2:-1])
So in your script, when you tried to decode coded, you got an error about decoding byte data.
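To make the string/bytes distinction concrete, here is a minimal sketch that needs only the standard library. The helper name extract_text and the sample payload are hypothetical, made up for illustration. The sketch assumes the traceback comes from an implicit ASCII decode of UTF-8 bytes, which is what Python 2 does when a byte string is used where text is expected.

```python
import json


def extract_text(raw):
    """Return the 'text' field of a raw JSON payload as a text (unicode) string."""
    # Decode once, at the boundary, if the stream handed over bytes.
    if isinstance(raw, bytes):
        raw = raw.decode('utf-8')
    return json.loads(raw)['text']


# Hypothetical payload containing one non-ASCII character (the pound sign).
sample = '{"text": "The euro rose against the \\u00a3 today"}'

text = extract_text(sample)      # text string, can be analysed as-is
encoded = text.encode('utf-8')   # bytes; the pound sign becomes 0xc2 0xa3

try:
    # Roughly what an implicit ASCII decode of those bytes does.
    encoded.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc            # 'ascii' codec can't decode byte 0xc2 ...
```

Under that assumption, the listener would probably not need the encode/str/decode round trip at all: passing json_load['text'] straight to TextBlob, instead of s[2:-1], keeps the value as text from start to finish.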