Using Tweepy to extract only the 'text' and 'language' fields, if present (homework)

Question

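I'm doing homework for a data science class, and I don't understand the error I get after editing the on_data method: "TypeError: string indices must be integers".

Now that everything works, let's filter the data. This part is actually quite simple. Change the on_data method of ListenerParser to extract only the 'text' and 'language' fields, if present. We also want to be able to retrieve a fixed number of results, so I have set up a max_results parameter in the constructor. Use it in your edit of on_data so that the object retrieves at most max_results results.

This is a listener that extracts the data we are interested in and prints it to standard output.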

class ListenerParser(StreamListener):

    def __init__(self, max_results): 
        super(ListenerParser, self).__init__()
        self.texts = []
        self.langs = []
        if max_results:
            self.max_results = max_results
        else: 
            self.max_results = float("inf")

    ####This is the code I am responsible for as part of my homework###
    def on_data(self, data):
        if data['text'] and len(self.texts) < max_results:
            self.texts.append(data['text'])
        if data['lang']and len(self.langs) < max_results:
            self.langs.append(data['lang'])

    def on_error(self, status):
        print status
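
The TypeError above comes from indexing a string with a key. As far as I know, the StreamListener in Tweepy before 4.0 hands on_data the raw JSON text of each message, not a dict, so data['text'] fails until the string is decoded. A minimal sketch of that, using a made-up sample message (the `raw` string is an assumption, not real stream output):

```python
import json

# Hypothetical raw message, shaped like the JSON string passed to on_data.
raw = '{"text": "hello world", "lang": "en"}'

# Indexing the raw string with a key reproduces the error from the question.
try:
    raw['text']
except TypeError as exc:
    error_message = str(exc)

# Decoding first yields a dict, which can be indexed by key.
tweet = json.loads(raw)
text = tweet.get('text')  # .get returns None when the field is absent
lang = tweet.get('lang')

print(error_message)
print(text, lang)
```

Using .get (or an `in` check) also covers the "if present" part of the assignment, since not every message on the stream carries both fields.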

Now let's get some data! Start with 10 results for testing. Once testing is done, increase to 10,000.

l = ListenerParser(max_results=10) 

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

stream = Stream(auth, l)
stream.sample()

My new code is:

import json
class ListenerParser(StreamListener):

    def __init__(self, max_results): 
        super(ListenerParser, self).__init__()
        self.texts = []
        self.langs = []
        if max_results:
            self.max_results = max_results
        else: 
            self.max_results = float("inf")


    def on_data(self, data):
        jd = json.loads(data)
        if len(self.texts)<self.max_results:
            if jd.has_key('text'):
                self.texts.append(jd['text'].encode('utf-8'))
        if len(self.langs)<self.max_results:
            if jd.has_key('lang'):
                self.langs.append(jd['lang'])


    def on_error(self, status):
        print status


# Now let's get some data! 
# start with 10 results for testing. 
# once testing is done, increase to 10,000
l = ListenerParser(max_results=10) 

auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

stream = Stream(auth, l)
stream.filter()

It returned ten "406"s and is still running in the IPython notebook. Why is it still running, and is 406 an error or a status code?

Answer 1

import json
from tweepy.streaming import StreamListener  # Tweepy < 4.0

class ListenerParser(StreamListener):

    def __init__(self, max_results): 
        super(ListenerParser, self).__init__()
        self.texts = []
        self.langs = []
        self.count = 0
        if max_results:
            self.max_results = max_results
        else: 
            self.max_results = float("inf")

    ####This is the code I am responsible for as part of my homework###
    def on_data(self, data):
        # data arrives as a raw JSON string, so decode it before indexing
        data_refined = json.loads(data)
        if self.count < self.max_results:
            # only collect the fields that are actually present
            if 'text' in data_refined:
                self.texts.append(data_refined['text'].encode('utf-8'))
            if 'lang' in data_refined:
                self.langs.append(data_refined['lang'])
            self.count += 1

    def on_error(self, status):
        print(status)

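I think this should work for you. I haven't tried this code on my machine, so please fix any minor errors yourself. If you have any questions, ask in the comments and I'll be glad to help :)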
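
On the follow-up question: 406 is an HTTP status code (Not Acceptable), passed to on_error. As far as I know, Twitter's streaming filter endpoint returns it when the request carries no track, follow, or locations parameter, which matches calling stream.filter() with no arguments. In Tweepy before 4.0, to my understanding, returning False from on_data or on_error tells the stream to disconnect; returning nothing lets it keep retrying, which would explain the notebook staying busy. The sketch below illustrates that stopping logic. `BoundedCollector` and the sample messages are hypothetical, and the class is a plain object so it runs without Tweepy; in real use it would subclass StreamListener.

```python
import json


class BoundedCollector(object):
    """Hypothetical stand-in for the listener (would subclass StreamListener)."""

    def __init__(self, max_results=None):
        self.texts = []
        self.langs = []
        self.count = 0
        self.max_results = max_results if max_results else float("inf")

    def on_data(self, data):
        tweet = json.loads(data)
        # Skip messages that are not tweets (for example, delete notices).
        if 'text' in tweet:
            self.texts.append(tweet['text'])
            self.langs.append(tweet.get('lang'))
            self.count += 1
        # False asks the stream to stop; True keeps it going.
        return self.count < self.max_results

    def on_error(self, status):
        print(status)
        # False stops the retry loop instead of reconnecting.
        return False


# Simulated stream: raw JSON strings fed until the collector says stop.
messages = [
    '{"text": "first", "lang": "en"}',
    '{"delete": {"status": {"id": 1}}}',
    '{"text": "second", "lang": "fr"}',
    '{"text": "third", "lang": "de"}',
]
collector = BoundedCollector(max_results=2)
for raw in messages:
    if collector.on_data(raw) is False:
        break
```

Counting only messages that contain 'text' is a design choice of this sketch: it means max_results refers to collected tweets rather than to every message seen on the stream.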

python

api

twitter

tweepy