Python twint library is not working in Colab environment
I am trying to run code that uses Python's twint library (a Twitter scraping tool) in Colab.
My code is:
!pip install twint
!pip install nest_asyncio
!pip install pandas
import twint
import nest_asyncio
nest_asyncio.apply()
import time
import pandas as pd
import os
import re
timestr = time.strftime("%Y%m%d")
c = twint.Config()
c.Limit = 1000
c.Lang = "en"
c.Store_csv = True
c.Search = "apple"
c.Output = timestr + "_en_apple.csv"
twint.run.Search(c)
The code above runs perfectly in Jupyter on my machine and fetches tweets. However, the same code in Colab produces the following output:
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 1.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 8.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 27.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 64.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 125.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 216.0 secs
How can I fix this in Colab?
I used the following in Google Colab. Cloning the repository and installing from its requirements.txt is straightforward.
!git clone --depth=1 https://github.com/twintproject/twint.git
!cd /content/twint && pip3 install . -r requirements.txt
import twint
import nest_asyncio
nest_asyncio.apply()
import time
import pandas as pd
import os
import re
timestr = time.strftime("%Y%m%d")
c = twint.Config()
c.Limit = 1000
c.Lang = "en"
c.Store_csv = True
c.Search = "apple"
c.Output = timestr + "_en_apple.csv"
twint.run.Search(c)
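To confirm that the run actually wrote tweets rather than only retrying, it can help to inspect the output file afterwards. The following is a minimal sketch using only the standard library. The helper name `summarize_csv` is hypothetical and not part of twint, and it makes no assumption about which columns twint writes.

```python
import csv
import os


def summarize_csv(path):
    """Return (row_count, header) for a CSV file, or None if it is missing or empty.

    Hypothetical helper for illustration; not part of twint.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return None
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])          # first line is treated as the header
        row_count = sum(1 for _ in reader)  # remaining lines are data rows
    return row_count, header


# In the notebook, after twint.run.Search(c):
# print(summarize_csv(c.Output))
```

A result of `None` means nothing was written, which matches the retry loop shown in the question.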
If the build fails for you, edit your requirements.txt like this:
aiohttp==3.7.0
aiogram==2.2
aiodns
beautifulsoup4
cchardet
dataclasses
elasticsearch
pysocks
pandas>=0.23.0
aiohttp_socks<=0.4.1
schedule
geopy
fake-useragent
googletransx
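Since Colab has no convenient editor for files inside the cloned repository, one option is to overwrite requirements.txt from a notebook cell. This is a sketch under stated assumptions: the helper name `write_requirements` is hypothetical, the package list is copied from the list above, and `/content/twint` is the clone location used in the install command earlier.

```python
from pathlib import Path

# Requirement lines copied from the list above.
PINNED_REQUIREMENTS = [
    "aiohttp==3.7.0",
    "aiogram==2.2",
    "aiodns",
    "beautifulsoup4",
    "cchardet",
    "dataclasses",
    "elasticsearch",
    "pysocks",
    "pandas>=0.23.0",
    "aiohttp_socks<=0.4.1",
    "schedule",
    "geopy",
    "fake-useragent",
    "googletransx",
]


def write_requirements(repo_dir):
    """Overwrite requirements.txt in repo_dir with the list above (hypothetical helper)."""
    path = Path(repo_dir) / "requirements.txt"
    path.write_text("\n".join(PINNED_REQUIREMENTS) + "\n", encoding="utf-8")
    return path


# In Colab, assuming the repository was cloned to /content/twint:
# write_requirements("/content/twint")
```

After writing the file, rerun the `pip3 install . -r requirements.txt` cell so the edited versions are picked up.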