Extracting data from a UCI dataset online using Python if the file is compressed (.zip)

I want to use web scraping to get the data from the file https://archive.ics.uci.edu/ml/machine-learning-databases/00380/YouTube-Spam-Collection-v1.zip

How can I do this in Python using requests?

You can use this example, which loads the zip file with requests and the built-in zipfile module:

import requests
from io import BytesIO
from zipfile import ZipFile


url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00380/YouTube-Spam-Collection-v1.zip"

response = requests.get(url, timeout=30)
# fail early on HTTP errors instead of handing an error page to ZipFile
response.raise_for_status()

with ZipFile(BytesIO(response.content), "r") as myzip:
    # print the names of the files inside the zip:
    # print(myzip.namelist())

    # print the content of one of the files:
    with myzip.open("Youtube01-Psy.csv", "r") as f_in:
        print(f_in.read())

Prints:

b'COMMENT_ID,AUTHOR,DATE,CONTENT,CLASS\n

...
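f_in.read() returns raw bytes, so to work with individual comments you still need to decode the text and parse the CSV. A minimal sketch of one way to do that with the standard csv module, assuming the files are UTF-8 encoded; read_csv_members is a hypothetical helper name, and skipping __MACOSX/ entries is only a precaution in case the archive contains macOS metadata. It takes the downloaded bytes (e.g. response.content) and returns one list of row dicts per CSV file:

```python
import csv
import io
from zipfile import ZipFile


def read_csv_members(zip_bytes: bytes, encoding: str = "utf-8") -> dict[str, list[dict[str, str]]]:
    """Hypothetical helper: parse every .csv member of an in-memory zip into row dicts."""
    tables: dict[str, list[dict[str, str]]] = {}
    with ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # Skip non-CSV entries and macOS metadata folders, if any are present.
            if not name.endswith(".csv") or name.startswith("__MACOSX/"):
                continue
            # Wrap the binary member so csv receives decoded text; newline=""
            # lets csv handle quoted fields that contain line breaks.
            with archive.open(name) as raw, io.TextIOWrapper(raw, encoding=encoding, newline="") as text:
                tables[name] = list(csv.DictReader(text))
    return tables


if __name__ == "__main__":
    # Build a tiny zip in memory so this runs offline; with the real dataset
    # you would pass response.content from the download instead.
    buffer = io.BytesIO()
    with ZipFile(buffer, "w") as archive:
        archive.writestr(
            "Sample.csv",
            'COMMENT_ID,AUTHOR,DATE,CONTENT,CLASS\nc1,alice,2015-01-01,"hi, there",0\n',
        )
    tables = read_csv_members(buffer.getvalue())
    print(tables["Sample.csv"][0]["CONTENT"])  # hi, there
```

Using csv.DictReader keys each row by the header line (COMMENT_ID, AUTHOR, DATE, CONTENT, CLASS), so commas inside quoted comment text are handled correctly, which a naive split(",") would not do.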