Writing a query for an entire table

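I'm trying to write a Python script that downloads the data from https://data.cms.gov/provider-data/dataset/g6vv-u9sr and performs various operations on the dataset. I'm having trouble pulling the data automatically, and I'm not sure how to write a query that returns the entire dataset (preferably as a CSV I can load into pandas). Any pointers?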

You can download the CSV data with the requests module, for example:

import pandas as pd
import requests
from io import StringIO

# Download the CSV resource that backs the dataset page
r = requests.get(
    "https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv"
)

# Parse the downloaded text into a DataFrame
df = pd.read_csv(StringIO(r.text))
print(df.dtypes)
print(len(df))

Prints:

Federal Provider Number           object
Provider Name                     object
Provider Address                  object
Provider City                     object
Provider State                    object
Provider Zip Code                  int64
Penalty Date                      object
Penalty Type                      object
Fine Amount                      float64
Payment Denial Start Date         object
Payment Denial Length in Days    float64
Location                          object
Processing Date                   object
dtype: object

27881
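
In the dtypes above, the zip code is read as an integer and the date columns as plain objects. If that gets in the way of later operations, `pd.read_csv` can be told how to read those columns. This is only a sketch: the two sample rows are made up for illustration (they are not taken from the CMS file), and the date format in the real file may differ from the ISO format assumed here.

```python
import pandas as pd
from io import StringIO

# Made-up sample rows using three of the column names shown above.
# The real file's values and date format may differ (assumption).
sample = StringIO(
    "Provider Zip Code,Penalty Date,Fine Amount\n"
    "01234,2021-03-15,650.0\n"
    "35653,2021-07-01,1000.0\n"
)

df_sample = pd.read_csv(
    sample,
    dtype={"Provider Zip Code": str},  # keep leading zeros in zip codes
    parse_dates=["Penalty Date"],      # convert to datetime64 values
)

print(df_sample.dtypes)
```

The same `dtype=` and `parse_dates=` arguments can be passed when reading the full file.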

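Edit: As @Parfait pointed out, you can pass the URL directly to pd.read_csv. In this case, however, you need to set the encoding= parameter explicitly ("latin1" / "iso_8859-1" works), because pandas assumes UTF-8 by default: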

df = pd.read_csv(
    "https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv",
    encoding="iso_8859-1",
)
print(len(df))

Prints:

27881
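
If you don't know in advance which encoding a file uses, one option is to decode the downloaded bytes yourself and fall back to Latin-1 when UTF-8 fails. The sketch below is not part of the answer above; `decode_with_fallback` is a hypothetical helper name, and the sample bytes are made up so that the example runs without network access.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "iso_8859-1")) -> str:
    """Decode raw bytes, trying each encoding in order (hypothetical helper)."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError(f"none of {encodings} could decode the data")


# Made-up sample: "é" encoded as Latin-1 is not valid UTF-8,
# so the first attempt fails and the second one succeeds.
sample_bytes = "Café".encode("iso_8859-1")
decoded = decode_with_fallback(sample_bytes)
print(decoded)
```

With the download from the first snippet, this would be used as `pd.read_csv(StringIO(decode_with_fallback(r.content)))`. Note that Latin-1 can decode any byte sequence, so it should come last in the list.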