Writing a query for an entire table
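I'm trying to write a Python script that downloads the data from https://data.cms.gov/provider-data/dataset/g6vv-u9sr and performs various operations on the dataset. I'm having trouble pulling this data automatically, and I'm not sure how to write a query that returns the entire dataset (preferably as a CSV I can load into pandas). Any pointers?
You can use the requests module to download the CSV data, for example: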
from io import StringIO

import pandas as pd
import requests

r = requests.get(
    "https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv"
)
r.raise_for_status()  # stop here if the download failed
# Parse the downloaded text as CSV
df = pd.read_csv(StringIO(r.text))
print(df.dtypes)
print(len(df))
Prints:
Federal Provider Number object
Provider Name object
Provider Address object
Provider City object
Provider State object
Provider Zip Code int64
Penalty Date object
Penalty Type object
Fine Amount float64
Payment Denial Start Date object
Payment Denial Length in Days float64
Location object
Processing Date object
dtype: object
27881
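The dtypes above show Provider Zip Code read as int64 and the date columns read as plain objects. If that matters for later processing, read_csv can be told how to type those columns. The following is a minimal sketch on made-up sample rows, not the real file: only the column names are taken from the output above, and the date format is an assumption.

```python
from io import StringIO

import pandas as pd

# Made-up sample rows; only the column names come from the output above.
sample = (
    "Provider Zip Code,Penalty Date,Fine Amount\n"
    "01234,2021-03-01,650.0\n"
    "35653,2021-07-15,\n"
)

df = pd.read_csv(
    StringIO(sample),
    dtype={"Provider Zip Code": str},  # keep zip codes as text so leading zeros survive
    parse_dates=["Penalty Date"],      # assumes the dates are in a format pandas can parse
)
print(df.dtypes)
```

The same dtype= and parse_dates= arguments can be passed when reading the downloaded file, provided the real date strings parse cleanly.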
Edit: As @Parfait noted, you can pass the URL directly to pd.read_csv. In this case, however, the encoding= parameter must be set explicitly ("latin1"/"iso_8859-1" works):
df = pd.read_csv(
    "https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv",
    encoding="iso_8859-1",
)
print(len(df))
Prints:
27881
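The answer does not say why the explicit encoding is needed. A plausible reason, assumed here, is that the file contains bytes that are not valid UTF-8, which is the default encoding pd.read_csv uses. The sketch below illustrates this on a small in-memory sample rather than the real file. The helper name read_csv_bytes and the sample row are hypothetical.

```python
from io import StringIO

import pandas as pd


def read_csv_bytes(raw: bytes, encodings=("utf-8", "latin1")) -> pd.DataFrame:
    """Hypothetical helper: decode CSV bytes with the first encoding that works."""
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the bytes, try the next one
        return pd.read_csv(StringIO(text))
    raise ValueError("none of the given encodings could decode the data")


# Made-up sample: the byte 0xE9 is "é" in latin1 but is not valid UTF-8 here.
sample = "Provider Name,Fine Amount\nCaf\u00e9 Care,650.0\n".encode("latin1")
df = read_csv_bytes(sample)
print(df)
```

Because latin1 maps every byte to a character, it never raises a decode error, so it works as a last fallback. It can still produce wrong characters if the file is actually in some other encoding.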