Index mistake when using pandas read_csv
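I ran into some problems while trying to import the popular UCI bank marketing dataset from a GitHub endpoint. The read statement does not pick up the 17-column dataset correctly. I have checked the separator and the header, but I am not sure how to correct the index.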
import pandas as pd

# URL endpoint
url = 'https://raw.githubusercontent.com/ThamuMnyulwa/bankMarketing/main/bank-additional-full.csv'
column_names = ["age", "job", "marital", "education", "default", "balance", "housing", "loan",
                "contact", "day", "month", "duration", "campaign", "pdays", "previous", "poutcome", "y"]
raw_dataset = pd.read_csv(url, names=column_names, na_values='?', sep=';',
                          skipinitialspace=False, index_col=None)
Instead, it gives me something like this:
How do I use pandas read_csv to correctly import the dataset from the URL?
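You don't need to set the headers; the CSV already comes with them. Your result looks odd because your list of headers is missing 3 values, which is why the columns are offset by 3.
The following syntax gives a consistent result: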
raw_dataset = pd.read_csv(url, sep=";")
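To see why the short list of names shifts everything, here is a minimal sketch on a small in-memory CSV. The sample rows and the replacement column names are made up for illustration and are not taken from the real file. When `names` is passed without `header=0`, pandas reads the file's header line as an ordinary data row, and when each line has more fields than there are names, the extra leading fields are used as the row index.

```python
import io

import pandas as pd

# Made-up sample standing in for the real file: a header row plus two data
# rows, four ';'-separated fields per line.
sample = "age;job;marital;y\n56;housemaid;married;no\n37;services;single;yes\n"

# Too few names and no header=0: the header line is read as data, and the
# two leading fields that have no name are used as the row index.
shifted = pd.read_csv(io.StringIO(sample), sep=";", names=["col_a", "col_b"])

# Let pandas take the column names from the file's own header row.
clean = pd.read_csv(io.StringIO(sample), sep=";")

# To rename columns instead, pass header=0 together with one name per column
# (these replacement names are hypothetical).
renamed = pd.read_csv(
    io.StringIO(sample),
    sep=";",
    header=0,
    names=["age", "occupation", "marital_status", "subscribed"],
)

print(shifted)
print(clean.columns.tolist())
print(renamed.columns.tolist())
```

If you do want your own column names on the real dataset, the same idea should apply: pass `header=0` and make sure the list has exactly as many entries as the file has columns.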