Cannot export to ".csv" file - pandas.DataFrame

I'd like to ask for help with my Google Colaboratory notebook. The error is in the fourth cell.

Context:
We are web-scraping historical BTC data.

Here is my code:

First cell (runs successfully)

#importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

Second cell (runs successfully)

#sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
#request the page
page = requests.get(url)
#creating a soup object and the parser
soup = BeautifulSoup(page.text, 'lxml')

#creating a table body to pass on the soup to find the table
table_body = soup.find('table')
#creating an empty list to store information
row_data = []

#creating a table 
for row in table_body.find_all('tr'):
  col = row.find_all('td')
  col = [ele.text.strip() for ele in col ] # stripping the whitespaces
  row_data.append(col) #append the column

# extracting all data on table entries
df = pd.DataFrame(row_data)
df

Third cell (runs successfully)

headers = []
for i in soup.find_all('th'):
  col_name = i.text.strip().lower().replace(" ", "_")
  headers.append(col_name)
headers

Fourth cell (fails)

df = pd.DataFrame(row_data, columns=headers)
df
#into a file 
df.to_csv('/content/file.csv')

The error! :(

AssertionError                            Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    563     try:
--> 564         columns = _validate_or_indexify_columns(content, columns)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
AssertionError: 13 columns passed, passed data had 7 columns

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/internals/construction.py in _list_to_arrays(data, columns, coerce_float, dtype)
    565         result = _convert_object_array(content, dtype=dtype, coerce_float=coerce_float)
    566     except AssertionError as e:
--> 567         raise ValueError(e) from e
    568     return result, columns
    569 

ValueError: 13 columns passed, passed data had 7 columns

The error happens because soup.find_all('th') collects every <th> on the page (13 of them), while each data row has only 7 <td> cells, so the header count doesn't match the data width. To load the table, you can simply use pd.read_html(). For example:

import pandas as pd

url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"

df = pd.read_html(url)[0]
print(df)
df.to_csv("data.csv")

This creates data.csv (screenshot from LibreOffice not reproduced here).
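
pd.read_html() parses every <table> element on the page and returns a list of DataFrames, which is why the [0] index above is needed. As a small sketch (index=False is the only addition beyond the answer's code), you can check how many tables the page actually contains before picking one:

import pandas as pd

url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"

# read_html() returns one DataFrame per <table> element on the page
tables = pd.read_html(url)
print(len(tables))  # how many tables were found

df = tables[0]  # the historical-data table is the first one on this page
df.to_csv("data.csv", index=False)  # index=False drops the numeric row index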


To correct your example:

# importing libraries
from bs4 import BeautifulSoup
import requests
import pandas as pd

# sample url
url = "https://www.bitrates.com/coin/BTC/historical-data/USD?period=allData&limit=500"
# request the page
page = requests.get(url)
# creating a soup object and the parser
soup = BeautifulSoup(page.text, "lxml")

# creating a table body to pass on the soup to find the table
table_body = soup.find("table")
# creating an empty list to store information
row_data = []

# creating a table
for row in table_body.select("tr:has(td)"):
    col = row.find_all("td")
    col = [ele.text.strip() for ele in col]  # stripping the whitespaces
    row_data.append(col)  # append the column

# extracting all data on table entries
df = pd.DataFrame(row_data)

headers = []
for i in table_body.select("th"):
    col_name = i.text.strip().lower().replace(" ", "_")
    headers.append(col_name)

df = pd.DataFrame(row_data, columns=headers)
print(df)
df.to_csv("/content/file.csv")

Alternatively, you can skip HTML parsing entirely and read the site's JSON API:

import pandas as pd

# query the site's JSON series endpoint directly (here: the USDTUSD pair)
url = ('https://www.bitrates.com/api/node/v1/symbols/USDTUSD/'
       'bitrates/series?aggregate=3&period=lastMonth')
records = pd.read_json(url).T['series'].to_dict()['data']
print(pd.DataFrame(records))

Output:

                        date      open     close  ...        supply  market_volume24  btc_ratio
0   2021-04-11T06:00:00.000Z  0.999212  0.999114  ...  4.584629e+10     3.146109e+08   0.000016    
1   2021-04-12T00:00:00.000Z  0.999114  0.999317  ...  4.584629e+10     2.100706e+09   0.000016    
2   2021-06-04T18:00:00.000Z  0.999317  1.000613  ...  6.447629e+10     7.298208e+08   0.000025    
3   2021-06-05T12:00:00.000Z  1.000613  1.000328  ...  0.000000e+00     6.502947e+09   0.000025    
4   2021-06-06T06:00:00.000Z  1.000328  1.000499  ...  6.447629e+10     6.649574e+08   0.000025    
5   2021-06-07T00:00:00.000Z  1.000499  1.000408  ...  6.447629e+10     8.272473e+09   0.000025    
6   2021-06-07T18:00:00.000Z  1.000408  1.000338  ...  6.447629e+10     1.090599e+09   0.000025    
7   2021-06-08T12:00:00.000Z  1.000338  1.000840  ...  6.447177e+10     2.196249e+09   0.000028    
8   2021-06-09T06:00:00.000Z  1.000840  1.001088  ...  0.000000e+00     1.080053e+10   0.000028    
9   2021-06-10T00:00:00.000Z  1.001088  1.000618  ...  6.447177e+10     4.158914e+09   0.000026    
10  2021-06-10T18:00:00.000Z  1.000618  1.000436  ...  6.447177e+10     6.713012e+08   0.000026    
11  2021-06-11T12:00:00.000Z  1.000436  1.000234  ...  6.447177e+10     4.093096e+09   0.000025    
12  2021-06-12T06:00:00.000Z  1.000234  1.000385  ...  6.447177e+10     5.042653e+09   0.000026    
13  2021-06-13T00:00:00.000Z  1.000385  1.000302  ...  0.000000e+00     5.502808e+09   0.000026    
14  2021-06-13T18:00:00.000Z  1.000302  1.000110  ...  6.447177e+10     1.008952e+10   0.000024    
15  2021-06-14T12:00:00.000Z  1.000110  1.000309  ...  6.447177e+10     7.405940e+09   0.000024    
16  2021-06-15T06:00:00.000Z  1.000309  1.000205  ...  6.447177e+10     4.256491e+09   0.000023    
17  2021-06-16T00:00:00.000Z  1.000205  1.000104  ...  0.000000e+00     1.495518e+09   0.000023    
18  2021-06-16T18:00:00.000Z  1.000104  0.999833  ...  0.000000e+00     3.033091e+09   0.000024    
19  2021-06-17T12:00:00.000Z  0.999833  1.000016  ...  6.447177e+10     1.449031e+08   0.000024    
20  2021-07-10T00:00:00.000Z  1.000016  1.000100  ...  6.446977e+10     7.586923e+08   0.000025    
21  2021-07-10T18:00:00.000Z  1.000100  1.000199  ...  6.446977e+10     2.312489e+09   0.000025    
22  2021-07-11T12:00:00.000Z  1.000199  1.000134  ...  6.446977e+10     2.236517e+09   0.000024    
23  2021-07-12T06:00:00.000Z  1.000134  1.000192  ...  6.446977e+10     8.140557e+09   0.000024    
24  2021-07-13T00:00:00.000Z  1.000192  1.000290  ...  6.446977e+10     3.846952e+09   0.000026    
25  2021-07-13T18:00:00.000Z  1.000290  1.000411  ...  6.446977e+10     1.278604e+09   0.000026    
26  2021-07-14T12:00:00.000Z  1.000411  1.000315  ...  6.446977e+10     3.279535e+09   0.000026    
27  2021-07-15T06:00:00.000Z  1.000315  1.000142  ...  6.446977e+10     8.086642e+08   0.000026    
28  2021-07-16T00:00:00.000Z  1.000142  1.000295  ...  6.446977e+10     1.187211e+09   0.000027    
29  2021-07-16T18:00:00.000Z  1.000295  1.000610  ...  6.446977e+10     7.721854e+08   0.000027    
30  2021-07-17T12:00:00.000Z  1.000610  1.000535  ...  6.446977e+10     4.535049e+09   0.000027    
31  2021-07-18T06:00:00.000Z  1.000535  1.000610  ...  6.446977e+10     2.345491e+09   0.000026    
32  2021-07-19T00:00:00.000Z  1.000610  1.000386  ...  6.446977e+10     4.725531e+09   0.000027    
33  2021-07-19T18:00:00.000Z  1.000386  1.000215  ...  6.446977e+10     3.314499e+09   0.000028    
34  2021-07-20T12:00:00.000Z  1.000215  1.000324  ...  6.446977e+10     5.315525e+09   0.000030    
35  2021-07-21T06:00:00.000Z  1.000324  1.000277  ...  6.446977e+10     7.141479e+09   0.000028    
36  2021-07-22T00:00:00.000Z  1.000277  1.000255  ...  6.446977e+10     2.533840e+09   0.000028    
37  2021-07-22T18:00:00.000Z  1.000255  1.000325  ...  6.446977e+10     2.699050e+09   0.000027    
38  2021-07-23T12:00:00.000Z  1.000325  1.000363  ...  6.446977e+10     2.681340e+09   0.000026    
39  2021-07-24T06:00:00.000Z  1.000363  1.000644  ...  6.446974e+10     6.241232e+08   0.000026    

[40 rows x 10 columns]
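
The endpoint above queries the USDTUSD pair, which is why the prices hover around 1.0. Assuming the API uses the same URL scheme for other symbols (an assumption, not something the answer confirms), the BTC series and a CSV export would presumably look like this:

import pandas as pd

# assumption: swapping USDTUSD for BTCUSD returns the BTC series from the same API
url = ('https://www.bitrates.com/api/node/v1/symbols/BTCUSD/'
       'bitrates/series?aggregate=3&period=lastMonth')

records = pd.read_json(url).T['series'].to_dict()['data']
df = pd.DataFrame(records)
df.to_csv('btc_series.csv', index=False)  # hypothetical output filename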