For loop with pandas shows error when dealing with large data in Jupyter

Question

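I am using Jupyter Notebook on Windows.

I am trying to populate data for all the stocks in the S&P 500. I created a pandas DataFrame and fill it with the ticker, price, and market capitalization of each stock.

Here is the code: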

my_columns = ['Ticker', 'Stock Price', 'Market Capitalization']
final_dataframe = pd.DataFrame(columns = my_columns)
for stock in stocks['Ticker'][:5]:
    api_url = f'https://sandbox.iexapis.com/stable/stock/{stock}/quote/?token={IEX_CLOUD_API_TOKEN}'
    data = requests.get(api_url).json()
    final_dataframe = final_dataframe.append(
        pd.Series(
        [
            stock,
            data['latestPrice'],
            data['marketCap'],
        ],
        index = my_columns),
    ignore_index = True
    )

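This shows the first 5 stocks when I view final_dataframe.

However, if I remove the "[:5]" (on line 3 of the code) to see all the stocks, I get an error.

I tested it again with the first 50 stocks, "[:50]", and it worked fine.

I then tested it with the first 500 stocks, "[:500]", and got the error.

So I guess this might be related to the size of the data?

Optional info: I am following a course in which the instructor simply removed the [:5] from the code to see the data for all stocks, and it worked. That is not the case for me.

Here is the error: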

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
Input In [45], in <module>
      3 for stock in stocks['Ticker']:
      4     api_url = f'https://sandbox.iexapis.com/stable/stock/{stock}/quote/?token={IEX_CLOUD_API_TOKEN}'
----> 5     data = requests.get(api_url).json()
      6     final_dataframe = final_dataframe.append(
      7         pd.Series(
      8         [
   (...)
     14     ignore_index = True
     15     )

File F:\Projects\algorithmic-trading-python\venv\lib\site-packages\requests\models.py:888, in Response.json(self, **kwargs)
    886 if encoding is not None:
    887     try:
--> 888         return complexjson.loads(
    889             self.content.decode(encoding), **kwargs
    890         )
    891     except UnicodeDecodeError:
    892         # Wrong UTF codec detected; usually because it's not UTF-8
    893         # but some other 8-bit codec.  This is an RFC violation,
    894         # and the server didn't bother to tell us what codec *was*
    895         # used.
    896         pass

File ~\AppData\Local\Programs\Python\Python38\lib\json\__init__.py:357, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352     del kw['encoding']
    354 if (cls is None and object_hook is None and
    355         parse_int is None and parse_float is None and
    356         parse_constant is None and object_pairs_hook is None and not kw):
--> 357     return _default_decoder.decode(s)
    358 if cls is None:
    359     cls = JSONDecoder

File ~\AppData\Local\Programs\Python\Python38\lib\json\decoder.py:337, in JSONDecoder.decode(self, s, _w)
    332 def decode(self, s, _w=WHITESPACE.match):
    333     """Return the Python representation of ``s`` (a ``str`` instance
    334     containing a JSON document).
    335 
    336     """
--> 337     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338     end = _w(s, end).end()
    339     if end != len(s):

File ~\AppData\Local\Programs\Python\Python38\lib\json\decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    353     obj, end = self.scan_once(s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

What can I try next?

Answer 1

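requests.get(api_url) is not returning usable data for some ticker between rows 50 and 500, so the json method has nothing to parse. Before calling json, you could add an if condition that checks the response. Note that requests.get never returns None; a Response object is truthy only when its status code is below 400, so "if r:" skips the requests that failed: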

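import pandas as pd
import requests

# stocks and IEX_CLOUD_API_TOKEN are assumed to be defined earlier in the notebook
my_columns = ['Ticker', 'Stock Price', 'Market Capitalization']
lst = []
for stock in stocks['Ticker']:
    api_url = f'https://sandbox.iexapis.com/stable/stock/{stock}/quote/?token={IEX_CLOUD_API_TOKEN}'
    r = requests.get(api_url)
    # proceed only if the request succeeded (a Response is truthy when its status code is below 400)
    if r:
        data = r.json()
        lst.append([stock, data['latestPrice'], data['marketCap']])
# build the DataFrame once, after the loop, instead of appending row by row
final_dataframe = pd.DataFrame(lst, columns=my_columns)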
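
The truthiness check above does not cover a response that has a successful status code but an empty or non-JSON body, which would still raise JSONDecodeError. The following is a minimal sketch of a more defensive variant, not a definitive implementation. The helper names extract_quote and collect_quotes, and the fetch callable, are hypothetical and do not come from the original post. The sketch also records which tickers failed, along with the status code and the start of the body, so the cause (for example, a rate-limit response) can be inspected afterwards.

```python
def extract_quote(stock, response):
    """Return [ticker, price, market cap], or None if the response is unusable.

    `response` is assumed to behave like a requests.Response
    (it has .ok, .json(), .status_code and .text).
    """
    if not response.ok:  # False for 4xx and 5xx status codes
        return None
    try:
        data = response.json()
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(data, dict):
        return None
    return [stock, data.get('latestPrice'), data.get('marketCap')]


def collect_quotes(tickers, fetch):
    """Call fetch(ticker) for each ticker and return (rows, failed)."""
    rows, failed = [], []
    for stock in tickers:
        response = fetch(stock)
        row = extract_quote(stock, response)
        if row is None:
            # keep the status code and the start of the body for diagnosis
            failed.append((stock, response.status_code, response.text[:80]))
        else:
            rows.append(row)
    return rows, failed
```

In the notebook, fetch could be a small function that builds api_url for a ticker and returns requests.get(api_url). The returned rows can then be passed to pd.DataFrame(rows, columns=my_columns), and printing failed shows what the server actually sent back for the tickers that were skipped.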
