Python DataFrame chunk extract issue

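I want to split a DataFrame into chunks (for example, if we have 100 rows, I split them into 20 chunks), and for each chunk of 5 values I need to run 5 update queries (against 5 different tables) on that chunk's data.

How can I do this? I'm new to this and learning on the job, so could you recommend an approach?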

for item in np.array_split(df1, 10):
 print(item) ##I was able to divide into chunks
 for i,j in item.iterrows():
   print(item.iloc[i]['ColumnName'])

My idea is to add the update queries right after this print statement.

But this code raises an exception:

Traceback (most recent call last):
  File "/Users/gd/Documents/myproj/test.py", line 63, in <module>
    func()
  File "/Users/gd/Documents/myproj/test.py", line 45, in dedupe_pe
    print(item.iloc[i]['ColumnName'])
  File "/Users/gd/Documents/myproj/lib/python3.9/site-packages/pandas/core/indexing.py", line 931, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/gd/Documents/myproj/lib/python3.9/site-packages/pandas/core/indexing.py", line 1566, in _getitem_axis
    self._validate_integer(key, axis)
  File "/Users/gd/Documents/myproj/lib/python3.9/site-packages/pandas/core/indexing.py", line 1500, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

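item.iterrows() yields each row's index label together with the row itself. In every chunk after the first, those labels are larger than the chunk's length, so the positional lookup item.iloc[i] goes out of bounds. Use the row object directly instead, for example: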

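for item in np.array_split(df1, 10):
    item = item.copy()  # work on a copy so adding a column does not touch df1
    print(item)  # each chunk is itself a DataFrame
    # Build one UPDATE statement per row ('table_name' and 'ColumnName_DATA' must hold strings)
    item["sql"] = "UPDATE " + item["table_name"] + " SET column1 = '" + item["ColumnName_DATA"] + "' WHERE condition"
    for i, j in item.iterrows():  # i is the index label, j is the row as a Series
        print(j['ColumnName'])
        print(j['sql'])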
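
If you want to run the updates per chunk rather than just print them, here is one possible sketch. Nothing in it comes from your actual setup: the data in df1, the `id` column, the table names `table_a` and `table_b`, the helper `iter_chunks`, and the use of an in-memory sqlite3 database are all assumptions for illustration. It slices the DataFrame by position into chunks of 5 rows and passes the values as query parameters instead of concatenating them into the SQL string.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for df1; the column names are assumptions.
df1 = pd.DataFrame({
    "id": range(1, 11),
    "ColumnName": [f"value_{n}" for n in range(1, 11)],
})

CHUNK_SIZE = 5  # rows per chunk


def iter_chunks(df, size):
    """Yield consecutive positional slices of df with at most `size` rows."""
    for start in range(0, len(df), size):
        yield df.iloc[start:start + size]


# In-memory database with two hypothetical target tables.
conn = sqlite3.connect(":memory:")
tables = ["table_a", "table_b"]
for t in tables:
    conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY, column1 TEXT)")
    conn.executemany(
        f"INSERT INTO {t} (id) VALUES (?)",
        [(int(i),) for i in df1["id"]],
    )

for chunk in iter_chunks(df1, CHUNK_SIZE):
    # (new value, key) pairs for this chunk, converted to plain Python types
    params = [
        (row.ColumnName, int(row.id))
        for row in chunk.itertuples(index=False)
    ]
    for t in tables:
        # One parameterized UPDATE per row of the chunk, for each table
        conn.executemany(f"UPDATE {t} SET column1 = ? WHERE id = ?", params)
    conn.commit()  # commit once per chunk

updated = conn.execute(
    "SELECT COUNT(*) FROM table_a WHERE column1 IS NOT NULL"
).fetchone()[0]
print(updated)
```

Note that the second chunk keeps the original index labels 5 to 9 even though it only has positions 0 to 4, which is the same mismatch that caused the IndexError in the question. With a real database you would swap the sqlite3 connection for your own driver, and the placeholder syntax (`?` here) may differ.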