工具不适用于大数据集 - 单个位置索引器超出范围

Question

我已经为以下事情伤脑筋了一天：

我已经构建了一个工具来迭代两个 df 以查找重复的值和求和点（如果它们是重复的）：

看起来像这样：

df1 = pd.DataFrame(dict1.items())
df2 = pd.DataFrame(dict2.items())



a = 0
while a != len(df2):

    value_to_compare = df2.iloc[a, 0]

    b = 0

    for row in range(len(df1)):
        if value_to_compare == df1.iloc[b, 0]:
            df1.iloc[b, 1] = df1.iloc[b, 1] + df2.iloc[b, 1]
            b = b + 1

        else:
            b = b + 1

    if b == len(df1):
        df1 = df1.append(df2.iloc[a, :], ignore_index=True)
        a = a + 1



df1 = df1.drop_duplicates(subset=[0], keep='first', ignore_index=True)
print('\n\n',df1)

它在来自 2 个字典的数据集上运行完美：

dict1 = {'A': 1, 'B': 1, 'C': 1, 'D': 1}

dict2 = {'a': 1, 'b': 1, 'c': 1, 'd': 1}

但是一旦我将它应用到主程序，那里有 2 个 df，有几百行（这里是一个例子）：

              word  occurance
0            labor          4
1      predictions          2
2              nfl          2
3             kids          2
4           africa          2
5         pandemic          2
6             kara          2
7             days          2
8          swisher          2
9            event          2
10             day          2
11        football          2
12          office          2
13              us          2
14        politics          2

并使用以下命令对它们进行口述：

keys1 = words_total['word'].tolist()
values1 = words_total['occurance'].tolist()
dict1 = dict(zip(keys1, values1))
keys2 = words_date['word'].tolist()
values2 = words_date['occurance'].tolist()
dict2 = dict(zip(keys2, values2))

我收到以下错误：

Traceback (most recent call last):
  File "/Users/Programowanie/PycharmProjects/pythonProject3/main.py", line 120, in <module>
    df1.iloc[b, 1] = df1.iloc[b, 1] + df2.iloc[b, 1]
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 925, in __getitem__
    return self._getitem_tuple(key)
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1506, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 754, in _has_valid_tuple
    self._validate_key(k, i)
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1409, in _validate_key
    self._validate_integer(key, axis)
  File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1500, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

你知道为什么会这样吗？提前谢谢你:)

Answer 1

没有看到 2 个数据帧，我的假设是这 2 个数据帧不包含相同数量的行，这意味着当您调用 index/row 和 .iloc 时，它将是值为 out of bounds。例如，如果我有一个 10 行的数据框，我不能去调用索引值 15.

处的行

合并 2 个数据框，然后按 'word' 列分组，然后对这些数据框的 occurance 值求和会不会更容易？

import pandas as pd

data1 = {'word':['labor','predictions','nfl','kids','africa','pandemic','kara','days',
         'swisher','event','day','football','office','us','politics'],
 'occurance':[4,2,2,2,2,2,2,2,2,2,2,2,2,2,2]}

data2 = {'word':['labor','predictions','nfl','kids','africa','pandemic','kara','days',
         'swisher','event','day','us','politics'],
 'occurance':[1,2,8,2,2,2,1,2,2,7,2,4,5]}

    
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)

# Combine the 2 dataframes
combined_df = pd.concat([df1, df2])


# Groupby the word column and sum the occurance column
occurances = combined_df.groupby('word').agg({"occurance": "sum"}).reset_index()

输出：

print(occurances)
           word  occurance
0        africa          4
1           day          4
2          days          4
3         event          9
4      football          2
5          kara          3
6          kids          4
7         labor          5
8           nfl         10
9        office          2
10     pandemic          4
11     politics          7
12  predictions          4
13      swisher          4
14           us          6

工具不适用于大数据集 - 单个位置索引器超出范围

Tool doesn't work on big data set - Single positional indexer is out-of-bounds

python

dictionary

indexer