Tool doesn't work on a big dataset: single positional indexer is out-of-bounds
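I've been racking my brain over this for a whole day.
I've built a tool that iterates over two DataFrames, looks for duplicate values, and sums their points when a value appears in both. It looks like this: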
import pandas as pd

df1 = pd.DataFrame(dict1.items())
df2 = pd.DataFrame(dict2.items())

a = 0
while a != len(df2):
    value_to_compare = df2.iloc[a, 0]
    b = 0
    for row in range(len(df1)):
        if value_to_compare == df1.iloc[b, 0]:
            df1.iloc[b, 1] = df1.iloc[b, 1] + df2.iloc[b, 1]
            b = b + 1
        else:
            b = b + 1
    if b == len(df1):
        df1 = df1.append(df2.iloc[a, :], ignore_index=True)
    a = a + 1

df1 = df1.drop_duplicates(subset=[0], keep='first', ignore_index=True)
print('\n\n',df1)
It works perfectly on a dataset built from two dictionaries:
dict1 = {'A': 1, 'B': 1, 'C': 1, 'D': 1}
dict2 = {'a': 1, 'b': 1, 'c': 1, 'd': 1}
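A detail worth checking about this test data: Python string comparison is case-sensitive, so 'A' and 'a' are different keys and the two test dictionaries share no key at all. That means the branch containing df2.iloc[b, 1] never runs on this data, which is probably why the loop appears to work here. A quick check using the dictionaries exactly as given:

```python
dict1 = {'A': 1, 'B': 1, 'C': 1, 'D': 1}
dict2 = {'a': 1, 'b': 1, 'c': 1, 'd': 1}

# Keys are compared case-sensitively, so 'A' and 'a' are different keys
# and the intersection of the two key views is empty.
shared_keys = dict1.keys() & dict2.keys()
print(shared_keys)  # set()
```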
But as soon as I apply it in the main program, where the two DataFrames have a few hundred rows each (here is an excerpt):
word occurance
0 labor 4
1 predictions 2
2 nfl 2
3 kids 2
4 africa 2
5 pandemic 2
6 kara 2
7 days 2
8 swisher 2
9 event 2
10 day 2
11 football 2
12 office 2
13 us 2
14 politics 2
and turn them into dictionaries with:
keys1 = words_total['word'].tolist()
values1 = words_total['occurance'].tolist()
dict1 = dict(zip(keys1, values1))
keys2 = words_date['word'].tolist()
values2 = words_date['occurance'].tolist()
dict2 = dict(zip(keys2, values2))
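As a side note, the same dictionary can be built in one step with Series.to_dict() after indexing the frame by 'word'. A minimal sketch; the small words_total frame below is a hypothetical stand-in for the real data:

```python
import pandas as pd

# Hypothetical stand-in for the question's words_total DataFrame
words_total = pd.DataFrame({'word': ['labor', 'nfl', 'kids'],
                            'occurance': [4, 2, 2]})

# Approach from the question: zip the two column lists into a dict
dict1 = dict(zip(words_total['word'].tolist(), words_total['occurance'].tolist()))

# Equivalent: use 'word' as the index, select 'occurance', convert to a dict
dict1_alt = words_total.set_index('word')['occurance'].to_dict()

print(dict1 == dict1_alt)  # True
```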
I get the following error:
Traceback (most recent call last):
File "/Users/Programowanie/PycharmProjects/pythonProject3/main.py", line 120, in <module>
df1.iloc[b, 1] = df1.iloc[b, 1] + df2.iloc[b, 1]
File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 925, in __getitem__
return self._getitem_tuple(key)
File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1506, in _getitem_tuple
self._has_valid_tuple(tup)
File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 754, in _has_valid_tuple
self._validate_key(k, i)
File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1409, in _validate_key
self._validate_integer(key, axis)
File "/Users/Programowanie/PycharmProjects/pythonProject3/venv/lib/python3.9/site-packages/pandas/core/indexing.py", line 1500, in _validate_integer
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
Any idea why this happens?
Thanks in advance :)
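Without seeing the two DataFrames, my assumption is that they don't contain the same number of rows, which means that when you access a row by position with .iloc, that position is out of bounds. For example, if I have a DataFrame with 10 rows, I can't access the row at index 15.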
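Looking at the question's loop, the failing line from the traceback supports this: b counts the rows of df1, but df2.iloc[b, 1] also uses b to index df2. As soon as a match occurs at a position that exists in df1 but not in df2 (for example, when df1 has more rows than df2), iloc raises the IndexError. Below is a minimal sketch of a corrected loop, assuming the intent is to add df2's count to the matching df1 row and to append words that have no match; the function name merge_counts is hypothetical, and pd.concat stands in for DataFrame.append, which was removed in pandas 2.0:

```python
import pandas as pd


def merge_counts(dict1, dict2):
    """Hypothetical helper: add dict2's counts into dict1's rows by word."""
    df1 = pd.DataFrame(list(dict1.items()))
    df2 = pd.DataFrame(list(dict2.items()))
    unmatched = []  # (word, count) pairs from df2 with no match in df1

    for a in range(len(df2)):
        value_to_compare = df2.iloc[a, 0]
        for b in range(len(df1)):
            if value_to_compare == df1.iloc[b, 0]:
                # Read df2 with its own row counter a, not df1's counter b
                df1.iloc[b, 1] = df1.iloc[b, 1] + df2.iloc[a, 1]
                break
        else:
            # for/else: runs only when the inner loop ended without a match
            unmatched.append((value_to_compare, df2.iloc[a, 1]))

    if unmatched:
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        df1 = pd.concat([df1, pd.DataFrame(unmatched)], ignore_index=True)
    return df1


# df1 has more rows than df2 and 'z' matches at position 2 of df1:
# the original df2.iloc[b, 1] would fail here, this version does not.
print(merge_counts({'x': 1, 'y': 1, 'z': 1}, {'z': 2, 'w': 5}))
```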
Wouldn't it be easier to combine the two DataFrames, group by the 'word' column, and then sum the occurance values?
import pandas as pd
data1 = {'word':['labor','predictions','nfl','kids','africa','pandemic','kara','days',
'swisher','event','day','football','office','us','politics'],
'occurance':[4,2,2,2,2,2,2,2,2,2,2,2,2,2,2]}
data2 = {'word':['labor','predictions','nfl','kids','africa','pandemic','kara','days',
'swisher','event','day','us','politics'],
'occurance':[1,2,8,2,2,2,1,2,2,7,2,4,5]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
# Combine the 2 dataframes
combined_df = pd.concat([df1, df2])
# Groupby the word column and sum the occurance column
occurances = combined_df.groupby('word').agg({"occurance": "sum"}).reset_index()
Output:
print(occurances)
word occurance
0 africa 4
1 day 4
2 days 4
3 event 9
4 football 2
5 kara 3
6 kids 4
7 labor 5
8 nfl 10
9 office 2
10 pandemic 4
11 politics 7
12 predictions 4
13 swisher 4
14 us 6
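Since the data starts out as two dictionaries, collections.Counter is another option (an alternative to the groupby approach, not part of it): its + operator sums the counts of shared keys and keeps keys that appear on only one side. One caveat is that Counter addition drops any key whose total is zero or negative, which should not matter for occurrence counts. A sketch using a few of the word counts from the sample data above:

```python
from collections import Counter

# A few of the word counts from the sample DataFrames above
dict1 = {'labor': 4, 'nfl': 2, 'office': 2}
dict2 = {'labor': 1, 'nfl': 8}

# Counter '+' adds counts for shared keys and keeps keys found on one side only
# (keys whose total is zero or negative would be dropped)
totals = Counter(dict1) + Counter(dict2)
print(dict(totals))  # {'labor': 5, 'nfl': 10, 'office': 2}
```

The totals for these three words match the groupby output above.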