Pandas DataFrame lookup for loop: for loop won't stop running

Question

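I am trying to look up the value in a specific column and copy the remaining columns based on that lookup. The problem is that this operation involves more than 20 million rows.

I ran the code, but it had still not finished after about 8 hours, so I stopped it. My questions are:

Is my algorithm correct? If it is, is the reason it never finishes that my algorithm is inefficient?

Here are my code and tables to illustrate: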

Table 1

A	B
12	abc
13	def
28	ghi
50	jkl

Table 2 (the lookup table)

B	C	D
abc	4	7
def	3	3
ghi	6	2
jkl	8	1

Desired result

A	B	C	D
12	abc	4	7
13	def	3	3
28	ghi	6	2
50	jkl	8	1

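So columns C and D should also be added to Table 1, looked up from Table 2 by column B.

The Table 1 values are spread across different CSV files, so I also loop through the files in the folder. I named the directory all_files in the code. So, after the lookup

My code: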

df = pd.DataFrame()

for f in all_files:
    Table1 = pd.read_csv(all_files[f])
    for j in range(len(Table1)):
        u = Table1.loc[j,'B']
            for z in range(len(Table2)):
                if u == Table2.loc[z,'B']:
                    Table1.loc[j,'C'] = Table2.loc[z,'C']
                    Table1.loc[j,'D'] = Table2.loc[z,'D']
                break

    df = pd.concat([df,Table1],axis=0)

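I put that break at the end only to stop the loop once it has found the matching value, after which Table1 is concatenated to df. This code isn't working for me: it keeps looping and never stops.

Can anyone help? Any help is much appreciated!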

Answer 1

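I hope this is the solution you are looking for:

First, I would concatenate all the Table 1 CSVs into a single DataFrame. Then I would merge table_2 into table_1 using column B as the key. Sample code: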

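import pandas as pd

# Read every Table 1 CSV, then concatenate once instead of growing df inside the loop
# (assumes all_files is an iterable of CSV paths)
frames = [pd.read_csv(file) for file in all_files]
df = pd.concat(frames, ignore_index=True)

# A left merge keeps every Table 1 row and pulls in C and D wherever B matches
df_merge = pd.merge(df, table_2, on="B", how="left")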
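
With a left merge, rows whose B value is missing from table_2 silently end up with NaN in C and D. One way to check for that is the indicator and validate parameters of pd.merge. The following is only a sketch on made-up data (the key "zzz" and the names checked and unmatched are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical stand-ins for the concatenated Table 1 and the lookup Table 2
df = pd.DataFrame({"A": [12, 13, 99], "B": ["abc", "def", "zzz"]})
table_2 = pd.DataFrame({"B": ["abc", "def"], "C": [4, 3], "D": [7, 3]})

# validate="many_to_one" raises MergeError if table_2 has duplicate keys in B;
# indicator=True adds a "_merge" column saying where each row came from
checked = pd.merge(df, table_2, on="B", how="left",
                   indicator=True, validate="many_to_one")

# Rows of df whose B value was not found in table_2
unmatched = checked[checked["_merge"] == "left_only"]
```

If unmatched is empty, every row found its lookup values.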

Answer 2

When we use a for-loop with Pandas, there is a 98% chance that we are doing it wrong. Pandas is designed so that you do not have to write loops.
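
To make that concrete: the nested loop in the question is meant to search Table2 for each row of Table1 in Python-level code, whereas an indexed lookup does the matching inside pandas. A minimal sketch using Series.map, assuming the values in column B of table_2 are unique (the sample data mirrors the tables in the question; lookup and result are names chosen for illustration):

```python
import pandas as pd

table_1 = pd.DataFrame({"A": [12, 13, 28, 50], "B": ["abc", "def", "ghi", "jkl"]})
table_2 = pd.DataFrame({"B": ["abc", "def", "ghi", "jkl"], "C": [4, 3, 6, 8], "D": [7, 3, 2, 1]})

# Index the lookup table by its key once (requires unique values in B)
lookup = table_2.set_index("B")

# Series.map looks up every B value against the indexed column in one call
result = table_1.copy()
result["C"] = result["B"].map(lookup["C"])
result["D"] = result["B"].map(lookup["D"])
```

Keys that are absent from the lookup table come back as NaN rather than raising an error.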

Solutions that improve performance:

import pandas as pd

table_1 = pd.DataFrame({'A': [12, 13, 28, 50], 'B': ['abc', 'def', 'ghi', 'jkl']})

table_2 = pd.DataFrame({'B': ['abc', 'def', 'ghi', 'jkl'], 'C': [4, 3, 6, 8], 'D': [7, 3, 2, 1]})

# simple merge
table = pd.merge(table_1, table_2, how='inner', on='B')

# gain speed by using indexing

table_1 = table_1.set_index('B')
table_2 = table_2.set_index('B')

table = pd.merge(table_1, table_2, how='inner', left_index=True, right_index=True)

# there is also join, but it is slower than merge
# (both tables are indexed by B at this point, so join aligns on the index by default)
table = table_1.join(table_2).reset_index()
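
As a sanity check, the three variants above should produce the same rows. The sketch below rebuilds the sample tables and compares the results with pd.testing.assert_frame_equal, which raises if the frames differ. The names by_column, by_index, by_join, t1 and t2 are chosen here for illustration:

```python
import pandas as pd

table_1 = pd.DataFrame({"A": [12, 13, 28, 50], "B": ["abc", "def", "ghi", "jkl"]})
table_2 = pd.DataFrame({"B": ["abc", "def", "ghi", "jkl"], "C": [4, 3, 6, 8], "D": [7, 3, 2, 1]})

# Variant 1: merge on the column B
by_column = pd.merge(table_1, table_2, how="inner", on="B")

# Variants 2 and 3: move B into the index first, then merge or join on it
t1 = table_1.set_index("B")
t2 = table_2.set_index("B")
by_index = pd.merge(t1, t2, how="inner", left_index=True, right_index=True).reset_index()
by_join = t1.join(t2, how="inner").reset_index()

# reset_index puts B first, so align the column order before comparing
cols = ["A", "B", "C", "D"]
pd.testing.assert_frame_equal(by_column[cols], by_index[cols])
pd.testing.assert_frame_equal(by_column[cols], by_join[cols])
```

Which variant is fastest depends on the data, so it is worth timing them on a realistic sample before committing to one.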
