How to search for a string in a pandas DataFrame and match it with another?
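I am trying to compare two string columns from two different pandas DataFrames (A and B). If the strings partially match, I want to assign a value from DataFrame A to a column in DataFrame B.
Here is my code: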
import numpy as np
import pandas as pd
A = ['DF-PI-05', 'DF-PI-09', 'DF-PI-10', 'DF-PI-15', 'DF-PI-16',
     'DF-PI-19', 'DF-PI-89', 'DF-PI-92', 'DF-PI-93', 'DF-PI-94',
     'DF-PI-95', 'DF-PI-96', 'DF-PI-25', 'DF-PI-29', 'DF-PI-30',
     'DF-PI-34', 'DF-PI-84']
B = ['PI-05', 'PI-10', 'PI-89', 'PI-90', 'PI-93', 'PI-94', 'PI-95',
     'PI-96', 'PI-09', 'PI-15', 'PI-16', 'PI-19', 'PI-91A', 'PI-91b',
     'PI-92', 'PI-25-CU', 'PI-29', 'PI-30', 'PI-34', 'PI-84-CU-S1',
     'PI-84-CU-S2']
import random
sample_size = len(A)
Group = [random.randint(0,1) for _ in range(sample_size)]
A = pd.DataFrame(list(zip(A,Group)),columns=['ID','Group'])
B = pd.DataFrame(B,columns=['Name'])
clus_tx = np.array([])
for date, row in B.iterrows():
    for date2, row2 in A.iterrows():
        if row2['ID'] in row['Name']:
            clus = row['Group']
        else:
            clus = 999
        clus_tx = np.append(clus_tx,clus)
B['Group'] = clus_tx
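What I want is an np.array clus_tx with the same length as B. If an element's string matches one in A ('PI-xx'), I take the value of the 'Group' column from A and assign it to B; if no string matches, I assign 999 to the 'Group' column in B.
I think my loop is wrong, because the size of clus_tx is not what I expected... My real dataset is large, so I can't do this by hand.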
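First, clus_tx is not the size you want because you put clus_tx = np.append(clus_tx,clus) inside the innermost loop, with no break. So the length of clus_tx is always len(A) x len(B).
Second, the logic of the if block is not what you want. The test row2['ID'] in row['Name'] asks whether the longer ID (e.g. 'DF-PI-05') is contained in the shorter Name (e.g. 'PI-05'), which is never true for this data, and row['Group'] reads from B, which has no 'Group' column yet.
I changed the code slightly; I hope it helps: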
import random
import numpy as np
import pandas as pd
A = ['DF-PI-05', 'DF-PI-09', 'DF-PI-10', 'DF-PI-15', 'DF-PI-16',
     'DF-PI-19', 'DF-PI-89', 'DF-PI-92', 'DF-PI-93', 'DF-PI-94',
     'DF-PI-95', 'DF-PI-96', 'DF-PI-25', 'DF-PI-29', 'DF-PI-30',
     'DF-PI-34', 'DF-PI-84']
B = ['PI-05', 'PI-10', 'PI-89', 'PI-90', 'PI-93', 'PI-94', 'PI-95',
     'PI-96', 'PI-09', 'PI-15', 'PI-16', 'PI-19', 'PI-91A', 'PI-91b',
     'PI-92', 'PI-25-CU', 'PI-29', 'PI-30', 'PI-34', 'PI-84-CU-S1',
     'PI-84-CU-S2']
sample_size = len(A)
# Random 0/1 group labels, one per ID in A
Group = [random.randint(0,1) for _ in range(sample_size)]
A = pd.DataFrame(list(zip(A,Group)),columns=['ID','Group'])
B = pd.DataFrame(B,columns=['Name'])
clus_tx = np.array([])
for date, row_B in B.iterrows():
    # Default value when no ID in A contains this Name
    clus = 999
    for date2, row_A in A.iterrows():
        # Is the shorter Name contained in the longer ID?
        if row_B['Name'] in row_A['ID']:
            clus = row_A['Group']
            break
    # Append exactly once per row of B, outside the inner loop
    clus_tx = np.append(clus_tx,clus)
B['Group'] = clus_tx
print(B)
The printed output of B looks like this (the Group values are drawn at random, so they vary between runs):
Name Group
0 PI-05 0.0
1 PI-10 0.0
2 PI-89 1.0
3 PI-90 999.0
4 PI-93 0.0
5 PI-94 1.0
6 PI-95 1.0
7 PI-96 0.0
8 PI-09 1.0
9 PI-15 0.0
10 PI-16 1.0
11 PI-19 1.0
12 PI-91A 999.0
13 PI-91b 999.0
14 PI-92 1.0
15 PI-25-CU 999.0
16 PI-29 0.0
17 PI-30 1.0
18 PI-34 0.0
19 PI-84-CU-S1 999.0
20 PI-84-CU-S2 999.0
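Since the real dataset is large, it may be worth avoiding iterrows() and repeated np.append calls, both of which tend to be slow. Below is a minimal sketch of the same first-match logic using plain Python lists and a list comprehension. The helper name first_matching_group and the small fixed Group values are made up for illustration (they replace the random labels so the result is reproducible), and this is only one possible approach, not the only way to do it.

```python
import pandas as pd

# Small deterministic stand-ins for the A and B frames from the question.
# The Group values here are fixed for illustration, not random.
A = pd.DataFrame({"ID": ["DF-PI-05", "DF-PI-09", "DF-PI-25"],
                  "Group": [0, 1, 1]})
B = pd.DataFrame({"Name": ["PI-05", "PI-90", "PI-09", "PI-25-CU"]})


def first_matching_group(name, ids, groups, default=999):
    """Return the Group of the first ID that contains `name`, else `default`.

    Hypothetical helper: it mirrors the inner loop with `break` above.
    """
    for id_, group in zip(ids, groups):
        if name in id_:
            return group
    return default


# Pull the columns out once so the loop works on plain Python lists.
ids = A["ID"].tolist()
groups = A["Group"].tolist()

# One result per row of B, so the new column always has len(B) entries.
B["Group"] = [first_matching_group(name, ids, groups) for name in B["Name"]]
print(B)
```

This is still len(A) x len(B) string comparisons in the worst case, but it builds the result in a single pass instead of copying the array on every append. If every ID in A were known to be exactly 'DF-' plus the Name (an assumption the question does not confirm), an exact-match lookup through a dict or Series.map would scale better.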