遍历一行中的每个项目并与另一行中的每个项目进行比较,然后将结果保存在新的 column_python
Loop over each item in a row and compare with each item from another row then save the result in a new column_python
我想在 python 中循环遍历一行中的每个项目与另一列中对应行中的其他项目。
如果项目不存在于第二列的行中,则应附加到将在另一列中转换的新列表(如果 i 不在 c 中,则附加时也应消除重复项)。
目标是将一列每一行的项目与另一列中对应行的项目进行比较,并将第一列的唯一值保存在新列中相同的 df。
df columns
这只是一个例子,我每行有很多项目
我尝试使用此代码,但没有任何反应,并且列表到列的转换与我测试的不正确
a= df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
c = []
for i in df.values:
for i in a:
if i in a:
if i not in b:
if i not in c:
c.append(i)
print(c)
df['new'] = pd.Series(c)
任何帮助都是多余的,提前致谢
所以看到你有这两个变量,一种方法是:
a= df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
尝试这样的事情:
new = {}
for index, items in enumerate(a):
for thing in items:
if thing not in b[index]:
if index in new:
new[index].append(thing)
else:
new[index] = [thing]
然后将字典映射到df。
df['new'] = df.index.map(new)
有更好的方法,但这应该可行。
这应该是你想要的:
import pandas as pd
data = {'final_key_concat':[['Camiseta', 'Tecnica', 'hombre', 'barate'],
['deportivas', 'calcetin', 'hombres', 'deportivas', 'shoes']],
'attributes_tokenize':[['The', 'North', 'Face', 'manga'], ['deportivas',
'calcetin', 'shoes', 'North']]} #recreated from your image
df = pd.DataFrame(data)
a= df['final_key_concat'].tolist() #this generates a list of lists
b = df['attributes_tokenize'].tolist()#this also generates a list of lists
#Both list a and b need to be flattened so as to access their elements the way you want it
c = [itm for sblst in a for itm in sblst] #flatten list a using list comprehension
d = [itm for sblst in b for itm in sblst] #flatten list b using list comprehension
final_list = [itm for itm in c if itm not in d]#Sort elements common to both list c and d
print (final_list)
Result
['Camiseta', 'Tecnica', 'hombre', 'barate', 'hombres']
def parse_str_into_list(s):
if s.startswith('[') and s.endswith(']'):
return ' '.join(s.strip('[]').strip("'").split("', '"))
return s
def filter_restrict_words(row):
targets = parse_str_into_list(row[0]).split(' ', -1)
restricts = parse_str_into_list(row[1]).split(' ', -1)
print(restricts)
# start for loop each words
# use set type to save words or list if we need to keep words in order
words_to_keep = []
for word in targets:
# condition to keep eligible words
if word not in restricts and 3 < len(word) < 45 and word not in words_to_keep:
words_to_keep.append(word)
print(words_to_keep)
return ' '.join(words_to_keep)
df['FINAL_KEYWORDS'] = df[[col_target, col_restrict]].apply(lambda x: filter_restrict_words(x), axis=1)
我想在 python 中循环遍历一行中的每个项目与另一列中对应行中的其他项目。 如果项目不存在于第二列的行中,则应附加到将在另一列中转换的新列表(如果 i 不在 c 中,则附加时也应消除重复项)。
目标是将一列每一行的项目与另一列中对应行的项目进行比较,并将第一列的唯一值保存在新列中相同的 df。
df columns
这只是一个例子,我每行有很多项目
我尝试使用此代码,但没有任何反应,并且列表到列的转换与我测试的不正确
a= df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
c = []
for i in df.values:
for i in a:
if i in a:
if i not in b:
if i not in c:
c.append(i)
print(c)
df['new'] = pd.Series(c)
任何帮助都是多余的,提前致谢
所以看到你有这两个变量,一种方法是:
a= df['final_key_concat'].tolist()
b = df['attributes_tokenize'].tolist()
尝试这样的事情:
new = {}
for index, items in enumerate(a):
for thing in items:
if thing not in b[index]:
if index in new:
new[index].append(thing)
else:
new[index] = [thing]
然后将字典映射到df。
df['new'] = df.index.map(new)
有更好的方法,但这应该可行。
这应该是你想要的:
import pandas as pd
data = {'final_key_concat':[['Camiseta', 'Tecnica', 'hombre', 'barate'],
['deportivas', 'calcetin', 'hombres', 'deportivas', 'shoes']],
'attributes_tokenize':[['The', 'North', 'Face', 'manga'], ['deportivas',
'calcetin', 'shoes', 'North']]} #recreated from your image
df = pd.DataFrame(data)
a= df['final_key_concat'].tolist() #this generates a list of lists
b = df['attributes_tokenize'].tolist()#this also generates a list of lists
#Both list a and b need to be flattened so as to access their elements the way you want it
c = [itm for sblst in a for itm in sblst] #flatten list a using list comprehension
d = [itm for sblst in b for itm in sblst] #flatten list b using list comprehension
final_list = [itm for itm in c if itm not in d]#Sort elements common to both list c and d
print (final_list)
Result
['Camiseta', 'Tecnica', 'hombre', 'barate', 'hombres']
def parse_str_into_list(s):
if s.startswith('[') and s.endswith(']'):
return ' '.join(s.strip('[]').strip("'").split("', '"))
return s
def filter_restrict_words(row):
targets = parse_str_into_list(row[0]).split(' ', -1)
restricts = parse_str_into_list(row[1]).split(' ', -1)
print(restricts)
# start for loop each words
# use set type to save words or list if we need to keep words in order
words_to_keep = []
for word in targets:
# condition to keep eligible words
if word not in restricts and 3 < len(word) < 45 and word not in words_to_keep:
words_to_keep.append(word)
print(words_to_keep)
return ' '.join(words_to_keep)
df['FINAL_KEYWORDS'] = df[[col_target, col_restrict]].apply(lambda x: filter_restrict_words(x), axis=1)