How to create a pandas column with words from another column, contained in a list
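I would like to extract the words specified in a list from the strings in a pandas column, and build another column from them.
This example is inspired by an earlier question: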
import numpy as np
import pandas as pd

listing = ['test', 'big']
df = pd.DataFrame({'Title':['small test','huge Test', 'big','nothing', np.nan, 'a', 'b']})
df['Test_Flag'] = np.where(df['Title'].str.contains('|'.join(listing), case=False,
                                                   na=False), 'T', '')
print(df)
Title Test_Flag
0 small test T
1 huge Test T
2 big T
3 nothing
4 NaN
5 a
6 b
But what if, instead of "T", I want the actual word from the list that was found? The result would then be:
Title Test_Flag
0 small test test
1 huge Test test
2 big big
3 nothing
4 NaN
5 a
6 b
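The original answer below uses `.apply`. A vectorized alternative is `Series.str.extract`, which returns the text captured by the first group of the first match in each row. This is a minimal sketch that assumes one flag per row is enough. It also assumes whole-word matching is wanted, which is why the pattern uses `\b` anchors, and that rows without a match should get an empty string, as in the desired output above.

```python
import re

import numpy as np
import pandas as pd

listing = ['test', 'big']
df = pd.DataFrame({'Title': ['small test', 'huge Test', 'big', 'nothing', np.nan, 'a', 'b']})

# One capturing group of alternatives; re.escape protects against regex
# metacharacters in the listed words, \b restricts matches to whole words.
pattern = r'\b(' + '|'.join(map(re.escape, listing)) + r')\b'

# str.extract yields the first match as written in the text (e.g. 'Test'),
# or NaN when there is no match or the title is missing; lowercase the
# matches and turn the NaNs into empty strings.
df['Test_Flag'] = (
    df['Title']
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .str.lower()
    .fillna('')
)
print(df)
```

Because of the `\b` anchors, a title such as `'smalltest'` is not flagged. Removing them gives plain substring matching, the same behaviour as `str.contains` in the question.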
Using the .apply method with a custom function should give you what you need:
import pandas as pd
import numpy as np
# The words you want to extract
listing = ['test', 'big']
# The sample DataFrame
df = pd.DataFrame({'Title':['small test','huge Test', 'big','nothing', np.nan, 'a', 'b']})
# Takes a string and the list of words to look for; returns the words found
def listing_splitter(text, listing):
    # try/except handles NaN in the input (a float has no .lower())
    try:
        # Collect every listed word that occurs in the lowercased text
        flags = [l for l in listing if l in text.lower()]
        # If any words were found, return them as a list
        if flags:
            return flags
        # Otherwise return np.nan
        else:
            return np.nan
    except AttributeError:
        return np.nan
# Apply the function to the column
df['Test_Flag'] = df['Title'].apply(lambda x: listing_splitter(x, listing))
df
Output:
Title Test_Flag
0 small test [test]
1 huge Test [test]
2 big [big]
3 nothing NaN
4 NaN NaN
5 a NaN
6 b NaN
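The function above has two quirks. First, `l in text.lower()` is a substring test, so `'test'` is also found inside `'smalltest'`. Second, it returns lists rather than the plain strings shown in the desired output.

The variant below keeps every match but joins the matches into a single string, using `Series.str.findall` and `Series.str.join`. It is a hedged sketch built on these assumptions:
- the `', '` separator,
- whole-word matching,
- the two extra illustrative rows (`'big test'` and `'smalltest'`).

```python
import re

import numpy as np
import pandas as pd

listing = ['test', 'big']
# The last two rows are illustrative additions to show multiple and partial matches
df = pd.DataFrame({'Title': ['small test', 'huge Test', 'big', 'nothing', np.nan,
                             'a', 'b', 'big test', 'smalltest']})

# Non-capturing group so findall returns whole matches; \b = whole words only
pattern = r'\b(?:' + '|'.join(map(re.escape, listing)) + r')\b'

# findall gives a list of all matches per row (NaN for missing titles);
# str.join turns each list into one string ('' for an empty list),
# then lowercase and replace the remaining NaNs with ''.
df['Test_Flag'] = (
    df['Title']
    .str.findall(pattern, flags=re.IGNORECASE)
    .str.join(', ')
    .str.lower()
    .fillna('')
)
print(df)
```

Here `'big test'` becomes `'big, test'` and `'smalltest'` stays empty. If lists are preferred, as in the original answer, drop the `.str.join`, `.str.lower` and `.fillna` steps.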