How to create a pandas column with words from another column, contained in a list

Question

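I want to pick out the words from a pandas string column that appear in a given list, and build another column from them. The example below is inspired by a related question: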

import numpy as np
import pandas as pd

listing = ['test', 'big']
df = pd.DataFrame({'Title': ['small test', 'huge Test', 'big', 'nothing', np.nan, 'a', 'b']})
df['Test_Flag'] = np.where(df['Title'].str.contains('|'.join(listing), case=False, na=False), 'T', '')
print(df)

        Title Test_Flag
0  small test         T
1   huge Test         T
2         big         T
3     nothing
4         NaN
5           a
6           b

But what if, instead of 'T', I want the actual word from the list that was found? That is, a result like this:

        Title Test_Flag
0  small test      test
1   huge Test      test
2         big       big
3     nothing
4         NaN
5           a
6           b

Answer 1

Combining the .apply method with a custom function should do what you need:

import pandas as pd
import numpy as np

# Define the listing list with the words you want to extract
listing  = ['test', 'big']
# Define the DataFrame
df = pd.DataFrame({'Title':['small test','huge Test', 'big','nothing', np.nan, 'a', 'b']})

# Define the function which takes a string and a list of words to extract as inputs
def listing_splitter(text, listing):
    # Try except to handle np.nans in input
    try:
        # Extract the list of flags
        flags = [word for word in listing if word in text.lower()]
        # If any flags were extracted then return the list
        if flags:
            return flags
        # Otherwise return np.nan
        else:
            return np.nan
    except AttributeError:
        return np.nan

# Apply the function to the column
df['Test_Flag'] = df['Title'].apply(lambda x: listing_splitter(x, listing))
df

Output:

    Title       Test_Flag
0   small test  ['test']
1   huge Test   ['test']
2   big         ['big']
3   nothing     NaN
4   NaN         NaN
5   a           NaN
6   b           NaN

Note that this is a substring check, so a title such as 'smalltest' would also produce ['test'].
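
If a plain string per row is wanted (as in the question's desired output) rather than a list, a vectorized alternative may be `Series.str.extract`. The following is only a minimal sketch, assuming that the first matching word per title is enough and that the word should be reported as it is spelled in `listing`; the name `pattern` is just an illustrative variable, not something from the question.

```python
import re

import numpy as np
import pandas as pd

listing = ['test', 'big']
df = pd.DataFrame({'Title': ['small test', 'huge Test', 'big', 'nothing', np.nan, 'a', 'b']})

# One capture group containing every word; re.escape protects against
# regex metacharacters that might appear in the words.
pattern = '(' + '|'.join(map(re.escape, listing)) + ')'

# expand=False returns a Series with the first match per row
# (NaN where nothing matches or the title itself is NaN).
matches = df['Title'].str.extract(pattern, flags=re.IGNORECASE, expand=False)

# Lower-case the match so 'Test' is reported as 'test' (assumes the words
# in `listing` are lower case); use '' for rows without a match.
df['Test_Flag'] = matches.str.lower().fillna('')
print(df)
```

Like the .apply version, this matches substrings. Wrapping the group in word boundaries, for example `r'\b(' + '|'.join(map(re.escape, listing)) + r')\b'`, should restrict it to whole words if that is what you need.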

