pandas: instead of applying the function to df, get the result as a list from the function

Question

I have the following DataFrame:

df = pd.DataFrame({ 'text':['the weather is nice though', 'How are you today','the beautiful girl and the nice boy'],
'pos':["['DET', 'NOUN', 'VERB','ADJ', 'ADV']","['QUA', 'VERB', 'PRON', 'ADV']", "['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]"]})

I have a function that returns every word whose pos tag is 'ADJ', together with its index in the sentence, as shown below:

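import ast
import pandas as pd

def extract_words(row):
    # Map each adjective in the row's text to its position in the sentence.
    word_pos = {}
    text_splited = row.text.split()
    pos = ast.literal_eval(row.pos)  # the 'pos' column stores each list as a string
    for i, p in enumerate(pos):
        if p == 'ADJ':
            word_pos[text_splited[i]] = i
    return word_pos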

df['Third_column'] = ' '
df['Third_column'] = df.apply(extract_words, axis=1)

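What I would like to do is refactor the function so that I no longer have to apply it to df from outside the function; instead, it should append its results to a list defined outside the function. So far I have tried this: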

list_word_index = []

def extract_words(dataframe):
    for li in dataframe.text.str.split():
        for lis in dataframe.pos:
            for i, p in enumerate(ast.literal_eval(lis)):
                if p == 'nk':
                    ...
                    list_word_index.append(...)

extract_words(df)

I don't know how to fill in the ... part of the code.

Answer 1

Given your DataFrame, you can have the function return a list (note that pos holds real lists here rather than strings):

import pandas as pd
from typing import List

df = pd.DataFrame({ 'text':['the weather is nice though', 'How are you today','the beautiful girl and the nice boy'],
'pos':[['DET', 'NOUN', 'VERB','ADJ', 'ADV'],['QUA', 'VERB', 'PRON', 'ADV'], ['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]]})


def extract_words_to_list(df: pd.DataFrame) -> List:
    # iterate over dataframe row-wise
    tmp = []
    for _, row in df.iterrows():
        word_pos = {}
        text_splited = row.text.split()
        for i, p in enumerate(row.pos):
            if p == 'ADJ':
                word_pos[text_splited[i]] = i
        tmp.append(word_pos)
    return tmp

list_word_index = extract_words_to_list(df)
list_word_index # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
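The DataFrame in this answer stores pos as real lists, whereas the question's DataFrame stores each list as a string. A minimal sketch of converting the question's column first, assuming every cell is a valid Python list literal:

```python
import ast
import pandas as pd

# The question's DataFrame: each 'pos' cell is a string that looks like a list.
df = pd.DataFrame({
    'text': ['the weather is nice though', 'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': ["['DET', 'NOUN', 'VERB','ADJ', 'ADV']",
            "['QUA', 'VERB', 'PRON', 'ADV']",
            "['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]"],
})

# Parse each string into a real list; literal_eval only evaluates literals,
# so it does not execute arbitrary code the way eval would.
df['pos'] = df['pos'].apply(ast.literal_eval)
```

After this step, a function that iterates over row.pos directly should work without any further parsing.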

Alternatively, you could simply use:

df['Third_column'] = df.apply(extract_words, axis=1)
df['Third_column'].tolist() # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]

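to achieve the same result. Note that this reuses the question's extract_words, which calls ast.literal_eval and therefore expects pos to hold strings, as in the question's original DataFrame.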
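One caveat with both variants: a dict keyed by word keeps only the last index if the same adjective appears twice in a sentence. If that matters, a possible alternative is to collect (word, index) pairs instead. The sketch below assumes pos already holds real lists; extract_adj_pairs is a hypothetical helper name, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['the weather is nice though', 'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})

def extract_adj_pairs(df):
    # Hypothetical helper: one list of (word, index) pairs per row.
    result = []
    # Iterate over the two columns in parallel instead of using iterrows().
    for text, tags in zip(df['text'], df['pos']):
        words = text.split()
        # zip stops at the shorter sequence, so a row with more tags
        # than words is truncated rather than raising IndexError.
        result.append([(word, i)
                       for i, (word, tag) in enumerate(zip(words, tags))
                       if tag == 'ADJ'])
    return result

pairs = extract_adj_pairs(df)
print(pairs)  # [[('nice', 3)], [], [('beautiful', 1), ('nice', 5)]]
```

Whether pairs or a dict is the better shape depends on how the result is consumed later; the dict is handier for lookups by word, the pairs preserve repeated adjectives.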
