pandas: instead of applying the function to df, get the result as a list from the function
I have the following DataFrame:
df = pd.DataFrame({ 'text':['the weather is nice though', 'How are you today','the beautiful girl and the nice boy'],
'pos':["['DET', 'NOUN', 'VERB','ADJ', 'ADV']","['QUA', 'VERB', 'PRON', 'ADV']", "['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]"]})
I have a function that, wherever pos == 'ADJ', outputs the corresponding word and its index, as follows:
import ast
import pandas as pd

def extract_words(row):
    word_pos = {}
    text_splited = row.text.split()
    # the pos column stores each list as a string, so parse it first
    pos = ast.literal_eval(row.pos)
    for i, p in enumerate(pos):
        if p == 'ADJ':
            word_pos[text_splited[i]] = i
    return word_pos

df['Third_column'] = ' '
df['Third_column'] = df.apply(extract_words, axis=1)
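What I want to do is refactor the function so that I don't have to apply it to the df from outside the function, and can instead append the results to a list defined outside the function. So far I have tried this: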
list_word_index = []

def extract_words(dataframe):
    for li in dataframe.text.str.split():
        for lis in dataframe.pos:
            for i, p in enumerate(ast.literal_eval(lis)):
                if p == 'nk':
                    ...
                    list_word_index.append(...)

extract_words(df)
I don't know how to fill in the ... part of the code.
Given your DataFrame, you can have the function return a list:
from typing import List
import pandas as pd

# note: here pos holds real lists, not string representations of lists
df = pd.DataFrame({ 'text':['the weather is nice though', 'How are you today','the beautiful girl and the nice boy'],
    'pos':[['DET', 'NOUN', 'VERB','ADJ', 'ADV'],['QUA', 'VERB', 'PRON', 'ADV'], ['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]]})

def extract_words_to_list(df: pd.DataFrame) -> List:
    tmp = []
    # iterate over dataframe row-wise
    for _, row in df.iterrows():
        word_pos = {}
        text_splited = row.text.split()
        for i, p in enumerate(row.pos):
            if p == 'ADJ':
                word_pos[text_splited[i]] = i
        # one dict per row, empty if the row has no 'ADJ'
        tmp.append(word_pos)
    return tmp

list_word_index = extract_words_to_list(df)
list_word_index # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
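Note that the DataFrame in this answer stores pos as real lists, whereas the question stores each list as a string. A minimal sketch of the same idea for the string case follows; the names df_str and extract_words_from_str_pos are made up for illustration, and it assumes every pos string is a valid Python list literal:

```python
import ast
import pandas as pd

# The question's DataFrame: 'pos' holds string representations of lists.
df_str = pd.DataFrame({
    'text': ['the weather is nice though', 'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': ["['DET', 'NOUN', 'VERB','ADJ', 'ADV']",
            "['QUA', 'VERB', 'PRON', 'ADV']",
            "['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]"],
})

def extract_words_from_str_pos(df, tag='ADJ'):
    """Return one {word: index} dict per row for tokens tagged `tag`."""
    results = []
    for _, row in df.iterrows():
        words = row['text'].split()
        # Parse the string into a real list before enumerating it.
        tags = ast.literal_eval(row['pos'])
        results.append({words[i]: i for i, p in enumerate(tags) if p == tag})
    return results

list_word_index = extract_words_from_str_pos(df_str)
print(list_word_index)  # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
```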
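Although you could also just use the original extract_words with apply (it calls ast.literal_eval, so it expects pos to hold strings, as in the question's DataFrame):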
df['Third_column'] = df.apply(extract_words, axis=1)
df['Third_column'].tolist() # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
to achieve the same result.
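If the goal is specifically to append to a list that lives outside the function, one possible variant is to pass that list in and iterate over the two columns with zip instead of iterrows. This is only a sketch: collect_tagged_words, df_lists and collected are hypothetical names, and it assumes pos already holds real lists.

```python
import pandas as pd

# Assumes 'pos' holds real lists, as in the answer's DataFrame.
df_lists = pd.DataFrame({
    'text': ['the weather is nice though', 'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})

def collect_tagged_words(df, out, tag='ADJ'):
    """Append one {word: index} dict per row to `out`, a caller-owned list."""
    for text, tags in zip(df['text'], df['pos']):
        words = text.split()
        # zip stops at the shorter sequence, so a row whose token count and
        # tag count differ does not raise an IndexError.
        out.append({w: i for i, (w, p) in enumerate(zip(words, tags)) if p == tag})

collected = []
collect_tagged_words(df_lists, collected)
print(collected)  # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
```

Because each result is a dict keyed by word, a row in which the same tagged word appears twice keeps only the last index. If that matters, a list of (word, index) tuples per row would preserve every occurrence.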