CleanTextEmptyString: No text is provided to clean. Applying clean() to each row in a DataFrame
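I am trying to apply the cleantext function to every row of a DataFrame column.
It works perfectly when I call it without apply, and I get the result I want.
Here is where the problem occurs: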
import cleantext
from cleantext import clean
master_df_m['col'] = master_df_m.Presentation.apply(lambda row: clean(row))
CleanTextEmptyString: No text is provided to clean
This works fine:
print(clean(master_df_m.Presentation[0], clean_all=True))
Output:
oper good morn name janeka confer oper time would like welcom everyon comerica second quarter earn call line place mute prevent background nois speaker remark questionandansw session oper instruct thank would like turn call ms darlen person director investor relat may begin darlen person comerica incorpor director ir thank janeka good morn welcom comerica
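What is going wrong? I also tried passing axis=1 inside the apply call.
Assuming your DataFrame does not contain any empty strings, you could try something like this: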
from cleantext import clean
import pandas as pd
df = pd.DataFrame(data={'Presentation': [' This is some kind of sentence', ' This is anoTher! kind of sentence']})
df['cleaned_text'] = df.Presentation.apply(clean)
Output:
                         Presentation        cleaned_text
0       This is some kind of sentence        kind sentenc
1   This is anoTher! kind of sentence  anoth kind sentenc
If you want to overwrite your Presentation column, just assign to df['Presentation'] instead. Alternatively, use map:
df['Presentation'] = df['Presentation'].map(clean)
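Whether the column really has no empty entries can be checked before calling clean. The following is a minimal sketch using only pandas. The sample DataFrame is made up for illustration, and counting whitespace-only strings as blank is an assumption, since nothing above says how cleantext treats them.

```python
import pandas as pd

# Made-up sample data: one normal row plus three kinds of "blank" rows.
df = pd.DataFrame({"Presentation": [" This is some kind of sentence", "", None, "   "]})

# Treat missing values as empty strings, then strip whitespace and compare.
is_blank = df["Presentation"].fillna("").str.strip() == ""

# Index labels of the rows that have no usable text.
blank_rows = df.index[is_blank].tolist()
print(blank_rows)
```

If blank_rows is not empty, those rows are likely candidates for triggering the CleanTextEmptyString error.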
Update 1:
If your DataFrame does contain empty strings, try something like this:
df = pd.DataFrame(data={'Presentation': [' This is some kind of sentence', ' This is anoTher! kind of sentence', ""]})
# Note: 'NaN' here is a plain placeholder string, not a real missing value
df = df.replace('', 'NaN')
# or df.loc[df.Presentation == '', 'Presentation'] = 'NaN'
df['Presentation'] = df['Presentation'].map(clean)
Or clean only the non-empty rows (the skipped rows become NaN through index alignment):
df['Presentation'] = df.loc[df.Presentation != '', 'Presentation'].map(clean)
         Presentation
0        kind sentenc
1  anoth kind sentenc
2                 NaN
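Another option is to leave blank or missing entries untouched instead of replacing them with a placeholder. The sketch below assumes the error is raised for empty input, as its message suggests. `toy_clean` and `EmptyTextError` are hypothetical stand-ins so the example runs without cleantext installed, and `safe_clean` is an illustrative helper, not part of the library.

```python
class EmptyTextError(ValueError):
    """Hypothetical stand-in for cleantext's CleanTextEmptyString."""


def toy_clean(text):
    """Hypothetical stand-in for cleantext.clean: lowercase and collapse whitespace."""
    if not text:
        raise EmptyTextError("No text is provided to clean")
    return " ".join(text.lower().split())


def safe_clean(value, cleaner=toy_clean):
    """Clean non-blank strings; return anything else (None, NaN, '') unchanged."""
    if not isinstance(value, str) or not value.strip():
        return value
    return cleaner(value)


rows = [" This is some kind of sentence", "", None, float("nan")]
cleaned = [safe_clean(r) for r in rows]
print(cleaned)
```

With the real library, something like df['Presentation'].map(lambda v: safe_clean(v, cleaner=clean)) should apply the same guard, assuming clean accepts a single string argument as in the question.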
Here is a simple approach:
from cleantext import clean
for col in master_df_m.columns:
    # The lambda leaves room for extra keyword arguments to clean()
    master_df_m[col] = master_df_m[col].apply(lambda word: clean(word))
This lets you pass additional arguments to clean() as needed.
https://pypi.org/project/cleantext/