Exception handling when applying function to pandas df

Question

I want to use the Python package textblob for some language detection. I created a new column in a pandas df that should contain the detected language:

from textblob import TextBlob
posts['Language']=posts['Caption'].apply(TextBlob.detect_language)

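This code works. However, for one df it breaks with an exception ('TranslatorError') on rows that contain fewer than 3 characters. So I want to write a function that makes sure 'TextBlob.detect_language' is applied to the whole df even when an exception occurs.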
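A minimal sketch of the failure mode and the per-row fix, assuming pandas and NumPy are installed. Because `detect_language` needs a network call, `fake_detect` and `TranslatorError` here are hypothetical stand-ins that mimic the reported behavior (an exception for strings shorter than 3 characters). It shows that one bad row aborts the whole `apply`, while catching the error inside the applied function lets every row be processed.

```python
import numpy as np
import pandas as pd


class TranslatorError(Exception):
    """Hypothetical stand-in for textblob's TranslatorError."""


def fake_detect(text):
    # Hypothetical detector mimicking the reported failure on short strings.
    if len(text) < 3:
        raise TranslatorError("Must provide a string with at least 3 characters.")
    return "en"


posts = pd.DataFrame({"Caption": ["hello world", "ok", "good morning"]})

# Without handling, one bad row aborts the whole apply and no column is created.
try:
    posts["Language"] = posts["Caption"].apply(fake_detect)
except TranslatorError:
    pass
column_created = "Language" in posts.columns


def safe_detect(text):
    # Catch the error per row so apply can finish; failed rows become NaN.
    try:
        return fake_detect(text)
    except TranslatorError:
        return np.nan


posts["Language"] = posts["Caption"].apply(safe_detect)
print(column_created)
print(posts)
```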

I thought of something like this:

def get_language(r):
    try:
        return r.TextBlob.detect_language()
    # except (r.TextBlob.detect_language==TranslatorError):
        return np.nan # where textblob was not able to detect language -> nan

However, I don't know what to write after the (commented-out) "except" clause. Any help?
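In Python the `except` clause names an exception class, or a tuple of classes, rather than a comparison such as `r.TextBlob.detect_language==TranslatorError`. A plain-Python sketch, where `TranslatorError` and `detect` are hypothetical stand-ins for the textblob objects:

```python
import math


class TranslatorError(Exception):
    """Hypothetical stand-in for textblob's TranslatorError."""


def detect(text):
    # Hypothetical detector that fails on very short input.
    if len(text) < 3:
        raise TranslatorError("text too short")
    return "en"


def get_language(text):
    try:
        return detect(text)
    except (TranslatorError, TypeError):
        # Only the listed exception classes are caught; anything else propagates.
        return math.nan


results = [get_language(t) for t in ["hello", "_", None]]
print(results)
```

Here `"_"` triggers the stand-in `TranslatorError` and `None` triggers a `TypeError` from `len`, so both fall back to NaN.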

Applying the current function (ignoring the commented-out line)

posts['Language']=posts['Caption'].apply(get_language)

returns

AttributeError: 'TextBlob' object has no attribute 'TextBlob'
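The error message suggests each cell passed in by `apply` is already a `TextBlob` object, so `r.TextBlob` looks up an attribute named `TextBlob` on the blob itself. A sketch with a hypothetical `Blob` class standing in for `TextBlob`, showing the failing lookup next to calling the method on `r` directly:

```python
class Blob:
    """Hypothetical stand-in for a TextBlob instance."""

    def __init__(self, raw):
        self.raw = raw

    def detect_language(self):
        return "en"


r = Blob("hello")

try:
    r.Blob.detect_language()  # looks up an attribute called "Blob" on the instance
except AttributeError as exc:
    error_message = str(exc)

language = r.detect_language()  # call the method on the object itself
print(error_message)
print(language)
```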

If I try

def get_language(r):
    try:
        return r.TextBlob.detect_language()
    except:
        pass # (or np.nan)

it just passes over every row, i.e. it doesn't detect the language for any row...
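Every row comes back empty because the `try` body always raises the same AttributeError as before, the broad `except` swallows it, and `pass` lets the function end without a `return`, which yields `None`. A sketch with a hypothetical `Blob` stand-in contrasting that version with one that calls the method on the object itself:

```python
class Blob:
    """Hypothetical stand-in for a TextBlob instance."""

    def detect_language(self):
        return "en"


def get_language_hidden(r):
    try:
        return r.Blob.detect_language()  # always raises AttributeError
    except Exception:  # swallows the AttributeError, like the bare except above
        pass  # nothing is returned, so the function returns None


def get_language_fixed(r):
    try:
        return r.detect_language()  # call the method on the object itself
    except Exception:
        return None  # only reached if detection itself fails


blobs = [Blob(), Blob()]
hidden = [get_language_hidden(b) for b in blobs]
fixed = [get_language_fixed(b) for b in blobs]
print(hidden)
print(fixed)
```

Catching a broad exception class makes bugs like the attribute typo indistinguishable from genuine detection failures, which is why naming the specific exception is usually safer.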

Thanks for your help!

Answer 1

See below:

from textblob import TextBlob
import pandas

def detect_language(text):
    try:
        b = TextBlob(text)
        return b.detect_language()
    except Exception:
        # Detection failed, e.g. TranslatorError for strings shorter than 3 characters
        return "Language Not Detected"

df = pandas.DataFrame(data=[("na", "hello"), ("na", "bonjour"), ("na", "_")], columns=['Language', 'Caption'])
df['Language'] = df['Caption'].apply(detect_language)
print(df)
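If several columns need the same treatment, the try/except can be factored into a reusable wrapper. This is a sketch under stated assumptions, not part of the answer above: `safe` is a hypothetical helper name, and because `detect_language` calls an online service (and was deprecated in later TextBlob releases), `detect` uses a hypothetical lookup table `KNOWN` and raises `ValueError` on short input instead.

```python
import functools

import numpy as np
import pandas as pd


def safe(default=np.nan, exceptions=(Exception,)):
    """Hypothetical helper: wrap func so the listed exceptions yield `default`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exceptions:
                return default
        return wrapper
    return decorator


KNOWN = {"hello": "en", "bonjour": "fr"}  # hypothetical lookup table


@safe(default="Language Not Detected", exceptions=(ValueError,))
def detect(text):
    # Stand-in for TextBlob(text).detect_language(): fails on short input.
    if len(text) < 3:
        raise ValueError("Must provide a string with at least 3 characters.")
    return KNOWN[text]


df = pd.DataFrame({"Caption": ["hello", "bonjour", "_"]})
df["Language"] = df["Caption"].apply(detect)
print(df)
```

Passing the exception classes explicitly keeps unrelated errors (here, a `KeyError` for an unknown word) visible instead of silently mapping them to the default.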

Tags: exception-handling, function, python-3.x, pandas, textblob