How to extract the first word from a DataFrame

Background

I combined two Kaggle datasets to create the DataFrame below.

Titanic: Machine Learning from Disaster (input/titanic/train.csv)

titanic-nationalities

DataFrame name: output

    PassengerId Nationality Name
0   1   CelticEnglish   Braund, Mr. Owen Harris
1   2   CelticEnglish   Cumings, Mrs. John Bradley (Florence Briggs Th...
2   3   Nordic,Scandinavian,Sweden  Heikkinen, Miss. Laina
3   4   CelticEnglish   Futrelle, Mrs. Jacques Heath (Lily May Peel)
....

Desired result

    PassengerId Nationality Name
0   1   CelticEnglish   Braund
1   2   CelticEnglish   Cumings
2   3   Nordic  Heikkinen
3   4   CelticEnglish   Futrelle
....

Problem

I tried to run the code below, but it raises an error and I don't know how to fix it.

Error

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
----> 1 output['Nationality'].split('\n', 1)[0]
      2 output['Name'].split('\n', 1)[0]

/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   5137             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5138                 return self[name]
-> 5139             return object.__getattribute__(self, name)
   5140 
   5141     def __setattr__(self, name: str, value) -> None:

AttributeError: 'Series' object has no attribute 'split'

Code

output['Nationality'].split('\n', 1)[0]
output['Name'].split('\n', 1)[0]

What I tried

I tried converting the column types, but the result did not change.

output['Nationality'] = output['Nationality'].astype(str)
output['Name'] = output['Name'].astype(str)

output['Nationality'] = output['Nationality'].str.split('\n', expand=True)[0]
output['Name'] = output['Name'].str.split('\n', expand=True)[0]
output
PassengerId Nationality Name
0   1   CelticEnglish   Braund, Mr. Owen Harris
1   2   CelticEnglish   Cumings, Mrs. John Bradley (Florence Briggs Th...
2   3   Nordic,Scandinavian,Sweden  Heikkinen, Miss. Laina
3   4   CelticEnglish   Futrelle, Mrs. Jacques Heath (Lily May Peel)

Environment

Kaggle Notebook

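A Series object has no split method. You are trying to split strings, so first make sure the column's data type is string (or expand the column into multiple columns), then apply the split.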
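A small sketch of why the original call fails, using a hypothetical two-row Series in place of the real output['Name'] column: split is a method of Python str, and pandas makes it available on a Series through the .str accessor.

```python
import pandas as pd

# Hypothetical stand-in for output['Name']; values copied from the question.
names = pd.Series(["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"])

# A single element is a Python string and has split(); the Series does not.
has_split_on_str = hasattr(names.iloc[0], "split")
has_split_on_series = hasattr(names, "split")
print(has_split_on_str, has_split_on_series)

# The .str accessor applies the string method to every element.
parts = names.str.split(",")
print(parts.tolist())
```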

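Use df.dtypes to check the data type of each column.

Use output['Nationality'].astype(str) to set the data type to string.

Edit: dtypes is an attribute, so it is accessed without parentheses.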
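To make the dtype check concrete, here is a minimal sketch on a hypothetical two-row frame named sample (the real output frame comes from the Kaggle CSV files). Text columns read from a CSV typically already hold strings, so astype(str) on them is usually not what makes .str methods work.

```python
import pandas as pd

# Hypothetical sample frame; values copied from the question.
sample = pd.DataFrame({
    "PassengerId": [1, 3],
    "Nationality": ["CelticEnglish", "Nordic,Scandinavian,Sweden"],
})

# dtypes is an attribute, not a method: no parentheses.
dtypes = sample.dtypes
print(dtypes)

# astype(str) returns a new Series; assign it back to keep the change.
sample["PassengerId"] = sample["PassengerId"].astype(str)
print(sample["PassengerId"].tolist())
```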

Try .str.split():

# The values are separated by commas, so split on ',' rather than '\n'
output['Nationality'] = output['Nationality'].str.split(',', expand=True)[0]
output['Name'] = output['Name'].str.split(',', expand=True)[0]
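The values in the question are separated by commas, not newlines, so splitting on '\n' (as in the attempt shown in the question) returns each value unchanged. A minimal sketch, assuming the goal is the text before the first comma, with three sample rows copied from the question standing in for the real output frame:

```python
import pandas as pd

# Sample rows from the question; the real frame is built from the Kaggle data.
output = pd.DataFrame({
    "PassengerId": [1, 3, 4],
    "Nationality": ["CelticEnglish", "Nordic,Scandinavian,Sweden", "CelticEnglish"],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    ],
})

for col in ["Nationality", "Name"]:
    # Split once at the first comma, keep the left-hand piece, trim whitespace.
    output[col] = output[col].str.split(",", n=1).str[0].str.strip()

print(output)
```

Values without a comma, such as CelticEnglish, pass through unchanged because the split yields a single-element list.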