How to extract the first word from DataFrame columns
Background
I combined two datasets from Kaggle to create the following DataFrame.
Titanic: Machine Learning from Disaster (input/titanic/train.csv)
DataFrame name: output
   PassengerId  Nationality                 Name
0  1            CelticEnglish               Braund, Mr. Owen Harris
1  2            CelticEnglish               Cumings, Mrs. John Bradley (Florence Briggs Th...
2  3            Nordic,Scandinavian,Sweden  Heikkinen, Miss. Laina
3  4            CelticEnglish               Futrelle, Mrs. Jacques Heath (Lily May Peel)
....
What I want to get
   PassengerId  Nationality    Name
0  1            CelticEnglish  Braund
1  2            CelticEnglish  Cumings
2  3            Nordic         Heikkinen
3  4            CelticEnglish  Futrelle
....
Problem
I tried to run the code below, but it raises an error and I don't know how to fix it.
Error
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
----> 1 output['Nationality'].split('\n', 1)[0]
2 output['Name'].split('\n', 1)[0]
/opt/conda/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5137 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5138 return self[name]
-> 5139 return object.__getattribute__(self, name)
5140
5141 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'split'
Code
output['Nationality'].split('\n', 1)[0]
output['Name'].split('\n', 1)[0]
What I tried
I tried casting the columns to str and splitting through .str, but the result did not change.
output['Nationality'] = output['Nationality'].astype(str)
output['Name'] = output['Name'].astype(str)
output['Nationality'] = output['Nationality'].str.split('\n', expand=True)[0]
output['Name'] = output['Name'].str.split('\n', expand=True)[0]
output
   PassengerId  Nationality                 Name
0  1            CelticEnglish               Braund, Mr. Owen Harris
1  2            CelticEnglish               Cumings, Mrs. John Bradley (Florence Briggs Th...
2  3            Nordic,Scandinavian,Sweden  Heikkinen, Miss. Laina
3  4            CelticEnglish               Futrelle, Mrs. Jacques Heath (Lily May Peel)
Environment
Kaggle Notebook
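A Series object has no split method; split belongs to individual Python strings. You are trying to split strings, so make sure the column actually holds strings (convert it if necessary) and then apply the split through the Series .str accessor, which works on each element.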
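To illustrate the difference, here is a minimal sketch. The Series `s` and its values are made up for the example and are not your actual data. It shows that calling split directly raises the same AttributeError, while going through .str applies the split to each element.

```python
import pandas as pd

# Hypothetical sample values, not the asker's actual data.
s = pd.Series(["Nordic,Scandinavian,Sweden", "CelticEnglish"])

# Calling a string method directly on the Series fails,
# because split is defined on str, not on Series.
try:
    s.split(",")
    direct_call_failed = False
except AttributeError:
    direct_call_failed = True

# The .str accessor applies the string method to every element.
parts = s.str.split(",")   # each element becomes a list of pieces
first = parts.str[0]       # .str[0] picks the first piece of each list

print(direct_call_failed)
print(first.tolist())
```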
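Use df.dtypes to check the data type of each column.
Use output['Nationality'].astype(str) to convert a column to strings. astype returns a new Series, so assign the result back to the column.
Edit: dtypes is an attribute, so it is accessed without parentheses.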
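A short sketch of that check, assuming a small made-up frame `df` (the column names follow the question, the values are only for illustration):

```python
import pandas as pd

# Hypothetical frame; values are illustrative only.
df = pd.DataFrame({
    "PassengerId": [1, 2],
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"],
})

# dtypes is an attribute (no parentheses); it reports one dtype per column.
print(df.dtypes)
before = df["PassengerId"].dtype

# astype returns a new Series, so assign it back to keep the conversion.
df["PassengerId"] = df["PassengerId"].astype(str)
print(df["PassengerId"].tolist())
```

Note that converting to str only matters when a column is not already text. It does not by itself change what .str.split returns.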
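Try .str.split(). The values in your sample are separated by commas, not newlines, which is why splitting on '\n' leaves them unchanged. Split on ',' instead:
output['Nationality'] = output['Nationality'].str.split(',', expand=True)[0]
output['Name'] = output['Name'].str.split(',', expand=True)[0]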
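Putting it together, here is a sketch that rebuilds the four sample rows from the question and keeps only the part before the first comma. It assumes the comma is the separator in both columns, which matches the sample shown but may not hold for every row of the real data. The .str.strip() call is an extra precaution against stray whitespace, not something the question requires.

```python
import pandas as pd

# Sample rows copied from the question; the full Kaggle data is not loaded here.
output = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4],
    "Nationality": [
        "CelticEnglish",
        "CelticEnglish",
        "Nordic,Scandinavian,Sweden",
        "CelticEnglish",
    ],
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Th...",
        "Heikkinen, Miss. Laina",
        "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
    ],
})

for col in ["Nationality", "Name"]:
    # Split at most once on the first comma and keep the piece before it.
    # Values without a comma are returned unchanged.
    output[col] = output[col].str.split(",", n=1).str[0].str.strip()

print(output)
```

Using .str[0] instead of expand=True avoids building an intermediate DataFrame when only the first piece is needed. Both approaches give the same first column.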