Pandas - Data Cleaning - Add a new column with an if/else statement for text values
I have a list of publishers that looks like this:
+--------------+
| Site Name |
+--------------+
| Radium One |
| Euronews |
| EUROSPORT |
| WIRED |
| RadiumOne |
| Eurosport FR |
| Wired US |
| Eurosport |
| EuroNews |
| Wired |
+--------------+
I want to create the following result:
+--------------+----------------+
| Site Name | Publisher Name |
+--------------+----------------+
| Radium One | RadiumOne |
| Euronews | Euronews |
| EUROSPORT | Eurosport |
| WIRED | Wired |
| RadiumOne | RadiumOne |
| Eurosport FR | Eurosport |
| Wired US | Wired |
| Eurosport | Eurosport |
| EuroNews | Euronews |
| Wired | Wired |
+--------------+----------------+
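I would like to know how to replicate this code that I used in Power Query:
Search the first 4 characters:
if Text.Start([Site Name],4) = "WIRE" then "Wired" else
Search the last 3 characters:
if Text.End([Site Name],3) = "One" then "RadiumOne" else
If no match is found, add "Rest".
The match does not have to be case sensitive.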
You can use the apply method with a function, for example:
def handle_text(txt):
    # Compare in lower case so the match is case-insensitive
    if txt.lower()[:4] == 'wire':
        return 'Wired'
    elif txt.lower()[-3:] == 'one':
        return 'RadiumOne'
    return 'Rest'

df['Publisher Name'] = df['Site Name'].apply(handle_text)
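To try the apply approach end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption rebuilt from the table in the question, and str.startswith / str.endswith are swapped in for the slicing above; they should behave the same for these fixed-length prefixes and suffixes.

```python
import pandas as pd

# Sample data rebuilt from the question's table (an assumption, not the asker's real data)
df = pd.DataFrame({'Site Name': [
    'Radium One', 'Euronews', 'EUROSPORT', 'WIRED', 'RadiumOne',
    'Eurosport FR', 'Wired US', 'Eurosport', 'EuroNews', 'Wired',
]})

def handle_text(txt):
    # Lower-case once so both checks are case-insensitive
    lowered = txt.lower()
    if lowered.startswith('wire'):
        return 'Wired'
    if lowered.endswith('one'):
        return 'RadiumOne'
    return 'Rest'

df['Publisher Name'] = df['Site Name'].apply(handle_text)
print(df)
```

Rows that match neither rule get 'Rest', exactly as in the Power Query logic, so this does not yet produce 'Euronews' or 'Eurosport'.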
I think you can use a nested numpy.where, with conditions created by indexing with str:
import numpy as np

# Lower-case copy of the column for case-insensitive comparison
s = df['Site Name'].str.lower()
df['new'] = np.where(s.str[:4] == 'wire', 'Wired',
            np.where(s.str[-3:] == 'one', 'RadiumOne', 'Rest'))
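But if you need your expected output, you also need split and title:
# Fall back to the first word of the name in title case instead of 'Rest'
df['new1'] = np.where(s.str[:4] == 'wire', 'Wired',
             np.where(s.str[-3:] == 'one', 'RadiumOne', s.str.split().str[0].str.title()))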
print(df)
      Site Name        new       new1
0    Radium One  RadiumOne  RadiumOne
1      Euronews       Rest   Euronews
2     EUROSPORT       Rest  Eurosport
3         WIRED      Wired      Wired
4     RadiumOne  RadiumOne  RadiumOne
5  Eurosport FR       Rest  Eurosport
6      Wired US      Wired      Wired
7     Eurosport       Rest  Eurosport
8      EuroNews       Rest   Euronews
9         Wired      Wired      Wired
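If more rules are added later, nesting numpy.where calls gets hard to read. One possible alternative, not part of the answer above, is numpy.select, which takes a list of conditions and a matching list of choices and uses the first condition that is true. The sketch below assumes the sample data from the question; the names conditions, choices and fallback are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (an assumption)
df = pd.DataFrame({'Site Name': [
    'Radium One', 'Euronews', 'EUROSPORT', 'WIRED', 'RadiumOne',
    'Eurosport FR', 'Wired US', 'Eurosport', 'EuroNews', 'Wired',
]})

s = df['Site Name'].str.lower()

# Conditions are checked in order; the first True one wins
conditions = [s.str.startswith('wire'), s.str.endswith('one')]
choices = ['Wired', 'RadiumOne']

# Fallback when no rule matches: first word of the name, title-cased
fallback = s.str.split().str[0].str.title()

df['Publisher Name'] = np.select(conditions, choices, default=fallback)
print(df)
```

Adding another publisher rule then only means appending one entry to conditions and one to choices.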