Split one column into two columns with python pandas

Question

I have a DataFrame of cities, df, that looks like this:

| id | location         |
|----|------------------|
| 1  | New York (NY)    |
| 2  | Los Angeles (CA) |
| 3  | Houston (TX)     |

I would like to use some kind of split/strip to get something like:

| id | city             | state |
|----|------------------|-------|
| 1  | New York         |   NY  |
| 2  | Los Angeles      |   CA  |
| 3  | Houston          |   TX  |

Or even three columns: the original one plus the two produced by the code. I have already tried something like this:

df[['city', 'state']] = df['location'].str.split("(", expand=True)
df['state'] = df['state'].str.strip(")")

This works, but not quite, because every city name ends up with a trailing space that should not be there. If I search for a city, for example:

df[df['city'] == 'Houston']

it returns nothing. Instead I have to write code like this:

df[df['city'] == 'Houston ']  # note the trailing space after the city name

to get anything useful, and that becomes a headache as soon as I do merges or similar operations.

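So, does anyone have tips for handling this? I could not find anything useful online; it is always either a simple split or a simple strip. But I am sure there is a smarter way to do this.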

Answer 1

Well, why not just df['city'] = df['city'].str.strip()? (Note the .str accessor: a Series has no .strip() method of its own.)
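
A minimal sketch of that idea applied to the split from the question. The DataFrame is rebuilt by hand from the sample table, and the `parts` and `matches` names are only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed for illustration).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "location": ["New York (NY)", "Los Angeles (CA)", "Houston (TX)"],
})

# Split once on the opening parenthesis: column 0 is the city, column 1 the state.
parts = df["location"].str.split("(", n=1, expand=True)

# Strip surrounding whitespace from the city and the closing ")" from the state.
df["city"] = parts[0].str.strip()
df["state"] = parts[1].str.strip(")")

# The lookup now works without a trailing space.
matches = df[df["city"] == "Houston"]
print(matches)
```

This keeps the original location column next to the two new ones, which is the three-column variant mentioned in the question.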

Answer 2

Use str.extract:

# The lazy (.*?) leaves the space before "(" to \s*, so the name has no trailing space
df = df.join(df.pop('location').str.extract(r'(.*?)\s*\((.*)\)')
               .rename(columns={0: 'location', 1: 'state'}))
print(df)

# Output
   id      location state
0   1     New York     NY
1   2  Los Angeles     CA
2   3      Houston     TX
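
If the city/state column names from the question are preferred, one possible variant is to use named groups, which str.extract turns into column names directly. This is only a sketch: the sample DataFrame is rebuilt by hand, and the stricter pattern (anchors and `[^)]*`) is an assumption about the data, not part of the answer above:

```python
import pandas as pd

# Sample data rebuilt from the question's table (assumed for illustration).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "location": ["New York (NY)", "Los Angeles (CA)", "Houston (TX)"],
})

# Named groups become the column names of the extracted DataFrame.
# The lazy (.*?) stops before the whitespace that precedes "(".
pattern = r"^(?P<city>.*?)\s*\((?P<state>[^)]*)\)\s*$"
extracted = df["location"].str.extract(pattern)

# Keep id and attach the two new columns.
result = df[["id"]].join(extracted)
print(result)
```

Rows that do not match the pattern would come back as NaN in both columns, so unexpected formats are easy to spot with result[result['city'].isna()].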
