Unable to split two values of a column

Question

I have a dataset, and I am trying to split the values in the column location. This is the dataset I have: [image: Dataset I have]

The dataset has 56 null values, so I used the following code to get the indices of those nulls:

nan = []
for i in range(len(data['location'])):
    if type(data['location'][i]) == float:
        nan.append(i)
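The same list of indices can be built without an explicit loop by using `isna()`, which also treats `None` as missing (the `type(...) == float` check does not). A minimal sketch, assuming `data` is a pandas DataFrame; the sample values are hypothetical apart from "Canandaigua, NY":

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only "Canandaigua, NY" comes from the question
data = pd.DataFrame({"location": [np.nan, "Canandaigua, NY", "Boston", np.nan]})

# isna() marks missing values; the index of the selected rows gives their labels
nan = data[data["location"].isna()].index.tolist()
print(nan)  # [0, 3]
```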

After that, I run another loop:

for i in range(len(data['location'])):
    if i in nan:
        data['city'] = np.nan
    else:
        data['city'] = data['location'][i].split(',')[1]

This gives me an error:

IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15176/2022247788.py in <module>
      3         data['city'] = np.nan
      4     else:
----> 5         data['city'] = data['location'][i].split(',')[1]

IndexError: list index out of range

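Although the loop does produce values, they are not the right ones. As can be seen in location, the first value is NaN, so I want NaN in city; the second value in the column is Canandaigua, NY, so I want NY in city.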
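Two separate problems show up in the second loop. `data['city'] = ...` assigns a single value to the entire column on every iteration, so whatever the last iteration writes ends up in every row. And `split(',')[1]` raises `IndexError` for any non-null location that contains no comma. A minimal sketch of a corrected loop, assuming the wanted city is the text after the last comma with surrounding spaces removed, and (an assumption, since the question does not say) that a location without a comma is kept as-is; the sample values are hypothetical apart from "Canandaigua, NY":

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "Boston" stands in for a location without a comma
data = pd.DataFrame({"location": [np.nan, "Canandaigua, NY", "Boston"]})

cities = []
for value in data["location"]:
    if pd.isna(value):
        cities.append(np.nan)  # keep missing locations as NaN
    else:
        parts = value.split(",")
        # last piece; a value without a comma is returned unchanged (assumption)
        cities.append(parts[-1].strip())

# assign the whole column once, after the loop, instead of on every iteration
data["city"] = cities
```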

I also tried splitting directly with the following code:

data[['town','city2']] = data['location'].str.split(',',expand=True)

But I get an error:

ValueError: Columns must be the same length as key
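This `ValueError` most likely comes from `expand=True` producing a different number of columns than the two keys: it creates one column per piece of the longest split, so a location with two commas yields three columns (and if no location has a comma, only one). A sketch that caps the split at one, splitting from the right and assuming the city is whatever follows the last comma; the three-part location below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical values; "Ithaca, Tompkins, NY" contains two commas
data = pd.DataFrame({"location": [np.nan, "Canandaigua, NY", "Ithaca, Tompkins, NY"]})

# n=1 splits only at the last comma, so there are at most two columns
parts = data["location"].str.rsplit(",", n=1, expand=True)
data[["town", "city2"]] = parts

# drop the space that follows the comma
data["city2"] = data["city2"].str.strip()
```

If no row contains a comma at all, `parts` still has a single column and the same error appears, so this only fixes the "too many columns" case.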

Answer 1

You can put the city into a separate column like this:

data['city'] = data.location.str.split(",").str[1]

This returns the city, or NaN if none is available.
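On a small hypothetical frame (only "Canandaigua, NY" comes from the question), `.str[1]` yields NaN both for a missing location and for a location with no comma, and the piece after the comma keeps its leading space, which `.str.strip()` removes:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({"location": [np.nan, "Canandaigua, NY", "Boston"]})

# .str[1] gives NaN for missing values and for lists with fewer than two items
data["city"] = data.location.str.split(",").str[1]

# the piece after the comma starts with a space (" NY"), so strip it
data["city"] = data["city"].str.strip()
```

Note that "Boston" becomes NaN here, which is what the edit below addresses.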

Edit: Then try this:

data['city'] = data[~data.location.isna()].location.str.split(",").apply(lambda x: x[0] if len(x) == 1 else x[1])

This checks whether the split produced only one piece; if so, it returns the string unchanged. Otherwise, it returns the second piece.
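The same fallback can also be written without `apply` and without filtering out the NaN rows first: take the second piece where it exists and fill the gaps with the original location. A sketch on hypothetical data; like the lambda, it keeps the space after the comma:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({"location": [np.nan, "Canandaigua, NY", "Boston"]})

# Second piece when there is one; otherwise fall back to the whole location.
# Missing locations stay NaN because their fallback value is NaN as well.
data["city"] = data.location.str.split(",").str[1].fillna(data.location)
```

For locations with at most one comma this matches the lambda (`x[0]` of a one-piece split is the original string); appending `.str.strip()` would turn " NY" into "NY".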

Answer 2

This should also work:

### Comma Condition
comma_condtn = (df['location'].str.contains(',')) & (df['location'].notna())

### Extract city
df.loc[comma_condtn, 'city_2'] = df['location'].apply(lambda x : str(x).split(',').pop())

### Condition without commas
df.loc[df['city_2'].isna(), 'city_2'] = df['location']
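Note that `split(',').pop()` returns the last piece including the space after the comma (" NY" rather than "NY"). A sketch of the same three steps on hypothetical data, using `na=False` so that missing locations count as "no comma", and `.str.strip()` to drop that space:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"location": [np.nan, "Canandaigua, NY", "Boston"]})

# Rows whose location contains a comma; NaN rows are treated as False
comma_condtn = df["location"].str.contains(",", na=False)

# Last comma-separated piece, without the leading space
df.loc[comma_condtn, "city_2"] = (
    df.loc[comma_condtn, "location"].str.split(",").str[-1].str.strip()
)

# Rows without a comma keep their original value (NaN stays NaN)
df.loc[df["city_2"].isna(), "city_2"] = df["location"]
```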


Tags: python, dataframe