Split dataframe column with second column as delimiter
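I want to split one column into two, using the value in the second column of the same row as the split delimiter.
I get the error TypeError: 'Series' objects are mutable, thus they cannot be hashed. That makes sense, since str.split receives a whole Series rather than a single value, but I'm not sure how to pick out the second column's value for each individual row.
Sample data: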
   title_location                    delimiter
0  Doctor - ABC - Los Angeles, CA    - ABC -
1  Lawyer - ABC - Atlanta, GA        - ABC -
2  Athlete - XYZ - Jacksonville, FL  - XYZ -
Code:
bigdata[['title', 'location']] = bigdata['title_location'].str.split(bigdata['delimiter'], expand=True)
Desired output:
   title_location                    delimiter  title    location
0  Doctor - ABC - Los Angeles, CA    - ABC -    Doctor   Los Angeles, CA
1  Lawyer - ABC - Atlanta, GA        - ABC -    Lawyer   Atlanta, GA
2  Athlete - XYZ - Jacksonville, FL  - XYZ -    Athlete  Jacksonville, FL
Let's try zip to pair each string with its own row's delimiter, then join the result back onto the original frame:
df = df.join(pd.DataFrame([x.split(y) for x, y in zip(df.title_location, df.delimiter)], index=df.index, columns=['Title', 'Location']))
df
Out[200]:
   title_location                    delimiter  Title    Location
0  Doctor - ABC - Los Angeles, CA    - ABC -    Doctor   Los Angeles, CA
1  Lawyer - ABC - Atlanta, GA        - ABC -    Lawyer   Atlanta, GA
2  Athlete - XYZ - Jacksonville, FL  - XYZ -    Athlete  Jacksonville, FL
Try apply with axis=1, so that each row's own delimiter is used for the split:
bigdata[['title', 'location']] = bigdata.apply(func=lambda row: row['title_location'].split(row['delimiter']), axis=1, result_type="expand")
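For anyone who wants to run both suggestions end to end, here is a minimal self-contained sketch. It rests on a few assumptions that are not in the original posts: the sample frame is rebuilt by hand, the delimiter values are assumed to include the surrounding spaces (" - ABC - "), and maxsplit=1 is added so that a delimiter occurring more than once in a string still yields exactly two parts.

```python
import pandas as pd

# Sample data rebuilt from the question. Assumption: the stored delimiters
# include the surrounding spaces, e.g. " - ABC - ".
df = pd.DataFrame({
    "title_location": [
        "Doctor - ABC - Los Angeles, CA",
        "Lawyer - ABC - Atlanta, GA",
        "Athlete - XYZ - Jacksonville, FL",
    ],
    "delimiter": [" - ABC - ", " - ABC - ", " - XYZ - "],
})

# Approach 1: pair each string with its own row's delimiter via zip,
# split with plain str.split, then join the pieces back on the index.
parts = [x.split(y, 1) for x, y in zip(df["title_location"], df["delimiter"])]
via_zip = df.join(pd.DataFrame(parts, index=df.index, columns=["title", "location"]))

# Approach 2: row-wise apply; result_type="expand" turns each returned
# list into separate columns.
via_apply = df.copy()
via_apply[["title", "location"]] = df.apply(
    lambda row: row["title_location"].split(row["delimiter"], 1),
    axis=1,
    result_type="expand",
)

print(via_zip[["title", "location"]])
print(via_apply[["title", "location"]])
```

Both versions call Python's str.split once per row, which is what sidesteps the TypeError: Series.str.split expects a single pattern for the whole column, not a Series of patterns. If the delimiter column is stored without the surrounding spaces, the pieces would likely need a .strip() as well.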