How to skip some delimiter characters when that character is also used as the column separator in the pandas split function

Question

I have a DataFrame that looks like this (original data):

index   string
0        a,b,c,d,e,f
1        a,b,c,d,e,f
2        a,(I,j,k),c,d,e,f

I want it to become (desired result):

index   col1    col2    col3    col4    col5    col6
0        a       b       c       d       e        f
1        a       b       c       d       e        f
2        a     (I,j,k)   c       d       e        f

Answer 1

You can split on the commas that are not inside parentheses, then convert the result to a DataFrame and assign it to new columns of df:

df[['col {}'.format(i) for i in range(1, 7)]] = df['string'].str.split(r",\s*(?![^()]*\))").apply(pd.Series)

Output:

   index             string col 1    col 2 col 3 col 4 col 5 col 6
0      0        a,b,c,d,e,f     a        b     c     d     e     f
1      1        a,b,c,d,e,f     a        b     c     d     e     f
2      2  a,(I,j,k),c,d,e,f     a  (I,j,k)     c     d     e     f
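
To see why the pattern skips the commas inside the parentheses, here is a minimal sketch using only the standard `re` module. The helper name `split_outside_parens` is made up for illustration. The negative lookahead `(?![^()]*\))` rejects a comma when the text after it reaches a `)` without passing any other parenthesis first, which means the comma sits inside a group. This assumes the parentheses are balanced and not nested.

```python
import re

# A comma, optional whitespace, and then a negative lookahead that fails
# when the remaining text reaches a ")" before any other parenthesis,
# i.e. when the comma is inside a parenthesized group.
PATTERN = re.compile(r",\s*(?![^()]*\))")


def split_outside_parens(text):
    """Split text on commas that are not enclosed in parentheses."""
    return PATTERN.split(text)


for row in ["a,b,c,d,e,f", "a,(I,j,k),c,d,e,f"]:
    print(split_outside_parens(row))
```

The second row yields six fields, with `(I,j,k)` kept as a single field.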

Answer 2

Try this:

df = df['string'].str.split(r",\s*(?![^()]*\))", expand=True)
df.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6']
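
For reference, a self-contained sketch of this approach, assuming the sample data from the question. The variable names `pattern`, `parts`, and `result` are illustrative, and the final `join` is an optional extra that keeps the original `string` column next to the new ones instead of overwriting `df`.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"string": ["a,b,c,d,e,f", "a,b,c,d,e,f", "a,(I,j,k),c,d,e,f"]}
)

# Split on commas that are not inside parentheses. A multi-character
# pattern is treated as a regular expression by Series.str.split.
pattern = r",\s*(?![^()]*\))"
parts = df["string"].str.split(pattern, expand=True)

# Name the columns from the number of fields actually produced.
parts.columns = [f"col{i}" for i in range(1, parts.shape[1] + 1)]

# Optional: keep the original column alongside the split columns.
result = df.join(parts)
print(result)
```

Deriving the column names from `parts.shape[1]` avoids hard-coding six names, which would raise an error if some row split into a different number of fields.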

python

split

dataframe

pandas