How to split a column based on the index of the string in the column, using an efficient method across the whole DataFrame

Question

I have a column filled with string values:

col_1
10500
25020
35640
45440
50454
62150
75410

I would like to create two more columns from the string values split out of the first column, and I want an efficient way to do it.

Expected result:

col_1	col_2	col_3
10500	10	500
25020	25	020
35640	35	640
45440	45	440
50454	50	454
62150	62	150
75410	75	410
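To run the answers below, a minimal sketch that builds the sample column, assuming the values are stored as strings (as the question states) and that the DataFrame is named `df`, which matches the answer code:

```python
import pandas as pd

# Sample data from the question; dtype=str keeps every value as a string
df = pd.DataFrame({'col_1': ['10500', '25020', '35640', '45440',
                             '50454', '62150', '75410']}, dtype=str)
print(df)
```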

So far I have been trying to vectorize this, but I haven't managed to make it work.

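For the splitting part, I loop over the rows (using iterrows, which I know should be avoided whenever possible) and build a list that I then use to fill the new columns, but in my opinion this approach is too old-fashioned.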
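For comparison, a hypothetical reconstruction of the row-by-row approach described above (the asker's actual code is not shown), next to a column-wise equivalent using the `.str` accessor. Both give the same columns; the second avoids writing the loop yourself and the per-row Series overhead of `iterrows`:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['10500', '25020', '35640']})

# Row-by-row version (hypothetical reconstruction): slow on large frames
firsts, rests = [], []
for _, row in df.iterrows():
    firsts.append(row['col_1'][:2])
    rests.append(row['col_1'][2:])
loop_result = df.assign(col_2=firsts, col_3=rests)

# Column-wise version: .str slicing is applied to the whole column in one call
vec_result = df.assign(col_2=df['col_1'].str[:2], col_3=df['col_1'].str[2:])

print(loop_result.equals(vec_result))
```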

Also, how can I modify each cell efficiently? For example, adding commas, or performing operations on them?
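A minimal sketch of whole-column operations, assuming "adding commas" means a thousands separator and "operations" means arithmetic: convert the strings to numbers once with `pd.to_numeric`, then operate on the whole column. The column names `doubled` and `with_commas` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['10500', '25020', '35640']})

# Convert once to a numeric dtype; arithmetic then applies to the whole column
nums = pd.to_numeric(df['col_1'])
df['doubled'] = nums * 2

# Thousands separator: formats each number element-wise (a Python-level call per value)
df['with_commas'] = nums.map('{:,}'.format)
print(df)
```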

Thanks.

Answer 1

Use the str accessor:

df = df.join(df['col_1'].astype(str).str.extract(r'(?P<col_2>\d{2})(?P<col_3>\d{3})'))
print(df)

# Output:
   col_1 col_2 col_3
0  10500    10   500
1  25020    25   020
2  35640    35   640
3  45440    45   440
4  50454    50   454
5  62150    62   150
6  75410    75   410
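Note that `str.extract` returns NaN for rows where the pattern does not match. A self-contained sketch (the four-digit value `'1234'` is invented to illustrate this), also showing the raw-string pattern and that the leading zero in `020` survives because the results stay as strings:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['10500', '25020', '1234']})

# Named groups become column names; the r'' prefix avoids the invalid '\d' escape warning
parts = df['col_1'].str.extract(r'(?P<col_2>\d{2})(?P<col_3>\d{3})')
df = df.join(parts)
print(df)
```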

Or in a few simple steps:

df['col_1'] = df['col_1'].astype(str)
df['col_2'] = df['col_1'].str[:2]
df['col_3'] = df['col_1'].str[2:]
print(df)

# Output
   col_1 col_2 col_3
0  10500    10   500
1  25020    25   020
2  35640    35   640
3  45440    45   440
4  50454    50   454
5  62150    62   150
6  75410    75   410
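`.str[:2]` is shorthand for `Series.str.slice`. Unlike the regex approach, slicing never yields NaN for a short string; it simply returns whatever characters exist. A small sketch (the short value `'7'` is invented for illustration):

```python
import pandas as pd

s = pd.Series(['10500', '25020', '7'])

# .str[a:b] and .str.slice(a, b) are equivalent
head = s.str.slice(0, 2)
tail = s.str.slice(start=2)
print(head.equals(s.str[:2]), tail.equals(s.str[2:]))
print(list(tail))
```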

Another example:

df['col_1'] = df['col_1'].astype(str)
df['col_4'] = df['col_1'].str[:2] + '-' + df['col_1'].str[2:]
print(df)

# Output
   col_1   col_4
0  10500  10-500
1  25020  25-020
2  35640  35-640
3  45440  45-440
4  50454  50-454
5  62150  62-150
6  75410  75-410
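The same concatenation can also be written with `Series.str.cat`, which takes the other Series and a `sep` argument. A brief sketch on a subset of the sample data:

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['10500', '25020', '35640']})

# Join the two slices element-wise, using '-' as the separator
df['col_4'] = df['col_1'].str[:2].str.cat(df['col_1'].str[2:], sep='-')
print(df)
```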

