Generating 3 columns from one with .apply on dataframe

Question

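I would like to extract some data from each row and create new columns, either in the existing dataframe or in a new one, without repeating the same re.match operation for every column.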

An entry of the dataframe looks like this:

00:00 Someones_name: some text goes here

I have a regex that successfully captures the 3 groups I need:

re.match(r"^(\d{2}:\d{2}) (.*): (.*)$", x)

The problem I have is how to get matched_part[1], [2] and [3] without actually running the match again for each new column.

The solution I don't want is:

new_df['time'] = old_df['text'].apply(function1)
new_df['name'] = old_df['text'].apply(function2)
new_df['text'] = old_df['text'].apply(function3)

def function1(x):
  return re.match(r"^(\d{2}:\d{2}) (.*): (.*)$", x)[1]

Answer 1

You can use str.extract with your pattern:

df[['time','name', 'text']] = df['col1'].str.extract(r"^(\d{2}:\d{2}) (.*): (.*)$")
print(df)
#                                        col1   time           name  \
# 0  00:00 Someones_name: some text goes here  00:00  Someones_name   

#                   text  
# 0  some text goes here
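
For a self-contained version, here is a minimal sketch comparing str.extract with a single-match .apply, in case you really do want to stay with .apply. The sample rows, the PATTERN constant, and the split_entry helper are made up for illustration; only the regex and the old_df/new_df names come from the question.

```python
import re

import pandas as pd

# Regex from the question: time, name, and the remaining text.
PATTERN = r"^(\d{2}:\d{2}) (.*): (.*)$"
COLUMNS = ["time", "name", "text"]

# Hypothetical sample data; the last row deliberately does not match.
old_df = pd.DataFrame({
    "text": [
        "00:00 Someones_name: some text goes here",
        "00:05 Another_name: more text",
        "this row does not match",
    ]
})

# Option A: str.extract returns one column per capture group,
# with missing values in rows where the pattern does not match.
new_df = old_df["text"].str.extract(PATTERN)
new_df.columns = COLUMNS

# Option B: .apply with exactly one match per row. Returning a Series
# from the function makes .apply produce a DataFrame.
compiled = re.compile(PATTERN)

def split_entry(x):
    m = compiled.match(x)
    if m is None:
        # No match: fill all three columns with missing values.
        return pd.Series([None, None, None], index=COLUMNS)
    return pd.Series(m.groups(), index=COLUMNS)

new_df_apply = old_df["text"].apply(split_entry)

print(new_df)
print(new_df_apply)
```

Both variants run the regex once per row. str.extract is usually the shorter choice; the .apply version is mainly useful if you need extra per-row logic around the match.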

python

series

dataframe

pandas

data-science