Extract substring from left to a specific character for each row in a pandas dataframe?

Question

I have a dataframe containing a collection of strings. The strings look like this:

"oop9-hg78-op67_457y"

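I need to remove everything from the underscore to the end of the string so that I can match this data against another set. My attempt looks like this: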

df['column'] = df['column'].str[0:'_']

I've tried using .find() in this statement, but nothing seems to work. Does anyone have any ideas? Any help would be greatly appreciated!
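The slice in this attempt cannot work as written: slice bounds must be integers (or None), so the character '_' cannot mark the end position. The `.find()` idea does work once it is applied to each string separately, because `str.find` returns an integer index. A minimal sketch, assuming every value in the column is a Python string; the sample rows and the helper name `before_underscore` are hypothetical:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"column": ["oop9-hg78-op67_457y", "abc_def_ghi", "no-underscore"]})

def before_underscore(s: str) -> str:
    """Return the part of s before the first '_' (the whole string if there is none)."""
    idx = s.find("_")  # integer position of the first '_', or -1 if absent
    return s if idx == -1 else s[:idx]

# Apply the helper to each value in the column
df["column"] = df["column"].map(before_underscore)
print(df)
```

Rows without an underscore are kept unchanged because `find` returns -1 in that case.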

Answer 1

You can try .str.split and then access the resulting list with .str, or use .str.extract:

df['column'] = df['column'].str.split('_').str[0]

# or

df['column'] = df['column'].str.extract(r'^([^_]*)_', expand=False)

print(df)

           column
0  oop9-hg78-op67
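The two approaches above differ when a row contains no underscore: splitting keeps the whole string, while the extract pattern requires a trailing '_' and yields NaN for that row. A small sketch comparing them on hypothetical sample rows; `n=1` is added to the split so that only the first underscore is used:

```python
import pandas as pd

# Hypothetical sample data, including a row with no underscore
df = pd.DataFrame({"column": ["oop9-hg78-op67_457y", "a_b_c", "no-underscore"]})

# Split at most once (n=1) and keep the first piece of each resulting list
via_split = df["column"].str.split("_", n=1).str[0]

# Capture everything before the first '_'; expand=False returns a Series.
# The trailing '_' in the pattern is required, so rows without one become NaN.
via_extract = df["column"].str.extract(r"^([^_]*)_", expand=False)

print(pd.DataFrame({"split": via_split, "extract": via_extract}))
```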

Answer 2

df['column'] = df['column'].str.extract(r'^([^_]*)', expand=False)

This can also be used if you need another option.

It builds on the solution provided by @Ynjxsjmh above.
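`str.extract` needs at least one capture group in the pattern; a bare '_' has none, and pandas raises a ValueError for it. The `expand` parameter then controls the return type: with the default `expand=True` the result is a DataFrame, while `expand=False` with a single group gives a Series that can be assigned straight back to a column. A short sketch using a hypothetical one-row Series:

```python
import pandas as pd

# Hypothetical one-row Series
s = pd.Series(["oop9-hg78-op67_457y"], name="column")

# A pattern with no capture group is rejected by str.extract
try:
    s.str.extract("_", expand=False)
    raised = False
except ValueError:
    raised = True

as_frame = s.str.extract(r"^([^_]*)")                 # default expand=True -> DataFrame
as_series = s.str.extract(r"^([^_]*)", expand=False)  # one group, expand=False -> Series

print(raised, type(as_frame).__name__, type(as_series).__name__)
```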

Answer 3

You can use str.extract:

df['column'] = df['column'].str.extract(r'(^[^_]+)', expand=False)

Output (shown as a separate column for clarity):

                column         column2
0  oop9-hg78-op67_457y  oop9-hg78-op67

Regex:

(       # start capturing group
^       # match start of string
[^_]+   # one or more non-underscore
)       # end capturing group
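Because `str.extract` takes the first match of the pattern in each string, Python's `re.search` can be used to check what the pattern captures for individual values. A small sketch with hypothetical inputs; note that `[^_]+` needs at least one character, so a string that begins with an underscore has no match (which `str.extract` reports as NaN):

```python
import re

# Same pattern as above: capture one or more non-underscore characters at the start
pattern = re.compile(r"(^[^_]+)")

results = {}
for s in ["oop9-hg78-op67_457y", "no-underscore", "_leading"]:
    m = pattern.search(s)
    results[s] = m.group(1) if m else None  # None where there is no match

print(results)
```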
