Regular expression to rename the column by stripping the column name

Question

My df has many columns, and each column contains repeated values because it is survey data. For example, my data looks like this:

df:

 Q36r9: sales platforms - Before purchasing a new car         Q36r32: Advertising letters - Before purchasing a new car
        Not Selected                                                                         Selected

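So I want to remove the extra text from the column names. For example, from the first column I want to get the text between ":" and "-", so it should become "sales platforms". As a second step, I want to convert the column values: "Selected" should be replaced with the name of the column, and "Not Selected" should be changed to NaN.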

So the desired output looks like this:

sales platforms                                       Advertising letters
      NaN                                             Advertising letters

Edited: another question, what if I have a column name like this:

Q40r1c3: WeChat - Looking for a new car - And now if you think again  - Which social media platforms or sources would you use in each situation?

Here I only want the text between ":" and the first "-". It should extract "WeChat".

Answer 1

IIUC,

We can use a regular expression with greedy matching: .* matches everything between the delimiters we define, here ":" and the last "-".

import re

df.columns = [re.search(':(.*)-',i).group(1) for i in df.columns.str.strip()]

print(df)

   sales platforms   Advertising letters 
0      Not Selected                  None

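Edit:

For column names with more than one "-", we can use lazy (non-greedy) matching with +? so the match stops at the first "-":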

+? Quantifier — Matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
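
To make the difference between the two quantifiers concrete, here is a small sketch using only the re module on the long column name from the question. The variable names are just for illustration.

```python
import re

# Column name with several " - " separators, taken from the question
name = ("Q40r1c3: WeChat - Looking for a new car - And now if you think again"
        "  - Which social media platforms or sources would you use in each situation?")

# Greedy: .* grabs as much as possible, so the match ends at the LAST "-"
greedy = re.search(r":(.*)-", name).group(1).strip()

# Lazy: .+? grabs as little as possible, so the match ends at the FIRST "-"
lazy = re.search(r":(.+?)-", name).group(1).strip()

print(greedy)
print(lazy)
```

The greedy version keeps everything up to the last hyphen, while the lazy one returns only "WeChat".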

Q36r9: sales platforms - Before purchasing a new car    Q40r1c3: WeChat - Looking for a new car - And now if you think again - Which social media platforms or sources would you use in each situation?
0                                                       1


import re

[re.search(':(.+?)-',i).group(1).strip() for i in df.columns]

['sales platforms', 'WeChat']
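
The second part of the question (replacing "Selected" with the column name and "Not Selected" with NaN) could be handled along these lines. This is only a sketch: the sample frame is rebuilt from the question, short_name is a hypothetical helper, and it assumes the shortened names are unique and that the columns contain only those two values.

```python
import re

import numpy as np
import pandas as pd

# Sample data rebuilt from the question (one row only)
df = pd.DataFrame({
    "Q36r9: sales platforms - Before purchasing a new car": ["Not Selected"],
    "Q36r32: Advertising letters - Before purchasing a new car": ["Selected"],
})


def short_name(col):
    """Return the text between ':' and the first '-', or the name unchanged."""
    m = re.search(r":(.+?)-", col)
    return m.group(1).strip() if m else col


# Step 1: rename the columns with the lazy pattern
df.columns = [short_name(c) for c in df.columns]

# Step 2: "Selected" -> column name, "Not Selected" -> NaN
for col in df.columns:
    df[col] = df[col].map({"Selected": col, "Not Selected": np.nan})

print(df)
```

Returning the original name when the pattern does not match avoids an AttributeError on columns that have no ":" or "-". Note that map turns any value missing from the dictionary into NaN as well.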
