Regular expression to rename columns by stripping part of the column name

My df has many columns, and each column contains repeated values because it is survey data. For example, my data looks like this:

df:

 Q36r9: sales platforms - Before purchasing a new car         Q36r32: Advertising letters - Before purchasing a new car
        Not Selected                                                                         Selected

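So I want to strip text from the column names. For example, from the first column I want the text between ":" and "-", so it should become "sales platforms". Second, I want to convert the column values: "Selected" should be replaced with the column name, and "Not Selected" should become NaN.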

So the desired output looks like this:

sales platforms                                       Advertising letters
      NaN                                             Advertising letters

Edited: another question. What if I have a column name like this:

Q40r1c3: WeChat - Looking for a new car - And now if you think again  - Which social media platforms or sources would you use in each situation?

If I only want the text between ":" and the first "-", it should extract "WeChat".

IIUC,

We can use a regex with greedy matching: .* matches everything between the ":" and the last "-" in the column name.

import re

# Keep the text between ':' and the last '-', then trim the surrounding spaces
df.columns = [re.search(r':(.*)-', i).group(1).strip() for i in df.columns]

print(df)

   sales platforms   Advertising letters 
0      Not Selected                  None

Edit:

We can use lazy (non-greedy) matching with +? so that the match stops at the first "-":

+? Quantifier: Matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
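
To make the difference concrete, here is a small sketch that runs both patterns on one column name with several hyphens. The variable names (`name`, `greedy`, `lazy`) are just for illustration, and the column name is a shortened version of the one in the question.

```python
import re

# Shortened version of the multi-hyphen column name from the question
name = "Q40r1c3: WeChat - Looking for a new car - And now if you think again"

# Greedy: .* runs to the end, then backtracks to the LAST '-'
greedy = re.search(r":(.*)-", name).group(1).strip()

# Lazy: .+? expands one character at a time and stops at the FIRST '-'
lazy = re.search(r":(.+?)-", name).group(1).strip()

print(greedy)  # WeChat - Looking for a new car
print(lazy)    # WeChat
```

With only one "-" in the name, both patterns give the same result, so the lazy form should be safe for the shorter column names as well.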

Q36r9: sales platforms - Before purchasing a new car    Q40r1c3: WeChat - Looking for a new car - And now if you think again - Which social media platforms or sources would you use in each situation?
0                                                       1


import re

[re.search(':(.+?)-',i).group(1).strip() for i in df.columns]

['sales platforms', 'WeChat']
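
The second part of the question (turning "Selected" into the column name and "Not Selected" into NaN) could be handled along these lines. This is only a sketch: the sample frame is made up from the two column names in the question, `short_name` is a hypothetical helper, and it assumes the cells contain exactly the strings "Selected" and "Not Selected".

```python
import re

import numpy as np
import pandas as pd

# Made-up sample data using the column names from the question
df = pd.DataFrame({
    "Q36r9: sales platforms - Before purchasing a new car": ["Not Selected", "Selected"],
    "Q36r32: Advertising letters - Before purchasing a new car": ["Selected", "Not Selected"],
})


def short_name(col):
    """Return the text between ':' and the first '-', or col unchanged if there is no match."""
    match = re.search(r":(.+?)-", col)
    return match.group(1).strip() if match else col


# Rename every column with the helper
df = df.rename(columns=short_name)

# "Selected" -> column name, "Not Selected" -> NaN
# (any value missing from the mapping also becomes NaN)
for col in df.columns:
    df[col] = df[col].map({"Selected": col, "Not Selected": np.nan})

print(df)
```

Returning the original name when nothing matches avoids an AttributeError on columns that have no ":" or "-". If two columns shorten to the same label, the loop above would need adjusting, since `df[col]` would then select more than one column.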