Regular expression to rename columns by stripping the column names
My df has many columns, and each column has repeated values because it is survey data. For example, my data looks like this:
df:
Q36r9: sales platforms - Before purchasing a new car | Q36r32: Advertising letters - Before purchasing a new car
Not Selected                                         | Selected
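So I want to strip text from the column names. For example, from the first column I want the text between ':' and '-', so it should become "sales platforms". In the second part I want to transform the column values: "Selected" should be replaced with the column's name, and "Not Selected" should become NaN.
So the desired output looks like this: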
sales platforms | Advertising letters
NaN             | Advertising letters
Edit: another question. What if I have a column name like this:
Q40r1c3: WeChat - Looking for a new car - And now if you think again - Which social media platforms or sources would you use in each situation?
If I only want the text between the ':' and the first '-', it should extract "WeChat".
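IIUC, we can use a regular expression with greedy matching, using .* to match everything between the delimiters that define the pattern (':' and '-'). Greedy matching runs to the last '-', which works here because each of these column names contains only one '-'.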
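import re

# Greedy (.*) captures everything between the first ':' and the last '-';
# .strip() then trims the spaces around the captured text
df.columns = [re.search(r':(.*)-', i).group(1).strip() for i in df.columns]
print(df.columns)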
sales platforms Advertising letters
0 Not Selected None
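The renaming above does not cover the second part of the question (replace "Selected" with the column's name and "Not Selected" with NaN). One possible approach, sketched here as an assumption rather than as part of the original answer, is a per-column Series.map with a one-entry dict. It assumes the columns are already shortened and that cells hold exactly the strings "Selected" or "Not Selected":

```python
import pandas as pd

# Sample data mirroring the question, with the columns already shortened
df = pd.DataFrame({
    "sales platforms": ["Not Selected"],
    "Advertising letters": ["Selected"],
})

# For each column, map "Selected" to that column's name (col.name);
# Series.map turns any value missing from the dict, such as "Not Selected", into NaN
out = df.apply(lambda col: col.map({"Selected": col.name}))
print(out)
```

The exact match matters: a substring test such as str.contains("Selected") would also match "Not Selected".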
Edit:
We can use the lazy (non-greedy) quantifier +? instead, so the match stops at the first '-' after the ':'.
+? quantifier: matches between one and unlimited times, as few times as possible, expanding as needed (lazy)
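To make the difference concrete, here is a small comparison of the greedy (.*) and lazy (.+?) groups on the WeChat column name from the question, using only the standard re module:

```python
import re

name = (
    "Q40r1c3: WeChat - Looking for a new car - And now if you think again - "
    "Which social media platforms or sources would you use in each situation?"
)

# Greedy: the group runs from the first ':' all the way to the LAST '-'
greedy = re.search(r":(.*)-", name).group(1).strip()
# Lazy: the group stops at the FIRST '-' that follows the ':'
lazy = re.search(r":(.+?)-", name).group(1).strip()

print(greedy)
print(lazy)
```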
Q36r9: sales platforms - Before purchasing a new car Q40r1c3: WeChat - Looking for a new car - And now if you think again - Which social media platforms or sources would you use in each situation?
0 1
import re
# Lazy (.+?) stops at the first '-' after the ':'; .strip() trims the surrounding spaces
[re.search(r':(.+?)-', i).group(1).strip() for i in df.columns]
['sales platforms', 'WeChat']
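Both list comprehensions raise AttributeError if any column name lacks the ':' ... '-' pattern, because re.search then returns None. A slightly more defensive sketch (not from the original answer) renames through DataFrame.rename with a function that leaves non-matching names unchanged and moves the space trimming into the pattern itself. The helper short_name and the respondent_id column are hypothetical, added only for illustration:

```python
import re

import pandas as pd

# Lazy group between the first ':' and the next '-';
# the \s* on either side keeps the surrounding spaces out of the group
PATTERN = re.compile(r":\s*(.+?)\s*-")


def short_name(col: str) -> str:
    """Return the text between ':' and the first '-', or the name unchanged if there is no match."""
    match = PATTERN.search(col)
    return match.group(1) if match else col


df = pd.DataFrame(
    [["Not Selected", "Selected", 101]],
    columns=[
        "Q36r9: sales platforms - Before purchasing a new car",
        "Q40r1c3: WeChat - Looking for a new car - And now if you think again",
        "respondent_id",  # hypothetical column without the ':' ... '-' pattern
    ],
)

# rename(columns=...) calls short_name on every column label
df = df.rename(columns=short_name)
print(list(df.columns))
```

Compiling the pattern once is optional; it mainly keeps the expression in one place.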