Edit column names that include duplicate special characters
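I have some column names that contain two question marks in different places, e.g. 'how old were you? when you started university?'. I need to identify which columns have two question marks in them. Any tips welcome! Thanks
Data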
import pandas as pd
df = pd.DataFrame(data={'id': [1, 2, 3, 4, 5], 'how old were you? when you started university?': [1,2,3,4,5], 'how old were you when you finished university?': [1,2,3,4,5], 'at what age? did you start your first job?': [1,2,3,4,5]})
Expected output
df1 = pd.DataFrame(data={'id': [1, 2, 3, 4, 5], 'how old were you when you finished university?': [1,2,3,4,5]})
If you want to get all the columns that have more than one question mark, you can use the following:
[c for c in df.columns if c.count("?")>1]
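For reference, here is a small self-contained sketch of that check run against the sample frame from the question. The variable name `multi` is only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(data={
    'id': [1, 2, 3, 4, 5],
    'how old were you? when you started university?': [1, 2, 3, 4, 5],
    'how old were you when you finished university?': [1, 2, 3, 4, 5],
    'at what age? did you start your first job?': [1, 2, 3, 4, 5],
})

# str.count counts literal, non-overlapping occurrences of "?" in each name
multi = [c for c in df.columns if c.count("?") > 1]
print(multi)
```

With this data, `multi` should hold the two column names that contain two question marks each.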
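Edit: if you want to remove the extra "?" but keep the trailing "?", use this (note that `rename` returns a new DataFrame, so assign the result):
df = df.rename(columns={c: c.replace("?", "") + "?" for c in df.columns if "?" in c})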
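A slightly more defensive sketch of the same idea, in case some names contain a "?" but do not end with one. The helper `keep_last_question_mark` is a hypothetical name introduced here for illustration, and it assumes that only a trailing "?" should survive.

```python
import pandas as pd

def keep_last_question_mark(name: str) -> str:
    """Hypothetical helper: drop every "?" except one at the very end."""
    if name.endswith("?"):
        # Strip all "?" from the body, then restore the single trailing one
        return name[:-1].replace("?", "") + "?"
    # No trailing "?" to preserve, so remove them all
    return name.replace("?", "")

df = pd.DataFrame(data={
    'id': [1, 2, 3, 4, 5],
    'how old were you? when you started university?': [1, 2, 3, 4, 5],
    'how old were you when you finished university?': [1, 2, 3, 4, 5],
    'at what age? did you start your first job?': [1, 2, 3, 4, 5],
})

# rename accepts a callable that is applied to every column label
renamed = df.rename(columns=keep_last_question_mark)
print(list(renamed.columns))
```

Passing a function to `rename` avoids building the mapping dict by hand, and columns without any "?" (such as `id`) come back unchanged.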
An idea with a list comprehension:
df = df[[c for c in df.columns if c.count("?") < 2]]
print(df)
   id  how old were you when you finished university?
0   1                                               1
1   2                                               2
2   3                                               3
3   4                                               4
4   5                                               5
You can use boolean indexing:
x = df.loc[:, df.columns.str.count(r"\?") < 2]
print(x)
Prints:
   id  how old were you when you finished university?
0   1                                               1
1   2                                               2
2   3                                               3
3   4                                               4
4   5                                               5
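A note on why the pattern is written as `r"\?"`: `Index.str.count` treats its argument as a regular expression, where a bare "?" is a quantifier, so the character has to be escaped. The sketch below (variable names `counts` and `mask` are illustrative) shows the intermediate values on the sample frame from the question.

```python
import pandas as pd

df = pd.DataFrame(data={
    'id': [1, 2, 3, 4, 5],
    'how old were you? when you started university?': [1, 2, 3, 4, 5],
    'how old were you when you finished university?': [1, 2, 3, 4, 5],
    'at what age? did you start your first job?': [1, 2, 3, 4, 5],
})

# Number of literal "?" characters per column name (pattern is a regex)
counts = df.columns.str.count(r"\?")

# Boolean mask: True for the columns we want to keep
mask = counts < 2

kept = df.loc[:, mask]
print(list(counts), list(kept.columns))
```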
df = df.drop([col for col in df.columns if col.count("?")>1], axis=1)
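As a quick sanity check, the three filtering approaches in this thread (list comprehension, boolean indexing, and `drop`) should give the same result on the sample data. This is only a sketch: the names `by_list`, `by_mask`, and `by_drop` are made up for the comparison.

```python
import pandas as pd

df = pd.DataFrame(data={
    'id': [1, 2, 3, 4, 5],
    'how old were you? when you started university?': [1, 2, 3, 4, 5],
    'how old were you when you finished university?': [1, 2, 3, 4, 5],
    'at what age? did you start your first job?': [1, 2, 3, 4, 5],
})

# Keep columns with fewer than two "?" via a list comprehension
by_list = df[[c for c in df.columns if c.count("?") < 2]]

# Same selection via a boolean mask over the column index
by_mask = df.loc[:, df.columns.str.count(r"\?") < 2]

# Same selection by dropping the columns with more than one "?"
by_drop = df.drop([c for c in df.columns if c.count("?") > 1], axis=1)

# DataFrame.equals compares labels, dtypes, and values
same = by_list.equals(by_mask) and by_mask.equals(by_drop)
print(same)
```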