Stripping column values by a large string of words in Python

I have a DataFrame with many columns, but one column, State, has extra text appended to every value. The column looks like this:

State
U.S. Natural Gas Number of Residential Consumers (Count)
Alabama Natural Gas Number of Residential Consumers (Count)
Kentucky Natural Gas Number of Residential Consumers (Count)
Mississippi Natural Gas Number of Residential Consumers (Count)
Tennessee Natural Gas Number of Residential Consumers (Count)
Arizona Natural Gas Number of Residential Consumers (Count)
Colorado Natural Gas Number of Residential Consumers (Count)
Idaho Natural Gas Number of Residential Consumers (Count)
Montana Natural Gas Number of Residential Consumers (Count)
Nevada Natural Gas Number of Residential Consumers (Count)
New Mexico Natural Gas Number of Residential Consumers (Count)
.
.
.

I want to remove Natural Gas Number of Residential Consumers (Count) from every value so that I am left with only the state. I tried:

df['State'] = df['State'].map(lambda x:x.strip('Natural Gas Number of Residential Consumers (Count)'))

But this doesn't seem to work. It gives me this as output:

State
U.S.
A
Kentucky
Mississipp
T
Ariz
""
Idah
M
v
w Mexic
.
.
.

This does work when I want to strip a single character such as R; I tested that using x.rstrip and x.lstrip.

Is mapping with a lambda function the right way to strip these long strings from all my values? I'm not sure what the best approach is.

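You can try replace followed by strip. Passing regex=False makes pandas treat the parentheses in (Count) as literal characters rather than as a regex group: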

df['clean'] = df['State'].str.replace('Natural Gas Number of Residential Consumers (Count)', '', regex=False).str.strip()
print(df.clean)

Output

0            U.S.
1         Alabama
2        Kentucky
3     Mississippi
4       Tennessee
5         Arizona
6        Colorado
7           Idaho
8         Montana
9          Nevada
10     New Mexico
Name: clean, dtype: object
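
For context on why the original attempt misbehaved: str.strip treats its argument as a set of characters to remove from both ends, not as a substring. The sketch below shows that on a single value, next to a whole-suffix removal. The helper name drop_suffix and the SUFFIX constant are made up for illustration and are not part of the question or of pandas.

```python
SUFFIX = " Natural Gas Number of Residential Consumers (Count)"


def drop_suffix(value, suffix=SUFFIX):
    """Return value without the trailing suffix, if present (hypothetical helper)."""
    if suffix and value.endswith(suffix):
        return value[: -len(suffix)]
    return value


raw = "Alabama" + SUFFIX

# strip() removes every leading/trailing character that appears anywhere in
# its argument, so the letters of "labama" are eaten along with the suffix.
stripped = raw.strip("Natural Gas Number of Residential Consumers (Count)")

# Removing the suffix as one whole substring keeps the state name intact.
cleaned = drop_suffix(raw)

print(repr(stripped))
print(repr(cleaned))
```

On Python 3.9 or later, the built-in str.removesuffix should do the same job as the helper. Either one could be applied to the column with something like df['State'].map(drop_suffix), assuming every value is a string.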

Here is another, simpler approach. Instead of map, use apply:

df['State'] = df['State'].apply(lambda x: x.split(sep=' Natural')[0])

I split on ' Natural' (with a space before Natural) so that no trailing whitespace is left in the result. This gives me the following output:

        State
0   U.S.
1   Alabama
2   Kentucky
3   Mississippi
4   Tennessee
5   Arizona
6   Colorado
7   Idaho
8   Montana
9   Nevada
10  New Mexico
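
The same split can probably be written without a lambda by using the pandas .str accessor. This is a minimal sketch on toy data that stands in for the real DataFrame; the three sample rows and the suffix variable are assumptions for illustration.

```python
import pandas as pd

# Toy data standing in for the real DataFrame (assumption for illustration).
suffix = " Natural Gas Number of Residential Consumers (Count)"
names = ["U.S.", "Alabama", "New Mexico"]
df = pd.DataFrame({"State": [name + suffix for name in names]})

# Split each value once on " Natural" and keep the part before it.
df["State"] = df["State"].str.split(" Natural", n=1).str[0]

print(df["State"].tolist())
```

Like the apply version, this relies on no state name itself containing " Natural", which holds for the values shown in the question.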