Convert whole dataframe from lower case to upper case with Pandas

I have a dataframe like the following:

import pandas as pd

# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])

My goal is to convert every string inside the dataframe to upper case.

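Note: all dtypes are object and must not be changed; the output must consist entirely of objects. I want to avoid converting each column one by one... ideally I would do it on the whole dataframe at once.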

So far I have tried this, without success:

df.str.upper()

astype(str) casts each Series to strings (dtype object), the .str accessor of the converted Series then exposes the vectorized string methods, and calling upper() on it does the job. Note that afterwards every column has dtype object.

In [17]: df
Out[17]: 
     regiment company deaths battles size
0  Nighthawks     1st    kkk       5    l
1  Nighthawks     1st     52      42   ll
2  Nighthawks     2nd     25       2    l
3  Nighthawks     2nd    616       2    m

In [18]: df.apply(lambda x: x.astype(str).str.upper())
Out[18]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M

You can later convert the 'battles' column back to numbers with to_numeric():

In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper())

In [43]: df2['battles'] = pd.to_numeric(df2['battles'])

In [44]: df2
Out[44]: 
     regiment company deaths  battles size
0  NIGHTHAWKS     1ST    KKK        5    L
1  NIGHTHAWKS     1ST     52       42   LL
2  NIGHTHAWKS     2ND     25        2    L
3  NIGHTHAWKS     2ND    616        2    M

In [45]: df2.dtypes
Out[45]: 
regiment    object
company     object
deaths      object
battles      int64
size        object
dtype: object
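If you do not know in advance which columns are numeric, a minimal sketch (my own generalization, not part of the answer above) can try to_numeric() on every column and keep the text wherever a value fails to parse. It assumes a column should go back to numbers only if every one of its values parses:

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data)

# Upper-case everything as text, as in the answer above
df2 = df.apply(lambda x: x.astype(str).str.upper())

# Assumption: convert a column back only if every value parses as a number
for col in df2.columns:
    try:
        df2[col] = pd.to_numeric(df2[col])
    except (ValueError, TypeError):
        pass  # e.g. 'KKK' or '1ST' cannot be parsed, so the column stays text

print(df2.dtypes)
```

Here only 'battles' becomes an integer column; 'deaths' stays text because of 'KKK'.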

Since .str only works on a Series, you can apply it to each column separately and then concatenate the results:

In [6]: pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
Out[6]: 
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M
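To see why the per-column step is needed, this short check on a toy two-column frame (illustrative data only) confirms that a DataFrame has no .str accessor, while each of its columns, being a Series, does:

```python
import pandas as pd

# Toy frame for illustration only
df = pd.DataFrame({"a": ["x", "y"], "b": ["p", "q"]})

# A DataFrame has no .str accessor, so df.str.upper() raises AttributeError
try:
    df.str.upper()
    frame_has_str = True
except AttributeError:
    frame_has_str = False

# Every column is a Series, and a Series does have .str
upper_a = df["a"].str.upper()

print(frame_has_str)     # False
print(upper_a.tolist())  # ['X', 'Y']
```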

Edit: performance comparison

In [10]: %timeit df.apply(lambda x: x.astype(str).str.upper())
100 loops, best of 3: 3.32 ms per loop

In [11]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
100 loops, best of 3: 3.32 ms per loop

Both answers perform the same on a small dataframe.

In [15]: df = pd.concat(10000 * [df])

In [16]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
10 loops, best of 3: 104 ms per loop

In [17]: %timeit df.apply(lambda x: x.astype(str).str.upper())
10 loops, best of 3: 130 ms per loop

On the large dataframe, my answer (the concat version) is slightly faster: 104 ms versus 130 ms per loop.
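Outside IPython, %timeit is not available; a rough sketch with the standard timeit module can rerun the same comparison. Timings depend on the machine and pandas version, so none are claimed here. One assumption differs from the session above: the large frame is built with ignore_index=True so that its index stays unique.

```python
import timeit

import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
small = pd.DataFrame(raw_data)
# 40,000 rows; ignore_index=True keeps the index unique (assumption, see above)
large = pd.concat(10000 * [small], ignore_index=True)


def via_apply(frame):
    return frame.apply(lambda x: x.astype(str).str.upper())


def via_concat(frame):
    return pd.concat([frame[col].astype(str).str.upper() for col in frame.columns], axis=1)


# Both approaches must give the same result before their speed is worth comparing
same = via_apply(large).equals(via_concat(large))

for name, fn in [("apply", via_apply), ("concat", via_concat)]:
    # Best of 3 single runs, reported in milliseconds
    best_ms = min(timeit.repeat(lambda: fn(large), number=1, repeat=3)) * 1000
    print(f"{name}: {best_ms:.1f} ms")
```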

This can be solved with applymap:

df = df.applymap(lambda s: s.upper() if isinstance(s, str) else s)
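DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map, which takes the same element-wise function. A version-tolerant sketch (the helper names upper_if_str and upper_cells are hypothetical) uses whichever method exists; because only str cells are touched, numbers such as 52 keep their type:

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data)


def upper_if_str(value):
    # Only strings are changed; ints such as 52 pass through untouched
    return value.upper() if isinstance(value, str) else value


def upper_cells(frame):
    # DataFrame.map exists from pandas 2.1 on; older versions only have applymap
    if hasattr(pd.DataFrame, "map"):
        return frame.map(upper_if_str)
    return frame.applymap(upper_if_str)


result = upper_cells(df)
print(result)
```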

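If you want to preserve the dtype of non-string columns, apply the string methods only to object columns. Note that isinstance(x, object) is always True, so it cannot make this distinction; check the column dtype instead:

df.apply(lambda x: x.str.upper().str.strip() if x.dtype == object else x)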
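Another way to leave numeric columns alone is to select the non-numeric columns up front with select_dtypes. The sketch below uses a trimmed copy of the question's data in which 'battles' is a genuine integer column (an assumption made for illustration, unlike the question's mixed column):

```python
import pandas as pd

# Trimmed version of the question's data; battles is a real int column here
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
    'size': ['l', 'll', 'l', 'm'],
    'battles': [5, 42, 2, 2],
})

# Upper-case every non-numeric column; numeric columns keep their dtype
text_cols = df.select_dtypes(exclude="number").columns
df[text_cols] = df[text_cols].apply(lambda col: col.str.upper())

print(df.dtypes)
```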

Try this:

df2 = df2.apply(lambda x: x.str.upper() if x.dtype == "object" else x)
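One caveat with a dtype check alone: in an object column that mixes strings and numbers, such as 'deaths' in the question, .str.upper() returns NaN for the non-string cells. A minimal sketch that puts those cells back with fillna (the helper name upper_strings is hypothetical, and it assumes every non-numeric column holds strings, or strings mixed with numbers):

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data)


def upper_strings(col):
    # Numeric columns have no usable .str accessor, so leave them untouched
    if pd.api.types.is_numeric_dtype(col):
        return col
    # .str.upper() yields NaN for non-string cells (e.g. the int 52);
    # fillna(col) restores the original value at those positions
    return col.str.upper().fillna(col)


result = df.apply(upper_strings)
print(result)
```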

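Looping over cells is very slow. Instead of applying a function to every cell of each row, get the column names as a list, then loop over that list and convert each column's text to upper case.

The code below works on whole columns through the .str accessor, which is faster than a cell-by-cell apply.

for col in df.columns:
    # Non-string cells in an object column become NaN; numeric columns raise AttributeError
    df[col] = df[col].str.upper()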