Convert whole dataframe from lower case to upper case with Pandas
I have a dataframe that looks like this:
import pandas as pd

# Create an example dataframe about a fictional army
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'deaths', 'battles', 'size'])
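My goal is to convert every string in the dataframe to upper case, like this:
Note: all dtypes are object and must stay that way; every column in the output must be object as well. I'd like to avoid converting the columns one at a time and would rather do it on the whole dataframe in one go.
So far I have tried this, without success: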
df.str.upper()
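A minimal check of why that attempt fails: .str is an accessor defined on Series, not on DataFrame, so the attribute lookup raises AttributeError, while a single column (a Series) does support it. The frame is rebuilt with dtype=object, an assumption matching the note that every column is object.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, dtype=object)

try:
    df.str.upper()  # a DataFrame has no .str accessor
    failed = False
except AttributeError:
    failed = True

# A single column is a Series, and there .str does exist
upper_regiment = df['regiment'].str.upper()
print(failed, upper_regiment.tolist())
```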
astype(str) casts each Series to strings (dtype object); the .str accessor on the converted Series then exposes vectorized string methods, so calling upper() on it does the job. Note that afterwards every column has dtype object, and numbers such as 52 become the string '52'.
In [17]: df
Out[17]:
     regiment company deaths battles size
0  Nighthawks     1st    kkk       5    l
1  Nighthawks     1st     52      42   ll
2  Nighthawks     2nd     25       2    l
3  Nighthawks     2nd    616       2    m
In [18]: df.apply(lambda x: x.astype(str).str.upper())
Out[18]:
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M
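A self-contained version of this answer, for copy and paste. It rebuilds the question's frame with dtype=object (an assumption matching the question) and shows that the integers in deaths and battles come back as strings such as '52', which is why all columns end up as text.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, dtype=object)

# astype(str) turns every cell (including the ints 52, 616, 5, 2) into a string,
# so .str.upper() is safe to call on every column
upper = df.apply(lambda col: col.astype(str).str.upper())
print(upper)
```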
You can convert the 'battles' column back to numeric later with pd.to_numeric():
In [42]: df2 = df.apply(lambda x: x.astype(str).str.upper())
In [43]: df2['battles'] = pd.to_numeric(df2['battles'])
In [44]: df2
Out[44]:
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M
In [45]: df2.dtypes
Out[45]:
regiment    object
company     object
deaths      object
battles      int64
size        object
dtype: object
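If several columns should become numeric again, one option is to try pd.to_numeric on each column and keep the original whenever parsing fails. This is a sketch: the helper name restore_numeric is hypothetical, and the sample frame is a cut-down version of df2.

```python
import pandas as pd

def restore_numeric(frame):
    """Hypothetical helper: convert every column that parses cleanly as numbers."""
    out = frame.copy()
    for col in out.columns:
        try:
            out[col] = pd.to_numeric(out[col])
        except (ValueError, TypeError):
            pass  # e.g. 'KKK' cannot be parsed, so the column stays as text
    return out

df2 = pd.DataFrame({'deaths': ['KKK', '52', '25', '616'],
                    'battles': ['5', '42', '2', '2']}, dtype=object)
df3 = restore_numeric(df2)
print(df3.dtypes)
```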
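Since the .str accessor only exists on a Series (not on a DataFrame), you can apply it to each column separately and then concatenate the results: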
In [6]: pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
Out[6]:
     regiment company deaths battles size
0  NIGHTHAWKS     1ST    KKK       5    L
1  NIGHTHAWKS     1ST     52      42   LL
2  NIGHTHAWKS     2ND     25       2    L
3  NIGHTHAWKS     2ND    616       2    M
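A quick sanity check, assuming the same sample frame: the per-column concat and the apply version should produce the same values and keep the original column names, because each upper-cased Series keeps its name and pd.concat(axis=1) uses those names as column labels.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, dtype=object)

# Upper-case each column on its own, then glue the Series back together side by side
by_concat = pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
by_apply = df.apply(lambda col: col.astype(str).str.upper())

same_values = by_concat.values.tolist() == by_apply.values.tolist()
print(same_values, list(by_concat.columns))
```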
Edit: performance comparison
In [10]: %timeit df.apply(lambda x: x.astype(str).str.upper())
100 loops, best of 3: 3.32 ms per loop
In [11]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
100 loops, best of 3: 3.32 ms per loop
On a small dataframe both answers perform the same.
In [15]: df = pd.concat(10000 * [df])
In [16]: %timeit pd.concat([df[col].astype(str).str.upper() for col in df.columns], axis=1)
10 loops, best of 3: 104 ms per loop
In [17]: %timeit df.apply(lambda x: x.astype(str).str.upper())
10 loops, best of 3: 130 ms per loop
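On a large dataframe (40,000 rows), my answer (pd.concat over the columns) is slightly faster: 104 ms versus 130 ms per loop.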
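To rerun this comparison outside IPython, a plain-Python sketch with the standard timeit module could look like the following. One assumption differs from the session above: ignore_index=True gives the enlarged frame a unique index. Timings depend on the machine and pandas version, so no numbers are claimed here.

```python
import timeit
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
small = pd.DataFrame(raw_data, dtype=object)
large = pd.concat(10000 * [small], ignore_index=True)  # 40,000 rows

def with_apply(frame):
    return frame.apply(lambda col: col.astype(str).str.upper())

def with_concat(frame):
    return pd.concat([frame[col].astype(str).str.upper() for col in frame.columns], axis=1)

for name, func in [('apply', with_apply), ('concat', with_concat)]:
    # best of 5 single runs, reported in milliseconds
    seconds = min(timeit.repeat(lambda: func(large), number=1, repeat=5))
    print(f'{name}: {seconds * 1000:.1f} ms per call')
```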
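This can also be solved element-wise with applymap, which calls the function on every cell, so non-string values are left untouched:
# In pandas >= 2.1, applymap is deprecated in favour of DataFrame.map (same call: df.map(...))
df = df.applymap(lambda s: s.upper() if isinstance(s, str) else s)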
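A self-contained sketch of the element-wise approach (upper-casing, as the question asks). It uses DataFrame.map when the installed pandas has it and falls back to applymap otherwise; the helper name upper_if_str is illustrative. Unlike the astype(str) answers, the integers 52 and 616 stay integers, so the mixed columns remain object.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
df = pd.DataFrame(raw_data, dtype=object)

def upper_if_str(value):
    # Only strings change; ints such as 52 or 5 pass through untouched
    return value.upper() if isinstance(value, str) else value

# DataFrame.map is the newer name (pandas >= 2.1); older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    result = df.map(upper_if_str)
else:
    result = df.applymap(upper_if_str)
print(result)
```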
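If you want to preserve the dtypes of non-text columns, only transform the object columns. Note that isinstance(x, object) is always True (every Python value is an object), so it cannot serve as this check; compare the column's dtype instead:
df.apply(lambda x: x.str.upper().str.strip() if x.dtype == "object" else x)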
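A small demonstration of the two pitfalls behind these dtype checks, using a column shaped like the question's deaths column: isinstance(..., object) accepts everything, and on a mixed object column the .str methods return NaN for the non-string cells (here 52 and 616).

```python
import pandas as pd

col = pd.Series(['kkk', 52, '25', 616], dtype=object)

# Every Python value is an instance of object, so this test filters nothing out
always_true = isinstance(col, object) and isinstance(5, object)

# On a mixed object column, .str methods give NaN for cells that are not strings
upper = col.str.upper()
nan_positions = upper.isna().tolist()

# Checking the dtype instead lets a numeric column pass through unchanged
nums = pd.Series([5, 42, 2, 2])
safe = nums.str.upper() if nums.dtype == object else nums
print(upper.tolist(), safe.tolist())
```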
Try this:
df2 = df2.apply(lambda x: x.str.upper() if x.dtype == "object" else x)
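A variant of the same dtype check, sketched with select_dtypes: pick the object columns once and upper-case only those, leaving a genuinely numeric column (battles here, an assumed int64 column) untouched.

```python
import pandas as pd

# Two text columns stored as object dtype, plus a numeric column
df2 = pd.DataFrame({'regiment': pd.Series(['Nighthawks', 'Nighthawks'], dtype=object),
                    'company': pd.Series(['1st', '2nd'], dtype=object),
                    'battles': [5, 42]})

# Select only the object columns and upper-case those; battles keeps its integer dtype
text_cols = df2.select_dtypes(include='object').columns
df2[text_cols] = df2[text_cols].apply(lambda col: col.str.upper())
print(df2)
```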
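Looping over individual cells is very slow. Instead of applying a function to every cell, get the column names as a list and loop over it, converting each column's text to upper case. The .str methods work on a whole column at a time, which is typically faster than calling a function cell by cell.
for col in dataset.columns:
    # non-string cells (e.g. the ints in 'deaths') become NaN; prepend .astype(str) to keep them
    dataset[col] = dataset[col].str.upper()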
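A runnable sketch of the column loop on the question's data, upper-casing as the question asks. It assumes the frame is named dataset as in this answer and adds astype(str) so the integers in deaths and battles become text instead of NaN.

```python
import pandas as pd

raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks'],
            'company': ['1st', '1st', '2nd', '2nd'],
            'deaths': ['kkk', 52, '25', 616],
            'battles': [5, '42', 2, 2],
            'size': ['l', 'll', 'l', 'm']}
dataset = pd.DataFrame(raw_data, dtype=object)

# Loop over columns, not cells: each .str call handles a whole column at once
for col in dataset.columns:
    # astype(str) first, so ints like 52 become '52' rather than NaN
    dataset[col] = dataset[col].astype(str).str.upper()
print(dataset)
```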