How to convert the last several columns from string type to integer in pandas
I have a DataFrame named df.
I want to convert the last 10 columns of this DataFrame from string type to integer. How can I do this in a Pythonic way?
I think the fastest way is to use convert_objects and select the last 10 columns with subscript/slicing notation, for example:
In [23]:
import pandas as pd
df = pd.DataFrame({'a':['1','2','3','4','5']})
df = pd.concat([df]*11, axis=1)
df.columns = list('abcdefghijk')
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 11 columns):
a 5 non-null object
b 5 non-null object
c 5 non-null object
d 5 non-null object
e 5 non-null object
f 5 non-null object
g 5 non-null object
h 5 non-null object
i 5 non-null object
j 5 non-null object
k 5 non-null object
dtypes: object(11)
memory usage: 480.0+ bytes
In [21]:
converted = df[df.columns[-10:]].convert_objects(convert_numeric=True)
converted
Out[21]:
b c d e f g h i j k
0 1 1 1 1 1 1 1 1 1 1
1 2 2 2 2 2 2 2 2 2 2
2 3 3 3 3 3 3 3 3 3 3
3 4 4 4 4 4 4 4 4 4 4
4 5 5 5 5 5 5 5 5 5 5
In [22]:
converted.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 10 columns):
b 5 non-null int64
c 5 non-null int64
d 5 non-null int64
e 5 non-null int64
f 5 non-null int64
g 5 non-null int64
h 5 non-null int64
i 5 non-null int64
j 5 non-null int64
k 5 non-null int64
dtypes: int64(10)
memory usage: 440.0 bytes
You can then assign the result straight back:
In [31]:
df[df.columns[-10:]] = converted
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 11 columns):
a 5 non-null object
b 5 non-null int64
c 5 non-null int64
d 5 non-null int64
e 5 non-null int64
f 5 non-null int64
g 5 non-null int64
h 5 non-null int64
i 5 non-null int64
j 5 non-null int64
k 5 non-null int64
dtypes: int64(10), object(1)
memory usage: 480.0+ bytes
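If every value in those columns is a clean integer literal, `astype` is another way to get the same result without a separate assign-back step. The following is only a sketch on the same toy frame; the `last_cols` name is introduced here for illustration, and `astype` will raise if any string cannot be parsed as an integer.

```python
import pandas as pd

# Rebuild the toy frame: 11 string columns named a..k.
df = pd.DataFrame({'a': ['1', '2', '3', '4', '5']})
df = pd.concat([df] * 11, axis=1)
df.columns = list('abcdefghijk')

# Labels of the last 10 columns (b..k); the name is illustrative only.
last_cols = df.columns[-10:]

# astype accepts a {column: dtype} mapping and returns a new DataFrame,
# leaving the columns that are not listed (here 'a') untouched.
df = df.astype({col: 'int64' for col in last_cols})

print(df.dtypes)
```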
Or do it in a one-liner:
In [33]:
df[df.columns[-10:]] = df[df.columns[-10:]].convert_objects(convert_numeric=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 5 entries, 0 to 4
Data columns (total 11 columns):
a 5 non-null object
b 5 non-null int64
c 5 non-null int64
d 5 non-null int64
e 5 non-null int64
f 5 non-null int64
g 5 non-null int64
h 5 non-null int64
i 5 non-null int64
j 5 non-null int64
k 5 non-null int64
dtypes: int64(10), object(1)
memory usage: 480.0+ bytes
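Note that `convert_objects` was deprecated and is no longer available in pandas 1.0 and later. On a recent version, a roughly equivalent sketch is to apply `pd.to_numeric` to the same column slice. This assumes the same toy frame as above, and `last_cols` is just an illustrative name:

```python
import pandas as pd

# Rebuild the toy frame: 11 string columns named a..k.
df = pd.DataFrame({'a': ['1', '2', '3', '4', '5']})
df = pd.concat([df] * 11, axis=1)
df.columns = list('abcdefghijk')

# Labels of the last 10 columns (b..k); the name is illustrative only.
last_cols = df.columns[-10:]

# pd.to_numeric works on one Series at a time, so apply it column by column
# and assign the converted block back over the original columns.
df[last_cols] = df[last_cols].apply(pd.to_numeric)

print(df.dtypes)
```

If some cells might not be numeric, passing `errors='coerce'` (for example `df[last_cols].apply(pd.to_numeric, errors='coerce')`) turns them into NaN instead of raising, at the cost of the affected columns becoming float64.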