Convert fractions stored as characters to float64
Suppose we have this df:
import pandas as pd

df = pd.DataFrame({
    'value': ['18 4/2', '2 2/2', '8.5'],
    'country': ['USA', 'Canada', 'Switzerland']
})
Out:
    value      country
0  18 4/2          USA
1   2 2/2       Canada
2     8.5  Switzerland
Note that the 'value' column is stored with object dtype:
df.dtypes
Out:
value      object
country    object
dtype: object
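My question: how can we convert 'value' to decimal numbers and change the dtype to float64 at the same time? Note that one value (8.5) is already a decimal, so it should stay unchanged. Desired output:
desired_output = pd.DataFrame({
    'value': [20, 3, 8.5],
    'country': ['USA', 'Canada', 'Switzerland']
})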
   value      country
0   20.0          USA
1    3.0       Canada
2    8.5  Switzerland
desired_output.dtypes
value      float64
country     object
dtype: object
You can replace the space with a + sign and then apply eval:
print(df['value'].str.replace(' ', '+').apply(eval))
0    20.0
1     3.0
2     8.5
Name: value, dtype: float64
Or use pd.eval:
df['value'] = pd.eval(df['value'].str.replace(' ', '+')).astype(float)
print(df)
   value      country
0   20.0          USA
1    3.0       Canada
2    8.5  Switzerland
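Both one-liners above treat each string as an expression to evaluate, and Python's built-in eval in particular will run any expression it is given. If the column is not fully trusted, one possible alternative is to parse each part with the standard library's fractions.Fraction, which accepts strings such as '18', '4/2' and '8.5'. This is only a sketch: the helper name mixed_to_float is made up here, and it assumes every value is a whitespace-separated sum of such parts.

```python
from fractions import Fraction


def mixed_to_float(text):
    """Convert a mixed number such as '18 4/2', or a plain decimal, to float."""
    # Fraction parses '18', '4/2' and '8.5' alike; the parts are summed exactly
    # and only converted to float at the very end.
    return float(sum(Fraction(part) for part in text.split()))


values = ['18 4/2', '2 2/2', '8.5']
print([mixed_to_float(v) for v in values])  # [20.0, 3.0, 8.5]
```

On the frame from the question this would be applied with df['value'] = df['value'].apply(mixed_to_float). Input that is not a number, such as 'abc', raises ValueError rather than being evaluated.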
I would accept @Ben.T's answer, but since I had already worked on this, here is my attempt.
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'value': ['18 4/2', '2 2/2', '8.5'],
...     'country': ['USA', 'Canada', 'Switzerland']
... })
>>> df
    value      country
0  18 4/2          USA
1   2 2/2       Canada
2     8.5  Switzerland
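>>> def foo(s):
...     # Plain numbers such as '8.5' convert directly.
...     try:
...         return float(s)
...     except ValueError:
...         pass
...     # Otherwise expect 'whole numerator/denominator', e.g. '18 4/2'.
...     w, f = s.split()
...     n, d = f.split('/')
...     w, n, d = map(int, (w, n, d))
...     return w + n / d
...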
>>> foo('1')
1.0
>>> foo('18 4/2')
20.0
>>> df['value'] = df['value'].apply(foo)
>>> df
   value      country
0   20.0          USA
1    3.0       Canada
2    8.5  Switzerland
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 2 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   value    3 non-null      float64
 1   country  3 non-null      object
dtypes: float64(1), object(1)
memory usage: 176.0+ bytes
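For a larger column, the same split into whole part, numerator and denominator can probably be done without calling a Python function per row. What follows is only a sketch under assumptions of my own: the regular expression and the group names whole, num and den are not from the answers above, and each value is assumed to be either a plain number or a number followed by a single n/d fraction.

```python
import pandas as pd

df = pd.DataFrame({
    'value': ['18 4/2', '2 2/2', '8.5'],
    'country': ['USA', 'Canada', 'Switzerland'],
})

# Capture the leading number and, if present, the trailing "n/d" fraction.
pattern = r'^(?P<whole>\S+)(?:\s+(?P<num>\d+)/(?P<den>\d+))?$'
parts = df['value'].str.extract(pattern)

whole = pd.to_numeric(parts['whole'])
# Rows without a fraction have missing num/den, so their fraction becomes 0.
frac = (pd.to_numeric(parts['num']) / pd.to_numeric(parts['den'])).fillna(0)

df['value'] = (whole + frac).astype('float64')
print(df['value'].tolist())  # [20.0, 3.0, 8.5]
print(df['value'].dtype)     # float64
```

A bare fraction such as '1/2' with no whole part does not fit this pattern as written; it would end up in the whole group and pd.to_numeric would raise, so the pattern would need adjusting for data like that.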