Converting an object to float in pandas along with replacing a $ sign
I am new to pandas and I am working on a project with a column that looks like the following:
AverageTotalPayments
$7064.38
$7455.75
$6921.90
ETC
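I am trying to pull the costs out of it, where a cost could be anything above 7000. First, this column is an object, so I know I probably cannot compare it to a number. My code looks like this: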
import pandas as pd
health_data = pd.read_csv("inpatientCharges.csv")
state = input("What is your state: ")
issue = input("What is your issue: ")
#This line of code will create a new dataframe based on the two letter state code
state_data = health_data[(health_data.ProviderState == state)]
#With the new data set I search it for the injury the person has.
issue_data=state_data[state_data.DRGDefinition.str.contains(issue.upper())]
#I then make it replace the $ sign with a '' so I have a number. I also believe at this point my code may be starting to break down.
issue_data = issue_data['AverageTotalPayments'].str.replace('$', '')
#Since the previous line took out the $ I convert it from an object to a float
issue_data = issue_data[['AverageTotalPayments']].astype(float)
#I attempt to print out the values.
cost = issue_data[(issue_data.AverageTotalPayments >= 10000)]
print(cost)
When I run this code I simply get nan back, which is not what I want. Any help with what is wrong would be great! Thank you in advance.
Consider the pd.Series s:
s
0 $7064.38
1 $7455.75
2 $6921.90
Name: AverageTotalPayments, dtype: object
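This gets the float values:
# regex=False treats '$' as a literal character; errors='coerce' turns unparseable entries into NaN
pd.to_numeric(s.str.replace('$', '', regex=False), errors='coerce')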
0 7064.38
1 7455.75
2 6921.90
Name: AverageTotalPayments, dtype: float64
Filter s:
s[pd.to_numeric(s.str.replace('$', '', regex=False), errors='coerce') > 7000]
0 $7064.38
1 $7455.75
Name: AverageTotalPayments, dtype: object
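As a self-contained version of the steps above, here is a minimal sketch. The sample values are only a stand-in for the real column, and `values` and `above` are names chosen for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for the AverageTotalPayments column.
s = pd.Series(["$7064.38", "$7455.75", "$6921.90"], name="AverageTotalPayments")

# Strip the literal "$" (regex=False avoids reading it as the end-of-string anchor),
# then parse to float; entries that cannot be parsed become NaN.
values = pd.to_numeric(s.str.replace("$", "", regex=False), errors="coerce")

# Use the numeric values as a boolean mask on the original strings.
above = s[values > 7000]
print(above)
```

Keeping `values` separate leaves the original strings untouched, which can be handy if the formatted amounts are still needed for display.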
Try this:
In [83]: df
Out[83]:
AverageTotalPayments
0 $7064.38
1 $7455.75
2 $6921.90
3 aaa
In [84]: df.AverageTotalPayments.str.extract(r'.*?(\d+\.*\d*)', expand=False).astype(float) > 7000
Out[84]:
0 True
1 True
2 False
3 False
Name: AverageTotalPayments, dtype: bool
In [85]: df[df.AverageTotalPayments.str.extract(r'.*?(\d+\.*\d*)', expand=False).astype(float) > 7000]
Out[85]:
AverageTotalPayments
0 $7064.38
1 $7455.75
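Putting this back into the question's workflow: the likely problem there is that `issue_data` is reassigned to a Series on the `str.replace` line, so the later `issue_data[['AverageTotalPayments']]` no longer selects a column. A rough sketch that keeps the DataFrame and overwrites only the one column is shown below. The rows are made up to stand in for inpatientCharges.csv, the `state` and `issue` values are hard-coded instead of read with `input`, and the threshold of 7000 is taken from the question's description rather than its code.

```python
import pandas as pd

# Hypothetical rows standing in for inpatientCharges.csv.
health_data = pd.DataFrame({
    "ProviderState": ["AL", "AL", "NY"],
    "DRGDefinition": ["001 - EXAMPLE A", "001 - EXAMPLE A", "001 - EXAMPLE A"],
    "AverageTotalPayments": ["$7064.38", "$6921.90", "$7455.75"],
})

# Hard-coded here; the question reads these with input().
state, issue = "AL", "example a"

# Filter by state and by a literal substring match on the diagnosis text.
mask = (health_data["ProviderState"] == state) & health_data[
    "DRGDefinition"
].str.contains(issue.upper(), regex=False)
issue_data = health_data[mask].copy()

# Overwrite the column in place so issue_data stays a DataFrame.
issue_data["AverageTotalPayments"] = pd.to_numeric(
    issue_data["AverageTotalPayments"].str.replace("$", "", regex=False),
    errors="coerce",
)

cost = issue_data[issue_data["AverageTotalPayments"] >= 7000]
print(cost)
```

The `.copy()` is there so the column assignment works on an independent frame rather than a slice of `health_data`.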