Mixed Format Numbers
At work we use an Oracle SQL database, and sometimes (rarely, but it happens) the database returns data in the wrong format, like this:
Sales | Price |
---|---|
s1 | 10.00 |
s2 | 10,00 |
s3 | 10 |
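The price is the same in every row, but the format differs. How can I use Python to normalize the Price column to a single format?
Here is the code I use: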
```python
import pandas as pd
import cx_Oracle
import numpy as np

cx_Oracle.init_oracle_client(lib_dir="path to oracle client")  # placeholder path

def connect(user, password, host):
    connection = cx_Oracle.connect(user=user, password=password, dsn=host)
    cursor = connection.cursor()
    return cursor

def sql(query, cursor):
    cursor.execute(query)
    result = cursor.fetchall()
    cols = [i[0] for i in cursor.description]
    df = pd.DataFrame(result, columns=[cols])
    return df

query = """
querie
"""
cursor = connect(user, password, host)  # user, password, host: your credentials
df = sql(query, cursor)
df.columns = df.columns.get_level_values(0)
```
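Looking at your code, the problem is that Python does not recognize the comma as a decimal separator.
So you can replace the commas in the cursor.fetchall() result before building the DataFrame.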
```python
import pandas as pd
import cx_Oracle
import numpy as np

cx_Oracle.init_oracle_client(lib_dir="path to oracle client")  # placeholder path

def connect(user, password, host):
    connection = cx_Oracle.connect(user=user, password=password, dsn=host)
    cursor = connection.cursor()
    return cursor

def sql(query, cursor):
    cursor.execute(query)
    result = cursor.fetchall()
    # Replace decimal commas with periods in every fetched value
    new_result = [[str(i).replace(',', '.') for i in r] for r in result]
    cols = [i[0] for i in cursor.description]
    # Build the DataFrame from the cleaned rows, with flat column names
    df = pd.DataFrame(new_result, columns=cols)
    return df

query = """
querie
"""
cursor = connect(user, password, host)  # user, password, host: your credentials
df = sql(query, cursor)
```
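One caveat with the `str(i).replace(',', '.')` step: it converts every value to text, so a `NULL` fetched as `None` becomes the string `'None'`, and numbers that were already numeric become strings. A minimal sketch, assuming `fetchall()` returns a list of tuples, that rewrites only values that are already strings; `clean_rows` is a hypothetical helper name and the sample rows are made up to mirror the question:

```python
def clean_rows(rows):
    """Hypothetical helper: replace decimal commas with periods in string values only.

    None, ints, floats and Decimals are passed through unchanged.
    """
    return [
        tuple(v.replace(',', '.') if isinstance(v, str) else v for v in row)
        for row in rows
    ]

# Sample rows made up to mirror the question (shaped like cursor.fetchall() output)
rows = [('s1', '10.00'), ('s2', '10,00'), ('s3', 10), ('s4', None)]
print(clean_rows(rows))  # [('s1', '10.00'), ('s2', '10.00'), ('s3', 10), ('s4', None)]
```

It would be used as `pd.DataFrame(clean_rows(result), columns=cols)`. Like the original, it still changes commas in every text column, so applying it only to the Price column is safer if other columns can legitimately contain commas.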
If the Price column is still recognized as strings, you can convert it with:
```python
df['Price'] = df['Price'].astype(float)
```
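`astype(float)` raises a `ValueError` as soon as one value cannot be parsed. A sketch of an alternative using `pandas.to_numeric` with `errors='coerce'`, which turns unparseable entries into `NaN` so they can be inspected; the `'n/a'` value is made up purely to show that behavior:

```python
import pandas as pd

# Sample data made up to mirror the question, plus one unparseable value
df = pd.DataFrame({'Sales': ['s1', 's2', 's3', 's4'],
                   'Price': ['10.00', '10,00', 10, 'n/a']})

# Normalize the separator on the text form, then parse; bad values become NaN
df['Price'] = pd.to_numeric(
    df['Price'].astype(str).str.replace(',', '.', regex=False),
    errors='coerce',
)

print(df['Price'].tolist())    # [10.0, 10.0, 10.0, nan]
print(df[df['Price'].isna()])  # rows that could not be parsed
```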
Hope this helps!
```python
# import
import pandas as pd

# test values
df = pd.DataFrame({'Sales': ['s1', 's2', 's3', 's4'], 'Price': ['10.00', '10,00', 10, 9]})

# convert all to string/object type for consistency
# can comment this out if all values are already string/object type
df['Price'] = df['Price'].astype(str)

# replace comma with period (literal match, not a regex)
df['Price'] = df['Price'].str.replace(',', '.', regex=False)

# get index of values that do not have decimal places (period)
index = df[~df['Price'].str.contains('.', regex=False)].index

# pad decimal to values that do not have decimal places
df.loc[index, 'Price'] = df.loc[index, 'Price'] + '.00'
```
As a final step, you can convert the values back to float/decimal if needed.
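The padding step only adds `.00` to values that contain no period at all, so a value such as `10.5` would stay `10.5`. A sketch of a one-pass alternative, assuming every entry is a plain number using either `.` or `,` as the decimal separator (no thousands separators): parse to float, then format every value with two decimals. The `'10.5'` row is made up to show the difference:

```python
import pandas as pd

# Sample data from the answer above plus one made-up value with a single decimal
df = pd.DataFrame({'Sales': ['s1', 's2', 's3', 's4', 's5'],
                   'Price': ['10.00', '10,00', 10, 9, '10.5']})

# Normalize the separator, parse to float, then format with exactly 2 decimals
price = pd.to_numeric(df['Price'].astype(str).str.replace(',', '.', regex=False))
df['Price'] = price.map('{:.2f}'.format)

print(df['Price'].tolist())  # ['10.00', '10.00', '10.00', '9.00', '10.50']
```

Going through float is fine for display, but for exact monetary arithmetic `decimal.Decimal` avoids binary rounding.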
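The easiest way is to inject an output type handler. The following fetches NUMBER columns as strings and replaces the period with a comma, but you can adapt the converter as needed. In a plain cx_Oracle example:
```python
import cx_Oracle

def type_handler(cursor, name, default_type, size, precision, scale):
    # Fetch NUMBER columns as strings and replace the period with a comma
    if default_type == cx_Oracle.DB_TYPE_NUMBER:
        return cursor.var(cx_Oracle.DB_TYPE_VARCHAR, arraysize=cursor.arraysize,
                          outconverter=lambda v: v.replace('.', ','))

conn = cx_Oracle.connect(user=user, password=password, dsn=host)  # your credentials
conn.outputtypehandler = type_handler
cursor = conn.cursor()
cursor.execute("select 2.5 from dual")
print(cursor.fetchall())
```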
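If the goal is exact numeric values rather than text, the `outconverter` in the type handler above could parse the fetched string instead of rewriting it. A sketch of such a converter, assuming the number arrives as text with either `.` or `,` as the decimal separator and no group separators; `to_decimal` is a hypothetical name, and since it is pure Python it can be tested without a database:

```python
from decimal import Decimal

def to_decimal(text):
    """Hypothetical converter: parse a fetched NUMBER string like '2.5' or '2,5'."""
    # Assumes no thousands/group separators are present in the text
    return Decimal(text.replace(',', '.'))

# In the type handler this would be passed as outconverter=to_decimal
print(to_decimal('2,5'))    # 2.5
print(to_decimal('10.00'))  # 10.00
```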