Mixed Format Numbers

At work we use an Oracle SQL database, and occasionally (rarely, but it does happen) the database delivers data in inconsistent formats, like this:

Sales Price
s1 10.00
s2 10,00
s3 10

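All rows have the same price but in different formats. How can I use Python to normalize the Price column to a single format?

Here is the code I am using: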

import pandas as pd
import cx_Oracle
import numpy as np

cx_Oracle.init_oracle_client(path to oracle client)

def connect(user, password, host):
    connection = cx_Oracle.connect(user=user, password = password, dsn = host)
    cursor = connection.cursor()
    return cursor

def sql(query,cursor):
    cursor.execute(query)
    result = cursor.fetchall()
    cols = [i[0] for i in cursor.description]
    df = pd.DataFrame(result, columns=[cols])
    return df

query = """
querie
"""

df = sql(query,cursor)
df.columns = df.columns.get_level_values(0)

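Looking at your code, the problem is that some prices use a comma as the decimal separator, which Python does not parse as a number.

So you can replace the commas in the cursor.fetchall() result and then build the DataFrame from the cleaned rows.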

import pandas as pd
import cx_Oracle

cx_Oracle.init_oracle_client(path to oracle client)

def connect(user, password, host):
    connection = cx_Oracle.connect(user=user, password=password, dsn=host)
    return connection.cursor()

def sql(query, cursor):
    cursor.execute(query)
    result = cursor.fetchall()
    # Swap the decimal comma for a period in string values only,
    # so numbers and NULLs keep their original types
    new_result = [[v.replace(',', '.') if isinstance(v, str) else v for v in r]
                  for r in result]
    cols = [i[0] for i in cursor.description]
    # Build the frame from the cleaned rows, not from the raw result;
    # passing cols (not [cols]) keeps the column index flat
    df = pd.DataFrame(new_result, columns=cols)
    return df

query = """
querie
"""

df = sql(query, cursor)

If the Price column is still recognized as a string, you can convert it with:

df['Price'] = df['Price'].astype(float)
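
Note that `astype(float)` raises on the first value it cannot parse. As a minimal sketch, assuming some rows may still hold unexpected text, `pd.to_numeric` with `errors="coerce"` turns those into NaN so they can be inspected. The sample frame and the `"n/a"` value are made up for illustration:

```python
import pandas as pd

# Hypothetical sample mirroring the question's table, plus one bad value
df = pd.DataFrame({
    "Sales": ["s1", "s2", "s3", "s4"],
    "Price": ["10.00", "10,00", 10, "n/a"],
})

# Normalize the separator on a string view of the column
cleaned = df["Price"].astype(str).str.replace(",", ".", regex=False)

# Unparseable values become NaN instead of raising
df["Price"] = pd.to_numeric(cleaned, errors="coerce")

# Rows that still need manual attention
bad_rows = df[df["Price"].isna()]
```

Here `bad_rows` holds only the row whose price could not be parsed, which makes the occasional bad record easy to spot.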

Hope this helps!

# import
import pandas as pd

# test values
df = pd.DataFrame({'Sales': ['s1', 's2', 's3', 's4'], 'Price': ['10.00', '10,00', 10, 9]})

# convert all to string/object type for consistency
# can comment this out if all values are already string/object type
df['Price'] = df['Price'].astype(str)
# replace comma with period (literal match, not a regex)
df['Price'] = df['Price'].str.replace(',', '.', regex=False)

# mask of values that do not have a decimal point
no_decimal = ~df['Price'].str.contains('.', regex=False)

# pad decimal to values that do not have decimal places
df.loc[no_decimal, 'Price'] = df.loc[no_decimal, 'Price'] + '.00'

As a final step, you can optionally convert the values back to float/decimal if needed.
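
That last conversion might look like the sketch below. It assumes the column already holds strings such as `'10.00'`; the column names `Price_float` and `Price_decimal` are made up for illustration. `Decimal` keeps the two decimal places exactly, while `float` is more convenient for arithmetic in pandas:

```python
from decimal import Decimal

import pandas as pd

# Hypothetical frame as it would look after the padding step above
df = pd.DataFrame({"Sales": ["s1", "s2", "s3"],
                   "Price": ["10.00", "10.00", "10.00"]})

# Option 1: plain floats (fast, but the trailing zeros are only a display matter)
df["Price_float"] = df["Price"].astype(float)

# Option 2: exact decimals stored as Python objects
df["Price_decimal"] = [Decimal(v) for v in df["Price"]]
```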

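The simplest way is to inject a type handler. The following swaps periods for commas, but you can adjust it as needed. In a plain python-oracledb example (python-oracledb is the renamed successor of cx_Oracle):

import oracledb

def type_handler(cursor, name, default_type, size, precision, scale):
    # Fetch NUMBER columns as strings and swap the decimal separator
    if default_type == oracledb.DB_TYPE_NUMBER:
        return cursor.var(oracledb.DB_TYPE_VARCHAR, arraysize=cursor.arraysize,
                outconverter=lambda v: v.replace('.', ','))

# conn is an already opened connection and cursor = conn.cursor()
conn.outputtypehandler = type_handler
cursor.execute("select 2.5 from dual")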
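
For the question's case the conversion would go the other way. As a rough sketch, assuming the prices arrive as strings, a converter like the one below could be passed as `outconverter` in place of the lambda. `normalize_price` is a hypothetical helper, not part of any driver, and it runs without a database connection:

```python
from decimal import Decimal, InvalidOperation


def normalize_price(value):
    """Return value as a two-place Decimal; accepts '10,00', '10.00' or '10'."""
    if value is None:
        return None
    # Treat a comma as the decimal separator
    text = str(value).strip().replace(",", ".")
    try:
        return Decimal(text).quantize(Decimal("0.01"))
    except InvalidOperation:
        # e.g. '1.234,56' becomes '1.234.56', which is not a valid number
        raise ValueError(f"unrecognized price format: {value!r}")


prices = [normalize_price(v) for v in ["10.00", "10,00", "10"]]
```

Raising on anything that is not a single number with at most one separator is a deliberate choice here: values with thousands separators would otherwise be silently misread.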