尝试使我的数据框数字化时遇到问题

Having issues trying to make my dataframe numeric

所以我有一个 sqlite 本地数据库,我使用

作为 pandas 数据帧将其读入我的程序
""" Seperating hitters and pitchers """
pitchers = pd.read_sql_query("SELECT * FROM ALL_NORTHWOODS_DATA WHERE BF_y >= 20 AND BF_x >= 20", northwoods_db)
hitters = pd.read_sql_query("SELECT * FROM ALL_NORTHWOODS_DATA WHERE PA_y >= 25 AND PA_x >= 25", northwoods_db)

但是当我这样做时,有些数字不是数字。这是其中一个数据帧的头部:

index   Year    Age_x   AgeDif_x    Tm_x    Lg_x    Lev_x   Aff_x   G_x PA_x    ... ER_y    BK_y    WP_y    BF_y    WHIP_y  H9_y    HR9_y   BB9_y   SO9_y   SO/W_y
0   84  2020    21  -0.3    Hillsdale   GMAC    NCAA    None    5   None    ... 4.0 None    3.0 71.0    1.132   5.6 0.0 4.6 8.7 1.89
1   264 2018    --  None    Duke    ACC NCAA    None    15  None    ... 13  0   1   88  2.111   10.0    0.5 9.0 8.0 0.89
2   298 2019    21  0.1 Wisconsin-Milwaukee Horz    NCAA    None    8   None    ... 1.0 0.0 2.0 21.0    2.25    9.0 0.0 11.3    11.3    1.0
3   357 2017    22  1.0 Nova Southeastern   SSC NCAA    None    15.0    None    ... 20.0    0.0 3.0 206.0   1.489   9.7 0.4 3.7 8.5 2.32
4   418 2021    21  -0.4    Creighton   BigE    NCAA    None    4   None    ... 26.0    1.0 6.0 226.0   1.625   8.6 0.9 6.0 7.5 1.25

当我尝试将数据框设为数字时,我使用了这行代码:

hitters = hitters.apply(pd.to_numeric, errors='coerce')
pitchers = pitchers.apply(pd.to_numeric, errors='coerce')

但是当我这样做时,数据帧的新头部充满了 NaN,它似乎摆脱了所有字符串值,但我想保留它们。

    index   Year    Age_x   AgeDif_x    Tm_x    Lg_x    Lev_x   Aff_x   G_x PA_x    ... ER_y    BK_y    WP_y    BF_y    WHIP_y  H9_y    HR9_y   BB9_y   SO9_y   SO/W_y
0   84  2020    21.0    -0.3    NaN NaN NaN NaN 5.0 NaN ... 4.0 NaN 3.0 71.0    1.132   5.6 0.0 4.6 8.7 1.89
1   264 2018    NaN NaN NaN NaN NaN NaN 15.0    NaN ... 13.0    0.0 1.0 88.0    2.111   10.0    0.5 9.0 8.0 0.89
2   298 2019    21.0    0.1 NaN NaN NaN NaN 8.0 NaN ... 1.0 0.0 2.0 21.0    2.250   9.0 0.0 11.3    11.3    1.00
3   357 2017    22.0    1.0 NaN NaN NaN NaN 15.0    NaN ... 20.0    0.0 3.0 206.0   1.489   9.7 0.4 3.7 8.5 2.32
4   418 2021    21.0    -0.4    NaN NaN NaN NaN 4.0 NaN ... 26.0    1.0 6.0 226.0   1.625   8.6 0.9 6.0 7.5 1.25

是否有更好的方法使数字值变为数字值并保留我所有的字符串列?也许有一个 sqlite 函数可以做得更好?我不确定,请提供任何帮助。

也许你可以使用 combine_first:

hitters_new = hitters.apply(pd.to_numeric, errors='coerce').combine_first(hitters)
pitchers_new = pitchers.apply(pd.to_numeric, errors='coerce').combine_first(pitchers)

您可以尝试使用 astype or convert_dtypes. They both take an argument which is the columns you want to convert, if you already know which columns are numeric and which ones are strings that can work. Otherwise, take a look at this thread 自动执行此操作。