Is there a way to convert/standardize text into integers in Python?

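I have a DataFrame with a column showing the time (in minutes) spent organizing each inventory item. The goal is to show the minutes spent as an integer or a float. However, the values in this column are not clean; see the examples below. Is there a way to standardize everything and convert it to an integer or float? (For example, 10 hours should become 600 minutes.)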

import pandas as pd
df1 = { 'min':['420','450','480','512','560','10 hours', '10.5 hours',
'420 (all inventory)','3h ', '4.1 hours', '60**','6h', '7hours  ']}

df1=pd.DataFrame(df1)

The desired output is the same column with every value expressed as a number of minutes.

I solved this kind of problem with a regex.

import re
import numpy as np
import pandas as pd
df1 = { 'min':['420','450','480','512','560','10 hours', '10.5 hours',
'420 (all inventory)','3h ', '4.1 hours', '60**','6h', '7hours  ']}
df1=pd.DataFrame(df1)

# Zero-filled NumPy array that will hold the cleaned value for each row
arr = np.zeros(df1.shape[0])

for i, value in enumerate(df1["min"]):
    # Strip letters, parentheses, asterisks and whitespace; digits and the decimal point remain
    number = float(re.sub(r"[a-zA-Z()*\s]", "", value))
    # A value containing "h" is treated as hours and converted to minutes
    arr[i] = number * 60 if "h" in value else number


df1["min_clean"] = arr
print(df1)
                    min  min_clean
0                   420      420.0
1                   450      450.0
2                   480      480.0
3                   512      512.0
4                   560      560.0
5              10 hours      600.0
6            10.5 hours      630.0
7   420 (all inventory)      420.0
8                   3h       180.0
9             4.1 hours      246.0
10                 60**       60.0
11                   6h      360.0
12             7hours        420.0
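
The row loop above can also be written without an explicit loop. The following is a minimal sketch, not the answer's own method: it assumes every value starts with a number, optionally followed by a unit beginning with "h", and it uses `Series.str.extract` with two named groups (`value` and `unit` are names chosen here for illustration). Results are rounded to whole minutes, which is also an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({"min": ["420", "450", "480", "512", "560", "10 hours", "10.5 hours",
                            "420 (all inventory)", "3h ", "4.1 hours", "60**", "6h", "7hours  "]})

# Capture the leading number and, if present, an "h" that marks the value as hours.
parts = df1["min"].str.extract(r"^\s*(?P<value>\d*\.?\d+)\s*(?P<unit>h)?")

value = parts["value"].astype(float)
is_hours = parts["unit"].notna()

# Keep minute values as they are; multiply hour values by 60.
# Rounding before the integer cast avoids float artifacts such as 4.1 * 60.
df1["min_clean"] = value.where(~is_hours, value * 60).round().astype(int)
print(df1)
```

Anchoring the pattern at the start of the string means trailing text such as "(all inventory)" or "**" is ignored rather than stripped character by character.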

I don't know pandas yet, but this solution (using regular expressions) may help.

import re

df1 = { 'min':['420','450','480','512','560','10 hours', '10.5 hours',
'420 (all inventory)','3h ', '4.1 hours', '60**','6h', '7hours  ']}

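def mins(s):
    # Read the leading number and an optional "h" (covers "h", "hour" and "hours")
    m = re.match(r"\s*(\d*\.?\d+)\s*(h)?", s)
    value = float(m.group(1))
    if m.group(2):
        value *= 60  # hours to minutes
    # Round to whole minutes; this also handles values with several decimal places
    return round(value)

# Build a list so the result can be reused after printing
min_clear = list(map(mins, df1["min"]))
print(min_clear)

# output: [420, 450, 480, 512, 560, 600, 630, 420, 180, 246, 60, 360, 420]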

You can add min_clear to the DataFrame later.
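
A minimal sketch of that last step, assuming the converted values are available as a plain list in the same row order as the original column. The list below is copied from the output shown above, and `min_clean` is a column name chosen here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"min": ["420", "450", "480", "512", "560", "10 hours", "10.5 hours",
                            "420 (all inventory)", "3h ", "4.1 hours", "60**", "6h", "7hours  "]})

# Converted values, one per row, in the same order as df1["min"]
min_clear = [420, 450, 480, 512, 560, 600, 630, 420, 180, 246, 60, 360, 420]

# Assigning a list creates a new column; its length must match the number of rows
df1["min_clean"] = min_clear
print(df1)
```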

By the way, I'm just a beginner; if any use case fails, please let me know and I'll do my best to improve it.

Thanks