How to convert objects or strings to a time format when the input string/object is malformed, i.e. %H does not exist for all rows

Question

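I asked a similar question before but did not get any replies.

I have gone through many forums looking for a solution. The other questions involve a year, but mine does not: it is just H:M:S.

I scraped this data from the web, and it returned

Time - 36:42 38:34 1:38:32 1:41:18

A sample of the data is here: Source data 1 and Source data 2

I need the time in minutes, like this: 36.70 38.57 98.53 101.30

To do that I tried this: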

time_mins = []
for i in time_list:
    h, m, s = i.split(':')
    math = (int(h) * 3600 + int(m) * 60 + int(s))/60
    time_mins.append(math)

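But this did not work because 36:42 is not in H:M:S format (unpacking two fields into h, m, s raises a ValueError), so I tried converting 36:42 with this: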

df1.loc[1:,6] = df1[6]+ timedelta(hours=0)

and also this:

df1['minutes'] = pd.to_datetime(df1[6], format='%H:%M:%S')

but no luck.

Can I do this at the extraction stage? I have more than 500 rows to process.

row_td = soup.find_all('td')

If not, how can it be done after converting to a data frame?

Thanks in advance.

Answer 1

If your input (timedelta strings) contains only hours/minutes/seconds (no days etc.), you can use a custom function applied to the column:

import pandas as pd

df = pd.DataFrame({'Time': ['36:42', '38:34', '1:38:32', '1:41:18']})

def to_minutes(s):
    # split string s on ':', reverse so that seconds come first
    # multiply the result as type int with elements from tuple (1/60, 1, 60) to get minutes for each value
    # return the sum of these multiplications
    return sum(int(a)*b for a, b in zip(s.split(':')[::-1], (1/60, 1, 60)))

df['Minutes'] = df['Time'].apply(to_minutes)
# df['Minutes']
# 0     36.700000
# 1     38.566667
# 2     98.533333
# 3    101.300000
# Name: Minutes, dtype: float64

Edit: It took me a while to find it, but this is a variant of this question. And my answer here is based on this reply.
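A vectorized alternative, sketched under the assumption that every value has either two or three colon-separated fields: pad the short values with a leading "0:" and let pd.to_timedelta parse the whole column. The column names Time and Minutes simply follow the example above, and the intermediate name padded is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Time': ['36:42', '38:34', '1:38:32', '1:41:18']})

# Assumption: values are either "M:S" or "H:M:S"; prefix the short ones with "0:".
padded = df['Time'].where(df['Time'].str.count(':') == 2, '0:' + df['Time'])

# Parse as timedeltas, then express each duration in minutes.
df['Minutes'] = pd.to_timedelta(padded).dt.total_seconds() / 60
```

This avoids a Python-level function call per row, which may matter little for 500 rows but keeps the conversion in one expression.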

Answer 2

You are on the right track. Below are a few modifications to your code that get the minutes.

Create the function

import numpy as np

def get_time(i):
    ilist = i.split(':')
    if len(ilist) == 3:
        h, m, s = ilist
    else:
        # no hours field, e.g. "36:42"
        h = 0
        m, s = ilist
    # total seconds divided by 60 gives minutes
    math = (int(h) * 3600 + int(m) * 60 + int(s)) / 60
    return np.round(math, 2)

Call the function after splitting the string

x = "36:42 38:34 1:38:32 1:41:18"
x = x.split(" ")
xmin = [get_time(i) for i in x]
xmin

Output

[36.7, 38.57, 98.53, 101.3]
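Since the question mentions timedelta, here is a minimal standard-library sketch of the same idea: pad the missing leading fields with zeros and build a datetime.timedelta. The helper name parse_duration is made up for this example, and it assumes the fields are plain integers.

```python
from datetime import timedelta

def parse_duration(text):
    # Split into integer fields, e.g. "36:42" -> [36, 42].
    parts = [int(p) for p in text.strip().split(':')]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected time string: {text!r}")
    # Left-pad with zeros so that hours (and minutes) default to 0.
    hours, minutes, seconds = [0] * (3 - len(parts)) + parts
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

times = "36:42 38:34 1:38:32 1:41:18".split()
minutes = [round(parse_duration(t).total_seconds() / 60, 2) for t in times]
print(minutes)
# [36.7, 38.57, 98.53, 101.3]
```

Keeping the value as a timedelta until the last step makes it easy to switch to seconds or hours later.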

Answer 3

I have no experience with pandas, but here is something you might find useful

...
time_mins = []
for i in time_list:
    parts = i.split(':')
    minutes_multiplier = 1/60
    math = 0
    for part in reversed(parts):
        math += (minutes_multiplier * int(part))
        minutes_multiplier *= 60
    time_mins.append(math)
...
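The same loop can also run left to right, treating the fields as base-60 digits. This is only a sketch of that variant, not a replacement for the code above; the function name to_minutes is an assumption made for the example. Working in whole seconds until the final division avoids multiplying by the inexact float 1/60.

```python
from functools import reduce

def to_minutes(text):
    # Fold the fields left to right: ((h * 60) + m) * 60 + s, all in integers.
    total_seconds = reduce(lambda acc, part: acc * 60 + int(part), text.split(':'), 0)
    return total_seconds / 60

time_list = ['36:42', '38:34', '1:38:32', '1:41:18']
time_mins = [round(to_minutes(t), 2) for t in time_list]
print(time_mins)
# [36.7, 38.57, 98.53, 101.3]
```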

Answer 4

I commented earlier that @NileshIngle's answer above did not work because it gave me a

NameError: name 'h' is not defined.

A simple correction was needed: move h above m, s, since it is the first variable referenced

h = 0 # move this above
m, s = i.split(':') 


def get_time(i):
    ilist = i.split(':')
    if(len(ilist)==3):
        h, m, s = i.split(':')
    else:
        h = 0
        m, s = i.split(':')
    math = (int(h) * 3600 + int(m) * 60 + int(s))/60
    return np.round(math, 2)
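With 500+ scraped rows, some cells may turn out to be empty or hold header text. The following is a hedged sketch that assumes such cells should become NaN rather than raise; the names TIME_PATTERN and get_time_safe are invented for this example.

```python
import math
import re

# Accept "M:SS", "MM:SS" or "H:MM:SS"; anything else is treated as missing.
TIME_PATTERN = re.compile(r'(?:(\d+):)?(\d{1,2}):(\d{2})')

def get_time_safe(text):
    match = TIME_PATTERN.fullmatch(str(text).strip())
    if match is None:
        return math.nan
    # A missing hours group comes back as None and is counted as 0.
    h, m, s = (int(g) if g else 0 for g in match.groups())
    return round((h * 3600 + m * 60 + s) / 60, 2)

print([get_time_safe(x) for x in ['36:42', '1:38:32', 'Time', '']])
# [36.7, 98.53, nan, nan]
```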

I would like to thank @MrFuppes, @NileshIngle and @KaustubhBadrike for taking the time to reply. I learned three different approaches.

