Python - not getting correct datetime upon reading time from Excel file

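I have an Excel file with 3 columns that hold datetime, date, or time fields. I am reading it with the xlrd package and, I think, getting the time back as milliseconds; when I try to convert it back to a datetime, I get wrong results.

I also tried converting the file to CSV. That did not help either: I got odd datetime formats that I could not make sense of.

Here is what I tried with xlrd. I would prefer to use the file with the .xlsx extension as input; otherwise I have to convert the Excel file to .csv every time I get a new input file.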

from xlrd import open_workbook
import os,pickle,datetime

def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path,filename)

    wb = open_workbook(absolute_filepath)
    for sheet in wb.sheets():
        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols

        for row_index in xrange(1, sheet.nrows):
            row=[]
            for col_index in xrange(4,7): #4th and 6th columns are date fields
                row.append(sheet.cell(row_index, col_index).value)

            print(row)  #Relevant list formed with 4th, 5th and 6th columns
            print(datetime.datetime.fromtimestamp(float(row[0])).strftime('%Y-%m-%d %H:%M:%S'))


path = "C:\Users\***************\NEW DATA"
MISfile  = "P2P_2015 - Copy.xlsx"
absolute_path_organisation_structure = "C:\Users\******************NEW DATA\organisation.csv"
main(path, MISfile, absolute_path_organisation_structure)

Result:

[42011.46789351852, u'Registered', 42009.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Registered', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG2 approval', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'CTG2 Approved', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42013.0]
1970-01-01 17:10:11
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50

Actual input file (copied from Excel):

1/7/2015 11:13  Registered  1/5/2015 0:00
1/7/2015 11:13  Sent for CTG1 approval  1/6/2015 0:00
1/7/2015 11:13  Sent back   1/6/2015 0:00
1/7/2015 11:13  Registered  1/7/2015 0:00
1/7/2015 11:13  Sent for CTG1 approval  1/7/2015 0:00
1/7/2015 11:13  Sent for CTG2 approval  1/8/2015 0:00
1/7/2015 11:13  CTG2 Approved   1/8/2015 0:00
1/7/2015 11:13  Sent back   1/9/2015 0:00
6/15/2015 14:48 Registered  5/20/2015 0:00
6/15/2015 14:48 Registered  5/20/2015 0:00
6/15/2015 14:48 Sent back   6/10/2015 0:00
6/15/2015 14:48 Sent back   6/10/2015 0:00
6/15/2015 14:48 Registered  6/15/2015 0:00
6/15/2015 14:48 Registered  6/15/2015 0:00

Why can't I read the dates correctly? Why don't they simply come through as strings, so that I can convert them easily?

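The problem is that you are interpreting the Excel datetime values as UNIX timestamps, and the two are not the same thing. Excel stores a datetime as a serial number of days, with the time of day as the fractional part, whereas a UNIX timestamp counts seconds since 1970-01-01. The warning sign to look for is that the resulting values all fall close to the UNIX epoch (1970-01-01).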
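
To make the mismatch concrete, here is a minimal sketch using only the standard library. It assumes the workbook uses Excel's default 1900 date system, in which serial numbers count days from 1899-12-30; the helper name `excel_serial_to_datetime` is made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system, where serial 0 corresponds to 1899-12-30.
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Hypothetical helper: convert an Excel serial day number to a naive datetime."""
    return EXCEL_EPOCH_1900 + timedelta(days=float(serial))

serial = 42011.46789351852  # first value printed in the question

# Misreading the serial as seconds since 1970 lands less than 12 hours after the epoch.
as_seconds = datetime(1970, 1, 1) + timedelta(seconds=serial)
# Reading it as days since the Excel epoch recovers the real date.
as_days = excel_serial_to_datetime(serial)

print(as_seconds.strftime('%Y-%m-%d %H:%M'))
print(as_days.strftime('%Y-%m-%d %H:%M'))
```

The first line printed is 1970-01-01 11:40, the second is 2015-01-07 11:13, which matches the 1/7/2015 11:13 shown in the spreadsheet. The question's output shows 17:10 rather than 11:40 presumably because `fromtimestamp` also applies the machine's local UTC offset.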

You can convert an Excel datetime to a UNIX timestamp using the method described in this answer:

Windows/Mac Excel 2011

Unix Timestamp = (Excel Timestamp - 25569) * 86400

Mac Excel 2007

Unix Timestamp = (Excel Timestamp - 24107) * 86400

If you apply this conversion, you should get the correct output:

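# 25569 is the Excel serial number of 1970-01-01 (1900 date system)
timestamp = (float(row[0]) - 25569) * 86400
# Excel serials carry no time zone, so use the UTC variant to avoid a local-time offset
print(datetime.datetime.utcfromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S'))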
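
The two formulas can also be folded into one small helper. The following is a sketch, not part of the original answer: the names `excel_to_unix` and `unix_to_datetime` and the `date1904` flag are assumptions made for illustration, and the offsets 25569 and 24107 are taken from the formulas above.

```python
from datetime import datetime, timedelta

SECONDS_PER_DAY = 86400
# Excel serial number of 1970-01-01 in each date system (offsets from the formulas above)
UNIX_EPOCH_SERIAL_1900 = 25569  # Windows / Mac Excel 2011
UNIX_EPOCH_SERIAL_1904 = 24107  # Mac Excel 2007

def excel_to_unix(serial, date1904=False):
    """Hypothetical helper: Excel serial day number -> seconds since 1970-01-01."""
    offset = UNIX_EPOCH_SERIAL_1904 if date1904 else UNIX_EPOCH_SERIAL_1900
    return (float(serial) - offset) * SECONDS_PER_DAY

def unix_to_datetime(timestamp):
    """Build a naive datetime without applying any local time zone offset."""
    return datetime(1970, 1, 1) + timedelta(seconds=timestamp)

row = [42011.46789351852, u'Registered', 42009.0]  # first row from the question
for value in (row[0], row[2]):
    print(unix_to_datetime(excel_to_unix(value)).strftime('%Y-%m-%d %H:%M'))
```

Adding the seconds to a fixed 1970-01-01 datetime, rather than calling `fromtimestamp`, keeps the result independent of the time zone of the machine running the script.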

xldate_as_tuple(xldate, datemode)

Convert an Excel number (presumed to represent a date, a datetime or a time) into a tuple suitable for feeding to datetime or mx.DateTime constructors.

Source: http://www.lexicon.net/sjmachin/xlrd.html#xlrd.xldate_as_tuple-function

Usage example: How to use ``xlrd.xldate_as_tuple()``
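
A minimal sketch of how this function might be applied to the values from the question. It assumes xlrd is installed and that the workbook uses the 1900 date system (datemode 0); when reading a real file, the workbook's own `datemode` attribute would normally be passed instead of the literal.

```python
import datetime
import xlrd

serial = 42011.46789351852  # first cell value printed in the question
datemode = 0                # assumption: 1900 date system; with a real file use wb.datemode

# xldate_as_tuple returns (year, month, day, hour, minute, second)
date_tuple = xlrd.xldate_as_tuple(serial, datemode)
converted = datetime.datetime(*date_tuple)
print(converted.strftime('%Y-%m-%d %H:%M:%S'))
```

Inside the question's loop, the `fromtimestamp` call could then be replaced by something like `datetime.datetime(*xlrd.xldate_as_tuple(row[0], wb.datemode))`.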

If the Excel file to read is a plain table, you can simply use pandas.read_excel directly and then convert the dates with pandas.to_datetime:

from __future__ import absolute_import, division, print_function
import os
import pandas as pd

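def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path, filename)
    # Keep only the columns at positions 4, 5 and 6.
    # `usecols` replaces the `parse_cols` argument of older pandas releases.
    # header=None treats the first row as data; drop it if the sheet has a header row.
    df = pd.read_excel(absolute_filepath, header=None, usecols=[4, 5, 6])
    # Relabel the kept columns 0..2 so the lookups below do not depend on the pandas version
    df.columns = range(df.shape[1])
    # Transform columns 0 and 2 to datetime
    df[0] = pd.to_datetime(df[0])
    df[2] = pd.to_datetime(df[2])
    print(df)

# Raw string so the backslashes are not read as escape sequences
path = r"C:\Users\***************\NEW DATA"
MISfile = "P2P_2015 - Copy.xlsx"
main(path, MISfile, None)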