Python - not getting correct datetime upon reading time from Excel file
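I have an Excel file with three columns that hold datetime, date, or time values. I am reading it with the xlrd package, and the times come back as what look like milliseconds. When I try to convert them back to datetimes, I get wrong results.
I also tried converting the file to CSV. That did not help either: I got strange datetime formats that I could not make sense of.
Here is what I tried with xlrd. I would prefer to use the .xlsx file as input; otherwise I have to convert the Excel file to .csv every time I receive a new input file.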
from xlrd import open_workbook
import os, pickle, datetime

def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path, filename)
    wb = open_workbook(absolute_filepath)
    for sheet in wb.sheets():
        number_of_rows = sheet.nrows
        number_of_columns = sheet.ncols
        # Start at row 1 to skip the first row
        for row_index in xrange(1, sheet.nrows):
            row = []
            for col_index in xrange(4, 7):  # columns 4 to 6; 4 and 6 are date fields
                row.append(sheet.cell(row_index, col_index).value)
            print(row)  # relevant list formed with columns 4, 5 and 6
            print(datetime.datetime.fromtimestamp(float(row[0])).strftime('%Y-%m-%d %H:%M:%S'))

# Raw strings keep the backslashes in the Windows paths literal
path = r"C:\Users\***************\NEW DATA"
MISfile = "P2P_2015 - Copy.xlsx"
absolute_path_organisation_structure = r"C:\Users\******************NEW DATA\organisation.csv"
main(path, MISfile, absolute_path_organisation_structure)
Result:
[42011.46789351852, u'Registered', 42009.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42010.0]
1970-01-01 17:10:11
[42011.46789351852, u'Registered', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG1 approval', 42011.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent for CTG2 approval', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'CTG2 Approved', 42012.0]
1970-01-01 17:10:11
[42011.46789351852, u'Sent back', 42013.0]
1970-01-01 17:10:11
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42144.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Sent back', 42165.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50
[42170.61667824074, u'Registered', 42170.0]
1970-01-01 17:12:50
Actual input file (copied from Excel):
1/7/2015 11:13 Registered 1/5/2015 0:00
1/7/2015 11:13 Sent for CTG1 approval 1/6/2015 0:00
1/7/2015 11:13 Sent back 1/6/2015 0:00
1/7/2015 11:13 Registered 1/7/2015 0:00
1/7/2015 11:13 Sent for CTG1 approval 1/7/2015 0:00
1/7/2015 11:13 Sent for CTG2 approval 1/8/2015 0:00
1/7/2015 11:13 CTG2 Approved 1/8/2015 0:00
1/7/2015 11:13 Sent back 1/9/2015 0:00
6/15/2015 14:48 Registered 5/20/2015 0:00
6/15/2015 14:48 Registered 5/20/2015 0:00
6/15/2015 14:48 Sent back 6/10/2015 0:00
6/15/2015 14:48 Sent back 6/10/2015 0:00
6/15/2015 14:48 Registered 6/15/2015 0:00
6/15/2015 14:48 Registered 6/15/2015 0:00
Why am I unable to read the dates correctly? Why don't they simply come through as strings, so that I could convert them easily?
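The problem is that you are interpreting Excel datetime values as UNIX timestamps, and they are not the same thing. An Excel datetime is a serial number that counts days, with the time of day as the fractional part, whereas a UNIX timestamp counts seconds since 1970-01-01. The warning sign to look for is that the resulting values are all close to the UNIX epoch (1970-01-01).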
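To see why every row lands on 1970-01-01, here is a minimal sketch that reads the first value from the question as a UNIX timestamp. It assumes Python 3 and uses UTC explicitly; the variable names are only illustrative.

```python
from datetime import datetime, timezone

excel_serial = 42011.46789351852  # first value printed in the question

# Read as a UNIX timestamp, the serial is only about 42011 seconds
# (roughly 11.7 hours) after the epoch, so the date is still 1970-01-01.
as_unix = datetime.fromtimestamp(excel_serial, tz=timezone.utc)
print(as_unix.strftime('%Y-%m-%d %H:%M:%S'))  # 1970-01-01 11:40:11
```

The question shows 17:10:11 rather than 11:40:11, which is consistent with fromtimestamp applying a local offset of +5:30 on the asker's machine.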
You can convert an Excel datetime to a UNIX timestamp using the method described in this answer.
Windows/Mac Excel 2011
Unix Timestamp = (Excel Timestamp - 25569) * 86400
Mac Excel 2007
Unix Timestamp = (Excel Timestamp - 24107) * 86400
If you apply this conversion, you should get the correct output:
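timestamp = (float(row[0]) - 25569) * 86400
# utcfromtimestamp keeps the wall-clock time stored in Excel;
# fromtimestamp would add the local UTC offset on top of it
datetime.datetime.utcfromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')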
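The same arithmetic can be sketched without going through a UNIX timestamp at all, by adding the serial to the date that each Excel date system treats as day 0. This is a sketch assuming Python 3; the helper name excel_serial_to_datetime and its date1904 parameter are hypothetical, not part of any library.

```python
from datetime import datetime, timedelta

# Day 0 of each Excel date system. Assumption: serials are 61 or larger,
# i.e. after the fictitious 1900-02-29 that Excel counts in the 1900 system.
EPOCH_1900 = datetime(1899, 12, 30)  # offset 25569 in the formula above
EPOCH_1904 = datetime(1904, 1, 1)    # offset 24107 in the formula above

def excel_serial_to_datetime(serial, date1904=False):
    """Convert an Excel serial day number to a naive datetime."""
    epoch = EPOCH_1904 if date1904 else EPOCH_1900
    # Round to whole seconds to hide floating-point noise in the fraction.
    seconds = round(serial * 86400)
    return epoch + timedelta(seconds=seconds)

for serial in (42011.46789351852, 42009.0, 42170.61667824074):
    print(excel_serial_to_datetime(serial))
# 2015-01-07 11:13:46
# 2015-01-05 00:00:00
# 2015-06-15 14:48:01
```

These values match the 1/7/2015 11:13, 1/5/2015 0:00 and 6/15/2015 14:48 entries shown in the input file. The result is naive, so no time zone conversion is applied.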
xldate_as_tuple(xldate, datemode)
Convert an Excel number (presumed to represent a date, a datetime or a time) into a tuple suitable for feeding to datetime or mx.DateTime constructors.
Source: http://www.lexicon.net/sjmachin/xlrd.html#xlrd.xldate_as_tuple-function
Usage example: How to use ``xlrd.xldate_as_tuple()``
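A short sketch of how xldate_as_tuple might be applied to the first value from the question. It assumes Python 3 with xlrd installed and a workbook in the 1900 date system; the DATEMODE constant is a stand-in for the datemode attribute of the workbook you actually opened.

```python
from datetime import datetime

import xlrd

# Assumption: 1900 date system (datemode 0). With a real workbook,
# pass wb.datemode from open_workbook() instead of this constant.
DATEMODE = 0

serial = 42011.46789351852  # first cell value printed in the question
parts = xlrd.xldate_as_tuple(serial, DATEMODE)
print(parts)  # (2015, 1, 7, 11, 13, 46)

# The tuple unpacks directly into the datetime constructor.
stamp = datetime(*parts)
print(stamp.strftime('%Y-%m-%d %H:%M:%S'))  # 2015-01-07 11:13:46
```

Taking the date mode from the workbook rather than hard-coding it means files saved in the 1904 system convert correctly too.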
If the Excel file you want to read is a plain table, you can simply use pandas.read_excel directly, and then convert the dates with pandas.to_datetime:
from __future__ import absolute_import, division, print_function
import os
import pandas as pd

def main(path, filename, absolute_path_organisation_structure):
    absolute_filepath = os.path.join(path, filename)
    # Relevant columns are 4, 5 and 6
    # (parse_cols was renamed to usecols in later pandas releases)
    df = pd.read_excel(absolute_filepath, header=None, usecols=[4, 5, 6])
    # Relabel the selected columns 0..2 so the positions below are unambiguous
    df.columns = [0, 1, 2]
    # Transform columns 0 and 2 to datetime
    df[0] = pd.to_datetime(df[0])
    df[2] = pd.to_datetime(df[2])
    print(df)

path = r"C:\Users\***************\NEW DATA"
MISfile = "P2P_2015 - Copy.xlsx"
main(path, MISfile, None)