Keeping timezone when saving datetime in dataframe as csv
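I am using the following code to save a timestamp to disk and then, at a later date, find out how much time has elapsed since then. My problem is that when I use the businesstimedelta package, it returns an error saying my dataframe has no timezone. I am assuming that the timezone is lost when the dataframe is saved to csv: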
import pandas as pd
import time
import datetime
import pytz
import businesstimedelta
from pytz import timezone
workday = businesstimedelta.WorkDayRule(start_time=datetime.time(9,30),end_time=datetime.time(16),working_days=[0, 1, 2, 3, 4])
timestamps = pd.DataFrame([datetime.datetime.now(timezone('America/New_York'))])
time.sleep(5)
timestamps.to_csv('timestamps.csv')
timestamps2 = pd.read_csv('timestamps.csv')
difference = workday.difference(timestamps2,datetime.datetime.now(timezone('America/New_York'))).hours
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-23-0e7502d68c65> in <module>
10 timestamps.to_csv('timestamps.csv')
11 timestamps2 = pd.read_csv('production temp/positions.csv')
---> 12 difference = workday.difference(timestamps2,datetime.datetime.now(timezone('America/New_York'))).hours
c:\users\g\appdata\local\programs\python\python38\lib\site-packages\businesstimedelta\rules\rule.py in difference(self, dt1, dt2)
29 def difference(self, dt1, dt2):
30 """Calculate the business time between two datetime objects."""
---> 31 dt1 = localize_unlocalized_dt(dt1)
32 dt2 = localize_unlocalized_dt(dt2)
33 start_dt, end_dt = sorted([dt1, dt2])
c:\users\g\appdata\local\programs\python\python38\lib\site-packages\businesstimedelta\businesstimedelta.py in localize_unlocalized_dt(dt)
8 https://docs.python.org/3/library/datetime.html#datetime.timezone
9 """
---> 10 if dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None:
11 return dt
12 return pytz.utc.localize(dt)
c:\users\g\appdata\local\programs\python\python38\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5463 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5464 return self[name]
-> 5465 return object.__getattribute__(self, name)
5466
5467 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'tzinfo'
I am assuming that it is lost when saved to csv
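Yes, that is part of your problem. CSV is a low-fidelity data format that does not preserve the data types of most objects. Everything is first read in as a number or a string, and the CSV reader is then responsible for deciding which data type to use. (Pandas does a decent job of auto-detection, but it leaves date strings as plain strings unless you ask it to parse them.)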
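Here are a few options:
- Convert the strings to a proper datetime type after reading the data frame in:
timestamps2["0"] = pd.to_datetime(timestamps2["0"])
- Tell Pandas exactly which converter to use when it reads the file:
timestamps2 = pd.read_csv("./timestamps.csv", converters={"0": pd.to_datetime})
- Export to a different file format that preserves data types, such as pickle.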
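As a minimal sketch of these options, assuming a reasonably recent pandas, the round trip below uses an in-memory io.StringIO buffer in place of timestamps.csv and a fixed timestamp instead of now() so it is reproducible. One caveat worth knowing: the CSV text keeps the UTC offset (e.g. -05:00) but not the zone name America/New_York, so parsing alone gives a fixed-offset timezone. Parsing with utc=True and then calling tz_convert restores the named zone, and also copes with rows whose offsets differ across a DST change. The pickle round trip at the end is included for comparison.

```python
import io
import os
import tempfile

import pandas as pd

# Fixed tz-aware timestamp instead of datetime.now(), so the example is reproducible
saved = pd.Timestamp("2021-01-05 10:30:00", tz="America/New_York")
timestamps = pd.DataFrame([saved])

# CSV round trip through an in-memory buffer standing in for 'timestamps.csv'
buf = io.StringIO()
timestamps.to_csv(buf)   # value written as text with its offset, e.g. 2021-01-05 10:30:00-05:00
buf.seek(0)
raw = pd.read_csv(buf)   # columns come back as 'Unnamed: 0' (the old index) and '0'
print(raw["0"].dtype)    # a string/object dtype, not a datetime dtype

# Parse as UTC (safe even if rows carry different DST offsets), then
# convert back to the named zone, which the CSV text does not record.
restored = pd.to_datetime(raw["0"], utc=True).dt.tz_convert("America/New_York")
print(restored.dt.tz)             # America/New_York
print(restored.iloc[0] == saved)  # True

# Pickle round trip: the dtype, including the timezone, survives unchanged
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "timestamps.pkl")
    timestamps.to_pickle(path)
    from_pickle = pd.read_pickle(path)
print(from_pickle[0].dt.tz)       # America/New_York
```

In real code, replace the buffer with the file path and pass whichever zone the timestamps were recorded in.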
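Now, once you have read the data and loaded it as a datetime dtype rather than object, you will find that the individual elements of the series are of type pandas._libs.tslibs.timestamps.Timestamp: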
dt1 = datetime.datetime.now(timezone('America/New_York'))
timestamps = pd.Series(data=[dt1])
print(type(dt1)) # <class 'datetime.datetime'>
print(timestamps.dtype) # datetime64[ns, America/New_York]
print(type(timestamps.at[0])) # <class 'pandas._libs.tslibs.timestamps.Timestamp'>
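The other half of the problem is visible in the traceback: workday.difference received the whole timestamps2 DataFrame, and the tzinfo check fails on a DataFrame regardless of its column's dtype. The sketch below reproduces that check in a small hypothetical helper, has_usable_tz (a re-creation of the condition shown in localize_unlocalized_dt, not something imported from businesstimedelta), and contrasts a single element converted with to_pydatetime() against a DataFrame.

```python
import datetime

import pandas as pd


def has_usable_tz(dt):
    # Hypothetical re-creation of the test in businesstimedelta's
    # localize_unlocalized_dt, as it appears in the traceback
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


timestamps = pd.Series([pd.Timestamp("2021-01-05 10:30:00", tz="America/New_York")])

element = timestamps.iloc[0].to_pydatetime()  # one native datetime.datetime, tz kept
print(type(element))                          # <class 'datetime.datetime'>
print(has_usable_tz(element))                 # True

try:
    has_usable_tz(timestamps.to_frame())      # a whole DataFrame, like timestamps2
except AttributeError as exc:
    print(exc)                                # 'DataFrame' object has no attribute 'tzinfo'
```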
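The businesstimedelta library does not appear to offer vectorized operations on Pandas objects; it seems to work only with native Python datetime objects. So here is one solution, which applies it element by element: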
dt1 = datetime.datetime.now(timezone('America/New_York'))
dt2 = dt1 + datetime.timedelta(seconds=1)
timestamps = pd.Series([dt1])
timestamps.apply(lambda dt: workday.difference(dt.to_pydatetime(), dt2))
0 <BusinessTimeDelta 0 hours 1 seconds>
dtype: object
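If what you need is plain elapsed wall-clock time rather than business time, pandas can do the subtraction for a whole column at once, with no apply(). A minimal sketch, assuming the saved values have already been parsed back into a tz-aware column; the two timestamps and the fixed "now" are made-up values chosen so the output is reproducible:

```python
import pandas as pd

# Made-up saved timestamps, standing in for the column read back from CSV
saved = pd.Series(pd.to_datetime(
    ["2021-01-04 09:30:00-05:00", "2021-01-05 14:00:00-05:00"], utc=True
)).dt.tz_convert("America/New_York")

# Fixed "now" for reproducibility; in practice use pd.Timestamp.now(tz="America/New_York")
now = pd.Timestamp("2021-01-05 16:00:00", tz="America/New_York")

elapsed = now - saved                       # vectorized: one Timedelta per row
hours = elapsed.dt.total_seconds() / 3600   # convert each Timedelta to float hours
print(hours.tolist())                       # [30.5, 2.0]
```

Note that this counts nights and weekends too, which is exactly what businesstimedelta is there to exclude.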
You should also look at Pandas' native support for time deltas: https://pandas.pydata.org/docs/user_guide/timedeltas.html
as well as its various business-day features: https://pandas.pydata.org/docs/user_guide/timeseries.html?highlight=business#dateoffset-objects
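If you would rather drop the businesstimedelta dependency, a rough equivalent of the question's WorkDayRule (Monday to Friday, 09:30 to 16:00) can be built on pd.bdate_range, which yields weekdays only. The helper business_hours_between below is hypothetical and written for illustration: it ignores holidays, and it works in naive local wall-clock time so that 09:30 and 16:00 always mean New York hours.

```python
import pandas as pd


def business_hours_between(start, end, tz="America/New_York",
                           open_time=pd.Timedelta(hours=9, minutes=30),
                           close_time=pd.Timedelta(hours=16)):
    """Hypothetical helper: Mon-Fri hours between open_time and close_time (no holidays)."""
    # Convert tz-aware inputs to naive local wall-clock time in `tz`
    start = start.tz_convert(tz).tz_localize(None)
    end = end.tz_convert(tz).tz_localize(None)
    if end < start:
        start, end = end, start

    total = pd.Timedelta(0)
    # bdate_range yields only Monday-Friday dates between the two days (inclusive)
    for day in pd.bdate_range(start.normalize(), end.normalize()):
        # Clip the [start, end] interval to this day's working hours
        lo = max(start, day + open_time)
        hi = min(end, day + close_time)
        if hi > lo:
            total += hi - lo
    return total.total_seconds() / 3600


ny = "America/New_York"
# Friday 15:00 to Monday 10:30: one hour on Friday plus one hour on Monday
print(business_hours_between(pd.Timestamp("2021-01-08 15:00", tz=ny),
                             pd.Timestamp("2021-01-11 10:30", tz=ny)))  # 2.0
```

Clipping each weekday to its working window keeps the logic easy to check by hand; a holiday calendar would be needed before relying on it for real trading days.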