Why does pandas DataFrame.append() give an error with timezone values?
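I have a DataFrame that I append to in a loop (if there is a better way to iteratively add rows to the end of a DataFrame, suggestions are welcome). The following snippet raises an error: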
import pandas as pd
import pytz
import datetime
x = 'astring'
t = (datetime.datetime(2018, 5, 31, 13, 15, 17, tzinfo=pytz.utc), datetime.datetime(2100, 5, 31, tzinfo=pytz.utc))
df = pd.DataFrame(columns=['a', 'b', 'c'])
df = df.append({'a': x, 'b': t[0], 'c': t[1]}, ignore_index=True)
TypeError Traceback (most recent call last)
<ipython-input-161-0df455a78607> in <module>()
2 t = (datetime.datetime(2018, 5, 31, 13, 15, 17, tzinfo=pytz.utc), datetime.datetime(2100, 5, 31, tzinfo=pytz.utc))
3 df = pd.DataFrame(columns=['a', 'b', 'c'])
----> 4 df = df.append({'a': x, 'b': t[0], 'c': t[1]}, ignore_index=True)
/usr/local/envs/py3env/lib/python3.5/site-packages/pandas/core/frame.py in append(self, other, ignore_index, verify_integrity)
5192
5193 _shared_docs['pivot_table'] = """
-> 5194 Create a spreadsheet-style pivot table as a DataFrame. The levels in
5195 the pivot table will be stored in MultiIndex objects (hierarchical
5196 indexes) on the index and columns of the result DataFrame
/usr/local/envs/py3env/lib/python3.5/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, copy)
211 a 1
212 >>> df6 = pd.DataFrame([2], index=['a'])
--> 213 >>> df6
214 0
215 a 2
/usr/local/envs/py3env/lib/python3.5/site-packages/pandas/core/reshape/concat.py in get_result(self)
406 mgrs_indexers = []
407 for obj in self.objs:
--> 408 mgr = obj._data
409 indexers = {}
410 for ax, new_labels in enumerate(self.new_axes):
/usr/local/envs/py3env/lib/python3.5/site-packages/pandas/core/internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
5201 expanded label indexer
5202 """
-> 5203 mult = np.array(shape)[::-1].cumprod()[::-1]
5204 return _ensure_platform_int(
5205 np.sum(np.array(labels).T * np.append(mult, [1]), axis=1).T)
/usr/local/envs/py3env/lib/python3.5/site-packages/pandas/core/internals.py in concatenate_join_units(join_units, concat_axis, copy)
5330
5331 # see if we are only masking values that if putted
-> 5332 # will work in the current dtype
5333 try:
5334 nn = n[m]
/usr/local/envs/py3env/lib/python3.5/site-packages/pandas/core/internals.py in <listcomp>(.0)
5330
5331 # see if we are only masking values that if putted
-> 5332 # will work in the current dtype
5333 try:
5334 nn = n[m]
/usr/local/envs/py3env/lib/python3.5/site-packages/pandas/core/internals.py in get_reindexed_values(self, empty_dtype, upcasted_na)
5601 for ax, indexer in indexers.items():
5602 mgr_shape[ax] = len(indexer)
-> 5603 mgr_shape = tuple(mgr_shape)
5604
5605 if 0 in indexers:
TypeError: data type not understood
However, the following snippet, which uses timezone-naive datetimes, works fine:
x = 'astring'
t = (datetime.datetime(2018, 5, 31, 13, 15, 17), datetime.datetime(2100, 5, 31))
df = pd.DataFrame(columns=['a', 'b', 'c'])
df = df.append({'a': x, 'b': t[0], 'c': t[1]}, ignore_index=True)
Stranger still, this also works, with timezone-aware values but without the string column:
t = (datetime.datetime(2018, 5, 31, 13, 15, 17, tzinfo=pytz.utc), datetime.datetime(2100, 5, 31, tzinfo=pytz.utc))
df = pd.DataFrame(columns=['b', 'c'])
df = df.append({'b': t[0], 'c': t[1]}, ignore_index=True)
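What am I missing? I'm only adding more detail here because Stack Overflow complained that I "need more detail" to submit this question, and I suppose being extra verbose is a good thing. Who knows? My versions: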
pandas==0.23.0
pytz==2016.7
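This looks like a compatibility issue between the pandas and pytz library versions.
I was able to reproduce the error you got in Datalab, and I resolved it by upgrading to pandas==0.23.0 (I was on a fresh Datalab instance, which defaults to 0.22.0) and pytz==2018.4. In addition, some other Stack Overflow posts I came across suggest that numpy may cause problems as well, so to be safe I used numpy==1.14.3.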
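To upgrade the library versions, you should:
- Create a new notebook and, in the first cell, run the command
!pip install --upgrade pandas
This installed pytz==2018.4 for me, but if it does not in your case, you can also try installing it manually.
- Restart the kernel by clicking the "Reset session" option in Datalab.
- Run your code again and see whether it works now.
To check that the versions I mentioned are the ones actually in use, add the following lines: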
import numpy as np  # not imported in the question's snippet
print(pd.__version__)
print(pytz.__version__)
print(np.__version__)
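On the side question about a better way to add rows iteratively: one common pattern is to collect the rows in a plain Python list and build the DataFrame once at the end, which avoids calling DataFrame.append() in a loop altogether (that method was later deprecated and then removed in pandas 2.0). The following is only a minimal sketch, not a tested fix for the version conflict above. The helper name build_frame and the three-iteration loop are made up for illustration, and datetime.timezone.utc from the standard library stands in for pytz.utc.

```python
import datetime

import pandas as pd

# Assumption: the stdlib UTC object is used here in place of pytz.utc.
UTC = datetime.timezone.utc


def build_frame(records):
    """Build the DataFrame in a single call from a list of row dicts."""
    return pd.DataFrame(records, columns=['a', 'b', 'c'])


rows = []
for _ in range(3):  # hypothetical loop standing in for the real one
    rows.append({
        'a': 'astring',
        'b': datetime.datetime(2018, 5, 31, 13, 15, 17, tzinfo=UTC),
        'c': datetime.datetime(2100, 5, 31, tzinfo=UTC),
    })

df = build_frame(rows)
print(df.dtypes)
```

Because the frame is constructed once, pandas infers each column's dtype from all rows together instead of concatenating onto an empty frame on every iteration.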