Getting a ValueError on merging two dataframes in Pandas

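I am trying to merge two dataframes using pandas, but this is the error I get:

ValueError: You are trying to merge on datetime64[ns] and datetime64[ns, UTC] columns. If you wish to proceed you should use pd.concat

I have tried different solutions found online, but nothing works! The code was given to me, and it seems to run on other PCs, but not on my computer.

This is my code: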

import sys
import os
from datetime import datetime
import numpy  as np
import pandas as pd


# --------------------------------------------------------------------
# -- price, consumption and production                              --
# --------------------------------------------------------------------

fn = '../data/np_data.csv'
if os.path.isfile(fn):
    df_data = pd.read_csv(fn,header=[0],parse_dates=[0])
   
else:
    sys.exit('Could not open data file {}'.format(fn))


# --------------------------------------------------------------------
# -- temp. data                                               --
# --------------------------------------------------------------------

fn = '../data/temp.csv'
if os.path.isfile(fn):
    dtemp = pd.read_csv(fn,header=[0],parse_dates=[0])
   
else:
    sys.exit('Could not open data file {}'.format(fn))


# --------------------------------------------------------------------
# --  price data                                              --
# --   first date: 2014-01-13                                       --
# --   last  date: 2020-02-01                                       --
# --------------------------------------------------------------------

fn = '../data/eprice.csv'
if os.path.isfile(fn):
    eprice = pd.read_csv(fn,header=[0])
   
else:
    sys.exit('Could not open data file {}'.format(fn))


# --------------------------------------------------------------------
# -- combine dataframes (and save as CSV file)                      --
# --------------------------------------------------------------------

#

df= df_data.merge(dtemp, on='time',how='left')      ## This is where I get the error.

print(df.info())
print(eprice.info())

#
# add eprice
df = df.merge(eprice, on='date', how='left')

#
# eprice is only available on trading days
#   fill in missing values with the last observation (forward fill)
df = df.fillna(method='ffill')

#
# keep only the relevant time period
df = df[df.date > '2014-01-23']
df = df[df.date < '2020-02-01']


df.to_csv('../data/my_data.csv',index=False)

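The imported datasets look normal, with the expected number of columns and observations. My pandas version is 1.0.3.

Edit:

This is the output (df) from the first merge, of df_data and dtemp.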

                           time  price_sys  price_no1  ...  temp_no3  temp_no4  temp_no5
0 2014-01-23 00:00:00+00:00      32.08      32.08  ...       NaN       NaN       NaN
1 2014-01-24 00:00:00+00:00      31.56      31.60  ...      -2.5      -8.7       2.5
2 2014-01-24 00:00:00+00:00      30.96      31.02  ...      -2.5      -8.7       2.5
3 2014-01-24 00:00:00+00:00      30.84      30.79  ...      -2.5      -8.7       2.5
4 2014-01-24 00:00:00+00:00      31.58      31.10  ...      -2.5      -8.7       2.5

[5 rows x 25 columns]

This is the output of eprice before merging:

    <bound method NDFrame.head of                      date  gas price  oil price  coal price  carbon price
0     2014-01-24 00:00:00      66.00     107.88       79.42          6.89
1     2014-01-27 00:00:00      64.20     106.69       79.43          7.04
2     2014-01-28 00:00:00      63.75     107.41       79.29          7.20
3     2014-01-29 00:00:00      63.20     107.85       78.52          7.21
4     2014-01-30 00:00:00      62.60     107.95       78.18          7.46
                  ...        ...        ...         ...           ...
1608  2020-03-25 00:00:00      22.30      27.39       67.81         17.51
1609  2020-03-26 00:00:00      21.55      26.34       70.35         17.35
1610  2020-03-27 00:00:00      18.90      24.93       72.46         16.39
1611  2020-03-30 00:00:00      19.20      22.76       71.63         17.06
1612  2020-03-31 00:00:00      18.00      22.74       71.13         17.68

[1613 rows x 5 columns]>

This is what happens when I merge df and eprice:

    <bound method NDFrame.head of                      date  gas price  oil price  coal price  carbon price
0     2014-01-24 00:00:00      66.00     107.88       79.42          6.89
1     2014-01-27 00:00:00      64.20     106.69       79.43          7.04
2     2014-01-28 00:00:00      63.75     107.41       79.29          7.20
3     2014-01-29 00:00:00      63.20     107.85       78.52          7.21
4     2014-01-30 00:00:00      62.60     107.95       78.18          7.46
                  ...        ...        ...         ...           ...
1608  2020-03-25 00:00:00      22.30      27.39       67.81         17.51
1609  2020-03-26 00:00:00      21.55      26.34       70.35         17.35
1610  2020-03-27 00:00:00      18.90      24.93       72.46         16.39
1611  2020-03-30 00:00:00      19.20      22.76       71.63         17.06
1612  2020-03-31 00:00:00      18.00      22.74       71.13         17.68

[1613 rows x 5 columns]>
                       time  price_sys  ...  coal price  carbon price
0 2014-01-23 00:00:00+00:00      32.08  ...         NaN           NaN
1 2014-01-24 00:00:00+00:00      31.56  ...         NaN           NaN
2 2014-01-24 00:00:00+00:00      30.96  ...         NaN           NaN
3 2014-01-24 00:00:00+00:00      30.84  ...         NaN           NaN
4 2014-01-24 00:00:00+00:00      31.58  ...         NaN           NaN

[5 rows x 29 columns]

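Try running df['Time'] = pd.to_datetime(df['Time'], utc=True) on both time columns before joining (or rather, the time column without UTC is the one that needs to go through this!). Here 'Time' stands for the merge key, which is 'time' in the question's code.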
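To illustrate the suggestion, here is a minimal sketch, not the asker's actual data. The frames left and right are hypothetical stand-ins for df_data (timezone-aware) and dtemp (timezone-naive), and to_utc is a helper name made up for this example. It also assumes that the naive timestamps really are UTC; if they are in some local time, they would need tz_localize with the correct zone first.

```python
import pandas as pd

# Hypothetical stand-ins for df_data (tz-aware) and dtemp (tz-naive).
left = pd.DataFrame({
    "time": pd.to_datetime(["2014-01-23", "2014-01-24"], utc=True),
    "price_sys": [32.08, 31.56],
})
right = pd.DataFrame({
    "time": pd.to_datetime(["2014-01-24"]),
    "temp_no5": [2.5],
})

# Inspect the key dtypes first: one is tz-aware, the other is not.
print(left["time"].dtype)
print(right["time"].dtype)


def to_utc(frame, column):
    """Return a copy of `frame` whose `column` is timezone-aware UTC.

    Naive timestamps are interpreted as UTC (an assumption);
    tz-aware timestamps are converted to UTC.
    """
    out = frame.copy()
    out[column] = pd.to_datetime(out[column], utc=True)
    return out


# Merging on mismatched key dtypes is what triggers the ValueError.
try:
    left.merge(right, on="time", how="left")
except ValueError as exc:
    print("merge failed:", exc)

# After normalizing both keys to UTC, the merge goes through.
merged = to_utc(left, "time").merge(to_utc(right, "time"), on="time", how="left")
print(merged)
```

Running to_utc on both frames is harmless for the column that is already UTC, which is why the answer can say "both" while only the naive column strictly needs it.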