Pandas not merging correctly: values end up in the wrong position

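I have some stock market data and want to merge two CSV files. One contains OHLCV values; the other contains my own calculated value, named "hd", which should be added after the "Volume" column. The conditions are: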

  1. Match the ticker name
  2. Match the date

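I tried the merge code below with how='left', but the output values end up in the wrong position and some columns are not merged. How can I merge them correctly?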

import pandas as pd

data1 = pd.read_csv('ohlcv286.csv', parse_dates=True)
data2 = pd.read_csv('hd286.csv', parse_dates=True)

data2['Date/Time'] = pd.to_datetime(data2['Date/Time']).dt.strftime("%d/%m/%Y")

print(data2.tail(5))
print(data1.tail(5))

merge = data1.merge(data2, on=['Ticker','Date/Time'], how='left').fillna(0)

print(merge.tail(5))
print(merge.info())

merge.to_csv('exp.csv', index=False)

Output:

  Ticker   Date/Time            hd
0   ABDA  06/04/2021 -4.000000e+11
1   ABDA  06/11/2021 -4.000000e+11
2   ABDA  14/06/2021 -4.000000e+11
3   ABDA  15/06/2021 -4.000000e+11
4   ABDA  17/06/2021 -4.000000e+11
  Ticker   Date/Time    Open    High     Low   Close     Volume
0   AALI  02/06/2021  8900.0  9100.0  8825.0  9075.0  2188500.0
1   AALI  03/06/2021  9125.0  9325.0  9100.0  9200.0  2495200.0
2   AALI  04/06/2021  9225.0  9250.0  9150.0  9175.0  1298300.0
3   AALI  07/06/2021  9175.0  9325.0  9100.0  9125.0  1377700.0
4   AALI  08/06/2021  9125.0  9175.0  8800.0  8875.0  2981000.0
  Ticker   Date/Time    Open    High     Low   Close     Volume         hd
0   AALI  02/06/2021  8900.0  9100.0  8825.0  9075.0  2188500.0        0.0
1   AALI  03/06/2021  9125.0  9325.0  9100.0  9200.0  2495200.0        0.0
2   AALI  04/06/2021  9225.0  9250.0  9150.0  9175.0  1298300.0        0.0
3   AALI  07/06/2021  9175.0  9325.0  9100.0  9125.0  1377700.0 -9896930.0
4   AALI  08/06/2021  9125.0  9175.0  8800.0  8875.0  2981000.0 -8427643.0
<class 'pandas.core.frame.DataFrame'>
Int64Index: 103256 entries, 0 to 103255
Data columns (total 8 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   Ticker     103256 non-null  object 
 1   Date/Time  103256 non-null  object 
 2   Open       103256 non-null  float64
 3   High       103256 non-null  float64
 4   Low        103256 non-null  float64
 5   Close      103256 non-null  float64
 6   Volume     103256 non-null  float64
 7   hd         103256 non-null  float64
dtypes: float64(6), object(2)
memory usage: 7.1+ MB
None

Process finished with exit code 0

With the solution from @not_speshal, some dates come out reversed (day and month swapped), which makes the data messy when visualized.

OHLCV data

Self Calculated Value

Output
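The reversed dates are consistent with day/month ambiguity: the data2 output above shows 06/04/2021 and 06/11/2021 next to 14/06/2021, which is what happens when a day-first string is parsed month-first and then re-formatted. A minimal sketch of the effect, assuming the source files store dates as day/month/year (the sample string is made up for illustration):

```python
import pandas as pd

raw = "04/06/2021"  # intended as 4 June 2021 (day first)

# Without a format, pandas reads an ambiguous string month-first.
guessed = pd.to_datetime(raw)

# An explicit format removes the ambiguity.
explicit = pd.to_datetime(raw, format="%d/%m/%Y")

print(guessed.date(), explicit.date())
```

Strings such as 14/06/2021 cannot be month-first, so only some rows are affected, which would explain why the merge matches some dates and not others.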

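It seems that some values in the "Date/Time" column have both a date and a time, while others have only a date. Try keeping only the date before merging: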

import pandas as pd

data1 = pd.read_csv('ohlcv286.csv')
data2 = pd.read_csv('hd286.csv')

# Parse both key columns day-first with explicit formats, then drop the time part from data2
data1["Date/Time"] = pd.to_datetime(data1["Date/Time"], format="%d/%m/%Y")
data2["Date/Time"] = pd.to_datetime(data2["Date/Time"], format="%d/%m/%Y %H:%M:%S").dt.normalize()

output = data1.merge(data2, on=['Ticker','Date/Time'], how='left').fillna(0)

>>> output
       Ticker  Date/Time    Open  ...   Close     Volume            hd
0        AALI 2021-06-02  8900.0  ...  9075.0  2188500.0 -5.738800e+06
1        AALI 2021-06-03  9125.0  ...  9200.0  2495200.0 -5.717932e+06
2        AALI 2021-06-04  9225.0  ...  9175.0  1298300.0 -5.799378e+06
3        AALI 2021-06-07  9175.0  ...  9125.0  1377700.0 -5.916303e+06
4        AALI 2021-06-08  9125.0  ...  8875.0  2981000.0 -6.594500e+06
      ...        ...     ...  ...     ...        ...           ...
103251   ZYRX 2021-11-19   580.0  ...   565.0  7308900.0 -1.996443e+10
103252   ZYRX 2021-11-22   565.0  ...   575.0  7221700.0 -1.996425e+10
103253   ZYRX 2021-11-23   575.0  ...   570.0  7555100.0 -1.996424e+10
103254   ZYRX 2021-11-24   570.0  ...   575.0  7803100.0 -1.996366e+10
103255   ZYRX 2021-11-25   575.0  ...   560.0  5545200.0 -1.996360e+10

[103256 rows x 8 columns]
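
Because fillna(0) hides rows that found no match, it can help to check the merge before filling. The following is only a sketch on made-up toy frames (the names ohlcv, hd, merged and unmatched, and all values, are assumptions standing in for the two CSV files); it normalizes the keys as above and uses merge's indicator=True to flag rows with no "hd" partner:

```python
import pandas as pd

# Toy stand-ins for ohlcv286.csv and hd286.csv (values are made up).
ohlcv = pd.DataFrame({
    "Ticker": ["AALI", "AALI", "AALI"],
    "Date/Time": ["04/06/2021", "07/06/2021", "08/06/2021"],
    "Close": [9175.0, 9125.0, 8875.0],
})
hd = pd.DataFrame({
    "Ticker": ["AALI", "AALI"],
    "Date/Time": ["04/06/2021 09:00:00", "07/06/2021 09:00:00"],
    "hd": [-1.0, -2.0],
})

# Parse both keys day-first; strip the time part so the keys compare equal.
ohlcv["Date/Time"] = pd.to_datetime(ohlcv["Date/Time"], format="%d/%m/%Y")
hd["Date/Time"] = pd.to_datetime(
    hd["Date/Time"], format="%d/%m/%Y %H:%M:%S"
).dt.normalize()

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
merged = ohlcv.merge(hd, on=["Ticker", "Date/Time"], how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(len(unmatched), "row(s) without an hd value")
```

If the unmatched count is unexpectedly high on the real data, the key columns probably still differ in format or dtype.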
Output (ticker: INKP):