Pandas not merging correctly: values end up in the wrong place
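I have some stock market data and want to merge 2 CSV files. One contains OHLCV values; the other contains my own calculated value named "hd", which should be added after the "Volume" column, under these conditions:
- matching ticker name
- matching date
I tried this merge code with the left method (how='left'), but in the output some values are in the wrong place and some were not merged at all (their hd comes out as 0.0). How can I merge correctly?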
import pandas as pd
data1 = pd.read_csv('ohlcv286.csv', parse_dates=True)
data2 = pd.read_csv('hd286.csv', parse_dates=True)
data2['Date/Time'] = pd.to_datetime(data2['Date/Time']).dt.strftime("%d/%m/%Y")
print(data2.tail(5))
print(data1.tail(5))
merge = data1.merge(data2, on=['Ticker','Date/Time'], how='left').fillna(0)
print(merge.tail(5))
print(merge.info())
merge.to_csv('exp.csv', index=False)
Output:
Ticker Date/Time hd
0 ABDA 06/04/2021 -4.000000e+11
1 ABDA 06/11/2021 -4.000000e+11
2 ABDA 14/06/2021 -4.000000e+11
3 ABDA 15/06/2021 -4.000000e+11
4 ABDA 17/06/2021 -4.000000e+11
Ticker Date/Time Open High Low Close Volume
0 AALI 02/06/2021 8900.0 9100.0 8825.0 9075.0 2188500.0
1 AALI 03/06/2021 9125.0 9325.0 9100.0 9200.0 2495200.0
2 AALI 04/06/2021 9225.0 9250.0 9150.0 9175.0 1298300.0
3 AALI 07/06/2021 9175.0 9325.0 9100.0 9125.0 1377700.0
4 AALI 08/06/2021 9125.0 9175.0 8800.0 8875.0 2981000.0
Ticker Date/Time Open High Low Close Volume hd
0 AALI 02/06/2021 8900.0 9100.0 8825.0 9075.0 2188500.0 0.0
1 AALI 03/06/2021 9125.0 9325.0 9100.0 9200.0 2495200.0 0.0
2 AALI 04/06/2021 9225.0 9250.0 9150.0 9175.0 1298300.0 0.0
3 AALI 07/06/2021 9175.0 9325.0 9100.0 9125.0 1377700.0 -9896930.0
4 AALI 08/06/2021 9125.0 9175.0 8800.0 8875.0 2981000.0 -8427643.0
<class 'pandas.core.frame.DataFrame'>
Int64Index: 103256 entries, 0 to 103255
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Ticker 103256 non-null object
1 Date/Time 103256 non-null object
2 Open 103256 non-null float64
3 High 103256 non-null float64
4 Low 103256 non-null float64
5 Close 103256 non-null float64
6 Volume 103256 non-null float64
7 hd 103256 non-null float64
dtypes: float64(6), object(2)
memory usage: 7.1+ MB
None
Process finished with exit code 0
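In the code above, data2's dates go through pd.to_datetime with no format and are then turned back into "%d/%m/%Y" strings, while data1's dates stay as the original strings. The hd tail shows 06/04/2021 and 06/11/2021 next to 14/06/2021, which is consistent with ambiguous values being read month-first. A minimal check, using hypothetical sample strings (how pandas parses a mixed array without a format differs between versions, so the sketch only uses a single scalar and an explicit format):

```python
import pandas as pd

# Hypothetical sample strings in the day-first layout used by ohlcv286.csv
raw = pd.Series(["04/06/2021", "14/06/2021"])

# Without a format, an ambiguous scalar is read month-first: 6 April 2021
print(pd.to_datetime("04/06/2021"))  # 2021-04-06 00:00:00

# An explicit day-first format parses every value the same way (both are June)
parsed = pd.to_datetime(raw, format="%d/%m/%Y")
print(parsed.dt.month.tolist())  # [6, 6]
```

Passing format="%d/%m/%Y" (or dayfirst=True) on both frames before merging keeps the day and month from swapping.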
Solution from @not_speshal
Some of the dates are reversed, which makes the data messy when visualized.
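It seems that some values in the "Date/Time" column contain both a date and a time, while others contain only a date. Try keeping only the date before merging: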
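import pandas as pd

data1 = pd.read_csv('ohlcv286.csv')
data2 = pd.read_csv('hd286.csv')
# data1 holds day-first dates only, e.g. 02/06/2021
data1["Date/Time"] = pd.to_datetime(data1["Date/Time"], format="%d/%m/%Y")
# data2 holds day-first date + time; parse explicitly, then drop the time part
data2["Date/Time"] = pd.to_datetime(data2["Date/Time"], format="%d/%m/%Y %H:%M:%S").dt.normalize()
# Left join on both keys; rows with no matching hd value get 0
output = data1.merge(data2, on=['Ticker','Date/Time'], how='left').fillna(0)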
>>> output
Ticker Date/Time Open ... Close Volume hd
0 AALI 2021-06-02 8900.0 ... 9075.0 2188500.0 -5.738800e+06
1 AALI 2021-06-03 9125.0 ... 9200.0 2495200.0 -5.717932e+06
2 AALI 2021-06-04 9225.0 ... 9175.0 1298300.0 -5.799378e+06
3 AALI 2021-06-07 9175.0 ... 9125.0 1377700.0 -5.916303e+06
4 AALI 2021-06-08 9125.0 ... 8875.0 2981000.0 -6.594500e+06
... ... ... ... ... ... ...
103251 ZYRX 2021-11-19 580.0 ... 565.0 7308900.0 -1.996443e+10
103252 ZYRX 2021-11-22 565.0 ... 575.0 7221700.0 -1.996425e+10
103253 ZYRX 2021-11-23 575.0 ... 570.0 7555100.0 -1.996424e+10
103254 ZYRX 2021-11-24 570.0 ... 575.0 7803100.0 -1.996366e+10
103255 ZYRX 2021-11-25 575.0 ... 560.0 5545200.0 -1.996360e+10
[103256 rows x 8 columns]
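A self-contained sketch of the same approach on hypothetical miniature frames (all values are made up). It adds indicator=True, which is not part of the answer above: the resulting `_merge` column flags rows of data1 that found no hd row, so key mismatches become visible instead of being hidden as 0 by fillna(0).

```python
import pandas as pd

# Hypothetical miniature versions of ohlcv286.csv and hd286.csv
data1 = pd.DataFrame({
    "Ticker": ["AALI", "AALI", "AALI"],
    "Date/Time": ["02/06/2021", "03/06/2021", "04/06/2021"],
    "Volume": [100.0, 200.0, 300.0],
})
data2 = pd.DataFrame({
    "Ticker": ["AALI", "AALI"],
    "Date/Time": ["02/06/2021 00:00:00", "03/06/2021 00:00:00"],
    "hd": [-1.0, -2.0],
})

# Parse both key columns day-first and drop the time part so the keys match
data1["Date/Time"] = pd.to_datetime(data1["Date/Time"], format="%d/%m/%Y")
data2["Date/Time"] = pd.to_datetime(
    data2["Date/Time"], format="%d/%m/%Y %H:%M:%S"
).dt.normalize()

# indicator=True adds a "_merge" column saying whether each row found a partner
merged = data1.merge(data2, on=["Ticker", "Date/Time"], how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Ticker", "Date/Time"]])  # only the 2021-06-04 row

# After inspecting the misses, drop the helper column and fill the gaps
output = merged.drop(columns="_merge").fillna({"hd": 0})
```

Because hd is the only non-key column in data2, it lands directly after Volume, which is the layout the question asks for.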
[Screenshot: output for ticker INKP]
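On the dates looking reversed when visualized: one possible contributor (an assumption, since the INKP screenshot is not available) is ordering or plotting rows by day-first strings, which sort by day rather than chronologically. A sketch on hypothetical rows that parses the column to datetime64 first and then sorts per ticker:

```python
import pandas as pd

# Hypothetical rows, deliberately out of order, with day-first string dates
df = pd.DataFrame({
    "Ticker": ["INKP", "INKP", "INKP"],
    "Date/Time": ["14/06/2021", "02/07/2021", "03/06/2021"],
    "hd": [2.0, 3.0, 1.0],
})

# As strings, "02/07/2021" sorts before "03/06/2021" although it is later
print(sorted(df["Date/Time"]))

# Parse with an explicit format, then sort chronologically within each ticker
df["Date/Time"] = pd.to_datetime(df["Date/Time"], format="%d/%m/%Y")
df = df.sort_values(["Ticker", "Date/Time"], ignore_index=True)
print(df["hd"].tolist())  # [1.0, 2.0, 3.0]
```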