How to replace outlier data in pandas?
I have stock data scraped from Yahoo Finance, and the adjusted close values are somehow wrong.
adj_close close ratio
date
2014-10-16 240.4076 2466.40 0.097473
2014-10-17 245.8173 2521.90 0.097473
2014-10-20 250.4522 2569.45 0.097473
2014-10-21 251.8850 2584.15 0.097473
2014-10-22 251.0175 2575.25 0.097473
2014-10-23 251.3392 2578.55 0.097473
2014-10-27 253.2155 2597.80 0.097473
2014-10-28 258.9616 2656.75 0.097473
2014-10-29 257.6944 2643.75 0.097473
2014-10-30 257.1339 2638.00 0.097473
2014-10-31 26.3450 2702.80 0.009747
2014-11-03 26.5463 2723.45 0.009747
2014-11-05 27.1160 2781.90 0.009747
2014-11-07 26.7320 2742.50 0.009747
2014-11-10 26.7027 2739.50 0.009747
Here is a plot of the adjusted close data:
How can I replace data like this, for example by interpolation?
Try this:
In [71]: import pandas_datareader.data as web
In [72]: import numpy as np  # needed for np.nan in the interpolation solution
In [110]: df = web.DataReader('SBIN.NS', 'yahoo', '2014-10-21', '2014-11-25')
In [111]: df
Out[111]:
Open High Low Close Volume Adj Close
Date
2014-10-21 2580.0000 2607.0001 2569.5999 2584.1501 15022300 251.8850
2014-10-22 2608.9999 2613.5999 2565.1001 2575.2499 14511100 251.0175
2014-10-23 2591.4001 2593.7000 2573.9999 2578.5501 2376200 251.3392
2014-10-24 2578.5501 2578.5501 2578.5501 2578.5501 0 251.3392
2014-10-27 2592.0001 2619.8999 2581.0001 2597.8000 13429500 253.2155
2014-10-28 2607.9999 2664.2999 2606.0001 2656.7499 22963400 258.9616
2014-10-29 2677.0001 2678.9999 2631.0001 2643.7500 17372900 257.6944
2014-10-30 2649.8999 2653.0499 2622.0001 2637.9999 15544200 257.1339
2014-10-31 265.2000 270.9800 264.6000 270.2800 20770200 26.3450 # <bad_data>
2014-11-03 270.6000 274.3500 269.4250 272.3450 17780600 26.5463
2014-11-04 272.3450 272.3450 272.3450 272.3450 0 26.5463
2014-11-05 273.3000 279.9800 272.4050 278.1900 26605100 27.1160
2014-11-06 278.1900 278.1900 278.1900 278.1900 0 27.1160
2014-11-07 277.5000 278.1000 273.0000 274.2500 18163000 26.7320
2014-11-10 275.9000 276.9000 273.3000 273.9500 12068800 26.7027
2014-11-11 274.7900 276.2500 270.5000 274.0350 17405900 26.7110
2014-11-12 275.3000 277.1500 273.5550 274.6050 16233200 26.7666
2014-11-13 275.6100 276.2250 269.5000 271.9300 16859000 26.5059
2014-11-14 273.0000 280.6900 272.0000 278.7850 50846600 27.1740
2014-11-17 279.4000 295.1300 279.2200 294.0600 49164100 28.6629
2014-11-18 295.6950 297.9000 292.4100 294.5750 32898300 28.7131
2014-11-19 294.9000 296.8000 290.3550 291.0500 20735900 28.3695 # </bad_data>
2014-11-20 294.7500 298.7500 291.2500 297.1000 18099500 289.5925
2014-11-21 299.9000 307.0000 297.2500 305.5000 21009200 297.7802
2014-11-24 307.8000 309.8500 306.0500 308.8500 18631400 301.0456
2014-11-25 309.9000 309.9500 301.0000 304.4500 26776600 296.7568
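NOTE: the Adj Close column is back to its original scale from 2014-11-20 onward, while the other columns are not, so I will work only with Adj Close.
Let's find the outliers (I am flagging days where the value changed by 50% or more relative to the previous day; you may want to change this threshold):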
In [112]: bad_idx = df.index[df['Adj Close'].pct_change().abs().ge(0.5)]
In [113]: bad_idx
Out[113]: DatetimeIndex(['2014-10-31', '2014-11-20'], dtype='datetime64[ns]', name='Date', freq=None)
In [114]: df.loc[(df.index >= bad_idx.min()) & (df.index < bad_idx.max()), 'Adj Close'] *= 10
In [115]: df
Out[115]:
Open High Low Close Volume Adj Close
Date
2014-10-21 2580.0000 2607.0001 2569.5999 2584.1501 15022300 251.8850
2014-10-22 2608.9999 2613.5999 2565.1001 2575.2499 14511100 251.0175
2014-10-23 2591.4001 2593.7000 2573.9999 2578.5501 2376200 251.3392
2014-10-24 2578.5501 2578.5501 2578.5501 2578.5501 0 251.3392
2014-10-27 2592.0001 2619.8999 2581.0001 2597.8000 13429500 253.2155
2014-10-28 2607.9999 2664.2999 2606.0001 2656.7499 22963400 258.9616
2014-10-29 2677.0001 2678.9999 2631.0001 2643.7500 17372900 257.6944
2014-10-30 2649.8999 2653.0499 2622.0001 2637.9999 15544200 257.1339
2014-10-31 265.2000 270.9800 264.6000 270.2800 20770200 263.4500
2014-11-03 270.6000 274.3500 269.4250 272.3450 17780600 265.4630
2014-11-04 272.3450 272.3450 272.3450 272.3450 0 265.4630
2014-11-05 273.3000 279.9800 272.4050 278.1900 26605100 271.1600
2014-11-06 278.1900 278.1900 278.1900 278.1900 0 271.1600
2014-11-07 277.5000 278.1000 273.0000 274.2500 18163000 267.3200
2014-11-10 275.9000 276.9000 273.3000 273.9500 12068800 267.0270
2014-11-11 274.7900 276.2500 270.5000 274.0350 17405900 267.1100
2014-11-12 275.3000 277.1500 273.5550 274.6050 16233200 267.6660
2014-11-13 275.6100 276.2250 269.5000 271.9300 16859000 265.0590
2014-11-14 273.0000 280.6900 272.0000 278.7850 50846600 271.7400
2014-11-17 279.4000 295.1300 279.2200 294.0600 49164100 286.6290
2014-11-18 295.6950 297.9000 292.4100 294.5750 32898300 287.1310
2014-11-19 294.9000 296.8000 290.3550 291.0500 20735900 283.6950
2014-11-20 294.7500 298.7500 291.2500 297.1000 18099500 289.5925
2014-11-21 299.9000 307.0000 297.2500 305.5000 21009200 297.7802
2014-11-24 307.8000 309.8500 306.0500 308.8500 18631400 301.0456
2014-11-25 309.9000 309.9500 301.0000 304.4500 26776600 296.7568
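The detect-and-rescale steps above can also be tried without a network connection. The following is a minimal sketch, not the answer's exact code: the series holds a handful of Adj Close values copied from the table above, and the helper name `rescale_bad_run` and its `threshold` and `factor` parameters are made up for illustration.

```python
import pandas as pd

# A few 'Adj Close' values copied from the table above; the middle three
# rows are the ones that are too small by a factor of 10.
adj = pd.Series(
    [257.6944, 257.1339, 26.3450, 26.5463, 28.3695, 289.5925, 297.7802],
    index=pd.to_datetime([
        "2014-10-29", "2014-10-30", "2014-10-31", "2014-11-03",
        "2014-11-19", "2014-11-20", "2014-11-21",
    ]),
    name="Adj Close",
)


def rescale_bad_run(s, threshold=0.5, factor=10.0):
    """Hypothetical helper: multiply the run between the first and the
    last large day-over-day jump by `factor` and return a new Series."""
    jumps = s.index[s.pct_change().abs().ge(threshold)]
    if len(jumps) < 2:
        return s.copy()  # no bounded bad run found, leave the data alone
    # The first jump marks the first bad row; the last jump marks the
    # first good row again, so it is excluded from the mask.
    mask = (s.index >= jumps.min()) & (s.index < jumps.max())
    fixed = s.copy()
    fixed.loc[mask] = fixed.loc[mask] * factor
    return fixed


fixed = rescale_bad_run(adj)
print(fixed)
```

This only works when the bad data forms a single run bounded by two jumps. With several separate bad runs, or a run that reaches the end of the data, the mask would need to be built differently.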
Here is another solution, using interpolation:
In [119]: df.loc[(df.index >= bad_idx.min()) & (df.index < bad_idx.max()), 'Adj Close'] = np.nan
In [120]: df
Out[120]:
Open High Low Close Volume Adj Close
Date
2014-10-21 2580.0000 2607.0001 2569.5999 2584.1501 15022300 251.8850
2014-10-22 2608.9999 2613.5999 2565.1001 2575.2499 14511100 251.0175
2014-10-23 2591.4001 2593.7000 2573.9999 2578.5501 2376200 251.3392
2014-10-24 2578.5501 2578.5501 2578.5501 2578.5501 0 251.3392
2014-10-27 2592.0001 2619.8999 2581.0001 2597.8000 13429500 253.2155
2014-10-28 2607.9999 2664.2999 2606.0001 2656.7499 22963400 258.9616
2014-10-29 2677.0001 2678.9999 2631.0001 2643.7500 17372900 257.6944
2014-10-30 2649.8999 2653.0499 2622.0001 2637.9999 15544200 257.1339
2014-10-31 265.2000 270.9800 264.6000 270.2800 20770200 NaN
2014-11-03 270.6000 274.3500 269.4250 272.3450 17780600 NaN
2014-11-04 272.3450 272.3450 272.3450 272.3450 0 NaN
2014-11-05 273.3000 279.9800 272.4050 278.1900 26605100 NaN
2014-11-06 278.1900 278.1900 278.1900 278.1900 0 NaN
2014-11-07 277.5000 278.1000 273.0000 274.2500 18163000 NaN
2014-11-10 275.9000 276.9000 273.3000 273.9500 12068800 NaN
2014-11-11 274.7900 276.2500 270.5000 274.0350 17405900 NaN
2014-11-12 275.3000 277.1500 273.5550 274.6050 16233200 NaN
2014-11-13 275.6100 276.2250 269.5000 271.9300 16859000 NaN
2014-11-14 273.0000 280.6900 272.0000 278.7850 50846600 NaN
2014-11-17 279.4000 295.1300 279.2200 294.0600 49164100 NaN
2014-11-18 295.6950 297.9000 292.4100 294.5750 32898300 NaN
2014-11-19 294.9000 296.8000 290.3550 291.0500 20735900 NaN
2014-11-20 294.7500 298.7500 291.2500 297.1000 18099500 289.5925
2014-11-21 299.9000 307.0000 297.2500 305.5000 21009200 297.7802
2014-11-24 307.8000 309.8500 306.0500 308.8500 18631400 301.0456
2014-11-25 309.9000 309.9500 301.0000 304.4500 26776600 296.7568
In [122]: df['Adj Close'] = df['Adj Close'].interpolate()
In [123]: df
Out[123]:
Open High Low Close Volume Adj Close
Date
2014-10-21 2580.0000 2607.0001 2569.5999 2584.1501 15022300 251.885000
2014-10-22 2608.9999 2613.5999 2565.1001 2575.2499 14511100 251.017500
2014-10-23 2591.4001 2593.7000 2573.9999 2578.5501 2376200 251.339200
2014-10-24 2578.5501 2578.5501 2578.5501 2578.5501 0 251.339200
2014-10-27 2592.0001 2619.8999 2581.0001 2597.8000 13429500 253.215500
2014-10-28 2607.9999 2664.2999 2606.0001 2656.7499 22963400 258.961600
2014-10-29 2677.0001 2678.9999 2631.0001 2643.7500 17372900 257.694400
2014-10-30 2649.8999 2653.0499 2622.0001 2637.9999 15544200 257.133900
2014-10-31 265.2000 270.9800 264.6000 270.2800 20770200 259.297807
2014-11-03 270.6000 274.3500 269.4250 272.3450 17780600 261.461713
2014-11-04 272.3450 272.3450 272.3450 272.3450 0 263.625620
2014-11-05 273.3000 279.9800 272.4050 278.1900 26605100 265.789527
2014-11-06 278.1900 278.1900 278.1900 278.1900 0 267.953433
2014-11-07 277.5000 278.1000 273.0000 274.2500 18163000 270.117340
2014-11-10 275.9000 276.9000 273.3000 273.9500 12068800 272.281247
2014-11-11 274.7900 276.2500 270.5000 274.0350 17405900 274.445153
2014-11-12 275.3000 277.1500 273.5550 274.6050 16233200 276.609060
2014-11-13 275.6100 276.2250 269.5000 271.9300 16859000 278.772967
2014-11-14 273.0000 280.6900 272.0000 278.7850 50846600 280.936873
2014-11-17 279.4000 295.1300 279.2200 294.0600 49164100 283.100780
2014-11-18 295.6950 297.9000 292.4100 294.5750 32898300 285.264687
2014-11-19 294.9000 296.8000 290.3550 291.0500 20735900 287.428593
2014-11-20 294.7500 298.7500 291.2500 297.1000 18099500 289.592500
2014-11-21 299.9000 307.0000 297.2500 305.5000 21009200 297.780200
2014-11-24 307.8000 309.8500 306.0500 308.8500 18631400 301.045600
2014-11-25 309.9000 309.9500 301.0000 304.4500 26776600 296.756800
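One detail of the interpolation above: `interpolate()` defaults to `method='linear'`, which treats the rows as evenly spaced and so ignores weekends and holidays. With a `DatetimeIndex`, `method='time'` weights each step by the elapsed time instead. The sketch below uses made-up round prices (not the SBIN.NS data) so the arithmetic is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical prices; 2014-11-01 and 2014-11-02 are a weekend, so the
# index has a three-day gap between the second and third rows.
idx = pd.to_datetime(
    ["2014-10-30", "2014-10-31", "2014-11-03", "2014-11-04", "2014-11-05"]
)
s = pd.Series([100.0, np.nan, np.nan, np.nan, 160.0], index=idx)

by_row = s.interpolate()                # 'linear': equal step per row
by_time = s.interpolate(method="time")  # step proportional to elapsed days

print(pd.DataFrame({"linear": by_row, "time": by_time}))
# linear fills 115, 130, 145; time fills 110, 140, 150
```

Either way, interpolation replaces the real day-to-day movement with a straight line, whereas the multiply-by-10 fix keeps it. That is worth weighing if the series feeds return or volatility calculations.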
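The problem is not exactly the data... the problem is that you have not looked at the market fundamentals here. So rather than treating this as a math problem ("replace outliers"), treat it as a data sourcing/cleaning problem (fix the data).
The ticker you are looking at is SBIN.NS (State Bank of India). It underwent a 1:10 split on 2014-11-21, as reported here: http://articles.economictimes.indiatimes.com/2014-11-20/news/56304010_1_india-gains-state-bank-stock-split
The chart you showed makes it clear that the Yahoo data went wrong around that date.
So what happened?
The first break occurs on 2014-10-31, when Yahoo shows the price dropping to one-tenth of its previous level. That is clearly an error. My guess is that their automated corporate-action parser was notified on that date of the pending 1:10 split and applied it immediately, instead of on the record date, 2014-11-21. As a result, from 2014-10-31 through 2014-11-20 (inclusive), your prices are off by a factor of 10.
In this case, unless you can get the data from another source, the best solution is simply to make a note of the error and multiply the Yahoo prices on the affected dates by 10.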
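A minimal sketch of that "note it and multiply" approach, with the correction written down as data so it stays auditable. The helper `apply_known_fix` and the layout of `corrections` are assumptions for illustration. The end date 2014-11-19 is taken from the Adj Close table in the earlier answer, where the value is already back to scale on 2014-11-20, so check the range against your own download.

```python
import pandas as pd


def apply_known_fix(df, column, start, end, factor):
    """Hypothetical helper: return a copy of `df` with `column` multiplied
    by `factor` on the date range start..end (both ends included)."""
    out = df.copy()
    # Label slicing on a sorted DatetimeIndex includes both endpoints.
    out.loc[start:end, column] = out.loc[start:end, column] * factor
    return out


# A few 'Adj Close' values copied from the table in the earlier answer.
prices = pd.DataFrame(
    {"Adj Close": [257.1339, 26.3450, 28.3695, 289.5925]},
    index=pd.to_datetime(
        ["2014-10-30", "2014-10-31", "2014-11-19", "2014-11-20"]
    ),
)

# Explicit log of manual corrections (the layout is an assumption).
corrections = [
    {
        "column": "Adj Close",
        "start": "2014-10-31",
        "end": "2014-11-19",
        "factor": 10.0,
        "reason": "1:10 split applied early by the data source",
    },
]

fixed = prices
for c in corrections:
    fixed = apply_known_fix(
        fixed, c["column"], c["start"], c["end"], c["factor"]
    )
print(fixed)
```

Unlike threshold-based detection, this does not flag a genuine large price move as an error. The cost is that each bad range has to be identified by hand.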