Code works with a DataFrame of 70 columns but fails with 1000 columns (same data structure)

I wrote a factor-analysis script using the alphalens module. It works perfectly on a DataFrame with 70 columns, but fails when I try 1780 columns...

I don't see how this is possible, since the structure is exactly the same. I checked everything, but the magic stops inside alphalens.

https://github.com/Ibsylonne/test_alphalens

If you have any leads or ideas, please comment below.

Works:

import pandas as pd

factor = pd.read_csv('original70columns.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor.head()

date                      
1996-12-31  DU UH Equity      0.0
            SCL LN Equity     0.0
            BMA AR Equity     0.0
            GCLA AR Equity    0.0
            EBS AV Equity     0.0
dtype: float64
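For readers unfamiliar with the reshaping step above: `set_index('date').stack()` turns the wide CSV (one row per date, one column per ticker) into a long Series indexed by (date, asset), which is the factor shape alphalens documents for `get_clean_factor_and_forward_returns`. A minimal sketch with made-up tickers and values, not taken from the real files:

```python
import pandas as pd

# Made-up wide table: one row per date, one column per asset.
wide = pd.DataFrame(
    {
        "date": ["1996-12-31", "1997-01-30"],
        "AAA Equity": [0.0, 1.5],
        "BBB Equity": [2.0, 2.5],
    }
)
wide["date"] = pd.to_datetime(wide["date"])

# stack() moves the asset columns into a second index level,
# giving a Series indexed by (date, asset).
factor = wide.set_index("date").stack()

# Naming the levels is optional; it just makes the printout easier to read.
factor.index = factor.index.set_names(["date", "asset"])
print(factor)
```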

Fails:

factor = pd.read_csv('test1780columns.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor.head()

date                      
1996-12-31  DU UH Equity      0.0
            SCL LN Equity     0.0
            BMA AR Equity     0.0
            GCLA AR Equity    0.0
            EBS AV Equity     0.0
dtype: float64
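Note that `head()` only shows the first five (date, asset) pairs, so identical heads say little about the other tickers in a 1780-column file. Since alphalens expects the asset labels in the factor index to correspond to the price columns, one cheap check is to list factor tickers that have no price column at all. The helper `missing_assets` is hypothetical, and the trailing-space mismatch is an invented example of a difference that is hard to spot by eye:

```python
import pandas as pd


def missing_assets(factor: pd.Series, prices: pd.DataFrame) -> list:
    """Hypothetical helper: tickers in the factor's asset level (level 1)
    that have no matching column in the price table."""
    factor_assets = set(factor.index.get_level_values(1))
    return sorted(factor_assets - set(prices.columns))


# Invented data: the price table spells one ticker with a trailing space.
dates = pd.to_datetime(["1996-12-31", "1997-01-30"])
prices = pd.DataFrame(
    {"AAA Equity": [10.0, 11.0], "BBB Equity ": [5.0, 5.5]}, index=dates
)
factor = pd.DataFrame(
    {"AAA Equity": [0.1, 0.2], "BBB Equity": [0.3, 0.4]}, index=dates
).stack()

print(missing_assets(factor, prices))  # ['BBB Equity']
```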

For those familiar with alphalens (this is with the 1780-column file):

from alphalens.utils import get_clean_factor_and_forward_returns

factor_data = get_clean_factor_and_forward_returns(
    factor,
    prices,
    quantiles=2,
    periods=(1, 5, 10),
    max_loss=1)

TypeError: unsupported operand type(s) for /: 'str' and 'float'

Very mysterious...

Any leads or ideas, please comment below. Thanks!
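The error message says a `/` was applied to a Python string, which usually means some column reached alphalens with a non-numeric (`object`) dtype. With 1780 columns, a single unparseable cell anywhere in a column is enough for `pd.read_csv` to keep that whole column as strings. A minimal reproduction, assuming the culprit is something like a decimal comma in an otherwise numeric column (the data below is invented):

```python
import io

import pandas as pd

# Invented price file: "5,5" uses a decimal comma, so pandas cannot parse
# the BBB column as numbers and keeps every cell in it as a string.
csv_text = (
    "date;AAA Equity;BBB Equity\n"
    "1996-12-31;10.0;5.0\n"
    "1997-01-30;11.0;5,5\n"
)
prices = pd.read_csv(io.StringIO(csv_text), delimiter=";",
                     index_col="date", parse_dates=True)
print(prices.dtypes)  # AAA Equity is float64, BBB Equity is not numeric

error = None
try:
    # Arithmetic on the string column raises a TypeError like the one above.
    prices["BBB Equity"] / 2.0
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```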

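Update: I think I have replaced the NaNs with 0, but I still get the error. I'm still looking into this, but maybe what I've done so far will give you some ideas. Let me know what you think: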

import pandas as pd
import matplotlib.pyplot as plt

from alphalens.tears import (create_full_tear_sheet,
                             create_event_returns_tear_sheet)
from alphalens.utils import get_clean_factor_and_forward_returns

# Build prices: wide DataFrame indexed by date, one column per asset
prices = pd.read_csv('prices_quant.csv', delimiter=';')
prices['date'] = pd.to_datetime(prices.date)
prices = prices.set_index('date')
prices = prices.fillna(0)
print(prices)

# Build factor: long Series indexed by (date, asset)
factor = pd.read_csv('test1.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor = factor.fillna(0)
print(factor)

# skip is set to True if alphalens fails, so the tear sheets are not attempted
# (added for testing, it can be removed)
skip = False
try:
    factor_data = get_clean_factor_and_forward_returns(
        factor,
        prices,
        quantiles=5,
        periods=(1, 5, 10),
        max_loss=1)
except Exception as e:
    print(e)
    skip = True

if not skip:
    create_full_tear_sheet(factor_data, long_short=True)
    create_event_returns_tear_sheet(factor_data, prices, long_short=True)
    print("\nNo Errors\n")
else:
    print("\nWe encountered an error\n")


            DU UH Equity  SCL LN Equity  BMA AR Equity  GCLA AR Equity  EBS AV Equity  OMV AV Equity  ...  RDF SJ Equity  HYP SJ Equity  AEL SJ Equity  MRP SJ Equity  EMI SJ Equity  AXL SJ Equity
date                                                                                                  ...                                                                                          
1996-12-31       0.00000            0.0        9.35256         0.00000        0.00000       11.11470  ...        0.00000        1.25522        1.06860        0.75895        0.00000        0.36700
1997-01-30       0.00000            0.0        9.68044         0.00000        0.00000       11.34016  ...        0.00000        1.25426        1.23754        0.74472        0.00000        0.40870
1997-02-27       0.00000            0.0        9.99271         0.00000        0.00000       11.75658  ...        0.00000        1.21265        1.25000        0.58482        0.00000        0.49981
1997-03-31       0.00000            0.0       11.00760         0.00000        0.00000       11.82128  ...        0.00000        1.27312        1.60597        0.73513        0.00000        0.42375
1997-04-30       0.00000            0.0       10.81243         0.00000        0.00000       10.88544  ...        0.00000        1.24338        1.73112        0.79811        0.00000        0.46649
...                  ...            ...            ...             ...            ...            ...  ...            ...            ...            ...            ...            ...            ...
2018-08-30       1.39123           63.4        4.34430         1.35635       39.73607       52.90799  ...        0.70425        6.94043        1.07509       15.33290        1.08665        0.04015
2018-09-30       1.36945           61.7        4.18581         1.66410       41.55489       56.20015  ...        0.70719        6.51431        1.15394       16.11004        1.05302        0.03670
2018-10-31       1.33406           51.9        4.43810         1.61537       40.70160       55.54638  ...        0.64929        6.11171        1.18551       15.63778        1.00880        0.03318
2018-11-29       1.35312           46.3        4.49952         1.32456       39.43278       50.48753  ...        0.68896        6.40823        1.27159       17.31443        1.06684        0.03305
2018-12-31       1.36956           46.3        4.35219         1.31402       33.23611       43.76183  ...        0.67241        5.66712        1.25163       17.11610        1.02912        0.02990

[265 rows x 1780 columns]
date                      
1996-12-31  DU UH Equity      0.000000
            SCL LN Equity     0.000000
            BMA AR Equity     0.000000
            GCLA AR Equity    0.000000
            EBS AV Equity     0.000000
                                ...   
2018-12-31  HYP SJ Equity     0.029605
            AEL SJ Equity     0.000777
            MRP SJ Equity     0.000000
            EMI SJ Equity     0.000000
            AXL SJ Equity     0.000000
Length: 471700, dtype: float64
unsupported operand type(s) for /: 'int' and 'str'

We encountered an error

Let me know if this helps.
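In the output above the factor prints as `dtype: float64`, but the dtypes of `prices` are never shown, so the price table is a reasonable place to look for the string. Note that `fillna(0)` only replaces missing values, so it cannot remove a text cell. The hypothetical helper `find_non_numeric` lists, for each non-numeric column, the raw cells that `pd.to_numeric` cannot parse; coercing them to NaN afterwards is one possible fix, not the only one (if the file uses decimal commas, `pd.read_csv(..., decimal=',')` may be more appropriate). The data is invented:

```python
import io

import pandas as pd


def find_non_numeric(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each non-numeric column to the cells
    that pd.to_numeric cannot parse."""
    bad = {}
    for col in df.select_dtypes(exclude="number").columns:
        parsed = pd.to_numeric(df[col], errors="coerce")
        # Cells that were present but became NaN after coercion are the culprits.
        mask = parsed.isna() & df[col].notna()
        bad[col] = df.loc[mask, col].tolist()
    return bad


# Invented price file with a text placeholder in one cell.
csv_text = (
    "date;AAA Equity;BBB Equity\n"
    "1996-12-31;10.0;5.0\n"
    "1997-01-30;11.0;n.a.\n"
)
prices = pd.read_csv(io.StringIO(csv_text), delimiter=";",
                     index_col="date", parse_dates=True)

report = find_non_numeric(prices)
print(report)  # {'BBB Equity': ['n.a.']}

# One possible fix: coerce every column, turning unparseable cells into NaN.
prices = prices.apply(pd.to_numeric, errors="coerce")
print(prices.dtypes)
```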

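Original post: Some prices contain NaN. I wonder whether that causes the problem, and whether changing them to 0 would make a difference. I'm not sure, but since the error is a division involving a string, I think this might be the root cause. It's just a guess: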

$ python3 code_alphalens_analysis 
            DU UH Equity  SCL LN Equity  BMA AR Equity  GCLA AR Equity  EBS AV Equity  OMV AV Equity  ...  RDF SJ Equity  HYP SJ Equity  AEL SJ Equity  MRP SJ Equity  EMI SJ Equity  AXL SJ Equity
date                                                                                                  ...                                                                                          
1996-12-31           NaN            NaN        9.35256             NaN            NaN       11.11470  ...            NaN        1.25522        1.06860        0.75895            NaN        0.36700
1997-01-30           NaN            NaN        9.68044             NaN            NaN       11.34016  ...            NaN        1.25426        1.23754        0.74472            NaN        0.40870
1997-02-27           NaN            NaN        9.99271             NaN            NaN       11.75658  ...            NaN        1.21265        1.25000        0.58482            NaN        0.49981
1997-03-31           NaN            NaN       11.00760             NaN            NaN       11.82128  ...            NaN        1.27312        1.60597        0.73513            NaN        0.42375
1997-04-30           NaN            NaN       10.81243             NaN            NaN       10.88544  ...            NaN        1.24338        1.73112        0.79811            NaN        0.46649
...                  ...            ...            ...             ...            ...            ...  ...            ...            ...            ...            ...            ...            ...
2018-08-30       1.39123           63.4        4.34430         1.35635       39.73607       52.90799  ...        0.70425        6.94043        1.07509       15.33290        1.08665        0.04015
2018-09-30       1.36945           61.7        4.18581         1.66410       41.55489       56.20015  ...        0.70719        6.51431        1.15394       16.11004        1.05302        0.03670
2018-10-31       1.33406           51.9        4.43810         1.61537       40.70160       55.54638  ...        0.64929        6.11171        1.18551       15.63778        1.00880        0.03318
2018-11-29       1.35312           46.3        4.49952         1.32456       39.43278       50.48753  ...        0.68896        6.40823        1.27159       17.31443        1.06684        0.03305
2018-12-31       1.36956           46.3        4.35219         1.31402       33.23611       43.76183  ...        0.67241        5.66712        1.25163       17.11610        1.02912        0.02990
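On the NaN-versus-0 question: returns are ratios of prices, so a 0 price is likely to do more harm than a NaN. A return measured from a 0 price is infinite and a return into a 0 price is -100%, whereas NaN just marks the observation as missing. This is a separate issue from the string-division error. A minimal sketch with invented prices, using the simple return p_t / p_{t-1} - 1 rather than alphalens' own internals:

```python
import numpy as np
import pandas as pd

# Invented prices for one asset that has no data on the first date.
prices_nan = pd.Series([np.nan, 10.0, 11.0])
prices_zero = prices_nan.fillna(0)

# Simple one-period return: p_t / p_{t-1} - 1.
returns_nan = prices_nan / prices_nan.shift(1) - 1
returns_zero = prices_zero / prices_zero.shift(1) - 1

print(returns_nan.tolist())   # second value is nan: no return is invented
print(returns_zero.tolist())  # second value is inf: 10 / 0 - 1
```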