Code works with a dataframe of 70 columns but fails with 1780 columns (same data structure)
I wrote a factor-analysis script with the alphalens module. It works perfectly on a dataframe of 70 columns, but it fails when I try it with 1780 columns...
I don't see how that's possible, since the structure is exactly the same. I've checked everything, but the magic stops inside alphalens.
https://github.com/Ibsylonne/test_alphalens
If you have any clue or idea, please comment below.
This runs:
factor = pd.read_csv('original70columns.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor.head()
date
1996-12-31  DU UH Equity      0.0
            SCL LN Equity     0.0
            BMA AR Equity     0.0
            GCLA AR Equity    0.0
            EBS AV Equity     0.0
dtype: float64
This does not run:
factor = pd.read_csv('test1780columns.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor.head()
date
1996-12-31  DU UH Equity      0.0
            SCL LN Equity     0.0
            BMA AR Equity     0.0
            GCLA AR Equity    0.0
            EBS AV Equity     0.0
dtype: float64
For those familiar with alphalens (this attempt is with the 1780 columns):
factor_data = get_clean_factor_and_forward_returns(
    factor,
    prices,
    quantiles=2,
    periods=(1, 5, 10,),
    max_loss=1)
TypeError: unsupported operand type(s) for /: 'str' and 'float'
Very mysterious...
Any clue or idea, please comment below I_I
Thanks
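While waiting for ideas: the TypeError itself makes me suspect that pandas read some cells of the big CSV as text rather than as floats (a stray text cell, a comma used as decimal separator, and so on). That is only a guess, but a minimal check along these lines (a sketch, not part of the repo) should reveal it:

import pandas as pd

factor_raw = pd.read_csv('test1780columns.csv', delimiter=';')

# Columns that pandas could not parse as numbers end up with dtype 'object';
# with purely numeric data this list should contain only the 'date' column.
bad_cols = factor_raw.dtypes[factor_raw.dtypes == object].index.tolist()
print(bad_cols)

# Show the first value that fails numeric parsing in each suspicious column.
for col in bad_cols:
    if col == 'date':
        continue
    coerced = pd.to_numeric(factor_raw[col], errors='coerce')
    mask = coerced.isna() & factor_raw[col].notna()
    if mask.any():
        print(col, repr(factor_raw.loc[mask, col].iloc[0]))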
Update: I think I've fixed things so that the missing values are now 0 instead of NaN, but I still get an error. I'm still looking at this, but maybe what I've done so far will give you some ideas too; let me know what you think:
from numpy import nan
from pandas import (DataFrame, date_range)
import pandas as pd
import matplotlib.pyplot as plt
from alphalens.tears import (create_returns_tear_sheet,
                             create_information_tear_sheet,
                             create_turnover_tear_sheet,
                             create_summary_tear_sheet,
                             create_full_tear_sheet,
                             create_event_returns_tear_sheet,
                             create_event_study_tear_sheet)
from alphalens.utils import get_clean_factor_and_forward_returns

# Build the price dataframe (dates as index, one column per ticker).
# Added skip for testing, it can be removed.
skip = False
prices = pd.read_csv('prices_quant.csv', delimiter=';')
prices['date'] = pd.to_datetime(prices.date)
prices = prices.set_index('date')
prices = prices.fillna(0)
print(prices)

# Build the factor as a (date, ticker) multi-indexed series.
factor = pd.read_csv('test1.csv', delimiter=';')
factor['date'] = pd.to_datetime(factor.date)
factor = factor.set_index('date').stack()
factor = factor.fillna(0)
print(factor)

try:
    factor_data = get_clean_factor_and_forward_returns(
        factor,
        prices,
        quantiles=5,
        periods=(1, 5, 10,),
        max_loss=1)
except Exception as e:
    print(e)
    skip = True

if not skip:
    create_full_tear_sheet(factor_data, long_short=True)
    create_event_returns_tear_sheet(factor_data, prices, long_short=True)
    print("\nNo Errors\n")
else:
    print("\nWe encountered an error\n")
DU UH Equity SCL LN Equity BMA AR Equity GCLA AR Equity EBS AV Equity OMV AV Equity ... RDF SJ Equity HYP SJ Equity AEL SJ Equity MRP SJ Equity EMI SJ Equity AXL SJ Equity
date ...
1996-12-31 0.00000 0.0 9.35256 0.00000 0.00000 11.11470 ... 0.00000 1.25522 1.06860 0.75895 0.00000 0.36700
1997-01-30 0.00000 0.0 9.68044 0.00000 0.00000 11.34016 ... 0.00000 1.25426 1.23754 0.74472 0.00000 0.40870
1997-02-27 0.00000 0.0 9.99271 0.00000 0.00000 11.75658 ... 0.00000 1.21265 1.25000 0.58482 0.00000 0.49981
1997-03-31 0.00000 0.0 11.00760 0.00000 0.00000 11.82128 ... 0.00000 1.27312 1.60597 0.73513 0.00000 0.42375
1997-04-30 0.00000 0.0 10.81243 0.00000 0.00000 10.88544 ... 0.00000 1.24338 1.73112 0.79811 0.00000 0.46649
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-08-30 1.39123 63.4 4.34430 1.35635 39.73607 52.90799 ... 0.70425 6.94043 1.07509 15.33290 1.08665 0.04015
2018-09-30 1.36945 61.7 4.18581 1.66410 41.55489 56.20015 ... 0.70719 6.51431 1.15394 16.11004 1.05302 0.03670
2018-10-31 1.33406 51.9 4.43810 1.61537 40.70160 55.54638 ... 0.64929 6.11171 1.18551 15.63778 1.00880 0.03318
2018-11-29 1.35312 46.3 4.49952 1.32456 39.43278 50.48753 ... 0.68896 6.40823 1.27159 17.31443 1.06684 0.03305
2018-12-31 1.36956 46.3 4.35219 1.31402 33.23611 43.76183 ... 0.67241 5.66712 1.25163 17.11610 1.02912 0.02990
[265 rows x 1780 columns]
date
1996-12-31  DU UH Equity     0.000000
            SCL LN Equity    0.000000
            BMA AR Equity    0.000000
            GCLA AR Equity   0.000000
            EBS AV Equity    0.000000
                               ...
2018-12-31  HYP SJ Equity    0.029605
            AEL SJ Equity    0.000777
            MRP SJ Equity    0.000000
            EMI SJ Equity    0.000000
            AXL SJ Equity    0.000000
Length: 471700, dtype: float64
unsupported operand type(s) for /: 'int' and 'str'
We encountered an error
Let me know if this helps.
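In case it helps, here is also a small sketch of what I plan to try next (same files as above, separate from the script): coercing every price column to a numeric dtype before calling alphalens, so that any leftover string value becomes visible instead of blowing up inside get_clean_factor_and_forward_returns:

import pandas as pd

prices = pd.read_csv('prices_quant.csv', delimiter=';')
prices['date'] = pd.to_datetime(prices.date)
prices = prices.set_index('date')

# Force every column to numeric; values that cannot be parsed (e.g. stray text)
# become NaN instead of staying as strings.
prices_num = prices.apply(pd.to_numeric, errors='coerce')

# Count per column how many non-empty cells were actually unparseable text,
# so the offending columns of the 1780-column file can be inspected by hand.
dropped = prices.notna().sum() - prices_num.notna().sum()
print(dropped[dropped > 0])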
Original post:
Some of the prices are NaN and I wonder if that could cause the problem; do you know whether changing them to 0 would make a difference? I'm not sure, but since the error is a string being divided by a float, I suspect that might be the root cause, though it's just a guess (there's also a small sanity check after the output below):
$ python3 code_alphalens_analysis
DU UH Equity SCL LN Equity BMA AR Equity GCLA AR Equity EBS AV Equity OMV AV Equity ... RDF SJ Equity HYP SJ Equity AEL SJ Equity MRP SJ Equity EMI SJ Equity AXL SJ Equity
date ...
1996-12-31 NaN NaN 9.35256 NaN NaN 11.11470 ... NaN 1.25522 1.06860 0.75895 NaN 0.36700
1997-01-30 NaN NaN 9.68044 NaN NaN 11.34016 ... NaN 1.25426 1.23754 0.74472 NaN 0.40870
1997-02-27 NaN NaN 9.99271 NaN NaN 11.75658 ... NaN 1.21265 1.25000 0.58482 NaN 0.49981
1997-03-31 NaN NaN 11.00760 NaN NaN 11.82128 ... NaN 1.27312 1.60597 0.73513 NaN 0.42375
1997-04-30 NaN NaN 10.81243 NaN NaN 10.88544 ... NaN 1.24338 1.73112 0.79811 NaN 0.46649
... ... ... ... ... ... ... ... ... ... ... ... ... ...
2018-08-30 1.39123 63.4 4.34430 1.35635 39.73607 52.90799 ... 0.70425 6.94043 1.07509 15.33290 1.08665 0.04015
2018-09-30 1.36945 61.7 4.18581 1.66410 41.55489 56.20015 ... 0.70719 6.51431 1.15394 16.11004 1.05302 0.03670
2018-10-31 1.33406 51.9 4.43810 1.61537 40.70160 55.54638 ... 0.64929 6.11171 1.18551 15.63778 1.00880 0.03318
2018-11-29 1.35312 46.3 4.49952 1.32456 39.43278 50.48753 ... 0.68896 6.40823 1.27159 17.31443 1.06684 0.03305
2018-12-31 1.36956 46.3 4.35219 1.31402 33.23611 43.76183 ... 0.67241 5.66712 1.25163 17.11610 1.02912 0.02990
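And to double-check my own guess about NaN vs. strings: NaN is itself a float, so dividing it is legal and simply gives NaN back, whereas dividing a string raises exactly the TypeError above. A quick sanity check in plain Python (nothing to do with my data):

# NaN is a float, so arithmetic just propagates NaN without raising.
print(float('nan') / 2.0)   # -> nan

# A string, on the other hand, reproduces the alphalens error exactly.
try:
    '1.25522' / 2.0
except TypeError as e:
    print(e)                # -> unsupported operand type(s) for /: 'str' and 'float'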