Error using the Santiment sanpy library for cryptocurrency data analysis

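I am using sanpy to collect cryptocurrency market data and statsmodels to compute alpha, beta and R-squared. After that I want to wrap crypto = input("Cryptocurrency: ") in a while loop, so that I can ask the user for a specific cryptocurrency, print its statistics, and then show the prompt again.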

With the following code I get the error: ValueError: If using all scalar values, you must pass an index

import san
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import statsmodels.api as sm
from statsmodels import regression

cryptos = ["bitcoin", "ethereum", "ripple", "bitcoin-cash", "tether",
"bitcoin-sv", "litecoin", "binance-coin", "eos", "chainlink",
"monero", "bitcoin-gold"]

def get_and_process_data(c):
    raw_data = san.get("daily_closing_price_usd/" + c, from_date="2014-12-31", to_date="2019-12-31", interval="1d") # "query/slug"
    return raw_data.pct_change()[1:]


df = pd.DataFrame({c: get_and_process_data(c) for c in cryptos})

df['MKT Return'] = df.mean(axis=1) # avg market return
#print(df) # show dataframe with all data

def model(x, y):
    # Calculate r-squared
    X = sm.add_constant(x) # artificially add intercept to x, as advised in the docs
    model = sm.OLS(y,X).fit()
    rsquared = model.rsquared
    
    # Fit linear regression and calculate alpha and beta
    X = sm.add_constant(x)
    model = regression.linear_model.OLS(y,X).fit()
    alpha = model.params[0]
    beta = model.params[1]

    return rsquared, alpha, beta

results = pd.DataFrame({c: model(df[df[c].notnull()]['MKT Return'], df[df[c].notnull()][c]) for c in cryptos}).transpose()
results.columns = ['rsquared', 'alpha', 'beta']
print(results)

The error is raised on this line:

df = pd.DataFrame({c: get_and_process_data(c) for c in cryptos})

I tried to fix it by changing it to:

df = {c: get_and_process_data(c) for c in cryptos}

df['MKT Return'] = df.mean(axis=1) # avg market return
print(df) # show dataframe with all data

However, that gives me a different error: AttributeError: 'dict' object has no attribute 'mean'.

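The goal is to create a DataFrame with a datetime column, one column per cryptocurrency holding its pct_change data, and an additional MKT Return column holding the daily mean of pct_change across all cryptocurrencies. Then I want to use all of this data to compute the statistics for each cryptocurrency, and finally build the input function mentioned at the beginning.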

I hope I have made myself clear and that someone can help me with this.

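This is a good start, but I think you are confused about what san returns. Take a look at the following, which pulls the 'value' column out of each result before adding it to the output DataFrame: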

import san
import pandas as pd

# List of data we are interested in    
cryptos = ["bitcoin", "ethereum", "ripple", "bitcoin-cash", "tether",
"bitcoin-sv", "litecoin", "binance-coin", "eos", "chainlink",
"monero", "bitcoin-gold"]

# function to get the data from san into a dataframe and turn it into
# a daily percentage change
def get_and_process_data(c):
    raw_data = san.get("daily_closing_price_usd/" + c, from_date="2014-12-31", to_date="2019-12-31", interval="1d") # "query/slug"
    return raw_data.pct_change()[1:]

# now set up an empty dataframe to get all the data put into
df = pd.DataFrame()
# cycle through your list
for c in cryptos:
    # get the data as percentage changes
    dftemp = get_and_process_data(c)
    # then add it to the output dataframe df
    df[c] = dftemp['value']

# have a look at what you have
print(df)

From that point on you know you have good data and can carry on working with it.
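As a rough sketch of the next step, the snippet below builds the returns table and the MKT Return column from the question. It does not call san at all: fake_prices is a hypothetical stand-in that only imitates the shape used above (a daily index and a single 'value' column), and the prices are random numbers, not market data.

```python
import numpy as np
import pandas as pd

def fake_prices(seed, periods=6):
    """Hypothetical stand-in for san.get: a 'value' column on a daily index."""
    rng = np.random.default_rng(seed)
    index = pd.date_range("2019-01-01", periods=periods, freq="D", name="datetime")
    prices = 100 * np.cumprod(1 + rng.normal(0, 0.02, periods))
    return pd.DataFrame({"value": prices}, index=index)

cryptos = ["bitcoin", "ethereum", "litecoin"]

# Select the 'value' column so each dict entry is a Series, not a DataFrame;
# pandas then aligns the Series on their shared datetime index.
returns = pd.DataFrame({
    c: fake_prices(seed)["value"].pct_change().iloc[1:]
    for seed, c in enumerate(cryptos)
})

# Row-wise mean over the coin columns only, so the new column is never
# averaged into itself if this line is run a second time.
returns["MKT Return"] = returns[cryptos].mean(axis=1)
print(returns)
```

Restricting the mean to returns[cryptos] is a small design choice: it keeps the market column reproducible even if the cell is re-executed in a notebook.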

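If I may make a suggestion: fetch just one currency and get the regression working with it first, then move on to looping over all of them.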
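A minimal sketch of that single-coin step is shown here. To keep it free of extra dependencies it does not use statsmodels; instead it uses the closed-form solution for a regression with one regressor and an intercept (beta = cov / var, alpha from the means, R-squared as the squared correlation), which should agree with what sm.OLS reports for the same data. The function name single_factor_stats and the sample numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

def single_factor_stats(market, coin):
    """Return (rsquared, alpha, beta) for coin = alpha + beta * market."""
    # Align the two series and drop days where either one is missing.
    pair = pd.concat([market, coin], axis=1, keys=["mkt", "coin"]).dropna()
    beta = pair["coin"].cov(pair["mkt"]) / pair["mkt"].var()
    alpha = pair["coin"].mean() - beta * pair["mkt"].mean()
    rsquared = pair["coin"].corr(pair["mkt"]) ** 2
    return rsquared, alpha, beta

# Made-up returns: the coin is an exact linear function of the market,
# with one missing day to exercise the dropna step.
market = pd.Series([0.01, -0.02, 0.03, 0.00, 0.015])
coin = 0.001 + 1.5 * market
coin.iloc[0] = np.nan

rsquared, alpha, beta = single_factor_stats(market, coin)
print(rsquared, alpha, beta)
```

Once this works for one column, the same function can be called in a loop over the remaining coins, passing the MKT Return column as market.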

You are passing scalar values where you need to pass lists, so try the following:

data = {c: [get_and_process_data(c)] for c in cryptos}
df = pd.DataFrame(data)

Maybe try this first.
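To illustrate the scalar-versus-list point in isolation, here is a small sketch with made-up numbers. It only shows when pandas raises this ValueError for plain scalars and two ways around it; it does not show whether wrapping a full san result in a list produces the table layout the question is after.

```python
import pandas as pd

# Plain scalars give pandas no row labels to build an index from.
try:
    pd.DataFrame({"bitcoin": 0.01, "ethereum": 0.02})
    scalar_error = None
except ValueError as exc:
    scalar_error = str(exc)
print(scalar_error)

# Wrapping each scalar in a list yields a one-row frame.
one_row = pd.DataFrame({"bitcoin": [0.01], "ethereum": [0.02]})

# Alternatively, keep the scalars and supply the index explicitly.
indexed = pd.DataFrame({"bitcoin": 0.01, "ethereum": 0.02}, index=["2019-01-01"])
print(one_row)
print(indexed)
```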