Error using Santiment sanpy library for cryptocurrency data analysis
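I'm using sanpy to gather crypto market data and statsmodels to compute alpha, beta and rsquared. After that I want to create a crypto = input("Cryptocurrency: ") function with a while loop that asks the user for a specific crypto, outputs its statistics, and then shows the input prompt again.
With the following code I receive the error: ValueError: If using all scalar values, you must pass an index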
import san
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import datetime
import statsmodels.api as sm
from statsmodels import regression
cryptos = ["bitcoin", "ethereum", "ripple", "bitcoin-cash", "tether",
           "bitcoin-sv", "litecoin", "binance-coin", "eos", "chainlink",
           "monero", "bitcoin-gold"]
def get_and_process_data(c):
    raw_data = san.get("daily_closing_price_usd/" + c, from_date="2014-12-31", to_date="2019-12-31", interval="1d")  # "query/slug"
    return raw_data.pct_change()[1:]
df = pd.DataFrame({c: get_and_process_data(c) for c in cryptos})
df['MKT Return'] = df.mean(axis=1) # avg market return
#print(df) # show dataframe with all data
def model(x, y):
    # Calculate r-squared
    X = sm.add_constant(x)  # artificially add intercept to x, as advised in the docs
    model = sm.OLS(y, X).fit()
    rsquared = model.rsquared
    # Fit linear regression and calculate alpha and beta
    X = sm.add_constant(x)
    model = regression.linear_model.OLS(y, X).fit()
    alpha = model.params[0]
    beta = model.params[1]
    return rsquared, alpha, beta
results = pd.DataFrame({c: model(df[df[c].notnull()]['MKT Return'], df[df[c].notnull()][c]) for c in cryptos}).transpose()
results.columns = ['rsquared', 'alpha', 'beta']
print(results)
The error is in the following line:
df = pd.DataFrame({c: get_and_process_data(c) for c in cryptos})
I tried to solve the problem by changing it to:
df = {c: get_and_process_data(c) for c in cryptos}
df['MKT Return'] = df.mean(axis=1) # avg market return
print(df) # show dataframe with all data
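However, that gave me a different error: AttributeError: 'dict' object has no attribute 'mean'.
The goal is to create a single DataFrame with the datetime column, one column per crypto holding its pct_change data, and an additional MKT Return column holding the daily mean of pct_change across all cryptos. Then I want to use all this data to calculate each crypto's statistics and, finally, create the input function mentioned at the beginning.
I hope I have made myself clear and that someone can help me with this.
This is a great start, but I think you are getting confused by what san returns. If you look at the following: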
import san
import pandas as pd
# List of data we are interested in
cryptos = ["bitcoin", "ethereum", "ripple", "bitcoin-cash", "tether",
           "bitcoin-sv", "litecoin", "binance-coin", "eos", "chainlink",
           "monero", "bitcoin-gold"]
# function to get the data from san into a dataframe and turn it into
# a daily percentage change
def get_and_process_data(c):
    raw_data = san.get("daily_closing_price_usd/" + c, from_date="2014-12-31", to_date="2019-12-31", interval="1d")  # "query/slug"
    return raw_data.pct_change()[1:]
# now set up an empty dataframe to get all the data put into
df = pd.DataFrame()
# cycle through your list
for c in cryptos:
    # get the data as percentage changes
    dftemp = get_and_process_data(c)
    # then add it to the output dataframe df
    df[c] = dftemp['value']
# have a look at what you have
print(df)
From that point on, you know you have some good data and can carry on working with it.
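As a rough illustration of the same idea without a network call, here is a minimal sketch that builds the combined frame in one step and then adds the MKT Return column. `fake_san_get` is a hypothetical stand-in that I made up for `san.get`; it only assumes what the loop above relies on, namely a DataFrame with a single `value` column on a daily datetime index. The prices are synthetic. The ValueError in the question appears to come from handing `pd.DataFrame` a dict whose values are whole DataFrames; selecting the `value` column gives it a dict of Series instead, which it can align on the dates.

```python
import numpy as np
import pandas as pd

def fake_san_get(seed, periods=6):
    """Hypothetical stand-in for san.get: a 'value' column on a daily index."""
    rng = np.random.default_rng(seed)  # seeded so the sketch is repeatable
    index = pd.date_range("2019-01-01", periods=periods, freq="D", name="datetime")
    prices = 100 + rng.normal(0, 1, size=periods).cumsum()  # synthetic prices
    return pd.DataFrame({"value": prices}, index=index)

cryptos = ["bitcoin", "ethereum", "litecoin"]

# Each dict value is a Series (the 'value' column), not a whole DataFrame,
# so pd.DataFrame builds one column per coin aligned on the date index.
returns = pd.DataFrame(
    {c: fake_san_get(seed)["value"].pct_change().iloc[1:]
     for seed, c in enumerate(cryptos)}
)

# Row-wise mean over the coin columns only, giving the daily market return.
returns["MKT Return"] = returns[cryptos].mean(axis=1)
print(returns)
```

With real data you would swap `fake_san_get(seed)` for `get_and_process_data(c)` and take `['value']` from its result, as in the loop above.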
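If I may make a suggestion: take just one currency and get the regression working for that one first, then move on to looping over all of them.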
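A minimal sketch of that single-currency step, under some loud assumptions: the returns are synthetic with a known alpha and beta so the result can be checked, and the fit uses plain NumPy least squares rather than statsmodels so it runs with nothing but NumPy and pandas installed. `alpha_beta_rsquared` is a name I made up; it is meant to compute the same three numbers as the question's `model(x, y)`, not to replace it.

```python
import numpy as np
import pandas as pd

def alpha_beta_rsquared(market, coin):
    """Ordinary least squares of coin returns on market returns, with intercept."""
    x = np.asarray(market, dtype=float)
    y = np.asarray(coin, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # intercept column, like sm.add_constant
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta = coef
    residuals = y - X @ coef
    ss_res = float(residuals @ residuals)        # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())  # total variation
    rsquared = 1.0 - ss_res / ss_tot
    return rsquared, float(alpha), float(beta)

# Synthetic data: coin = 0.001 + 1.5 * market + noise (assumed values, not real prices)
rng = np.random.default_rng(0)
market = pd.Series(rng.normal(0, 0.02, size=500))
coin = 0.001 + 1.5 * market + rng.normal(0, 0.005, size=500)

rsquared, alpha, beta = alpha_beta_rsquared(market, coin)
print(f"rsquared={rsquared:.3f} alpha={alpha:.4f} beta={beta:.3f}")
```

Once one coin gives sensible numbers, the same call can be made per column, dropping the rows where that coin has no data, as the question's dict comprehension already does.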
You are passing scalar values; you need to pass lists, so try the following:
data = {c: [get_and_process_data(c)] for c in cryptos}
df = pd.DataFrame(data)
Maybe try this first.