Passing a cached pandas DataFrame in Python to another cached function gives an "unhashable type: DataFrame" error
I have three functions, for example:
from cachetools import cached, TTLCache
import pandas as pd

cache = TTLCache(10, 1000)

@cached(cache)
def function1():
    df = pd.DataFrame({'one': range(5), 'two': range(5, 10)})  # just a little data, doesn't matter what
    return df

@cached(cache)
def function2(df):
    var1 = df['one']
    var2 = df['two']
    return var1, var2

def function3():
    df = function1()
    var1, var2 = function2(df)  # pass df to function2 for some work
    print('this is var1[0]: ' + str(var1[0]))
    print('this is var2[0]: ' + str(var2[0]))

function3()
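I want cached versions of df, var1, and var2. Basically, I only want df to be recomputed inside function3 when it is not already cached, and then do the same for var1 and var2, which depend on df. Is there a way to do this? The code works when I remove @cached(cache) from function2.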
This is the error I get:
TypeError: 'DataFrame' objects are mutable, thus they cannot be hashed
Try the cacheout library; it worked for me:
import pandas as pd
from cacheout import Cache

cache = Cache()

@cache.memoize()
def function1():
    df = pd.DataFrame({'one': range(5), 'two': range(5, 10)})
    return df

@cache.memoize()
def function2(df):
    var1 = df['one']
    var2 = df['two']
    return var1, var2

def function3():
    df = function1()
    var1, var2 = function2(df)
    print('this is var1[0]: ' + str(var1[0]))
    print('this is var2[0]: ' + str(var2[0]))

function3()
Output:
this is var1[0]: 0
this is var2[0]: 5
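A different way around the error, sketched here with cachetools and not taken from either answer, is to never pass the DataFrame into a cached function. Because function2's result depends only on function1's output, function2 can call function1 itself and take no arguments, so its cache key is the hashable empty tuple. Using a separate TTLCache per function is a deliberate choice in this sketch: with one shared cache, function1() and a zero-argument function2() would both map to the same empty key and overwrite each other. The `calls` counter is hypothetical and exists only to show the cache hits.

```python
import pandas as pd
from cachetools import cached, TTLCache

# One cache per function: both take no arguments, so in a shared cache
# they would collide on the same (empty) key.
df_cache = TTLCache(maxsize=10, ttl=1000)
cols_cache = TTLCache(maxsize=10, ttl=1000)

calls = {'function1': 0, 'function2': 0}  # hypothetical counters for illustration

@cached(df_cache)
def function1():
    calls['function1'] += 1
    return pd.DataFrame({'one': range(5), 'two': range(5, 10)})

@cached(cols_cache)
def function2():
    # Fetch the (cached) DataFrame here instead of receiving it as an argument.
    calls['function2'] += 1
    df = function1()
    return df['one'], df['two']

def function3():
    var1, var2 = function2()
    print('this is var1[0]: ' + str(var1[0]))
    print('this is var2[0]: ' + str(var2[0]))

function3()
function3()  # second call is served entirely from the caches
```

One caveat with any of these approaches: the cached Series are shared objects, so modifying var1 or var2 in place also changes what later cache hits return.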
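As the accepted answer says, the problem seems to lie with cachetools. If you absolutely need cachetools, you can convert the df to a string and back, but the computational overhead may be prohibitive.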
from io import StringIO

import pandas as pd
from cachetools import cached, TTLCache

cache = TTLCache(10, 1000)

@cached(cache)
def function1():
    df = pd.DataFrame({'one': range(5), 'two': range(5, 10)})  # just a little data, doesn't matter what
    print('iran')  # printed only when the body runs, i.e. on a cache miss
    return df.to_csv(index=False)  # return df as a (hashable) CSV string

@cached(cache)
def function2(df_csv):
    df = pd.read_csv(StringIO(df_csv))  # parse the CSV string back into a normal DataFrame
    var1 = df['one']
    var2 = df['two']
    print('iran2')  # printed only on a cache miss
    return var1, var2

def function3():
    df = function1()
    var1, var2 = function2(df)
    print('this is var1[0]: ' + str(var1[0]))
    print('this is var2[0]: ' + str(var2[0]))

function3()
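If function2 really must take the DataFrame as an argument, cachetools also accepts a custom `key` function in `cached(cache, key=...)`. The sketch below is an alternative to the CSV round-trip rather than part of the answer above: it builds the key from the DataFrame's contents using `pandas.util.hash_pandas_object` (one uint64 per row, index included), then mixes in the column labels and dtypes and digests everything with SHA-256. The helper name `df_key` is hypothetical. It avoids re-parsing the data, but still makes one pass over the frame per call, and two frames with identical contents deliberately share one cache entry.

```python
import hashlib

import pandas as pd
from cachetools import cached, TTLCache
from cachetools.keys import hashkey

def df_key(df):
    """Hypothetical key function: a content digest of the DataFrame."""
    # One uint64 hash per row (index included), computed by pandas.
    row_hashes = pd.util.hash_pandas_object(df, index=True)
    digest = hashlib.sha256(row_hashes.to_numpy().tobytes())
    # Row hashes ignore column labels and dtypes, so mix those in too.
    meta = (list(df.columns), [str(t) for t in df.dtypes])
    digest.update(repr(meta).encode())
    return hashkey(digest.hexdigest())

# Separate caches so entries from different functions cannot collide.
df_cache = TTLCache(maxsize=10, ttl=1000)
cols_cache = TTLCache(maxsize=10, ttl=1000)

@cached(df_cache)
def function1():
    return pd.DataFrame({'one': range(5), 'two': range(5, 10)})

@cached(cols_cache, key=df_key)
def function2(df):
    return df['one'], df['two']

def function3():
    df = function1()
    var1, var2 = function2(df)
    print('this is var1[0]: ' + str(var1[0]))
    print('this is var2[0]: ' + str(var2[0]))

function3()
```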