How to cast a Dask Series as string type if it contains an unhashable type?
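I want to call .value_counts() on an arbitrary Dask Series, and cast the Series to string type only if it contains an unhashable type. I don't want to cast it to string when that isn't necessary, and I don't want to call .compute() before calling .value_counts(). I tried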
import pandas as pd
import dask.dataframe as dd

df = pd.DataFrame({"a": [[1], ["foo"], ["foo", "bar"]]})
df = dd.from_pandas(df, npartitions=1)
srs = df["a"]
try:
    val_counts = srs.value_counts()
except TypeError:
    srs = srs.astype(str)
    val_counts = srs.value_counts()
val_counts.compute()
which gave the error
TypeError: unhashable type: 'list'
and
import dask

df = pd.DataFrame({"a": [[1], ["foo"], ["foo", "bar"]]})
df = dd.from_pandas(df, npartitions=1)
srs = df["a"]

def func(srs):
    try:
        val_counts = srs.value_counts()
    except TypeError:
        srs = srs.astype(str)
        val_counts = srs.value_counts()
    return val_counts

val_counts = dask.compute(func(srs))
which gave the same error.
I also tried
df = pd.DataFrame({"a": [[1], ["foo"], ["foo", "bar"]]})
df = dd.from_pandas(df, npartitions=1)
srs = df["a"]
if srs.apply(lambda y: isinstance(y, list), meta=srs).any():
    srs = srs.astype(str)
srs.value_counts().compute()
which gave the error
TypeError: Trying to convert dd.Scalar<series-..., type=str> to a boolean value.
Maybe convert the lists to something hashable first, such as tuples?
s.apply(tuple).value_counts() ?
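To make the tuple idea concrete, here is a minimal sketch in plain Python. The helper name to_hashable is hypothetical (it is not part of Dask or pandas), and collections.Counter stands in for value_counts so the snippet runs without Dask. Unlike a bare tuple(...), it also handles nested lists, sets, and dicts.

```python
from collections import Counter


def to_hashable(value):
    """Recursively turn common unhashable containers into hashable ones.

    Hypothetical helper for illustration; not a Dask or pandas API.
    """
    if isinstance(value, (list, tuple)):
        # Lists (and tuples that may hold lists) become tuples.
        return tuple(to_hashable(v) for v in value)
    if isinstance(value, (set, frozenset)):
        return frozenset(to_hashable(v) for v in value)
    if isinstance(value, dict):
        # Order-insensitive, hashable view of the items.
        return frozenset((k, to_hashable(v)) for k, v in value.items())
    # Anything else is returned unchanged.
    return value


# Same data as in the question, plus one repeat so a count exceeds 1.
values = [[1], ["foo"], ["foo", "bar"], ["foo"]]
counts = Counter(to_hashable(v) for v in values)
print(counts)
```

With Dask, the same helper could presumably be applied lazily as srs.apply(to_hashable, meta=(srs.name, "object")) before calling .value_counts(), which avoids an early .compute(). I have not verified that value_counts accepts tuple values on every Dask and pandas version, so treat that as an assumption to test. Note also that this conversion makes a list and a tuple with the same elements count as the same value.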