Dask warning: provide explicit output types
I am doing the following with Dask:
import dask.dataframe as dd
import pandas as pd
salary_df = pd.DataFrame({"Salary":[10000, 50000, 25000, 30000, 7000]})
salary_category = pd.DataFrame({"Hi":[5000, 20000, 25000, 30000, 90000],
"Low":[0, 5001, 20001, 25001, 30001],
"category":["Very Poor", "Poor", "Medium", "Rich", "Super Rich" ]
})
sal_ddf = dd.from_pandas(salary_df, npartitions=10)
salary_category.index = pd.IntervalIndex.from_arrays(salary_category['Low'],salary_category['Hi'],closed='both')
sal_ddf['Category'] = sal_ddf['Salary'].apply(lambda x : salary_category.iloc[salary_category.index.get_loc(x)]['category'])
I do get the result, but the following line raises a warning:
sal_ddf['Category'] = sal_ddf['Salary'].apply(lambda x : salary_category.iloc[salary_category.index.get_loc(x)]['category'])
You did not provide metadata, so Dask is running your function on a small dataset to guess output types. It is possible that Dask will guess incorrectly.
To provide an explicit output types or to silence this message, please provide the `meta=` keyword, as described in the map or apply function that you are using.
Before: .apply(func)
After: .apply(func, meta=('Salary', 'object'))
What am I missing here?
The missing keyword argument here is `meta`. Dask generates a suggestion automatically (in the warning message):
After: .apply(func, meta=('Salary', 'object'))
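Since this is only a warning, specifying `meta` is optional for many use cases, but it can be useful when you want to state the dtype of the computed variable explicitly instead of letting Dask infer it from a small sample.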
Running the snippet below should not produce the warning message:
# extracted your code into `func` for readability only
func = lambda x: salary_category.iloc[salary_category.index.get_loc(x)]['category']
sal_ddf['Category'] = sal_ddf['Salary'].apply(func, meta=('Salary', 'object'))
For more details, the Dask documentation on `meta` may be useful.