Increase max_components variable in dedupe library
How can I increase the default value of the max_components variable?
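By default, max_components is set to 30000. I need to increase this limit because I get different results every time I run deduplication (on the same dataset).
I think the total number of clusters in my data is greater than 30000.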
Answer from GitHub (dedupe issue "Increase max_components = 30000"):
If you are getting different results using the same saved settings file, then what you are reporting is a bug. If you are getting different results from different training data (or even the same training data), that's expected, as at various points dedupe uses a random sample to learn good rules.
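The answer's point is that any step driven by a random sample can change from run to run. The sketch below is a generic Python illustration, not dedupe's code: `sample_training_pairs` and `CANDIDATE_PAIRS` are hypothetical names, and the sketch only shows that fixing the seed of a random generator makes a sample repeatable. Whether seeding alone makes a particular dedupe version reproducible is an assumption to verify; the answer's own route to repeatable results is reusing a saved settings file.

```python
import random

# Hypothetical candidate pairs standing in for record pairs a
# deduplication tool might sample from while learning rules.
CANDIDATE_PAIRS = [(i, j) for i in range(20) for j in range(i + 1, 20)]


def sample_training_pairs(pairs, sample_size, seed=None):
    """Draw a random sample of pairs; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # private generator; seed=None uses OS entropy
    return rng.sample(pairs, sample_size)


if __name__ == "__main__":
    a = sample_training_pairs(CANDIDATE_PAIRS, 5, seed=42)
    b = sample_training_pairs(CANDIDATE_PAIRS, 5, seed=42)
    c = sample_training_pairs(CANDIDATE_PAIRS, 5)  # unseeded
    print(a == b)  # True: same seed, same sample
    print(a == c)  # usually False: an unseeded draw differs between runs
```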
In either case, I doubt that max_components is related. But if you want to change it, fork the code and change it.
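The question reads max_components as a limit on the total number of clusters. As far as I understand dedupe's clustering code (an assumption worth checking against the source of the installed version), it instead caps the number of records in a single connected component of linked pairs, and a component above the cap is split again at a stricter score threshold. The pure-Python sketch below illustrates only that idea; `connected_components`, `capped_components`, and the fixed `step` increment are hypothetical and are not dedupe's API or its exact re-filtering rule.

```python
from collections import defaultdict


def connected_components(edges, threshold):
    """Group record ids linked by edges scoring >= threshold (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())


def capped_components(edges, threshold, max_components, step=0.1):
    """Split any component larger than max_components by raising the threshold.

    Hypothetical illustration only. Records that lose all their links at
    the stricter threshold drop out, i.e. they stay unclustered.
    """
    if step <= 0:
        raise ValueError("step must be positive so the recursion terminates")
    result = []
    for component in connected_components(edges, threshold):
        if len(component) <= max_components:
            result.append(component)
            continue
        next_threshold = threshold + step
        if next_threshold > 1.0:
            result.append(component)  # cannot filter further; keep as-is
            continue
        sub_edges = [(a, b, s) for a, b, s in edges
                     if a in component and b in component]
        result.extend(capped_components(sub_edges, next_threshold,
                                        max_components, step))
    return result


if __name__ == "__main__":
    edges = [(1, 2, 0.9), (2, 3, 0.55), (3, 4, 0.95), (5, 6, 0.7)]
    print(capped_components(edges, 0.5, 30000))
    print(capped_components(edges, 0.5, 2))
```

With a cap of 30000 the four linked records 1 to 4 stay in one component; with a cap of 2 the weak 0.55 link is filtered out and that component splits into {1, 2} and {3, 4}. Note that the cap only matters for very large components, which is consistent with the answer's doubt that it explains run-to-run differences.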