Auto-labeling text data with Amazon SageMaker Ground Truth
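What is the minimum number of text rows that Ground Truth needs for automated labeling? I have a text file with 1,000 rows. Is that enough to start using automated labeling in SageMaker Ground Truth?
According to the documentation: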
You should use automated data labeling only on large datasets. The neural networks used with active learning require a significant amount of data for every new dataset. With larger datasets there is more potential to automatically label the data and therefore reduce the total cost of labeling. We recommend that you use thousands of data objects when using automated data labeling. You must use at least 5,000 data objects
https://docs.aws.amazon.com/sagemaker/latest/dg/sms-automated-labeling.html
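I am a product manager on the Amazon SageMaker Ground Truth team, and I'm happy to help with this question. The minimum system requirement is 1,000 objects. In practice, for text classification we typically see meaningful results (measured as the percentage of data that is auto-labeled) only once you have 2,000 to 3,000 text objects. Keep in mind that performance varies and depends on your dataset and the complexity of the task.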
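If it helps, here is a rough sketch of how you might check a text file against those numbers before starting a job. It is not part of any AWS SDK: the function names and the two threshold constants are hypothetical, and the thresholds are simply the figures quoted in this answer (1,000 as the system minimum, 2,000 as the lower end of the range where results tend to be meaningful). The only Ground Truth detail it relies on is that an input manifest for inline text uses one JSON object per line with a `source` key.

```python
import json

# Assumed thresholds, taken from the answer above rather than from any AWS API.
SYSTEM_MINIMUM = 1_000      # minimum number of objects stated in the answer
PRACTICAL_MINIMUM = 2_000   # lower end of the "meaningful results" range


def build_manifest_lines(texts):
    """Turn raw text rows into input-manifest lines, skipping blank rows."""
    return [json.dumps({"source": t.strip()}) for t in texts if t.strip()]


def auto_labeling_readiness(num_objects):
    """Classify a dataset size against the thresholds quoted in the answer."""
    if num_objects < SYSTEM_MINIMUM:
        return "below_minimum"
    if num_objects < PRACTICAL_MINIMUM:
        return "eligible_but_small"
    return "likely_useful"
```

For a file with exactly 1,000 non-blank rows, `auto_labeling_readiness(1000)` returns `"eligible_but_small"`: the job meets the stated minimum, but the share of data that ends up auto-labeled may be low.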