Prevent AWS Glue crawler from creating multiple tables

aws-glue

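I have created a Glue crawler that crawls my data and creates a table in the Glue Data Catalog. Say I have a CSV file (file1.csv) with the schema (id, name). Once the crawler run completes, it creates an Athena table (crawler_file) with 2 columns (id, name). Now a new file (file2.csv) arrives with the schema (id, name, roll_no). Currently, when the Glue crawler runs, it creates a new Athena table (crawler_file_111) with the schema (id, name, roll_no). Can I configure the crawler to update the schema of the existing table (crawler_file) instead of creating the new table (crawler_file_111)? Is there some way to achieve this?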

On the Edit crawler page, enable the following option. This should work for you.
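The option itself is not named in the text of this answer. One setting that matches the behaviour asked for is the console choice "Create a single schema for each S3 path", which as far as I know corresponds to the `CombineCompatibleSchemas` table grouping policy in the crawler's configuration JSON. Below is a minimal sketch of setting that through boto3, assuming this is the intended option; the crawler name `my-crawler` and the helper names `build_crawler_update` and `apply_update` are hypothetical, and the `SchemaChangePolicy` values are my own choice rather than something stated in the question.

```python
import json


def build_crawler_update(crawler_name):
    """Build keyword arguments for the Glue UpdateCrawler call.

    Assumption: the grouping policy below is the option the answer means.
    """
    configuration = {
        "Version": 1.0,
        # Ask the crawler to combine compatible schemas under one S3 path
        # into a single table instead of creating one table per schema.
        "Grouping": {"TableGroupingPolicy": "CombineCompatibleSchemas"},
    }
    return {
        "Name": crawler_name,
        # Glue expects the crawler configuration as a JSON string.
        "Configuration": json.dumps(configuration),
        # Assumed policy: update the existing table when the schema changes,
        # and only log (do not delete) when objects disappear.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def apply_update(crawler_name):
    """Send the update to AWS. Needs boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    glue.update_crawler(**build_crawler_update(crawler_name))


if __name__ == "__main__":
    # Print the request that would be sent for a hypothetical crawler name.
    print(json.dumps(build_crawler_update("my-crawler"), indent=2))
```

Calling `apply_update("my-crawler")` would send the change; after that, re-run the crawler. Note that this grouping only helps when both files sit under the same S3 include path and their schemas are compatible, so it is worth checking the resulting table after the next run.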
