Drupal: Merging Taxonomy Terms with Massive Duplicates
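I have a database that is used for research purposes. Unfortunately, during this research an algorithm was allowed to run for too long, and it inadvertently created duplicate taxonomy terms instead of reusing the original TID of the first instance of each term.
To correct this, I tried the "term_merge" and "taxonomy_manager" modules. "term_merge" provides an interface for removing duplicates, and it can limit the number of terms loaded at a time to avoid exhausting the database server's memory limit. For my use case, however, I cannot even load the configuration screen at /admin/structure/taxonomy/[My-Vocabulary]/merge, let alone the duplicates interface at /admin/structure/taxonomy/[My-Vocabulary]/merge/duplicates: both exhaust the memory limit even though that limit is set to 1024M.
To work around this, I wrote a custom module that calls the term_merge function from the term_merge module. Only one node bundle in this project uses the taxonomy vocabulary in question, so I could safely write my own logic to merge the duplicate terms without the term_merge module. I would still prefer to use term_merge, because it was designed for this purpose and should in theory make for a safer process.
My module provides a page callback, along with logic for obtaining the list of TIDs that refer to duplicate taxonomy terms. This is the code that contains the call to the term_merge function: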
// Use the first element, with the lowest TID value, as the 'trunk'
// which all other terms will be merged into.
$trunk = $tids[0];
// Remove the first element from the branch array, to ensure the trunk
// is not being merged into itself.
array_shift($tids);
// Set the merge settings array, similarly to the default values
// which are given in _term_merge_batch_process of term_merge.batch.inc.
$merge_settings = array(
  'term_branch_keep' => FALSE,
  'merge_fields' => array(),
  'keep_only_unique' => TRUE,
  'redirect' => -1,
  'synonyms' => array(),
);
term_merge($tids, $trunk, $merge_settings);
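This does not merge any terms, and it produces no errors or notices in Watchdog or in the web server logs.
I have also tried calling term_merge once for each duplicate TID to be merged, rather than passing the TID array as a whole.
I would appreciate any input on how best to use the term_merge function programmatically, or on an alternative approach that would let me remove many duplicate terms from a large database in which some terms have thousands of duplicates.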
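For context, the duplicate-finding step mentioned above could look roughly like the sketch below. It is only an illustration: the function name mymodule_group_duplicates and the shape of $rows (arrays with 'tid' and 'name' keys, for example selected from the taxonomy_term_data table for one vocabulary) are assumptions, and duplicates are assumed to be terms whose names match exactly.

```php
<?php

/**
 * Groups term rows by name and picks a trunk for each set of duplicates.
 *
 * Hypothetical helper, not part of term_merge. $rows is assumed to be an
 * array of arrays with 'tid' and 'name' keys.
 */
function mymodule_group_duplicates(array $rows) {
  // Collect every TID under the term name it belongs to.
  $by_name = array();
  foreach ($rows as $row) {
    $by_name[$row['name']][] = (int) $row['tid'];
  }

  $groups = array();
  foreach ($by_name as $name => $tids) {
    // A name that occurs only once has nothing to merge.
    if (count($tids) < 2) {
      continue;
    }
    // The lowest TID becomes the trunk, the rest are the branches.
    sort($tids, SORT_NUMERIC);
    $groups[$name] = array(
      'trunk' => array_shift($tids),
      'branches' => $tids,
    );
  }
  return $groups;
}
```

Each resulting group holds one 'trunk' TID and an array of 'branches', which matches the two TID arguments that term_merge expects.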
For reference, the following comment describes the parameters that term_merge accepts. It can be found in term_merge.module of the contributed term_merge module:
/**
 * Merge terms one into another using batch API.
 *
 * @param array $term_branch
 *   A single term tid or an array of term tids to be merged, aka term branches
 * @param int $term_trunk
 *   The tid of the term to merge term branches into, aka term trunk
 * @param array $merge_settings
 *   Array of settings that control how merging should happen. Currently
 *   supported settings are:
 *     - term_branch_keep: (bool) Whether the term branches should not be
 *       deleted, also known as "merge only occurrences" option
 *     - merge_fields: (array) Array of field names whose values should be
 *       merged into the values of corresponding fields of term trunk (until
 *       each field's cardinality limit is reached)
 *     - keep_only_unique: (bool) Whether after merging within one field only
 *       unique taxonomy term references should be kept in other entities. If
 *       before merging your entity had 2 values in its taxonomy term reference
 *       field and one was pointing to term branch while another was pointing to
 *       term trunk, after merging you will end up having your entity
 *       referencing to the same term trunk twice. If you pass TRUE in this
 *       parameter, only a single reference will be stored in your entity after
 *       merging
 *     - redirect: (int) HTTP code for redirect from $term_branch to
 *       $term_trunk, 0 stands for the default redirect defined in Redirect
 *       module. Use constant TERM_MERGE_NO_REDIRECT to denote not creating any
 *       HTTP redirect. Note: this parameter requires Redirect module enabled,
 *       otherwise it will be disregarded
 *     - synonyms: (array) Array of field names of trunk term into which branch
 *       terms should be added as synonyms (until each field's cardinality limit
 *       is reached). Note: this parameter requires Synonyms module enabled,
 *       otherwise it will be disregarded
 *     - step: (int) How many term branches to merge per script run in batch. If
 *       you are hitting time or memory limits, decrease this parameter
 */
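It appears that the term_merge function was developed to be used inside a form submit handler. Because my custom module calls it outside of one, batch_process never gets called and the queued batch never runs.
Explicitly calling the following after term_merge resolves the problem:
batch_process();
No arguments need to be passed to the function.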
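Putting the question's code and this fix together, the merge step might be sketched as follows. This is a minimal illustration rather than the module's documented usage: the function name mymodule_merge_duplicates is hypothetical, and the two optional callable parameters are an assumption added only so the logic can be exercised outside Drupal. Inside Drupal 7 with term_merge enabled they default to the real term_merge and batch_process functions.

```php
<?php

/**
 * Merges duplicate terms into the lowest TID, then starts the batch.
 *
 * Hypothetical helper. $merge and $process are illustrative hooks that
 * default to the real term_merge() and batch_process() functions.
 */
function mymodule_merge_duplicates(array $tids, $merge = 'term_merge', $process = 'batch_process') {
  // Sort numerically so the lowest TID becomes the trunk.
  sort($tids, SORT_NUMERIC);
  $trunk = array_shift($tids);
  if ($trunk === NULL || empty($tids)) {
    // Fewer than two TIDs: nothing to merge.
    return FALSE;
  }

  // Same settings as in the question.
  $merge_settings = array(
    'term_branch_keep' => FALSE,
    'merge_fields' => array(),
    'keep_only_unique' => TRUE,
    'redirect' => -1,
    'synonyms' => array(),
  );

  // Hand the branches and the trunk to the merge function.
  call_user_func($merge, $tids, $trunk, $merge_settings);
  // Outside a form submit handler the batch has to be started by hand.
  call_user_func($process);
  return TRUE;
}
```

If time or memory limits are still hit, the 'step' setting described in the docblock above could be added to $merge_settings with a lower value.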