Drupal: Merging Taxonomy Terms with Massive Duplicates

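I have a database used for research purposes. Unfortunately, during this research an algorithm was allowed to run for too long, and it inadvertently created duplicate taxonomy terms instead of reusing the original TID of the first instance of each term.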

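To correct this, I tried the "term_merge" and "taxonomy_manager" modules. "term_merge" provides an interface for removing duplicates, and it lets you limit how many terms are loaded at once so that the server's memory limit is not exhausted. For my use case, however, I can't even load the configuration screen at /admin/structure/taxonomy/[My-Vocabulary]/merge, let alone the duplicates interface at /admin/structure/taxonomy/[My-Vocabulary]/merge/duplicates: both exhaust the memory limit, even with it set to 1024M.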

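To work around this, I wrote a custom module that calls the term_merge function from the term_merge module. Since only one node bundle in this project uses the vocabulary in question, I could safely write my own logic to merge the duplicate terms without relying on the term_merge module. I would still like to use it, though, because it was designed for exactly this purpose and should, in theory, make the process safer.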

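My module provides a page callback, along with logic that builds the list of TIDs belonging to duplicate taxonomy terms. This is the code that calls the term_merge function: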

// Use the first element, which has the lowest TID, as the "trunk"
// that all other terms will be merged into. array_shift() also removes
// it from the branch array, so the trunk is not merged into itself.
$trunk = array_shift($tids);

// Merge settings, similar to the default values given in
// _term_merge_batch_process() in term_merge.batch.inc.
$merge_settings = array(
  'term_branch_keep' => FALSE,
  'merge_fields' => array(),
  'keep_only_unique' => TRUE,
  'redirect' => -1,
  'synonyms' => array(),
);

term_merge($tids, $trunk, $merge_settings);
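The snippet above takes $tids as given and relies on it already being sorted ascending. As a sketch of how such lists might be built for a whole vocabulary, the helper below groups rows by term name and picks the lowest TID of each group as the trunk. The function name group_duplicate_terms() is hypothetical, and it assumes that "duplicate" means the same name after trimming whitespace; in a Drupal 7 site the rows would presumably come from the taxonomy_term_data table, but that query is left out so the logic can run on its own.

```php
<?php
/**
 * Group duplicate terms by name (hypothetical helper, not part of term_merge).
 *
 * @param array $rows
 *   List of array('tid' => ..., 'name' => ...) rows from one vocabulary.
 *
 * @return array
 *   Map of trunk tid => list of branch tids. The trunk is the lowest tid for
 *   each name; names that occur only once are omitted.
 */
function group_duplicate_terms(array $rows) {
  $by_name = array();
  foreach ($rows as $row) {
    // Assumption: duplicates share exactly the same name, ignoring
    // leading/trailing whitespace.
    $key = trim($row['name']);
    $by_name[$key][] = (int) $row['tid'];
  }

  $groups = array();
  foreach ($by_name as $tids) {
    if (count($tids) < 2) {
      continue; // Unique name: nothing to merge.
    }
    sort($tids, SORT_NUMERIC);
    // Lowest tid becomes the trunk; the remaining tids are branches.
    $trunk = array_shift($tids);
    $groups[$trunk] = $tids;
  }
  return $groups;
}
```

Sorting inside the helper means the "lowest TID is the trunk" rule holds even if the rows arrive unordered, so the caller no longer has to guarantee the order of $tids.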

This does not merge any terms, and it produces no errors or notices in watchdog or in the web server logs.

I have also tried calling term_merge once for each duplicate TID to be merged, instead of passing the whole array of TIDs.

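I would appreciate any input on how best to use the term_merge function programmatically, or on alternative approaches that would let me remove many duplicate terms from a large database in which some terms have thousands of duplicates.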

For reference, the following docblock, found in term_merge.module of the contributed term_merge module, describes the parameters that term_merge takes:

/**
 * Merge terms one into another using batch API.
 *
 * @param array $term_branch
 *   A single term tid or an array of term tids to be merged, aka term branches
 * @param int $term_trunk
 *   The tid of the term to merge term branches into, aka term trunk
 * @param array $merge_settings
 *   Array of settings that control how merging should happen. Currently
 *   supported settings are:
 *     - term_branch_keep: (bool) Whether the term branches should not be
 *       deleted, also known as "merge only occurrences" option
 *     - merge_fields: (array) Array of field names whose values should be
 *       merged into the values of corresponding fields of term trunk (until
 *       each field's cardinality limit is reached)
 *     - keep_only_unique: (bool) Whether after merging within one field only
 *       unique taxonomy term references should be kept in other entities. If
 *       before merging your entity had 2 values in its taxonomy term reference
 *       field and one was pointing to term branch while another was pointing to
 *       term trunk, after merging you will end up having your entity
 *       referencing to the same term trunk twice. If you pass TRUE in this
 *       parameter, only a single reference will be stored in your entity after
 *       merging
 *     - redirect: (int) HTTP code for redirect from $term_branch to
 *       $term_trunk, 0 stands for the default redirect defined in Redirect
 *       module. Use constant TERM_MERGE_NO_REDIRECT to denote not creating any
 *       HTTP redirect. Note: this parameter requires Redirect module enabled,
 *       otherwise it will be disregarded
 *     - synonyms: (array) Array of field names of trunk term into which branch
 *       terms should be added as synonyms (until each field's cardinality limit
 *       is reached). Note: this parameter requires Synonyms module enabled,
 *       otherwise it will be disregarded
 *     - step: (int) How many term branches to merge per script run in batch. If
 *       you are hitting time or memory limits, decrease this parameter
 */
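The keep_only_unique description above is easiest to follow on a concrete field. The function below is a plain-PHP illustration of the documented effect on one entity's term reference values; it is not term_merge's actual implementation, and the name remap_term_references() is hypothetical.

```php
<?php
/**
 * Illustrate the documented keep_only_unique behaviour (hypothetical helper).
 *
 * @param int[] $values    Tids stored in one entity's term reference field.
 * @param int[] $branches  Tids being merged away.
 * @param int   $trunk     Tid the branches are merged into.
 * @param bool  $keep_only_unique  Mirrors the merge setting of the same name.
 *
 * @return int[] The field values after merging.
 */
function remap_term_references(array $values, array $branches, $trunk, $keep_only_unique) {
  $out = array();
  foreach ($values as $tid) {
    // Any reference to a branch term now points at the trunk.
    $tid = in_array($tid, $branches, TRUE) ? $trunk : $tid;
    if ($keep_only_unique && in_array($tid, $out, TRUE)) {
      continue; // Drop the repeated reference to the same term.
    }
    $out[] = $tid;
  }
  return $out;
}
```

With a field holding a branch (12) and the trunk (3), merging without keep_only_unique leaves the trunk referenced twice, while passing TRUE keeps a single reference, matching the docblock's example.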

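It seems that term_merge was written to be called from a function that handles a form submission. It only sets up a batch: when that happens during a Form API submission, Drupal 7 starts the batch automatically, but when term_merge is called from a page callback like the one in my custom module, nothing starts it, so batch_process was never invoked.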

Explicitly calling the following after term_merge fixed the problem:

batch_process();

No arguments need to be passed to the function.
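Applied to many groups of duplicates, the fix amounts to queuing every merge first and then starting the batch once. The sketch below assumes that term_merge() queues its work with batch_set() (as the "using batch API" note in its docblock suggests) and that repeated calls add further batch sets to the same batch. The function name merge_duplicate_groups() and the 'step' value of 40 are hypothetical; term_merge and batch_process are passed in as callables only so the control flow can be exercised outside Drupal.

```php
<?php
/**
 * Queue one merge per duplicate group, then start the batch (hypothetical).
 *
 * @param array    $groups   Map of trunk tid => list of branch tids.
 * @param callable $merge    Expected to behave like term_merge(), e.g. 'term_merge'.
 * @param callable $process  Expected to behave like batch_process(), e.g. 'batch_process'.
 *
 * @return int Number of merges queued.
 */
function merge_duplicate_groups(array $groups, callable $merge, callable $process) {
  $merge_settings = array(
    'term_branch_keep' => FALSE,
    'merge_fields' => array(),
    'keep_only_unique' => TRUE,
    'redirect' => -1, // Same value as the question's code (no redirect).
    'synonyms' => array(),
    'step' => 40, // Assumed value; lower it if batch requests hit limits.
  );

  $queued = 0;
  foreach ($groups as $trunk => $branches) {
    if (empty($branches)) {
      continue; // Nothing to merge into this trunk.
    }
    // Each call only queues a batch set; no terms are merged yet.
    $merge($branches, $trunk, $merge_settings);
    $queued++;
  }

  if ($queued > 0) {
    // Outside a form submission handler, the batch is not started for us.
    $process();
  }
  return $queued;
}
```

In a real page callback this would be invoked as merge_duplicate_groups($groups, 'term_merge', 'batch_process'). Note that in a normal (progressive) request, batch_process() redirects to the batch page, so any code after it, including the return statement, is not reached; the return value is only useful when the callables are stubbed, as in a test.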