Filtering multidimensional arrays by non-equal values to exclude duplicated records
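Below are two arrays from two different feeds, which use different IDs for the same record. So I have to rely on 'BriefTitle': from 'BriefTitle' and the other data (e.g. [LocationCountry], [StartDate], [Condition]) I can tell that this is the same record. I want to compare a substr of 'BriefTitle' against the other 'BriefTitle' records to filter out duplicates, since one title contains the other. I'm not looking for an exact match, which is what most of the solutions I've found here do.
I like the short solution proposed by sevavietl/mickmackusa: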
$result = array_reverse(array_values(array_column(
    array_reverse($data),
    null,
    'BriefTitle'
)));
However, my 'BriefTitle' is an array (which doesn't seem to work with array_column), and I'm not sure how to apply the substr function to the solution above.
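For reference, here is a minimal sketch of what that one-liner does when 'BriefTitle' is a plain string rather than an array. The sample rows are made up for illustration. array_column() with a null column key re-indexes the rows by 'BriefTitle', so rows with an identical title overwrite each other; reversing the input first means the earliest row is the one that survives. It only catches exact matches, which is why it does not cover the substring case described here.

```php
<?php
// Illustrative rows only (not from the feeds); 'BriefTitle' is a scalar here.
$data = [
    ['id' => 1, 'BriefTitle' => 'Alpha'],
    ['id' => 2, 'BriefTitle' => 'Beta'],
    ['id' => 3, 'BriefTitle' => 'Alpha'], // exact duplicate of row 1
];

// Re-index by title on the reversed input: later writes win, so after the
// reversal the earliest row for each title is the one that is kept.
$result = array_reverse(array_values(array_column(
    array_reverse($data),
    null,
    'BriefTitle'
)));

print_r($result); // two rows remain: ids 1 and 2 (id 3 is dropped)
```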
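A few quick notes:
- Fortunately, [BriefTitle][0] is always the value to compare
- If possible, I'd like to keep only the first instance in the data set and reject any subsequent duplicates.
Any ideas on how I should approach this? The arrays: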
[0] => Array
    (
        [Rank] => 422
        [id] => Array
            (
                [0] => 152091
            )
        [Condition] => Array
            (
                [0] => Depression
                [1] => Ketamine
            )
        [BriefTitle] => Array
            (
                [0] => Positron Emission Tomography Assessment of Ketamine Binding of the Serotonin Transporter
            )
        [LocationCountry] => Array
            (
                [0] => Austria
            )
        [StartDate] => Array
            (
                [0] => May 5, 2016
            )
        [LastUpdatePostDate] => Array
            (
                [0] => October 15, 2018
            )
        [Entheogen] => ketamine
        [Source] => clinicaltrials.gov
    )
[1] => Array
    (
        [Rank] => 6673
        [id] => Array
            (
                [0] => YSBSZ18291
            )
        [Condition] => Array
            (
                [0] => Depressive Disorder
                [1] => Ketamine
            )
        [BriefTitle] => Array
            (
                [0] => Positron Emission Tomography assessment of Ketamine Binding of the Serotonin Transporter and its Relevance for Rapid Antidepressant Response
                [1] => Die Rolle des Serotonintransporters bei der akuten antidepressiven Wirkung von Ketamin, untersucht mit Positronen-Emissions-Tomographie
            )
        [LocationCountry] => Array
            (
                [0] => Austria
            )
        [StartDate] => Array
            (
                [0] => 2016 05 01
            )
        [LastUpdatePostDate] => Array
            (
                [0] => 2018 10 15
            )
        [Entheogen] => ketamine
        [Source] => clinicaltrialsregister.eu
    )
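Unfortunately, because of the nature of your data (a matching string may be a substring of the other string, with different casing), the only real option is to brute-force this. Loop over the array, storing the titles as you go and checking whether the current title matches any of them: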
$result = array();
$brieftitles = array(); // titles of the records kept so far
foreach ($array as $arr) {
    $foundtitle = false;
    $title = $arr['BriefTitle'][0];
    foreach ($brieftitles as $btitle) {
        // Case-insensitive containment check in both directions
        $foundtitle = (stripos($title, $btitle) !== false) || (stripos($btitle, $title) !== false);
        if ($foundtitle) break;
    }
    // Keep only the first record for each title; later matches are skipped
    if (!$foundtitle) {
        $result[] = $arr;
        $brieftitles[] = $title;
    }
}
print_r($result);
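To try the loop above in isolation, it can be wrapped in a function and run against a cut-down copy of the two records from the question. This is only a sketch: the function name `dedupeByContainedTitle`, the third record (id `EXAMPLE-0001`), and the choice to keep records with an empty title are assumptions made for illustration, not part of the original data or answer.

```php
<?php
// Hypothetical helper: keeps the first record of each group whose
// BriefTitle[0] contains, or is contained in, an already kept title.
function dedupeByContainedTitle(array $records): array
{
    $kept = [];
    $seenTitles = [];
    foreach ($records as $record) {
        $title = $record['BriefTitle'][0] ?? '';
        if ($title === '') {
            // Assumption: records without a title cannot be compared, so keep them.
            $kept[] = $record;
            continue;
        }
        $isDuplicate = false;
        foreach ($seenTitles as $seen) {
            // Case-insensitive containment check in both directions.
            if (stripos($title, $seen) !== false || stripos($seen, $title) !== false) {
                $isDuplicate = true;
                break;
            }
        }
        if (!$isDuplicate) {
            $kept[] = $record;
            $seenTitles[] = $title;
        }
    }
    return $kept;
}

$records = [
    ['id' => ['152091'], 'Source' => 'clinicaltrials.gov', 'BriefTitle' => [
        'Positron Emission Tomography Assessment of Ketamine Binding of the Serotonin Transporter',
    ]],
    ['id' => ['YSBSZ18291'], 'Source' => 'clinicaltrialsregister.eu', 'BriefTitle' => [
        'Positron Emission Tomography assessment of Ketamine Binding of the Serotonin Transporter and its Relevance for Rapid Antidepressant Response',
        'Die Rolle des Serotonintransporters bei der akuten antidepressiven Wirkung von Ketamin, untersucht mit Positronen-Emissions-Tomographie',
    ]],
    // Made-up record, added only to show that unrelated titles survive.
    ['id' => ['EXAMPLE-0001'], 'Source' => 'example', 'BriefTitle' => [
        'Psilocybin for Treatment-Resistant Depression',
    ]],
];

$unique = dedupeByContainedTitle($records);
echo implode(', ', array_map(fn($r) => $r['id'][0], $unique)), PHP_EOL;
// prints: 152091, EXAMPLE-0001
```

Each record is compared against every title kept so far, so the cost grows roughly quadratically with the number of records. Very short titles will also match many longer ones, so combining the check with a field such as [LocationCountry] may be worth considering if false positives show up.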