PHP mb_strpos fails for Greek strings

Question

I have a file browser, and I am trying to find which file names contain a given query. The code is as follows:

$query = isset($_POST['s']) ? mb_strtolower($_POST['s'], 'UTF-8') : '';
$res = opendir($dir);
while (false !== ($file = readdir($res))) {
    if (mb_strpos(mb_strtolower($file, 'UTF-8'), mb_strtolower($query, 'UTF-8'), 0, 'UTF-8') !== false) {
        echo $file;
    }
}

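This works fine for English words, but when the text is Greek the results are not what I expect: it works for some Greek words but not all of them. Could anyone help me with this?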

Answer 1

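The glyphs may render the same or similarly, but they are represented differently. For example:

ά is represented here as Unicode Character 'GREEK SMALL LETTER ALPHA WITH TONOS' (U+03AC)
ά is represented here as Unicode Character 'GREEK SMALL LETTER ALPHA' (U+03B1) followed by Unicode Character 'COMBINING ACUTE ACCENT' (U+0301)

These were copied directly.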
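To make the difference concrete, here is a minimal sketch that builds both representations from their code points and compares them. It assumes PHP 7.0 or later (for the `\u{...}` escape syntax) and the mbstring extension; the variable names are illustrative only.

```php
<?php
// Precomposed: GREEK SMALL LETTER ALPHA WITH TONOS (U+03AC)
$precomposed = "\u{03AC}";
// Decomposed: GREEK SMALL LETTER ALPHA (U+03B1) + COMBINING ACUTE ACCENT (U+0301)
$decomposed = "\u{03B1}\u{0301}";

var_dump(bin2hex($precomposed));            // string(4) "ceac"
var_dump(bin2hex($decomposed));             // string(8) "ceb1cc81"
var_dump(mb_strlen($precomposed, 'UTF-8')); // int(1)
var_dump(mb_strlen($decomposed, 'UTF-8'));  // int(2)

// The two strings look alike on screen, yet neither contains the other.
var_dump(mb_strpos($decomposed, $precomposed, 0, 'UTF-8')); // bool(false)
```

Because mb_strpos compares code points, a query typed in one form will not match a file name stored in the other.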

To compare them, you should first call normalizer_normalize() on both strings to obtain them in their normalized forms. Which normalization form you use is ultimately up to you. There are four:

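NFD (Canonical Decomposition)
NFC (Canonical Decomposition, followed by Canonical Composition)
NFKD (Compatibility Decomposition)
NFKC (Compatibility Decomposition, followed by Canonical Composition)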
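As a rough illustration of the four forms, the sketch below normalizes the precomposed ά (U+03AC) with each one and prints the resulting bytes in hex. It assumes the intl extension is loaded and PHP 7.0 or later; the `$forms` and `$results` arrays are illustrative names.

```php
<?php
// Requires the intl extension (assumed to be installed).
$forms = [
    'NFD'  => Normalizer::FORM_D,
    'NFC'  => Normalizer::FORM_C,
    'NFKD' => Normalizer::FORM_KD,
    'NFKC' => Normalizer::FORM_KC,
];

$input = "\u{03AC}"; // GREEK SMALL LETTER ALPHA WITH TONOS

$results = [];
foreach ($forms as $name => $form) {
    $results[$name] = bin2hex(normalizer_normalize($input, $form));
    echo $name, ': ', $results[$name], "\n";
}
// NFD:  ceb1cc81  (alpha + combining acute accent)
// NFC:  ceac      (recomposed into one code point)
// NFKD: ceb1cc81
// NFKC: ceac
```

For this character the canonical and compatibility variants agree; they differ only for characters that have a compatibility mapping, such as ligatures.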

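Because this normalization is used purely internally, you can ignore NFC and NFKC: there is no need to recompose. That leaves a choice between NFD and NFKD, that is, canonical or compatibility decomposition. The names give you a clue as to how strict each one is about equivalence.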

1.1 Canonical and Compatibility Equivalence:

Canonical equivalence is a fundamental equivalency between characters or sequences of characters that represent the same abstract character, and when correctly displayed should always have the same visual appearance and behavior.

Compatibility equivalence is a weaker equivalence between characters or sequences of characters that represent the same abstract character, but may have a different visual appearance or behavior.

For searching, I would go with the latter.
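A small sketch of what the weaker equivalence buys you in a search, assuming the intl and mbstring extensions and PHP 7.0 or later. The file name is a made-up example that starts with the ligature ﬁ (U+FB01), which has a compatibility decomposition but no canonical one.

```php
<?php
$name  = "\u{FB01}le.txt"; // "ﬁle.txt", beginning with LATIN SMALL LIGATURE FI
$query = "fi";

// Canonical decomposition leaves the ligature untouched, so "fi" is not found.
$nfd = mb_strpos(
    normalizer_normalize($name, Normalizer::FORM_D),
    normalizer_normalize($query, Normalizer::FORM_D),
    0,
    'UTF-8'
);

// Compatibility decomposition expands the ligature to "f" + "i".
$nfkd = mb_strpos(
    normalizer_normalize($name, Normalizer::FORM_KD),
    normalizer_normalize($query, Normalizer::FORM_KD),
    0,
    'UTF-8'
);

var_dump($nfd);  // bool(false)
var_dump($nfkd); // int(0)
```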

Example:

$foo = "παράρτημα";
$bar = "παράρτημα";
var_dump($foo === $bar);
var_dump(
    normalizer_normalize($foo, Normalizer::FORM_KD) ===
    normalizer_normalize($bar, Normalizer::FORM_KD)
);

Output:

bool(false)
bool(true)
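Applied to the file search from the question, the idea could look something like the sketch below. The helper name `containsNormalized` is hypothetical, the file names are an in-memory stand-in for the readdir() loop, and treating an empty query as a match is an assumption rather than anything stated above. It needs the intl and mbstring extensions and PHP 7.0 or later.

```php
<?php
// Hypothetical helper: true when $needle occurs in $haystack, ignoring case
// and differences between precomposed and decomposed characters.
function containsNormalized(string $haystack, string $needle): bool
{
    if ($needle === '') {
        return true; // assumption: an empty query matches every name
    }

    $fold = static function (string $s): string {
        $normalized = normalizer_normalize($s, Normalizer::FORM_KD);
        if (!is_string($normalized)) {
            $normalized = $s; // fall back to the raw string if normalization fails
        }
        return mb_strtolower($normalized, 'UTF-8');
    };

    return mb_strpos($fold($haystack), $fold($needle), 0, 'UTF-8') !== false;
}

// File name stored with the precomposed ά, query typed with the decomposed form.
$files = ["παρ\u{03AC}ρτημα.txt", "notes.txt"];
$query = "παρα\u{0301}ρτημα";

foreach ($files as $file) {
    if (containsNormalized($file, $query)) {
        echo $file, "\n";
    }
}
```

In the original loop, the same call would replace the mb_strpos() condition, with `$file` coming from readdir().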
