Generate a permalink for a blog post with a Hindi title in PHP

I have a form that takes the following input from the user:

I convert the blog title to lowercase, replace spaces with dashes (-), and store the result as the permalink used to access the blog post.
Below is the code that handles this:

setlocale(LC_ALL, 'en_US.UTF8');

function toAscii($str, $replace = array(), $delimiter = '-') {
    // Turn any caller-supplied characters into spaces first.
    if (!empty($replace)) {
        $str = str_replace((array)$replace, ' ', $str);
    }
    // Transliterate to ASCII, then drop everything that is not slug-safe.
    $clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);
    $clean = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $clean);
    $clean = strtolower(trim($clean, '-'));
    // Collapse runs of separators into a single delimiter.
    $clean = preg_replace("/[\/_|+ -]+/", $delimiter, $clean);
    return $clean;
}

$prmlkn = toAscii($blog_headline, array(), '-');

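This code works fine as long as the blog headline is in English. But if the user enters Hindi, the permalink I get is just -, which means the Hindi POST value is not being recognized.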

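This happens because Hindi is written in Devanagari, whose characters lie outside the ASCII range in UTF-8, and you are converting to ASCII, which only provides basic Latin characters. So: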

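$str = "नमस्ते";
// No ASCII equivalent exists: depending on the iconv implementation, $clean is
// an empty string or a run of "?" that the next preg_replace strips anyway.
$clean = iconv('UTF-8', 'ASCII//TRANSLIT', $str);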
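
To see where the lone - comes from, here is a minimal sketch that runs only the two preg_replace steps from the question on a two-word Hindi title. It skips the iconv call on purpose, since its output for Devanagari varies between implementations; the sample title and variable names are assumptions for illustration.

```php
<?php
// Hypothetical two-word Hindi headline, used only for illustration.
$title = "नमस्ते दुनिया";

// Same whitelist as in the question: every byte of a Devanagari character
// is >= 0x80, so all of them are removed and only the space survives.
$asciiOnly = preg_replace("/[^a-zA-Z0-9\/_|+ -]/", '', $title);

// trim() only strips '-' here, so the space is kept and then turned
// into the delimiter by the final replacement.
$slug = preg_replace("/[\/_|+ -]+/", '-', strtolower(trim($asciiOnly, '-')));

var_dump($asciiOnly); // string(1) " "
var_dump($slug);      // string(1) "-"
```

A single-word Hindi title would yield an empty string rather than -, for the same reason.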

According to RFC 3986:

  2. Characters

...

The ABNF notation defines its terminal values to be non-negative
integers (codepoints) based on the US-ASCII coded character set
[ASCII]. Because a URI is a sequence of characters, we must invert
that relation in order to understand the URI syntax. Therefore, the
integer values used by the ABNF must be mapped back to their
corresponding characters via US-ASCII in order to complete the
syntax rules.

A URI is composed from a limited set of characters consisting of
digits, letters, and a few graphic symbols. A reserved subset of
those characters may be used to delimit syntax components within a
URI while the remaining characters, including both the unreserved
set and those reserved characters not acting as delimiters, define
each component's identifying data.

Your best option is to use urlencode(), but be aware that this can produce very ugly and long permalinks:

$str = "नमस्ते hello";
$clean = urlencode("$str");
printf("%s",$clean);

This produces a valid but ugly result:

%E0%A4%A8%E0%A4%AE%E0%A4%B8%E0%A5%8D%E0%A4%A4%E0%A5%87+hello
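
If the slug should stay readable in the database and in browsers that display decoded URLs, one possible variation is to keep the Unicode letters in the stored permalink and percent-encode only when writing the URL. The sketch below is not from the original answer: toUnicodeSlug is a hypothetical helper, and it assumes the mbstring extension is loaded and that the input is valid UTF-8.

```php
<?php
// Hypothetical helper (an assumption, not the original toAscii): builds a
// slug that keeps Unicode letters, combining marks and digits.
function toUnicodeSlug(string $str, string $delimiter = '-'): string
{
    // Multibyte-safe lowercasing; Devanagari has no case, so it is unchanged.
    $clean = mb_strtolower($str, 'UTF-8');
    // Replace each run of characters that is not a letter (\p{L}), a mark
    // (\p{M}, needed for Devanagari vowel signs) or a digit (\p{N}).
    // preg_replace returns null on invalid UTF-8, hence the fallback to ''.
    $clean = preg_replace('/[^\p{L}\p{M}\p{N}]+/u', $delimiter, $clean) ?? '';
    // Remove delimiters left over at either end.
    return trim($clean, $delimiter);
}

$slug = toUnicodeSlug("नमस्ते Hello World");
echo $slug, "\n";               // नमस्ते-hello-world
echo rawurlencode($slug), "\n"; // percent-encoded form for use in a URL path
```

The stored value stays human-readable, and rawurlencode() yields the ASCII-only form that RFC 3986 requires on the wire; the encoded URL is just as long as the urlencode() output above, so this only helps where the decoded form is shown.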