PHP: first word only from an mb (multibyte) string

Question

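I used preg_match, but it returns pdf, presumably because that is the only part of the string written in English.

But I want only 練馬春日町Ⅳ

Is there any way to detect words in an mb string?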

<?php 
// Initialize a sentence to a variable 
$sentence = '練馬春日町Ⅳ　清掃レポート.pdf'; 

// Use preg_match() function to get the 
// first word of a string 
preg_match('/\b\w+\b/i', $sentence, $result);  

// Display result 
echo "The first word of string is: ".$result[0]; 

?>

FIDDLE

Answer 1

To make your code work, you just need to add the u flag to the regex so that it matches Unicode characters:

preg_match('/^\w+/iu', $sentence, $result);  
echo "\nThe first word of string is: ".$result[0];

Output:

The first word of string is: 練馬春日町Ⅳ

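Note that since you want the first word, you can simply anchor your regex with ^. The second \b is not needed, because \w+ matches as many word characters as it can, i.e. until it reaches the first word break.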
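To see the effect of the u flag in isolation, the following sketch runs the pattern with and without it. It assumes the file is saved as UTF-8 and that PHP is running with its default locale; the variable names $ascii and $unicode are illustrative only.

```php
<?php
// Assumes this file is saved as UTF-8.
$sentence = '練馬春日町Ⅳ　清掃レポート.pdf';

// Without the u flag (and with the default locale), \w only matches
// ASCII word characters, so the first match is found near the end.
preg_match('/\w+/', $sentence, $ascii);

// With the u flag the subject is treated as UTF-8 and \w also
// matches Unicode letters and numbers, so the match starts at ^.
preg_match('/^\w+/u', $sentence, $unicode);

echo $ascii[0], "\n";   // pdf
echo $unicode[0], "\n"; // 練馬春日町Ⅳ
```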

Alternatively, you can use mb_split with a regex of \p{Z}, which matches any kind of Unicode whitespace or invisible separator:

$sentence = '練馬春日町Ⅳ　清掃レポート.pdf'; 
$first_word = mb_split('\p{Z}', $sentence);
echo $first_word[0];

Output:

練馬春日町Ⅳ

Demo on 3v4l.org
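Since only the first word is needed, the split can stop after the first separator. The sketch below is one possible variation, not part of the original answer: it wraps the idea in a hypothetical helper, first_word(), and uses preg_split with the u flag and a limit of 2, assuming UTF-8 input.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumes $text is valid UTF-8.
function first_word(string $text): string
{
    // \p{Z}+ matches one or more Unicode separators, including the
    // ideographic space U+3000. The limit of 2 stops splitting after
    // the first word; PREG_SPLIT_NO_EMPTY skips the empty piece that
    // leading separators would otherwise produce.
    $parts = preg_split('/\p{Z}+/u', $text, 2, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure (e.g. invalid UTF-8).
    if ($parts === false || $parts === []) {
        return '';
    }
    return $parts[0];
}

echo first_word('練馬春日町Ⅳ　清掃レポート.pdf'), "\n"; // 練馬春日町Ⅳ
```

Note that \p{Z} does not cover tabs or newlines, so a different character class would be needed if those can separate words in your input.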


php

preg-match

mbstring

preg-split