PHP substr breaks emoji

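I only need to extract 30 characters from a user-submitted paragraph. If the 30th character is an emoji, the output shows a question mark instead. How can I avoid breaking the emoji?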

echo substr("Hello world Hello world Hell😄 ", 0, 30);

Output: Hello world Hello world Hell

Also, when I use json_encode to return the output, the output is blank.

$myvariable = array();
$myvariable['hello'] = substr("Hello world Hello world Hell😄 ", 0, 30);
echo json_encode($myvariable);
 <meta charset="ISO-8859-1"> 

OR

function entities( $string ) {
    $stringBuilder = "";
    $offset = 0;

    if ( empty( $string ) ) {
        return "";
    }

    while ( $offset >= 0 ) {
        $decValue = ordutf8( $string, $offset );
        $char = unichr($decValue);

        $htmlEntited = htmlentities( $char );
        if( $char != $htmlEntited ){
            $stringBuilder .= $htmlEntited;
        } elseif( $decValue >= 128 ){
            $stringBuilder .= "&#" . $decValue . ";";
        } else {
            $stringBuilder .= $char;
        }
    }

    return $stringBuilder;
}

// source - http://php.net/manual/en/function.ord.php#109812
function ordutf8($string, &$offset) {
    $code = ord(substr($string, $offset,1));
    if ($code >= 128) {        //otherwise 0xxxxxxx
        if ($code < 224) $bytesnumber = 2;                //110xxxxx
        else if ($code < 240) $bytesnumber = 3;        //1110xxxx
        else if ($code < 248) $bytesnumber = 4;    //11110xxx
        $codetemp = $code - 192 - ($bytesnumber > 2 ? 32 : 0) - ($bytesnumber > 3 ? 16 : 0);
        for ($i = 2; $i <= $bytesnumber; $i++) {
            $offset ++;
            $code2 = ord(substr($string, $offset, 1)) - 128;        //10xxxxxx
            $codetemp = $codetemp*64 + $code2;
        }
        $code = $codetemp;
    }
    $offset += 1;
    if ($offset >= strlen($string)) $offset = -1;
    return $code;
}

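// source - http://php.net/manual/en/function.chr.php#88611
// Note: the 'HTML-ENTITIES' encoding is deprecated as of PHP 8.2;
// on PHP 7.2+ mb_chr($u, 'UTF-8') returns the same character.
function unichr($u) {
    return mb_convert_encoding('&#' . intval($u) . ';', 'UTF-8', 'HTML-ENTITIES');
}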

/* ---- */

var_dump( entities( "&" ) ) . "\n";
var_dump( entities( "<" ) ) . "\n";
var_dump( entities( "" ) ) . "\n";
var_dump( entities( "☚" ) ) . "\n";
var_dump( entities( "" ) ) . "\n";
var_dump( entities( "A" ) ) . "\n";
var_dump( entities( "Hello  world" ) ) . "\n";
var_dump( entities( "this & that " ) ) . "\n";
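// Input inferred from the output shown below.
$string = 'Français';
$first = preg_replace_callback('/[\x{80}-\x{10FFFF}]/u', function ($m) {
    $char = current($m);
    // UCS-4 gives the code point as 4 bytes; strip the leading zeros from the hex form.
    $utf = iconv('UTF-8', 'UCS-4', $char);
    return sprintf("&#x%s;", ltrim(strtoupper(bin2hex($utf)), "0"));
}, $string);
var_dump($first);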

Output

string 'Fran&#xE7;ais' (length=13)

echo json_decode('"\uD83D\uDE00"');
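
The one-liner above relies on json_decode() turning a JSON surrogate-pair escape into a single UTF-8 character. A minimal sketch of what that produces, assuming PHP 7+ (for the "\u{...}" string escape) and the mbstring extension; the variable name $emoji is only for illustration:

```php
<?php
// JSON escapes characters outside the Basic Multilingual Plane as a UTF-16
// surrogate pair; json_decode() combines the pair into one UTF-8 code point.
$emoji = json_decode('"\uD83D\uDE00"');

var_dump($emoji === "\u{1F600}");     // bool(true): same character as U+1F600
var_dump(strlen($emoji));             // int(4): four bytes in UTF-8
var_dump(mb_strlen($emoji, 'UTF-8')); // int(1): one character
echo bin2hex($emoji), "\n";           // f09f9880
```

The gap between the byte count and the character count is why a byte-based cut can land in the middle of an emoji.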

I think the simplest solution is to use mb_substr:

Performs a multi-byte safe substr() operation based on number of characters.

php > $myvariable = array();
php > $myvariable['hello'] = mb_substr("Hello world Hello world Hell😄 ", 0, 30);
php > var_dump($myvariable);
array(1) {
  ["hello"]=>
  string(33) "Hello world Hello world Hell😄 "
}
php > echo json_encode($myvariable);
{"hello":"Hello world Hello world Hell\ud83d\ude04 "}
php >
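
To tie this back to the blank json_encode output in the question: substr() counts bytes, so it can leave half of a 4-byte emoji behind, and json_encode() returns false for a string that is not valid UTF-8. The sketch below assumes the mbstring extension and writes the emoji as the escape "\u{1F604}", inferred from the \ud83d\ude04 in the JSON output above. truncate_chars is a hypothetical helper name, not something from the thread.

```php
<?php
// Hypothetical helper: truncate to $limit characters (code points)
// rather than bytes, so a multi-byte sequence is never cut in half.
function truncate_chars(string $text, int $limit): string
{
    return mb_substr($text, 0, $limit, 'UTF-8');
}

$input = "Hello world Hello world Hell\u{1F604} "; // 30 characters, 33 bytes

$byBytes = substr($input, 0, 30);      // keeps only 2 of the emoji's 4 bytes
$byChars = truncate_chars($input, 30); // keeps the emoji whole

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false): malformed UTF-8
var_dump(json_encode(['hello' => $byBytes]));   // bool(false): echoes as an empty string
echo json_last_error_msg(), "\n";               // reports the UTF-8 problem
echo json_encode(['hello' => $byChars]), "\n";  // {"hello":"Hello world Hello world Hell\ud83d\ude04 "}
```

Note that mb_substr counts code points, so an emoji built from several code points (for example one with a skin-tone modifier) could still be split at the boundary.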