PHP conversion from utf8_general_ci to latin1_swedish_ci

Question

I receive a large amount of data from a website, and all of these string values need to be added to our database.

During the insert, SQL sometimes throws the following error:

Warning:  PDOStatement::execute(): SQLSTATE[HY000]: General error: 1267 Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE)

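The database table is in fact set up to use Latin1.

After encoding my values with json_encode(), I found the cause of this error. UTF sequences inside the strings, which represent certain special characters, need to be converted to their actual values: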

Encoded string: candidate\u00e2\u0080\u0099s individual circumstances

The sequence \u00e2\u0080\u0099 stands for ' in this example.
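One plausible explanation for this sequence, offered as an assumption and not something the question confirms, is double encoding: the right single quotation mark U+2019 is the three UTF-8 bytes E2 80 99, and if each of those bytes is read as a Latin-1 character and encoded to UTF-8 again, json_encode() shows exactly \u00e2\u0080\u0099. The sketch below reproduces that; it assumes PHP 7.0+ (for the "\u{...}" escape) and the mbstring extension, and the variable names are made up for illustration.

```php
<?php
// U+2019 (right single quotation mark) as UTF-8: bytes E2 80 99.
$apostrophe = "\u{2019}";

// Simulate the suspected mistake: treat each byte as a Latin-1 character
// and encode it to UTF-8 a second time.
$doubleEncoded = mb_convert_encoding($apostrophe, 'UTF-8', 'ISO-8859-1');

// json_encode() escapes every non-ASCII character as \uXXXX by default.
$json = json_encode('candidate' . $doubleEncoded . 's');
echo $json, PHP_EOL;
```

If this is what happened upstream, the string holds the three characters U+00E2, U+0080 and U+0099, not a literal backslash followed by "u00e2".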

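In any case, there are only a few different sequences, and I know the values I want/need to replace them with, but I am struggling with the conversion.

I tried several approaches, but none of them worked.

Using str_replace: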

str_replace('\u00e2\u0080\u0099', '\'', ($string));

This did not change anything in the string.

Using the mb_ functions:
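A likely reason for this result: PHP does not interpret \u inside single quotes, so the search string is the literal text backslash, "u", "0", "0", "e", "2", and so on. That text only exists in the json_encode() output, not in $string itself. The following is a minimal sketch, assuming PHP 7.0+ and assuming the string really contains the characters U+00E2, U+0080 and U+0099; the sample value is made up.

```php
<?php
// Sample value containing the three characters U+00E2, U+0080, U+0099.
$string = "candidate\u{00e2}\u{0080}\u{0099}s";

// Single quotes: searches for a literal backslash sequence, which is not present.
$unchanged = str_replace('\u00e2\u0080\u0099', "'", $string);

// Double quotes with \u{...}: searches for the actual characters.
$fixed = str_replace("\u{00e2}\u{0080}\u{0099}", "'", $string);

var_dump($unchanged === $string); // the first call left the string untouched
echo $fixed, PHP_EOL;
```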

$encodedStr = mb_convert_encoding($string, 'ASCII')

This left me with some cryptic ?? in place of the UTF sequences. It no longer triggers the database error, but it is still not what I need.

Using preg_replace:

preg_replace('/\u00e2\u0080\u0099/', '\'', $string)

This throws the error: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
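PCRE writes code points as \x{hhhh} instead of \u, and matching them as characters needs the u (UTF-8) pattern modifier. A hedged sketch of the same replacement in that form, again assuming the string holds the characters U+00E2, U+0080 and U+0099 and that the script runs on PHP 7.0+:

```php
<?php
// Sample value containing the three characters U+00E2, U+0080, U+0099.
$string = "candidate\u{00e2}\u{0080}\u{0099}s";

// \x{...} names a code point; the trailing "u" makes PCRE treat the
// pattern and the subject as UTF-8.
$fixed = preg_replace('/\x{00e2}\x{0080}\x{0099}/u', "'", $string);

echo $fixed, PHP_EOL;
```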

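I have tried more options, but by the time I started brute-forcing the problem I was down to these three, and I cannot figure out why these functions, especially str_replace, do not work as expected.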

Answer 1

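I finally solved the problem, and I am posting this in case anyone runs into the same issue. The solution that worked for me was posted in: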

I have a string with "\u00a0", and I need to replace it with "" str_replace fails

private function convert($string) {
    /* Sequences to handle (as shown by json_encode()):
     *      \u00a0             = non-breaking space
     *      \u00e2\u0080\u0099 = '
     */
    $string = str_replace(chr(194).chr(160), '', $string);  // removes \u00a0 (UTF-8 bytes C2 A0)
    $string = str_replace('â', '', $string);  // removes \u00e2 (assumes this file is saved as UTF-8)
    $string = str_replace(chr(194).chr(128).chr(194).chr(153), '\'', $string);  // replaces \u0080\u0099 with '

    return $string;
}
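As an alternative to listing each byte sequence by hand, one could try to undo the suspected double encoding in a single step. This is only a sketch under the assumption that the whole string was double-encoded; repairDoubleEncoded is a hypothetical helper, not part of the answer above, and it needs PHP 7.0+ with mbstring.

```php
<?php
// Hypothetical helper: undo one layer of UTF-8 encoding if the result is
// still valid UTF-8, then swap the typographic apostrophe for a plain one.
function repairDoubleEncoded(string $s): string
{
    // Map each character U+0000..U+00FF back to a single byte.
    $once = mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8');

    // If the bytes are not valid UTF-8, the input was probably not
    // double-encoded, so return it unchanged.
    if (!mb_check_encoding($once, 'UTF-8')) {
        return $s;
    }

    // U+2019 has no ISO-8859-1 equivalent; use a plain apostrophe.
    return str_replace("\u{2019}", "'", $once);
}

echo repairDoubleEncoded("candidate\u{00e2}\u{0080}\u{0099}s"), PHP_EOL;
```

The validity check is a heuristic: a string that mixes double-encoded and correctly encoded characters is returned unchanged, and a correctly encoded string containing characters above U+00FF may be altered, so the explicit replacements above remain the safer choice for such data.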

php

string

preg-replace

str-replace

mb-convert-encoding