Can't get UTF-8 Special Chars to Correctly Write to MySQL (PHP)
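I'm running a PHP script that I created from the command line on a typical LAMP stack (L = OS X), and I'm having a lot of trouble getting special characters to record correctly in the database.
The script recursively scans a directory and inserts the full paths into a MySQL database table. I've done a lot of research on how to get special characters written to MySQL, but they still show up as ? characters.
Here's the code: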
<?PHP
ini_set('default_charset', 'UTF-8');
$link = mysql_connect('localhost', '--USER--', '--PASSWORD--');
mysql_set_charset('utf8',$link);
if (!$link) {
    die('Could not connect: ' . mysql_error());
}
if(!mysql_select_db("files")) {
    die('Could not connect: ' . mysql_error());
}
mysql_query("SET NAMES utf8");
mysql_query("SET CHARACTER SET utf8");

function startsWith($haystack, $needle) {
    return $needle === "" || strrpos($haystack, $needle, -strlen($haystack)) !== FALSE;
}

function getDirContents($dir, &$results = array()) {
    $files = scandir($dir);
    foreach($files as $key => $value) {
        $path = realpath($dir.DIRECTORY_SEPARATOR.$value);
        if(startsWith($path,'/Volumes/Macintosh HD/')) {
            unset($files[$key]);
        } else if(!is_dir($path) && !startsWith($value,'.') && startsWith($path,'/Volumes/')) {
            $results[] = $path;
            $query="INSERT IGNORE INTO files (path,dir) VALUES ('$path','0')";
            mysql_query($query);
        } else if(is_dir($path) && !startsWith($value,'.') && startsWith($path,'/Volumes/')) {
            getDirContents($path, $results);
            $results[] = $path;
            $query="INSERT IGNORE INTO files (path,dir) VALUES ('$path','1')";
            mysql_query($query);
        }
    }
    return $results;
}

$directory='/Volumes';
$files=getDirContents($directory);
sort($files);
print_r($files);
?>
The problematic path is:
/Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Jürgen.dvdproj/Contents/PkgInfo
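Note the umlaut in Jürgen. When the script prints all the files in the array, the ü displays correctly.
If I add a line to the PHP script to print the query passed to mysql_query(), the following is output: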
INSERT IGNORE INTO files (path,dir) VALUES ('/Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Jürgen.dvdproj/Contents/PkgInfo','0')
Again, the ü displays correctly.
From the MySQL command-line client, I SELECT this path:
mysql> select * from files where path like '%susan%';
...and the response:
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
| ID | path | dir | google_id | md5 | deleted_local |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
| 644990 | /Volumes/Mac Stadium Shuttle 1/DIG2008060702/files/Susan-Ju?rgen.dvdproj/Contents/PkgInfo | 0 | NULL | NULL | 0 |
+--------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------+------+-----------+------+---------------+
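...note that the ü in Jürgen shows up as u? (Ju?rgen).
I've made an effort to ensure that:
- php.ini's default charset is UTF-8
- the table's default charset is utf8
- the DB connection is defined as a utf8 connection
I added phpinfo(); near the top of the script (after ini_set()) and ran it from the CLI. default_charset => UTF-8 => UTF-8 appears in the output.
After connecting to the database in the script, I added echo mysql_client_encoding($link); and the script printed utf8.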
Also, I ran:
mysql> show variables like 'char%';
The response:
+--------------------------+--------------------------------------------------------+
| Variable_name | Value |
+--------------------------+--------------------------------------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql-5.6.24-osx10.8-x86_64/share/charsets/ |
+--------------------------+--------------------------------------------------------+
8 rows in set (0.05 sec)
So, what am I doing wrong?
EDIT: The table's structure is:
mysql> DESCRIBE files;
+---------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+------------------+------+-----+---------+----------------+
| ID | int(11) unsigned | NO | PRI | NULL | auto_increment |
| path | varchar(510) | YES | UNI | NULL | |
| dir | enum('0','1') | YES | | 0 | |
| google_id | varchar(255) | YES | | NULL | |
| md5 | varchar(255) | YES | | NULL | |
| deleted_local | enum('0','1') | YES | | 0 | |
+---------------+------------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)
Another edit:
mysql> show create table files;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| files | CREATE TABLE `files` (
`ID` int(11) unsigned NOT NULL AUTO_INCREMENT,
`path` varchar(510) CHARACTER SET latin1 DEFAULT NULL,
`dir` enum('0','1') CHARACTER SET latin1 DEFAULT '0',
`google_id` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`md5` varchar(255) CHARACTER SET latin1 DEFAULT NULL,
`deleted_local` enum('0','1') CHARACTER SET latin1 DEFAULT '0',
PRIMARY KEY (`ID`),
UNIQUE KEY `path` (`path`)
) ENGINE=InnoDB AUTO_INCREMENT=961879 DEFAULT CHARSET=utf8 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.04 sec)
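As your second edit shows, the path column has the latin1 character set even though the table defaults to utf8. Perhaps you got into this state by altering an existing table?
Try:
ALTER TABLE files MODIFY path VARCHAR(510) CHARACTER SET utf8;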
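A possible explanation for seeing u? rather than a single ?: macOS file systems such as HFS+ generally return file names in decomposed Unicode form, so the ü in the path is likely a plain u followed by the combining diaeresis U+0308. latin1 has no equivalent for U+0308, so a latin1 column would keep the u and replace the combining mark with ?. The sketch below only illustrates the two byte sequences; the variable names are made up for illustration, and the normalization step assumes the intl extension is loaded.

```php
<?php
// Two spellings of "Jürgen" that look identical when printed.
$precomposed = "J\u{00FC}rgen";   // single code point U+00FC
$decomposed  = "Ju\u{0308}rgen";  // 'u' followed by combining diaeresis U+0308

// The underlying bytes differ, so the database receives different strings.
echo bin2hex($precomposed), "\n";
echo bin2hex($decomposed), "\n";

// Optional: compose to NFC before inserting (requires the intl extension).
if (class_exists('Normalizer')) {
    $normalized = Normalizer::normalize($decomposed, Normalizer::FORM_C);
    var_dump($normalized === $precomposed);
}
```

Once the column is utf8, either form can be stored; normalizing first is a design choice that mainly matters if you later compare or search paths typed in precomposed form.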
1. Set the collation of the database table's columns to utf8_unicode_ci.
2. Change the meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
- You can use echo utf8_encode($value); in your page.
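A caution on the utf8_encode() suggestion: that function converts ISO-8859-1 to UTF-8, so applying it to a string that is already UTF-8 (as the paths in the question appear to be, since they print correctly) would double-encode it. A minimal sketch of converting only when needed, assuming the mbstring extension is available and that any non-UTF-8 input is ISO-8859-1; the helper name toUtf8 is hypothetical.

```php
<?php
// Hypothetical helper: return $value as UTF-8, converting only when needed.
// Assumes that input which is not valid UTF-8 is ISO-8859-1 (latin1).
function toUtf8(string $value): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8; converting again would double-encode
    }
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

$alreadyUtf8 = "J\u{00FC}rgen"; // ü as the two UTF-8 bytes C3 BC
$latin1      = "J\xFCrgen";     // ü as the single latin1 byte FC

var_dump(toUtf8($alreadyUtf8) === $alreadyUtf8); // left untouched
var_dump(toUtf8($latin1) === $alreadyUtf8);      // converted to UTF-8
```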