PHP implementation of JavaScript escape() and unescape()

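First off, I understand that JS escape() and unescape() are both deprecated. Basically we have a legacy system in which data is passed through JS escape() before being stored in the DB, and every time we have to unescape() the data on the client side to display the actual data. (I know this is silly, but it was done years ago to support Unicode characters on a database that was not Unicode-compatible.)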

Is there an existing PHP implementation that emulates the JavaScript escape() and unescape() functions?

You are looking for urlencode(). If you can't accept the output of that encoding, you can try rawurlencode().

More information here:

http://php.net/manual/en/function.urldecode.php

http://php.net/manual/en/function.urlencode.php
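
To make the difference between the two functions concrete, here is a small sketch that runs both on the same UTF-8 string. The sample input is my own, and the JavaScript result quoted in the last comment is what escape() returns for that input, included only for comparison.

```php
<?php
// Sample input (an assumption for illustration): "a b+é€" written with
// code point escapes so the result does not depend on the file encoding.
$input = "a b+\u{E9}\u{20AC}";

// urlencode(): space becomes '+', non-ASCII is emitted as UTF-8 bytes.
$urlencoded = urlencode($input);

// rawurlencode(): space becomes '%20', non-ASCII is emitted as UTF-8 bytes.
$rawurlencoded = rawurlencode($input);

echo $urlencoded, "\n";     // a+b%2B%C3%A9%E2%82%AC
echo $rawurlencoded, "\n";  // a%20b%2B%C3%A9%E2%82%AC

// For comparison, JavaScript escape("a b+é€") returns "a%20b+%E9%u20AC".
```

Neither PHP function produces the %XX (Latin-1) and %uXXXX forms that escape() uses for non-ASCII characters, so data stored with escape() will not round-trip through them unchanged.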

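But if you just want to decode the data in order to store it in a MySQL database, you can use the built-in MySQL escape-string function to convert the input into a format that is safe to insert into the database.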

See:

http://php.net/manual/en/mysqli.real-escape-string.php

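After some searching I was able to put together two PHP functions that do what I need. The code isn't pretty, but it works 100% with the data we currently have, so I thought I would share them here.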

/**
 *  Simulate javascript escape() function
 */
function escapejs($source) {
    $map = array(
       '~'        => '%7E'
      ,'!'        => '%21'
      ,'\''       => '%27'       // single quote
      ,'('        => '%28'
      ,')'        => '%29'
      ,'#'        => '%23'
      ,'$'        => '%24'
      ,'&'        => '%26'
      ,','        => '%2C'
      ,':'        => '%3A'
      ,';'        => '%3B'
      ,'='        => '%3D'
      ,'?'        => '%3F'
      ,' '       => '%20'       // space
      ,'"'        => '%22'       // double quote
      ,'%'        => '%25'
      ,'<'        => '%3C'
      ,'>'        => '%3E'
      ,'['        => '%5B'
      ,'\\'       => '%5C'       // backslash
      ,']'        => '%5D'
      ,'^'        => '%5E'
      ,'{'        => '%7B'
      ,'|'        => '%7C'
      ,'}'        => '%7D'
      ,'`'        => '%60'
      ,chr(9)     => '%09'
      ,chr(10)    => '%0A'
      ,chr(13)    => '%0D'
      ,'¡'       => '%A1'
      ,'¢'       => '%A2'
      ,'£'       => '%A3'
      ,'¤'       => '%A4'
      ,'¥'       => '%A5'
      ,'¦'       => '%A6'
      ,'§'       => '%A7'
      ,'¨'       => '%A8'
      ,'©'       => '%A9'
      ,'ª'       => '%AA'
      ,'«'       => '%AB'
      ,'¬'       => '%AC'
      ,"\xC2\xAD" => '%AD'       // soft hyphen (U+00AD), written as UTF-8 bytes because it is invisible
      ,'®'       => '%AE'
      ,'¯'       => '%AF'
      ,'°'       => '%B0'
      ,'±'       => '%B1'
      ,'²'       => '%B2'
      ,'³'       => '%B3'
      ,'´'       => '%B4'
      ,'µ'       => '%B5'
      ,'¶'       => '%B6'
      ,'·'       => '%B7'
      ,'¸'       => '%B8'
      ,'¹'       => '%B9'
      ,'º'       => '%BA'
      ,'»'       => '%BB'
      ,'¼'       => '%BC'
      ,'½'       => '%BD'
      ,'¾'       => '%BE'
      ,'¿'       => '%BF'
      ,'À'       => '%C0'
      ,'Á'       => '%C1'
      ,'Â'       => '%C2'
      ,'Ã'       => '%C3'
      ,'Ä'       => '%C4'
      ,'Å'       => '%C5'
      ,'Æ'       => '%C6'
      ,'Ç'       => '%C7'
      ,'È'       => '%C8'
      ,'É'       => '%C9'
      ,'Ê'       => '%CA'
      ,'Ë'       => '%CB'
      ,'Ì'       => '%CC'
      ,'Í'       => '%CD'
      ,'Î'       => '%CE'
      ,'Ï'       => '%CF'
      ,'Ð'       => '%D0'
      ,'Ñ'       => '%D1'
      ,'Ò'       => '%D2'
      ,'Ó'       => '%D3'
      ,'Ô'       => '%D4'
      ,'Õ'       => '%D5'
      ,'Ö'       => '%D6'
      ,'×'       => '%D7'
      ,'Ø'       => '%D8'
      ,'Ù'       => '%D9'
      ,'Ú'       => '%DA'
      ,'Û'       => '%DB'
      ,'Ü'       => '%DC'
      ,'Ý'       => '%DD'
      ,'Þ'       => '%DE'
      ,'ß'       => '%DF'
      ,'à'       => '%E0'
      ,'á'       => '%E1'
      ,'â'       => '%E2'
      ,'ã'       => '%E3'
      ,'ä'       => '%E4'
      ,'å'       => '%E5'
      ,'æ'       => '%E6'
      ,'ç'       => '%E7'
      ,'è'       => '%E8'
      ,'é'       => '%E9'
      ,'ê'       => '%EA'
      ,'ë'       => '%EB'
      ,'ì'       => '%EC'
      ,'í'       => '%ED'
      ,'î'       => '%EE'
      ,'ï'       => '%EF'
      ,'ð'       => '%F0'
      ,'ñ'       => '%F1'
      ,'ò'       => '%F2'
      ,'ó'       => '%F3'
      ,'ô'       => '%F4'
      ,'õ'       => '%F5'
      ,'ö'       => '%F6'
      ,'÷'       => '%F7'
      ,'ø'       => '%F8'
      ,'ù'       => '%F9'
      ,'ú'       => '%FA'
      ,'û'       => '%FB'
      ,'ü'       => '%FC'
      ,'ý'       => '%FD'
      ,'þ'       => '%FE'
      ,'ÿ'       => '%FF'
    );

    $convmap = array(0x80, 0x10ffff, 0, 0xffffff);

    $org = $source;

    // make sure string is UTF8
    if (false === mb_check_encoding($source, 'UTF-8')) {
        if (false === ($source = iconv(mb_detect_encoding($source, mb_detect_order(), true), "UTF-8", $source))) {
          $source = $org;
        }
    }

    $chrArray = preg_split('//u', $source, -1, PREG_SPLIT_NO_EMPTY);  // split up the UTF8 string into chars
    $oChrArray = array();

    foreach ($chrArray as $index => $chr) {

      if (isset($map[$chr])) {
        $chr = $map[$chr];
      }
      // if char doesn't fall within ASCII then assume unicode, get the hex html entities
      //elseif (mb_detect_encoding($chr, 'ASCII', true) !== 'ASCII') {
      else {
        $chr = mb_encode_numericentity($chr, $convmap, "UTF-8", true);

        // since we will be converting the &#x notation to the non-standard %u for backward compatibility, make sure the code is 4 digits long by prepending 0
        if (substr($chr, 0, 3) == '&#x' && substr($chr, -1) == ';' && strlen($chr) == 7)
          $chr = '&#x0'.substr($chr, 3);
      }

      $oChrArray[] = $chr;
    }
    $decodedStr = implode('', $oChrArray);
    $decodedStr = preg_replace('/&#x([0-9A-F]{4});/', '%u$1', $decodedStr);   // we need to use the %uXXXX format to simulate results generated with js escape()
    return $decodedStr;
}

/**
 *  Simulate javascript unescape() function
 */
function unescapejs($source) {
    $source = str_replace(array('%0B'), array(''), $source);    // strip out vertical tab
    $s= preg_replace('/%u(....)/', '&#x$1;', $source);          // %uXXXX -> &#xXXXX;
    $s= preg_replace('/%(..)/', '&#x$1;', $s);                  // %XX    -> &#xXX;
    return html_entity_decode($s, ENT_QUOTES, 'UTF-8');
}
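
For anyone who would rather not maintain the lookup table, here is a more compact sketch of the same idea. It is not the code above: the function names js_escape_sketch and js_unescape_sketch are made up for this example, and it assumes the mbstring extension is loaded and that the input is valid UTF-8. It works on UTF-16 code units, which is how escape() sees a string, so characters outside the BMP come out as two %uXXXX units.

```php
<?php
/** Sketch: emulate JavaScript escape() for a UTF-8 string (hypothetical helper). */
function js_escape_sketch(string $utf8): string
{
    if ($utf8 === '') {
        return '';
    }
    // escape() operates on UTF-16 code units: 2 bytes each, big-endian, no BOM.
    $utf16 = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
    // Characters that escape() leaves untouched.
    $safe = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./';
    $out = '';
    foreach (str_split($utf16, 2) as $unit) {
        $code = unpack('n', $unit)[1];           // 16-bit code unit value
        if ($code < 0x80 && strpos($safe, chr($code)) !== false) {
            $out .= chr($code);                  // passed through as-is
        } elseif ($code < 0x100) {
            $out .= sprintf('%%%02X', $code);    // %XX
        } else {
            $out .= sprintf('%%u%04X', $code);   // %uXXXX
        }
    }
    return $out;
}

/** Sketch: emulate JavaScript unescape(), returning UTF-8 (hypothetical helper). */
function js_unescape_sketch(string $escaped): string
{
    // Decode each run of consecutive escapes together so that surrogate
    // pairs such as %uD83D%uDE00 are joined into a single character.
    return preg_replace_callback(
        '/(?:%u[0-9A-Fa-f]{4}|%[0-9A-Fa-f]{2})+/',
        function (array $m): string {
            preg_match_all('/%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})/', $m[0], $units, PREG_SET_ORDER);
            $utf16 = '';
            foreach ($units as $u) {
                $hex = $u[1] !== '' ? $u[1] : $u[2];
                $utf16 .= pack('n', hexdec($hex));   // one UTF-16BE code unit
            }
            return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
        },
        $escaped
    );
}

// Round trip on a sample string: "a b+é€" followed by U+1F600.
$original = "a b+\u{E9}\u{20AC}\u{1F600}";
$escaped  = js_escape_sketch($original);
echo $escaped, "\n";                                    // a%20b+%E9%u20AC%uD83D%uDE00
var_dump(js_unescape_sketch($escaped) === $original);   // bool(true)
```

I have only checked this against a handful of strings, so test it on your own stored data before relying on it.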