I am getting encoded URLs that are not decoded by $_GET[tag], such as %5Cu003d, which should decode to "="

Question

I am getting some user requests for pages with encoded URLs that are not decoded by $_GET[tag].

The worst offender to my mind is %5Cu003d, but there are others. In this example, page.php?tag%5Cu003d44 should be page.php?tag=44, because %5C is \, so \u003D is Unicode 003D, i.e. "=".
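To make the symptom concrete, here is a minimal sketch of what PHP's own query-string parsing does with this input. It uses parse_str(), which follows the same parsing rules that fill $_GET; the variable names are only for illustration.

```php
<?php
// Percent-decoding turns %5C into a backslash, but nothing interprets the \u003d escape that follows.
$decoded = urldecode('tag%5Cu003d44');

// parse_str() splits a query string the same way PHP populates $_GET.
parse_str('tag%5Cu003d44', $params);

var_dump($decoded);               // the literal text tag\u003d44
var_dump(isset($params['tag']));  // false: there is no real "=" to split on
var_dump(array_keys($params));    // the whole string ends up as a single key
```

So the request does reach the script, but the value sits inside the key name rather than under 'tag'.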

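I don't know which site is encoding the URLs this way, but I am trying to give people what they want without decoding by hand. Is there some switch or method that will make $_GET work? Shouldn't it just work?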

Based on other discussions on SO, I tried sending this header, but it did not help: header('Content-type: text/html; charset=utf-8');

Edit ******************************

Here are more examples of the bad URLs:

page.php?lat=25.79&amp%3Blon=-80.16
page.php?lat=41.46u0026lon%3D-82.71
page.php?lat%5Cu003d30.31%5Cu0026lon%5Cu003d-89.33
page.php?lat=28.94-89.4&lon

Answer 1

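If this were my project, I probably wouldn't honor these URLs, even if the stakeholders asked nicely. It really is a mess, and the data may well be corrupted in the decoding process. ...But if you want to have a go, you could start with something like this: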

Code: (Demo)

// this is a hack until you can resolve the encoding issue in a more professional manner
// in production, use $_SERVER['QUERY_STRING'] to get the raw query string from the URL

$queryStrings = [
    'lat=25.79&amp%3Blon=-80.16',
    'lat=41.46u0026lon%3D-82.71',
    'lat%5Cu003d30.31%5Cu0026lon%5Cu003d-89.33',
    'lat=28.94-89.4&lon',
    'tag%5Cu003d44'
];

foreach ($queryStrings as $queryString) {

    // replace unicode-like substrings
    $queryString = preg_replace_callback('/u([\da-f]{4})/i', function ($match) {
        return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UCS-2BE');
    }, urldecode($queryString));
    // courtesy of Gumbo: 
    
    // replace ampersands and remove backslashes
    $queryString = strtr($queryString, ['&amp;' => '&', '\\' => '']); // '\\' is a single backslash; a bare '\' would escape the closing quote
    
    // parse the decoded query string back into the GET superglobal so that regular processing can resume
    parse_str($queryString, $_GET);
    var_export($_GET);
    echo "\n";
}

Output:

array (
  'lat' => '25.79',
  'lon' => '-80.16',
)
array (
  'lat' => '41.46',
  'lon' => '-82.71',
)
array (
  'lat' => '30.31',
  'lon' => '-89.33',
)
array (
  'lat' => '28.94-89.4',    // <-- I guess you'll need to massage this into the correct shape too
  'lon' => '',
)
array (
  'tag' => '44',
)
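For the fourth case, where both numbers end up glued together in 'lat', one possible way to "massage" the value is sketched below. The helper name split_glued_coordinates() is hypothetical, and the approach rests on an assumption: the longitude is negative, so its minus sign marks where the latitude ends. A positive longitude could not be recovered this way.

```php
<?php
// Hypothetical helper: splits a value such as "28.94-89.4" into two numbers.
// Assumption: the second number is negative, so its minus sign marks the boundary.
function split_glued_coordinates(string $value): ?array
{
    if (preg_match('/^(-?\d+(?:\.\d+)?)(-\d+(?:\.\d+)?)$/', $value, $m)) {
        return ['lat' => (float) $m[1], 'lon' => (float) $m[2]];
    }
    return null; // not in the expected shape; leave the request alone
}

var_export(split_glued_coordinates('28.94-89.4'));
```

Returning null for anything that does not match keeps well-formed values such as '25.79' untouched.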

Answer 2

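I decided to try decoding the bad URLs because, for reasons unknown, they show up as coming from my own pages. I was worried that some device is encoding the calls, maybe Android, maybe some new browser. I don't know what is encoding them, but since some appear to come from my site, I figured I should fix them. To clarify, this is a PHP-generated image embedded in one of my websites. So far this has caught every instance from the past few days. The idea is to take the query string, decode it step by step, and then extract the two variables by hand, but only if the normal process did not decode them successfully. That way I only touch calls I would otherwise have rejected, so any unintended consequences are minimal.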

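<?php
// values as decoded by the normal $_GET processing (empty string if missing)
$latitude = trim(strip_tags($_GET['lat'] ?? ''));
$longitude = trim(strip_tags($_GET['lon'] ?? ''));
$request = getenv("QUERY_STRING");
$request = urldecode($request); // get rid of %5C type conversions
// note: unicode_decode() is not a PHP built-in; it is assumed to be a user-defined helper that converts \uXXXX sequences
$request = unicode_decode($request); // with the %5C stuff removed, convert any unicode
$i = strpos($request, "lon");
$j = strpos($request, "lat");
// only decode things that didn't work with normal $_GET
// strpos() returns false when nothing is found and 0 is a valid position, so compare with !== false
if ($i !== false && $longitude === "") $longitude = (float) substr($request, $i + 4);
if ($j !== false && $latitude === "") $latitude = (float) substr($request, $j + 4);
?>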
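The snippet above calls unicode_decode(), which PHP does not provide, so it has to be defined somewhere. The original definition is not shown; the following is only a sketch of what such a helper might look like, borrowing the pack()/mb_convert_encoding() technique from Answer 1. It assumes the mbstring extension is available and that both \uXXXX and bare uXXXX sequences should be converted.

```php
<?php
// Hypothetical stand-in for the unicode_decode() call above (PHP has no built-in by that name).
// Assumption: it should turn \uXXXX escapes, and bare uXXXX ones, into UTF-8 characters.
function unicode_decode(string $text): string
{
    return preg_replace_callback(
        '/\\\\?u([0-9a-f]{4})/i',
        function (array $match): string {
            // Interpret the four hex digits as one UTF-16BE code unit (requires mbstring).
            return mb_convert_encoding(pack('H*', $match[1]), 'UTF-8', 'UTF-16BE');
        },
        $text
    );
}

echo unicode_decode(urldecode('lat%5Cu003d30.31%5Cu0026lon%5Cu003d-89.33')), "\n"; // lat=30.31&lon=-89.33
echo unicode_decode('lat=41.46u0026lon=-82.71'), "\n";                             // lat=41.46&lon=-82.71
```

Because the backslash is optional in the pattern, any legitimate value containing "u" followed by four hex digits would also be rewritten, so this is only reasonable as a fallback for requests that failed normal parsing.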

php

unicode

get