Bing Maps Geodata Decompress Algorithm port to PHP
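I am trying to port Microsoft's decompression algorithm from Java to PHP (or it may be C++ or C#, since this is Microsoft). The algorithm takes the compressed shape data from Bing Maps Geodata API results and expands it into lat/lon coordinates. They published the algorithm on their website at https://msdn.microsoft.com/en-us/library/dn306801.aspx
I have a list of coordinates stored in my database, and I am trying to retrieve the array of coordinates that defines the polygon so I can work with the shape. My results differ from the expected ones. Can anyone point out the difference between the two implementations?
EDIT: I think my problem is that PHP does not handle LONG-type integers, so precision is lost during the bitwise operations. I may need to convert some operations to use BCMath. Any help?
Decompression algorithm (Microsoft's)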
public const string safeCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";
private static bool TryParseEncodedValue(string value, out List<Coordinate> parsedValue)
{
    parsedValue = null;
    var list = new List<Coordinate>();
    int index = 0;
    int xsum = 0, ysum = 0;
    while (index < value.Length) // While we have more data,
    {
        long n = 0; // initialize the accumulator
        int k = 0; // initialize the count of bits
        while (true)
        {
            if (index >= value.Length) // If we ran out of data mid-number
                return false; // indicate failure.
            int b = safeCharacters.IndexOf(value[index++]);
            if (b == -1) // If the character wasn't on the valid list,
                return false; // indicate failure.
            n |= ((long)b & 31) << k; // mask off the top bit and append the rest to the accumulator
            k += 5; // move to the next position
            if (b < 32) break; // If the top bit was not set, we're done with this number.
        }
        // The resulting number encodes an x, y pair in the following way:
        //
        //   ^ Y
        //   |
        //   14
        //    9 13
        //    5  8 12
        //    2  4  7 11
        //    0  1  3  6 10 ---> X
        // determine which diagonal it's on
        int diagonal = (int)((Math.Sqrt(8 * n + 5) - 1) / 2);
        // subtract the total number of points from lower diagonals
        n -= diagonal * (diagonal + 1L) / 2;
        // get the X and Y from what's left over
        int ny = (int)n;
        int nx = diagonal - ny;
        // undo the sign encoding
        nx = (nx >> 1) ^ -(nx & 1);
        ny = (ny >> 1) ^ -(ny & 1);
        // undo the delta encoding
        xsum += nx;
        ysum += ny;
        // position the decimal point
        list.Add(new Coordinate { Latitude = ysum * 0.00001, Longitude = xsum * 0.00001 });
    }
    parsedValue = list;
    return true;
}
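The diagonal numbering described in the comments above can be checked in isolation. The following is a minimal PHP sketch of just that step, using the same formula as the C# code; `unpairDiagonal` is a hypothetical helper name chosen for illustration and is not part of Microsoft's sample. It assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of Microsoft's sample): split a non-negative
// integer n into the (x, y) grid position shown in the comment diagram.
function unpairDiagonal(int $n): array
{
    // Diagonal d holds the values d(d+1)/2 through d(d+1)/2 + d.
    $diagonal = (int) ((sqrt(8 * $n + 5) - 1) / 2);
    // The offset along the diagonal is Y; the remainder of the diagonal is X.
    $y = $n - intdiv($diagonal * ($diagonal + 1), 2);
    $x = $diagonal - $y;
    return [$x, $y];
}

// Walk a few values and compare them with the diagram by eye.
foreach ([0, 1, 2, 4, 8, 14] as $n) {
    [$x, $y] = unpairDiagonal($n);
    echo "n=$n -> x=$x, y=$y\n";
}
```

For example, 8 sits in the second column of the third row of the diagram, which corresponds to x = 1, y = 2.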
My decompression algorithm (PHP)
function tryParseEncodedValue($value) {
    $value = 'vx1vilihnM6hR7mEl2Q';
    var_error_log($value);
    $safeCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";
    $list = array();
    $index = 0;
    (int)$xsum = 0;
    (int)$ysum = 0;
    while ($index < strlen($value)) // While we have more data,
    {
        $n = 0; // initialize the accumulator
        $k = 0; // initialize the count of bits
        while (true)
        {
            if ($index >= strlen($value)) // If we ran out of data mid-number
            {
                var_error_log('failed: inxed >= strlen($value)');
                return false; // indicate failure.
            }
            (int)$b = strpos($safeCharacters, $value[$index++]);
            if (!$b) { // If the character wasn't on the valid list,
                var_error_log('failed: character not in valid list');
                return false; // indicate failure.
            }
            $n |= ($b & 31) << $k; // mask off the top bit and append the rest to the accumulator
            $k = $k+5; // move to the next position
            if ($b < 32) break; // If the top bit was not set, we're done with this number.
        }
        // The resulting number encodes an x, y pair in the following way:
        //
        //   ^ Y
        //   |
        //   14
        //    9 13
        //    5  8 12
        //    2  4  7 11
        //    0  1  3  6 10 ---> X
        // determine which diagonal it's on
        $diagonal = (int)((sqrt(8 * $n + 5) - 1) / 2);
        // subtract the total number of points from lower diagonals
        $n -= $diagonal * ($diagonal + (int)1) / 2;
        // get the X and Y from what's left over
        $ny = (int)$n;
        $nx = $diagonal - $ny;
        // undo the sign encoding
        $nx = pow(($nx >> 1), (-($nx & 1)) );
        $ny = pow(($ny >> 1), (-($ny & 1)) );
        // undo the delta encoding
        $xsum += $nx;
        $ysum += $ny;
        // position the decimal point
        $coordinates = array($ysum * 0.00001, $xsum * 0.00001);
        array_push($list, $coordinates);
    }
    $parsedValue = $list;
    var_error_log($parsedValue);
    return $parsedValue;
}
Known input
Microsoft provides a sample input and output that you can use to verify your algorithm: https://msdn.microsoft.com/en-us/library/jj158958.aspx#TestingYourAlg
compressed shape = 'vx1vilihnM6hR7mEl2Q'
Expected output
an array of coordinates
35.894309002906084, -110.72522000409663
35.893930979073048, -110.72577999904752
35.893744984641671, -110.72606003843248
35.893366960808635, -110.72661500424147
My output
array(4) {
[0]=>
array(2) {
[0]=>
float(1.0E-5)
[1]=>
float(1.0E-5)
}
[1]=>
array(2) {
[0]=>
float(1.027027027027E-5)
[1]=>
float(1.0181818181818E-5)
}
[2]=>
array(2) {
[0]=>
float(1.0825825825826E-5)
[1]=>
float(1.0552188552189E-5)
}
[3]=>
array(2) {
[0]=>
float(1.1103603603604E-5)
[1]=>
float(1.0734006734007E-5)
}
}
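So we can see that the PHP output is not being computed correctly. I suspect this has to do with the difference between casting to a long integer in Java and PHP's bitwise operations on integers. PHP is supposed to handle integers whether they are longs, floats, or ints, but I feel I am overlooking something.
I would bet the problem is related to this line. Can anyone point out the difference?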
n |= ((long)b & 31) << k; // mask off the top bit and append the rest to the accumulator
I suspect your problem lies in how you translated the following C# code:
nx = (nx >> 1) ^ -(nx & 1);
ny = (ny >> 1) ^ -(ny & 1);
In your PHP code, you translated it to:
$nx = pow(($nx >> 1), (-($nx & 1)) );
$ny = pow(($ny >> 1), (-($ny & 1)) );
In C#, ^ is the bitwise XOR operator, not exponentiation. PHP uses the same symbol for bitwise XOR, so try changing your code to:
$nx = ($nx >> 1) ^ (-($nx & 1));
$ny = ($ny >> 1) ^ (-($ny & 1));
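To make the difference concrete, here is a small sketch of the sign-decoding step on its own. `zigzagDecode` is a hypothetical helper name used only for this illustration; the expression inside it is the one from the C# code. Even inputs map to non-negative values and odd inputs map to negative values, whereas `pow()` on the same operands computes an exponent.

```php
<?php
// Hypothetical helper illustrating the "undo the sign encoding" step.
function zigzagDecode(int $v): int
{
    // Even v becomes v / 2; odd v becomes -(v + 1) / 2.
    return ($v >> 1) ^ (-($v & 1));
}

foreach ([0, 1, 2, 3, 75, 111] as $v) {
    echo $v, ' -> ', zigzagDecode($v), "\n";
}

// pow() treats the second operand as an exponent, so an odd input
// yields (v >> 1) to the power -1, a small float rather than a negative int.
var_dump(pow(111 >> 1, -(111 & 1)));
```

With XOR, 111 decodes to -56 and 75 decodes to -38, which are plausible coordinate deltas; the `pow()` call returns a fraction instead.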
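I have converted the C# code to PHP. The problem is indeed that large numbers lose precision in PHP. Because some values exceed the range of a 32-bit integer and are stored as 64-bit integers in C#, those values have to be converted to PHP's GMP class. GMP supports bitwise operations on long integers.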
/*
 * Microsoft's decompression algorithm - PHP version
 * Returns an array of coordinates (pairs of doubles), or false on failure.
 * Requires the GMP extension.
 */
function tryParseEncodedValue($value) {
    $safeCharacters = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-";
    $list = array();
    $index = 0;
    $xsum = 0;
    $ysum = 0;
    while ($index < strlen($value)) // While we have more data,
    {
        $n = 0; // initialize the accumulator
        $k = 0; // initialize the count of bits
        while (true)
        {
            if ($index >= strlen($value)) // If we ran out of data mid-number
            {
                error_log('failed: index >= strlen($value)');
                return false; // indicate failure.
            }
            $b = strpos($safeCharacters, $value[$index++]);
            if ($b === false) { // If the character wasn't on the valid list,
                error_log('failed: character not in valid list');
                return false; // indicate failure.
            }
            // mask off the top bit and append the rest to the accumulator
            // n |= ((long)b & 31) << k;
            // The one-line C# expression is split into separate GMP calls here.
            $bgmp = gmp_init($b);
            $bitwiseand = gmp_and($bgmp, 31);
            $shifted = gmp_shiftl($bitwiseand, $k);
            $n = gmp_or($n, $shifted);
            $k += 5; // move to the next position
            if (gmp_cmp($bgmp, gmp_init(32)) < 0) break; // gmp compare: b < 32
        }
        // The resulting number encodes an x, y pair in the following way:
        //
        //   ^ Y
        //   |
        //   14
        //    9 13
        //    5  8 12
        //    2  4  7 11
        //    0  1  3  6 10 ---> X
        // determine which diagonal it's on
        //$diagonal = (int)((sqrt(8 * $n + 5) - 1) / 2);
        $diagonal = gmp_intval(gmp_div_q(gmp_sub(gmp_sqrt(gmp_add(gmp_mul($n, 8), 5)), 1), 2));
        // subtract the total number of points from lower diagonals
        // n -= diagonal * (diagonal + 1L) / 2;
        $n = gmp_sub($n, gmp_div_q(gmp_mul($diagonal, gmp_add($diagonal, 1)), 2));
        // get the X and Y from what's left over
        $ny = gmp_intval($n);
        $nx = $diagonal - $ny;
        // undo the sign encoding (^ is bitwise XOR)
        $nx = ($nx >> 1) ^ (-($nx & 1));
        $ny = ($ny >> 1) ^ (-($ny & 1));
        // undo the delta encoding
        $xsum += $nx;
        $ysum += $ny;
        // position the decimal point
        $coordinate = array($ysum * 0.00001, $xsum * 0.00001);
        array_push($list, $coordinate);
    }
    return $list;
}
// Shift left: returns $x shifted left by $n bits, computed as $x * 2^$n.
function gmp_shiftl($x, $n) {
    return gmp_mul($x, gmp_pow(2, $n));
}
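As a complement to the GMP version, note that on a 64-bit PHP build native integers are 64 bits wide, so the accumulator for the sample input (which needs about 50 bits) may fit without GMP. The following is a sketch under that assumption, not a replacement for the code above: `decodeShapeNative` is a hypothetical name, the function returns false on 32-bit builds, and it assumes PHP 7.1 or later. Because `sqrt()` works on floats, the diagonal it yields is treated as an estimate and corrected with integer arithmetic.

```php
<?php
// Sketch only: assumes a 64-bit PHP build (PHP_INT_SIZE === 8) and PHP 7.1+.
// decodeShapeNative() is a hypothetical name, not part of any Bing API.
function decodeShapeNative(string $value)
{
    if (PHP_INT_SIZE < 8) {
        return false; // native integers are too narrow; use GMP instead
    }
    $safe = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_-';
    $len = strlen($value);
    $points = [];
    $index = 0;
    $xsum = 0;
    $ysum = 0;
    while ($index < $len) {
        $n = 0; // accumulator
        $k = 0; // number of bits consumed so far
        while (true) {
            if ($index >= $len || $k > 55) {
                return false; // out of data, or too many bits for a native int
            }
            $b = strpos($safe, $value[$index++]);
            if ($b === false) {
                return false; // strict check: position 0 ('A') is a valid digit
            }
            $n |= ($b & 31) << $k; // append the low five bits
            $k += 5;
            if ($b < 32) {
                break; // continuation bit not set: number complete
            }
        }
        // Float estimate of the diagonal, then an integer correction in case
        // the float square root is off by one for very large n.
        $d = (int) ((sqrt(8 * $n + 5) - 1) / 2);
        while (intdiv($d * ($d + 1), 2) > $n) {
            $d--;
        }
        while (intdiv(($d + 1) * ($d + 2), 2) <= $n) {
            $d++;
        }
        $ny = $n - intdiv($d * ($d + 1), 2);
        $nx = $d - $ny;
        // Undo the sign encoding (bitwise XOR), then the delta encoding.
        $xsum += ($nx >> 1) ^ (-($nx & 1));
        $ysum += ($ny >> 1) ^ (-($ny & 1));
        $points[] = [$ysum * 0.00001, $xsum * 0.00001];
    }
    return $points;
}

foreach (decodeShapeNative('vx1vilihnM6hR7mEl2Q') ?: [] as [$lat, $lon]) {
    printf("%.5f, %.5f\n", $lat, $lon);
}
```

On a 64-bit build, the four printed points should agree with Microsoft's expected output when rounded to five decimal places, starting with 35.89431, -110.72522. On a 32-bit build the GMP version remains the safer choice.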