Join URLs in symfony/goutte
I have a Goutte\Client (Goutte uses Symfony to handle the requests), and I want to join paths and get the final URL:
$client = new Goutte\Client();
$crawler = $client->request('GET', 'http://DOMAIN/some/path/');
// $crawler is an instance of Symfony\Component\DomCrawler\Crawler
$new_path = '../new_page';
$final_path = $crawler->someMagicFunction($new_path);
// $final_path == http://DOMAIN/some/new_page
What I'm looking for is a simple way to join the $new_path variable with the current page of the request and get the new URL.
Note that $new_path can be any of the following:
new_page ==> http://DOMAIN/some/path/new_page
../new_page ==> http://DOMAIN/some/new_page
/new_page ==> http://DOMAIN/new_page
Does symfony/goutte/guzzle provide any simple way to do this?
I found getUriForPath in Symfony\Component\HttpFoundation\Request, but I don't see any simple way to convert a Symfony\Component\BrowserKit\Request into an HttpFoundation\Request.
You can use parse_url to get the path of the URL:
$components = parse_url('http://DOMAIN/some/path/');
$path = $components['path'];
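A quick check of what parse_url() returns here, in plain PHP with no extra packages: only the components present in the URL are set, the path keeps its trailing slash, and a relative reference such as ../new_page comes back as a bare path.

```php
<?php
// Split the base URL from the question into its components.
$components = parse_url('http://DOMAIN/some/path/');
// Only scheme, host and path are present; there is no 'port' or 'query' key.
$path = $components['path'];
// A relative reference has no scheme or host, so it is parsed as a path only.
$relative = parse_url('../new_page');
```

The trailing slash in $path matters later, when a relative path is appended to it.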
Then you need a way to normalize it. This answer can help you:
function normalizePath($path, $separator = '\/')
{
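// Remove invisible Unicode "other" characters (\p{C}: control, format, ...) and a leading "./"
$normalized = preg_replace('#\p{C}+|^\./#u', '', $path);
// Remove self-referring segments ("/./")
$normalized = preg_replace('#/\.(?=/)|^\./|\./$#', '', $normalized);
// Repeatedly collapse "segment/.." pairs. [^/\.]+ stops at dots, so segment
// names that contain a dot (e.g. "some.dir") are not collapsed correctly.
$regex = '#\/*[^/\.]+/\.\.#Uu';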
while (preg_match($regex, $normalized)) {
$normalized = preg_replace($regex, '', $normalized);
}
if (preg_match('#/\.{2}|\.{2}/#', $normalized)) {
throw new LogicException('Path is outside of the defined root, path: [' . $path . '], resolved: [' . $normalized . ']');
}
return trim($normalized, $separator);
}
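The regular expressions above handle the three cases in the question, but a segment-by-segment approach is easier to reason about and also copes with dots inside segment names. The sketch below is a substitute technique, not the linked answer's code; the name removeDotSegments is hypothetical. It follows the idea of the remove_dot_segments step in RFC 3986, assumes an absolute path (starting with "/"), keeps the leading slash, and stops at the root instead of throwing.

```php
<?php
/**
 * Hypothetical helper: resolve "." and ".." segments in an absolute URL path,
 * one segment at a time (idea of RFC 3986's remove_dot_segments).
 * Assumes $path starts with "/"; ".." at the root is ignored, not an error.
 */
function removeDotSegments(string $path): string
{
    $segments = explode('/', $path);
    $output = [];
    foreach ($segments as $segment) {
        if ($segment === '.') {
            continue; // "." is the current directory: drop it
        }
        if ($segment === '..') {
            // Go up one level. Index 0 holds the empty string before the
            // leading "/", so never pop it (that would climb above the root).
            if (count($output) > 1) {
                array_pop($output);
            }
            continue;
        }
        $output[] = $segment;
    }
    // A trailing "." or ".." denotes a directory, so keep a trailing slash.
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $output[] = '';
    }
    return implode('/', $output);
}

echo removeDotSegments('/some/path/../new_page'), PHP_EOL; // /some/new_page
echo removeDotSegments('/some.dir/../new_page'), PHP_EOL;  // /new_page
```

Because the result keeps its leading slash, it should not be passed to unparse_url() as-is, since that function inserts its own "/" before the path.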
All that's left is to rebuild the URL; see this comment:
function unparse_url($parsed_url) {
$scheme = isset($parsed_url['scheme']) ? $parsed_url['scheme'] . '://' : '';
$host = isset($parsed_url['host']) ? $parsed_url['host'] : '';
$port = isset($parsed_url['port']) ? ':' . $parsed_url['port'] : '';
$user = isset($parsed_url['user']) ? $parsed_url['user'] : '';
$pass = isset($parsed_url['pass']) ? ':' . $parsed_url['pass'] : '';
$pass = ($user || $pass) ? "$pass@" : '';
$path = isset($parsed_url['path']) ? $parsed_url['path'] : '';
$query = isset($parsed_url['query']) ? '?' . $parsed_url['query'] : '';
$fragment = isset($parsed_url['fragment']) ? '#' . $parsed_url['fragment'] : '';
return "$scheme$user$pass$host$port/$path$query$fragment";
}
The final path:
$new_path = '../new_page';
if (strpos($new_path, '/') === 0) { // absolute path, replace it entirely
$path = $new_path;
} else { // relative path, append it (assumes the base path ends with '/')
$path = $path . $new_path;
}
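Plain concatenation only works because the base path ends with "/". For a base such as http://DOMAIN/some/path (no trailing slash), RFC 3986 treats the last segment as a file and drops it before appending. A sketch of that "merge" step, with mergePaths as a hypothetical name:

```php
<?php
/**
 * Hypothetical helper: the "merge" step of RFC 3986 (section 5.2.3).
 * An absolute $newPath replaces the base path; a relative one is appended
 * to the base path's directory, i.e. everything up to its last "/".
 */
function mergePaths(string $basePath, string $newPath): string
{
    if (strpos($newPath, '/') === 0) {
        return $newPath; // absolute path, replace it entirely
    }
    if ($basePath === '') {
        return '/' . $newPath; // e.g. a base of "http://DOMAIN" has an empty path
    }
    // "/some/path/" -> "/some/path/", "/some/path" -> "/some/"
    $lastSlash = strrpos($basePath, '/');
    $directory = $lastSlash === false ? '' : substr($basePath, 0, $lastSlash + 1);
    return $directory . $newPath;
}

echo mergePaths('/some/path', 'new_page'), PHP_EOL; // /some/new_page
```

The merged path may still contain ".." segments, so it still needs to go through normalizePath() afterwards.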
Putting it all together:
// http://DOMAIN/some/new_page
echo unparse_url(array_replace($components, array('path' => normalizePath($path))));
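Use Uri::resolve() from the guzzlehttp/psr7 package. This method lets you create a normalized URL from a base and a relative part. (In guzzlehttp/psr7 1.4 and later, Uri::resolve() is deprecated in favor of UriResolver::resolve(), and it was removed in 2.0.)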
An example (using the excellent PsySH shell):
Psy Shell v0.7.2 (PHP 7.0.12 — cli) by Justin Hileman
>>> $base = new GuzzleHttp\Psr7\Uri('http://example.com/some/dir')
=> GuzzleHttp\Psr7\Uri {#208}
>>> (string) GuzzleHttp\Psr7\Uri::resolve($base, '/new_base/next/next/../../back_2')
=> "http://example.com/new_base/back_2"
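If adding guzzlehttp/psr7 is not an option, the reference-resolution algorithm from RFC 3986 (section 5.2), which Uri::resolve() implements, can be sketched in plain PHP. This is not Guzzle's code: resolveUrl and removeDots are hypothetical names, the base is assumed to be an absolute URL with a host, and network-path references ("//host/...") are only given the base scheme.

```php
<?php
// Remove "." and ".." segments from an absolute path (idea of RFC 3986, 5.2.4).
function removeDots(string $path): string
{
    $segments = explode('/', $path);
    $output = [];
    foreach ($segments as $segment) {
        if ($segment === '.') {
            continue;
        }
        if ($segment === '..') {
            if (count($output) > 1) {
                array_pop($output); // go up one level, never above the root
            }
            continue;
        }
        $output[] = $segment;
    }
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $output[] = ''; // "/a/b/.." resolves to "/a/"
    }
    return implode('/', $output);
}

// Resolve $ref against the absolute URL $base (simplified RFC 3986, 5.2.2).
function resolveUrl(string $base, string $ref): string
{
    if (preg_match('#^[a-z][a-z0-9+.\-]*:#i', $ref)) {
        return $ref; // already absolute: it has its own scheme
    }
    $b = parse_url($base);
    if (strpos($ref, '//') === 0) {
        return $b['scheme'] . ':' . $ref; // network-path reference
    }

    // Split the reference into path, query and fragment.
    $fragment = null;
    if (($pos = strpos($ref, '#')) !== false) {
        $fragment = substr($ref, $pos + 1);
        $ref = substr($ref, 0, $pos);
    }
    $query = null;
    if (($pos = strpos($ref, '?')) !== false) {
        $query = substr($ref, $pos + 1);
        $ref = substr($ref, 0, $pos);
    }

    $basePath = $b['path'] ?? '';
    if ($ref === '') {
        $path = $basePath; // e.g. "?x=1" or "#top": keep the base path
        if ($query === null && isset($b['query'])) {
            $query = $b['query'];
        }
    } elseif ($ref[0] === '/') {
        $path = removeDots($ref); // absolute path replaces the base path
    } else {
        // Merge: keep the base path up to and including its last "/".
        $lastSlash = strrpos($basePath, '/');
        $dir = $lastSlash === false ? '/' : substr($basePath, 0, $lastSlash + 1);
        $path = removeDots($dir . $ref);
    }

    // Rebuild the authority (user info, host, port) from the base URL.
    $authority = $b['host'];
    if (isset($b['user'])) {
        $authority = $b['user'] . (isset($b['pass']) ? ':' . $b['pass'] : '') . '@' . $authority;
    }
    if (isset($b['port'])) {
        $authority .= ':' . $b['port'];
    }

    return $b['scheme'] . '://' . $authority . $path
        . ($query !== null ? '?' . $query : '')
        . ($fragment !== null ? '#' . $fragment : '');
}

foreach (['new_page', '../new_page', '/new_page'] as $newPath) {
    echo resolveUrl('http://DOMAIN/some/path/', $newPath), PHP_EOL;
}
```

In the question's setting, the base would presumably be the crawler's current URI ($crawler->getUri()).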
Also take a look at the UriNormalizer class, which is relevant to your question. There is an example in its test case:
use GuzzleHttp\Psr7\Uri;
use GuzzleHttp\Psr7\UriNormalizer;

$uri = new Uri('http://example.org/../a/b/../c/./d.html');
$normalizedUri = UriNormalizer::normalize($uri, UriNormalizer::REMOVE_DOT_SEGMENTS);
$this->assertSame('http://example.org/a/c/d.html', (string) $normalizedUri);