Parsing Newegg website for price
I'm trying to get the price of this product: http://www.newegg.com/Product/Product.aspx?Item=N82E16822236715
Using only jQuery, the following code works and gives me the raw (unprocessed) value:
jQuery('.grpPrimary div[style = "display: block;"] .price-current').text()
I'm trying to do the same thing with Goutte and Laravel:
use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'http://www.newegg.com/Product/Product.aspx?Item=N82E16822236715');
// Print the text of every node matching the same selector that works with jQuery
$crawler->filter('.grpPrimary div[style = "display: block;"] .price-current')->each(function ($node) {
    echo $node->text() . "\n";
});
But I don't get the price value. I tried selecting its parent div, but the price doesn't show up in any of them. The same thing happens with phpQuery.
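Goutte and phpQuery only parse the HTML returned by the server; they don't run the page's JavaScript, so if the price block is filled in or made visible by a script in the browser, the selector won't find it there.
I'm not sure this is the best approach, but by exploring the returned HTML I found this small piece of JavaScript: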
var utag_data = {
page_breadcrumb:'Home > Components > Hard Drives > Desktop External Hard Drives > Western Digital > Item#:9SIA29P2YK5768',
page_tab_name:'Components',
product_category_id:['15'],
product_category_name:['Hard Drives'],
product_subcategory_id:['414'],
product_subcategory_name:['Desktop External Hard Drives'],
product_id:['9SIA29P2YK5768'],
product_web_id:['9SIA29P2YK5768'],
product_title:['WD Elements 2TB USB 3.0 External Desktop Storage WDBWLG0020HBK-NESN Black'],
product_manufacture:['Western Digital'],
product_unit_price:['89.99'],
product_sale_price:['79.99'],
product_default_shipping_cost:['0.01'],
product_type:['Seller'],
product_model:['WDBWLG0020HBK-NESN'],
product_instock:['1'],
product_group_id:['30896206'],
hl_seller_id_list:'A29P|A4P0|1|A6ZP|A2F8|A1N8|A24G|A0ZX|A8H5|A6AH',
hl_prod_id_list:'9SIA29P2YK5768|9SIA4P02RJ6296|N82E16822236715|9SIA6ZP3K22742|9SIA2F83426160|9SIA1N81YB4387|9SIA24G2179173|9SIA0ZX1W07804|9SIA8H531C6923|9SIA6AH3AB2901',
hl_prod_p_list:'79.99|89.99|94.99|106.62|109.35|110.29|109.59|113.93|131.86|131.69',
hl_prod_qty_list:'1|1|1|1|1|1|1|1|1|1',
parent_item:'N82E16822236715',
page_type:'Product',
site_region:'USA',
site_currency:'USD',
page_name:'ProductDetail',
search_scope:jQuery('#haQuickSearchStore option:selected').text(),
user_nvtc:Web.StateManager.Cookies.get(Web.StateManager.Cookies.Name.NVTC),
user_name:Web.StateManager.Cookies.get(Web.StateManager.Cookies.Name.LOGIN,'LOGINID6'),
third_party_render:['4774d6780334a7bf9c3c95255c60401916d07cae','78b8b16d9d0f6f2e8419ac12fa710f5153f1cee3','65531e14b4d9b9a223cc3bfcb65ce7b5f356011d','2a5e772a0f941c862180037f8a5c118c7abf2f7d','9011adc5233493f5adc5f0f0f1bcb655892c09e3']
};
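Note that product_id in this block is 9SIA29P2YK5768, while the item from the question, N82E16822236715, appears only as parent_item and inside hl_prod_id_list. The block also has hl_prod_p_list, a pipe-separated price list of the same length. Assuming the two lists are aligned by position (an inference, not documented anywhere; the first pair, 9SIA29P2YK5768 and 79.99, does match product_sale_price), a per-item price lookup could be sketched like this. The function name utagPriceMap is hypothetical:

```php
<?php
// Sketch: map each ID in hl_prod_id_list to the price at the same position
// in hl_prod_p_list. ASSUMPTION: the two lists are index-aligned.
function utagPriceMap(string $html): array
{
    // Capture the quoted, pipe-separated contents of both lists
    if (!preg_match("/hl_prod_id_list:'([^']*)'/", $html, $ids)
        || !preg_match("/hl_prod_p_list:'([^']*)'/", $html, $prices)) {
        return []; // one of the lists is missing
    }
    $idList = explode('|', $ids[1]);
    $priceList = array_map('floatval', explode('|', $prices[1]));
    if (count($idList) !== count($priceList)) {
        return []; // lists don't line up, so don't guess
    }
    return array_combine($idList, $priceList);
}

// First three entries of the lists in the utag_data block above
$html = "hl_prod_id_list:'9SIA29P2YK5768|9SIA4P02RJ6296|N82E16822236715',\n"
      . "hl_prod_p_list:'79.99|89.99|94.99',";
$map = utagPriceMap($html);
echo $map['N82E16822236715'] ?? 'not found', "\n"; // prints 94.99
```

If the alignment assumption holds, this gives the price for the exact item ID in the URL rather than for whichever listing utag_data describes as product_id.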
So if I were you, I'd write a regular expression that captures every digit and dot ([\d.]+) between product_sale_price:[' (escaped in the regex as product_sale_price:\[') and '] (escaped as '\]).
So it looks like this:
product_sale_price:\['([\d.]+)'\]
So in PHP it would be:
$str = '...'; // The utag_data JS object OR the full HTML page will both work
preg_match("/product_sale_price:\['([\d.]+)'\]/", $str, $matches);
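Your result will then be stored in the $matches array; the captured price is $matches[1] ($matches[0] is the whole match), so:
$price = isset($matches[1]) ? floatval($matches[1]) : null; // null if the pattern wasn't found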
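If you need more than one of these single-value fields (for example product_unit_price as well as product_sale_price), the same regex can be wrapped in a small helper. This is a sketch; the function name utagPrice and its default argument are hypothetical. preg_quote() escapes the field name before it goes into the pattern, and the function returns null instead of reading a missing $matches[1] when the field isn't on the page:

```php
<?php
// Sketch: read a single-value field such as product_sale_price:['79.99']
// from the utag_data block (or the full page HTML). Returns null if absent.
function utagPrice(string $html, string $field = 'product_sale_price'): ?float
{
    // Same pattern as above, with the field name escaped by preg_quote()
    $pattern = "/" . preg_quote($field, '/') . ":\\['([\\d.]+)'\\]/";
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null; // no match (or a regex error)
    }
    return (float) $matches[1];
}

// Two lines copied from the utag_data block above
$str = "product_unit_price:['89.99'],\nproduct_sale_price:['79.99'],";
var_dump(utagPrice($str));                       // float(79.99)
var_dump(utagPrice($str, 'product_unit_price')); // float(89.99)
var_dump(utagPrice($str, 'product_price'));      // NULL
```

Returning null rather than 0.0 keeps "price not found" distinguishable from a real price, which matters if the page layout or the utag_data fields change.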