Scrape html with request/cheerio into js object
I'm new to Cheerio and JS. I'm trying to scrape all the pitchers' names and their associated stats into a JSON object, like this:
var pitchers = {
  name: 'Justin Verlander',
  era: 6.62,
  // etc...
  // etc...
};
Here is the HTML I'm trying to scrape:
<tr class="">
<td class="stat-name-width"><img src="../../style/assets/img/mlb/team-logos/tigers.png" height="20"/>
<span class="pitcher-name">Justin Verlander</span>
<div class="fantasy-blue inline fantasy-data pitcher-salary-fd">,100</div>
<small class="text-muted pitches">(R)</small>
<small class="text-muted matchup">(@ BOS)</small></td>
<td class="stat-stat-width fantasy-blue fantasy-points">
<td class="stat-stat-width">0-3</td>
<td class="stat-stat-width">6.62</td>
<td class="stat-stat-width">1.50</td>
<td class="stat-stat-width">5.82</td>
<td class="stat-stat-width">3.18</td>
<td class="stat-stat-width">2.12</td>
<td class="stat-stat-width">5.67</td>
<td class="stat-stat-width">1.03x</td>
<td class="stat-stat-width">0.96x</td>
<td class="stat-stat-width">1.09x</td>
<td class="stat-stat-width">0.90x</td>
</tr>
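There are about 30 pitchers with the same structure on the same page.
Here is what I have so far:
test = $('span.pitcher-name').text(); gives me the names of all the pitchers, not just one.
Obviously I'm not even close... I can't figure out how to associate the children of the pitcher name with a JavaScript object. Any help is greatly appreciated!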
Have you seen the documentation? If you scroll down, there are plenty of examples of how to traverse a page's elements.
For example:
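$('span.pitcher-name').next()
// Selects the sibling that follows each name span. In the HTML above that is
// the <div class="... pitcher-salary-fd"> element; calling .next() once more
// reaches <small class="text-muted pitches">(R)</small>.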
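It looks like what you want is the $().each() function.
With this function, you can iterate over every instance of a tag and run a callback on each one, like so: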
var someObjArr = [];
$('span.pitcher-name').each(function(i, element){
  //Get the text from cheerio.
  var text = $(this).text();
  //if undefined, create the object inside of our array.
  if(someObjArr[i] == undefined){
    someObjArr[i] = {};
  }
  //Update the name property of our object with the text value.
  someObjArr[i].name = text;
});
$('div.pitcher-salary-fd').each(function(i, element){
  //Get the text from cheerio.
  var text = $(this).text();
  //if undefined, create the object inside of our array.
  if(someObjArr[i] == undefined){
    someObjArr[i] = {};
  }
  //Update the salary property of our object with the text value.
  someObjArr[i].salary = text;
});
console.log(someObjArr); //[ { name: 'Justin Verlander', salary: ',100' } ]
One of the best things about this function is that it runs synchronously, so it behaves much like a for loop and is easy to follow.
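Keep in mind that you can print out each element's sub-elements from $(this) inside the callback. This is especially useful when you need to work out exactly which tags to target. For example: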
$('span.pitcher-name').each(function(i, element){
  //Wrap the current element.
  var pitcherNameElement = $(this);
  //Prints all of the element's properties.
  console.log(pitcherNameElement);
});
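Now, retrieving something more abstract, such as a group of items that all sit in the same table row, gets slightly more complicated. For that, we need to use the $().each function on the table rows and then check whether each child's class matches. That way, we keep the same index.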
$('tr').each(function(i, element){
  //get all children of a table row
  var children = $(this)['0'].children;
  //this array will hold the matchup data
  var matchupArr = [];
  //class to extract
  var statClass = 'stat-stat-width';
  //loop over the children
  for(var myInt=0; myInt<children.length; myInt++){
    //the next sibling of this child
    var next = children[myInt].next;
    //sometimes next is undefined
    if(next != undefined){
      //get the html attribs of the next element
      var attribs = next.attribs;
      //sometimes the next element has no attribs
      if(attribs != undefined){
        //class of the next element
        var myClass = attribs.class;
        //if the next element's class is exactly the one we want
        if(myClass == statClass){
          //push its text to our matchup array
          matchupArr.push(next.children[0].data);
        }
      }
    }
  }
  //if undefined, create the object inside of our array.
  if(someObjArr[i] == undefined){
    someObjArr[i] = {};
  }
  //Update the matchups property of our object with our array.
  if(matchupArr.length > 0){
    someObjArr[i].matchups = matchupArr;
  }
});
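It's a bit messy, but it shows the basic concept. A method that lets you run a callback on all children C inside a parent P would be a great addition to the library. But alas, we live in an imperfect world.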
Good luck, and happy scraping!