Scrape html with request/cheerio into js object

I'm new to Cheerio and JS. I'm trying to scrape every pitcher's name and associated stats into a JSON object like this:

var pitchers = {
    name: 'Justin Verlander',
    era: 6.62,
    // etc...
}

Here is the HTML I'm trying to scrape:

<tr class="">
<td class="stat-name-width"><img src="../../style/assets/img/mlb/team-logos/tigers.png" height="20"/>  
<span class="pitcher-name">Justin Verlander</span> 
<div class="fantasy-blue inline fantasy-data pitcher-salary-fd">,100</div>   
<small class="text-muted pitches">(R)</small> 
<small class="text-muted matchup">(@ BOS)</small></td>
        <td class="stat-stat-width fantasy-blue fantasy-points">
        <td class="stat-stat-width">0-3</td>
        <td class="stat-stat-width">6.62</td>
        <td class="stat-stat-width">1.50</td>
        <td class="stat-stat-width">5.82</td>
        <td class="stat-stat-width">3.18</td>
        <td class="stat-stat-width">2.12</td>
        <td class="stat-stat-width">5.67</td>
        <td class="stat-stat-width">1.03x</td>
        <td class="stat-stat-width">0.96x</td>
        <td class="stat-stat-width">1.09x</td>
        <td class="stat-stat-width">0.90x</td>
</tr> 

There are about 30 pitchers on the same page, all with the same structure.

Here's what I have so far:

test = $('span.pitcher-name').text(); gives me the names of all the pitchers joined into one string, not just one name.

Obviously I'm not even close... I can't figure out how to associate the children of each pitcher name with a JavaScript object. Any help is greatly appreciated!

Have you looked at the documentation? Further down it has plenty of examples of how to traverse the elements of a page.

For example:

$('span.pitcher-name').next() // the <div class="fantasy-blue inline fantasy-data pitcher-salary-fd">,100</div> element that follows the name
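Cheerio's .next() returns the next element sibling, skipping the whitespace text nodes that sit between tags. The raw next property on a parsed node does not skip them, which is why hand-written loops over next keep running into nodes with no attribs. A minimal, dependency-free sketch of the skipping behaviour, assuming domhandler-style nodes in which elements carry attribs and text nodes carry data; nextElement and the hand-built span/gap/div objects are hypothetical stand-ins for part of the first cell, not cheerio's own implementation:

```javascript
// Hypothetical helper: follow the raw `next` links until an element is found.
// Assumes domhandler-style nodes: elements have `attribs`, text nodes do not.
function nextElement(node) {
  let sibling = node.next;
  while (sibling && !sibling.attribs) {
    sibling = sibling.next; // skip text/whitespace nodes
  }
  return sibling || null;
}

// Simplified stand-ins for: <span class="pitcher-name">...</span> (whitespace) <div ...>
const div = {
  name: 'div',
  attribs: { class: 'fantasy-blue inline fantasy-data pitcher-salary-fd' },
  next: null,
};
const gap = { data: ' \n', next: div };
const span = { name: 'span', attribs: { class: 'pitcher-name' }, next: gap };

console.log(span.next === gap);      // true: the raw link lands on the text node
console.log(nextElement(span).name); // div
```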

It looks like what you want is the $().each() function.

This function lets you iterate over every matching element and run a callback on each one, like so:

var cheerio = require('cheerio');
var $ = cheerio.load(html); // html: the page source, e.g. the body returned by request

var someObjArr = [];

$('span.pitcher-name').each(function(i, element){

    //Get the text from cheerio.
    var text = $(this).text();

    //if undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){
        someObjArr[i] = {};
    }

    //Update the name property of our object with the text value.
    someObjArr[i].name = text;
});

//This pass relies on index i lining up with the pass above:
//the i-th salary must belong to the same row as the i-th name.
$('div.pitcher-salary-fd').each(function(i, element){

    //Get the text from cheerio.
    var text = $(this).text();

    //if undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){
        someObjArr[i] = {};
    }

    //Update the salary property of our object with the text value.
    someObjArr[i].salary = text;
});

console.log(someObjArr); //[ { name: 'Justin Verlander', salary: ',100' } ]
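Merging the two passes by index assumes every pitcher has exactly one name span and one salary div, in the same order; if one row lacked its salary, every later salary would silently shift onto the wrong pitcher. The merge step itself is plain JavaScript. A minimal sketch, using a hypothetical helper zipByIndex that takes the parallel lists (for example, texts pushed into two arrays inside the two .each() callbacks) and throws when their lengths disagree instead of misaligning:

```javascript
// Hypothetical helper: combine parallel lists into one object per index,
// e.g. { name: [...], salary: [...] } -> [ { name, salary }, ... ].
function zipByIndex(columns) {
  const keys = Object.keys(columns);
  if (keys.length === 0) return [];
  const length = columns[keys[0]].length;

  // Refuse to merge lists of different lengths: a missing entry would
  // shift every later value onto the wrong row.
  for (const key of keys) {
    if (columns[key].length !== length) {
      throw new Error(`"${key}" has ${columns[key].length} entries, expected ${length}`);
    }
  }

  const rows = [];
  for (let i = 0; i < length; i++) {
    const row = {};
    for (const key of keys) row[key] = columns[key][i];
    rows.push(row);
  }
  return rows;
}

console.log(zipByIndex({ name: ['Justin Verlander'], salary: [',100'] }));
// [ { name: 'Justin Verlander', salary: ',100' } ]
```

A length check only catches a missing entry, not two entries in a different order; grouping the values per table row avoids the alignment problem altogether.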

One of the nice things about this function is that it runs synchronously, so it behaves much like a for loop and is easy to follow.

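Keep in mind that you can print out each element's sub-elements through $(this) in the callback. This is especially useful when you need to work out exactly which element to target. For example: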

$('span.pitcher-name').each(function(i, element){

    //Wrap the current element in a cheerio object.
    var pitcherNameElement = $(this);

    //Print it, including the underlying node's properties.
    console.log(pitcherNameElement); 

});

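Now, retrieving something more abstract, such as a group of items that all sit in the same table row, gets a little more complicated. For that we call $().each on the table rows and then check whether each child's class matches. That way the row index i stays aligned with the index used for the names and salaries, as long as every tr on the page is a pitcher row.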

$('tr').each(function(i, element){

    //get all child nodes of this table row (cells and the whitespace text between them)
    var children = element.children;

    //this array will hold the matchup data
    var matchupArr = [];

    //class to extract
    var statClass = 'stat-stat-width';

    //loop over every child of the row
    for(var myInt=0; myInt<children.length; myInt++){

        var child = children[myInt];

        //text nodes, such as the whitespace between cells, have no attribs
        if(child.attribs == undefined){
            continue;
        }

        //exact match, so the "stat-stat-width fantasy-blue fantasy-points" cell is skipped;
        //empty cells have no text child, so skip those too
        if(child.attribs.class == statClass && child.children.length > 0){

            //push the cell's text to our matchup array
            matchupArr.push(child.children[0].data);
        }
    }

    //if undefined, create the object inside of our array.
    if(someObjArr[i] == undefined){
        someObjArr[i] = {};
    }

    //Update the matchup property of our object with our array.
    if(matchupArr.length > 0){
        someObjArr[i].matchups = matchupArr;
    }
});
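The row-filtering logic can be exercised without a network request, or even without cheerio, by feeding it plain objects shaped like parsed nodes, using the same field names (attribs, children, data) that the loop above reads. A minimal sketch under that assumption; cellTexts, td and row are hypothetical, and the row is a hand-built fragment of the one in the question:

```javascript
// Hypothetical helper: collect the text of every child element of `row`
// whose class attribute is exactly `className`. Assumes domhandler-style
// nodes: elements have `attribs` and `children`, text nodes have `data`.
function cellTexts(row, className) {
  const texts = [];
  for (const child of row.children) {
    // Text and comment nodes (e.g. whitespace between cells) have no attribs.
    if (!child.attribs) continue;
    // Exact comparison of the whole class attribute, as in the loop above.
    if (child.attribs.class !== className) continue;
    // Join the cell's direct text children; an empty cell yields ''.
    const text = (child.children || [])
      .map((node) => (typeof node.data === 'string' ? node.data : ''))
      .join('')
      .trim();
    texts.push(text);
  }
  return texts;
}

// Simplified stand-ins for a few cells of the row in the question.
const td = (cls, text) => ({
  name: 'td',
  attribs: { class: cls },
  children: text === undefined ? [] : [{ data: text }],
});
const row = {
  name: 'tr',
  children: [
    { data: '\n' },
    td('stat-stat-width fantasy-blue fantasy-points', '\n'),
    td('stat-stat-width', '0-3'),
    { data: '\n' },
    td('stat-stat-width', '6.62'),
    { data: '\n' },
    td('stat-stat-width', '1.03x'),
  ],
};

console.log(cellTexts(row, 'stat-stat-width')); // [ '0-3', '6.62', '1.03x' ]
```

Like the loop above, this compares the whole class attribute exactly, so the fantasy-points cell (whose class attribute also contains stat-stat-width) is skipped. A CSS selector such as td.stat-stat-width would match that cell too, because class selectors match any one class in the list.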

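This is a bit messy, but it shows the basic idea. Cheerio can also do the child filtering for you: .children(selector) selects the matching children of a parent P (and .find(selector) the matching descendants), and chaining .each() onto the result runs a callback on every such child C.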
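To get from the strings collected in matchups to the shape asked for in the question, with numbers rather than text, the conversion is ordinary JavaScript. A minimal sketch with a hypothetical toPitcher helper; only era (the second stat cell, 6.62) is identified in the question, so record and stats are illustrative names, and the salary is kept as the raw string because its text in the question (,100) does not parse as a number:

```javascript
// Hypothetical helper: build one pitcher object from a row's scraped strings.
// `cells` are the stat-cell texts for that row, in page order.
function toPitcher(name, salary, cells) {
  const [record, era, ...rest] = cells;
  return {
    name: name.trim(),
    salary: salary.trim(), // raw text, left unparsed
    record: record,        // e.g. '0-3', kept as a string
    era: parseFloat(era),  // '6.62' -> 6.62
    // parseFloat stops at the first character it cannot parse: '1.03x' -> 1.03
    stats: rest.map((s) => parseFloat(s)),
  };
}

const p = toPitcher('Justin Verlander', ',100', ['0-3', '6.62', '1.50', '1.03x']);
console.log(p.era, p.stats); // 6.62 [ 1.5, 1.03 ]
```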

Good luck, and happy scraping!