Remove duplicates from an array, but annotate the remaining row to show there were others

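I have a list of newspaper articles that I receive every day. Because many newspapers are part of large chains, I don't want to see every version of the same article, but I do want to see how many other outlets it ran in.

So, this is what I'd like to see:

Article 1 Source - National Post, also seen in Seattle Blaze, New York Times

Article 2 Source - Washington Post

I managed to do this with the code below, but it seems clunky.

Sample JSON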

    var data = {
        "articles": [
                    {
                        "id": "1",
                        "title": "xxxx'",
                        "body": "<p>Body goes here",
                        "publication": {
                            "id": 1,
                            "name": "National Post"
                        },
                        "articleUrl": "http://www.foo.com/1"
                    },
                    {
                        "id": "2",
                        "title": "yyyy'",
                        "body": "<p>Body goes here",
                        "publication": {
                            "id": 1,
                            "name": "Washington Post"
                        },
                        "articleUrl": "http://www.foo.com/2"
                    },
                    {
                        "id": "3",
                        "title": "xxxx'",
                        "body": "<p>Body goes here",
                        "publication": {
                            "id": 1,
                            "name": "Seattle Blaze"
                        },
                        "articleUrl": "http://www.foo.com/3"
                    },
                    {
                        "id": "4",
                        "title": "xxxx'",
                        "body": "<p>Body goes here",
                        "publication": {
                            "id": 1,
                            "name": "New York Times"
                        },
                        "articleUrl": "http://www.foo.com/4"
                    }
                ]
            }


    js.utils.RemoveDups = function RemoveDups(json) {
        // `json` here is the array of articles (data.articles)
        var articles = [];
        for (var i = 0; i < json.length; i++) {
            var seen = false;
            for (var j = 0; j < articles.length; j++) {
                if (json[i] != null && articles[j] != null) {
                    if (articles[j].title === json[i].title) {
                        seen = true;
                        // Append a link to the duplicate's outlet onto the copy we keep
                        articles[j].publication.name += ", <a href='" + json[i].articleUrl + "' target='_blank'>" + json[i].publication.name + '</a>';
                    }
                }
            }
            // First time this title shows up: keep it
            if (!seen) articles.push(json[i]);
        }
        return articles;
    };

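I am now working on the code below, which is more compact and probably faster. But because the line

    dataArr = data.map(function (item) { return item.title });

gives me only the titles rather than the full objects, I can't get at the publication name of the item I'm removing.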

    //Clean the Data
    if (json != null) {

        var data = json.articles,
            dataArr = data.map(function (item) { return item.title });

        //Remove Duplicates
        dataArr.some(function (item, index) {
            var isDuplicate = dataArr.indexOf(item, index + 1) !== -1;
            if (isDuplicate) {
                data[index].publication.name = data[index].publication.name + ',' + item[index].publication.name //<- don't have full object
                data = removeDuplicate(data, item);
            }
        });
    }

    function removeDuplicate(data, title) {
        $.each(data, function (index) {
            if (this.title == title) {
                data.splice(index, 1);
                return false;
            }
        });
        return data;
    }

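Bonus question: I'm not entirely sure what criteria are used to decide which copy is kept and which is removed. Ideally, I'd like to keep the version whose item object has the highest wordCount (item.wordCount).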

First of all, don't use an array. Use an object whose keys are the article titles.

    js.utils.RemoveDups = function RemoveDups(json) {
        var articles = {};
        json.articles.forEach(function(a) {
            if (a.title in articles) {
                articles[a.title].publication.name += ', ' + a.publication.name;
            } else {
                articles[a.title] = a;
            }
        });
        return articles;
    };

If you need to turn the result back into an array, replace return articles; with:

    return Object.keys(articles).map(function(title) {
        return articles[title];
    });
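
As for the bonus question: with the approach above, the first article seen for each title is the one kept. If you would rather keep the copy with the highest wordCount, something along these lines might work. This is only a sketch under a few assumptions: every article carries a numeric `wordCount` (a missing value is treated as 0), ties go to the first copy seen, and the names `removeDupsKeepLongest` and `alsoSeenIn` are made up for illustration. It uses a `Map` (ES2015) so the titles stay in first-seen order, and it returns copies rather than modifying the input articles.

```javascript
// Sketch: group articles by title, keep the copy with the highest wordCount,
// and record the other outlets in a hypothetical `alsoSeenIn` array.
function removeDupsKeepLongest(articles) {
    var byTitle = new Map(); // title -> { best: article, all: [articles] }

    articles.forEach(function (a) {
        var entry = byTitle.get(a.title);
        if (!entry) {
            byTitle.set(a.title, { best: a, all: [a] });
            return;
        }
        entry.all.push(a);
        // Assumption: a missing wordCount counts as 0; ties keep the earlier copy
        if ((a.wordCount || 0) > (entry.best.wordCount || 0)) {
            entry.best = a;
        }
    });

    return Array.from(byTitle.values()).map(function (entry) {
        var others = entry.all
            .filter(function (a) { return a !== entry.best; })
            .map(function (a) { return a.publication.name; });
        // Shallow copy so the input objects are left untouched
        return Object.assign({}, entry.best, { alsoSeenIn: others });
    });
}
```

You could then build the display string from `publication.name` and `alsoSeenIn` when rendering, instead of concatenating names into `publication.name` itself.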