Count duplicate lines from a file using Node.js

I have to read a large .csv file line by line, extract the country from the first column, and count the duplicates. For example, if the file contains:

USA
UK
USA

The output should be:

USA - 2
UK - 1

Code:

const fs = require('fs')
const readline = require('readline')

const file = readline.createInterface({
    input: fs.createReadStream('file.csv'),
    output: process.stdout,
    terminal: false
})

file.on('line', line => {
    const country = line.split(",", 1)
    const number = ??? // don't know how to check duplicates
    const result = country + number

    if(lineCount >= 1 && country != `""`) {
        console.log(result)
    }
    lineCount++
})

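So for starters, String.prototype.split returns an array. Since you limit the split to one element, it looks like you want the first value in that array rather than the array itself. You can read about it here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split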
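
A small sketch of that behaviour, assuming a made-up sample row (the row contents are hypothetical and not taken from your file):

```javascript
// Hypothetical sample row, used only for illustration
const line = "USA,Washington,331000000"

// Even with a limit of 1, split returns an array (here with one element)
const parts = line.split(",", 1)

// Destructuring pulls the first element out as a plain string
const [country] = parts

console.log(Array.isArray(parts)) // true
console.log(parts.length)         // 1
console.log(country)              // USA
```

Concatenating the array directly, as in country + number, only appears to work because a one-element array is converted to a string; taking the first element makes the intent explicit.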

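Next, you can create a map of all the countries that stores how many times each one appears, and then log the results once the file has finished being read.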


const countries = {}
let lineCount = 0
file.on('line', line => {
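    // Destructure the array to grab the first column, then trim it so that
    // " USA" and "USA" are counted under the same key
    const [firstColumn] = line.split(",", 1)
    const country = firstColumn.trim()
    // Skip the header row (line 0) and rows whose first column is empty
    if (lineCount >= 1 && country !== "") {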
        // If the country is not in the map, then store it
        if (!countries[country]) {
            countries[country] = 1
        } else {
            countries[country]++
        }
    }
    lineCount++
})

// Add another event listener for when the file has finished being read
// You may access the country data here, since this callback function
// won't be called till the file has been read
// https://nodejs.org/api/readline.html#event-close
file.on('close', () => {
    for (const country in countries) {
        console.log(`${country} - ${countries[country]}`)
    }
})
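
If you would rather have the counts returned from a function than logged inside an event handler, the same idea can be sketched with the async iterator that readline interfaces provide. This is only a sketch under a few assumptions: countCountries and its skipHeader option are hypothetical names I made up, the first line is assumed to be a header, and quoted fields that contain commas are not handled (a real CSV parser would be needed for that). It uses a Map instead of a plain object so that a value such as "constructor" in the first column cannot collide with inherited object properties.

```javascript
const fs = require('fs')
const readline = require('readline')

// Hypothetical helper: resolves to a Map of country -> number of occurrences
async function countCountries(path, { skipHeader = true } = {}) {
    const rl = readline.createInterface({
        input: fs.createReadStream(path),
        crlfDelay: Infinity // treat \r\n as a single line break
    })

    const counts = new Map()
    let lineCount = 0

    // Lines are read one at a time, so the whole file is never held in memory
    for await (const line of rl) {
        const isHeader = skipHeader && lineCount === 0
        lineCount++
        if (isHeader) continue

        // First column only, with surrounding whitespace removed
        const country = line.split(",", 1)[0].trim()
        if (country === "") continue

        counts.set(country, (counts.get(country) || 0) + 1)
    }

    return counts
}
```

It could then be used along the lines of countCountries('file.csv').then(counts => { for (const [country, n] of counts) console.log(`${country} - ${n}`) }), which prints one "country - count" line per distinct country in the order each was first seen.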