Count duplicate lines from a file using Node.js
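I have to read a large .csv file line by line, then take the country from the first column of the file and count the duplicates.
For example, if the file contains: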
USA
UK
USA
The output should be:
USA - 2
UK - 1
Code:
const fs = require('fs')
const readline = require('readline')
const file = readline.createInterface({
  input: fs.createReadStream('file.csv'),
  output: process.stdout,
  terminal: false
})
file.on('line', line => {
  const country = line.split(",", 1)
  const number = ??? // don't know how to check duplicates
  const result = country + number
  if(lineCount >= 1 && country != `""`) {
    console.log(result)
  }
  lineCount++
})
For starters, String.prototype.split returns an array. Since you limit the split to one element, it looks like you want the first value of that array. You can read about it here: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/split
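As a quick illustration of that behaviour, here is a minimal sketch. The sample row is made up for the example and is not taken from the question's file.

```javascript
// Hypothetical CSV row, used only to show what split returns
const line = 'USA,2020,42'

// The limit of 1 keeps only the first piece, but the result is still an array
const parts = line.split(',', 1)
console.log(Array.isArray(parts)) // true
console.log(parts.length) // 1

// Destructuring pulls the string out of the array
const [country] = parts
console.log(country) // USA
```

This is why `country + number` in the question concatenates an array rather than a plain string.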
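Next, you can create a map of all the countries and store the number of times each one occurs, then log the results once the file has been read.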
const countries = {}
let lineCount = 0
file.on('line', line => {
  // Destructure the array and grab the first value
  const [firstColumn] = line.split(",", 1)
  // Trim outer white space so " USA" and "USA" are counted under the same key
  const country = firstColumn.trim()
  // lineCount >= 1 skips the first line (presumably a header row)
  if (lineCount >= 1 && country !== "") {
    // If the country is not in the map, then store it; otherwise increment it
    if (!countries[country]) {
      countries[country] = 1
    } else {
      countries[country]++
    }
  }
  lineCount++
})
// Add another event listener for when the file has finished being read
// You may access the country data here, since this callback function
// won't be called till the file has been read
// https://nodejs.org/api/readline.html#event-close
file.on('close', () => {
  for (const country in countries) {
    console.log(`${country} - ${countries[country]}`)
  }
})
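For comparison, the same idea can be sketched with a `Map` and the async iterator that `readline` interfaces expose, instead of a plain object and event listeners. This is only one possible variant, not the approach above: the names `tallyLine` and `countCountries` are hypothetical, and the check for a literal `""` value is an assumption carried over from the question's code.

```javascript
const fs = require('fs')
const readline = require('readline')

// Hypothetical helper: add the first column of one CSV line to the tally
function tallyLine(counts, line) {
  const country = line.split(',', 1)[0].trim()
  // Assumption: an empty or literally quoted-empty first column is ignored
  if (country !== '' && country !== '""') {
    counts.set(country, (counts.get(country) || 0) + 1)
  }
  return counts
}

// Hypothetical wrapper: stream the file and tally every line after the first
async function countCountries(path) {
  const rl = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity // treat \r\n as a single line break
  })
  const counts = new Map()
  let lineCount = 0
  for await (const line of rl) {
    if (lineCount >= 1) tallyLine(counts, line)
    lineCount++
  }
  return counts
}

// In-memory demonstration with the sample data from the question
const demo = new Map()
for (const line of ['USA', 'UK', 'USA']) tallyLine(demo, line)
for (const [country, count] of demo) console.log(`${country} - ${count}`)
```

With a real file, something like `countCountries('file.csv').then(counts => console.log(counts))` would print the tally once the stream has ended. A `Map` also avoids collisions with inherited object keys such as `constructor`, which a plain object used as a dictionary can run into.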