Split a string whilst keeping punctuation
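I have a text with roughly 1500 - 2000 characters. This text should be split into blocks of roughly 400 characters each (they do not have to be exactly 400 characters). However, it should not simply split the text every 400 characters; it should split only at places where there is a full stop. So basically, divide one big text into several chunks without destroying the punctuation.
Any ideas?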
We can try reduce.
I'm using 500 as the limit here because your text is < 400.
let str = `I have a text with roughly 1500 - 2000 characters. This text should be split into blocks of text with roughly 400 characters each (does not have to be exactly 400 characters). However it should not just split the text every 400 characters, but split the text only at places where there is a full-stop. So basically divide one big text into several chunks without destroying punctuation. I have a text with roughly 1500 - 2000 characters. This text should be split into blocks of text with roughly 400 characters each (does not have to be exactly 400 characters). However it should not just split the text every 400 characters, but split the text only at places where there is a full-stop. So basically divide one big text into several chunks without destroying punctuation. I have a text with roughly 1500 - 2000 characters. This text should be split into blocks of text with roughly 400 characters each (does not have to be exactly 400 characters). However it should not just split the text every 400 characters, but split the text only at places where there is a full-stop. So basically divide one big text into several chunks without destroying punctuation.`
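// Split into sentences; the lookbehind keeps each full stop attached to its sentence.
const lines = str.split(/(?<=\.)\s+/)
  .reduce((acc, line, i) => {
    // Start a new chunk for the first sentence, or when appending would exceed 500 characters.
    if (i === 0 || acc[acc.length - 1].length + 1 + line.length > 500) acc.push(line)
    // Otherwise append the sentence to the current chunk.
    else acc[acc.length - 1] += " " + line
    return acc
  }, [])
// One chunk per line, followed by the length of each chunk.
const res = lines.join("\n")
console.log(res, lines.map(line => line.length))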
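If you need this in more than one place, the same idea can be wrapped in a small function. This is only a sketch: it assumes sentences end with a full stop followed by whitespace, and the names splitIntoChunks and maxLen are made up for illustration. A single sentence longer than maxLen is kept whole rather than cut.

```javascript
// Hypothetical helper: splitIntoChunks and maxLen are illustrative names, not from the question.
function splitIntoChunks(text, maxLen = 400) {
  // Split after each full stop; the lookbehind keeps the "." attached to its sentence.
  const sentences = text.trim().split(/(?<=\.)\s+/);
  const chunks = [];
  for (const sentence of sentences) {
    const last = chunks.length - 1;
    // Append to the current chunk if the result (including the joining space) still fits.
    if (last >= 0 && chunks[last].length + 1 + sentence.length <= maxLen) {
      chunks[last] += " " + sentence;
    } else {
      // Otherwise start a new chunk; an over-long sentence stays in one piece.
      chunks.push(sentence);
    }
  }
  return chunks;
}

const sample = "One. Two two. Three three three. Four.";
console.log(splitIntoChunks(sample, 15));
// → [ 'One. Two two.', 'Three three three.', 'Four.' ]
```

Joining the chunks back together with a single space gives the original text again, as long as the sentences were separated by single spaces to begin with.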