Creating nested objects from a flat array of strings based on a number of specific characters
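I have a large text from which I extracted the heading strings using a regular expression. Headings start with 1-6 hash characters (#). Here is an example of the input array: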
const content = [
  "#1",
  "##1a",
  "###1a1",
  "###1a2",
  "##1b",
  "#2",
  "#3",
  "##3a",
  "##3b",
  "#4",
];
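The heading level (the number of hash characters at the start of the string) describes where a heading sits in the chapter hierarchy. I want to parse my input into an array of heading objects, each containing the heading text without the hash characters plus the chapters nested under it. The desired output for the array above is: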
export interface Heading {
  chapters: Heading[];
  text: string;
}
const headings: Heading[] = [
  {
    text: "1",
    chapters: [
      {
        text: "1a",
        chapters: [
          { text: "1a1", chapters: [] },
          { text: "1a2", chapters: [] },
        ],
      },
      { text: "1b", chapters: [] },
    ],
  },
  { text: "2", chapters: [] },
  {
    text: "3",
    chapters: [
      { text: "3a", chapters: [] },
      { text: "3b", chapters: [] },
    ],
  },
  { text: "4", chapters: [] },
];
I tried to write a function that parses the strings, but got stuck on how to know which heading in the output the current string belongs to:
export const getHeadings = (content: string[]): Heading[] => {
  let headingLevel = 2;
  let headingIndex = 0;
  const allHeadings = content.reduce((acc, currentHeading) => {
    const hashTagsCount = countHastags(currentHeading);
    const sanitizedHeading = currentHeading.replace(/#/g, "").trim();
    const heading = {
      chapters: [],
      text: sanitizedHeading,
    };
    if (hashTagsCount === headingLevel) {
      headingIndex = headingIndex + 1;
    } else {
      headingIndex = 0;
    }
    headingLevel = hashTagsCount;
    if (hashTagsCount === 2) {
      acc.push(heading);
    } else if (hashTagsCount === 3) {
      if (acc.length === 0) {
        return acc;
      }
      if (acc.length === 1) {
        acc[acc.length - 1]["chapters"].push(heading);
      }
    } else if (acc.length === 2) {
      acc[acc.length - 1]["chapters"][headingIndex]["chapters"].push(heading);
    } else if (acc.length === 3) {
      acc[acc.length - 1]["chapters"][headingIndex]["chapters"][headingIndex][
        "chapters"
      ].push(heading);
    }
    return acc;
  }, []);
  return allHeadings;
};
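While this works for very simple cases, it does not scale and has the heading levels hard-coded (via the if statements). How can I rewrite it so that the number of levels (hash characters) does not matter?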
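The attempt above calls countHastags, which is never shown. A minimal sketch of what such a helper might look like, assuming it is meant to count only the leading # characters; the names countLeadingHashes and stripLeadingHashes are hypothetical. The second helper is included because replace(/#/g, "") would also delete any # inside the heading text itself.

```typescript
// Hypothetical helper: the question does not show `countHastags`, so this is
// only an assumption about its intended behaviour.
function countLeadingHashes(heading: string): number {
  let count = 0;
  // Walk from the start until the first character that is not '#'.
  while (count < heading.length && heading.charAt(count) === "#") {
    count += 1;
  }
  return count;
}

// Hypothetical helper: removes only the leading run of '#', then trims,
// so a '#' that is part of the title survives.
function stripLeadingHashes(heading: string): string {
  return heading.slice(countLeadingHashes(heading)).trim();
}
```

With these, "###1a1" yields level 3 and text "1a1", while a string without a leading # yields level 0 and can be rejected by the caller.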
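With a reduce-based approach, one can keep track of the correct (nested) chapters array into which each new chapter item needs to be pushed.
Thus the accumulator can be an object which, in addition to the result array, carries an index/map of the chapters arrays for the nesting levels being tracked.
Each heading string being reduced is decomposed into its '#' (hash) based flag and its text content part. This is done with the help of the following regular expression, which features named capturing groups:
/^(?<flag>#+)\s*(?<content>.*?)\s*$/
The number of hashes (flag.length) gives the current nesting level.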
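The decomposition step can be looked at in isolation. The sketch below is an illustration, not part of the answer's code: splitHeading and HeadingParts are names assumed here, and it uses numbered capturing groups instead of the named ones purely so that it compiles under older TypeScript targets.

```typescript
interface HeadingParts {
  level: number; // number of leading '#' characters
  text: string; // heading text with surrounding whitespace removed
}

// Hypothetical helper: splits one heading string into level and text.
function splitHeading(heading: string): HeadingParts | null {
  // Group 1 is the leading run of '#'. Group 2 is matched lazily so that
  // trailing whitespace is consumed by the final \s* instead of the text.
  const match = /^(#+)\s*(.*?)\s*$/.exec(heading);
  if (match === null) {
    return null; // no leading '#', so not a heading
  }
  const [, flag = "", text = ""] = match;
  return { level: flag.length, text };
}
```

A heading that consists of hashes only, such as "#### ", still matches and produces an empty text, which is why the reducer in the answer substitutes a placeholder for missing content.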
function traceAndAggregateChapterHierarchy({ chaptersMap = {}, result }, heading) {
  const {
    flag = '',
    content = '',
  } = (/^(?<flag>#+)\s*(?<content>.*?)\s*$/)
    .exec(heading)
    ?.groups ?? {};
  const nestingLevel = flag.length;
  // ensure a valid `heading` format.
  if (nestingLevel >= 1) {
    let chapters;
    if (nestingLevel === 1) {
      // reset map.
      chaptersMap = {};
      // level-1 chapter items need to be pushed into `result`.
      chapters = result;
    } else {
      // create/access the deep nesting level specific `chapters` array.
      chapters = (chaptersMap[nestingLevel] ??= []);
    }
    // create a new chapter item.
    const chapterItem = {
      text: content || '$$ missing header content $$',
      chapters: [],
    };
    // create/reassign the next level's `chapters` array.
    chaptersMap[nestingLevel + 1] = chapterItem.chapters;
    // push new item into the correct `chapters` array.
    chapters.push(chapterItem);
  }
  return { chaptersMap, result };
}
const content = [
  "# The quick brown (1) ",
  "## fox jumps (1a)",
  "###over (1a1)",
  "#### ",
  "###the (1a2)",
  "## lazy dog (1b)",
  "# Foo bar (2)",
  "# Baz biz (3)",
  "##buzz (3a) ",
  "##booz (3b) ",
  "# Lorem ipsum (4) ",
  "##",
];
const { result: headings } = content
  .reduce(traceAndAggregateChapterHierarchy, { result: [] });
console.log({ content, headings });
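For comparison, the same bookkeeping can be done with a stack of currently open ancestors instead of a level-indexed map. This is an alternative sketch rather than the answer's code: buildHeadingTree is a hypothetical name, and the policy for skipped or orphaned levels (attach to the nearest shallower heading, or to the root list if there is none) is an assumption.

```typescript
interface Heading {
  text: string;
  chapters: Heading[];
}

// Hypothetical alternative: builds the tree with a stack of open ancestors.
function buildHeadingTree(lines: string[]): Heading[] {
  const roots: Heading[] = [];
  // Currently open ancestors, shallowest first.
  const open: { level: number; node: Heading }[] = [];

  for (const line of lines) {
    const match = /^(#+)\s*(.*?)\s*$/.exec(line);
    if (match === null) {
      continue; // skip lines without a leading '#'
    }
    const [, flag = "", text = ""] = match;
    const level = flag.length;
    const node: Heading = { text, chapters: [] };

    // Close every open heading at the same or a deeper level.
    while (open.length > 0 && open[open.length - 1].level >= level) {
      open.pop();
    }
    // Attach to the nearest shallower ancestor, or to the root list.
    const siblings =
      open.length > 0 ? open[open.length - 1].node.chapters : roots;
    siblings.push(node);
    open.push({ level, node });
  }
  return roots;
}
```

Because the stack is popped on every heading, no stale entry for a deeper level can linger after the hierarchy moves back up, and the number of levels is not limited.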
A short solution without mutable state :)
It works by recursively stripping the first # and grouping the headings.
const content = [
  "#1",
  "##1a",
  "###1a1",
  "###1a2",
  "##1b",
  "#2",
  "#3",
  "##3a",
  "##3b",
  "#4",
];
const getNesting = (arr) =>
  arr
    .map((str) => str.slice(1)) // remove first #
    .reduce(
      (acc, cur) =>
        // group heading level: a string that still starts with # belongs
        // to the group opened by the last top-level heading
        cur.match(/^#/)
          ? [...acc.slice(0, -1), acc.at(-1).concat(cur)]
          : [...acc, [cur]],
      []
    )
    .map(([text, ...subh]) => ({
      text,
      // recursive call; `subh` is always an array, and an empty one yields []
      chapters: getNesting(subh),
    }));
console.log(JSON.stringify(getNesting(content)));
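To see what the grouping step hands to the recursive call, it can be run on its own. The sketch below is only an illustration: groupByTopLevel is a hypothetical name, it uses a plain loop with a local array for readability rather than the immutable reduce, and the guard for input that starts at a deeper level is an addition.

```typescript
// Hypothetical helper: strips one '#' from every line and groups each
// top-level heading with the lines nested under it.
function groupByTopLevel(lines: string[]): string[][] {
  const groups: string[][] = [];
  for (const line of lines) {
    const rest = line.slice(1); // drop the first '#'
    const current = groups[groups.length - 1];
    if (rest.charAt(0) === "#" && current !== undefined) {
      current.push(rest); // still nested: belongs to the open group
    } else {
      // A top-level heading opens a new group. A nested line with no open
      // group also lands here and keeps its remaining '#' in the text.
      groups.push([rest]);
    }
  }
  return groups;
}
```

For the example input this gives four groups, the first being "1" followed by "#1a", "##1a1", "##1a2" and "#1b". The first element of each group becomes the heading text, and the remaining elements, now one level shallower, are what the recursion processes next.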