Parsing markdown to get a Table of Contents

A Table of Contents (TOC) is a list of the headings and subheadings in a piece of content. It acts as an outline, so readers can quickly scan the topics covered and jump directly to the parts they want to read.

Instead of manually copying and pasting all the headings to make a TOC, you can automatically generate it using some code.

utils.ts
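const getTableOfContents = (contentString: string) => {
  // 1-6 '#' at the start of a line, then spaces or tabs, then the heading text
  const headingRegex = /^(#{1,6})[ \t]+(.*)$/gm;
  const headings: { level: number; text: string }[] = [];
  let match: RegExpExecArray | null;
  // Loop through the content string and extract headings using the regex
  while ((match = headingRegex.exec(contentString)) !== null) {
    const level = match[1].length; // Number of '#' symbols
    const text = match[2].trim(); // Heading text
    headings.push({ level, text });
  }
  return headings;
};

Here's how it works:

  • contentString is the markdown string passed to the function.
  • The heading regex /^(#{1,6})[ \t]+(.*)$/gm matches one to six # symbols at the start of a line, followed by at least one space or tab and then the heading text. The m flag makes ^ and $ match at line boundaries, so a # in the middle of a line is not treated as a heading. Using [ \t]+ instead of \s+ keeps the match on a single line.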
  • Inside the loop:
    • The match variable is updated with the next match found by the regular expression.
    • The heading level is determined by the number of # symbols.
    • The heading text is extracted by trimming the text that follows the # symbols.

Each heading is pushed to the headings array as an object, and the function returns that array.
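One case a regex on its own does not handle is a line that starts with # inside a fenced code block, such as a shell comment. Below is a minimal sketch of one way to skip those lines. It assumes fences are lines starting with three or more backticks or tildes, and it uses a line-anchored heading pattern. The name getHeadingsOutsideCode and the sample input are hypothetical, not part of the original function.

```typescript
type Heading = { level: number; text: string };

// Hypothetical variant: walk the markdown line by line so that lines inside
// fenced code blocks can be ignored.
const getHeadingsOutsideCode = (contentString: string): Heading[] => {
  const headings: Heading[] = [];
  let fence: string | null = null; // fence character of the open block, if any

  for (const line of contentString.split(/\r?\n/)) {
    const fenceMatch = /^\s*(`{3,}|~{3,})/.exec(line);
    if (fenceMatch) {
      const marker = fenceMatch[1][0]; // "`" or "~"
      if (fence === null) fence = marker; // opening fence
      else if (fence === marker) fence = null; // closing fence (simplified: fence length is not compared)
      continue;
    }
    if (fence !== null) continue; // inside a code block, not a heading

    const match = /^(#{1,6})[ \t]+(.*)$/.exec(line);
    if (match) headings.push({ level: match[1].length, text: match[2].trim() });
  }
  return headings;
};

// Sample input with a tilde-fenced block containing a '#' comment
const sample = [
  "## Install",
  "~~~sh",
  "# this is a shell comment, not a heading",
  "npm install",
  "~~~",
  "### Usage",
].join("\n");

console.log(getHeadingsOutsideCode(sample));
// [ { level: 2, text: 'Install' }, { level: 3, text: 'Usage' } ]
```

If your posts never contain code blocks with lines starting with #, the simpler regex loop is enough.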

Output

Let’s see the output for the blog post How I'm Writing CSS in 2024 by leerob.

toc.json
[
  {
    "level": 2,
    "text": "Design Constraints"
  },
  {
    "level": 3,
    "text": "User Experience"
  },
  {
    "level": 3,
    "text": "Developer Experience"
  },
  {
    "level": 2,
    "text": "CSS in 2024"
  },
  {
    "level": 3,
    "text": "Build Steps"
  },
  {
    "level": 3,
    "text": "Compilation"
  },
  {
    "level": 3,
    "text": "Streaming CSS"
  },
  {
    "level": 2,
    "text": "My Recommendations"
  },
  {
    "level": 3,
    "text": "CSS Modules"
  },
  {
    "level": 3,
    "text": "Tailwind CSS"
  },
  {
    "level": 3,
    "text": "StyleX"
  },
  {
    "level": 2,
    "text": "Conclusion"
  }
]
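The output is a flat list, with nesting only implied by level. If a tree is easier to render, the list can be nested with a small stack-based pass. This is a sketch under that assumption; buildTocTree and TocNode are hypothetical names, and the input is a shortened copy of the output above.

```typescript
type Heading = { level: number; text: string };
type TocNode = Heading & { children: TocNode[] };

// Hypothetical helper: nest each heading under the closest preceding heading
// that has a smaller level.
const buildTocTree = (headings: Heading[]): TocNode[] => {
  const roots: TocNode[] = [];
  const stack: TocNode[] = []; // chain of currently open ancestors

  for (const heading of headings) {
    const node: TocNode = { ...heading, children: [] };
    // Close every open heading at the same or a deeper level
    while (stack.length > 0 && stack[stack.length - 1].level >= node.level) {
      stack.pop();
    }
    const parent = stack[stack.length - 1];
    if (parent) parent.children.push(node);
    else roots.push(node);
    stack.push(node);
  }
  return roots;
};

const tree = buildTocTree([
  { level: 2, text: "Design Constraints" },
  { level: 3, text: "User Experience" },
  { level: 3, text: "Developer Experience" },
  { level: 2, text: "CSS in 2024" },
]);

console.log(JSON.stringify(tree, null, 2));
```

Here "Design Constraints" ends up with two children and "CSS in 2024" with none.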

Each entry in the TOC corresponds to a heading in the blog post, and each can be rendered as a link anchored to that heading. In Next.js, you might call this function from a script that processes your markdown files before the static HTML is generated. In other frameworks, you can call it from your build process or build script in the same way.
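To turn the entries into anchor links, each heading text needs an id that matches the one your markdown renderer puts on the heading element. The sketch below assumes a simple ASCII-only slug rule, which may not match your renderer, so check the generated ids before relying on it. slugify and renderToc are hypothetical helpers.

```typescript
type Heading = { level: number; text: string };

// Hypothetical slug rule: lowercase, drop punctuation, spaces to hyphens.
// ASCII only; real renderers have their own rules.
const slugify = (text: string): string =>
  text
    .toLowerCase()
    .trim()
    .replace(/[^a-z0-9\s-]/g, "") // drop anything that is not a letter, digit, space or hyphen
    .replace(/\s+/g, "-"); // collapse whitespace runs into single hyphens

// Render the TOC as an indented markdown list of anchor links
const renderToc = (headings: Heading[]): string => {
  if (headings.length === 0) return "";
  const minLevel = Math.min(...headings.map((h) => h.level));
  return headings
    .map((h) => `${"  ".repeat(h.level - minLevel)}- [${h.text}](#${slugify(h.text)})`)
    .join("\n");
};

console.log(
  renderToc([
    { level: 2, text: "CSS in 2024" },
    { level: 3, text: "Build Steps" },
  ])
);
// - [CSS in 2024](#css-in-2024)
//   - [Build Steps](#build-steps)
```

Indentation is relative to the shallowest heading, so a post whose top-level headings are ## still produces a list that starts at the left margin.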