Designing Blog Infrastructure, Part 1

Explorations into the runtime of building a statically generated blog

December 26, 2022

2,587 words, ~ 13 min read

code

I started learning Next.js with an understanding of React, React Native, and TypeScript. I worked through the tutorial that builds a basic blog app to learn the fundamentals, then started building on my own. This blog is largely inspired by that tutorial, with some minor changes.

Effectively, this is a custom content management system. Usually, these systems have a frontend UI (a website) where users can upload or edit information, which is then utilized to generate the underlying content. In this case, it's more like a set of utilities that take information from a _posts directory and generate dynamic information on pages throughout the site, triggered at build time.

Along the way, I will be mentioning both the runtime (time complexity) and the design considerations of functions. Runtime refers to the asymptotic performance in relation to the input as the input scales to infinity; it is an application of limits from calculus. It provides an understanding of growth: constants are dropped, and the resulting order of growth is used to classify functions. In many large-scale applications, inputs can be so large that a function with a smaller order of growth will ultimately be faster even if it has higher overhead.
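To make the point about overhead concrete, here is a small sketch comparing two made-up cost models. The constant 100 and the function names are assumptions chosen purely for illustration; they do not measure anything on this site.

```typescript
// Hypothetical cost models: a linearithmic algorithm with a large constant
// factor, and a quadratic algorithm with no overhead at all.
const costLinearithmic = (n: number): number => 100 * n * Math.log2(n);
const costQuadratic = (n: number): number => n * n;

// Returns the smallest n >= 2 at which the linearithmic model becomes cheaper,
// or null if that never happens within the search limit.
function findCrossover(limit: number): number | null {
    for (let n = 2; n <= limit; n++) {
        if (costLinearithmic(n) < costQuadratic(n)) return n;
    }
    return null;
}

const crossover = findCrossover(1000000);
console.log(crossover); // a little under 1,000 with these made-up constants
```

Below the crossover, the "slower" quadratic model wins, which is why a blog with a few dozen posts has little to gain from the asymptotically better option.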

In this particular case, the number of posts remains relatively small, so it's a minimal concern. The focus remains on creating a system that is easily maintainable.

Build Time
Functions

Build Time

Similar to how compiled languages do their work ahead of time, Next.js (the framework used to build this site) provides the option to render webpages on the server side. As webpages became more intricate, the time it takes for browsers to load sites increased; in an effort to decrease it, the server compiles code to static pages when possible, then sends them over to the client.

All of the posts are written in Markdown format, including this post. It makes it very simple to focus on the content, using simpler syntax to indicate links and headings. There is also front matter, metadata that can be used and later accessed with the help of libraries.

These posts are stored in a _posts directory. Next.js has the concept of getStaticProps - a function invoked at build time, before a page is rendered, to get the data that page needs. This triggers a few calls:

Node, the JavaScript runtime, is used to read all of the files in the _posts directory to get the metadata.
The metadata is then sorted by date to render on the homepage of the blog and to get the most recent post for the site's main homepage.
Each individual blog post is rendered from a template page; it has its own getStaticProps to query the content of a given file using the file name. This content is then passed through the remark library, accounting for math and code markup, link handling, styles, etc.
The resulting HTML is injected into the respective blog page, displayed for the user.
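The per-post template step above can be sketched roughly as follows. This is a minimal illustration, not the site's actual code: the renderedPosts map is a hypothetical in-memory stand-in for reading _posts and running remark, and the type names are made up. The return shapes ({ paths, fallback }, { props }, { notFound }) follow the Next.js conventions for getStaticPaths and getStaticProps.

```typescript
// Hypothetical stand-in for the _posts directory after markdown rendering
// (post id -> rendered HTML). The real site reads files and runs remark.
const renderedPosts: Record<string, string> = {
    "hello-world": "<p>Hello, world.</p>",
    "second-post": "<p>Another post.</p>",
};

type PostParams = { params: { id: string } };
type PostProps = { id: string; contentHtml: string };
type StaticPropsResult = { props: PostProps } | { notFound: true };

// Tells Next.js which post pages exist so each can be generated at build time.
export function getStaticPaths(): { paths: PostParams[]; fallback: boolean } {
    const paths = Object.keys(renderedPosts).map((id) => ({ params: { id } }));
    return { paths, fallback: false };
}

// Runs once per path at build time; the returned props go to the page component.
export function getStaticProps({ params }: PostParams): StaticPropsResult {
    const contentHtml = renderedPosts[params.id];
    if (contentHtml === undefined) return { notFound: true };
    return { props: { id: params.id, contentHtml } };
}
```

Because both functions run during the build, the user only ever receives the finished HTML.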

All of this happens before the user ever navigates to the page, on the server side. It feels lightning fast, with excellent load speeds, despite taking a non-trivial amount of time to piece together.

Picking a Framework

Blogs can be built with many platforms, frameworks, and libraries. Among the most popular are Blogger, Medium, Gatsby, 11ty, and Jekyll. There are many more website builders, but from the (minimal) research that I did, these were the ones that I was mainly considering.

My personal site was first built in HTML and CSS, back in 2020 when I was first learning to code. After learning React, I decided to pick up Next.js, mainly for SEO reasons. I rebuilt my personal site, using a lot of similar styles but writing code in a much more maintainable way. Gone were the days of having to ctrl+f for the footer to make one small change, replaced by a Footer component that lent itself to easy updates and styling.

Initially, I wanted to use Jekyll. I was heavily inspired by many blogs using the Minima theme, specifically the dark variation.

The major issue was getting the Ruby gems to work right. I already had the gems from running learn.mdb.dev, which I didn't want to mess around with. I haven't had to use Ruby for anything, and I didn't see the need to figure out having multiple versions and whatnot on Windows for a site. The other issue I didn't want to figure out was routing. If I used Jekyll, I would most likely have to set up a redirect to a different domain, such as blog.aniruthn.com, similar to what I do for my projects (pathless.aniruthn.com and tbase.aniruthn.com).

If I just used Next.js for my blog, then this problem disappears. I just integrate it into the current website. The initial tutorial that I had done to first learn Next.js was literally building a blog. I looked over it, realized that it would work well, and went to work expanding infrastructure.

Other static-site generators, like Gatsby and 11ty, were briefly considered. I wanted to focus on making content and writing; the whole reason I even looked into making a blog was to take the thoughts I had and put them on paper (in a metaphorical sense). I didn't want to take the time to learn these other frameworks, because Next.js was good enough. Even if these were nicer or faster, I didn't have a problem with Next.js, so I wasn't looking to switch.

Sitemap Generator

Sitemaps are a crucial part of any site. That is, if you care about the site being picked up by search engines and their crawlers. They contain information about the pages present in a given site. This site has both XML and txt sitemaps.

Note that this site does not generate nearly enough traffic to make optimizing every bit of SEO worthwhile, nor do I have a profit incentive to do so. Additionally, the pages on this site do not change often enough for the sitemap to need dynamic generation. The Makefile I use to run the site has an empty target in case I change my mind down the line.
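If the sitemaps were ever generated by a script, a minimal sketch might look like the following. The function names and the example.com URLs are hypothetical; the XML structure follows the sitemaps.org 0.9 schema, and a text sitemap is just one absolute URL per line.

```typescript
// Escapes the characters that may not appear raw inside XML text.
function escapeXml(text: string): string {
    return text
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
}

// Builds an XML sitemap with one <url><loc> entry per page.
function buildXmlSitemap(urls: string[]): string {
    const entries = urls.map((url) => `  <url><loc>${escapeXml(url)}</loc></url>`);
    return [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
        ...entries,
        "</urlset>",
    ].join("\n");
}

// Builds a text sitemap: one absolute URL per line.
function buildTextSitemap(urls: string[]): string {
    return urls.join("\n");
}

// Hypothetical pages, for illustration only.
const pages = ["https://example.com/", "https://example.com/blog"];
console.log(buildXmlSitemap(pages));
console.log(buildTextSitemap(pages));
```

Both builders are linear in the number of pages, so generation cost would not be a concern at this scale.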

Scripts

The sitemap generator previously mentioned is an example of a script. Many scripts are used all the time, such as git push for pushing code, the scripts run by Vercel to deploy the website, and yarn run dev to run the site locally.

A Makefile, part of GNU Make, is used to add more scripts that aid in local development. It's fairly straightforward; there are targets that specify commands to be executed on the command line. Often, these are useful to chain commands together. Note that aliases may also be used, but those are generally for system-wide commands and not project-specific utilities.

The only interesting note from the Makefile is opening links and running the local server. A frequent use case is starting the local server, then navigating to localhost:3000 in a browser. Ideally, it would automatically open. A simple way to do so is the following:

open:
	start chrome http://localhost:3000; \
	yarn run dev

Why is this peculiar? The link is being opened before the command to start the server is run. It's a matter of sequencing. Technically, yarn run dev is a long-running process that never exits on its own - similar to a while True loop, it will keep running the server until some external factor removes that capability (such as a user killing the process, the computer crashing, etc.). Thus, if the order were reversed, the link would never be opened. In the sequence above, the link is triggered to open by the CLI, which then hands off the task to Chrome. By the time Chrome has opened the link, yarn run dev has usually opened port 3000 on localhost, so the webpage then loads.

Building Iteratively

There's a lot that goes into this system. To make things easier on me, I first made the minimal features necessary to have a working blog - two sample posts, a way to display all posts, and a way to view each post. Over time, I began to add more and more features, starting simple with a tag system and then moving on to more complex things such as customizing the remark rendering. As I was writing posts, I thought of more things to implement, adding them to the backlog.

Functions

The following functions are written in TypeScript. Complex syntax is not used, so knowing TypeScript is not a prerequisite for following along, though it may be helpful.

For all runtimes below, $n$ is the number of markdown posts in the _posts directory.

getSortedPostsData

export function getSortedPostsData() {
    // implementation omitted
    // returns an array of post metadata, sorted by date
}

This is implemented quite similarly to the tutorial. The front matter handling (done with the gray-matter library) has been slightly changed to support a tag system for each markdown file.

The runtime of this function is $\Theta(n \log n)$. Note that this is the best achievable for this function; at minimum, every file within the directory needs to be looked at, which takes $\Theta(n)$ time. Then, the comparison-based sort on the date takes $\Theta(n \log n)$ time in the worst case, which dominates and makes this the minimal worst-case runtime.
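Since the implementation is omitted above, here is a sketch of just the sorting step. The PostMeta shape and the sortPostsByDate name are hypothetical, and the sketch assumes dates are stored as ISO strings (YYYY-MM-DD), which compare correctly as plain strings.

```typescript
// Hypothetical metadata shape; the real front matter likely has more fields.
type PostMeta = { id: string; date: string; tags: string[] };

// Returns a new array sorted newest first, leaving the input untouched.
// Assumes ISO 8601 date strings, so string comparison matches date order.
function sortPostsByDate(posts: PostMeta[]): PostMeta[] {
    return [...posts].sort((a, b) => {
        if (a.date < b.date) return 1;
        if (a.date > b.date) return -1;
        return 0;
    });
}

const sampleMeta: PostMeta[] = [
    { id: "first", date: "2022-11-01", tags: ["code"] },
    { id: "third", date: "2022-12-26", tags: ["code"] },
    { id: "second", date: "2022-12-03", tags: [] },
];
console.log(sortPostsByDate(sampleMeta).map((post) => post.id));
```

Reading the metadata is one pass over the files; the call to sort is the part that contributes the $n \log n$ term.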

totalWords and totalTime

const totalWords = allPostsData.reduce(
    (prev, post) => prev + post.readingTime.words,
    0
);

const totalTime = allPostsData.reduce(
    (prev, post) => prev + post.readingTime.minutes,
    0
);

These are variables, not functions, but they utilize reduce. Instead of writing an explicit loop over every post to find the total number of words and the total time it would take to read, this specifies a callback function and an initial value. It's a fairly elegant way of combining information across a lot of things.

Note that the reduce function can be rewritten as follows:

const initialValue = 0;

// using reduce
const sumReduce = (arr: number[]): number =>
    arr.reduce(
        (prev, curr) => prev + curr,
        initialValue
    );

// using a forEach loop
const sumForEach = (arr: number[]): number => {
    let sum = initialValue;
    arr.forEach((el) => {
        sum = sum + el;
    });
    return sum;
};

The runtime of this is $\Theta(n)$, since the callback function, which takes constant time, is called once for every post.

getMostRecentPostData

export function getMostRecentPostData() {
    return getSortedPostsData()[0];
}

For the main home (index) page, the most recent blog post is shown in the hero section. This is looked up dynamically rather than hardcoded, by getting the same sorted posts data and then retrieving the first element. The metadata (the filename and route) is enough to render this preview.

The runtime of this function is $\Theta(n \log n)$. The call to getSortedPostsData takes $\Theta(n \log n)$; accessing the first element is constant time.

It is possible to have this function take linear time, i.e., $\Theta(n)$ . There would be a store of a "most recent post", initialized to a null value. If there is a post traversed that is more recent, the store gets updated. This is the minimal runtime, since every post needs to be looked through to determine the most recent post.
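The linear-time variant described above can be sketched as a single pass that keeps a running "most recent post" store. The DatedPost type and the function name are hypothetical, and dates are again assumed to be ISO strings.

```typescript
// Hypothetical minimal shape; only the date is needed for the comparison.
type DatedPost = { id: string; date: string };

// Single pass: the store starts as null and is replaced whenever a more
// recent post is seen. Returns null only when there are no posts.
function getMostRecentPost(posts: DatedPost[]): DatedPost | null {
    let mostRecent: DatedPost | null = null;
    for (const post of posts) {
        if (mostRecent === null || post.date > mostRecent.date) {
            mostRecent = post;
        }
    }
    return mostRecent;
}

console.log(
    getMostRecentPost([
        { id: "older", date: "2022-10-01" },
        { id: "newest", date: "2022-12-26" },
        { id: "middle", date: "2022-11-15" },
    ])
);
```

This avoids the sort entirely, at the cost of a second code path that has to stay consistent with getSortedPostsData.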

getPreviousPost and getNextPost

export function getPreviousPost(id: string) {
    const sortedPosts = getSortedPostsData();
    for (let index = 0; index < sortedPosts.length - 1; index++) {
        if (sortedPosts[index].id === id) return sortedPosts[index + 1];
    }
    return null;
}

export function getNextPost(id: string) {
    const sortedPosts = getSortedPostsData();
    for (let index = 1; index < sortedPosts.length; index++) {
        if (sortedPosts[index].id === id) return sortedPosts[index - 1];
    }
    return null;
}

These functions are used to display the previous and next post links at the bottom of each blog post. These are generated dynamically using the information of the given post (the post id, from the url), to avoid hardcoding.

The runtime of both functions is $\Theta(n \log n)$. Note that the $\Theta$ bound can be used here as a more precise measure than the $\mathcal{O}$ bound. The for loops themselves can return on the first iteration, or not return at all, in which case there are $n-1$ iterations before the return of null, so the loops are $\mathcal{O}(n)$. However, the call to getSortedPostsData will take $\Theta(n \log n)$, so the overall runtime of these functions is still $\Theta(n \log n)$.

Note that these functions are invoked for every page. The results of getSortedPostsData could be cached, since they won't change unless a new post is added, which triggers the build again. That way, the first call takes $\Theta(n \log n)$ but all subsequent retrievals of the sorted list take $\Theta(1)$ time; the linear scan inside each function remains. If the results of the function are cached, and these functions are called for every single file, then the overall runtime for the entire procedure would look something like the following sum:

$$\begin{aligned} n \log n + \sum_{i=1}^{n} \bigl((n - i) + i\bigr) = n \log n + n^2 = \Theta(n^2) \end{aligned}$$

Amortized over the $n$ pages, this comes out to $\Theta(n)$ per page.

If the results of the function are not cached, and these functions are called for every single file, then the overall runtime for the entire procedure would be $\Theta(n^2 \log n)$, since there are $n$ files, 2 function calls for each file, and $\Theta(n \log n)$ runtime for each call.
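One way the caching idea could look is sketched below. Everything here is an assumption for illustration: loadSortedPosts is a hypothetical stand-in for getSortedPostsData with hardcoded data, and the id-to-index map goes one step further than the caching described above by also replacing the linear scans, so each neighbor lookup becomes constant time after the first call.

```typescript
type PostSummary = { id: string; date: string };

// Hypothetical stand-in for getSortedPostsData; counts how often the sort runs.
let sortCalls = 0;
function loadSortedPosts(): PostSummary[] {
    sortCalls++;
    const posts: PostSummary[] = [
        { id: "a", date: "2022-01-01" },
        { id: "c", date: "2022-03-01" },
        { id: "b", date: "2022-02-01" },
    ];
    // Newest first, assuming ISO date strings.
    return posts.sort((x, y) => (x.date < y.date ? 1 : x.date > y.date ? -1 : 0));
}

// Cache of the sorted list plus an id -> index map, built on first use.
let cache: { sorted: PostSummary[]; indexById: Map<string, number> } | null = null;

function getCache(): { sorted: PostSummary[]; indexById: Map<string, number> } {
    if (cache === null) {
        const sorted = loadSortedPosts();
        const indexById = new Map<string, number>();
        sorted.forEach((post, index) => indexById.set(post.id, index));
        cache = { sorted, indexById };
    }
    return cache;
}

// Previous (older) post: one position later in the newest-first list.
function getPreviousPostCached(id: string): PostSummary | null {
    const { sorted, indexById } = getCache();
    const index = indexById.get(id);
    if (index === undefined) return null;
    return index + 1 < sorted.length ? sorted[index + 1] : null;
}

// Next (newer) post: one position earlier in the newest-first list.
function getNextPostCached(id: string): PostSummary | null {
    const { sorted, indexById } = getCache();
    const index = indexById.get(id);
    if (index === undefined || index === 0) return null;
    return sorted[index - 1];
}
```

Under these assumptions the sort runs once per build, and the whole procedure over $n$ pages is dominated by that single $\Theta(n \log n)$ sort rather than by repeated scans.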

showNotFound

The following is an excerpt from a function which determines which blog posts to show given a selection of tags. If no tags are chosen, every post should be shown; if there is a selection, posts containing all of the tags are shown, and if there are no posts with the selection, a message indicating no results is shown.

There are a few ways to implement this. The way that I chose is as follows:

let showNotFound = true;

This is a variable to determine if there are no posts with the selection (to indicate no results found).

<ul>
    {allPostsData.map(
        ({
            id,
            date,
            title,
            tags,
            readingTime,
        }: PostMetadata) => {
            const show = selectedTags.every((el) =>
                tags.includes(el.value)
            );
            // cascades a false value for showNotFound
            showNotFound = !show && showNotFound;
            return show ? (
                <li key={id}>
                    {/* post */}
                </li>
            ) : null;
        }
    )}
    {showNotFound && (
        <p>No results found with these filters.</p>
    )}
</ul>

This chunk does a few things. It maps over all the posts. Each post contains a set of tags; each of the tags in the selection must be included in the post's set of tags. If so, the show variable is true, showNotFound is set to false (which won't be updated to true ever again, since the false value cascades), resulting in that post being displayed and the no results found message not being displayed. If the tags in the selection are not found in any post, then showNotFound was never set to false, so the no results found message is displayed.

The runtime here is $\mathcal{O}(nt)$, where $t$ is the number of selected tags, treating the number of tags on a single post as a small constant. Each post must be checked, and each of the selected tags must be checked for each post. It's not a $\Theta$ bound, however, since if the first selected tag is missing, the post won't show and there's no need to check the remaining tags. Determining if the selected tags are present takes at most as many checks as there are selected tags. This is also the optimal runtime in terms of $n$, since every post needs to be checked to determine if it should be rendered.

An alternative method is to avoid using showNotFound by computing the map separately given the list of selected tags. Then, either the map is shown, or if it has zero length (no posts), the no results found message is shown. This second method is a clearer implementation.
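A sketch of that alternative is below. The TaggedPost and TagOption types and the filterPostsByTags name are hypothetical; the { value } shape mirrors how selectedTags is used in the excerpt above.

```typescript
// Hypothetical shapes for illustration.
type TaggedPost = { id: string; tags: string[] };
type TagOption = { value: string };

// Keeps only the posts that contain every selected tag.
// With no selected tags, every() is true for all posts, so all are kept.
function filterPostsByTags(
    posts: TaggedPost[],
    selectedTags: TagOption[]
): TaggedPost[] {
    return posts.filter((post) =>
        selectedTags.every((tag) => post.tags.includes(tag.value))
    );
}

const examplePosts: TaggedPost[] = [
    { id: "infra", tags: ["code", "web"] },
    { id: "touchbar", tags: ["hardware"] },
];
const visiblePosts = filterPostsByTags(examplePosts, [{ value: "code" }]);
// The "no results" message depends only on the filtered list being empty.
const showNoResults = visiblePosts.length === 0;
console.log(visiblePosts.map((post) => post.id), showNoResults);
```

The render step then maps over visiblePosts only, so there is no flag being mutated while rendering and the runtime stays $\mathcal{O}(nt)$.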

So why am I not caching, settling for a slower runtime, or refactoring? Laziness, probably.
