Create a content filter for static websites using Serverless

Introduction

In case you’re wondering how our back-end works, you’re in luck! In this series of posts, I’m going to walk you through how we built our serverless websites with Hugo, Airtable, GitLab, and Cloudflare Workers.

Our websites, including srvrlss.io, are completely static. That’s right, there is no server and no database!

Instead, we use a static site generator called Hugo. We wrote custom templates and use the Hugo data directory to store the information for our site.

The example we use in this article is our website about chatbots, which helps people quickly and easily learn what’s available out there. Every time we add a new chatbot provider, a few scripts need to run to create the content for that provider.

The problem with static websites

We quickly came to realize that, while static pages are nice, they lack interactive parts. In our case, that meant an efficient product filter.

Our website Chatwidget.info contains a lot of information, and to find the right chat provider it’s essential to be able to filter the list of providers.

We were presented with a few options. We could hack more templates, combining HTML and the Go templating language to build a JavaScript-based filter. We did start out this way, but it proved problematic later when we wanted to do more advanced things, like combining filter options.

Why not go for a pure JavaScript solution?

Well, because the entire database needs to be in our source code for such a filter to work, we would give away our competitive edge to anyone looking at the source code. This meant we couldn’t go with the local solution.

So considering what other options we had, the serverless approach started to look a lot better.

Our CI/CD setup

I find that KISS is easily said, but not often applied and even more difficult to adhere to! It takes mastery and a good understanding to fit a serverless component into a setup as simple as the one we have had thus far.

Our CI/CD pipeline is simple: releasing an update to the website just requires a push to our Git repository’s main branch. We need to make sure we don’t break this setup, so that we can keep updating the site, releasing new providers and doing (template) work without changing our way of working.

Our CI is nothing more than a few scripts executed by our Git provider, GitLab, which export the Hugo public folder to the “pages” feature.

We created scripts to:

Download the database from Airtable
Transform the Airtable download into YAML files
Use the correct Hugo binary
Download our custom Hugo theme from another repo
Create all provider and comparison pages

Our .gitlab-ci.yml looks like this:

image: monachus/hugo
variables:
  AIRTABLE_KEY: $AIRTABLE_KEY

pages:
  script:
  # other irrelevant bits
  - ./scripts/v2_airtable.py
  - ./scripts/prepare_worker.py # See below for source
  - ./scripts/create_comparisons.py
  - mkdir public
  - hugo --minify
  # artifacts and only belong to the pages job, so they are nested under it
  artifacts:
    paths:
    - public
  only:
  - master

API interface to save the day

Let’s talk about the interface and how we can make sure we’re ready to make changes without breaking our static website, should we ever change things.

We start with an API version in the path, of course.

Then, for simplicity’s sake, we use query parameters to add/group features we want to filter on.

We considered many options, but using query parameters just seemed the easiest way for what we were trying to achieve.

Here are a few example URLs:

/v1/api?product_categories_audio_chat=1
/v1/api?product_categories_audio_chat=1&pricing_plans_free_plan
/v1/api?pricing_plans_free_plan=1&plugins_wordpress

As you can see, our front-end implementation leaves off the final =1 when more than one filter is active. We handle this in our worker script (see below), which only looks at the parameter names, so no harm done. It’s not great, but it works!
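To make the parameter handling concrete, here is a minimal Python sketch of how such a query string can be reduced to a list of filter names. It is not our production code (the worker is JavaScript); the function name active_filters is made up for this example.

```python
from urllib.parse import urlsplit, parse_qsl


def active_filters(url):
    """Return the filter names found in the query string, in order."""
    query = urlsplit(url).query
    # keep_blank_values=True keeps parameters that arrive without "=1"
    pairs = parse_qsl(query, keep_blank_values=True)
    # Only the name matters; the value is ignored, as in the worker
    return [name for name, _value in pairs]


print(active_filters("/v1/api?pricing_plans_free_plan=1&plugins_wordpress"))
# → ['pricing_plans_free_plan', 'plugins_wordpress']
```

Because only the names are used, a parameter with or without the trailing =1 ends up selecting the same filter.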

The Worker code this is all about

If you made it this far, thank you for reading!

Basically, we wrote a small function that puts our entire database into a reverse lookup table: each feature becomes a key, and its value is the list of companies that have that feature. It sounds more complex than it is, trust me!

We collect a lot of data about our providers, so you get a pretty large dictionary pretty quickly. At the time of writing, we are aware of close to 300 chat providers.

Each key maps to a list of providers. Our reverse database looks something like this:

{
	"supports_images": ["Drift", "Intercom" ...],
	...
	"collects_email": ["Drift", "Crisp"]
}
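The inversion itself can be sketched in a few lines of Python. This is an illustration rather than our actual script: the shape of the provider records (a name mapped to feature flags) and the function name build_lookup_table are assumptions, and the sample data simply mirrors the snippet above.

```python
from collections import defaultdict


def build_lookup_table(providers):
    """Invert {provider: {feature: bool}} into {feature: [providers]}."""
    table = defaultdict(list)
    for name, features in providers.items():
        for feature, enabled in features.items():
            if enabled:
                table[feature].append(name)
    # Sort for stable output, so rebuilds without data changes are identical
    return {feature: sorted(names) for feature, names in sorted(table.items())}


# Hypothetical records, shaped like the example above
providers = {
    "Drift": {"supports_images": True, "collects_email": True},
    "Intercom": {"supports_images": True, "collects_email": False},
    "Crisp": {"supports_images": False, "collects_email": True},
}

print(build_lookup_table(providers))
# → {'collects_email': ['Crisp', 'Drift'], 'supports_images': ['Drift', 'Intercom']}
```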

Now that we have our entire “inverted database”, we can create our worker, which is literally a lookup function with some placeholder code.

We substitute our newly created reverse dictionary for the __LOOKUPTABLE__ placeholder. This happens in the Python script below.
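As a rough sketch of that substitution step, assuming the table is serialized with json.dumps (a JSON object is also a valid JavaScript object literal) and using a made-up helper name render_worker:

```python
import json


def render_worker(template, lookup_table):
    """Replace the __LOOKUPTABLE__ placeholder with a JavaScript object literal."""
    if "__LOOKUPTABLE__" not in template:
        raise ValueError("template has no __LOOKUPTABLE__ placeholder")
    # json.dumps escapes non-ASCII characters by default, so the result
    # is plain ASCII that can be pasted straight into the script
    literal = json.dumps(lookup_table, sort_keys=True)
    return template.replace("__LOOKUPTABLE__", literal)


template = "const lookuptable = __LOOKUPTABLE__"
print(render_worker(template, {"collects_email": ["Drift", "Crisp"]}))
# → const lookuptable = {"collects_email": ["Drift", "Crisp"]}
```

Failing loudly when the placeholder is missing keeps a renamed or broken template from silently deploying a worker without data.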

worker.js source:

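export default {
  async fetch(request, env) {
    return handleRequest(request)
  }
}

async function handleRequest(request) {
  // __LOOKUPTABLE__ is replaced with the reverse dictionary at build time
  const lookuptable = __LOOKUPTABLE__
  const { searchParams } = new URL(request.url)
  const providers = {}

  // Only the parameter name matters; the value (e.g. "=1") is ignored.
  // hasOwnProperty avoids matching inherited names such as "toString".
  searchParams.forEach((value, item) => {
    if (Object.prototype.hasOwnProperty.call(lookuptable, item)) {
      providers[item] = lookuptable[item]
    }
  })

  const commonItems = getCommonValues(providers)

  // Respond with a comma-separated list of provider names
  return new Response(commonItems.join(','), {
    headers: {
      'Access-Control-Allow-Origin': '*',
      'Access-Control-Allow-Methods': 'GET,HEAD,POST,OPTIONS',
      'Access-Control-Allow-Headers': 'Origin, X-Requested-With, Content-Type, Accept'
    }
  })
}

// Return the providers that appear in every selected feature list
function getCommonValues(obj) {
  const keys = Object.keys(obj)
  if (keys.length === 0) {
    return [] // no known filters in the query string
  }
  return keys.reduce((acc, key) => {
    return acc.filter(value => obj[key].includes(value))
  }, obj[keys[0]])
}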

This worker is as simple as we could make it. It has just a few functions.

handleRequest() extracts the query parameters from the URL, and then uses their names as keys to find the providers that all selected filters have in common, using getCommonValues(). The result is returned as a comma-separated string. We didn’t go with JSON responses because that introduced several problems.

I can’t recall exactly what those problems were, but at the time we wrote this, we had to go with text-based responses.
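If you want to check the filter logic offline, the intersection can be mirrored in Python. This is a sketch of the same idea, not a port we actually run; common_providers and to_body are hypothetical names.

```python
def common_providers(lookup_table, filters):
    """Return the providers that appear under every known filter name."""
    # Unknown filter names are ignored, as in the worker
    lists = [lookup_table[name] for name in filters if name in lookup_table]
    if not lists:
        return []
    first, *rest = lists
    others = [set(names) for names in rest]
    # Keep the order of the first list, dropping anything missing elsewhere
    return [p for p in first if all(p in names for names in others)]


def to_body(providers):
    """Format the result the way the worker responds: comma-separated text."""
    return ",".join(providers)


table = {
    "supports_images": ["Drift", "Intercom"],
    "collects_email": ["Drift", "Crisp"],
}
print(to_body(common_providers(table, ["supports_images", "collects_email"])))
# → Drift
```

On the front-end side, splitting the response on commas gives back the list of provider names to show.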

Now that we have our reverse table and worker template, we can work on pushing this code to Cloudflare.

Pushing a worker.js to Cloudflare

At the time, there was not much documentation on how to push updates to a function programmatically. The web interface worked just fine, but that was not an option for us: we need to keep the entire setup automated and simple.

With a little help from the Cloudflare Developer Discord (Awesome and very active community btw!), we managed to get the last bits in place to automatically push a new updated script to our endpoint:

Note: This should be easier now that worker deployment is included in Cloudflare’s Wrangler.

import requests

"""
other magic happens here
...
"""

# Substitute the lookup table into the worker template and write it to disk
edge_api_worker_source = bytes(edge_api_worker_source.replace("__LOOKUPTABLE__", str_lookuptable), encoding="raw_unicode_escape")

with open('worker.js', 'wb') as f:
    f.write(edge_api_worker_source)


cf_prefix = "https://api.cloudflare.com/client/v4/"
worker_url = "accounts/{}/workers/services/_redacted_/environments/production".format(ACCOUNTID)

headers = {
    "Authorization": "Bearer " + APIKEY,
}

# Read the generated script back; the with-block closes the file handle
with open('worker.js', 'r') as f:
    worker_source = f.read()

# We push the script + a metadata object with instructions
edge_script = (
    ("worker.js", ('worker.js', worker_source, 'application/javascript+module')),
    ("metadata", ('blob', '{"bindings":[],"main_module":"worker.js"}', 'application/json')))

t = requests.put(cf_prefix + worker_url, headers=headers, files=edge_script)
# Fail the CI job if Cloudflare rejects the upload
t.raise_for_status()
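One small refinement worth considering is building the metadata with json.dumps instead of a hand-written string, and keeping the request assembly in a function that can be checked without touching the network. The sketch below only reuses the URL pattern and metadata fields shown above; the function name build_upload and its parameters are assumptions for illustration.

```python
import json

CF_PREFIX = "https://api.cloudflare.com/client/v4/"


def build_upload(account_id, service, script_source, main_module="worker.js"):
    """Return the URL and multipart parts for uploading a module worker."""
    url = CF_PREFIX + "accounts/{}/workers/services/{}/environments/production".format(
        account_id, service
    )
    # Same metadata as above, but serialized instead of typed by hand
    metadata = json.dumps({"bindings": [], "main_module": main_module})
    files = (
        (main_module, (main_module, script_source, "application/javascript+module")),
        ("metadata", ("blob", metadata, "application/json")),
    )
    return url, files
```

The returned values can then be handed to the same call as before: requests.put(url, headers=headers, files=files).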

The good, the bad and the ugly

In retrospect, we’re really happy with this setup and solution. The website feels way more “done” with an interactive element like this.

What really surprised us was the performance of the function. It feels instant to use, and that just shows the magic of using a serverless function on your website.

Pushing the worker to Cloudflare itself didn’t go without hiccups. It took a lot of experimentation and reverse engineering to figure out how the metadata/manifest needed to be pushed to their API. Luckily the developer community was able to help out, but this is where we got stuck for a while.

Concluding words

And that is how we added a really awesome multi-purpose filter to a static website using Hugo, Cloudflare Workers and some glue here and there, fully integrated into our CI/CD setup.

If you want to try out our filter on chatwidget.info/providers/, please be our guest! We’re proud of how it works and we’re open to feedback and suggestions!

Thank you for reading!