Building a website in 2024
Aug 3, 2024 · 5 minute read

As mentioned earlier, it’s the 20th anniversary of You Are The Generation That Bought More Shoes And You Get What You Deserve, and to celebrate, I rebuilt the website, running into all sorts of issues that weren’t really a problem in the context of webdev when I built the original version.
The first iteration of the website had a bunch of clips from The Mayfair Set and a bunch of quotes; the page simply cycled through the clips and quotes at random. Which is fine, but I wanted something a little more interesting to celebrate two decades.
Dance, you fuckers, dance[^1]
Firstly, I wanted more clips. That was relatively easy — all of Adam Curtis’s series went up on iPlayer a couple of years ago, so I have nice hi-res (and complete!) copies of those. Instead of cycling at random, though, I decided to return to my love of embeddings.
Using a SigLIP multimodal model, I encoded the lyrics of the song, all the quotes (with some new additions) and a random frame from every five seconds across every single episode of Curtis’s documentaries. Yay, a bunch of embeddings! You know I love them.
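If you’re curious what that encoding step looks like, here’s a minimal sketch using a Hugging Face transformers SigLIP checkpoint. The model name, the placeholder inputs, and the helper functions are assumptions on my part, not the site’s actual code:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import AutoProcessor, SiglipModel

CHECKPOINT = "google/siglip-so400m-patch14-384"  # assumed checkpoint
model = SiglipModel.from_pretrained(CHECKPOINT).eval()
processor = AutoProcessor.from_pretrained(CHECKPOINT)

def embed_texts(texts):
    # SigLIP's text tower expects max_length padding
    inputs = processor(text=texts, padding="max_length",
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return F.normalize(feats, dim=-1)

def embed_frames(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return F.normalize(feats, dim=-1)

lyric_lines = ["..."]   # one lyric line per five-second slot of the song
quotes = ["..."]        # the quote pool
frame_paths = ["..."]   # one extracted frame per five seconds of Curtis

lyric_emb = embed_texts(lyric_lines)
quote_emb = embed_texts(quotes)
frame_emb = embed_frames(frame_paths)
```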
Once I had this pile of embeddings, I turned back to the original video for the song. I split it up into 38 five-second fragments, and then each slot is randomly assigned one of the following (there’s a sketch of this after the list):
- A clip of the original video
- A Curtis clip chosen by nearest-neighbour lookup of the lyric at that point in the song against the video embeddings, picking a random element from the top_k results
- A Curtis clip chosen by nearest-neighbour lookup of the current quote against the video embeddings, again picking a random element from the top_k results
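Roughly, the per-slot assignment looks like this. It’s a sketch rather than the site’s actual code: the neighbour tables and function names are hypothetical, and it assumes the nearest-neighbour lookups have already been precomputed (see the torch.topk snippet further down):

```python
import random

N_SLOTS = 38  # five-second fragments of the original video
TOP_K = 50

def build_playlist(lyric_neighbours, quote_neighbours, current_quote_idx):
    """lyric_neighbours[i] / quote_neighbours[j] hold the indices of the
    closest Curtis frames for slot i's lyric / quote j (hypothetical names)."""
    playlist = []
    for slot in range(N_SLOTS):
        source = random.choice(["original", "lyric", "quote"])
        if source == "original":
            playlist.append(("original", slot))
        elif source == "lyric":
            playlist.append(("curtis", random.choice(lyric_neighbours[slot][:TOP_K])))
        else:
            playlist.append(("curtis", random.choice(quote_neighbours[current_quote_idx][:TOP_K])))
    return playlist
```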
Obviously, the easiest thing to do at this point would be to bung all the embeddings in a vector database, but I didn’t want the hassle of setting up FAISS or a more complicated vector store. Plus I also wanted it to be pretty fast…and given that the videos, the quotes, and the lyrics are fixed, I just precomputed all the embedding lookups for a top_k of 50. torch.topk for the win[^2].
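The precompute itself is little more than a matrix multiply plus torch.topk. A sketch, assuming the embeddings from earlier are already L2-normalised so a dot product gives cosine similarity:

```python
import torch

TOP_K = 50

def precompute_neighbours(query_emb, frame_emb):
    # Cosine similarity via dot product (embeddings are already normalised),
    # then keep the indices of the 50 closest frames for every query.
    sims = query_emb @ frame_emb.T               # [n_queries, n_frames]
    _, idx = torch.topk(sims, k=TOP_K, dim=1)
    return idx.tolist()                          # plain lists, easy to paste into the page

# lyric_neighbours = precompute_neighbours(lyric_emb, frame_emb)
# quote_neighbours = precompute_neighbours(quote_emb, frame_emb)
```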
This gave me two very large arrays, which I could have stuck behind a Python API to generate the clips on demand. But I felt the website should be even more dumb than usual, so I just got Claude 3 Sonnet to generate a bunch of JavaScript and copied the arrays into the HTML page directly. It’s all there, go and peek (and don’t blame me for the terrible JS code. The computer wrote it!).
After that, it was just a matter of dealing with how browsers handle playing audio (when I built the original website, auto-playing was allowed, but that hasn’t been the case for quite some time now), and I also hard-coded the start and end of the song to play from the original video to provide a better ‘playlist’.
My feeling was that it would be relatively simple to package up and get running on Google Cloud Run: I just had to upload the clips and make a small Docker container for hosting the page (with a tiny FastAPI server to do some mounting and serve the page itself). That was fine, but when I tried to actually get the application running under the proper domain name, everything broke. Mind you, it broke with an obscure Kubernetes error that I have seen a lot in my time, so even though the UI couldn’t tell me what was wrong, I knew instantly.
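For the curious, the server really is tiny. Something along these lines, with the directory layout being my guess rather than the site’s actual structure:

```python
from fastapi import FastAPI
from fastapi.responses import FileResponse
from fastapi.staticfiles import StaticFiles

app = FastAPI()

# Serve the pre-chopped clips as static files, and the page itself at the root
app.mount("/clips", StaticFiles(directory="clips"), name="clips")

@app.get("/")
def index() -> FileResponse:
    return FileResponse("static/index.html", media_type="text/html")
```

Inside the container that just runs under uvicorn, listening on whatever port Cloud Run hands it via the PORT environment variable.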
It seems that Google Cloud Run creates a pod on a random internal Kubernetes cluster and uses the domain name as the pod name. With sane domain names, that works fine. However, youarethegenerationthatboughtmoreshoesandyougetwhatyoudeserve.com is 65 characters long, and pod names can be at most 63 characters. Boo. With a bit more access to the system, I could probably have fixed that, but you don’t get that luxury with Google Cloud Run. I ended up having to use an ugly DNS redirect to the bare Cloud Run URL (which is why you see the domain name change when you visit the site).
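If you want to see just how close it is, the arithmetic is brutal:

```python
domain = "youarethegenerationthatboughtmoreshoesandyougetwhatyoudeserve.com"
print(len(domain))        # 65
print(len(domain) <= 63)  # False: two characters over the limit
```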
And there was more to come. I blithely posted out an announcement on Bluesky, but forgot that bare domain links these days default to HTTPS connections, and for some reason[^3], I didn’t have an SSL cert. No problem! Let’s Encrypt! Except…imagine an entire day of bouncing around SSL providers before coming to the conclusion that no, none of them were going to accept the domain and its inordinate length. So my launch fizzled and spluttered. Oh well. In the meantime, the non-SSL site is fully operational.
See you back here for the 25th?
[^1]: The first line of my live review of Johnny Boy from 2005, where I attempted to merge Chris Roberts and Paul Morley into one being. It was a short music journalism career, but if I had to pick out two pieces I quite like from that era, it would have to be my two Johnny Boy pieces, one of which even got quoted on advertisements for the album next to Kieron Gillen…

[^2]: In general, while I understand the logic of people like Karpathy and Howard saying “just use numpy/torch operations!” for similarity search, you quickly run into a wall as soon as you need to do real searches. Look at Karpathy’s “simple movie review” site, for example — for a site like this, filters such as year ranges or categories are just things that you take for granted with a search engine…and while you could certainly build all that up with tensor ops, it is much simpler just to throw everything into Qdrant or Fusion (hey, look, I’m back on the party line already!)

[^3]: I am wondering if I ever tried it ten years ago, in light of my hassles. Probably not, given that I don’t think Let’s Encrypt was around when I originally created the site…