But This Was A Fantasy — An Adam Curtis Search Engine

I built a new thing.

Presenting But This Was A Fantasy: An Adam Curtis Search Engine

Type any natural language query into its shiny, Helvetica-accented search box and it will respond with the five images it thinks are most appropriate.

How does it work? It’s a mixture of OpenAI’s CLIP model and Facebook’s FAISS vector similarity library. I take one frame for every second of footage, encode it using CLIP, and store the normalized vectors in FAISS. To search, it encodes the text, again using CLIP, and computes the cosine similarity between the text vector and every encoded image vector (because everything is normalized, that is just a dot product). It returns the top 5 results, which I then look up in a SQLite database to pull out metadata such as the episode title, timecode, and image URI. Basically: “which image is closest to this bit of text?”
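To make the “closest image” idea concrete, here is a minimal pure-Python sketch of exhaustive cosine-similarity search over normalized vectors. It is a stand-in for what a flat inner-product FAISS index does, not the actual code behind the site: the post doesn’t say which FAISS index type is used, and with FAISS itself you would more likely call `faiss.normalize_L2` on float32 arrays, add them to a `faiss.IndexFlatIP`, and query with `index.search(queries, 5)`. The function names and the toy 3-dimensional vectors below are hypothetical; real CLIP embeddings are much wider (512 dimensions for ViT-B/32, for example).

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length (the per-row effect of faiss.normalize_L2)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query: Sequence[float], index: List[List[float]], k: int = 5) -> List[Tuple[int, float]]:
    """Score every stored unit vector against the normalized query, best first.

    On unit vectors the dot product equals cosine similarity, so this mirrors
    a brute-force inner-product index. Returns (row_id, score) pairs.
    """
    q = normalize(query)
    scores = [(row_id, sum(a * b for a, b in zip(q, vec))) for row_id, vec in enumerate(index)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Toy "image embeddings", one per sampled frame (hypothetical data).
frames = [
    normalize(v)
    for v in ([1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0, 1], [0.5, 0.5, 0], [-1, 0, 0])
]
text_vector = [1.0, 0.2, 0.0]  # stand-in for the CLIP encoding of a text query

results = top_k(text_vector, frames, k=3)
for row_id, score in results:
    print(row_id, round(score, 3))
```

Exhaustive search like this is exact but scales linearly with the number of frames; that is the work FAISS does far more efficiently, and its approximate index types trade a little accuracy for speed on larger collections.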

There’s not much more to it, except for some embarrassing JavaScript, the product of a week spent on my first ‘major’ web development since working on the Call of Duty registration system back in 2012. Apparently there are magic things like the Fetch API now instead of XMLHttpRequest! No longer do you need to use jQuery and IE polyfills!

(please, JS devs, do not actually look at the code; there are a couple of laughable bits in there where I fell back to “yes, I’m sure there’s a proper way to do this, but I am bored and I’ll just throw in this half-remembered hack that gets the job done”)

So it’s a FAISS index file, a SQLite database, a FastAPI Python server, one page of HTML and a lot of images, all bundled up into a single stateless Docker container and running on Google Cloud Run. Look at me embracing the current zeitgeist (in fairness, I am pleased with the Knative stuff and I’m already pondering how I can use it with work’s Kubernetes infrastructure to make our lives easier). And aside from having to adjust the memory limits to make the container actually run and some long-ish startup times, it all seems to work pretty seamlessly.
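Because the index file and the SQLite database are baked into the image, each request only needs to read them, which is what lets the container stay stateless. The sketch below shows the metadata step under some loudly stated assumptions: a hypothetical `frames` table whose integer primary key matches the vector index’s row ids, one row per sampled second, and made-up episode names and image paths. In a real container you might open the bundled file read-only with `sqlite3.connect("file:frames.db?mode=ro", uri=True)` (the filename is hypothetical); an in-memory database stands in for it here.

```python
import json
import sqlite3
from typing import List


def seconds_to_timecode(seconds: int) -> str:
    """Format a whole-second offset into an episode as HH:MM:SS."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the bundled database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE frames (id INTEGER PRIMARY KEY, episode TEXT, timecode TEXT, image_uri TEXT)"
    )
    rows = [
        (i, "Example Episode", seconds_to_timecode(s), f"/images/example/{s:06d}.jpg")
        for i, s in enumerate([59, 3725, 4000])
    ]
    conn.executemany("INSERT INTO frames VALUES (?, ?, ?, ?)", rows)
    return conn


def lookup_results(conn: sqlite3.Connection, row_ids: List[int]) -> str:
    """Turn ranked vector-index row ids (best first) into a JSON payload for the page."""
    if not row_ids:
        return json.dumps([])
    placeholders = ",".join("?" for _ in row_ids)
    query = f"SELECT id, episode, timecode, image_uri FROM frames WHERE id IN ({placeholders})"
    by_id = {row[0]: row for row in conn.execute(query, row_ids)}
    # An IN clause doesn't preserve the ranking, so rebuild the list in row_ids order.
    results = [
        {"episode": by_id[i][1], "timecode": by_id[i][2], "image": by_id[i][3]}
        for i in row_ids
        if i in by_id
    ]
    return json.dumps(results)


conn = build_demo_db()
print(lookup_results(conn, [2, 0]))
```

Keeping the ranking in the vector index and the human-readable details in SQLite means neither file has to know about the other beyond a shared row id.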

I do have some improvements in mind, if people are interested.

Amusingly, this is pretty much directly related to my day job at the moment. Anyway, enjoy and let me know if you find anything fun!