But This Was A Fantasy — An Adam Curtis Search Engine
Feb 22, 2021 · 4 minute read
I built a new thing.
Presenting: But This Was A Fantasy: An Adam Curtis Search Engine
Yes, given any natural language query entered into its shiny, Helvetica-accented search box, The Ghost In The Machine & The Machine In The Ghost will respond with the five images that it thinks are most appropriate.
How does it work? It’s a mixture of OpenAI’s CLIP model and Facebook’s FAISS vector similarity library. I take every second of footage, encode it using CLIP and store the normalized vectors in FAISS. For searching, the engine encodes the query text, again using CLIP, and computes the cosine similarity between the text vector and all the encoded image vectors, returning the top 5 results (which I then check against a SQLite database to pull out metadata such as the episode title, timecode, and the appropriate image URI). Basically, “tell me which image is closest to this bit of text?”
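As a rough illustration of that lookup (not the site's actual code), here is a minimal pure-Python sketch. It stands in for FAISS with a brute-force dot product over unit-length vectors, which is the same thing as cosine similarity, and it skips CLIP entirely by using toy three-dimensional vectors. The `frames` table, its columns, and the function names are all assumptions made up for the example.

```python
import math
import sqlite3

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query_vec, frame_vecs, k=5):
    """Return (frame_id, score) pairs for the k frames closest to the query."""
    q = normalize(query_vec)
    scored = [
        (frame_id, sum(a * b for a, b in zip(q, vec)))
        for frame_id, vec in enumerate(frame_vecs)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings"; real CLIP vectors have hundreds of dimensions.
frames = [normalize(v) for v in ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0])]

# Hypothetical metadata table, keyed by each vector's position in the index.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE frames (id INTEGER PRIMARY KEY, episode TEXT, timecode INTEGER, uri TEXT)"
)
db.executemany(
    "INSERT INTO frames VALUES (?, ?, ?, ?)",
    [
        (0, "Episode A", 10, "a/10.jpg"),
        (1, "Episode A", 11, "a/11.jpg"),
        (2, "Episode B", 5, "b/5.jpg"),
    ],
)

def search(query_vec, k=5):
    """Run the similarity search, then look each hit up in the metadata table."""
    results = []
    for frame_id, score in top_k(query_vec, frames, k):
        episode, timecode, uri = db.execute(
            "SELECT episode, timecode, uri FROM frames WHERE id = ?", (frame_id,)
        ).fetchone()
        results.append(
            {"episode": episode, "timecode": timecode, "uri": uri, "score": score}
        )
    return results
```

In the real thing, the query vector would come from CLIP's text encoder and the scoring loop would be a single FAISS index search, but the shape of the pipeline is the same: vector in, row IDs out, metadata joined on afterwards.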
There’s not much more to it, except for some embarrassing JavaScript, which mostly came from me spending the week doing my first ‘major’ web development since working on the Call of Duty registration system back in 2012. Apparently there are magic things like the Fetch API now instead of XMLHttpRequest! No longer do you need to use jQuery and IE polyfills!
(please, JS devs, do not actually look at the code; there are a couple of laughable bits in there which are me falling back to “yes, I’m sure there’s a proper way to do this, but I am bored and I’ll just throw in this half-remembered hack that gets the job done”)
So it’s a FAISS index file, a SQLite database, a FastAPI Python server, one page of HTML and a lot of images, all bundled up into a single stateless Docker container and running on Google Cloud Run. Look at me embracing the current zeitgeist (in fairness, I am pleased with the Knative stuff and I’m already pondering how I can use it with work’s Kubernetes infrastructure to make our lives easier). And aside from having to adjust the memory limits to make the container actually run and some long-ish startup times, it all seems to work pretty seamlessly.
I do have some improvements in mind, if people are interested.
Obviously, add more series. The BBC have helped here quite a bit, as they’ve put a bunch of his older series up on iPlayer, meaning I can get high-quality videos instead of weird YouTube rips. I already have an index that includes all of Pandora’s Box.
If you play around with queries, you’ll notice that the search engine really likes returning images that are seconds apart from each other. This makes sense: if one second of an interview is a high match for the query, then the next second is likely to be as well. As is the next second. But it does make the results a little boring. I’ve written a reranking function that instead gets 10 results from the FAISS index and strips out those whose timecodes are close to other images from the same episode. So I’m thinking of adding that as an option. Also, it might be handy to filter to only include results from a specific series, just in case you don’t want your search aimed at The Mayfair Set to be sullied with anything else.
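To give a flavour of the reranking idea, here is a minimal sketch rather than the function I actually wrote. It assumes each hit is a dict with an `episode` name and a `timecode` in seconds, sorted best match first. The five-second gap and the parameter names are placeholders picked for the example.

```python
def rerank(hits, min_gap_seconds=5, limit=5):
    """Drop hits that sit too close in time to an already-kept hit from the same episode.

    hits: list of dicts with "episode" and "timecode" (seconds), best match first.
    """
    kept = []
    for hit in hits:
        too_close = any(
            other["episode"] == hit["episode"]
            and abs(other["timecode"] - hit["timecode"]) < min_gap_seconds
            for other in kept
        )
        if not too_close:
            kept.append(hit)
        if len(kept) == limit:
            break
    return kept
```

Because the hits arrive in score order, the best frame from each cluster of near-duplicates survives and its neighbours are dropped. Over-fetching (10 results for a final 5) leaves some slack, though a very repetitive query could still come back with fewer than `limit` results.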
Finally, I want to experiment with the different types of FAISS clustering so I can get the best use of memory, CPU, and accuracy out of the engine as I keep adding vectors to the index. There’s endless tinkering available here: I managed to shrink an episode from 10 MB to under 750 KB without much trouble, but the accuracy of the results wasn’t great compared to the original. So I’ll need to come up with some metrics for (automatically) testing out the options.
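One candidate metric, offered as an assumption rather than something I’ve settled on, is recall@k: run the same queries against the exact index and the compressed one, and measure how many of the exact top-k results the compressed index still returns. A minimal sketch, with hypothetical function names:

```python
def recall_at_k(exact_ids, approx_ids, k=5):
    """Fraction of the exact top-k results that also appear in the approximate top-k."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)

def mean_recall(exact_runs, approx_runs, k=5):
    """Average recall@k over a batch of queries; inputs are parallel lists of ID lists."""
    scores = [recall_at_k(e, a, k) for e, a in zip(exact_runs, approx_runs)]
    return sum(scores) / len(scores)
```

A score of 1.0 means the smaller index returns exactly the same frames as the original for every test query. Tracking that number alongside index size would make the memory-versus-accuracy trade-off something I can compare across configurations.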
Map ‘rainy fascist island’ directly to British-related images.
I’m not above doing a Perry & Croft or Morse, or even Cracker (“search for the odeon on the oxford road” —> “that’s fine by me!”) version if people are twisted enough to want those. Give me enough Plays For Today and we’ll get that going too.
Amusingly, this is pretty much directly related to my day job at the moment. Anyway, enjoy and let me know if you find anything fun!