Ken Morse

I’ve been living in the US for almost 12 years now, and one of the things I can’t wrap my head around? The United States Postal Service. Let’s take postal redirection. It will cost you almost £40 in the UK to have Royal Mail redirect your post. In this shrine to capitalism and money-grabbing? $1.10 online (and that’s only to verify your identity), or free if you do it at a post office. The equivalent of a first-class stamp costs…63¢ versus £1.10. They’ll take your outgoing post from you at the door if you don’t want to find a postbox. I get emails every morning showing me what post is going to be arriving later that day. And then there’s the packaging. I ordered about 12 different sets of packaging (probably close to 100 pieces altogether), everything from DVD-sized boxes to tubes that look like they’re explicitly designed to send massive Toblerones across the country. And the cost of all this? $0.00, including $0.00 postage and packing.

Honestly, I can understand why the Republicans want to privatize it. You can’t have a publicly owned company be so useful and able to send post across the country for a flat rate. It just breaks their minds.

In other news, this week I began my long-term plan of indoctrination. Exposure to The KLF, Kate Bush, New Order, and Kenickie is well underway, plus Maeryn was also treated to the first series of The Fast Show (I’m hoping she’ll start doing the “oooh!” part of Suits You soon). I don’t think I saw the first series when it originally went out; my first memory of watching it is this moment from S02E01, which is where I knew it was going to be something I loved:

Ken Morse on the Rostrum Camera

It’s the additional detail: an obscure TV gag thrown in where it wasn’t really required, because Higson and Whitehouse decided to go the extra mile. Possibly the last truly great BBC sketch show?1

Also, I have to admit that with this Ron Manager segment…well…yes, I could hear the vidiprinter sound as he was talking.

  1. When watching it back, one thing that comes to mind is: “How much did this all cost?” Endless VT effects, tons of location shooting, stunt work, a massive cast, skits that last only a few moments…you just wouldn’t get the budget for that these days. ↩︎

FrugalGPT: This Big Boy Can Fit So Many LLMs

FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

I feel like the timing of this paper is amazing; you get the feeling that the authors watched some of the Tears of The Kingdom trailers, looked at the pile of models they had lying around and just thought “Why don’t we just use Ultrahand on them?”

What we have here, then, is a carefully constructed Heath Robinson machine designed to work around two big issues with calling GPT-3/4 in a query pipeline:

  • OpenAI calls are slow (there’s a fundamental cost to making a call to an external API at all, but even accounting for that, calls to OpenAI tend to be somewhat slower than the competition).
  • OpenAI calls also cost money, and when you’re working with queries at scale, those fractions of cents are going to add up really fast.

The authors construct a system using five different techniques to reduce the cost of using OpenAI’s LLMs, some of which avoid talking to them at all:

  1. Prompt Selection — reducing the number of examples provided in a prompt to reduce the total amount of tokens sent to the LLM
  2. Query Concatenation — combining multiple queries into a single request to the LLM, and demultiplexing the response to answer the separate queries
  3. Response Cache — a cache that stores responses and returns an answer from the cache if a new query is judged ‘similar’ enough to a stored one
  4. Use a fine-tuned model instead — Collect responses from a large model (e.g. GPT-4) and use those responses to fine-tune a smaller model (the paper uses GPT-J), which can then be used in the:
  5. LLM Cascade — this is the main component of the paper. The cascade service sends a query to a list of LLMs in order of increasing cost, and each response is evaluated by a scoring model to determine whether it is acceptable. If so, the response is returned to the user; if not, the next LLM on the list is queried.
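
To make the cascade idea a bit more concrete, here’s a minimal Python sketch of the loop as I understand it. It is not the paper’s code, and everything in it is a stand-in: `Model`, `cascade`, the per-call prices and `toy_score` are hypothetical, the lambdas pretend to be LLM API calls, and I’ve used a single shared acceptance threshold for simplicity (my assumption; you could just as easily tune one per model). The `score` argument is where a trained scoring model would slot in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost_per_call: float            # hypothetical flat price per query
    generate: Callable[[str], str]  # stand-in for a real LLM API call


def cascade(
    query: str,
    models: List[Model],
    score: Callable[[str, str], float],
    threshold: float,
) -> Tuple[str, str, float]:
    """Query models cheapest-first; return (model name, answer, total cost)."""
    name, answer, total_cost = "", "", 0.0
    for model in sorted(models, key=lambda m: m.cost_per_call):
        answer = model.generate(query)
        total_cost += model.cost_per_call
        name = model.name
        # Stop at the first answer the scorer judges reliable enough.
        if score(query, answer) >= threshold:
            break
    # If nothing passes, we fall through with the most expensive model's answer.
    return name, answer, total_cost


if __name__ == "__main__":
    small = Model("small", 0.001, lambda q: "I don't know")
    large = Model("large", 0.03, lambda q: "1983")

    # Toy scorer: a trained scoring model would go here instead.
    def toy_score(query: str, answer: str) -> float:
        return 0.0 if answer == "I don't know" else 0.9

    name, answer, cost = cascade(
        "When was Blue Monday released?", [large, small], toy_score, 0.5
    )
    print(name, answer, f"{cost:.3f}")  # large 1983 0.031
```

Note that the cheap model is always paid for, even when it gets rejected, so the savings only materialise if the scorer lets a decent share of queries stop early.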

The resulting hodge-podge contains some surprises, the main one being: not only is it cheaper than just talking to GPT-4 directly, but when things are tuned, it actually performs better than GPT-4. The improvement isn’t massive, but combined with the 50-98% cost savings in their experiments, it does feel like there’s definitely something worth digging into here.

But there are also a few issues. Using a fine-tuned model sounds like a good idea, but pretty much all the major LLM providers include clauses in their terms of service that would prevent you from doing this in production, unless you have a robust legal department eager to argue that LLM outputs have no copyright protection and thus those terms are unenforceable. Some providers even prevent you from caching LLM responses! And then there’s the issue that the authors point out as a major limitation: the scoring models need to be trained with labelled examples drawn from the same distribution as the incoming queries. That means this is a system that will need to be continuously updated, or else you’re going to have some serious model drift. Which makes me think of a bunch of data scientists running around this crazy contraption like Wallace and Gromit, trying to stop it from blowing up and spraying cheese all over the house. I do wonder if you could get this going in an RL framework, or something else that would reduce the amount of support the system needs.

(my eyebrows are also raised a little by the prompt selection and query concatenation stages — making LLMs that you don’t have raw access to consistently follow directions can be something of a challenge. You’d need to provide extra guardrails in production to make this work reliably)

One thing I think is interesting, and glossed over a little in the paper, is the cache component. The paper refers to it as a ‘database’, but my thinking is that a traditional cache/database in such a pipeline is going to miss out on a lot of potential reuse of queries, e.g. “When was New Order’s Blue Monday originally released?” and “When did Blue Monday first come out?” are the same question, but you’re going to get a cache miss on the second. So! Why not use a fast embedding model, a vector database, and an aggressive distance cut-off for nearest neighbours, so you can respond to a lot more queries without having to go to the LLMs?1
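
Here’s a rough Python sketch of that semantic cache, under some loud assumptions: `SemanticCache` and `cosine_distance` are hypothetical names, `embed` stands in for whatever fast sentence-embedding model you’d actually use, and a brute-force nearest-neighbour scan stands in for the vector database. The `max_distance` parameter is the cut-off: the smaller it is, the stricter the matching. The demo fakes the embeddings with hand-picked vectors so that the two Blue Monday questions land close together, which is what a decent embedding model should do with paraphrases.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; treats a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 1.0
    return 1.0 - dot / norm


class SemanticCache:
    """Return a cached answer when a new query embeds close to a stored one."""

    def __init__(self, embed: Callable[[str], Vector], max_distance: float = 0.1):
        self.embed = embed                # hypothetical: any sentence-embedding function
        self.max_distance = max_distance  # the cut-off; smaller means stricter
        self.entries: List[Tuple[Vector, str]] = []

    def get(self, query: str) -> Optional[str]:
        q = self.embed(query)
        best: Optional[Tuple[float, str]] = None
        # Brute-force nearest neighbour; a vector database would replace this loop.
        for vec, answer in self.entries:
            d = cosine_distance(q, vec)
            if best is None or d < best[0]:
                best = (d, answer)
        if best is not None and best[0] <= self.max_distance:
            return best[1]
        return None  # cache miss: go and ask the LLMs

    def put(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))


if __name__ == "__main__":
    # Hand-picked fake embeddings standing in for a real embedding model.
    fake_vectors = {
        "When was New Order's Blue Monday originally released?": [0.9, 0.1, 0.0],
        "When did Blue Monday first come out?": [0.88, 0.12, 0.01],
        "Who sang Wuthering Heights?": [0.0, 0.2, 0.95],
    }
    cache = SemanticCache(embed=fake_vectors.__getitem__, max_distance=0.05)
    cache.put("When was New Order's Blue Monday originally released?", "1983")
    print(cache.get("When did Blue Monday first come out?"))  # 1983 (cache hit)
    print(cache.get("Who sang Wuthering Heights?"))           # None (cache miss)
```

The cut-off is the knob that matters: set it too loose and you’ll start serving confident answers to questions that merely look similar, so I’d probably start tight and relax it while watching the results.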

Anyway, that’s FrugalGPT. Save all the monies! Keep your Zonai batteries charged!

  1. Another fun thing you can do here: you can take queries and redirect them to things that you want to promote. For example, a merchandiser could set up a promotion for nappies, and searches for ‘Pampers’ could automatically return the promotion results along with the user’s actual query response. ↩︎

Family Week

A busy week of family visiting! Everybody got to hold Maeryn, large pizzas were had, I waited outside a bakery for half an hour until discovering it was closed until the end of May, a Miranda moment where my trousers fell down as I was carrying a table across my front garden1, baby’s first Eurovision2, Mother’s Day brunch, discovering how Cincinnati’s bulk-item pickup works and watching them crush a freezer, putting up shelves…and completing the first two shrines in Tears of The Kingdom3.

Now, the family is off to Vegas, but they’ll be back here at Christmas, doing a trial run of the new LHR-CVG direct flight, before we hopefully go back to Britain next year for a trip celebrating Maeryn’s first birthday!

No, I didn’t finish the comic notes, but I’m almost at 1,000 words after finally spending some time on it one morning this week…maybe over Memorial Day weekend…

  1. For proper Mirandaness, yes, my neighbours were sitting outside enjoying the evening when it happened… ↩︎

  2. Finland 4eva! ↩︎

  3. Nintendo capturing a large section of the gaming community with this ad… ↩︎

Five Years!

Something of a milestone this week. As of this weekend, not only have I been living in this house for five years, but because I bought the house in Durham in May 2013, this is now the second-longest time I’ve ever lived in a house, full stop. It will take some beating to overcome Avon Crescent’s 28 years, but hopefully we’re well on the way (as I type, Maeryn is making grunting noises from her rocker, so I think she’s on board with this plan).

The family comes over the ocean this week, and I’m about to start my second week of paternity leave; I’m spreading it out to get maximum overlap with Tammy’s maternity leave and stretch out what we’ve been lucky enough to have in a country where there is no federally mandated paid parental leave. And we have the oven working again!

Anyway, having some time off means you might finally get those promised comic notes. And maybe something else too…

Let's Talk About Lego

Honestly, I really don’t think you understand the depths of the Lego problem that is building. At 1am this morning, I was looking at pictures of Piccadilly Gardens because I decided that the city really needs to have a central public space instead of just shops and eateries. This was after I had talked myself into having two stations earlier on Saturday. After all, if Bicester can have two stations, then this city certainly can! Besides, it’s more fun in a closed city if there are multiple destinations where a train can stop1.

And that led me down a short rabbit-hole, because in that case, especially with the central public space idea, a tram system seems a good idea. But the third-party tram I found didn’t seem all that appealing, and Lego only seemed to do one in a rather old set which is going for crazy money. So for now, everybody will travel by high-speed train.

And then there’s the housing problem (and let’s be clear, not a single brick of this city has actually been built yet). I have a few sets that come with a lot of minifigures. The hospital set comes with 12!! They all need to have homes…but most of the houses I have came with more people. But don’t worry, because I have a plan to solve this issue with the most on-brand idea possible. I just need to order a lot of bricks to make it happen.

Of course, some of you might be pointing out that Maeryn is barely a month old. And yes, that may be true, pedants. But planning! Planning is very important, dammit!

  1. Look at this. It even has a shop called WH Brick. ↩︎

This Birthday's On Fire

I have to say that running down to the basement to hit the breaker while your oven spews smoke does tend to cast a little bit of a pall over your birthday. Don’t worry though, everything is fine…well, the oven isn’t, as I believe the smoke came from the logic board and the whole thing is dead to the world (as well as turned off at the breaker). But still, that’s next week’s problem.

Anyway, a good birthday was had by all: pizza slices larger than Maeryn, off-brand Lego sets to help increase the diversity of our eventual Lego city, boardgames, and, aside from the oven’s electrical fire, a nice gentle day.

I’ll be honest with myself and suggest you don’t even think about seeing those writer’s notes until May, when I have my second round of paternity leave; all you’ve got to look forward to on here for the next two weeks is more pictures from early fatherhood and my ongoing crippling addiction to Lego (if Maeryn doesn’t like building with blocks, I am going to be in trouble)…

Organize All The Things!

I think I may be nesting. In the past week, I’ve cleared out a space in one of the storage rooms to be a tornado storm shelter, bought a safe for important documents, and spent the weekend putting together shelving that has lain in pieces in the furnace room for almost four years (which seems really odd until I factor in Pandemic Time Dilation). I also dug through five years of paperwork to find the title paperwork for my old car, which will be going across the water to live with Myles shortly. So definitely a ‘dad’ weekend. I have also ordered more shelves. Because apparently now everything has to be in tubs.

(Tammy has also seen the extent of the Transformers Problem for the first time in a few years and…yes, I do need to do something there. And I haven’t even considered what to do about the Lego Problem that is waiting in the wings)

Next week! Baby’s First Buggy Trip! Melting GPUs!


A post shared by Ian Pointer (@carsondial)

A Little Earlier Than Expected…

Welcome Maeryn!

A post shared by Ian Pointer (@carsondial)

Another Round of Apologies

In fairness, I have started. But it has been a crazy week of visitors, a sizeable existential crisis, and birthing classes. At this point, the write-up comes when it comes…hopefully before April…

I will leave you with the ultimate St. Patrick’s Day Comic Relief sketch: