Week One

We survived the first week! Admittedly, we couldn’t go outside for most of the week because of the terrible air quality across the Midwest from the Canadian wildfires, but we did eventually go to the post office and the supermarket. Maybe this week will be the one where I don’t open Slack at all.

(obviously, that’s never going to happen. But I did also make a bunch of headway on the ‘parental leave list’ items, so that is something!)

A lot has changed again in just a week — Maeryn is now reaching for and interacting with toys, plus she is really getting the hang of the posture of sitting, even if she hasn’t quite worked out balance yet. She’s coming along week by week, sometimes even by the day!

Next week: I begin the process of getting an Irish passport…

And Now For Something Completely Different

It occurred to me on Friday that the upcoming weeks will see my longest bout of time off from work since…well, possibly the end of 2010 when I quit my last job in the UK. That’s quite a while ago…

Making Plans For Maeryn

I realize that my current ‘things to do during paternity leave’ list has grown to comic proportions at this point (the ‘comics’ entry currently even has three subitems!), when what it should be is “keep baby fed / sleep when possible”. But I like setting unrealistic targets and then berating myself when I fail to achieve them! This is fine, right?

Now, if you’ll excuse me, I have to go and assemble my Father’s Day present…

Lego Land Rover

That Was A Week

People of the Internet! I need your opinion! Should I try and go to see this band when they’re in Cincinnati next month?

Otherwise, another quiet week. But only two weeks left until the last of my paternity leave kicks in and I’m out for over a month. That…will be interesting! And only 9 working days, too…I might be a little busy trying to get all that ready…

Network Is Down

Continuing the focus on old television, it’s been a sad week, as Network Distributing’s website went offline on Tuesday and although there’s no official news, it appears the company is either in administration or liquidation. It’s terrible news for physical video media in the UK in general, as Network seemed to be one of the last ‘giants’ standing in the market, pushing out a lot of titles each year, but it’s even worse for people that are interested in the more esoteric fare that they put out. Sure, they had plenty of crowd-pleasers like the restored Prisoner Blu-rays or the astoundingly complete Monty Python boxset from a couple of years ago. But they were also the company that would put out a children’s TV serial from 1974 that may have only aired in the Yorkshire TV region and that half the cast have forgotten they even made. And they’d produce that release with a similar amount of care as they would the Python set. A company that was fully prepared to release a 13-disc set of Give Us A Clue and the complete insanity of the recent 90-plus-disc Crossroads set. They were legends and will be sorely missed.

(As part of the hordes descending upon retailers in an attempt to get hold of things before they sell out forever, I got hold of the complete series of Watching. Which I would never claim is brilliant, but I have good memories of watching it both at home on broadcast and in Santa Monica ten years ago. And who is going to be crazy enough to do a repress of this series? I’m not even sure it’s online in the UK…)

In happier news, it looks like I will be in San Francisco for the first time in over three years this coming August! I’ll be going to the Google Cloud Next conference in the centre of town, and hope to meet up with a bunch of people I haven’t seen in person for a while. I feel like I’m already regretting the red-eye back home on Thursday night, mind you; I didn’t realize the conference ended at 2pm…but given the time difference, it looks like I wouldn’t get home any quicker anyhow. Still, should hopefully be fun…and I’ll get to see the new offices for the second time in three and a half years…

Ken Morse

I’ve been living in the US for almost 12 years now, and one of the things I can’t wrap my head around? The United States Postal Service. Let’s take postal redirection. It will cost you almost £40 in the UK to have Royal Mail redirect your post. In this shrine to capitalism and money-grabbing? $1.10 online, or free if you do it at a post office (they only charge you to verify your identity). The equivalent of a first-class stamp costs 63¢ versus £1.10. They’ll take your outgoing post from you at the door if you don’t want to find a postbox. I get emails every morning showing me what post is going to be arriving later that day. And then there’s the packaging. I ordered about 12 different sets of packaging (probably close to 100 pieces altogether). Everything from DVD-sized boxes to tubes that look like they’re explicitly designed to send massive Toblerones across the country. And the cost of all this? $0.00, including $0.00 postage and packing.

Honestly, I can understand why the Republicans want to privatize it. You can’t have a publicly-owned company be so useful and able to send post across the country for a flat rate. It just breaks their minds.

In other news, this week I began my long-term plan of indoctrination. Exposure to The KLF, Kate Bush, New Order, and Kenickie is well underway, plus Maeryn was also treated to the first series of The Fast Show (I’m hoping she’ll start doing the “oooh!” part of Suits You soon). I don’t think I saw the first series when it originally went out; my first memory of watching it is this moment from S02E01, which is where I knew it was going to be something I loved:

Ken Morse on the Rostrum Camera

An obscure TV gag thrown in where it wasn’t really required, but Higson/Whitehouse decided to go the extra mile anyway. Possibly the last truly great BBC sketch show?1

Also, I have to admit that this Ron Manager segment is…well…yes. I could hear the vidiprinter sound as he was talking.

  1. When watching it back, one thing that comes to mind is: “how much did this all cost?” Endless VT effects, tons of location shooting, stunt work, a massive cast, skits that last only a few moments…you just wouldn’t get the budget for that these days. ↩︎

FrugalGPT: This Big Boy Can Fit So Many LLMs

FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance

I feel like the timing of this paper is amazing; you get the feeling that the authors watched some of the Tears of The Kingdom trailers, looked at the pile of models they had lying around and just thought “Why don’t we just use Ultrahand on them?”

What we have here, then, is a carefully constructed Heath Robinson machine designed to work around two big issues with calling GPT-3/4 in a query pipeline:

  • OpenAI calls are slow (there’s a fundamental issue about having to make a call to an external API, but even accounting for that, calls to OpenAI tend to be somewhat slower than the competition: https://github.com/kagisearch/pyllms)
  • OpenAI calls also cost money, and when you’re working with queries at scale, those fractions of cents are going to add up really fast.

The authors construct a system using five different techniques to reduce the cost of using OpenAI’s LLMs, some of which avoid talking to them at all:

  1. Prompt Selection — reducing the number of examples provided in a prompt to reduce the total amount of tokens sent to the LLM
  2. Query Concatenation — combining multiple queries into a single request to the LLM, and demultiplexing the response to answer the separate queries
  3. Response Cache — a cache that stores responses and returns answers from the cache if an incoming query is judged ‘similar’ enough
  4. Use a fine-tuned model instead — Collect responses from a large model (e.g. GPT-4) and use those responses to fine-tune a smaller model (the paper uses GPT-J), which can then be used in the:
  5. LLM Cascade — this is the main component of the paper. The cascade service sends a query to a list of LLMs in order of increasing expensiveness, and responses are evaluated via a scoring model to determine if the response is acceptable. If so, the response is returned to the user; if not, the next LLM on the list is queried.
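To make the cascade concrete, here’s a minimal sketch of technique 5. The model names, `call_llm`, and `score` functions are hypothetical stand-ins for illustration — the paper trains a dedicated scoring model, and a real system would make actual API calls:

```python
def call_llm(model: str, query: str) -> str:
    """Hypothetical stand-in for a real API call to the named model."""
    return f"{model}-answer to: {query}"

def score(query: str, response: str) -> float:
    """Hypothetical scorer; in the paper this is a trained scoring model.
    Here we just pretend the biggest model always scores well."""
    return 0.9 if response.startswith("gpt-4") else 0.4

# Models ordered from cheapest to most expensive.
CASCADE = ["gpt-j", "gpt-3.5-turbo", "gpt-4"]
THRESHOLD = 0.8

def cascade_query(query: str) -> str:
    response = ""
    for model in CASCADE:
        response = call_llm(model, query)
        if score(query, response) >= THRESHOLD:
            # Good enough — stop before paying for the bigger models.
            return response
    # Fell all the way through: return the most expensive model's answer.
    return response
```

The cost savings come entirely from how often the cheap models clear the threshold, which is why the quality of the scoring model matters so much.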

The resulting hodge-podge contains some surprises, the main one being: not only is it cheaper than just talking to GPT-4 directly, but when things are tuned, it actually performs better than GPT-4. The improvement isn’t massive, but combined with the 50-98% cost savings in their experiments, it does feel like there’s definitely something worth digging into here.

But also, a few issues. Using a fine-tuned model sounds like a good idea, but pretty much all the major LLM providers include clauses in their terms of service that would prevent you from doing this in production, unless you have a robust legal department that is eager to argue that the textual outputs of LLMs have no copyright protection and thus those terms are unenforceable. Some providers even prevent you caching LLM queries! And then there’s the issue that the authors point out as a major limitation — the scoring models need to be trained with labelled examples that come from the distribution domain of the incoming queries. Which means this is a system that will need to be continuously updated or else you’re going to have some serious model drift. Which makes me think of a bunch of data scientists running around this crazy contraption like Wallace and Gromit trying to prevent it from blowing up and spraying cheese all over the house. I do wonder if you could get this going in a RL framework or something else to reduce the ongoing upkeep the system would need.

(my eyebrows are also raised a little by the prompt selection and query concatenation stages — making LLMs that you don’t have raw access to consistently follow directions can be something of a challenge. You’d need to provide extra guardrails in production to make this work reliably)

One thing I think is interesting and glossed over a little in the paper, is the cache component. The paper refers to it as a ‘database’, but my thinking is that a traditional cache/database in such a pipeline is going to miss out on a lot of potential reuse of queries, e.g. “When was New Order’s Blue Monday originally released?” and “When did Blue Monday first come out?” are the same question but you’re going to get a cache miss on the second. So! Why not use a fast embedding model, a vector database, and an aggressive distance cut-off for nearest neighbours so you can respond to a lot more queries without having to go to the LLMs?1
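A rough sketch of that idea — not the paper’s implementation. `embed` here is a toy stand-in for a fast sentence-embedding model, and the threshold value is just illustrative:

```python
import math

def embed(text: str) -> list[float]:
    # Toy embedding via character-bigram hashing into 64 buckets.
    # A real system would use a proper sentence-embedding model.
    vec = [0.0] * 64
    lowered = text.lower()
    for a, b in zip(lowered, lowered[1:]):
        vec[(ord(a) * 31 + ord(b)) % 64] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))

class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # aggressive similarity cut-off
        self.entries: list[tuple[list[float], str]] = []

    def get(self, query: str):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # near-duplicate query: cache hit, no LLM call
        return None

    def put(self, query: str, response: str) -> None:
        self.entries.append((embed(query), response))
```

In production you’d swap the linear scan for a vector database, and tuning the threshold becomes the interesting part: too loose and you serve wrong answers, too tight and you’re back to exact-match behaviour.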

Anyway, that’s FrugalGPT. Save all the monies! Keep your Zonai batteries charged!

  1. Another fun thing you can do here - you can take queries and redirect them to things that you want to promote. For example, a merchandiser could set up a promotion for nappies, and automatically searches for ‘Pampers’ could return the promotion results along with the user’s actual query response. ↩︎

Family Week

A busy week of family visiting! Everybody got to hold Maeryn, large pizzas were had, I waited outside a bakery for half-an-hour until discovering it was closed until the end of May, a Miranda moment where my trousers fell down as I was carrying a table across my front garden1, baby’s first Eurovision2, Mother’s Day brunch, discovering how Cincinnati’s bulk item pick up works and watching them crush a freezer, putting up shelves…and completing the first two shrines in Tears of The Kingdom3.

Now, the family is off to Vegas, but they’ll be back here at Christmas, doing a trial run of the new LHR-CVG direct flight, before we hopefully go back to Britain next year for a trip celebrating Maeryn’s first birthday!

No, I didn’t finish the comic notes, but I’m almost at 1,000 words after finally spending some time on it one morning this week…maybe over Memorial Day weekend…

  1. For proper Mirandaness, yes, my neighbours were sitting outside enjoying the evening when it happened… ↩︎

  2. Finland 4eva! ↩︎

  3. Nintendo capturing a large section of the gaming community with this ad… ↩︎

Five Years!

Something of a milestone this week. As of this weekend, not only have I been living in this house for five years, but because I bought the house in Durham in May 2013, this is now the second-longest time I’ve ever lived in a house full-stop. It will take some beating to overcome Avon Crescent’s 28 years, but hopefully we’re well on the way (as I type, Maeryn is making grunting noises from her rocker, so I think she’s on-board with this plan).

The family comes over the ocean this week, and I’m about to start my second week of paternity leave; I’m spreading it out to get maximum overlap with Tammy’s maternity leave and stretch out what we’ve been lucky to have in a country where there is no federally-mandated paid parental time off. And we have the oven working again!

Anyway, having some time off means you might finally get those promised comic notes. And maybe something else too…

Let's Talk About Lego

Honestly, I really don’t think you understand the depths of the Lego problem that is building. At 1am this morning, I was looking at pictures of Piccadilly Gardens because I decided that the city really needs to have a central public space instead of just shops and eateries. This was after I had talked myself into having two stations earlier on Saturday. After all, if Bicester can have two stations, then this city certainly can! Besides, it’s more fun in a closed city if there’s multiple destinations where a train can stop1.

And that led me down a short rabbit-hole, because in that case, especially with the central public space idea, a tram system seems a good idea. But the third-party tram I found didn’t seem all that appealing, and Lego only seemed to do one in a rather old set which is going for crazy money. So for now, everybody will travel by high-speed train.

And then there’s the housing problem (and let’s be clear, not a single brick of this city has actually been built yet). I have a few sets that come with a lot of minifigures. The hospital set comes with 12!! They all need to have homes…but most of the houses I have came with even more people. But don’t worry, because I have a plan to solve this issue with the most on-brand idea possible. I just need to order a lot of bricks to make it happen.

Of course, some of you might be pointing out that Maeryn is barely a month old. And yes, that may be true, pedants. But planning! Planning is very important, dammit!

  1. Look at this. It even has a shop called WH Brick. ↩︎