The First Big Weekend

It’s the first quiet weekend since, maybe, February? We have no schedule, no appointments, and no plans. Maeryn has just gone down for an unexpected but solid afternoon nap. Tammy and I meet in the kitchen.

What are we supposed to do now?

Eventually, we’ll get used to it. And then Maeryn will stop sleeping in the afternoon…

We did, however, all go out to dinner…in a restaurant that we haven’t set foot in for four years. Plenty of ‘roadside delivery’ during that time, but we haven’t had a meal there since the start of the pandemic. Also, apparently Maeryn likes Bhangra music, rocking out in her little high chair while eating paneer.

With the release of Llama3 this week, I’ve been toying with the idea of a series entitled: “Let’s look at old papers and replace ChatGPT3 with Llama3-7b-chat and see what happens!” I spent part of Friday night[1] getting the ADaPT paper working, which took about five minutes, and then two hours attempting to work out why the WebShop evals weren’t working for the full 100 traces. I gave up after staring at the mess of Java and Python that comprises the benchmark. So the tl;dr is: I saw ADaPT work with Llama3 for several traces, but can’t actually report on how it compares to the original ChatGPT implementation. Promising, though.[2]

  1. Don’t worry, we had already watched an episode of Pole To Pole, so archive television had been slotted in!

  2. Although I will say that I have some fundamental objections to the functions they make available to the planner/agent LLMs. I don’t think SimpleMatch is ever going to return something useful in the WebShop context. I’d replace it with a very quick and dirty embedding function to give the agent a chance of returning candidates to the planner, even if they end up not being a perfect fit.
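For the curious, here is a rough sketch of the kind of "quick and dirty embedding function" that second footnote has in mind. It is not ADaPT's or WebShop's actual interface: the names `embed` and `top_candidates`, the trigram hashing trick, and the sample product titles are all hypothetical, chosen only so the example runs with nothing but the standard library. A real attempt would more likely use a proper sentence-embedding model.

```python
import math
import zlib
from collections import Counter


def char_ngrams(text, n=3):
    """Lower-case the text and split it into overlapping character n-grams."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def embed(text, dim=256):
    """Hash character trigrams into a fixed-size, L2-normalised vector.

    Hypothetical stand-in for a real embedding model.
    """
    vec = [0.0] * dim
    for gram, count in Counter(char_ngrams(text)).items():
        bucket = zlib.crc32(gram.encode("utf-8")) % dim
        vec[bucket] += count
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def top_candidates(query, products, k=3):
    """Return the k product titles closest to the query by cosine similarity."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(p))), p) for p in products]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


if __name__ == "__main__":
    # Made-up product titles, purely for illustration.
    catalogue = ["blue denim jacket", "red running shoes", "stainless steel kettle"]
    print(top_candidates("mens red running shoe", catalogue, k=2))
```

The point of the design is that something always comes back: even a weak match gives the agent a ranked list to hand to the planner, instead of an empty result whenever the wording doesn't line up exactly.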