# Give me the hope to run out of steam

And we’re back to non-deep learning content again1. I’m currently watching my cat attempt to hunt birds from the basement window, losing her footing and falling onto the pillow that has been down here since Thanksgiving. And then getting back up again. It’s entertaining. Welcome to lockdown…week…infinity?

This has been a very different week from most quarantine weeks, in that the house has been mostly full! People recovering from surgery, visitors, cinnamon rolls, a disappointing Detroit-style pizza2, the return of the deer family, a depressing Catalyst-related Spark bug, the completion of the TF UK #82 line up of The Wreckers3, a shed-load of Brook-Rose novels, and my first attempt at Sichuan cooking. Which, really, makes staying home sound a lot more interesting than it actually is. And I didn’t mention the nights of not sleeping, or the hours of staring at log files. Or that my turnips rotted.

Still, aside from the weight gain, the general stress, the mounting apprehension about November, and a host of other things, everything is fine! Totally!

🥺🥺🥺

Come back next week for more uplifting content!

1. By the way, I’m sorry that the LaTeX for the equations didn’t render properly in the last post. I did attempt to upgrade my Hugo install and include KaTeX rendering…but ended up completely destroying my rendering pipeline due to the Hyde-X theme not having been updated for a few years now. I…will do a blog redesign at some point with a more modern theme, but for now, I’ve downgraded twenty versions of Hugo to get the pipeline working again. Yay, software! [return]
2. Probably expected when you use NY pizza dough for it, mind you… [return]
3. WRECK’N’RULE! Anyway, better than the other news coming from Hasbro and Takara this week… [return]

# Big Models Hate This One Weird Trick! (Quantization, T5, & PyTorch 1.4)

Here’s the notebook in Google Colab

As we know, models can be big lumbering beasts, comprised of millions of parameters (both weights and activations) that require lots of matrix multiplications to take an input and arrive at an answer. And for most of our work so far, that’s been fine! We have mighty GPUs that can handle these burdens with ease.

But what if we didn’t? We often package a model up for production inference so that it only runs on the CPU. And what if we wanted to run our model on a smaller embedded platform? Suddenly, both the size of the model and all those floating-point operations become a little more problematic. Thankfully, there’s a trick we can perform that makes our model smaller and faster, normally at the cost of some accuracy. Even better, PyTorch allows us to perform this one weird trick with just one line of code, with some other approaches available for squeezing out even more performance. Let’s have a quick look at quantization.

## Quantization

Every parameter in our model is a 32-bit floating point number, taking up 4 bytes of memory. That’s not a lot, but it can soon add up. Let’s have a look at Google’s recent T5 transformer-based model, which has a t5-small variant that’s available in the transformers library.

import torch
from transformers import pipeline, T5ForConditionalGeneration

def count_parameters(model):
    return sum(p.numel() for p in model.parameters())

base_model = T5ForConditionalGeneration.from_pretrained("t5-small")

param_count = count_parameters(base_model)

memory = (param_count * 4) / (1024 * 1024)
memory


230.8154296875


Even with the smallest pre-trained T5 weights, our model is roughly 60M parameters and weighs in at a whopping 230MB!

However, what if we decided that we didn’t need the full precision of our floating-point parameters? If our parameters could be restricted to within a certain range of values, then we could use a smaller type of number representation to store the parameters. This quantization is the key to speeding up our inference time and reducing the memory footprint of our models. What we tend to aim for is to quantize down from a 32-bit floating point to an 8-bit integer. The basic idea is:

$$x_{int8} = \left(\frac{x_{float32}}{x_{scale}} + x_{offset}\right)$$

Which is essentially just fitting the potential values of the parameters of a network to a line of $y = mx + c$, although due to the reduced resolution of the 8-bit integer, there are only so many values a parameter can now take, instead of the huge range that a float32 value offers. PyTorch does its quantizing in a slightly more complicated affair that ensures that zero is always exactly zero, but the basic idea is the same: we have a range of values that our parameters can take, and we find an appropriate pair $x_{scale}$ and $x_{offset}$ to provide 256 gradations to represent that range (or 255, if you think about PyTorch always keeping zero around).
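To make that concrete, here’s a toy sketch of the scale-and-offset arithmetic in plain Python. This is an illustration of the idea only, not PyTorch’s actual quantization kernels, and the function names are mine:

```python
# Toy affine quantization: map floats in [lo, hi] onto the integers 0..255,
# then map them back again. Illustrative only; PyTorch's real scheme adds
# per-channel scales, zero-point handling, and fused int8 kernels.

def quantize(x, lo, hi):
    scale = (hi - lo) / 255           # the size of one 8-bit step
    offset = round(-lo / scale)       # which step zero lands on
    q = max(0, min(255, round(x / scale + offset)))  # clamp to 0..255
    return q, scale, offset

def dequantize(q, scale, offset):
    return (q - offset) * scale

q, scale, offset = quantize(0.5, -1.0, 1.0)
restored = dequantize(q, scale, offset)
# restored is close to 0.5, but not exact: we only have 256 steps to
# cover the whole [-1, 1] range, so the error is bounded by the scale
error = abs(restored - 0.5)
```

Note that 0.0 round-trips exactly (it lands on the offset step and comes straight back), which is the “zero is always zero” property mentioned above.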

At the moment (PyTorch 1.5), quantization is best supported for convolutional and Linear layers. Thankfully, if we have a look at the model structure of T5, we can see a happy coincidence:

base_model

T5Model(
(shared): Embedding(32128, 512)
(encoder): T5Stack(
(embed_tokens): Embedding(32128, 512)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
(relative_attention_bias): Embedding(32, 8)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(2): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(3): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(4): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(5): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(final_layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(decoder): T5Stack(
(embed_tokens): Embedding(32128, 512)
(block): ModuleList(
(0): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
(relative_attention_bias): Embedding(32, 8)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
(relative_attention_bias): Embedding(32, 8)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(1): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(2): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(3): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(4): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(5): T5Block(
(layer): ModuleList(
(0): T5LayerSelfAttention(
(SelfAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(1): T5LayerCrossAttention(
(EncDecAttention): T5Attention(
(q): Linear(in_features=512, out_features=512, bias=False)
(k): Linear(in_features=512, out_features=512, bias=False)
(v): Linear(in_features=512, out_features=512, bias=False)
(o): Linear(in_features=512, out_features=512, bias=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
(2): T5LayerFF(
(DenseReluDense): T5DenseReluDense(
(wi): Linear(in_features=512, out_features=2048, bias=False)
(wo): Linear(in_features=2048, out_features=512, bias=False)
(dropout): Dropout(p=0.1, inplace=False)
)
(layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
)
(final_layer_norm): T5LayerNorm()
(dropout): Dropout(p=0.1, inplace=False)
)
)


Yes, that’s right, look at all those Linear layers! We should be able to get some benefit out of quantizing this model.

## One Weird Trick — Dynamic Quantization

import torch.quantization

quantized_model = torch.quantization.quantize_dynamic(base_model, {torch.nn.Linear}, dtype=torch.qint8)


No, really, that’s it. Chapter done. Bye!

Oh, okay, if you really insist. But honestly, there’s not much more to it. Okay, firstly, a caveat: quantize_dynamic only quantizes the weights ahead of time; activations are quantized on the fly during inference. But all we need to do is pass in the model we wish to quantize and a set of the layer types we wish to replace with quantized versions, in this case Linear. The function returns a new model, though you could run with the optional parameter inplace=True to mutate the original model rather than make a copy.
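If you want to watch the swap happen without loading all of T5, you can try it on a throwaway model. This toy network is mine for illustration, not anything from the notebook:

```python
import torch
import torch.nn as nn
import torch.quantization

# A throwaway two-layer network, just to see what quantize_dynamic does
toy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

quantized_toy = torch.quantization.quantize_dynamic(
    toy, {nn.Linear}, dtype=torch.qint8
)

# The Linear layers have been swapped for dynamically-quantized versions;
# the ReLU (which has no weights) is left untouched
print(quantized_toy)
```

Printing the result shows DynamicQuantizedLinear modules in place of the original Linear ones, and the model still takes and returns ordinary float tensors.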

Let’s save the model and take a look at the quantized size:

!mkdir t5
quantized_model.save_pretrained("t5")
!du -m t5

mkdir: cannot create directory ‘t5’: File exists
121 t5


Almost a 50% reduction in size! We can’t get the full four-times reduction because dynamic quantization only converts the Linear layers; the large Embedding layers (and everything else) stay as 32-bit floats. Still, we’ve done pretty well for one line of code. Let’s do a very simple microbenchmark using both models in the transformers library’s summarization pipeline.

base_summarizer = pipeline("summarization", model=base_model, tokenizer="t5-small")
quantized_summarizer = pipeline("summarization", model=quantized_model, tokenizer="t5-small")

%timeit base_summarizer("From the very beginning, Regan was seen as having series potential. After the television film scored highly in the ratings, work began on the development of the series proper. Ian Kennedy Martin's idea was for the series to be mainly studio-based, with more dialogue and less action, but producer Ted Childs disagreed, and in consequence Ian Kennedy Martin parted company with the project. Childs produced it on 16mm film, a format that allowed for a much smaller film unit than videotape at that time. This made it possible to shoot almost entirely on location which helped give the series a startling degree of realism and to use film editing techniques which enabled him to give the show a heavy bias toward action sequences. The television play and the subsequent series were commissioned by Thames Television and produced by its film division Euston Films. It was originally broadcast on ITV between 2 January 1975 and 28 December 1978 at 21:00–22:00 on weekdays (usually Mondays), with repeated screenings at the same time until the early 1980s. The writers were given strict guidelines to follow: \"Each show will have an overall screen time (minus titles) of 48 minutes 40 seconds. Each film will open with a teaser of up to 3 minutes, which will be followed by the opening titles. The story will be played across three acts, each being no more than 19 minutes and no less than 8 minutes in length. Regan will appear in every episode, Carter in approximately 10 out of 13 episodes. In addition to these main characters, scripts should be based around three major speaking parts, with up to ten minor speaking parts")

1 loop, best of 3: 29.4 s per loop

%timeit quantized_summarizer("From the very beginning, Regan was seen as having series potential. After the television film scored highly in the ratings, work began on the development of the series proper. Ian Kennedy Martin's idea was for the series to be mainly studio-based, with more dialogue and less action, but producer Ted Childs disagreed, and in consequence Ian Kennedy Martin parted company with the project. Childs produced it on 16mm film, a format that allowed for a much smaller film unit than videotape at that time. This made it possible to shoot almost entirely on location which helped give the series a startling degree of realism and to use film editing techniques which enabled him to give the show a heavy bias toward action sequences. The television play and the subsequent series were commissioned by Thames Television and produced by its film division Euston Films. It was originally broadcast on ITV between 2 January 1975 and 28 December 1978 at 21:00–22:00 on weekdays (usually Mondays), with repeated screenings at the same time until the early 1980s. The writers were given strict guidelines to follow: \"Each show will have an overall screen time (minus titles) of 48 minutes 40 seconds. Each film will open with a teaser of up to 3 minutes, which will be followed by the opening titles. The story will be played across three acts, each being no more than 19 minutes and no less than 8 minutes in length. Regan will appear in every episode, Carter in approximately 10 out of 13 episodes. In addition to these main characters, scripts should be based around three major speaking parts, with up to ten minor speaking parts")

1 loop, best of 3: 16.6 s per loop


In addition to being almost half the size, the quantized model is almost twice as fast! So…why don’t we do this all the time? Are there no downsides? Well…it depends. We lose information at inference time in a quantized model, as our parameters can no longer take every floating-point value that they could in the original model, so the chain of multiplications will be less accurate. You’ll need to check the new model against a reference dataset to determine the accuracy loss, and whether that loss is an acceptable trade-off against the reduced storage demands and faster execution.
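A quick and dirty way to get a feel for that information loss is to compare the raw outputs of the original and quantized models on the same input; for a real model you’d use your task’s metric over a proper reference dataset. Here’s a sketch with an invented toy model:

```python
import torch
import torch.nn as nn
import torch.quantization

torch.manual_seed(0)

# An invented toy model standing in for a real network
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(8, 64)
with torch.no_grad():
    # Worst-case element-wise drift between the two models' outputs
    drift = (model(x) - quantized(x)).abs().max().item()

print(f"max absolute difference: {drift:.6f}")
```

The drift is small but non-zero: the quantized weights can only take 256 distinct values per tensor, and those errors compound through the chain of multiplications.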

## Other Quantizing Options Are Available

In addition to dynamic quantizing, PyTorch also offers static quantizing, where a trained model is modified to include observer modules and a selection of data is fed through the model. During the inference on this calibration data, the observers work out a quantized distribution that best fits the observed data and the activations that result. This can produce even further space and time savings, especially with vision models like ResNet.
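The static workflow is roughly prepare, calibrate, convert. Here’s a sketch on a toy model; the model, the random calibration data, and the choice of the fbgemm x86 backend are all my assumptions for illustration, not something from the post:

```python
import torch
import torch.nn as nn
import torch.quantization

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 at the model boundary
        self.fc = nn.Linear(16, 4)
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float on the way out

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = ToyModel().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")  # x86 backend

prepared = torch.quantization.prepare(model)  # inserts observer modules
with torch.no_grad():
    for _ in range(10):                       # "calibration": random data standing in
        prepared(torch.randn(4, 16))          # for a real sample of your dataset

static_quantized = torch.quantization.convert(prepared)  # swap in int8 kernels
print(static_quantized)
```

Unlike the dynamic case, the observers have now fixed scales and offsets for the activations too, which is where the extra savings come from.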

However, for best-in-class accuracy in your smaller model, you’ll want to investigate quantization-aware training (QAT). In this approach, the model fakes quantization during both the forward and backward passes of the training loop; while all the computations take place with standard floats, everything is rounded to the values an 8-bit integer could represent, so you end up with a quantized model after training is finished, but one with higher accuracy than you can achieve with the dynamic or static approaches.
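Sketched out on a toy model, the QAT loop looks something like the following; the model, the random stand-in training data, and the fbgemm backend are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.quantization

class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.fc = nn.Linear(16, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = ToyModel().train()
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
prepared = torch.quantization.prepare_qat(model)  # inserts fake-quantize modules

optimizer = torch.optim.SGD(prepared.parameters(), lr=0.01)
for _ in range(5):  # stand-in for a real training loop over real data
    out = prepared(torch.randn(8, 16))
    loss = out.pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

prepared.eval()
qat_model = torch.quantization.convert(prepared)  # the final int8 model
```

Because the weights were trained while experiencing quantization error, they end up in positions that tolerate the rounding better than post-hoc conversion would.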

## Is It Worth It?

You might be wondering if you’re just better off training a smaller model rather than going to all this effort to compress larger models. In the recent paper, Train Large, Then Compress, there’s a good deal of evidence presented that transformer-based models really do benefit from this approach. Because larger models converge faster than smaller ones, you will likely get more accurate results by training a large model and compressing than if you spent the same compute time on a smaller model. So go forth and compress!

(and we’ll see you back here in the future for pruning models)

https://pytorch.org/docs/stable/quantization.html

https://arxiv.org/abs/2002.11794

# Falling Standards

A summary of how things are going: I have worked from home now for over four years. And yet it was the closing week of April 2020 where I ordered enough tracksuit bottoms to last a week, because I cannot be bothered to dress properly any more. Standards have fallen, people. Also, how come March felt like one thousand years instead of a month and yet April has flown by so quickly?

Slowly trying to thin down my web presence this week by backing up a hosted server I rent that…hasn’t been doing anything of note for at least two years now. I really want to convert this blog to just being served from an S3 bucket, but given that the current set up has been running on Dreamhost for…18 years, well, let’s just say that there’s some inertia there. Also, this blog will be old enough to drink come August (though not in its jurisdiction). That’s absolutely terrifying. It also means that it’s been the same amount of time since I went to UNC. Ah, halcyon days.

Meanwhile, from what is apparently the Cincinnati 2020 reboot of Keeping Up Appearances, let me tell you how I met the ex-mayor of Cincinnati. It’s Friday evening, and I’m celebrating the end of the week with a drink. It’s a slightly larger than usual drink because it was the end of the bottle. There I am, downstairs in the basement watching an episode of The Sweeney whilst also reading the instructions for a GPU-accelerated dense vector search library. Yeah, I know you’re jealous; it’s a wild Friday night and no mistake. Anyway, having finished the bottle of Wild Turkey, of course the doorbell rings. Running up the stairs, slightly drunk, pulling off a sock as I go1, I open the door and, yes, well there’s the former mayor, coincidentally my neighbour with a package in hand. Now, despite it also being the two-year anniversary of me moving to the area…I have never actually met him before. So he’s introducing himself, and I go to shake his hand.

“Oh no, I don’t think we do that any more”

I snap back, making it very clear I’ve been drinking, apologising and basically expecting that a Miranda-like situation where I lose my trousers is coming any moment. On the bright side, I have now met both sets of neighbours! But I can never speak to one set ever again. I guess I’m thankful it’s not the set that I’m helping to take down a tree in the summer?

1. because, obviously, I only had one sock on at that point and that would have just been weird to answer the door with one sock. [return]

# Shaw Taylor Says Hi

Well, as the horror continues to pile up, a small bit of personal good news: apparently my book is being translated into German. I’m working with the editors to make a few changes in the upcoming edition, hopefully including the already-in-English Chapter 9.5 plus some sections on quantizing models and image self-supervision, providing I can get them written by the end of May. The English language versions of those will end up on the GitHub repo as promised. No word on a second edition (which would probably see massive revisions of the text and audio chapters, given that PyTorch changed everything a month after the book went to the printers), but I’ll let you know if it happens!

Otherwise, a week that mainly consisted of the New Normal of Remaining Indoors, but with a small visit to Jungle Jim’s on Saturday (with a mask, but nothing as fabulous as this outfit). I needed more tea. Of course, when you go, you can’t just buy one thing, as you will end up with at least ten things that catch your eye even when you go with a list and a plan in mind. Reader, I bought all the Star Bars. And the Tunnock’s Caramel Wafers. Look, I’m reduced to watching episodes of Police 5 whilst drinking a cup of tea. Eating Scottish sweets that haven’t changed since 1952 is the least of my worries right now.

I continue to plan for a future in the hope that there will be a future. Not that I can talk about things much at the moment, but one plan will have a website, hopefully around the end of June. It’ll be fun! At least fun in a certain type of way. Not quite as much fun as Sophie Ellis-Bextor’s Kitchen Discos, but something for a certain type of person to play about with for the rest of the year. No clues, but it will take obsessions from my early life and mix them with current obsessions. Keep ‘em peeled!

I’m so sorry

# What Even Is Time?

I think this week qualifies as my first meltdown? That’s what days of increasingly less sleep gets you, I guess. Also: you cannot solve Kubernetes problems by thinking about them from 1am until 6am.1

Anyway, I have had a long weekend of doing…almost nothing. It’s been quite nice, really. No baking or confectionary work. No long periods of staring at code. I even sat outside for a bit.

(Okay, I cheated by moving the baking to mid-week by virtue of having a quiet, but quite nice birthday on Wednesday. Not quite the birthday celebration we had planned, but that is pretty much 2020 so far…)

Next week: further adventures in Seldon Core. And maybe a masked-up visit to Jungle Jim’s…

(I don’t have a lot to say about being 41. Aside from being so old, obviously)

1. Though, oddly, you can solve them in the morning whilst in the shower. [return]

# Sheriff Fatman

Really, what I want to know is why I spent a large part of Friday night listening to Carter The Unstoppable Sex Machine?

It’s a sound of The End of History, big sans-serif fonts on top of VT from a performance on The 8:15 From Manchester, Amiga music intros, and big honking slabs of the cheapest drum beat you could possibly imagine. And there’s been no revival. The only reference point I can think of these days would be The Indelicates, and they’re not exactly a household name. Maybe I just love the puns and the vague memory of happier times before The Event.

Also, who booked them for the Smash Hits Poll Winners’ Party?

(I have to say that I’m on Schofield’s side of that argument; his response to them smashing up the stage is peak-Smash Hits. It’s not quite like firing a machine gun at the Brits, after all…)

In Event news, I went outside. Twice. Both to supermarkets, though one trip was for a prescription rather than food items. The trip to Kroger was weird in that it’s the first time in a month I’ve been near that number of people. Already it seems odd to have so many people in one place. Probably around 60-70% of people wearing masks and…somewhat bad attempts at social distancing. You have to be more on your game noticing when people suddenly stop in the aisles. But I got in and out (I can’t say unscathed, because none of us really know, of course). There are supplies, and there will be a roast dinner (and a game of Root which actually ended up not being quite as scary for a first play-through as we expected).

As I type this, I realize that right now, I should have been suffering terrible jet-lag and trying to roam the streets of Tokyo with Tammy in a vain attempt to stay up to a reasonable bedtime. There would have been ramen, brutalism, bourbon hunting, and my face curling into a pained grimace as Tammy ate all the seafood. But the important thing is that we’re still all here and mostly okay!

# Remaining Indoors

Well, I was going to point back into the archives and highlight some posts from the Pac-Man Wedding of 2010…only it turns out that whilst I was good about posting updates leading up to the event highlighting voting for the reception songs, I wrote absolutely nothing about the day itself or the subsequent trip to New York. So, er…it’s been ten years since Sandwich Day at Fiore’s! I am celebrating by…remaining indoors as usual. It’s what we do in 2020!

I even doubled down on Remaining Indoors this weekend as there was no board game visit. I sadly didn’t really sleep on Friday and didn’t feel up to it. Instead, I made my own Easter egg.

(Instagram: a post shared by Ian Pointer, @carsondial)

Milk chocolate orange through and through, twenty-two centimetres tall. That’ll do, I think.

I am putting off going outside again until Tuesday. I can even probably survive another couple of weeks without going to a supermarket, but we’re going to have a roast dinner for Easter Sunday next weekend, so I will have to venture out for potatoes and meat. I’m not particularly looking forward to that. But I have masks, gloves, and enough hand sanitizer to sterilize a battalion. So I should be okay.

# I Only Remember The Bits With Liz

Well, I finally caved to reality this weekend and cancelled the flight to Japan. Re-scheduling may end up being somewhat difficult, even putting aside the pandemic, so it’s not looking likely that we will get to Japan this year after all. Bit of a disappointment, but hopefully there will be other times to come!

(Instagram: a post shared by Ian Pointer, @carsondial)

Meanwhile, I have joined the baking hordes. But I refuse to waste my precious flour reserves by throwing out a sizeable amount of it to begin a sourdough starter. Instead, soda bread! I was planning on doing a full Ulster Fry…but isolation makes you somewhat less inclined to do an actual production for lunch or dinner. Just Enough Is Fine.

Having said all that, I’m already making plans for something fun at Easter, now that I’m going to be…well, probably still confined to the house if we’re being honest with ourselves. More details as the plans come together! But you can guess that it’s going to involve chocolate somehow. Not much of a surprise there.

We also re-watched Billy Liar this week. I last watched it back in 2005, which seems to have meant that I completely forgot all the scenes in the film that don’t have Julie Christie in them and show Billy to be entirely hateful. Or that Rodney Bewes was in it, basically playing Bob from The Likely Lads. It was definitely less charming than I remember, and I think spending the past 15 years with The Long Blondes’ Giddy Stratospheres made me think there was a touch more to the Liz scenes than is actually present in the film. Still, we’ll move on to Kes next week in my continuing season of “it’s grim up North”. Or maybe Penda’s Fen for something a little more weird.

Right, another week of staying home awaits! *rushes and sits down*

# And now, The Quiz Broadcast

A little light viewing.

Yes, I am doing absolutely fine, why do you ask? Honestly, I think it was the tornado warning on Thursday night that finally sent me over the edge, as the cat and I sheltered in a small wood-lined room in the basement with over 250 Transformers. All I’m saying is I want a refund on my 2020.

But! In good (or at least better) news, I am 100,000 bells in debt to that capitalist oppressor raccoon. We never learn: Nook lures us to a remote area where he controls the only means of ingress or egress, and the entire money supply. I really want an Animal Crossing update where you can finally depose him. In the meantime, I will continue to dig up seashells to slowly pay off that mortgage. At which point he’ll ‘accidentally’ upgrade the house and put me under for 500k.

So it’s mainly been a week of staring at code and slowly making it work again, cursing at that damn Nook, re-discovering the Ghost Box record label, playing board games, reading, and baking. Not too bad, considering.

[Instagram post by Ian Pointer (@carsondial)]

Discovering that my wall oven has a “proofing” mode is going to come in quite handy, I believe. As will the accidental stockpiling of flour in the post-Thanksgiving sales.

And tomorrow, I get to be ‘down with my fellow youths’ and use Houseparty for the first time. I am too old for a new social network (to this day, I still don’t really understand how Snapchat is supposed to work). But this way everybody back home gets to be online in an attempt to recreate Sunday at Granny’s house…when you can’t be there physically…

# Enough Pre-2000s UK TV To Tide Me Over

On the bright side, I got a surprise day off on Friday. On the downside, our upcoming trip to Japan is cancelled and my family is apparently no longer welcome in this country for the foreseeable future. Although given the absolute disaster that was penning thousands of potential COVID-19 carriers into an enclosed space with non-carriers for upwards of ten hours in airports, it’s probably for the best. Classic Miller and Kushner joint, that.

Meanwhile, I am preparing for a long hermitage. Which means I’m just carrying on as normal, obviously. It’s okay; I have a huge quantity of rice, soy sauce, and the complete set of The Sweeney (all episodes and the films). There’s an emergency backup of Inspector Morse and all of You Rang, M’Lord?. If I have to break out Hi-De-Hi, you know it’s got bad. Though I am now wondering whether it’s a good idea to download all of Taggart. You know, just in case.

I am, of course, one of the luckiest ones. I already work from home, I can get almost everything delivered to my house from Amazon, and there’s a cat that demands attention every so often. Some people I know are going to be in a situation where they’re exposed to danger every day, and that…well, let’s just say I’ve upped my Ambien intake this week. Oh, also Lent is cancelled and I have cask-strength Irish whiskey for Tuesday. And Wednesday and Thursday if needed. We’re all just trying to get to next Friday, when Animal Crossing is finally released on the Switch and we can all start building houses in our villages. WE SHALL RISE AGAINST NOOK!