Ballroom Blitz

This visit to Blitz by Jack Charlton is fantastic. Aside from Charlton taking it all in his stride, though, my favourite part is how it completely demystifies Blitz itself. “It’s the hallowed ground where New Romanticism was born!” becomes “Oh, it’s just a disco in a very typical run-down British hall. Look at the state of those plugs.”

It occurred to me this week that I last filled my car with petrol back in March. And it’ll probably still be fine into August. Remain indoors!

(Having said that, I will be on the coast of North Carolina next weekend. Cook Out time! And maybe, maybe a sneaky stop on the way back Sunday afternoon to take pizzas away from IP3. We can but dream. And wear masks. And disposable gloves. This is how we travel in 2020.)

My weekend archive viewing was Secret Society, which included hilarious scenes where people on the high street were shocked to see their names come up on a private copy of the voting roll database via a teletext-style linkup. Oh, those innocent, halcyon days. You can also tell how I’ve been co-opted by the system, as the programme’s fierce line against machine-readable passports generated a big Alan Partridge-esque shrug from me. “There’s a big book of names that the passport agents are supposed to check, but they don’t! This is bad! The machine-readable passports will make that viable! This is also bad!” A little odd.

And I think that’s it for this week. Except this wonderful clip from Gillian Gilbert where she casually reveals that nobody else in New Order could read/transliterate music and how she and Stephen wrote the backbone of World In Motion as a Reportage end credit theme. I am really glad that she was back in the band when I finally got to see them earlier this year.

And this, as “The Other Two”, watching one of their early BBC live performances (content warning: Bernard’s shorts):

(This blog stans Stephen Morris & Gillian Gilbert)

Nothing To Report, Captain

Ho-hum. A week. Both outside at large and in. I keep on wanting to actually get started on the ideas I have, but everything else is getting in the way; even the weekend is no real respite this time around.

But, obviously, it could be worse. waves hand to the outside world

To cheer myself up, I watched Punishment Park and Vanishing Point back-to-back. It turns out that I am really terrible at picking films to cheer myself up. But, I ask, who wouldn’t want to watch two incredibly paranoid 1970s films whilst the current President makes lists of enemies, eh? Next week, we’re doing Hamilton, but what people don’t know is that I’ll slip in the Mick Travis trilogy during the intermission. You will all appreciate the majesty of Britannia Hospital if it’s the last thing I do.

So yes, not the best week for a variety of reasons. But! After almost a decade of it sitting somewhere in a room, forlorn and unloved, this is finally assembled and in the bar:

A post shared by Ian Pointer (@carsondial) on Instagram

Now I just have to find things to go inside it. And hopefully that won’t take another decade…

The Most Famous Day of Exit & A New Keyboard

My favourite keyboard was a random purchase in August 2002. I had the entirely sensible idea of bringing my tower (no, you read that right; not a mini-tower, but a full 2ft high tower unit) across the Atlantic Ocean when I went to UNC. My insanity did not extend to bringing all the other bits, though, so there was a trip to the Best Buy at New Hope Commons and I found a keyboard. Nothing special at all, but it turned out to be an IBM keyboard that still had enough of the Model M lineage inside its DNA to produce a satisfying click on every keypress. How I mourned about eight years later, after taking it back home, when the space bar finally gave up the ghost and I had to get a new keyboard.1

What I’m leading up to is that I’m currently having a Very Proper Tech Mid-Life Crisis, and I’ve bought my first mechanical keyboard. And it is glorious. CLICK-CLACK. Every time. Plus it feels solid enough to cause somebody a lasting injury if you used it as a club. Which might give Bonnie ideas. But…I’m not likely to be making it home for Christmas this year, so she won’t have to hear it too much. Except for when I make sound recordings of me typing on it and send them to her…

Enfield

Obviously, it does not make you into a super-hacker, but I’ll be damned if it doesn’t feel better to be typing on something that sounds like it came right out of WarGames. Plus, it works on my iPad too!

Honestly, I’m also somewhat jazzed about having actual PgUp and PgDn keys again. I’m still cool! I swear!

Meanwhile, I have spent the nights of this week reading a collection of lectures from Bell Labs, a journal of video game development, and the diaries of a Tory Minister in the early 1980s. For the latter, it’s a surprise that, even compared to this current shower, Clark comes across as an oozing, chauvinistic sociopath who would be one of the first up against the wall given any justice in the world.2 And then there’s the other part of my reading for the week, where I have continued my descent into CCRU-adjacent land and into a loose collection of websites like xenogothic, hyperstition, and the venerable k-punk. Like last week’s revisiting of The Net, it reminded me of a different time, of chasing down weird and wonderful links on the Internet late at night in room A14 in Manchester (the Xenofeminist website even vaguely looks like a Geocities site circa 1999).

It’s also weird that, having read one article on ‘patchwork’ on one of these sites this week, my entire Twitter feed was aflame with it on Thursday. Synchronicity. I’m not really sure how successful any of these projects are at pulling the ideas away from Land’s clutches. But still, any port (or exit) in a storm, eh?

Mister Six


  1. My sister, on the other hand, did enough dancing and cheering that I vaguely suspect that she had a hand in its demise. ↩︎

  2. The current shower of course is in thrall to a strange creature that appears to have grown up in the LessWrong forums of a decade past. I am looking forward to Keir Starmer bringing up Roko’s Basilisk at some point at PMQs. ↩︎

Remember The Future

A post shared by Ian Pointer (@carsondial) on Instagram

As the horror ramps up again, due to some interesting decisions by our glorious President and various Governors throughout the land, it’s important to remain at least a touch optimistic. And thus this week, we’ve done some planning and preparation for the future. This has included building more of the special features for our Escape Basement (what, you haven’t built fake walls, a cage and a fireplace yet? What are you doing with your 2020?), the Grippos tacos shown above, which will be a key part of the 2020 Election Night Viewing Situation (along with All The Bourbon, Just In Case), and, yes, the first and second desserts of Thanksgiving are already planned.

The thing that I teased for June and July is now being pushed back to Christmas; I have a feeling that I will need something to occupy my time this December, as I’m pessimistic that travel will be back to normal by then. I have purchased the oh-so-important domain name, mind you, but I can’t tell you what it is yet, as the name gives the entire project away. Patience!

Meanwhile, I went down another old rabbit-hole on Friday night by watching BBC’s The Net. GASP! At the classic BBC2 idents! GROAN! As Jules gives Rise of the Robots 9/10. LAUGH! As CD-ROMs are talked up as the next big thing1. DESPAIR! As they talk about the grand future envisaged, where nobody considered the possibility of this timeline’s equivalent of Robert Linus Booth yelling us closer to chaos 280 characters at a time. And then Sadie Plant turned up and I spent two hours in bed delving back into the CCRU, The Invisibles, and my own odd thoughts about BERT and GPT-2. I’m reaching for something, but I’m not quite sure what.

My keyboard is dying, so I’ll wrap up by pointing you towards this amazing work that I must own even if I never end up playing it.


  1. Although the kicking they give to The 7th Guest is amusing and has proven correct in time. ↩︎

The Biggest Hourglass

Who needs to tell some time?

A post shared by Ian Pointer (@carsondial) on Instagram

The Hauntology of the WEF

Forums, especially successful ones, often build up a mythology. ILX, for example, has been going for almost twenty years and has plenty of myths and legends, including a hilarious real-time thread cataloging the desecration of a couch (the incident is still celebrated today as the icon of the ILX app), the ‘so not going to happen’ photo thread, and a sadder, drawn-out catfishing, as well as the usual forum drama of long-banned members.

The Warren Ellis Forum only lasted for a short time back at the start of the century, but there has been a lot of hagiography about it over the past few years. And why not? Many careers and friendships were launched during its existence, and the comics world of today has many people in commanding positions that can trace their success right back to that forum.

And yet.

I’ve never quite managed to sort out whether this was a SOTCAA Savile transcript affair or something that actually happened. But there was a post. Maybe I saw it on the night shift when I was working at Oxford Brookes University after graduation, or maybe it was in the early morning. Anyway, it was a post containing a transcript of a webcam session. Between a woman and Ellis. And it was posted by the woman’s boyfriend. Nobody saved the post. At least nobody who’d admit to it, and if any denizens of The V-Forum had saved a copy, it would have been regularly posted in the years following1, so I don’t think a copy really exists any more. Nor can I say whether it was true at all, although in light of recent revelations, it’s probably right to give it the benefit of the doubt.

The ‘official’ Oral History of The WEF makes a big play of how it was a comics forum that didn’t hate women.

As far as comic spaces circa 2000 went, the WEF was an accepting and welcoming place for women, and that speaks for people who were there, the creators who came out of there, and the work we do.

And Kieron is not wrong at all. But it’s fair to say that ‘circa 2000’ is doing a lot of work in that sentence. Aside from the ‘Filthy Assistants’, the moniker given to the all-women mod staff, and Ellis’ overuse of ‘daddy’, women who didn’t quite fit2 into the WEF would often find themselves on the receiving end of this image macro:

McQueen / McGraw

But it was all ironic, you see. And it was okay! Because the forum was a welcoming forum that celebrated diversity in the comics world! Both from readers and creators! So much better than everything else around at the time!3

I’m reminded that Larry Young, the publisher that gave Matt Fraction his start in the comics world, once referred to Jeremy Love as ‘boy’ and did not back down from it one bit when called out.

So the recent revelations of Ellis’ predatory behaviour are not a total surprise. Disappointing and enraging, but not a surprise. And, judging from a quick Twitter-stalking of names I remember from that era (aside from the obvious ones), that seems to be the consensus from almost everybody. People who weren’t on the WEF seem to be a lot more shocked.

I don’t have a lot else to say other than I believe Katie West.


  1. The V-Forum was a satellite forum of the main WEF, where all the cynical British people (and some token Americans) hung out to laugh at everybody in the WEF itself. I was mostly a lurker on the WEF and V. ↩︎

  2. As viewed by others, anyhow. I remember a user called Lorna who would take so much hassle from Larry Young. He, Brian Wood, and Fraction formed a very hyper-masculine sect of the WEF. Which the V-Forum laughed at, but let’s not kid ourselves, we thought they were cool as well. ↩︎

  3. Of course, when your competition is the Newsarama forums of 2000, it’s a very low bar indeed. ↩︎

The Man In Room A14

It’s been twenty years. At least, around that. Not sure exactly. I didn’t keep a diary, and as I keep pointing out, to the great disappointment of others, there are very few photos of that time, but I think either this weekend or last weekend was my final weekend of living in Manchester. At the end of the first and second years, I rushed home as soon as I possibly could, sometimes literally the day of my final exam. In the last year, I hung around for as long as I could, knowing that this was a big ending. Some of the friends I made during that time I haven’t seen since the day I left.

St. Anselm Hall is now co-ed (though apparently it was the last hold-out among university halls in the entire country, which says…something), but still apparently has formal dinner. The thing is, we laughed at the Oxford-lite grasping at the time (the ‘JCR’ was an anteroom, the ‘SCR’ simply a normal room with delusions above its station), and I’m sure everybody else who stayed there throughout the decades did too. But I did stay there for three years, in the exact same room, so it had its merits, even if when I tell people about it now they expect me to talk about a Sorting Hat1.

It often feels like yesterday. But more often these days, it seems like a different world and a completely different person. At least I eventually managed to sort out half-decent-looking glasses for myself. It might have been better if I had got that sorted before I left home, mind you…

(come back in August for the second part of this tale, where I point out that the entire course of my life post-2000 was decided by Dawson’s Creek of all things. Plus! Muppets! In Space!)


  1. The Sorting Hat is obviously cancelled. ↩︎

Get In The Sea, Colston

https://i.pinimg.com/originals/d0/a6/3f/d0a63f30e5f2a1c6f746232b34d98403.jpg

Even the Judges of Mega-City One don’t obscure their badges.

Things I didn’t expect at the start of the year: assembling a backpack with equipment to cut zipties, deal with tear gas, cuts, and bruises, all topped off with a pair of burner phones to avoid Stingray harvesting. And then, less than 24 hours later, taking it all back out again out of an abundance of COVID-19 caution.

How will I describe this week in the years to come? “That was the week I got curtains!” Maybe not. But it has been two and a half years since I bought the house, so it is probably beyond time I had some. Only for two windows, though. You don’t want to rush these things.

And I’m tired again. Maybe a longer entry next week? Probably wouldn’t get your hopes up, but I’ll see what I can do.

There Is A Better World. Well, There Must Be

Well, it’s been a rather terrible week on the macro and the micro level, hasn’t it? Perhaps June will bring better tidings. Or…given that it’s 2020, things will get worse.

🎶Life During Wartime 🎵

A post shared by Ian Pointer (@carsondial) on Instagram

It got worse. Much worse.

Image Self-Supervised Training With PyTorch Lightning

(You can also view this post in Google Colab)

Self-supervision is the current hotness of deep learning. Yes, deep networks and transfer learning are now old hat — you need to include self-supervised somewhere if you want to get those big VC dollars. Like transfer learning, though, at its core it’s a very simple idea: there is so much data in the world — how can we use it without the big expense of getting humans to label it all? And the answer really does feel like cheating. Self-supervision is essentially “get the computer to automatically add labels to all your data, train a network on that, and then use transfer learning on the task you actually want to solve.” That’s it. The only interesting bit is deciding what labels to add in what is called the “pretext task”, but the technique is surprisingly effective, especially in image and text-based problems where the Internet provides an almost endless supply of data.

Let’s have a look at the two main approaches to image self-supervised learning that are popular right now — rebuilding the original input from a distorted version, and automatically adding labels to data and training on those synthetic labels.

Reconstructing & Augmenting The Input

If you remember our look at the super-resolution architectures, they take a small image and produce a larger, enhanced image. A self-supervised dataset for this problem can be obtained fairly easily by simply looking at the problem the opposite way around: harvest images from the Internet, and create smaller versions of them. You now have training images and the ground truth labels (the original images). If you’re building a model that colourizes images, then you grab colour images…and turn them into black and white ones!
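To make that concrete, here’s a minimal sketch of what such a pretext dataset could look like (the directory path and image sizes are placeholders, but it follows the same Dataset pattern we’ll use later in this chapter):

import torch
from pathlib import Path
from PIL import Image
from torchvision import transforms

# A sketch of a super-resolution pretext dataset: the 'label' for each
# downscaled image is simply the original, larger version of that image.
class SuperResolutionPretextDataset(torch.utils.data.Dataset):
    def __init__(self, image_path=Path("images"), small_size=64, large_size=256):
        self.imgs = list(image_path.glob("**/*.jpg"))
        self.small = transforms.Compose([transforms.Resize((small_size, small_size)), transforms.ToTensor()])
        self.large = transforms.Compose([transforms.Resize((large_size, large_size)), transforms.ToTensor()])

    def __getitem__(self, idx):
        img = Image.open(self.imgs[idx]).convert("RGB")
        return self.small(img), self.large(img)  # (input, ground truth)

    def __len__(self):
        return len(self.imgs)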

You can extend this to a more general principle, where you take an image, apply a series of transforms to that image, and then train a neural network to go from the manipulated image to the original. You’ll end up with some sort of U-Net-like architecture, but after you’ve trained the network, you can throw away the ‘decoder’ half and use the ‘encoder’ part for your actual task by adding a Linear layer or two on top of the features you obtain at the bottom of the ‘U’.
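As a toy illustration of that hand-off (the ‘encoder’ here is just a stand-in; in practice you’d reuse the real, pretrained encoder weights from your U-Net):

import torch
import torch.nn as nn

# Pretend this is the 'encoder' half of a U-Net trained on a pretext task.
encoder = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
)

# Bolt a small classification head on top of the encoder's features.
classifier = nn.Sequential(
    encoder,
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(128, 10),  # 10 = number of classes in your downstream task
)

print(classifier(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 10])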

You’ll want an augmentation technique that forces the architecture to learn things like how images are structured, how to in-paint missing parts of an image, how to correct orientations, and so on. Here are a couple to get you started.

CutOut / Random Erasing

Perhaps the easiest to apply is simply removing part of an image and getting the model to restore it. This approach is often known as CutOut, and was shown to improve model performance with classification tasks in its introductory paper “Improved Regularization of Convolutional Neural Networks with Cutout”.

And it’s rather easy to apply, because it’s now included as a torchvision transform by default! You can just use:

torchvision.transforms.RandomErasing(p, scale, ratio, value, inplace)

This can be slotted into a transform pipeline as we’ve seen many times throughout the book. The parameters you can set are:

  • p — the probability of the transform taking place
  • scale — range of proportion of erased area against input image.
  • ratio — range of aspect ratio of erased area.
  • value — the value that will be used in the erased box. Default is 0. If given a single integer, that integer will be used for every channel. A tuple of length 3 will make the transform use those values for the R, G, and B channels respectively. If passed the string "random", each pixel in the box will be replaced with a random value.
  • inplace — boolean to make this transform inplace. Default set to False.

In general, you’ll probably want to use the random strategy for erasing details from an image.
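For example, a minimal pipeline might look like this (note that RandomErasing operates on tensors, so it needs to come after ToTensor()):

from torchvision import transforms

# RandomErasing slotted into a standard pipeline; value="random" fills the
# erased box with random pixel values rather than a constant.
train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3), value="random"),
])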

Crappify

Crappify is a fun idea from the fast.ai project which literally ‘crappifies’ your images. The concept is simple: pack a transform function with resizing, adding text, and JPEG artefacting, or anything else you decide to add to ruin the image, and then train the network to restore things back to the original.
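As a rough sketch (my own approximation, not the actual fast.ai code), a crappifier might look something like this; the training pair is then (crappify(img), img), and the network learns to undo the damage:

import io
import random
from PIL import Image, ImageDraw

# Shrink the image, scribble a random number on it, then round-trip it
# through a low-quality JPEG encode to add compression artefacts.
def crappify(img, size=96, quality=10):
    small = img.convert("RGB").resize((size, size), Image.BILINEAR)
    draw = ImageDraw.Draw(small)
    draw.text((random.randint(0, size // 2), random.randint(0, size // 2)),
              str(random.randint(0, 99)), fill=(255, 255, 255))
    buffer = io.BytesIO()
    small.save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return Image.open(buffer)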

Automatically Labelling Data

The full image-reconstruction approach to self-supervision works very well, but you could say that it’s a little wasteful for a classification task; you end up training a full U-Net and throwing half of it away. Is there a way we can be lazier and still do self-supervision?

Yes! And it’s what we’re going to spend the rest of this section implementing. Consider this image:

Helvetica!

Okay, so it’s another picture of Helvetica the cat, but we would need a human annotator to give us the cat label. But we can take this image, transform it, and give it a meaningful label at the same time.

Helvetica! image_90

We have given this new image the label image_90 to indicate that it has been rotated by 90º. But no human was needed for this (trivial) labelling. We can build a completely synthetic classification task, where the training dataset and corresponding labels are generated entirely programmatically. We don’t need to build a U-Net because all we’re training is a simple classifier; there’s no image reconstruction. But in order to classify correctly, the model will have to learn which way up a cat normally is, and this pre-trained model can then be used on our actual classification task.

We’re going to build this approach to self-supervision, but we’re going to do it with a slightly higher-level framework than PyTorch. Enter PyTorch Lightning!

PyTorch Lightning, or A Little Help From The Internet

PyTorch Lightning is a wrapper around PyTorch that handles a lot of the standard PyTorch boilerplate that you end up writing for every project (e.g. training, test, and validation loops, determining whether a model should be in eval mode or not, setting up data, and so on). Like fast.ai, it has an extensible callback system that allows you to hook custom code into almost any part of the training cycle, so you end up with most of the power of PyTorch, but without having to rewrite train() every time you start a new project. Instead, you end up just doing this to train a custom model:

from pytorch_lightning import Trainer

model = LightningModel()
trainer = Trainer(gpus=1, num_nodes=1)
trainer.fit(model)

Some people prefer working with pure PyTorch all the time, but I definitely see a lot of value in Lightning, as it does remove a lot of the error-prone tedium of writing training code while still remaining flexible enough for research purposes. I personally write most of my deep learning code either with Lightning or fast.ai (the new fast.ai2 library even has a tiered layer of APIs that allows you to delve deeper when you need to but still use their powerful higher-level abstractions for most of your work) rather than in raw PyTorch.

Don’t worry, though, because as we’ll see, building a model with PyTorch Lightning isn’t that much different than what we’ve been doing throughout the rest of the book. We just don’t need to worry about the training quite so much!

Light Leaves, ResNet Sees

In order to demonstrate self-supervised training, we’re going to use a smaller version of ImageNet called Imagenette. This dataset contains images from 10 classes of the larger set, and was constructed by Jeremy Howard as a way of quickly testing new ideas on a representative sample of ImageNet rather than having to spend a considerable amount of time training on the whole thing. We’ll be using the full-sized version for our model, which means a roughly 300MB download. Let’s declare our imports and download Imagenette.

!pip install pytorch-lightning
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import pytorch_lightning as pl
from PIL import Image
from pathlib import Path
from torchvision import transforms
import torchvision.transforms.functional as TF
import random

!wget https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-320.tgz
!tar xvzf imagenette2-320.tgz

A Self-Supervised Dataset, As A Treat

You, sobbing: “You can’t just point at a picture and call it a label!"

Me, an intellectual, pointing at a cat rotated ninety degrees: “Label."

Even though we’re using PyTorch Lightning, we’ll construct our datasets in the usual way with the Dataset class. When an image is requested from the dataset, we will either simply return a tensor version of the image with the label 0, or randomly rotate it through 90, 180, or 270 degrees, or flip it either horizontally or vertically. Each of these potential transforms has a separate label, giving us six possible labels for any image. Note that we’re not doing any normalization in this pipeline, to keep things relatively simple, but feel free to add the standard ImageNet normalization if you desire.

class RotationalTransform:
    def __init__(self, angle):
        self.angle = angle

    def __call__(self, x):
        return TF.rotate(x, self.angle)

class VerticalFlip:
    def __init__(self):
        pass
    def __call__(self, x):
        return TF.vflip(x)

class HorizontalFlip:
    def __init__(self):
        pass
    def __call__(self, x):
        return TF.hflip(x)

We’ll then wrap those transforms up inside a Dataset class, which will apply a chosen transformation when __getitem__ is called, as well as returning the correct label for that transform.

class SelfSupervisedDataset(torch.utils.data.Dataset):
    def __init__(self, image_path=Path("imagenette2-320/train")):
        self.imgs = list(image_path.glob('**/*.JPEG'))
        self.class_transforms = [RotationalTransform(0), RotationalTransform(90), 
                                            RotationalTransform(180), RotationalTransform(270), 
                                            HorizontalFlip(),VerticalFlip()]
        self.to_tensor = transforms.Compose([transforms.ToTensor()])                       
        self.classes = len(self.class_transforms)

    def __getitem__(self, idx):
        img = Image.open(self.imgs[idx])
        label = random.choice(range(0, self.classes))
        img = img.convert("RGB")
        # Resize first, then apply our selected transform and finally convert to tensor
        transformed_image = self.to_tensor(self.class_transforms[label](transforms.Resize((224,224))(img)))
        return transformed_image, label

    def __len__(self):
        return len(self.imgs)
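Before training, it’s worth pulling a single item out of the dataset to sanity-check the shapes and labels (the label you get will vary, since it’s chosen at random):

# Quick sanity check on the pretext dataset
train_ds = SelfSupervisedDataset()
img, label = train_ds[0]
print(img.shape, label)  # e.g. torch.Size([3, 224, 224]) 4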

ResNet-34 Go Brrr

With our dataset completed, we’re now ready to write the LightningModule that will be the model we train on this data. Writing a model in PyTorch Lightning is not too much different from the standard PyTorch approach we’ve seen throughout the book, but there are some additions that make the class more self-contained and allow PyTorch Lightning to do things like handle training for us. Here’s a skeleton LightningModule:

class SkeletonModel(pl.LightningModule):
    def __init__(self):
        pass
    def forward(self, x):
        pass
    def train_dataloader(self):
        pass
    def training_step(self, batch, batch_idx):
        pass
    def configure_optimizers(self):
        pass
    def prepare_data(self):
        pass

As you can see, we have our familiar __init__ and forward methods, which work in exactly the same way as before. But we now also have methods for various parts of the training cycle, including setting up dataloaders and performing training steps. We also have a prepare_data method, which can do any preprocessing needed for datasets, as well as configure_optimizers for setting up our model’s optimizer.

PyTorch Lightning includes hooks for lots of other parts of the training process (e.g. handling validation steps and DataLoaders, running code at the start or end of training epochs, and lots more besides), but these are the minimal parts we’ll need to implement.

Now that we know the structure, let’s throw together a model based on ResNet-34 with a small custom head. Note that we’re not using a pretrained ResNet model here; we’re going to be training from scratch. We’ll also add val_dataloader, validation_step, and validation_epoch_end methods, the last of which aggregates loss and accuracy across our validation set at the end of every epoch.

class SelfSupervisedModel(pl.LightningModule):
    def __init__(self, hparams=None, num_classes=6, batch_size=64):
        super(SelfSupervisedModel, self).__init__()
        self.resnet = torchvision.models.resnet34(pretrained=False)
        self.resnet.fc = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, num_classes))
        self.batch_size = batch_size
        self.loss_fn = nn.CrossEntropyLoss()
        if "lr" not in hparams:
            hparams["lr"] = 0.001
        self.hparams = hparams

    def forward(self, x):
        return self.resnet(x)

    def training_step(self, batch, batch_idx):
        inputs, targets = batch
        predictions = self(inputs)
        loss = self.loss_fn(predictions, targets)
        return {'loss': loss}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams["lr"])

    def prepare_data(self):
        self.training_dataset = SelfSupervisedDataset()
        self.val_dataset = SelfSupervisedDataset(Path("imagenette2-320/val"))
		
    def train_dataloader(self):
        return torch.utils.data.DataLoader(self.training_dataset, batch_size=self.batch_size, num_workers=4, shuffle=True)

    def val_dataloader(self):
        return torch.utils.data.DataLoader(self.val_dataset, batch_size=self.batch_size, num_workers=4)

    def validation_step(self, batch, batch_idx):
        inputs, targets = batch
        predictions = self(inputs)
        val_loss = self.loss_fn(predictions, targets)
        _, preds = torch.max(predictions, 1)
        acc = torch.sum(preds == targets.data) / (targets.shape[0] * 1.0)
        return {'val_loss': val_loss, 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        avg_loss = torch.stack([x['val_loss'] for x in outputs]).mean()
        avg_acc = torch.stack([x['val_acc'].float() for x in outputs]).mean()
        logs = {'val_loss': avg_loss, 'val_acc': avg_acc}
        return {'progress_bar': logs}

Having defined the model, we can start training by using PyTorch Lightning’s Trainer class. We’ll pass in max_epochs to train for only 5 epochs, with a learning rate of 0.001 (though the framework comes with an lr_finder method to find an appropriate learning rate, using the same approach we have been using in the book so far and that you’ll find in fast.ai). We’ll also need to tell the trainer how many GPUs we have available; if more than one is present and available, the class will use as many as directed for multi-GPU training.

model = SelfSupervisedModel({'lr': 0.001})
trainer = pl.Trainer(max_epochs=5, gpus=1)
trainer.fit(model)
trainer.save_checkpoint("selfsupervised.pth")
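As an aside, if you’d rather not hard-code the learning rate, the learning-rate finder mentioned above can be run before calling fit(). The exact entry point depends on your Lightning version (it has lived at trainer.lr_find and later under trainer.tuner), but the shape of the call is roughly:

# Rough sketch of the built-in learning-rate finder (API location varies by version)
lr_finder = trainer.lr_find(model)
model.hparams["lr"] = lr_finder.suggestion()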

We’ve now trained for 5 epochs on our pretraining task. What we need to do now is train on the actual task we’re trying to solve: not classifying rotations or flips, but determining which Imagenette class an image belongs to. We can do this simply by swapping out the current dataloaders for ones that return the images and labels from the provided Imagenette dataset. We do this using the old faithful ImageFolder:

tfms = transforms.Compose([
            transforms.Resize((224,224)),
            transforms.ToTensor()
        ])

imagenette_training_data = torchvision.datasets.ImageFolder(root="imagenette2-320/train/", transform=tfms)
imagenette_training_data_loader = torch.utils.data.DataLoader(imagenette_training_data, batch_size=64, num_workers=4, shuffle=True)

imagenette_val_data = torchvision.datasets.ImageFolder(root="imagenette2-320/val/", transform=tfms)
imagenette_val_data_loader = torch.utils.data.DataLoader(imagenette_val_data, batch_size=64, num_workers=4)

We’ll then load in our saved checkpoint, replace the original training data with the new DataLoader, and replace the head of the classifier so it now predicts the 10 Imagenette classes instead of our self-supervised labels. The model will then be trained for a further 5 epochs on the supervised training data.

model = model.load_from_checkpoint("selfsupervised.pth")
model.resnet.fc[2] = nn.Linear(256, 10)

Training will be performed using the Trainer class again, but this time we’ll pass in these new training and validation dataloaders, which will override the ones we defined in the class itself (and prepare_data will not be called by PyTorch Lightning during this training phase).

trainer = pl.Trainer(max_epochs=5, gpus=1)
trainer.fit(model, train_dataloader=imagenette_training_data_loader, val_dataloaders=imagenette_val_data_loader)

The model’s accuracy in its final (10th) epoch of training ended up around 54%, which isn’t too bad considering that we only trained for 5 epochs on the actual task (and did no augmentation in that pipeline). But was it worth it? Well, let’s check! If we recreate our model from scratch and just pass in the supervised Imagenette dataloaders for training and validation, training for 10 epochs, we can compare that result with our self-supervised model.

standard_model = SelfSupervisedModel({'lr': 0.001})
trainer = pl.Trainer(max_epochs=10, gpus=1)
trainer.fit(standard_model, train_dataloader=imagenette_training_data_loader, val_dataloaders=imagenette_val_data_loader)

On my training run, it ended up with a best accuracy over 10 epochs of 33%. We can see that pre-training on our self-supervised dataset delivers better performance, even though the model was trained on the final task for only 5 epochs.

One Step (or more) Beyond

This has been a dip of the toe into the waters of self-supervised learning. If you want to go deeper, you could experiment further with the framework in this chapter. Can you improve performance by adding other transformations to the pretext pipeline? By adding augmentation during the fine-tuning stage? Or by training with larger ResNet architectures?
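One thing to watch if you do reach for a larger ResNet: the feature width feeding the custom head changes (ResNet-50’s final block outputs 2048 features rather than 512), so the head needs adjusting to match. Something along these lines:

import torch.nn as nn
import torchvision

# Swapping in ResNet-50: its fc layer expects 2048 input features, not 512.
resnet = torchvision.models.resnet50(pretrained=False)
resnet.fc = nn.Sequential(nn.Linear(2048, 256), nn.ReLU(), nn.Linear(256, 6))  # 6 pretext labels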

In addition, I urge you to look into contrastive learning, a technique where the model is shown augmented and non-augmented views of an image alongside completely different images, and trained to pull the matching views together while pushing the others apart. This turns out to be another powerful way of extracting as much as you can from your existing data and, as part of Google’s SimCLR system, is currently the state-of-the-art when it comes to self-supervised training on ImageNet.

Further Reading