It Was 20 Years Ago Today

Be Here Now

Looking back, it’s bizarre to realize just how big a deal Be Here Now was in the UK. Yes, a long-awaited album from one of the most popular bands in Britain. Sure, sure. Even people queuing up at midnight to get their hands on it wouldn’t have been surprising, if Ignition hadn’t forced record shops to sign contracts promising not to sell the album until 8am (seriously). But all that plus a prime-time BBC1 documentary airing the night before (and let’s not forget, this is still in the ‘dark ages’ before multi-channel TV and Internet video were a thing)? Leading news items and almost blanket coverage on Radio 1? August 21st 1997 was an event.

Riding a sea of hype that big, the album was never likely to live up to expectations, even with Creation strong-arming the music press to ensure good reviews. And lo, so it was that the album broke all sales records in the scant three days of sales it had in its first week (it took eighteen years for the record to be broken by Adele’s 25, and that only managed it by virtue of having a full seven days of sales), lighting up the charts for weeks on end…and then ending up in every charity shop in the British Isles as realization dawned.

Over the years, it has become a punchline, the moment when Oasis lost it, the point ‘where Britpop died’, the end of ‘Cool Britannia’, and the moment where Opal Fruits became Starburst (okay, that happened in 1998. BUT STILL). Now, twenty years on, we return to the Album That Ruined Everything and ask: was it really so bad?

D’You Know What I Mean?

This was released to Radio 1 on the day of our Sixth Form Leavers’ Ball. Which made it an event to mark an event.

And look, I still like this song. Even hearing it now, 20 years later, I’m reminded of the things I loved about it: a confident, swirling swagger of a song, brimming with limited promise. A band that might be on the edge of stepping forward; they were never going to make something like Sgt. Pepper, but the backmasking, the sampling of NWA (amusingly the Amen Break loop, so as traditional as you can get) seemed to show that they might have learnt a few things from their collaborations with The Chemical Brothers if nothing else.

And yes, the lyrics are the usual stuff and nonsense with swipes from songs thirty years prior, but again, here and there are flashes of humour along with the paranoia. I mean, come on, the second bridge of Liam snarling “I met my maker and made him cry” followed by Noel’s sly flanged deadpan of ‘It must have been love’ is completely unnecessary, funny, and manages to sum up their relationship in a manner that almost makes you forget how bad Noel’s lyrics can get. But don’t worry, he’ll be reminding us shortly!

My Big Mouth

My Big Mouth was one of the two new songs premiered at the 1996 Knebworth concert, edited out of the original Radio 1 broadcast (an early sign of the control that Creation/Ignition would maintain over the Be Here Now tracks). Here, it sets the tone for the rest of the album: there’s a wall of guitar threatening to swallow everything in its path. And in this song it succeeds - you can barely hear the drums, you can barely hear the vocal, and somebody disconnected the bass guitar in the recording studio. And somehow it manages to last five minutes despite this. Not a good sign. But don’t worry, it’s going to get worse before it gets better.

Magic Pie

Even at the time, the day of queuing up to buy the thing at my local record shop, the excitement, the hype, swept up in it all, even then I knew Magic Pie was a dirge that had no business being on the record. Twenty years on, I remain astounded that Stay Young somehow got relegated to a b-side and this wasn’t.

The organ, the noodling, the chorus that attempts to build…but can’t and just stops the song dead, which would be a problem for a song that comes in under three minutes. Or four. Definitely five.

It lasts for seven minutes and ten seconds.

Even now, I’m looking at the clock, looking at iTunes after typing this sentence and no, there’s still three bloody minutes left of this turgid song that seems so bloated that it slows down spacetime itself. And when it finally has the decency to end, it can’t just fade out, oh no. The kicker is the ‘jazz’ play out on the mellotron that makes the song about ten times more hateable.

Stand By Me

Two things about Stand By Me intrigue me. The first is this: just exactly what was Alan McGee taking when he said that this record was going to be the one that took America by storm? That’s some seriously high-quality hallucinogens.

Secondly, did nobody during the writing and recording of this song even think about mentioning to Noel that stealing from the same bloody song for the fucking third time looked a bit desperate? Not even one person could pipe up that the clunky chords in the chorus sounded hideous?

And that’s about as interesting as Stand By Me gets, I’m afraid. Not only did it fail to conquer America (shock!), it didn’t even hit Number One in the UK, being stymied by Candle In The Wind (Dead Princess Remix).

On the twenty-year playback, this song makes me angry. So lazy, so full of contempt for anything resembling effort beyond a half-hearted ‘will this do?’.

I Hope, I Think, I Know

So here’s the thing. This is not a great song. It’s essentially Roll With It with a bit more jaunt. But! It has quite a few things going for it in the context of Be Here Now:

  • It is the shortest song on the album (4:22)
  • The guitars are yes, full on wall of sound, but at least you can hear Liam singing
  • The lyrics may be standard doggerel, But! It’s Oasis swagger circa Definitely Maybe. If you squint with your ears, it could be a b-side to Cigarettes and Alcohol
  • ‘You’ll never forget my name!’
  • It is actually quite fun?
  • The previous track was Stand By Me and anything would probably sound better

And so, as the years have gone by, this has become my second-favourite thing on the album. It knows exactly what it is and doesn’t stay any longer than it needs to. Hurrah!

The Girl In The Dirty Shirt

Meanwhile…

Okay, I was 18 years old and I loved The Girl In The Dirty Shirt. In my defence, I did fall in love with Floodlit World a couple of months earlier, so I had some taste. Honest.

My God, even aside from the Manic Pixie Dream Girl overtones, there’s…just nothing here. Aside from that bloody organ dragged out from the Magic Pie recording. glances at the length Six bloody minutes. I mean, it’s not terrible…it’s just like wallpaper. Wallpaper with an organ.

Fade In-Out

And then things got weird. Yes, Johnny Depp was on the fastest selling album in British chart history. Because Noel couldn’t play the slide guitar for the recording session, they used Depp’s guitar line from the demos instead (this might, in less coke-fuelled times, have been seen as a warning sign).

Oh, and I had forgotten about the Who-esque scream part-way through. sigh

Again, there’s nothing truly terrible about Fade In-Out, but although the fair may be in town today, it does not make me want to go on the roller coaster. And it has no business being seven minutes long, but this album has already worn me down on that front.

Don’t Go Away

You might call Don’t Go Away an unashamedly transparent attempt to create another Wonderwall. I couldn’t possibly comment. I’d actually place it as an attempt to write another Cast No Shadow but with impressive attempts to pronounce ‘situation’ and ‘education’ while trapping your finger in a drawer.

However, unlike Stand By Me, this attempt at the ‘album ballad’ works where the other fails; I mean lyrically it’s clunky but the song doesn’t feel like it’s stapled to All The Young Dudes with knitting needles, and Liam’s delivery actually does feel pleading at various points. Obviously the song is bursting to overflow with strings and compressed as much as the mixing desk would allow, but it doesn’t suffer too much from either, and the coda is classic Oasis balladry, for whatever that’s worth.

Be Here Now

“Your shit jokes remind me of Digsy’s”.

I struggle. I mean, come on, this is Status Quo. With added whistling.

All Around The World

Welcome, my friends, to the song that never ends. It has the longest running time of any Number One in UK chart history, and it was their last single of the 20th century.

This was one of the first songs that Noel ever wrote, held back until the band could afford a 36-piece orchestra to record it. I mean, come on, say what you want, the man had some vision. Again, though, you wish that somebody in the studio might have asked a couple of questions about that vision during recording.

At 2:45, the lyrics drop out. BUT THERE’S STILL SIX MINUTES LEFT. WE’RE NOT GOING TO DO THE CHORUS FOR SIX MINUTES, SURELY?

Yes. Yes we are.

First key change! Second key change! THIRD key change! The orchestra being compressed to neutron star levels of density. Liam insisting that it’s going to be okay, but it’s fine for him, he doesn’t have another five minutes of the song to face.

Everybody should listen to this once, though: you’ve never heard hubris as distilled into song form until you’ve heard All Around The World all the way through.

It’s Gettin’ Better (Man!!)

It’s Gettin’ Better (Man!!) (tragically, not a secretly ironic Amiga Power reference) was the other song debuted at Knebworth, and it is a forgettable Stones knock-off. The album peters out with a limp and a fade as the bass-less cacophony draws to a close…

All Around The World (Reprise)

…oh God, it’s sodding back. Not just a ‘reprise’, but two more minutes of All Around The World. I mean, did nobody at Creation actually listen to this? (Yeah, I know their internal response was initially something akin to the scene in 24 Hour Party People where Factory has a playback of a lyric-less Happy Mondays album, but seriously.)

It ends with a door slamming…which probably sums the album up effectively.

I was hoping that revisiting the album would reveal something worth keeping, but D’You Know What I Mean? aside, time has been even less kind to it than I expected. And, because I suffer for all ten of you reading this, I even listened to the Mustique demos as well. Let’s just say that there wasn’t a lot to build on, though it’s clear that the desire to fill every channel with guitars did not help.

So when you see one of the 696,000 CDs bought in those first three days languishing in a dusty corner of a charity shop, nod your head to the perils of cocaine, and take solace in this: you may have made mistakes in your time, but you did not stick Magic Pie on an album.

Surprise NY, NY

And that, dear readers, is how you join the @Pinboard-approved club of people blocked by Paul Graham. Bless his heart.

Odd week, somewhat broken up by travelling up to New York for two days for work. Which I’ll have to do next week as well. Tiring, but I did manage to sneak in a visit to François Payard’s bakery, so not completely terrible.

(as opposed to LaGuardia Airport, which is just as bad as everybody says it is. Thankfully, next week I’ll be flying into JFK instead)

Seven days until flying back over the Atlantic…

Lemon, It's Wednesday

The impeccable timing of a US passport turning up during an escalating week of ‘are we the baddies?’1

After a lovely weekend in Cincinnati last week, I’m just very tired.

It’s also the sixth anniversary of my arrival in the United States today. I wish it was a happier occasion.

I promise, next week, something longer.


  1. Unfortunately, it seems likely. [return]

Planes, Trains, and Automobiles — DC Edition

The first inkling that this weekend was not going to go entirely to plan came on Tuesday. American Airlines pushed an update to my phone that evening saying ‘well…er…we think there’s going to be bad weather on Friday evening. Fancy rebooking?’ Like a fool, I ignored it, thinking that it’d all work out, and besides, if I was a little late, it’d be fine. Totally fine.

By the time I got to the airport on Friday (after sending off all evidence that I’m a US citizen in order to get a passport), the flight had already been delayed until 20:30. But the plane was in the air, so I’d still get to DC around 22:00. Not great, but not the end of the world.

We started boarding around 21:00. We got to the runway at 21:30, number one for take-off. Okay, so maybe I wouldn’t be meeting Richard tonight, but I’d get to my hotel no problem.

At which point, obviously, the plane developed a mechanical fault and returned to the gate. Mind you, it’s better to develop a faulty engine on the runway than to have one 6,000 metres up. Anyhow, back to the airport we went, and a sinking feeling descended. Especially as the people in front of me in the rebooking line were getting more and more irate about being offered flights for the following evening. As Richard would be flying home on Sunday, that wasn’t really any good for me either.

The day was saved by a combination of a ticket agent happy to have somebody not gearing up for a fight, and Tammy offering a helpful suggestion: there was a Megabus service that would leave Durham that night and arrive in DC in the morning. Hurrah!

Just one tiny little catch: the bus would leave at 03:30.

Having confirmed with the ticket agent that they’d hold my return flight open for me if I didn’t rebook my outgoing flight, I headed home, had a shower, drank multiple cups of tea, and started working out how exactly I was going to get to the centre of Durham at 3am. I didn’t really come up with a good solution aside from maybe there will still be some Lyft cars about at 3am?1. Thankfully, the downtown bar scene still has stragglers at that point on a Friday, and so there I was, taking pictures of the Mutual Life building at 3am and hopping on a double-decker bus to DC. By hook or by crook, I was going to get there.

We’ll pause now to list some of history’s greatest monsters:

  • The people who sit on the aisle seat but leave the window seat clear but inaccessible
  • The people that have an hour-long call about banking processing charges where they repeat the same information approximately every five minutes
  • And the people that get on at Richmond, sit next to you and start playing music on their phone. Headphones still exist, everybody2.

Ahem. Anyway, I did get some sleep, and even arrived in DC 15 minutes earlier than scheduled. A full day of tourism approached!

It’s easy to forget how big the Mall really is, and how impressively the monuments and the planning of central DC come together. We walked twelve miles in the quest of rediscovering that. But it was fun to play tourist and show a friend around. I did have some impressive bad luck on that front - ending up at a Mexican chain restaurant for lunch after walking to a series of places that were either closed or had no seating.

I made a better choice for dinner (my first Ethiopian in a few years), but my choice of going to the nearest whiskey bar to save on walking turned out to have two slight issues: 1) its whiskey selection was not exactly stellar, and 2) the research I had done online neglected to mention that it was a gay bar. Which is fine…but I don’t think either of us expected to be chatted up in the evening, or to have somewhat risqué conversations about Bicester Village.

After finishing our drinks there, we wandered and wandered, eventually ending up here (look, it’s a local chain, okay?) for the rest of the evening.

The trip back was much easier. The plane worked, I landed at RDU, and I’m now laid up looking at developing blisters. I don’t have to go outside again until Friday…by then they should be going down some…until then, ibuprofen it is!


  1. I have had terrible luck with scheduling taxi services in Durham at early times, so I ruled them out early on. [return]
  2. Another instance of this - somebody in Target last week walking past the shoe section with heavy metal blaring from his leg. I mean, standards. [return]

Breaking Neural Networks, Part 1 — Turning A Frog Into A Cat

We’ve all seen the hype. Deep Learning techniques can recognize human faces in a crowd, identify seals from orbit, and classify insects from your phone! SkyNet awaits!

Well, not today, everybody. Because today is the day we fight back. But we won’t be sending back John Connor to save the day. Oh no, we’ll be using maths.

Okay, so first, we need a neural network to attack. Today, we’re not going to be sophisticated - we’re simply going to take an architecture from one of the PyTorch tutorials. It’s a very shallow ConvNet for classifying images from the CIFAR-10 dataset, but the principles here scale up to something like Inception. We train the model, again just using the training code in the tutorial, and we begin.
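
The tutorial’s network is nothing fancy - roughly this, two convolution/pooling blocks feeding three fully-connected layers:

    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(3, 6, 5)    # 3-channel 32x32 images in
            self.pool = nn.MaxPool2d(2, 2)
            self.conv2 = nn.Conv2d(6, 16, 5)
            self.fc1 = nn.Linear(16 * 5 * 5, 120)
            self.fc2 = nn.Linear(120, 84)
            self.fc3 = nn.Linear(84, 10)       # one logit per CIFAR-10 class

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = x.view(-1, 16 * 5 * 5)
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return self.fc3(x)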

What is this, oh clever neural network?

Frog

Model says: Frog

(CIFAR-10 images are only 32x32 pixels, so yes, it’s a touch blocky)

What we’re going to do is change our picture of a frog just enough that the neural network gets confused and thinks it’s something else, even though we can still recognize that it’s clearly a frog, thus proving man’s continued domination over machine.

To do this, we’re going to use a method of attack called the fast gradient sign method, which was detailed back in 2014 in this paper by Ian Goodfellow, Jonathon Shlens & Christian Szegedy.

The idea is that we take the image we want to mis-classify and run it through the model as normal, which gives us an output tensor. Normally for predictions, we’d look to see which of the tensor’s values was the highest and use that as the index into our classes. But this time, we’re going to pretend that we’re training the network again: we compute the loss against the true label and backpropagate it through the model, giving us the gradient of the loss with respect to the original input (in this case, our picture of a frog).

Having done that, we create a new tensor from these gradients, with +1 wherever the gradient is positive and -1 wherever it is negative. That gives us, for each pixel, the direction of travel that pushes the model’s loss upwards. We then multiply by a small scalar, epsilon, to produce our malicious mask, which we then add to the original image, creating an adversarial example.
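
Or, in the notation of the paper, the perturbation η is:

    \eta = \epsilon \cdot \mathrm{sign}\left(\nabla_{x} J(\theta, x, y)\right)

where θ are the model’s weights, x is the input image, y its true label, and J the loss function used to train the network.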

Here’s a simple PyTorch method that returns the fast gradient sign tensors for an input batch when supplied with the batch’s labels, plus the model and the loss function used to evaluate the model:

    import torch

    def fgsm(input_tensor, labels, loss_function, model, epsilon=0.02):
        # we need gradients with respect to the input image, not just the weights
        input_tensor.requires_grad_(True)
        input_tensor.grad = None  # clear any gradients from a previous call
        loss = loss_function(model(input_tensor), labels)
        loss.backward()
        # the sign of each input gradient, scaled by epsilon, is our mask
        return torch.sign(input_tensor.grad) * epsilon

(epsilon is normally found via experimentation - 0.02 works well for this model, but you could also use something like grid search to find the value that turns a frog into a ship)
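
A crude sweep might look like this - frog_batch, frog_label, and classes here are hypothetical stand-ins for a single-image batch, its label tensor, and the tutorial’s class list:

    # hypothetical sweep: keep increasing epsilon until the model is fooled
    for epsilon in [0.005, 0.01, 0.02, 0.05, 0.1]:
        mask = fgsm(frog_batch, frog_label, loss_function, model, epsilon)
        prediction = classes[model(frog_batch + mask).argmax().item()]
        if prediction != 'frog':
            print('fooled at epsilon', epsilon, '- now a', prediction)
            break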

Running this function on our frog and our model, we get a mask that looks like this:

Frog Mask

Adding that mask to the frog results in:

Frog Adversarial Example

Clearly, still a frog. What does our model say?

    classes[model(frog + mask).argmax().item()]  # -> "cat"

Hurrah! We have defeated the computer! Victory!

Eagle-eyed readers will have noticed that in order to produce an image that fools the classifier, we need to know a lot about the model being used. We need the explicit outputs so we can run backpropagation to get our gradients, and we need to know the actual loss function the model uses to make this work. This is fine for this example, because we know everything about the model, but if we wanted to fool a classifier where we don’t have access to any of these internals, we’re stuck. Right?

Stand by for Part 2 soon, where we discover how we can extend this approach to defeat classifiers where all we get is ‘this is a frog’. We shall defeat the machines!

Two Feet High And Chomping

[Instagram: a post shared by Ian Pointer (@carsondial)]

And with that, I’m pretty much coming to the end of my recent buying spree of Transformers. Trypticon was one of the toys that never made it to UK shores, so I was always going to be a sucker for Hasbro’s current strategy of ‘let’s do all of G1, but this time with knees’. So I now have a two-foot dinosaur to go with my two-foot robot…but with no idea where I’m going to put them (although I guess I can delay a permanent decision there for a while).

Next week, more adult pursuits: getting my first US passport. Exciting times!

Rusted Ruby

I gave a talk this week! If you’re interested in a 10,000m look at Rust and a delve into Helix, here are your slides!

Rusted Ruby from Ian Pointer

The talk suffered a bit from my usual deer-in-headlights presenting style and that I lost a week of preparation due to wisdom teeth shenanigans, but hey, I implemented a 3x speedup of Ruby’s Pathname.absolute? in less than ten minutes, which isn’t too bad…

Going to be a busy six weeks coming up - as of yesterday, I have trips to DC, Cincinnati, and of course Bicester and London before the end of August. And then…and then, things get even more fraught as I begin my 2018 masterplan. Which is much less impressive than it sounds, but will involve quite a bit of upheaval. But all for the better in the end!

Hopefully, I will sleep a little more this upcoming week, too. And fewer exam stress dreams set back in Manchester. It’s been 17 years…

Finally...Tea!

I can drink hot and carbonated drinks again. Which is good, because I seem to have bought all of the Diet Coke in Durham during the past two weeks of Target sales. I may have miscalculated how much even I can drink, but you can’t beat 4 cases of 12 cans for $8.88, can you?

Other than that…it has been a quiet week. I mean, really quiet. The most exciting thing of the week has likely been the discovery of 1980’s Fox. Come back next week for more exciting updates!

The Colour Problem - Neural Networks, the BBC, and the Rag-and-bone Man

tl;dr - a fully convolutional neural network trained to colourize surviving black-and-white recordings of colour Steptoe and Son episodes.

One of the most infamous parts of the BBC’s history was its practice of wiping old episodes of TV programmes throughout the years to re-use videotape. The most famous of these is, of course, Doctor Who, but it was far from the only programme to suffer in this way. Many episodes of The Likely Lads, Dad’s Army, and Steptoe And Son have been lost.

Every so often, an off-air recording is found. Recorded at the time of broadcast, these can be a wonderful way of plugging the archive gaps (who said that piracy is always a crime, eh?). If the BBC is lucky, then a full colour episode is recovered (if it was broadcast in colour). More often for the older shows, however, it’s likely that the off-air recording is in black and white. Even here, though, the BBC and the Restoration Team have been able to work magic. If a BW recording has colour artifacts and noise in the signal (known as chromadots), then the colour can be restored to the image (this was performed on an episode of Dad’s Army, ‘Room At The Bottom’).

If we don’t have the chromadots, we’re out of luck. Or are we? A couple of months ago, I saw Jeremy Howard showing off a super-resolution neural net. To his surprise, one of his test pictures didn’t just come out larger; the network had corrected the colour balance in the image as well. A week later, I was reading a comedy forum which offhandedly joked about the irony of ‘The Colour Problem’ being an episode of Steptoe and Son that only existed in a b/w recording…and I had an idea.

Bring out your data!

Most image-related neural networks are trained on large datasets, e.g. ImageNet or COCO. I could have chosen to take a pre-trained network like VGG or Inception and adapted it to my own needs. But…after all, the show was a classic 60s/70s BBC sitcom production - repeated use of sets, exteriors on 16mm film, interiors on video, etc. So I wondered: would it make sense to train a neural network on existing colour episodes and then get it to colourize based on what it had learnt from them?1

All I needed was access to the colour episodes and ‘The Colour Problem’ itself. In an ideal world, at this point I would have pulled out DVDs or Blu-Rays and taken high-quality images from those. As I’m not exactly a fan of the show…I don’t have any of those. But what I did have was a bunch of YouTube links, a downloader app, and ffmpeg. It wasn’t going to be perfect, but it’d do for now.

In order to train the network, what I did was to produce a series of stills from each colour episode in two formats - colour and b/w. The networks I created would train by altering the b/w images and using the colour stills as ‘labels’ to compare against and update the network as required.

For those of you interested, here are the two ffmpeg commands that did this:

    ffmpeg -i 08_07.mp4 -vf scale=320:240 train/08_07_%06d.jpg
    ffmpeg -i 08_07.mp4 -vf scale=320:240,format=gray traingray/08_07_%06d.png

I was now armed with 650,000 still images of the show. I never thought my work would end up with me having over half-a-million JPGs of Steptoe and Son on my computer, but here we are.

But First on BBC 1 Tonight

Having got hold of all that data, I then took a random sample of 5000 images from the 650,000. Why? Because it can be useful to work on a sample of the dataset before launching into the whole lot.

As training on a sample takes a lot less time than training on the entire dataset, not only does this allow you to spot mistakes much quicker than if you were training on everything, but it can be great for getting a ‘feel’ for the data and what architectures might or might not be useful.
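
Pulling the sample out is a small pathlib job - the directory names here are my assumptions, following on from the ffmpeg commands above:

    import random
    import shutil
    from pathlib import Path

    # Copy a random 5,000-still sample (plus each still's colour
    # counterpart) into separate sample directories
    for d in ('sample', 'samplegray'):
        Path(d).mkdir(exist_ok=True)
    stills = list(Path('traingray').glob('*.png'))
    for still in random.sample(stills, 5000):
        shutil.copy(str(still), 'samplegray')
        shutil.copy('train/' + still.stem + '.jpg', 'sample')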

FSRCNN

Normally, if I was playing with a dataset for the first time, I’d likely start with a couple of fully-connected layers - but in this case, I knew I wanted to start with something like the super-resolution architecture I had seen a few weeks ago. I had a quick Google and found FSRCNN, a fully-convolutional network architecture designed for scaling up images.

What happens in FSRCNN is that the image is passed through a series of convolutional layers that reduce it down to a much smaller representation, whereupon another set of convolutional layers operate on that smaller data. Finally, everything goes through de-convolutional layers to scale the image back up to the required (larger!) size.

Here’s a look at the architecture, visualized with Keras’s SVG model rendering:

[Model diagram: InputLayer → feature_extraction (Conv2D) → create_channels (Conv2D) → shrinking1 (Conv2D) → mapping1–mapping4 (Conv2D) → channels (Conv2D) → expand (UpSampling2D), with BatchNormalization layers in between]

(The idea of shrinking your data, operating on that instead and then scaling it back up is a common one in neural networks)

I made some modifications to the FSRCNN architecture. Firstly, I wanted the output image to have the same scale as the input rather than making it bigger. Plus, I altered things to take the input with only one channel (greyscale), but to produce an RGB 3-channel picture.
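
As a minimal sketch, the modified network looks something like this in Keras - the filter counts, kernel sizes, and the strided shrinking layer are illustrative rather than the exact values I settled on:

    from keras.models import Sequential
    from keras.layers import BatchNormalization, Conv2D, UpSampling2D

    model = Sequential([
        BatchNormalization(input_shape=(240, 320, 1)),  # one greyscale channel in
        Conv2D(56, (5, 5), padding='same', activation='relu', name='feature_extraction'),
        BatchNormalization(),
        Conv2D(56, (1, 1), padding='same', activation='relu', name='create_channels'),
        BatchNormalization(),
        # shrink, so the mapping layers work on a smaller representation
        Conv2D(12, (3, 3), strides=2, padding='same', activation='relu', name='shrinking1'),
        BatchNormalization(),
        Conv2D(12, (3, 3), padding='same', activation='relu', name='mapping1'),
        BatchNormalization(),
        Conv2D(12, (3, 3), padding='same', activation='relu', name='mapping2'),
        BatchNormalization(),
        Conv2D(12, (3, 3), padding='same', activation='relu', name='mapping3'),
        BatchNormalization(),
        Conv2D(12, (3, 3), padding='same', activation='relu', name='mapping4'),
        BatchNormalization(),
        Conv2D(3, (1, 1), padding='same', name='channels'),  # three RGB channels out
        BatchNormalization(),
        UpSampling2D(name='expand'),  # back up to the input resolution
    ])
    model.compile(optimizer='adam', loss='mse')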

Armed with this model, I ran a training session…and…got a complete mess.

mess

Well, that worked great, didn’t it? sigh

Colour Spaces - From RGB to CIELAB

As I returned to the drawing board, I wondered about my decision to convert from greyscale to RGB. It felt wrong. I had the greyscale data, but I was essentially throwing that away and making the network generate 3 channels of data from scratch. Was there a way I could instead recreate the effect of the chromadots and add it to the original greyscale information? That way, I’d only be generating two channels of new synthetic data and combining it with reality. It seemed worth exploring.

The answer seemed to be found in the CIELAB colour space. In this space, the L co-ordinate represents lightness, a* is a point between red/magenta and green, and b* is a point between yellow and blue. I had the L co-ordinates in my greyscale image - I just had to generate the a* and b* co-ordinates for each image and then combine them with the original L. Simple!
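
Recombining is then trivial with scikit-image - a sketch, assuming `grey` is a 2-D array holding the original greyscale still scaled into L’s 0–100 range, and `ab` holds the two predicted channels:

    import numpy as np
    from skimage.color import lab2rgb

    def recombine(grey, ab):
        lab = np.zeros(grey.shape + (3,))
        lab[..., 0] = grey        # the real lightness data
        lab[..., 1:] = ab         # the two synthetic colour channels
        return lab2rgb(lab)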

Other Colorization Models Are Available

While I was doing that research, though, I also stumbled on a paper called Colorful Image Colorization. This paper seemed to confirm my choice of moving to the CIELAB colour space and also provided a similar architecture to FSRCNN, but with more filters running on the scaled-down image. I blended the two architectures together with Keras, as I wasn’t entirely convinced by the paper’s method of choosing colours via quantized bins and a generated probability distribution.

Here’s what my architecture looked like at this point:

[Model diagram: InputLayer → stacked Conv2D/BatchNormalization blocks scaling down → three Conv2DTranspose/Conv2D blocks scaling back up → Concatenate with the original input]

And what did I get?

Sepia. Not wonderful. But better than the previous hideous mess!

Let’s Make It U-Net

Okay, so maybe Zhang et al. had a point, and I needed to include that probability distribution and the bins. But…looking at my architecture again, I had another idea: U-Net.

U-Net is an architecture that was designed for segmenting medical images, but has proved to be incredibly strong in Kaggle competitions on all sorts of other problems.

The innovation of the U-Net architecture is that it passes information from the higher-level parts of the network at the start across to the scaling-up side on the right, so structure and other information found in the initial higher levels can be used alongside information that’s passed up through the scaling-up blocks. Yay more information!

My existing architecture was basically a U-Net without the left-to-right arrows…so I thought ‘why not add them in and see what breaks?’.

I added a simple line from the first block of scaling-down filters to the last block of the scaling-up filters just to see if I’d get any benefit. And…finally, I was getting somewhere. Here’s the current architecture - the final Lambda layer is just a multiplication to bring the values of the two new channels into the CIELAB colour space for a and b:

[Model diagram: the same down/up stack, now with a skip connection concatenating an early scaling-down block into the final scale_up block, and a Lambda layer whose output is concatenated with the original input]

(it turns out that Zhang and his team released a new paper in May that also includes U-Net-like additions, so we’re thinking on the same lines at least!)
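
Stripped right down, the skip connection looks like this in Keras - layer sizes are illustrative, not the exact architecture above:

    from keras.models import Model
    from keras.layers import Input, Conv2D, UpSampling2D, Concatenate, Lambda

    inp = Input(shape=(240, 320, 1))
    down1 = Conv2D(32, (3, 3), strides=2, padding='same', activation='relu')(inp)
    down2 = Conv2D(64, (3, 3), strides=2, padding='same', activation='relu')(down1)
    up = Conv2D(64, (3, 3), padding='same', activation='relu')(down2)
    up = UpSampling2D()(up)
    # the skip connection: early, higher-resolution features get
    # concatenated straight into the scaling-up path
    up = Concatenate()([up, down1])
    up = Conv2D(32, (3, 3), padding='same', activation='relu')(up)
    up = UpSampling2D()(up)
    ab = Conv2D(2, (3, 3), padding='same', activation='tanh')(up)
    ab = Lambda(lambda x: x * 128)(ab)   # scale into CIELAB's a*/b* range
    out = Concatenate()([inp, ab])       # reattach the original L channel
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='mse')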

You Have Been Watching

Here’s a clip of the original b/w of ‘The Colour Problem’ side-by-side with my colourized version:

I’m not going to claim that it’s perfect. Or even wonderful. But: my partial U-Net was trained only on 5000 stills and only for 10 epochs (taking just over an hour). That it produced something akin to a fourth-generation VHS tape with no further effort on my part seems amazing.

End of Part One

Obviously, the next step is to train the net on the full dataset. This is going to require some rejigging of the training and test data, as the full 650,000 image dataset can’t fit in memory. I’ll probably be turning to bcolz for that. At that point I’ll also likely throw the code up on GitHub (after some tidying up). I’m also moving away from Amazon to a dedicated machine with a 1080Ti card, which should speed up training somewhat. I’ll probably also take a look to see if the additions in other colourization networks provide any benefit, for example adding other nets alongside the U-Net to provide local and global hints for colour. So stay tuned for part 2!
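
The likely shape of that rejigging is a generator feeding Keras’s fit_generator, with bcolz-backed arrays eventually slotting in behind it. A sketch - grey_paths and colour_paths are hypothetical parallel lists of the still filenames:

    import numpy as np
    from keras.preprocessing.image import img_to_array, load_img

    def batch_generator(grey_paths, colour_paths, batch_size=32):
        while True:
            idx = np.random.choice(len(grey_paths), batch_size, replace=False)
            greys = np.stack([img_to_array(load_img(grey_paths[i], grayscale=True))
                              for i in idx]) / 255.0
            targets = np.stack([img_to_array(load_img(colour_paths[i]))
                                for i in idx]) / 255.0
            yield greys, targets

    model.fit_generator(batch_generator(grey_paths, colour_paths),
                        steps_per_epoch=650000 // 32, epochs=10)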


  1. spoilers: yes, it would. [return]

A Citizen With 4 Fewer Teeth

I no longer have a green card! Or indeed anything that identifies me as a valid citizen other than a large, watermarked piece of paper. Which is likely worse…but better, once I get the new passport sorted out. Anyhow, the citizenship ceremony was fairly straightforward, except that our current President still hasn’t got around to recording a welcome message for new citizens, and he was completely absent from all printed material. Not that most of us minded when we realized.

After a weekend of boardgames, Doctor Who, buying 8 cases of Diet Coke, and a longer-than-expected-dinner-because-reasons, it was Monday and time to celebrate being an American in style: getting my wisdom teeth removed. I have always been hesitant to have them out, due to one of them being apparently close to one of my jaw nerves. But the dentist here seemed very keen on yanking them out, and well, I was getting tired of the infections they kept bringing. I was informed that I have the smallest mouth she’s ever seen…but I was also one of the nicest patients she’s had. So swings and roundabouts there (everybody seems to agree on the small mouth thing…BUT NOBODY EVER SAID THAT TO ME BEFORE! But fine. Sure.).

Thankfully, it was fairly quick and easy to yank them out, but as everybody thought it would be a bad idea for me to be alone afterwards, Tammy drove me all the way back to Kentucky for the week. I spent the 8-hour car ride apologizing every five minutes, so I’m grateful that she didn’t throw me out in West Virginia.

Aside from running a fever on Wednesday and making the well-intentioned mistake of eating pizza yesterday, it hasn’t been too bad. Some pain, but not unbearable, not too much swelling, and the nerve ended up not being a problem. Hurrah! Though I am looking forward to being able to drink Diet Coke and tea again next week.

In the meantime, I am hopped up on ibuprofen, antibiotics, and percocet, whilst doing work and dying over and over on Zelda1. Tomorrow, mostly recovered, I fly home to Durham. Home for now, anyhow…


  1. Buried lede - I have a Switch! Not with the colour of joycons that I wanted, but after looking for…two months for the thing, I decided to get what was available. I am so bad at Zelda, but slightly better at Mario Kart 8. [return]