Oct 31, 2016 · 1 minute read
spark big data beam storm heron flink all the other big data keywords
I did a talk this week at All Things Open, wherein I discovered that references to Blur albums from 1994 don’t necessarily resonate with an American audience. I spent at least 15 minutes adding that ’S’, you know (a good part of that was Photoshop crashing, mind you).
But wait, there’s more! Having received @b0rk’s debugging zine this week, I wanted to produce something for my talk. Due to…well, let’s call it interest in the upcoming election, I didn’t want to create a big zine, but how about an 8-page single sheet zine full of helpful hints on developing with Apache Spark?
But wait! Wait! Yes, there’s even more! Because if you’ve opened that PDF, you’ll have noticed that it’s actually two pages! If you print it duplex, you get a fold-out poster which has operational tips and tricks! Zounds!
(notably, the zine has information which I chopped out of the original talk’s 1hr 8m length, including fun things like upgrading Spark Streaming applications, some command-line GC options, and a brief discussion on broadcast/accumulator variables)
This year, I’ve worked in San Francisco and in a skyscraper in New York. All I’m saying is that as far as childhood goals go, 2017 will have to involve me holding a comic bearing a writer’s credit in order to top 2016¹. However, what I didn’t really consider when constructing these dreams of being a fancy developer / consultant in a NY skyscraper is that I’m scared of heights. Terrified, even. Which makes for an interesting meeting in the ‘corner office’ with floor-to-ceiling windows. Or even just attempting to close the curtains in my hotel.
Also, I still find it amusing how astounded people are when I use public transport. Next time, I’ll wander around with a copy of Vignelli’s MTA map to do it properly.
Oh, and I booked my Christmas flights home. I’ll be back in the UK from December 17th until December 28th. I probably won’t make it to the South Bank this time around, sadly! Maybe next year…
Tonight’s blog was brought to you by Shimura Curves. Just because.
¹ Admittedly, in most other respects, 2016 has been a tyre fire. But you knew that already.
Just another 16 days. Just days away. This election has broken me, and I spent 2000 on the Ops night shift at Oxford Brookes following the Bush v. Gore shenanigans. And you’ll remember that I got so freaked out at Obama’s abysmal first debate performance in 2012 that I sent a long angry message to Obama For America at 2am that night.
This time, I am sleeping. However, I’m managing that simply through Ambien, which I didn’t have access to four years ago. Otherwise, I have succumbed to Deep US Election Psychosis. Every minute from getting up to going to bed is overwhelmed by the need to check Twitter or elsewhere to see if there’s a new poll, something being said at a rally, or what the guy occupying the Ecuadorian Embassy’s women’s toilets is coming out with today (spoilers: likely to be tinged with anti-Semitism!). Then, once in bed, there’s a thirty minute period whereupon I continue to check my phone until I feel the pull of the Ambien.
This is not healthy. The only real respite for the past month has been sleep and the periods when I’ve been flying (and that leads to me wandering through an airport trying to check 400 tweets whilst attempting to catch a taxi). Just another 16 days to go.
This is all a lead up to me pointing out that I’m giving a talk at All Things Open this Thursday. Come and watch me talk about Apache Spark and see if you can spot the points where I’m thinking “But wait! WHAT IF 538 HAS UPDATED THEIR FORECAST WHILST I’M EXPLAINING HOW TO CONFIGURE THE G1GC GARBAGE COLLECTOR?”¹ There will also be a take-home mini-zine for 30 people! And a PDF of it for everybody else afterwards.
Right, back to Twitter. On November 9th…I may be more sane. A little, anyhow.
One sound you really don’t want to hear at 20:30 is a very loud ‘craaaack’ from somewhere just outside your bedroom window. You know, the side where there is an overhanging and overgrown aged tree on the curb.
Thankfully, the large chunk of wood managed to land in a position where it didn’t crush my fence or (more importantly) my roof.
As many of you know, I’m not the biggest fan of change, especially when housing is concerned. When I was at Manchester, I didn’t just spend three years in the same Hall of Residence. Every year, I got the form for reapplication and ticked ‘yes, I want to stay in Room A14 for another year, thank you very much’.
After university, I returned home to my bedroom (also known to my American friends as ‘essentially Harry Potter’s room under the stairs’) and stayed there for nine years with a brief year-long gap in NC.
But whilst scrubbing the hardwood floors this week, I realized that I have stayed in this house longer than anywhere else except Avon Crescent. Which seems like a big deal. And to celebrate, the house kept the mice away for a few days.
I can’t say that I believe I’ll be staying within these walls for as long as I did at Avon Crescent, but I’m here for now. Also, my family is coming back next June, so the house will be full again for a short time!
Sep 25, 2016 · 1 minute read
really, they said this
Heroes starts playing.
“This is U2!”
Welcome to the start of the ‘KIDS THESE DAYS’ stage of my life.
Another great weekend in Kentucky. No board games this time…but I did end up having an hour-long conversation about them on the flight home this morning. Odd, but fun!
Oh, and Chicago this week too! It’s been a busy one.
Which is basically a code phrase for ‘lots happened this week, but I’ve been travelling all week and I’m now very tired and mentally freaking out about the debate tomorrow night, so there’s not a big chance of getting a big update out of me right now.’
BUT STAY TUNED. Because next week I will tell you my adventures of tracking down a FAX MACHINE in Durham, North Carolina! THRILL AT THE EXPECTATION.
Sep 18, 2016 · 2 minute read
it's just dust in my eye When I'm finished over here If you're not finished with me
If you hadn’t heard, Allo Darlin’ announced this week that they were splitting up, with a final farewell concert to be held in December (sadly, I don’t think I’ll be getting home in time for it!).
I first heard Tallulah on the back of Simon Sweeping The Nation’s post back in January 2012. I was in Marina Del Rey, lost in a Courtyard Marriott, walking to Santa Monica every morning to Activision’s offices. I was miserable being back after the Christmas break, and this song just hit me like an anvil as I was getting dressed that morning.
I'm wondering if I've already heard all the songs that'll mean something
And I'm wondering if I've already met all the people that'll mean something
And there I sat, half-dressed, trying not to cry, at least 4000 miles away from anybody I knew, and 6000 miles from home. Thus automatically disproving the first assertion above, but hey, it was 7am and I had only just got out of the shower.
Given that the band now occupies various different countries, the break-up is not entirely surprising. Given that I’m in the US, I feel very lucky that I managed to see them on two different tours on this side of the Atlantic.
Sep 17, 2016 · 1 minute read
columbo the sweeney! the sweeney! and yet, how many episodes have I watched?
After careful consideration, and around thirty episodes later, I think I don’t like the original 70s era of Columbo all that much. My mind keeps comparing it to The Sweeney, broadcast on ITV roughly around the same time…and, okay, maybe it’s not a fair comparison, but Columbo is slow, plodding, and not much fun. Even a cruise ship episode guest-starring Robert Vaughn, Patrick Macnee, and Dean Stockwell (with amazing bushy eyebrows) involving a comedy routine where Mrs. Columbo was just out of view every five minutes struggled to rise above ‘interesting’. Falk is good, obviously, but removing the structure of ‘whodunnit’ means that instead of suspense and a guessing game, you get to watch Columbo irritate the murderer for an hour.
I’m guessing it played better in weekly episodes.
Look, I’m still not entirely well, and the nights are long.
Sep 11, 2016 · 4 minute read
8.2% mmmm food trucks no, really, the plot of Goodnight Sweetheart is what???
Last week, I left you with a couple of posts on flame graphs and that was your lot (BUT HEY, somebody might find them useful!). However, in the background, many things happened. I’ve started a new position at Kogentix, and I spent last Friday getting divorced (a relatively painless procedure involving a somewhat sassy judge).
I had fully prepared to spend Labor Day Weekend locked up in my room watching Thames TV idents from the 1970s (I do have worrying form for this, you know), but thankfully that was not to be. Tammy made a surprise visit to the area and that led to food trucks, baking, and time-travelling bigamy. What more could you ask for in a Bank Holiday¹ weekend?
If I move away from this area, I will definitely miss the food truck rodeos. We have them so often that it’s easy to get jaded; indeed I don’t think I’ve been to one for over a year. I remember when it was just 13 trucks and everybody was excited about the ill-fated Grilled Cheese Bus. These days, it’s over 50, including Food Network winners and people attempting to charge $14 for three tacos. Ah, Durham.
Then more people! We met up with Christie and Ashley and had a great day wandering the rodeo, eating everything from fried cauliflower to bulgogi cooked in Cheerwine. And then my first visit to Ponysaurus, which seems like a great place and if I actually drank beer, I’d probably go more, considering it’s about four minutes away from home (GENTRIFICATION IS COMING).
Although I did not spend the weekend watching idents, there was a theme of British television. Just somewhat more recent. People watched Very British Problems and laughed as they recognized me far too often, along with me having to point out who Vic Reeves is. Which led to me doing a grand showing of this classic segment of television:
After that, it was time to catch up on the new series of Great British Bake-Off, whereupon I shouted loudly at the TV. “HAVE YOU NEVER HONESTLY NEVER SEEN A JAFFA CAKE IN YOUR LIFE?” may have been a refrain during certain parts.
And then. “Would you mind if we watched this British comedy?” Given that the last time I suggested something for a group to watch, it ended up being Jubilee, I am astounded that people said yes. And that’s how I ended up having to explain the backstory of Goodnight Sweetheart to an incredulous American audience. “Wait, the main character is a bigamist? And this was a popular show in the 1990s?”
Britain: A Strange Country.
Anyway. Yes, it was corny. Yes, about 73% of the jokes fell flat while the studio audience sounded like they had been exposed to nitrous oxide for a few hours before taping. Yes, the show deeply misses Dervla Kirwan². And yet, the revival had something. The mixture of the past catching up in the 60s with the plagiarized songs being released and the cultural shock of 2016 was interesting. Perhaps I relate to the cultural shock of returning home every year to find things have changed in subtle and not-so-subtle ways each time. In the case of Bicester, that normally means more houses, but sometimes you go back to find out that the bus stop you used for around seven years to get to work has turned into a giant Sainsbury’s and when I get home this Christmas the Big Tesco will have gone…replaced by the visage of a Tesco Extra standing on the countryside like a monolith from 2001. Or at least that’s what I’m told, anyhow. So, despite all the corniness, I’d be interested in seeing another series of that.
So, a great last Summer weekend. Unfortunately, I’ve spent this week drinking Lemsip by the case and fighting some sort of infection. It is mostly cleared now, but I’ve gone through three boxes of tissues in less than two days. Hopefully next week will finally see the back of it!
¹ I spelt it ‘Labor’ Day. Let me have this.
I’ve been working on a few toy Rust programs and libraries of late. One of these is dolby, an implementation of Adaptive Count-Min Sketch as detailed in this paper from Microsoft Research. CMS data structures are designed to deal with the problem of counting things at scale - when your incoming data stream is (essentially) infinite, how can you reason about the data using only finite resources? CMS algorithms and structures get around that by using probability - they can give you approximate answers.
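To make the idea a bit more concrete, here is a minimal sketch of a plain Count-Min Sketch in Rust. To be clear, this is not dolby's code and it is not the adaptive variant from the paper: the `CountMinSketch` type, its method names, and the use of the standard library's `DefaultHasher` are all assumptions made for illustration. The trick is that each item bumps one counter per row, and the estimate is the smallest of those counters, so collisions can only ever inflate the answer.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A plain (non-adaptive) Count-Min Sketch: `depth` rows of `width` counters.
/// Illustrative only; the names here are not taken from dolby.
pub struct CountMinSketch {
    width: usize,
    depth: usize,
    counters: Vec<u64>, // row-major storage, depth * width entries
}

impl CountMinSketch {
    pub fn new(width: usize, depth: usize) -> Self {
        assert!(width > 0 && depth > 0, "width and depth must be non-zero");
        CountMinSketch { width, depth, counters: vec![0; width * depth] }
    }

    /// Position of `item`'s counter in `row`. Mixing the row number into the
    /// hash makes each row behave like a different hash function.
    fn index<T: Hash>(&self, item: &T, row: usize) -> usize {
        let mut hasher = DefaultHasher::new();
        row.hash(&mut hasher);
        item.hash(&mut hasher);
        let column = (hasher.finish() % self.width as u64) as usize;
        row * self.width + column
    }

    /// Record one occurrence of `item` by bumping one counter in every row.
    pub fn insert<T: Hash>(&mut self, item: &T) {
        for row in 0..self.depth {
            let i = self.index(item, row);
            self.counters[i] += 1;
        }
    }

    /// Approximate count for `item`: the minimum over its counters.
    /// This can overestimate (collisions) but never underestimates.
    pub fn estimate<T: Hash>(&self, item: &T) -> u64 {
        (0..self.depth)
            .map(|row| self.counters[self.index(item, row)])
            .min()
            .unwrap_or(0)
    }
}
```

Memory use is fixed at `width * depth` counters no matter how many items stream through, which is the whole point: you trade exact answers for bounded resources.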
Running an algorithm like this at scale is likely to amplify any inefficiencies in the implementation, and knocking a few seconds, milliseconds, or even microseconds off the running time of a loop or other section of the program can yield massive rewards. And a great way to identify potential improvements or discover problematic issues is with flame graphs.
(The following was all done on an Ubuntu 16.04 machine, but don’t worry! I’m not using any fancy BPF tricks or anything - as long as you can run a recent-ish perf on your machine (mine is 4.4.15), you should be able to replicate this fairly easily.)
In order to get useful information out of our profiling tools, we need to tell the Rust compiler (rustc) to include DWARF debugging symbols in the binary it is going to create. To do this, we add a small section to our Cargo.toml file:
debug = true
Once we have built the binary with cargo build, we can do a system-wide profile using perf. Here I’m running a test against 10 million entries being added to a dolby data structure:
root# perf record -ag ~/dolby/target/debug/dolby
(the -a flag ensures profiling capture on all CPUs, the -g flag switches on stack-chain recording, which is needed to build the flame graph structure)
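Once the recorded samples have been turned into an SVG (here rust.svg; one way is to pipe the output of perf script through the stackcollapse-perf.pl and flamegraph.pl scripts from Brendan Gregg’s FlameGraph repository), opening it in a browser gives us a fancy flame graph!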
Looking at the graph, it seems that the majority of the time dolby is on-CPU, it is running the insert method. Which makes sense, seeing as the test harness is firing 10m numbers into the structure! We can also see by walking up the chain that insert spends most of its time in the hash_index method, and most of the samples inside hash_index are actually while Rust is doing hashing calculations within the murmurhash3 crate. Changing the hash function to something less expensive than murmurhash3 may therefore result in improved insert performance.
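As a rough illustration of what "something less expensive" could look like, here is a sketch of a hand-rolled 64-bit FNV-1a hasher behind Rust's standard `Hasher` trait, plus a helper that is generic over the hasher so implementations can be swapped. FNV-1a is purely my pick for the example, `bucket_for` is a hypothetical stand-in rather than dolby's actual `hash_index`, and whether it really beats murmurhash3 on this workload is something only another flame graph can tell you.

```rust
use std::hash::{Hash, Hasher};

/// 64-bit FNV-1a: one XOR and one multiply per input byte.
pub struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for byte in bytes {
            self.0 ^= u64::from(*byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hypothetical stand-in for a sketch's index calculation (not dolby's API).
/// Being generic over `H` means the hash function can be swapped at the call
/// site and the two variants profiled against each other.
pub fn bucket_for<H: Hasher + Default, T: Hash>(item: &T, row: u64, width: u64) -> u64 {
    assert!(width > 0, "width must be non-zero");
    let mut hasher = H::default();
    row.hash(&mut hasher);
    item.hash(&mut hasher);
    hasher.finish() % width
}
```

Calling `bucket_for::<Fnv1a, _>(&item, row, width)` uses the cheap hasher, while `bucket_for::<std::collections::hash_map::DefaultHasher, _>(...)` gives a baseline to compare against. Cheaper hashes tend to distribute less evenly, so it is worth checking the sketch's error rate as well as its speed.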
Anyhow - flame graphs in Rust! They’re easy! Use them!