Inspiration Failure

This week, I made three different types of sweet (maple candy, white chocolate with golden syrup crunch, and orange milk chocolate bars), played the same board game four times (two of them just by myself), drove around five hundred miles on a round-trip to SC (discovering that, apparently, you can get red velvet waffles with your chicken in Columbia), finalized things for the first Triangle Spark meetup on Tuesday (you should come!), began work on a semi-secret work project, trained all sorts of machine learning models…and still don’t really think I got anything done this week. I may need to recalibrate my expectations a little.

In other, odd news: since having to give up walking to work, I have lost more weight than I have in the past two years or so. Which is…strange. I have tried to be somewhat more consistent in riding the exercise bike in an attempt to make up for the lack of exercise, and I guess I’ve been much better at that than I realized.

(it’ll all go to hell after I have my operation, mind you…)


Just tired. Tired of the news back home, tired of the news back here, tired of the hipsters’ reflexive bashing of anything that happens in Durham, tired of the heat, tired of my foot giving out at random times. Tired of being tired, and of spending the evening on the Internet instead of doing something useful.

Not a great week. But the week before ended well, so maybe the coming week will be better.

North Carolina — The Fun Police

America (or at least parts of it): where you can go out and buy a gun for concealed carry, but heaven forbid that you buy a firework that shoots into the sky. At least in North Carolina, anyhow; across the border in South Carolina, they’re happy to sell you what is essentially small ordnance to fire off in your back garden.

And then on July 4th, Durham erupts in illegal fireworks. It’s an odd system.

A week of two halves, then: the first four days full of 12-hour days and misery, and the last three days finally relaxing, not staring at a computer screen, and not being woken up at 4am by a spider cricket crawling up my pyjama leg. You can probably guess which part of the week I enjoyed more.

Blackberry Sake sorbet

And I made things! Sorbets, vegan meringues, caramelized soy milk pudding (yes, I know, but believe me, it tastes about 500x better than it sounds, and I’m thinking about using the caramelized soy milk to make a vegan ganache in the near future), deep-fried cheese, and soy nuggets slathered in ssamjang.

Everything should be covered in ssamjang.

(This stems from finally getting to go to Kokyu’s sandwich shop this Friday. The ssamwich is essential and you should beat a path there for weekday lunch sometime)

During a whirlwind visit to Durham, Tammy followed through on her determination to dazzle paint yet another piece of my furniture, so I now have a wonderful dazzle table sitting on my porch. I will not stop until the entire house clashes with itself. Wait until you see the rugs I’m looking at muahahahahahaha.

Still no news on the foot. Maybe this week…

The Delicious Salty Taste of Scalia’s Tears

A new definition of ‘own worst enemy’: knowing that you are going to have surgery soon to alleviate your bad foot…and then walking 13 miles in two days pretty much by accident.

So, yes, I’m currently lying down and in quite a bit of pain.

The talk at the Red Hat Summit seemed to go down well, though. Even if the wireless connection decided to go down right in the middle of the demo. That was a tense minute or two, but thanks to the rather aggressive polling in the web client, it eventually worked (hurrah!).

(Also, apologies to everybody I know in Boston: I didn’t know how much time I would have to myself, or how mobile I’d be; otherwise, I would have sought you all out!)

Remember, everyone: have a good long drink of Scalia’s Tears this weekend, and tune out the obnoxious leftist-radical whining about how Friday’s SCOTUS decision means ‘nothing’.

Spark and Kafka - Getting Cozier

I’m a huge fan of the reappearance of Enterprise Service Buses. They are especially great for Big Data systems and the Lambda Architecture: messages get sent to different streams on the bus, and consumers can read them in a streaming or a batch operation as desired.

(a good introduction to the idea of an Enterprise Service/Message Bus from last year)
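To make the "streaming or batch, as desired" idea concrete, here is a toy, in-memory model of a log-style bus. It is only an illustration under simplifying assumptions (one partition per topic, no persistence, no replication), and MessageLog and StreamingConsumer are hypothetical names, not Kafka's API. The key point is that consumers never remove messages; each one just remembers an offset, so a streaming reader and a batch reader can share the same data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a log-style bus (hypothetical, single partition per topic).
// Producers append; every message gets a sequential offset. Reading never deletes.
final class MessageLog {
    private final Map<String, List<String>> topics = new HashMap<>();

    // Appends a message to a topic and returns the offset it was stored at.
    synchronized long append(String topic, String message) {
        List<String> log = topics.computeIfAbsent(topic, t -> new ArrayList<>());
        log.add(message);
        return log.size() - 1;
    }

    // Batch-style read of [fromOffset, untilOffset), clamped to what has been written.
    synchronized List<String> read(String topic, long fromOffset, long untilOffset) {
        List<String> log = topics.getOrDefault(topic, new ArrayList<>());
        int from = (int) Math.min(Math.max(fromOffset, 0), log.size());
        int until = (int) Math.min(Math.max(untilOffset, from), log.size());
        return new ArrayList<>(log.subList(from, until));
    }

    // The offset the next appended message will get.
    synchronized long endOffset(String topic) {
        return topics.getOrDefault(topic, new ArrayList<>()).size();
    }
}

// Streaming-style consumer: each poll returns whatever arrived since the last poll.
final class StreamingConsumer {
    private final MessageLog log;
    private final String topic;
    private long position = 0; // this consumer's own offset; the log doesn't track it

    StreamingConsumer(MessageLog log, String topic) {
        this.log = log;
        this.topic = topic;
    }

    List<String> poll() {
        long end = log.endOffset(topic);
        List<String> batch = log.read(topic, position, end);
        position = end;
        return batch;
    }
}
```

A batch job over the same topic is then just log.read(topic, 0, log.endOffset(topic)), which is the shape of the Lambda Architecture's batch layer.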

Obviously, you wait for a decent Enterprise Service Bus/Data Stream Bus/PubSub/Messaging Log[1] and then many come at once. One of the most popular in recent times is Apache Kafka, developed at LinkedIn to handle their huge throughput requirements. It’s quickly become a de facto component of many a Spark Streaming or Storm solution.

In the world of Spark, though, Kafka integration has always been a bit of a pain. If you look at this guide to integrating Kafka and Spark, it’s clear that wrangling more than a simple connection to Kafka involves quite a bit of faff: to increase parallelism, you have to create multiple Kafka DStreams and union them together as the data comes in. Spark is supposed to be easier to work with than that!

Well, in Spark 1.3, a new interface to Kafka was added. It’s still marked as ‘experimental’ in 1.4, but I know of several companies who have been using it in production for months, handling billions of messages per day (if you’re still cautious, I imagine that it will be marked as safe in 1.5). And it makes things so much simpler!

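import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf, Seconds(5))
val kafkaParams = Map("" -> "kafka-1:9092,kafka-2:9092,kafka-3:9092")
val topics = Set("example-topic", "another-example-topic")
// One RDD partition per Kafka partition: no receivers, no union
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
// ...and then do Spark stuff!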

This automatically creates a DStream of KafkaRDDs, each with one partition per Kafka partition, so the data is read in parallel. No union required! As a bonus, because Spark tracks the offsets that have been read itself, bypassing ZooKeeper, the new approach gains exactly-once semantics for receiving data (with the downside that ZooKeeper no longer knows exactly what the Spark Streaming application is doing, which may cause some monitoring issues unless you manually update ZooKeeper from within Spark).
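One caveat: the direct approach guarantees that each record is received once, but the Spark Streaming + Kafka integration guide points out that end-to-end exactly-once also needs an output step that is either idempotent or saves results and offsets in one atomic transaction. Below is a plain-Java model of the second option, with in-memory maps standing in for a real data store; OffsetTrackingSink is a hypothetical name, not part of Spark or Kafka.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sink that stores results and the per-partition committed offset together,
// under one lock, so a batch replayed after a failure is skipped instead of double-counted.
final class OffsetTrackingSink {
    private final Map<Integer, Long> committed = new HashMap<>(); // partition -> next offset expected
    private final Map<String, Integer> counts = new HashMap<>();  // the "results"

    // Applies one partition's records for [fromOffset, untilOffset).
    // Returns true if applied, false if the whole range was already committed.
    synchronized boolean apply(int partition, long fromOffset, long untilOffset, List<String> records) {
        long next = committed.getOrDefault(partition, 0L);
        if (untilOffset <= next) {
            return false; // replay of data we've already stored
        }
        if (fromOffset != next) {
            throw new IllegalStateException("partition " + partition + ": expected offset "
                    + next + " but batch starts at " + fromOffset);
        }
        for (String r : records) {
            counts.merge(r, 1, Integer::sum);
        }
        committed.put(partition, untilOffset); // results and offset change together
        return true;
    }

    synchronized int count(String key) {
        return counts.getOrDefault(key, 0);
    }

    synchronized long committedOffset(int partition) {
        return committed.getOrDefault(partition, 0L);
    }
}
```

With a real database, the same idea becomes "write the results and the untilOffset in one transaction", and on restart you resume from the stored offsets.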

Also in 1.3 and above - batch access to Kafka! You can create KafkaRDDs and operate on them in the usual way (a boon if you’re working on a Lambda Architecture application).

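import kafka.serializer.StringDecoder
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

val offsetRanges = Array(
  // args are: topic, partitionId, fromOffset (inclusive), untilOffset (exclusive)
  OffsetRange("example-topic", 0, 110, 220))
// sc: an existing SparkContext; kafkaParams: as in the streaming example
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](sc, kafkaParams, offsetRanges)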

(In batch mode, you currently have to manage reading from all the partitions and the offsets in the Kafka log yourself.)
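Managing those offsets yourself mostly means remembering where the last batch job stopped and building one range per partition up to the latest offset. A minimal Java sketch of that bookkeeping, assuming you already have the last-read and latest offsets per partition as maps (how you fetch and store them is up to you); PartitionRange and BatchOffsetPlanner are hypothetical names that only mirror the shape of Spark's OffsetRange.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical mirror of Spark's OffsetRange: topic, partition, [fromOffset, untilOffset).
final class PartitionRange {
    final String topic;
    final int partition;
    final long fromOffset;  // inclusive
    final long untilOffset; // exclusive

    PartitionRange(String topic, int partition, long fromOffset, long untilOffset) {
        this.topic = topic;
        this.partition = partition;
        this.fromOffset = fromOffset;
        this.untilOffset = untilOffset;
    }

    // Printed in the same form as the Scala OffsetRange(...) call in the post.
    @Override
    public String toString() {
        return "OffsetRange(\"" + topic + "\", " + partition + ", " + fromOffset + ", " + untilOffset + ")";
    }
}

final class BatchOffsetPlanner {
    // One range per partition, from where the previous job stopped (0 if never read)
    // up to the latest offset. Partitions with nothing new are skipped.
    static List<PartitionRange> plan(String topic, Map<Integer, Long> lastRead, Map<Integer, Long> latest) {
        List<PartitionRange> ranges = new ArrayList<>();
        TreeMap<Integer, Long> sorted = new TreeMap<>(latest); // stable, partition-ordered output
        for (Map.Entry<Integer, Long> e : sorted.entrySet()) {
            long from = lastRead.getOrDefault(e.getKey(), 0L);
            long until = e.getValue();
            if (until > from) {
                ranges.add(new PartitionRange(topic, e.getKey(), from, until));
            }
        }
        return ranges;
    }
}
```

After the job succeeds, each range's untilOffset becomes the new lastRead for its partition. A real job would also need to cope with offsets that Kafka's retention has already deleted.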

Okay, so we can now do parallelism with Spark and Kafka in a much simpler manner…but an important feature of these architectures is writing results back to the bus (e.g., flagging possible fraudulent bids in a real-time auctioning system for further investigation). Unfortunately, baked-in support for this is not scheduled until 1.5 (see SPARK-4122 for more details), so for now you have to handle the details yourself; consider a connection pool if you find yourself doing many writes back to Kafka in a short time.
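The Spark Streaming programming guide's section on foreachRDD describes the usual pattern: a lazily created pool of connections per executor, which each partition borrows from and returns to. Here is a minimal, stdlib-only Java sketch of such a pool, assuming T would be your Kafka producer in practice; SimplePool is a hypothetical name, not a library class.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Minimal fixed-size pool: connections are created lazily by the factory, up to
// `capacity`, and then reused, so a burst of writes doesn't open one per record.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int capacity;
    private int created = 0; // guarded by this

    SimplePool(int capacity, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
        this.capacity = capacity;
    }

    // Reuse an idle connection, else create one if under capacity, else wait for a release.
    T borrow() {
        T conn = idle.poll();
        if (conn != null) {
            return conn;
        }
        synchronized (this) {
            if (created < capacity) {
                T fresh = factory.get(); // only counted if creation succeeds
                created++;
                return fresh;
            }
        }
        try {
            return idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
    }

    void release(T conn) {
        idle.offer(conn); // never exceeds capacity: at most `capacity` connections exist
    }

    // Borrow, use, and always hand the connection back, even if the work throws.
    void withConnection(Consumer<T> work) {
        T conn = borrow();
        try {
            work.accept(conn);
        } finally {
            release(conn);
        }
    }

    synchronized int createdCount() {
        return created;
    }
}
```

Inside Spark you would hold one such pool per executor JVM (e.g. in a Scala object) and call withConnection from within foreachRDD/foreachPartition, so each partition reuses producers rather than creating a new one for every record.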

  1. The cited difference between a data stream bus/log and an enterprise bus seems to be that traditional enterprise buses tended to do transformations on the bus itself, whereas systems like Kafka are much simpler and leave it up to the consumers to transform data (and possibly write it back to the bus under a different topic).

The Adventure Continues

Good news first! I’m going to Boston again next Wednesday, for the Red Hat Summit. We’ll be doing a presentation on financial modelling.

And then the bad news…results from the MRI are in, and an appointment with a surgeon is incoming; my first operation will involve doing things to my left foot. Things that will leave me unable to walk for a while, and with rather impaired mobility for some time beyond that as well. Hurrah for being in a job where remote work is possible!

(though the upcoming surgery did mean I had to pass on a rather fancy posting today; a shame, but I’m sure there’ll be others!)

Other than that, though, quiet week. I did adult things like buy new filters for the house’s air conditioning system and sorted out the various bills that my MRI adventures have cost me so far. Fun times!

Maybe some chocolate-making this weekend…

Insurance Adventures

“Your insurance company has not approved this MRI yet. We can still go ahead with it, but we’ll need you to sign this waiver that holds you liable for the cost if they don’t approve.”

“Er, how much will it be?”


“Nobody asks that?”

“I’ve seen some for as low as $2,000, and some as high as $11,000.”

deep breath from the British person on the other side

“How about we reschedule until next week and see if they approve it?”

Meanwhile, back in Britain:

“Here for your MRI? This way!”[1]

You can infer from this that I did not have my MRI this week, so I still don’t know what’s wrong with my foot, and I’m also laid up in bed after hurting it again last night and then having to drive four hours back from South Carolina on top of that. Still, it was a fun trip down to SC, where I discovered the useful effects of trampoline parks on children (they’re so tired afterwards!), was given a geometric painting as a birthday present (yay triangles!), and practiced my Sichuan Wonton construction skills. Oh, and I saw all of Flash Gordon for the first time. Richard O’Brien was better in Jubilee, I think. As for the rest of it, my goodness, there were some awful films produced in the wake of Star Wars.

And finally, I got promoted! I’m now a Lead Consultant at Mammoth Data. I now consult in a leading way on all the Big Data things! Perhaps.

  1. Yes, yes, there might be some waiting due to it being a non-urgent scan, but I wouldn’t have had that conversation, and I would have gone to the doctor earlier anyhow. So there.

Gary Oldman - Spirit Animal

Happily, work ended on a much better note this week than it began.

Not much going on here except work this week…but next week: Ian Gets His First MRI!

Autoforwarding Security Credentials In Storm

Using Kerberos with Storm is, like most things involving Kerberos, an experience akin to pulling teeth with a pair of tweezers: it hurts and it goes on for a long time. Can you get the keytabs generated and into the right place, and what does that end up meaning for your Storm supervisor nodes? Wouldn’t it be lovely if Storm could simply hand out Hadoop Kerberos credentials to a topology when it is submitted and Everything Just Works™?

Well, if you’re attempting to use HBase or HDFS in your Bolts, then things are looking up for you. You can use the AutoHBase and AutoHDFS classes to do exactly that, and then the only keytab you need worry about is the one on your Nimbus server.


It’s never quite that easy. Mainly, the thing you have to be aware of is this: the class hierarchy of AutoHDFS and AutoHBase has changed in the last few months, so if you’re using a platform like Cloudera, MapR, or Hortonworks, you may find yourself staring at a terminal wondering why on Earth Kerberos isn’t working…and like all things Kerberos, the errors are obtuse and unhelpful.

Anyway, the old hierarchy is:

and the new locations are:

Then, in your topology, set Config.TOPOLOGY_AUTO_CREDENTIALS to a list of all the credentials it needs access to (in this example, just HDFS, but you could simply add HBase to the autoCreds list and it’ll have access to HBase too):

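import java.util.ArrayList;
import java.util.List;

import backtype.storm.Config;         // org.apache.storm.Config from Storm 1.0 onwards
import backtype.storm.StormSubmitter; // org.apache.storm.StormSubmitter from Storm 1.0 onwards

public static void main(String[] args) throws Exception {
    Config cfg = new Config();
    List<String> autoCreds = new ArrayList<String>();
    // Use this hierarchy for an older distribution, e.g. HDP 2.2: add the old AutoHDFS class name to autoCreds
    // This is the current hierarchy: add the new AutoHDFS class name to autoCreds instead
    cfg.put(Config.TOPOLOGY_AUTO_CREDENTIALS, autoCreds);
    // [...other topology and config setup, including TOPOLOGY_NAME and builder...]
    StormSubmitter.submitTopology(TOPOLOGY_NAME, cfg, builder.createTopology());
}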

Then, on your Nimbus server, you need to update your storm.yaml (this example uses the current hierarchy, but you can replace the entries with the old ones and it’ll work if you’re on a non-current version of Storm):

nimbus.autocredential.plugins.classes: ["", ""]
nimbus.credential.renewers.classes: ["", ""]
hdfs.keytab.file: "/path/to/keytab/on/nimbus"
hdfs.kerberos.principal: ""
nimbus.credential.renewers.freq.secs: 82800
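The 82800 on the last line is a number of seconds. Assuming your cluster leaves the HDFS delegation-token renew interval (dfs.namenode.delegation.token.renew-interval) at its default of 86400000 ms, the arithmetic shows why Nimbus is told to renew a little under once a day:

```latex
\frac{82\,800\ \text{s}}{3\,600\ \text{s/h}} = 23\ \text{h}
\quad < \quad
\frac{86\,400\,000\ \text{ms}}{3\,600\,000\ \text{ms/h}} = 24\ \text{h}
```

That leaves roughly an hour of slack; if your cluster uses a shorter renew interval, the renewal frequency would need to come down to match.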

Restart your Nimbus server, submit your topology and watch Secure HDFS be authenticated without any further Kerberos nightmares! This time, at least. Kerberos is always out there, waiting. Waiting.

Avengers 2 - Getting Too Old For This

Avengers 2, then:

Things I liked:

  • An actual rescue and evacuation of a city area where superpowers came in handy for situations other than hitting things!
  • Grafting Ultron onto Stark makes a good shortcut (having to explain that, no, really, Ant-Man created Ultron in the MU made me look slightly silly)

And things that were somewhat less liked:

  • Stark ruins everybody’s day and then saves the day by doing exactly the same thing (I imagine this will crop up again during Civil War, but it seems somewhat lazy).
  • After making such a big deal about why Pietro and Wanda really hate Stark, there’s no scene with them just having a general chat about how he supplied arms that killed their parents?
  • Look! We just happened to have this big huge [REDACTED] lying around!

Also, watching the Batman vs. Superman trailer, this kept repeating in my head:

Flex Mentallo

But no, Superman must be grim and gritty! Batman must have that damn exo-suit! Aren’t we tired of the simple politics of Dark Knight Returns by now? Can’t we move on, like Morrison did back in 1997 in JLA?

Hn. Clark.