The Delicious Salty Taste of Scalia’s Tears

A new definition of ‘own worst enemy’: knowing that you are going to have surgery soon to alleviate your bad foot…and then walking 13 miles in two days pretty much by accident.

So, yes, I’m currently lying down and in quite a bit of pain.

The talk at the Red Hat Summit seemed to go down well, though. Even if the wireless connection decided to go down right in the middle of the demo. That was a tense minute or two, but thanks to the rather aggressive polling in the web client, it eventually worked (hurrah!).

(also, apologies to everybody I know in Boston - I didn’t know how much time I would have to myself, and how mobile I’d be, otherwise, I would have sought you all out!)

Remember, everyone: have a good long drink of Scalia’s Tears this weekend, and tune out the obnoxious leftist-radical whining about how Friday’s SCOTUS decision means ‘nothing’.

Spark and Kafka - Getting Cozier

I’m a huge fan of the reappearance of Enterprise Service Buses. They are especially great for Big Data systems and the Lambda Architecture: messages get sent to various different streams on the bus and consumers can read them in a streaming or a batch operation as desired.

(a good introduction to the idea of an Enterprise Service/Message Bus from last year)

Obviously, you wait ages for a decent Enterprise Service Bus/Data Stream Bus/PubSub/Messaging Log¹ and then many come at once. One of the most popular in recent times is Apache Kafka, developed at LinkedIn to handle their huge throughput requirements. It’s quickly become a de facto component of many a Spark Streaming or Storm solution.

In the world of Spark, though, Kafka integration has always been a bit of a pain. If you look at this guide to integrating Kafka and Spark, it’s clear that wrangling more than a simple connection to Kafka involves quite a bit of faff, having to union multiple DStreams as they’re coming in from Kafka to increase parallelism. Spark is supposed to be easier to work with than that!
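For comparison, here’s roughly what that old receiver-based approach looks like (a sketch only: the ZooKeeper quorum, consumer group name, and receiver count are made up for illustration, and it assumes a StreamingContext ssc like the one created below). Each receiver gets its own DStream, and you have to union them yourself to get any read parallelism:

import org.apache.spark.streaming.kafka.KafkaUtils

// Receiver-based approach (pre-1.3 style): one receiver per stream,
// unioned manually to read in parallel
val numReceivers = 4
val kafkaStreams = (1 to numReceivers).map { _ =>
  KafkaUtils.createStream(ssc, "zk-1:2181,zk-2:2181", "example-consumer-group", Map("example-topic" -> 1))
}
val unifiedStream = ssc.union(kafkaStreams)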

Well, in Spark 1.3, a new interface to Kafka was added. It’s still (in 1.4) marked as ‘experimental’, but I know of several companies who have been using it in production for months, handling billions of messages per day (and I imagine it will be marked as safe in 1.5, if you’re still cautious). And it makes things so much simpler!


import kafka.serializer.StringDecoder

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf, Seconds(5))

val kafkaParams = Map("metadata.broker.list" -> "kafka-1:9092,kafka-2:9092,kafka-3:9092")

val topics = Set("example-topic", "another-example-topic")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

stream.map(_._2) // and then do Spark stuff with the message values!
// ...
ssc.start()
ssc.awaitTermination()

This automatically creates a DStream of KafkaRDDs whose partitions map one-to-one onto the Kafka partitions, so they read in parallel with no union required! As a bonus, because Spark tracks the offsets it has read itself, bypassing ZooKeeper, the new approach gains exactly-once semantics (with the downside that ZooKeeper no longer knows what the Spark Streaming application is consuming, which may break any Kafka monitoring that relies on it unless you update ZooKeeper from within Spark yourself).
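If that monitoring gap bothers you, the direct stream does expose the offsets it has read via HasOffsetRanges, so you can push them to ZooKeeper (or anywhere else) yourself. A minimal sketch, with the actual ZooKeeper write left out:

import org.apache.spark.streaming.kafka.HasOffsetRanges

stream.foreachRDD { rdd =>
  // The cast only works on the KafkaRDD itself, so grab the offsets
  // before transforming the RDD any further
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  offsetRanges.foreach { o =>
    // e.g. write o.topic / o.partition / o.untilOffset to ZooKeeper here
    println(s"${o.topic} ${o.partition}: ${o.fromOffset} -> ${o.untilOffset}")
  }
}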

Also in 1.3 and above - batch access to Kafka! You can create KafkaRDDs and operate on them in the usual way (a boon if you’re working on a Lambda Architecture application).


import org.apache.spark.streaming.kafka.OffsetRange

// args are: topic, partitionId, fromOffset (inclusive), untilOffset (exclusive)
val offsetRanges = Array(
  OffsetRange("example-topic", 0, 110, 220)
)

// sc is a plain SparkContext; kafkaParams as defined above
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](sc, kafkaParams, offsetRanges)

(In batch mode, you currently have to manage reading from all the partitions and the offsets in the Kafka log yourself.)

Okay, so we can now do parallelism with Spark and Kafka in a much simpler manner…but an important feature of these architectures is writing results back to the bus (e.g., flagging possible fraudulent bids in a real-time auctioning system for further investigation). Unfortunately, baked-in support for this isn’t scheduled until 1.5 (see SPARK-4122 for more details), so for now you have to handle the details yourself - consider a connection pool if you find yourself doing many writes back to Kafka in a short time.
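In the meantime, something like this works: write from each partition with a plain Kafka producer. This is only a sketch - the output topic and the results stream are invented for illustration, and creating a producer per partition per batch is the naive version; this is exactly where that connection pool (or a lazily-initialised producer per executor) earns its keep.

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// results is whatever DStream[String] your processing produced
results.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // Naive: one producer per partition per batch
    val props = new Properties()
    props.put("bootstrap.servers", "kafka-1:9092,kafka-2:9092,kafka-3:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    partition.foreach { message =>
      producer.send(new ProducerRecord[String, String]("flagged-bids", message))
    }
    producer.close()
  }
}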


  1. The cited difference between a data stream bus/log and an enterprise bus seems to be that traditional enterprise buses tended to do transformations on the bus itself, whereas systems like Kafka are much simpler and leave it up to the consumers to transform data (and possibly write it back to the bus under a different topic). ↩︎

The Adventure Continues

Good news first! I’m going to Boston again next Wednesday, for the Red Hat Summit. We’ll be doing a presentation on financial modelling.

And then the bad news…results from the MRI are in, and an appointment with a surgeon is incoming; my first operation will involve doing things to my left foot. Things that will leave me unable to walk for a while, and with rather impaired mobility for some time beyond that as well. Hurrah for being in a job where remote work is possible!

(though the upcoming surgery did mean I had to pass on a rather fancy posting today; a shame, but I’m sure there’ll be others!)

Other than that, though, quiet week. I did adult things like buying new filters for the house’s air conditioning system and sorting out the various bills that my MRI adventures have cost me so far. Fun times!

Maybe some chocolate-making this weekend…

Insurance Adventures

“Your insurance company has not approved this MRI yet. We can still go ahead with it, but we’ll need you to sign this waiver that holds you liable for the cost if they don’t approve.”

“Er, how much will it be?”

“…”

“Nobody asks that?”

“I’ve seen some for as low as $2,000, and some as high as $11,000.”

deep breath from the British person on the other side

“How about we reschedule until next week and see if they approve it?”

Meanwhile, back in Britain:

“Here for your MRI? This way!”¹

You can infer from this that I did not have my MRI this week, and thus I still don’t know what’s wrong with my foot; I’m also laid up in bed after hurting it again last night and then having to drive four hours back from South Carolina on top of that. Still, it was a fun trip down to SC, where I discovered the useful effects of trampoline parks on children (they’re so tired afterwards!), was given a geometric painting as a birthday present (yay triangles!), and practiced my Sichuan wonton construction skills. Oh, and I saw all of Flash Gordon for the first time. Richard O’Brien was better in Jubilee, I think. As for the rest of it, my goodness, there were some awful films produced in the wake of Star Wars.

And finally, I got promoted! I’m now a Lead Consultant at Mammoth Data. I now consult in a leading way on all the Big Data things! Perhaps.


  1. Yes, yes, there might be some waiting due to it being a non-urgent scan, but I wouldn’t have had that conversation, and I would have gone to the doctor earlier anyhow. So there. ↩︎

Gary Oldman - Spirit Animal

Happily, work ended on a much better note this week than it began.

Not much going on here except work this week…but next week: Ian Gets His First MRI!

Autoforwarding Security Credentials In Storm

Using Kerberos with Storm is, like most things involving Kerberos, an experience akin to pulling teeth with a pair of tweezers: it hurts and it goes on for a long time. Can you get the keytabs generated and into the right place, and what does that end up meaning for your Storm supervisor nodes? Wouldn’t it be lovely if Storm could simply hand out Hadoop Kerberos credentials to a topology when it is submitted and Everything Just Works™?

Well, if you’re attempting to use HBase or HDFS in your Bolts, then things are looking up for you. You can use the AutoHBase and AutoHDFS classes to do exactly that, and then the only keytab you need worry about is the one on your Nimbus server.

Except

It’s never quite that easy. Mainly, the thing you have to be aware of is this: the package location of AutoHDFS and AutoHBase has changed in the last few months, so if you’re using a platform like Cloudera, MapR, or Hortonworks, you may find yourself staring at a terminal wondering why on Earth Kerberos isn’t working…and, as with all things Kerberos, the errors are obtuse and unhelpful.

Anyway, the old locations are:

backtype.storm.security.auth.hadoop.AutoHDFS
backtype.storm.security.auth.hadoop.AutoHBase

and the new locations are:

org.apache.storm.hdfs.common.security.AutoHDFS
org.apache.storm.hdfs.common.security.AutoHBase

Then, in your topology, update Config.TOPOLOGY_AUTO_CREDENTIALS with a list of all the credentials it needs access to (in this example just HDFS, but you could simply add HBase to the autoCreds list and it’ll have access to HBase too):

public static void main(String[] args) throws Exception {
    // ...
    Config cfg = new Config();
    List<String> autoCreds = new ArrayList<String>();

    // Use this hierarchy for an older distribution, e.g. HDP 2.2
    autoCreds.add("backtype.storm.security.auth.hadoop.AutoHDFS");
    // This is the current hierarchy
    //autoCreds.add("org.apache.storm.hdfs.common.security.AutoHDFS");
    cfg.put(Config.TOPOLOGY_AUTO_CREDENTIALS, autoCreds);

    // [...other topology and config setup...]

    StormSubmitter.submitTopology(TOPOLOGY_NAME, cfg, builder.createTopology());
}

Then, on your Nimbus server, you need to update your storm.yaml (this example uses the current hierarchy, but you can swap in the old entries if you’re on an older version of Storm):

nimbus.autocredential.plugins.classes: ["org.apache.storm.hdfs.common.security.AutoHDFS", "org.apache.storm.hdfs.common.security.AutoHBase"]

nimbus.credential.renewers.classes: ["org.apache.storm.hdfs.common.security.AutoHDFS", "org.apache.storm.hdfs.common.security.AutoHBase"]

hdfs.keytab.file: "/path/to/keytab/on/nimbus"
hdfs.kerberos.principal: "superuser@EXAMPLE.COM"
nimbus.credential.renewers.freq.secs: 82800

Restart your Nimbus server, submit your topology, and watch it authenticate against secure HDFS without any further Kerberos nightmares! This time, at least. Kerberos is always out there, waiting. Waiting.

Avengers 2 - Getting Too Old For This

Avengers 2, then:

Things I liked:

  • An actual rescue and evacuation of a city area where superpowers came in handy for situations other than hitting things!
  • Grafting Ultron onto Stark makes a good shortcut (having to explain that, no, really, Ant-Man created Ultron in the MU made me look slightly silly)

And things that were somewhat less liked:

  • Stark ruins everybody’s day and then saves the day by doing exactly the same thing (I imagine this will crop up again during Civil War, but it seems somewhat lazy).
  • After making such a big deal about why Pietro and Wanda really hate Stark, there’s no scene with them just having a general chat about how he supplied arms that killed their parents?
  • Look! We just happened to have this big huge [REDACTED] lying around!

Also, watching the Batman vs. Superman trailer, this kept repeating in my head:

Flex Mentallo

But no, Superman must be grim and gritty! Batman must have that damn exo-suit! Aren’t we all just a bit tired of the simple politics of Dark Knight Returns by now? Can’t we move on, like Morrison did back in 1997 in JLA?

Hn. Clark.

The Not-So-Strange Death of Liberal England

Oh well, it only took 100 years for them to come back last time…

And Then There Was One

It’s quiet in the house tonight. I’ve had my family visiting for the past week, and the house has been a bustling hive of activity. Curtains have gone up, the door frames have been painted, fences mended, floors screwed down, power points installed front and back, signs put up, wasps eliminated. And there was also time to sit around the television as a family to watch thirteen hours of graphic violence. Oh, and I also saw Sleater-Kinney, who are indeed still awesome, and Turn Me On is every bit as vital as when I heard it in 1997.

I am old

To add to the already full house, I then invited lots of friends to come and visit for a picnic (which was moved indoors after the to-be-expected rains were forecast). That involved a day of cooking, four different desserts and three more people sleeping in the house (though, granted, two of them were smaller people. Smaller people who were impressively interested in Transformers. More of this!).

Now, my family are somewhere past New England, ready to enter Canada and then head over Greenland and across the Atlantic. To home and waiting cats. The house is quieter, but it feels as if my family has finally had a proper hand in it, leaving their stamp on almost every room, improving things and leaving me with a list of further things to improve. Dad was even impressed with my choice of electric screwdriver.

And so, a quiet evening at the end of a long sunny Sunday. I know I have great friends and family, and even if there’s nobody here right now, somebody will be dropping by before too long.

(also, if you happen to be that person: I HAVE SO MUCH LEFTOVER FOOD AND I WILL MAKE YOU TAKE SOME)

crickets

crickets