Jun 27, 2015 · 1 minute read
ow
no, seriously, why did you walk that far
yes, very good that you won’t be happy until the state is destroyed. good luck with that.
sometimes i do think i’m getting more right-wing with age, but no, it’s just that they’re that obnoxious
A new definition of ‘own worst enemy’: knowing that you are going to have surgery soon to alleviate your bad foot…and then walking 13 miles in two days pretty much by accident.
So, yes, I’m currently lying down and in quite a bit of pain.
The talk at the Red Hat Summit seemed to go down well, though. Even if the wireless connection decided to go down right in the middle of the demo. That was a tense minute or two, but thanks to the rather aggressive polling in the web client, it eventually worked (hurrah!).
(also, apologies to everybody I know in Boston - I didn’t know how much time I would have to myself, and how mobile I’d be, otherwise, I would have sought you all out!)
Remember, everyone: have a good long drink of Scalia’s Tears this weekend, and tune out the obnoxious leftist-radical whining about how Friday’s SCOTUS decision means ‘nothing’.
Jun 22, 2015 · 3 minute read
1.3, 1.4, and above
apache spark
apache kafka
you are trapped in a big data room. to the north, there are five trillion exits
I’m a huge fan of the reappearance of Enterprise Service Buses. They are especially great for Big Data systems and the Lambda Architecture: messages get sent to various different streams on the bus and consumers can read them in a streaming or a batch operation as desired.
(a good introduction to the idea of an Enterprise Service/Message Bus from last year)
Obviously, you wait for a decent Enterprise Service Bus/Data Stream Bus/PubSub/Messaging Log and then many come at once. One of the most popular in recent times is Apache Kafka - developed at LinkedIn to be capable of handling their huge throughput requirements. It’s quickly become a de facto component of many a Spark Streaming or Storm solution.
In the world of Spark, though, Kafka integration has always been a bit of a pain. If you look at this guide to integrating Kafka and Spark, it’s clear that wrangling more than a simple connection to Kafka involves quite a bit of faff, having to union multiple DStreams as they’re coming in from Kafka to increase parallelism. Spark is supposed to be easier to work with than that!
Well, in Spark 1.3, a new interface to Kafka was added. It’s still (in 1.4) marked as ‘experimental’, but I know of several companies who have been using it in production for months, handling billions of messages per day (I imagine that it will be marked as safe in 1.5 if you’re still cautious). And it makes things so much simpler!
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf, Seconds(5))
val kafkaParams = Map("metadata.broker.list" -> "kafka-1:9092,kafka-2:9092,kafka-3:9092")
val topics = Set("example-topic", "another-example-topic")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)
stream.map(_._2) // and then do Spark stuff!
// ...
ssc.start()
ssc.awaitTermination()
This automatically creates a DStream comprised of KafkaRDDs which read in parallel, one Spark partition per Kafka partition. No union required! As a bonus, because Spark handles the offsets that have been read, bypassing ZooKeeper, the new approach gains exactly-once semantics (with the downside that ZooKeeper no longer knows exactly what the Spark Streaming application is doing, which may cause some monitoring issues unless you manually update ZooKeeper from within Spark).
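One caveat worth spelling out: the exactly-once guarantee covers what Spark reads from Kafka. For the results you write out, a replayed batch is only harmless if the write is idempotent (or transactional). Here is a minimal sketch of the idempotent flavour in plain Java, with no Spark or Kafka dependency. The `IdempotentSink` class is a hypothetical in-memory stand-in invented for illustration, not part of any library; the only real idea in it is keying each result by where its message sits in the Kafka log.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory sink; a real one might be a database table keyed the same way. */
final class IdempotentSink {
    private final Map<String, String> rows = new HashMap<String, String>();

    /** Stores a result under a key built from topic, partition and offset. */
    void write(String topic, int partition, long offset, String result) {
        // Writing the same offset twice overwrites the row rather than adding a duplicate.
        rows.put(topic + "/" + partition + "/" + offset, result);
    }

    int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        IdempotentSink sink = new IdempotentSink();
        // First attempt at a batch, then a replay of the same offsets after a simulated failure.
        for (int attempt = 0; attempt < 2; attempt++) {
            for (long offset = 110; offset < 113; offset++) {
                sink.write("example-topic", 0, offset, "processed-" + offset);
            }
        }
        System.out.println("rows stored: " + sink.size());
    }
}
```

Three offsets written twice still leave three rows, which is the property you want from whatever store sits at the end of the stream.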
Also in 1.3 and above - batch access to Kafka! You can create KafkaRDDs and operate on them in the usual way (a boon if you’re working on a Lambda Architecture application).
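import org.apache.spark.streaming.kafka.OffsetRange

val offsetRanges = Array(
  // args are: topic, partitionId, fromOffset (inclusive), untilOffset (exclusive)
  OffsetRange("example-topic", 0, 110, 220)
)
// sc is an existing SparkContext; kafkaParams is the same Map as in the streaming example
val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](sc, kafkaParams, offsetRanges)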
(In batch mode, you currently have to manage reading from all the partitions and the offsets in the Kafka log yourself.)
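Managing the offsets yourself mostly comes down to half-open interval arithmetic. As a rough sketch (plain Java, no Spark or Kafka on the classpath), this is one way to carve a partition’s offsets into chunks for a batch job. The `Range` class and the `split` helper are hypothetical names made up for this example; `Range` only mirrors the topic/partition/fromOffset/untilOffset shape of `OffsetRange` and is not the Spark class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for OffsetRange: fromOffset is inclusive, untilOffset is exclusive. */
final class Range {
    final String topic;
    final int partition;
    final long fromOffset;
    final long untilOffset;

    Range(String topic, int partition, long fromOffset, long untilOffset) {
        if (fromOffset > untilOffset) {
            throw new IllegalArgumentException("fromOffset must not exceed untilOffset");
        }
        this.topic = topic;
        this.partition = partition;
        this.fromOffset = fromOffset;
        this.untilOffset = untilOffset;
    }

    /** Number of messages covered by this range. */
    long count() {
        return untilOffset - fromOffset;
    }
}

final class OffsetRangeSketch {
    /** Splits [from, until) for one partition into consecutive chunks of at most maxMessages. */
    static List<Range> split(String topic, int partition, long from, long until, long maxMessages) {
        if (maxMessages <= 0) {
            throw new IllegalArgumentException("maxMessages must be positive");
        }
        List<Range> ranges = new ArrayList<Range>();
        for (long start = from; start < until; start += maxMessages) {
            // The last chunk is clipped so it never runs past the end of the requested span.
            long end = Math.min(start + maxMessages, until);
            ranges.add(new Range(topic, partition, start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (Range r : split("example-topic", 0, 110L, 220L, 50L)) {
            System.out.println(r.topic + "/" + r.partition
                    + " [" + r.fromOffset + ", " + r.untilOffset + ")");
        }
    }
}
```

Because each chunk ends exactly where the next begins, no message is read twice or skipped, and you would repeat the same split for every partition of the topic.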
Okay, so we can now do parallelism with Spark and Kafka in a much simpler manner…but an important feature of these architectures is writing results back to the bus (e.g., flagging possible fraudulent bids in a real-time auctioning system for further investigation). Unfortunately, baked-in support of this is not scheduled until 1.5 (see SPARK-4122 for more details), so for now, you have to handle the details here yourself - consider a connection pool if you find yourself doing many writes back to Kafka in a short time.
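For the connection pool idea, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: `FakeProducer` is a hypothetical stand-in for a real Kafka producer, `SimplePool` is a deliberately tiny pool rather than a library class, and the `flagged-bids` topic name is made up. In a real job the pool would typically live once per executor JVM and be used from inside the per-partition write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical stand-in for a Kafka producer; it only records what it was asked to send. */
final class FakeProducer {
    final List<String> sent = new ArrayList<String>();

    void send(String topic, String message) {
        sent.add(topic + ":" + message);
    }
}

/** A small bounded pool: connections are created lazily, up to maxSize, and then reused. */
final class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;
    private final int maxSize;
    private final AtomicInteger created = new AtomicInteger();

    SimplePool(int maxSize, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<T>(maxSize);
        this.factory = factory;
        this.maxSize = maxSize;
    }

    /** Runs work with a pooled connection and always hands the connection back afterwards. */
    <R> R withConnection(Function<T, R> work) throws InterruptedException {
        T conn = borrow();
        try {
            return work.apply(conn);
        } finally {
            idle.offer(conn); // cannot overflow: at most maxSize connections ever exist
        }
    }

    private T borrow() throws InterruptedException {
        T conn = idle.poll();
        if (conn != null) {
            return conn;
        }
        // Try to reserve a creation slot; if the pool is full, wait for a returned connection.
        if (created.incrementAndGet() <= maxSize) {
            try {
                return factory.get();
            } catch (RuntimeException e) {
                created.decrementAndGet(); // give the slot back if creation failed
                throw e;
            }
        }
        created.decrementAndGet();
        return idle.take();
    }

    /** Connections created so far (may briefly overshoot while threads race for a slot). */
    int createdCount() {
        return created.get();
    }
}

final class PoolDemo {
    public static void main(String[] args) throws InterruptedException {
        SimplePool<FakeProducer> pool = new SimplePool<FakeProducer>(2, FakeProducer::new);
        for (int i = 0; i < 5; i++) {
            final String message = "suspicious-bid-" + i;
            pool.withConnection(p -> {
                p.send("flagged-bids", message);
                return p.sent.size();
            });
        }
        System.out.println("producers created: " + pool.createdCount());
    }
}
```

The point of the exercise is that the expensive object is built once and reused across writes, instead of being opened and torn down for every record.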
Jun 19, 2015 · 1 minute read
my first operation by fisher-price
but anything else is communism, obviously
no, really
Good news first! I’m going to Boston again next Wednesday, for the Red Hat Summit. We’ll be doing a presentation on financial modelling.
And then the bad news…results from the MRI are in, and an appointment with a surgeon is incoming; my first operation will involve doing things to my left foot. Things that will leave me unable to walk for a while, and with rather impaired mobility for some time beyond that as well. Hurrah for being in a job where remote work is possible!
(though the upcoming surgery did mean I had to pass on a rather fancy posting today; a shame, but I’m sure there’ll be others!)
Other than that, though, quiet week. I did adult things like buy new filters for the house’s air conditioning system and sorted out the various bills that my MRI adventures have cost me so far. Fun times!
Maybe some chocolate-making this weekend…
Jun 7, 2015 · 2 minute read
but anything else is communism, obviously
“Your insurance company has not approved this MRI yet. We can still go ahead with it, but we’ll need you to sign this waiver that holds you liable for the cost if they don’t approve.”
“Er, how much will it be?”
“…”
“Nobody asks that?”
“I’ve seen some for as low as $2,000, and some as high as $11,000.”
deep breath from the British person on the other side
“How about we reschedule until next week and see if they approve it?”
Meanwhile, back in Britain:
“Here for your MRI? This way!”
You can infer from this that I did not have my MRI this week, and thus I still don’t know what’s wrong with my foot, and I’m also laid up in bed after hurting it again last night and then having to drive four hours back from South Carolina on top of that. Still, a fun trip down to SC where I discovered the useful effects of trampoline parks on children (they’re so tired afterwards!), was given a geometric painting as a birthday present (yay triangles!), and practiced my Sichuan Wonton construction skills. Oh, and I saw all of Flash Gordon for the first time. Richard O’Brien was better in Jubilee, I think. As for the rest of it, my goodness, there were some awful films produced in the wake of Star Wars.
And finally, I got promoted! I’m now a Lead Consultant at Mammoth Data. I now consult in a leading way on all the Big Data things! Perhaps.
May 31, 2015 · 1 minute read
okay
okay
okay
stop saying okay, okay
okay
Happily, work ended on a much better note this week than it began.
Not much going on here except work this week…but next week: Ian Gets His First MRI!
May 23, 2015 · 2 minute read
HDP2.2
Storm
HBase
HDFS
but not hive unfortunately
security
kerberos
authentication
hadoop
java
Using Kerberos with Storm is, like most things involving Kerberos, an experience akin to pulling teeth with a pair of tweezers: it hurts and it goes on for a long time. Can you get the keytabs generated and into the right place, and what does that end up meaning for your Storm supervisor nodes? Wouldn’t it be lovely if Storm could simply hand out Hadoop Kerberos credentials to a topology when it is submitted and Everything Just Works™?
Well, if you’re attempting to use HBase or HDFS in your Bolts, then things are looking up for you. You can use the AutoHBase and AutoHDFS classes to do exactly that, and then the only keytab you need worry about is the one on your Nimbus server.
Except
It’s never quite that easy. Mainly, the thing you have to be aware of is this: the class hierarchy of AutoHDFS and AutoHBase has changed in the last few months, so if you’re using a platform like Cloudera, MapR, or HortonWorks, you may find yourself staring at a terminal wondering why on Earth Kerberos isn’t working…and like all things Kerberos, the errors are obtuse and unhelpful.
Anyway, the old hierarchy is:
backtype.storm.security.auth.hadoop.AutoHDFS
backtype.storm.security.auth.hadoop.AutoHBase
and the new locations are:
org.apache.storm.hdfs.common.security.AutoHDFS
org.apache.storm.hbase.security.AutoHBase
Then, in your topology, update the Config.TOPOLOGY_AUTO_CREDENTIALS with a list of all the credentials it needs access to (in this example, just HDFS, but you could simply add HBase into the autoCreds list and it’ll have access to HBase too):
public static void main(String[] args) throws Exception {
    //...
    Config cfg = new Config();
    List<String> autoCreds = new ArrayList<String>();
    // Use this hierarchy for an older distribution, e.g. HDP 2.2
    autoCreds.add("backtype.storm.security.auth.hadoop.AutoHDFS");
    // This is the current hierarchy
    //autoCreds.add("org.apache.storm.hdfs.common.security.AutoHDFS");
    cfg.put(Config.TOPOLOGY_AUTO_CREDENTIALS, autoCreds);
    // [...other topology and config setup...]
    StormSubmitter.submitTopology(TOPOLOGY_NAME, cfg, builder.createTopology());
}
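If you would rather not comment lines in and out per distribution, one option is to probe the classpath for whichever hierarchy is actually present. This is only a sketch under assumptions: the `AutoCredentialNames` helper is a hypothetical class invented here, and it only tells you what the JVM running it can see (typically the machine submitting the topology), so the Nimbus side still has to be configured to match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper that picks the first credential class name present on the classpath. */
final class AutoCredentialNames {
    /** Current location first, then the older one used by e.g. HDP 2.2. */
    static final List<String> AUTO_HDFS_CANDIDATES = Arrays.asList(
            "org.apache.storm.hdfs.common.security.AutoHDFS",
            "backtype.storm.security.auth.hadoop.AutoHDFS");

    static Optional<String> firstAvailable(List<String> candidates) {
        ClassLoader loader = AutoCredentialNames.class.getClassLoader();
        for (String name : candidates) {
            try {
                // initialize=false: only check that the class can be found,
                // without running any of its static initialisers.
                Class.forName(name, false, loader);
                return Optional.of(name);
            } catch (ClassNotFoundException | LinkageError e) {
                // Not usable on this classpath; fall through to the next candidate.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> choice = firstAvailable(AUTO_HDFS_CANDIDATES);
        System.out.println(choice.isPresent()
                ? "Using " + choice.get()
                : "No AutoHDFS class found on this classpath");
    }
}
```

The result could then be added to the autoCreds list in place of a hard-coded string, failing early with a readable message if neither class is there.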
Then, on your Nimbus server, you need to update your storm.yaml (this example uses the current hierarchy, but you can replace the entries with the old ones and it’ll work if you’re on a non-current version of Storm):
nimbus.autocredential.plugins.classes: ["org.apache.storm.hdfs.common.security.AutoHDFS", "org.apache.storm.hbase.security.AutoHBase"]
nimbus.credential.renewers.classes: ["org.apache.storm.hdfs.common.security.AutoHDFS", "org.apache.storm.hbase.security.AutoHBase"]
hdfs.keytab.file: "/path/to/keytab/on/nimbus"
hdfs.kerberos.principal: "superuser@EXAMPLE.com"
nimbus.credential.renewers.freq.secs : 82800
Restart your Nimbus server, submit your topology and watch Secure HDFS be authenticated without any further Kerberos nightmares! This time, at least. Kerberos is always out there, waiting. Waiting.
May 17, 2015 · 1 minute read
though i’d be up for a version of priest’s black panther
Avengers: 2, then:
Things I liked:
- An actual rescue and evacuation of a city area where superpowers came in handy for situations other than hitting things!
- Grafting Ultron onto Stark makes a good shortcut (having to explain that, no, really, Ant-Man created him in the MU made me look slightly silly)
And things that were somewhat less liked:
- Stark ruins everybody’s day and then saves the day by doing exactly the same thing (I imagine this will crop up again during Civil War, but seems somewhat lazy).
- After making such a big deal about why Pietro and Wanda really hate Stark, there’s no scene with them just having a general chat about how he supplied arms that killed their parents?
- Look! We just happened to have this big huge [REDACTED] lying around!
Also, watching the Batman vs. Superman trailer, this kept repeating in my head:
But no, Superman must be grim and gritty! Batman must have that damn exo-suit! Aren’t we just simply tired of the simple politics of Dark Knight Returns by now? Can’t we move on, like Morrison did back in 1997 in JLA?
Hn. Clark.
May 10, 2015 · 1 minute read
sad clegg
now witness the firepower of this fully armed and operational battle station
Oh well, it only took 100 years for them to come back last time…
Apr 26, 2015 · 2 minute read
full house
empty house
all the desserts
It’s quiet in the house tonight. I’ve had my family visiting for the past week, and the house has been a bustling hive of activity. Curtains have gone up, the door frames have been painted, fences mended, floors screwed down, power points installed front and back, signs put up, wasps eliminated. And there was also time to sit around the television as a family to watch thirteen hours of graphic violence. Oh, and I also saw Sleater-Kinney, who are indeed still awesome, and Turn Me On is every bit as vital as when I heard it in 1997.
I am old
To add to the already full house, I then invited lots of friends to come and visit for a picnic (which was moved indoors after the to-be-expected rains were forecast). That involved a day of cooking, four different desserts and three more people sleeping in the house (though, granted, two of them were smaller people. Smaller people who were impressively interested in Transformers. More of this!).
Now, my family are somewhere past New England, ready to enter Canada and head across Greenland over the Atlantic. To home and waiting cats. The house is quieter, but feels as if my family has finally had more involvement with it, leaving their stamp on almost every room in the house, improving things and leaving me with a list of things to improve. Dad was even impressed with my choice of electric screwdriver.
And so, a quiet evening at the end of a long sunny Sunday. I know that I have great friends and family, and even if there’s nobody here right now, there will be somebody dropping by before too long has passed.
(also, if you happen to be that person: I HAVE SO MUCH LEFTOVER FOOD AND I WILL MAKE YOU TAKE SOME)
Apr 12, 2015 · 1 minute read
crickets
crickets