I was late to the party with Isaacson’s Steve Jobs biography. What prompted me to read it was the release of The Innovators. Of course, the story is fascinating, but even more fascinating to me is how fascinating it is to everyone else. What is it about the products and the brand that creates this deep and worldwide interest in Steve Jobs? My interest is primarily an intellectual journey that began more than a year ago – how do you manage innovation? How do you think outside the box, but still keep the trains running on time? At the risk of disappointing you, I haven’t found an answer that would fit in a blog post. What I have done, however, is make a dent in several books, both by reading and via audible.com. I’ve found that I enjoy listening to an opening chapter before bed, and if I get hooked, then I get the print version, and ultimately review it.

I haven’t finished these, nor reviewed them yet, but I’ve read enough of each to summarize my reactions here:

Becoming Steve Jobs: This new book is a bit more fun to read (or listen to) than the Isaacson book, but just as thorough in its own way. The growing consensus is that Jobs reads less like a jerk in this one. Beyond the fact that I’ve had my fill of reviews where this whole “jerk” thing is emphasized, I think it is an oversimplification. Becoming Steve Jobs focuses on the latter half of Jobs’ life more than Isaacson does. I think most of us can relate to becoming a different person in midlife than we were in our 20s. By emphasizing that time period, it is natural that a wiser, more gentle Jobs emerges, at least compared to the Isaacson book. Apparently Tim Cook did not like the official biography very much, which comes through in some quotes from the new book. This interview with the authors seems to do a pretty good job of summarizing the controversy surrounding the first book: Fox Interview: Authors of Becoming Steve Jobs

The New York Times has a pretty thorough review.

Isaacson’s Steve Jobs: Unless you’ve been off the grid, and on sabbatical, you’ve had your chance to check out this book. I listened to the audiobook first, and when I went looking for a print copy I found that there are tons of gently read used hardcovers out there. I think this reflects the worldwide enthusiasm as well as the fact that many of us don’t always read the books that we buy. It is a good book. I didn’t leave the experience of listening to this book thinking any less of him. He had regrets. He was tough on those around him. He wasn’t always likable, but another intellectual hero of mine, Frank Lloyd Wright, exhibited just horrible behaviors at various points in his long life. I don’t praise their behaviors, but their accomplishments are just too interesting, and I want the whole story. The whole story, in the case of the Isaacson book, includes much more on the very young Steve Jobs, as a high school student and during his college years. That whole period, importantly, includes his friendship with Steve Wozniak. I enjoy a peek into the psychological side, so the first part of the book was of great interest to me. If you are more interested in Steve Jobs’ ultimately triumphant return to Apple, then the more recent book seems to cover that ground a bit more thoroughly, in no small part because the Apple team rallied to ‘correct the record’ as they perceived it.

The New York Times reviewed it shortly after it came out, but for an always fascinating perspective check out Gladwell’s review.

Also, a full hour of 60 Minutes was dedicated to Jobs after the book’s release.

You can’t beat Charlie Rose who interviewed Isaacson about the Jobs biography as well as The Innovators.

The Innovators: Am I the only one who was a little disappointed with this one? Isaacson has gotten so much attention from the official biography that all of his books are getting attention, and deservedly so. However, my interest in the inner psychology of the biographical subject leaves me wanting here. There are so many characters in this historical narrative – literally dozens – that no one story gets enough. For instance, the unsatisfying scraps and tidbits on Turing prompted me to leave The Innovators sitting on the end table while I rushed to my laptop to get a better Turing biography. Obviously, his intent was not depth on each subject, but again and again I found myself looking for more. Andrew Hodges’ Alan Turing: The Enigma, which was the basis for the recent film The Imitation Game, was much more to my liking. My disappointment with The Innovators actually prompted me to read Isaacson’s Steve Jobs on the presumption that his many fans cannot be wrong, and, sure enough, I preferred the full biography to the short overview of the Apple story in The Innovators.

I recommend this interview. In it Isaacson acknowledges that he didn’t think he could get away with a longer book. I think that he suspects that a longer book could have been justified, but that his readership wouldn’t be patient with it. I wish he had gone the route of a longer book. I think that folks can handle an 800-page book if the subject matter is compelling.

The version 23 edition of the Programming and Data Management book for IBM SPSS Statistics has just been released. I’m excited that this great resource has stayed up to date, and so soon after the release of version 23.

IBM developerWorks summarizes the content well: The book covers data management using the IBM SPSS Statistics command language, programming with IBM SPSS Statistics and Python or R, IBM SPSS Statistics extension commands, and IBM SPSS Statistics for SAS programmers.

Importantly, there are a ton of supporting materials, including all of the source code, and it can all be found here: Programming and Data Management book

The original author of this valuable book is Raynald Levesque, although I think a number of IBMers help in keeping it up to date. His site has recently been updated with all kinds of new content. I happen to believe that unless you’ve been using macros for decades, you shouldn’t be investing much time in them. You really should be learning Python. With that caution, please do visit his wonderful site: spsstools.

IBM has released version 23 of SPSS Statistics. I got to see a sneak peek at last year’s IBM Insight conference, so I have been excited for the release. Also, I am writing two books using version 23 to be released this year. Spatio-Temporal analysis is a big part of this year’s release theme. In Modeler’s release notes they are also referring to “Geospatial Analytics”. The basic idea is to look at a specific defined space (zip code, county, grid zone, etc.) through the lens of slices of time. You might learn interesting things, such as that car break-ins occur overnight, but that suburban home break-ins occur during the workday. The new techniques have names like Temporal Causal Modeling and Spatio-Temporal Prediction. This is not just a single algorithm, in other words. This is a whole new analysis approach category.
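To make the "space through slices of time" idea concrete, here is a minimal sketch in plain Python. It is not Temporal Causal Modeling or Spatio-Temporal Prediction, and it is not how SPSS implements anything; it only shows the counting step of bucketing events by zone and time slice. The zone names, incident records, and slice boundaries are all invented for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical incident records: (zone, incident type, timestamp).
# These values are made up purely to illustrate the idea.
incidents = [
    ("zone_A", "car break-in", "2015-03-02 02:15"),
    ("zone_A", "car break-in", "2015-03-03 23:40"),
    ("zone_B", "home break-in", "2015-03-02 11:05"),
    ("zone_B", "home break-in", "2015-03-04 14:30"),
    ("zone_B", "car break-in", "2015-03-05 19:10"),
]


def time_slice(timestamp):
    """Map a timestamp to a coarse time slice (boundaries are an assumption)."""
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour
    if hour >= 22 or hour < 6:
        return "overnight"
    if 9 <= hour < 17:
        return "workday"
    return "other"


def count_by_zone_and_slice(records):
    """Count incidents for each (zone, incident type, time slice) cell."""
    counts = Counter()
    for zone, kind, timestamp in records:
        counts[(zone, kind, time_slice(timestamp))] += 1
    return counts


if __name__ == "__main__":
    for cell, n in sorted(count_by_zone_and_slice(incidents).items()):
        print(cell, n)
```

Once the data is laid out in cells like these, patterns such as "overnight car break-ins in one zone, workday home break-ins in another" become something you can tabulate and model rather than just suspect.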

An SPSS Statistics feature that is pretty exciting is expanded reporting options, including the new Web Report. A Version 22 feature that I think has gotten too little attention is “Style Output”, which many of us have written scripts to do. It is in the menus now, and has been for a while.

I won’t attempt to review the releases here, but will try to do so within the next week or so.

The SPSS Statistics trial can be found here. Info on the new features in Version 23 (and other recent versions) is also on the IBM website.

Additional exciting news is that SPSS Modeler also has a trial download page, and info on the upgraded features.

My Amazon page announcing the upcoming books discusses them in more detail.

I’m leaving for Las Vegas tonight for IBM Insight. It is one of the best chances all year to meet with the legacy SPSS Inc. folks at IBM, and learn about what is headed our way in 2015 in SPSS. Also, of course, there is always a lot of IBM news. The subtitle this year is “The Conference for Big Data and Analytics”. I’m not a fan of the phrase “Big Data” as everybody who uses the term uses it differently, but I am very interested in IBM BigInsights and what it might mean for those of us who use Modeler. There are sessions about SPSS Statistics 23. There is no release announced, but I would imagine that this is an early sign that a new version is coming soon. There are no sessions for Modeler 17, so the lack of sessions might also be a hint as to where we are in the development cycle. Watson is going to have a real presence at the show.

I will be at the conference bookstore at noon on Monday to sign copies of the IBM SPSS Modeler Cookbook.

Kevin Spacey will be on hand. I love House of Cards, and I’m a fan, so that will be fun. I’m curious to know what the topic will be and if they are going to try to tie his presentation into a conference theme. That might be a stretch, but he is such a pro that I’m sure it will be a good talk. No Doubt will be performing. I’m usually wiped out after a day of sessions – I go to something in almost every time slot – so I’m not sure that I will partake. They definitely try to put on a good show, though.

IBM has really formalized the process of watching from home. They call it InsightGO.

I will try to Tweet a few times a day.

PACKT has just posted a sampling of four recipes that I curated from the entire book. I think they are a fun sampling. Here I’ve written a little bit about my rationale for choosing the recipes that I did. Enjoy.

From Chapter Two, Data Preparation: Select, I’ve chosen Using the Feature Selection node creatively to remove, or decapitate, perfect predictors, to illustrate this. It happens to be one of mine. It is not difficult, but it uses a key feature in an unexpected way.

From Chapter 6, Selecting and Building a Model, Next-best-offer for large data sets is our representative of the “pushing the limits” category. Most of the documentation of this subject uses a different approach that, while workable on smaller data sets, is not scalable. We were fortunate to have Scott Mutchler contribute this recipe in addition to his fine work in Chapter 8.

From Chapter Seven, Modeling – Assessment, Evaluation, Deployment, and Monitoring, Correcting a confusion matrix for an imbalanced target variable by incorporating priors, by Dean Abbott, is a great example of the unexpected. The Balance Node is not the only way to deal with an out of balance target. Also from Chapter Seven, I’ve chosen Combining generated filters. This short recipe definitely invokes that reaction of “I didn’t know you could do that!” It was provided by Tom Khabaza.
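For readers who have not met the idea behind the confusion matrix recipe, here is a rough sketch of one common way to bring priors back in. This is my own illustration in Python, not the recipe from the book and not anything Modeler does internally. It assumes the model was scored on a sample that was artificially balanced, and that you know the true class proportions. The function name, the class labels, and all of the counts are hypothetical.

```python
def reweight_confusion(cm, true_priors):
    """Rescale each actual-class row of a confusion matrix to the true priors.

    cm maps actual class -> {predicted class -> count} from a rebalanced sample.
    true_priors maps actual class -> its proportion in the real population.
    The overall number of cases is preserved.
    """
    total = sum(sum(row.values()) for row in cm.values())
    corrected = {}
    for actual, row in cm.items():
        row_total = sum(row.values())
        if row_total == 0:
            # No cases of this class in the sample, so nothing to rescale.
            corrected[actual] = dict(row)
            continue
        sample_prior = row_total / total
        weight = true_priors[actual] / sample_prior
        corrected[actual] = {pred: n * weight for pred, n in row.items()}
    return corrected


def precision(cm, positive):
    """Share of cases predicted as `positive` that really are `positive`."""
    predicted_positive = sum(row.get(positive, 0) for row in cm.values())
    return cm[positive][positive] / predicted_positive


if __name__ == "__main__":
    # Invented counts from a 50/50 balanced sample of 1,000 cases.
    balanced = {
        "churn": {"churn": 400, "stay": 100},
        "stay": {"churn": 150, "stay": 350},
    }
    # Assumed real-world mix: 5% churners, 95% stayers.
    corrected = reweight_confusion(balanced, {"churn": 0.05, "stay": 0.95})
    print(round(precision(balanced, "churn"), 3))
    print(round(precision(corrected, "churn"), 3))
```

The point of the exercise is that precision measured on the balanced sample looks far rosier than precision at the real-world class mix, which is why the correction matters before anyone promises results to the business.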

Depending on how you count, it has been a two year process. It was that long ago that I asked Tom Khabaza if he would consider taking on the challenge of an Introductory Guide to SPSS Modeler (aka Clementine). We had a number of spirited discussions via Skype and a flurry of email exchanges, but we were both so busy that we barely had time to have planning meetings much less write in between them. I don’t know if our busy consulting schedules made us the best candidates or the worst candidates for undertaking the first 3rd party book on the subject.

Some weeks after starting our quest, I got a LinkedIn message from an acquisitions editor at PACKT – a well established publisher of technical books in the UK. She wondered if I would consider writing a book about Modeler. I replied the same day. In fact, I replied in minutes because I had been working online. She had a very different idea for a book, however. She recommended a ‘Cookbook’: a large number of small problem solving ‘recipes’. Tom and I felt there was still a need for an Introductory book. (Look for it in Q1 of 2014.) Nonetheless we were intrigued. Encouraged by the publisher we got to work again, but in a different direction. Believing, naively, that more authors made it easier, I recruited one more, then two more, and then, eventually, a third additional author. I can now tell you that five authors do not make it easier. However, they do make it better. I am very proud of the results.

We cover a wide variety of topics, but all the recipes have a focus on the practical step by step application of ‘tricks’ or non-obvious solutions to common problems. From the Preface: “Business Understanding, while critical, is not conducive to a recipe based format. It is such an important topic, however, that it is covered in a prose appendix. Data Preparation receives the most attention with 4 chapters. Modeling is covered, in depth, in its own chapter. Since Evaluation and Deployment often use Modeler in combination with other tools, we include somewhat fewer recipes, but that does not diminish its importance. The final chapter, Modeler Scripting, is not named after CRISP-DM phase or task, but is included at the end because its recipes are the most advanced.”

Perhaps our book is a bit more philosophical than most analysis or coding books. Certainly, the recipes are 90% of the material, but we absolutely insisted on the Business Understanding section: “Business objectives are the origin of every data mining solution. This may seem obvious, for how can there be a solution without an objective? Yet this statement defines the field of data mining; everything we do in data mining is informed by, and oriented towards, an objective in the business or domain in which we are operating. For this reason, defining the business objectives for a data mining project is the key first step from which everything else follows.” Weighing in at 20 pages, it is a substantial addition to a substantial eight chapter book with dozens of recipes including multiple data sets, and accompanying Modeler streams.

I am also terribly proud of my coauthors. We have a kind of mutual admiration society going. I am pleased that they agreed to coauthor with me. They, I suspect, were glad that they didn’t have to play the administration role that I ended up with. In the end, we produced a project where each one of us learned a great deal from the others. Our final ‘coauthor’, Colin Shearer, was kind enough to write a Foreword for us: “The first lines of code for Clementine were written on New Years Eve 1992, at my parents’ house, on a DEC Station 3100 I’d taken home for the holidays.”

Colin has been a part of the story of Modeler from the very beginning, so we were terribly pleased to have him support us in this effort. All 6 of us have run into each other repeatedly over the years. The worldwide Modeler community was a very small one 15 years ago when most of us were learning Modeler. (Tom has a bit of a lead on the rest of us.) With IBM’s acquisition of SPSS Inc. some years ago, the community has rapidly grown. From the Foreword: “The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art.”

The book is being released in November, just a few weeks away. More information on the book, including a prerelease purchase opportunity, can be found on the PACKT website.

More information on the authors can be found here:

Scott Mutchler and I are the managers of the Advanced Analytics Team at QueBIT.

Dean Abbott is President of Abbott Analytics.

Meta Brown blogs at MetaBrown.com

More information about Tom Khabaza can be found at Khabaza.com

KDD 2013

After many years of trying to align my calendar and travel schedule I have finally made it. I am at KDD 2013 in Chicago.

As I have always feared, it is very academic in nature – lots of graduate student papers and the like. There is not a whole lot of focus on application here. Nonetheless I think it is important to monitor what our friends in the Computer Sciences are up to. So far I have been to a Big Data Camp and a workshop focusing on Healthcare. I have been constantly reminded of the vast gap between my clients – software end users – and the academic researchers. The distance between them is matched by the gap between the software users and their colleagues. Colleagues who don’t care terribly much about the software, but must understand the solution. I feel like a fragile bridge between these very different worlds. I won’t be able to justify coming every year, but I needed to experience this first hand.



My calendar has finally allowed me to attend the Ohio State Center for Public Health Practice's summer program. They had one full length weekend course. I just completed the first day of David Hosmer's Survival Analysis class. The class follows the content of his text (coauthored with Stanley Lemeshow and Susanne May).

The class is a bit intense, to be honest, at more than 200 slides per day and clocking in at almost 8 hours of content. There are breaks, of course, but class started at 8:30 and ended at just a few minutes to 5. Since most of those reading my blog would be in industry and coming off a full work week, be forewarned. Having issued the warning, however, I learned a great deal. I've taught chapter length treatments of this subject in SPSS Inc's old three day Advanced Stats class. That 90 minutes of material clearly had to leave plenty of detail out. Even at a full two days, Dr. Hosmer has to leave plenty of material out of the discussion. Some of the highlights of the experience included learning more about options in Stata and SAS, and when not to trust defaults – topics that just didn't fit my presentation on the subject.

I expect to post again when I've had a chance to reduce some of my lessons learned to writing. In the meantime make a note to check out the 2014 program! It is held around this time of year each year.

IBM has just released a new SPSS brand product. I have numerous friends in the SPSS community, and I have been a frequent beta tester, but I didn't know in advance about this release. It does resemble something that I saw demonstrated at last year's IOD. What to make of this product? It is web based, and looks pretty slick: Analytic Catalyst. There is also a video on YouTube. I like the visuals, and I agree that it looks easy to use. I'm anxious to try it, and might recommend it in certain client situations.

Never forgetting that the lion's share of a Data Mining project's labor is spent on Data Prep, and since I've never been on a project that didn't need Data Prep, I think that a tool like this is most useful after a successful Data Mining project is complete. For instance, I worked like crazy on a recent churn project, but after the project the marketing manager had to explore high churn segments to come up with intervention strategies. This could be used for that purpose. Or perhaps 'repurposed' for that since the video seems to indicate that it would be used in the early stages of a project.

My reaction, not a concern exactly, is the premise. It seems to assume that the problem is business users tapping directly into Big Data to explore it, searching for 'insight'. I don't think most organizations need more insight. I think they need more deployed solutions – solutions that have been validated that are inserted into the day to day running of the business. My two cents.

As of today, I have joined QueBIT Consulting as VP and General Manager of the Advanced Analytics team. I will have the exciting task of building a world class team of SPSS Experts. Joining the team with me will be Scott Mutchler of Big Sky Analytics.

Here is today's press release.