Depending on how you count, it has been a two-year process. It was that long ago that I asked Tom Khabaza if he would consider taking on the challenge of an Introductory Guide to SPSS Modeler (aka Clementine). We had a number of spirited discussions via Skype and a flurry of email exchanges, but we were both so busy that we barely had time for planning meetings, much less to write in between them. I don’t know if our busy consulting schedules made us the best candidates or the worst candidates for undertaking the first third-party book on the subject.
Some weeks after starting our quest, I got a LinkedIn message from an acquisitions editor at PACKT, a well-established publisher of technical books in the UK. She wondered if I would consider writing a book about Modeler. I replied the same day. In fact, I replied within minutes because I happened to be working online. She had a very different idea for a book, however. She recommended a ‘Cookbook’: a large number of small, problem-solving ‘recipes’. Tom and I felt there was still a need for an introductory book. (Look for it in Q1 of 2014.) Nonetheless, we were intrigued. Encouraged by the publisher, we got to work again, but in a different direction. Believing, naively, that more authors would make it easier, I recruited one more, then a second, and eventually a third additional author. I can now tell you that five authors do not make it easier. However, they do make it better. I am very proud of the results.
We cover a wide variety of topics, but all the recipes focus on the practical, step-by-step application of ‘tricks’ or non-obvious solutions to common problems. From the Preface: “Business Understanding, while critical, is not conducive to a recipe-based format. It is such an important topic, however, that it is covered in a prose appendix. Data Preparation receives the most attention with 4 chapters. Modeling is covered, in depth, in its own chapter. Since Evaluation and Deployment often use Modeler in combination with other tools, we include somewhat fewer recipes, but that does not diminish their importance. The final chapter, Modeler Scripting, is not named after a CRISP-DM phase or task, but is included at the end because its recipes are the most advanced.”
Perhaps our book is a bit more philosophical than most analysis or coding books. Certainly, the recipes are 90% of the material, but we absolutely insisted on the Business Understanding section: “Business objectives are the origin of every data mining solution. This may seem obvious, for how can there be a solution without an objective? Yet this statement defines the field of data mining; everything we do in data mining is informed by, and oriented towards, an objective in the business or domain in which we are operating. For this reason, defining the business objectives for a data mining project is the key first step from which everything else follows.” Weighing in at 20 pages, it is a substantial addition to a substantial eight-chapter book with dozens of recipes, multiple data sets, and accompanying Modeler streams.
I am also terribly proud of my coauthors. We have a kind of mutual admiration society going. I am pleased that they agreed to coauthor with me. They, I suspect, were glad that they didn’t have to play the administrative role that I ended up with. In the end, we produced a project where each one of us learned a great deal from the others. Our final ‘coauthor’, Colin Shearer, was kind enough to write a Foreword for us: “The first lines of code for Clementine were written on New Year’s Eve 1992, at my parents’ house, on a DECstation 3100 I’d taken home for the holidays.”
Colin has been a part of the story of Modeler from the very beginning, so we were terribly pleased to have him support us in this effort. All six of us have run into each other repeatedly over the years. The worldwide Modeler community was a very small one 15 years ago, when most of us were learning Modeler. (Tom has a bit of a lead on the rest of us.) Since IBM’s acquisition of SPSS Inc. some years ago, the community has grown rapidly. From the Foreword: “The authors of this book are among the very best of these exponents, gurus who, in their brilliant and imaginative use of the tool, have pushed back the boundaries of applied analytics. By reading this book, you are learning from practitioners who have helped define the state of the art.”
The book is being released in November, just a few weeks away. More information on the book, including a prerelease purchase opportunity, can be found on the PACKT website.
More information on the authors can be found here:
Scott Mutchler and I are the managers of the Advanced Analytics Team at QueBIT.
Dean Abbott is President of Abbott Analytics.
Meta Brown blogs at MetaBrown.com
More information about Tom Khabaza can be found at Khabaza.com