The Data Scientist’s somewhat surprising role as Honest Broker and Change Agent

They say that you can’t be a prophet in your own land. As someone who is typically an outsider in the organizations where I consult, I find this to be true. I find that building a model is rarely more than 10-20% of the time I spend in front of the laptop, and fully a third of my time is not spent in front of a laptop at all. This is an explanation of what I find myself doing in all of those many hours that I am not using Data Mining software, or any software. What else is there to do?

Inspire Calm: I am often greeted with the admission that my new client’s Data Warehouse is not quite as complete, nor quite as sophisticated as they would like. No one’s is! It is interesting that it is one of the first facts that is shared with me because it implies that if only they had the perfect Data Warehouse that the Data Mining project would be easy. Well, they are never easy. Important work is hard work, and no one really has a perfect Data Warehouse because IT has a hard job to do as well. So, the experienced Data Miner is in a good position to explain that the client really isn’t so far behind everyone else.

Advocate for the Analysis Team’s time within their department:  Yes, this is a full time endeavor! It is surprising how often Data Mining is confused with ad hoc queries like “How many of X did we sell in Q1 in Region A?” I am not sure where this comes from, but new Data Miners are left wondering how they can perform all six stages of CRISP-DM in time for next Tuesday’s meeting. By the time an external consulting resource is involved this confusion is largely cleared up, but sometimes a little bit of it lingers. How can the internal members perform all of their ongoing functions, and commit to a full time multi-week effort? Of course, they can’t. A bit of realism often sinks in during the first week of a project. Much better addressed earlier than later.

Inspire loftier goals:  Data Preparation is said to take 70-90% of the effort. Tom Khabaza has claimed in the 3rd of his 9 Laws of Data Mining that it always takes more that 50%. I have experienced little to convince me that this estimate is far off. The ‘let’s do something preliminary’ idea can be inefficient if you aren’t careful because on a daily basis one is making decisions about how the inputs interact. Refreshing the model on more recent data is straightforward, but if you substantively change the recipe of the variable gumbo that you are mining, you have to repeat a lot of work, and revisit a lot of decisions. It is possible, with careful planning, to minimize the impact, but you risk increasing (albeit not doubling) the data preparation time. It is ultimately best to communicate the importance of the endeavor, knock on doors, marshal resources, and do the most complete job you can right now.

Act as a liaison with IT:  An almost universal truth is that IT has been warned that the Data Miner needs their data, but IT has not been warned that the Data Miner needs their time and attention. Of course, no one wants to be a burden to another team, but some additional burden is inevitable. The analyst about to embark on a Data Mining project is going to have unanswered questions or unfulfilled needs that are going to require the IT team. The external Data Mining resource will often to have to explain to IT management that there is no way to completely eliminate this; that it is natural, and it is not the analysis team’s fault. Concurrent with that, the veteran Data Miner has to anticipate when the extra burden will occur, act to mitigate it, and try to schedule it as conveniently as possible.

Fight for project support (and data) from other departments: Certain players in the organization are expecting to be involved, like IT. Often the word has to get out that a successful Data Mining project is a top to bottom search for relevant data. Some will be surprised that it is a stone in their department that has been left unturned. They may not be pleased. Excited as they may be about the benefit that the entire company will derive, you are catching them at inopportune moment as they leave for vacation, or as a critical deadline looms. Fair warning is always wise, and it should come early. Done properly, the key player in a highly visible project gets a little (not a lot of) political capital which they should spend carefully.

Help get everyone thinking about Deployment and ROI from the start: Far too often it is assumed that the analysts are in charge of the “insights”, and the management team, having received the magic power point slides will pick it up from there, and ride the insights all the way to deployment and ROI. Has this ever happened? The Data Miner must coach, albeit gently, that a better plan must be in place, and the better planning must begin the very first week of a data mining project. Let executives play their critical role, but a little coaching is good for everyone. After all, it might be everyone’s first Data Mining project.

Fade into the background:   Everyone wants credit for their hard work, but the wise Data Miner lets the project advocates and internal customers do all the talking at the valedictory meeting.  The best place to be is on hand, but quiet. Frankly, if the Data Miner is still shoulder deep in the project, the project isn’t ready for a celebration. The “final” meeting, probably the first of many final meetings should be about passing the torch, reporting initial (or estimated) ROI, and announcing deployment details.

<This first appeared on the SACG blog, the LinkedIN group that I help moderate.>