Want to get certified in Data Mining?

Options for formal Data Mining training in university settings are exploding. Just during the week that I have been collecting resources and putting this together, new programs have been announced. Some years ago I looked into what was then a very thin selection of university-based Data Mining programs. I took one online class and was unimpressed, so I gave up my search. As the years passed and my portfolio of real-world projects grew, a formal credential seemed to make less and less sense for me. You won’t find too many folks who have been doing this for more than 10 years and hold a formal degree in Data Mining, because the movement to certify and credential Data Miners is relatively new. I welcome it. We need more Data Miners. The selection is much broader now, and some of the offerings seem very promising. Here are some questions to ask if you are considering it.

Is the program really teaching Data Mining? How do they define it?

Here is my definition, and here is Meta Brown's great discussion of Tom Khabaza's nine laws.

Do I want software certification, a university certificate, or a Master’s?

Both SPSS and SAS have software certifications. Neither certification process will be sufficient to make you a competent data miner, but the investment in time and money is modest compared to a Master’s. In both cases, it is an investment of a couple of hundred dollars for the exam, plus the cost of some corporate training classes. I am quite familiar with the training options for the SPSS certification exam. The classes are good, but focus mostly on the "point and click" aspect, which is what the exam covers as well. The SAS certification exam options seem similar. IBM SPSS Modeler doesn't seem to have a strong self-study option. SAS does seem to have done a better job with relatively inexpensive self-study material dedicated to exam prep.

Monster shows high demand for proficiency in both SPSS and SAS in general, so the IBM SPSS Modeler and SAS Enterprise Miner exams would seem to be good bets. See Bob Muenchen's discussion of "Software Popularity" for an analysis of this demand. Two recent Spotfire posts have addressed the same topic: Data Geek shortage and Data Geeks are "Hotter than Hot". It is worth noting that Meta Brown thinks that the problem isn't a shortage, but that recruiters aren't finding qualified analysts.

Some universities have added Data Mining "certificates". Stanford’s offering looks impressive, and it involves 3 courses. Central Connecticut State University was one of the first to offer a Master’s, and its Data Mining certificate program is a 5-course, 18-credit-hour program. KDnuggets maintains a list. A certificate obviously costs less than a Master’s, but it is too early to know how human resources departments and recruiters will respond to these. I think it is an interesting option for someone who already has a Stats degree but wants to differentiate themselves. Is it worth as much as $10,000 to do that? Could you assist someone on a real-world project instead?

A Master’s program, and there are now dozens, is going to be approximately 33 credit hours (11 classes), so it is obviously a greater commitment. KDnuggets maintains a list of all the university options. University of Tennessee at Knoxville just announced an Analytics Master’s combined with an MBA. Northwestern's Master’s in Predictive Analytics was announced earlier this year, and it is just about to start its first classes.

Questions to ask in evaluating a program:

1) What software will be used?

Last year IBM announced the creation of a partnership with DePaul. I can't find anything on the site that explicitly mentions IBM SPSS Modeler, but it is bound to be an IBM SPSS Modeler friendly place. SAS has a strong affiliation with the Institute for Advanced Analytics at North Carolina State University, which awards a Master’s. There is a similarly affiliated certificate program at Oklahoma State. It is unlikely that use of anything other than SAS would be allowed on assignments, because the courses are explicitly designed to teach SAS for university credit. I think it might be a good thing to be forced to learn another software package, thereby earning credit and a tangible skill. On the other hand, at midnight, with a deadline drawing close on a capstone project, you might regret that decision if the class denied you access to what you already know well. Northwestern's Master of Science in Predictive Analytics, which has a capstone project, explicitly allows the use of SPSS, SAS, or R. Students are expected to learn all three, but can use any of the three for their capstone project.

2) Who will be teaching you?

Here, I must admit, I get a bit skeptical. Are there enough university faculty with actual field experience in Data Mining? There are some, to be sure. Reviewing the CVs of the faculty, most of these programs have great faculty in their fields. But that is the catch. Are they competent in Stats AND Data Mining? Are they competent in Data Warehousing AND Data Mining? Am I being a bit unfair? Well, a Master’s program might run you $30,000 – $40,000. Are you just trying to impress your future employer, or do you really want to master your craft? There are lots of programs out there now. Be prepared to ask the tough questions. If you are required to take an online "Stats 101", it might be from an adjunct statistician who may or may not be a Data Miner. I am almost certain that the best of the programs will have some faculty who are Data Miners. Some of them, in fact, are pretty impressive. Why should a Stats professor in a Data Mining program be required to have done Data Mining in a corporate setting? For experienced Data Miners, the question almost answers itself. After all, you always have the option of a Stats Master’s with a Data Mining course or two. Frankly, on this note, in reviewing faculty backgrounds, I don't think any of the other programs can compete with Stanford. The faculty in its three-course certificate program truly are Data Miners.

3) Do you really need credit? Do you really need the degree?

These programs are popular precisely because earning a degree can increase your attractiveness to employers. If you can learn a marketable skill at the same time, all the better. However, what if you are already established in a related field? Maybe you already have a Master’s. You might want to pursue just the skill. It is cheaper and quicker, but you don't get the degree. It is a big decision because some employers might favor a candidate with a Master’s. If you can live without a Master’s in Data Mining, then there are lots of corporate options. SPSS and SAS have their aforementioned corporate training options. Competitors like Salford Systems and Statsoft's Statistica have training programs. There are also tool-neutral training vendors like Predictive Analytics World, Statistics.com, or The Modeling Agency. There is a big difference between being a customer and being a student. When you are a customer, the old adage that "the customer is always right" kicks in. If you have a bad experience, you might be able to retake a class or work out a complaint in another way. Years ago, when I gave a university class a try, I had a bad experience. The professor did not make himself available for questions and was very slow to provide feedback. It was very difficult to pursue a complaint. There was literally no system in place. It was clear that in that particular venue the philosophy was "the professor is always right".

4) What will be the quality of the online experience?

Some asynchronous online classes might be nothing more than assigned reading with assignments. When CCSU first produced its Data Mining programs, it was in this format, and it seems that it still is. A video does not assure a good experience, but if it is going to be just readings, you will want to make sure the experience will work for you. What will your colleagues in class be like? Will you be interacting with them in meaningful ways? Most programs have video presentations now. It is remarkable to me that some of the sample lectures online are not very good. The lectures themselves are usually competent, but some are uninspiring, and many are very poorly produced. The sound can be poor, and the professors often walk out of the frame. When students in the room participate, they come across as mysterious invisible voices. There is lots of competition now, so you should be a critical consumer. I would ask enough questions to be certain that you are going to get: good lecture material in some form, meaningful assignments, rapid quality feedback, thoughtful exams, good customer service, software support, and job placement.

5) Is money a factor?

Money is probably always a factor. On the cheapest end of the scale are: a one-day workshop at a conference, a single online class, or self-study for an exam like SAS's. An option like this is going to be less than $1,000. That might get someone's attention on LinkedIn, but it probably won't be enough to be really competent. I have taught this kind of material to hundreds and hundreds of folks. I don't think one class does it. I had a good experience with a one-day R workshop at Predictive Analytics World, but I brought more than a decade of experience to that workshop. I just didn't know R. It was fun, but I certainly didn't master R in a day. Having said that, I think you can learn a lot in the equivalent of a couple of weeks of study, especially if you already work in a related field. So something like the SPSS or SAS class series leading up to their exams might work. If you take the classes publicly, you are looking at a few thousand dollars. The university options vary widely. University of California San Diego's Data Mining Certificate charges $625 per course for each of about 6 courses (20 credit hours). In contrast, at Stanford, you are looking at $11,000 for three classes. A Master’s program is going to be tens of thousands, certainly, but will vary widely in cost. Also, a busy professional is probably looking at as much as 5 years to get the Master’s.
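To make the comparison concrete, here is a rough sketch of the arithmetic in Python, using only the figures quoted in this post. The numbers are approximate (the UCSD course count is "about 6", and the Master's range of $30,000 to $40,000 over 11 classes is the ballpark mentioned earlier), and the names `programs` and `cost_per_course` are just illustrative. Treat it as a back-of-the-envelope calculation, not current pricing.

```python
# Rough per-course cost comparison using figures quoted in this post.
# All numbers are approximate and are assumptions for illustration only.

# Each entry: (low total cost, high total cost, number of courses)
programs = {
    "UCSD certificate": (625 * 6, 625 * 6, 6),       # $625 per course, about 6 courses
    "Stanford certificate": (11_000, 11_000, 3),     # $11,000 for three classes
    "Typical Master's": (30_000, 40_000, 11),        # ballpark range, about 11 classes
}


def cost_per_course(total: float, courses: int) -> float:
    """Return the average cost of one course in a program."""
    return total / courses


for name, (low, high, courses) in programs.items():
    low_each = cost_per_course(low, courses)
    high_each = cost_per_course(high, courses)
    if low == high:
        print(f"{name}: ${low:,} total, about ${low_each:,.0f} per course")
    else:
        print(
            f"{name}: ${low:,} to ${high:,} total, "
            f"about ${low_each:,.0f} to ${high_each:,.0f} per course"
        )
```

On these figures the UCSD certificate comes to $3,750 in total, roughly what a single Stanford course costs, which is why the per-course view is worth a minute before you compare total sticker prices.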