There are hundreds of software options in analytics and more are developed every year. Analytics applications can be reasonably organized into four categories:
- Open-source languages (R, Python)
- Commercial workbenches (SAS Enterprise Miner, IBM SPSS, and many others)
- Auto-ML (DataRobot, H2O.ai’s Driverless AI, and many others)
- Open-source workbenches (KNIME and many others)
R and Python have dominated the conversation in recent years, but they are not the only choices. A full-time data scientist almost certainly needs some familiarity with both, but if you lack programming experience, they can seem daunting. Moreover, it’s not clear that they are the best choice for someone who works with analytics only part-time. This course does not require knowledge of either one, but if you do choose to learn some R or Python programming, it’s best to start with an appropriate editing environment, such as RStudio, that can help you become acclimated. It’s rarely necessary to start with a blank computer screen!
In the 1990s, when predictive analytics and machine learning software began to take off, two options were dominant: SAS and SPSS (now IBM SPSS). This history is worth knowing because these two products influenced the design of everything that followed. A whole generation of machine learning experts, now in mid-career, trained during this period. Although these tools are powerful, they are also expensive, and they have been losing ground to open-source options for years.
A variety of newer software options can be used both by business analysts with minimal training and by data science experts. Less-experienced users can rely on the software to select settings and options automatically, while more experienced users can fine-tune the parameters directly. This is not unlike a camera with autofocus: both rookies and experts can use it, but experts can turn autofocus off and apply manual settings for finer control over their photos.
Auto-ML, as this approach is called, becomes somewhat controversial when the software makes choices that are opaque or irreversible (or both). These technologies are evolving rapidly and will likely grow in popularity. However, knowledgeable human oversight remains desirable until the software becomes more sophisticated and reliable. Any tool that saves time is helpful, as long as the results are transparent and validated. Many tools in the Auto-ML toolkit already meet these criteria, but complete start-to-finish automation that requires no human oversight has not yet been achieved.
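For readers who are curious what the autofocus idea looks like in code, here is a minimal Python sketch. It is not how any particular Auto-ML product works. The function names (`knn_predict`, `loo_accuracy`, `fit`), the toy data, and the candidate settings are all invented for illustration. The sketch assumes a very simple nearest-neighbor model with one setting, `k`: if the user leaves `k` unset, the code tries a few candidates and keeps the one that scores best; if the user supplies `k`, the code uses it as given. Either way it returns a report of what was chosen and how it scored, which is the kind of transparency discussed above.

```python
from collections import Counter
import math

def knn_predict(train, point, k):
    """Predict a label by majority vote among the k nearest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy: predict each row using all the other rows."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, features, k) == label
    return hits / len(data)

def fit(data, k=None, candidates=(1, 3, 5)):
    """Choose k automatically when k is None; otherwise honor the user's k.

    Returns a report so the choice is visible rather than hidden.
    """
    if k is None:
        # "Autofocus" mode: score each candidate and keep the best one.
        scores = {c: loo_accuracy(data, c) for c in candidates}
        best = max(scores, key=scores.get)
        return {"k": best, "mode": "auto", "scores": scores}
    # "Manual" mode: the expert's setting is used exactly as supplied.
    return {"k": k, "mode": "manual", "scores": {k: loo_accuracy(data, k)}}

# Hypothetical toy data: two well-separated groups of 2-D points.
data = [
    ((1.0, 1.0), "low"), ((1.2, 0.8), "low"),
    ((0.9, 1.1), "low"), ((1.1, 1.3), "low"),
    ((3.0, 3.2), "high"), ((3.1, 2.9), "high"),
    ((2.8, 3.0), "high"), ((3.2, 3.1), "high"),
]

auto = fit(data)         # the software picks k
manual = fit(data, k=3)  # the user picks k
print(auto)
print(manual)
```

The design choice worth noticing is the returned report: an automated selection that also shows its candidates and scores can be reviewed and overridden by a human, whereas one that returns only a final answer cannot.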