Trying out a definition of Data Mining

 

Everybody who writes about Data Mining eventually has to define it formally. Here is my attempt:

 

Data Mining is the search of data, accumulated during the normal course of doing business, in order to find and confirm the existence of previously unknown relationships that can produce positive and verifiable outcomes through the deployment of predictive models when applied to new data.

 

Below are the key features. One or more are often missing from definitions I come across in my reading: 

 

– The mined data is not new

– The data was not collected solely for the purpose of mining it

– The miner is not testing hypotheses or known relationships

– The relationships must be verifiable

– The resulting models must be useful

– The resulting models must work on new data 
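The last three features (verifiable, useful, works on new data) can be illustrated with a minimal Python sketch. Everything in it is an assumption made up for illustration: the records, the idea that "months since last purchase" relates to churn, and the simple cutoff rule standing in for a predictive model. It is not a recommended method, only a way to show a relationship found in accumulated data being checked against data the model has never seen.

```python
# Hypothetical records accumulated in the normal course of business:
# (months_since_last_purchase, customer_churned). All values are made up.
historical = [(1, False), (2, False), (3, False), (4, False), (5, False),
              (6, True), (7, True), (8, True), (9, True), (10, True)]

# Hypothetical records that arrive after the model is built.
new_data = [(2, False), (3, False), (6, True), (9, True), (5, True), (1, False)]


def fit_cutoff(records):
    """Find the cutoff that best separates outcomes in the given records."""
    best_cutoff, best_correct = None, -1
    for cutoff in sorted({months for months, _ in records}):
        # Predict churn when months >= cutoff, then count the correct predictions.
        correct = sum((months >= cutoff) == churned for months, churned in records)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff


def accuracy(cutoff, records):
    """Share of records where the cutoff rule predicts the actual outcome."""
    hits = sum((months >= cutoff) == churned for months, churned in records)
    return hits / len(records)


cutoff = fit_cutoff(historical)              # relationship found in old data
historical_acc = accuracy(cutoff, historical)
new_acc = accuracy(cutoff, new_data)         # confirmation on new data

print(f"cutoff: {cutoff} months")
print(f"accuracy on historical data: {historical_acc:.2f}")
print(f"accuracy on new data: {new_acc:.2f}")
```

The number that matters is the accuracy on new data. A rule that only describes the historical records has not yet met the definition.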

 

One comment on “Trying out a definition of Data Mining”

  1. I now favor this modification:

    Data Mining is the selection and analysis of data,
    accumulated during the normal course of doing business,
    to find (and confirm) previously unknown relationships
    that can produce positive and verifiable outcomes through
    the deployment of predictive models applied to new data.

    Note that on a client’s suggestion I have added “selection”.



