A world-famous life sciences company wants to run a prestigious promotion for a new drug. To do the job, it wants to identify its top sales representatives. The selection should be made scientifically, based on years of performance data held in the company's data warehouse.
A telecom service provider is launching new LTE (Long Term Evolution) services. As part of the initial run, it would like to test the service by offering it to a select group of ten thousand data-intensive customers. How can these customers be selected from the large amount of information held in its various systems?
Data mining helps with tasks like these. Data mining is about extracting valuable results, predictive information, and the patterns concealed in databases. The results it produces can be powerful for a company, enabling it to make informed, well-researched decisions. Those decisions may solve existing problems or set a direction for the future. Businesses have always faced such problems: despite having large information systems backed by databases, they have still been left asking questions such as "What are the results of the next quarter likely to be?"
There are three primary techniques used in data mining. They are:
- Classification
- Clustering
- Association Rules
Classification techniques include decision-tree-based methods, rule-based methods, memory-based reasoning, neural networks, genetic algorithms, Bayesian belief networks, and support vector machines. Clustering techniques include partitional clustering and hierarchical clustering.
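To make partitional clustering concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one common partitional method; the article does not name a specific algorithm, so this choice is an assumption. It could, for instance, group telecom customers by usage, as in the LTE example. The customer features (monthly data usage in GB and average daily hours online) and all values are hypothetical, and the deterministic "farthest-first" seeding is an illustrative choice so that the result is reproducible.

```python
from math import dist

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point whose distance to its nearest chosen centroid is largest."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids

def kmeans(points, k, max_iter=100):
    """Partition `points` (tuples of numbers) into k clusters (Lloyd's algorithm)."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (monthly data usage in GB, average daily hours online).
customers = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
centroids, clusters = kmeans(customers, k=2)
for centre, members in zip(centroids, clusters):
    print([round(x, 2) for x in centre], members)
```

On this toy data the heavy users fall into one cluster and the light users into the other; in practice the features would need scaling and the number of clusters k would have to be chosen with care.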
The final step is verifying the patterns, which is called result validation. This is best explained with an illustration. Suppose there are about a million email messages. Identifying spam and forming a pattern that describes it is a data mining job. A sample of 1,000 messages, called the training set, is mined and a pattern is derived. When that pattern is run against a general set of email, however, we may find that the same spam is not detected. To avoid this, data mining algorithms are tested with separate test sets of email: the learned patterns are applied to a different set of messages and the results are verified. Data mining should continue until the learned patterns meet the desired results.
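The training-set and test-set idea can be sketched with a tiny naive Bayes spam filter. Naive Bayes is a simple probabilistic classifier chosen here only for illustration; the article does not say which algorithm is used. The "pattern" is the learned per-word probabilities, and validation means measuring accuracy on messages that were held out of training. The function names and the handful of messages are hypothetical stand-ins for the 1,000-message training set described above.

```python
from collections import Counter
from math import log

def train(messages):
    """Count words per label from (text, label) pairs, label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = log(label_counts[label] / total)  # log prior
        for word in text.lower().split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            score += log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def accuracy(model, messages):
    """Fraction of (text, label) pairs the model labels correctly."""
    return sum(classify(model, t) == label for t, label in messages) / len(messages)

# Hypothetical training set: the pattern is mined from these messages only.
training_set = [
    ("win a free prize now", "spam"),
    ("cheap pills free offer", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
    ("monday project status report", "ham"),
]
# Hypothetical test set: held out from training and used only for validation.
test_set = [
    ("free prize offer", "spam"),
    ("project meeting on monday", "ham"),
]

model = train(training_set)
print("test accuracy:", accuracy(model, test_set))
```

If the test accuracy falls short of the desired level, the usual response is to revise the training data or the method and validate again, which mirrors the advice that mining should continue until the learned patterns meet the desired results.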
Businesses worldwide apply data mining to improve their return on investment. For example, mining CRM (Customer Relationship Management) data can improve a company's bottom line. Service departments can benefit from mining records of customer calls, and Human Resources departments can use data mining to analyse employee performance.
Of late, data mining has been widely used in genetics and bioinformatics research. Diagnosing, detecting, preventing, and curing diseases such as cancer remain a great challenge to mankind, despite rapid growth in science and engineering. In genetics, relating DNA sequences to disease susceptibility is done with a data mining technique called multifactor dimensionality reduction (MDR).
Forrester Research findings from 2010 show that, although IT outlay on database and business intelligence software has generally fallen, spending on data mining is growing. This underlines the importance of data mining.
