Home Page
 PhD Work


What is Data Mining
Definition 0 [ M. Holsheimer and A.P.J.M. Siebes ] :
Data mining is the search for relationships and global patterns that exist in large databases, but are 'hidden' among the vast amounts of data, such as a relationship between patient data and their medical diagnosis.
Data mining could also be described as trying to create a simplified model of the complex world described in the database. We may therefore say that data mining is a way of dealing with large amounts of information, and that it helps find useful information faster than any human could.

Definition 1 [ Gartner Group ] :
Data Mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.

 

Mining Techniques

Just as a carpenter uses many tools to build a sturdy house, a good analyst employs more than one technique to transform data into information. Most data miners go beyond "the basics" of reporting and OLAP (On-Line Analytical Processing, also known as multi-dimensional reporting) to take a multi-method approach that includes a variety of advanced techniques.


Reporting and OLAP are widely used, but the decision-making value they can deliver is limited. They are a good place to start your data mining because they can tell you what happened in the past (for example, what sales were in the North region by product by month). To find out why those results happened, and to predict the future so you can make changes before it is too late, you'll need to employ more advanced techniques.
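To make the reporting question concrete, here is a minimal sketch of a "sales in the North region by product by month" roll-up. The records, the field layout and the function name sales_by_product_and_month are all hypothetical, invented only for illustration; a real OLAP tool would run the same kind of aggregation against a data warehouse.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, month, amount).
# These values are made up purely to illustrate the roll-up.
SALES = [
    ("North", "Widget", "2024-01", 120.0),
    ("North", "Widget", "2024-01", 80.0),
    ("North", "Gadget", "2024-01", 50.0),
    ("North", "Widget", "2024-02", 200.0),
    ("South", "Widget", "2024-01", 75.0),
]


def sales_by_product_and_month(records, region):
    """Sum the sales of one region, grouped by (product, month)."""
    totals = defaultdict(float)
    for rec_region, product, month, amount in records:
        if rec_region == region:
            totals[(product, month)] += amount
    return dict(totals)


if __name__ == "__main__":
    report = sales_by_product_and_month(SALES, "North")
    for (product, month), total in sorted(report.items()):
        print(f"{product:8s} {month}  {total:8.2f}")
```

A report like this describes what happened. It says nothing about why Widget sales moved between months, which is the gap the modeling techniques below are meant to fill.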

The use of advanced techniques is also called "modeling," and modeling includes the "S word" (statistics). While some people find statistics intimidating, advances in user interfaces and new, more "automatic" techniques are making advanced analysis practical for more and more people every day. In addition, a new category of "deployment" products, designed for "Information Consumers" (in contrast to "Model Builders"), is making it easier for more people to gain value from advanced data mining techniques.

Advanced techniques are growing rapidly in popularity because more decision-making value can only be delivered through a combination of three things:

1. Use of more sophisticated analytical techniques so better quality information is produced.
2. Delivery of the information in a more useable form.
3. Faster generation of actionable results.

Here's a picture of how modeling is done:

Analytical reporting includes:

  • Reporting and OLAP

Theory driven modeling tools include the following:

  • Correlations
  • t-tests
  • ANOVA
  • Linear Regression
  • Logistic Regression
  • Discriminant Analysis
  • Forecasting Methods
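
As an illustration of two of the theory driven tools listed above, the sketch below computes a Pearson correlation and fits a simple linear regression by ordinary least squares. The function names pearson_r and fit_line and the sample numbers are assumptions made for this example only; a statistics package would normally do this work.

```python
from math import sqrt


def _centered_sums(xs, ys):
    """Return (sum of cross-products, sum of squares of x, sum of squares of y)
    around the means, plus the two means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    sxy, sxx, syy, _, _ = _centered_sums(xs, ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    sxy, sxx, _, mean_x, mean_y = _centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical data: advertising spend versus sales, invented for illustration.
    spend = [10.0, 20.0, 30.0, 40.0, 50.0]
    sales = [25.0, 41.0, 62.0, 79.0, 105.0]
    slope, intercept = fit_line(spend, sales)
    print("correlation:", round(pearson_r(spend, sales), 3))
    print("predicted sales at spend=60:", round(intercept + slope * 60.0, 1))
```

These tools are "theory driven" in the sense that the analyst proposes the relationship first (here, that sales depend linearly on spend) and then uses the data to test and quantify it.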

Data driven modeling tools include the following:

  • Cluster Analysis
  • Factor Analysis
  • Decision Trees
  • Data Visualization
  • Neural Networks
  • Association Rules
  • Rule Induction
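
To give a feel for one of the data driven tools, the sketch below computes the two standard measures behind association rules: support (how often a set of items appears together) and confidence (how often the consequent appears when the antecedent does). The shopping baskets and the function names support and confidence are hypothetical and chosen for illustration. This is not a full rule-mining algorithm such as Apriori, only the scoring step.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent): support of both over support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # The rule never fires, so report zero confidence.
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


if __name__ == "__main__":
    # Hypothetical shopping baskets, invented for illustration.
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print("support(bread, milk):", support(baskets, {"bread", "milk"}))
    print("confidence(bread -> milk):", round(confidence(baskets, {"bread"}, {"milk"}), 3))
```

Unlike the theory driven tools, nothing here starts from a hypothesis: every candidate rule is scored directly from the data, and only those with high enough support and confidence are kept.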
 
