Data Mining


What is data mining?

Data mining is the process of extracting hidden knowledge from large volumes of raw data. The knowledge must be new, not obvious, and usable.

Knowledge discovery differs from traditional information retrieval from databases. In a traditional DBMS, database records are returned in response to a query, and what is retrieved is explicit in the database. In knowledge discovery, what is retrieved is not explicit in the database; rather, it is implicit patterns. The process of discovering such patterns is termed data mining.

Data mining finds these patterns and relationships using data analysis tools and techniques to build models. There are two main kinds of models in data mining. One is predictive models, which use data with known results to develop a model that can be used to explicitly predict values. The other is descriptive models, which describe patterns in existing data. All models are abstract representations of reality; they can serve as guides to understanding a business and can suggest actions.

Many business questions can be answered if the information hidden among the megabytes of data in your database can be found and utilized. Modeling the system under investigation and discovering the relations that connect variables in a database are the subject of data mining.


Faced with enormous volumes of data, human analysts without special tools can no longer make sense of it. Data mining, however, automates the process of finding relationships and patterns in raw data, and its results can either be fed into an automated decision support system or assessed by a human analyst. This is why data mining is used especially in science and business, where large amounts of data must be analyzed to discover trends that could not otherwise be found.

If we know how to reveal the valuable knowledge hidden in raw data, data may be one of our most valuable assets. Data mining is the tool for extracting these diamonds of knowledge from historical data and predicting outcomes of future situations.


A brief history of data mining

The term "Data mining" was introduced in the 1990s, but data mining is the evolution of a field with a long history.

Data mining's roots can be traced back along three family lines: classical statistics, artificial intelligence, and machine learning.

Statistics is the foundation of most of the technologies on which data mining is built: regression analysis, standard distributions, standard deviation, variance, discriminant analysis, cluster analysis, and confidence intervals, for example. All of these are used to study data and data relationships.

Artificial intelligence, or AI, is built upon heuristics as opposed to statistics, and attempts to apply human-thought-like processing to statistical problems. Certain AI concepts were adopted by some high-end commercial products, such as query optimization modules for relational database management systems (RDBMS).

Machine learning is the union of statistics and AI. It can be considered an evolution of AI, because it blends AI heuristics with advanced statistical analysis. Machine learning attempts to let computer programs learn about the data they study, so that programs make different decisions based on the qualities of that data: statistics supplies the fundamental concepts, and AI contributes the more advanced heuristics and algorithms.

Data mining, in many ways, is fundamentally the adaptation of machine learning techniques to business applications. It is best described as the union of historical and recent developments in statistics, AI, and machine learning, used together to study data and find previously hidden trends or patterns within it.


What is the current state?

Techniques in Data Mining
Association Rules
Association rule mining discovers interesting associations between attributes contained in a database. The technique is also known as market basket analysis. Based on frequency counts of how often items occur together in an event (i.e., a combination of items), an association rule says: if item X is part of the event, what percentage of the time is item Y also part of the event?
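As a minimal sketch of these frequency counts, the support and confidence of a rule can be computed directly over a set of transactions (the basket data below is hypothetical, invented for illustration):

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

def support(itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`,
    what fraction also contain `consequent`?"""
    return support(antecedent | consequent) / support(antecedent)

# Rule: {diapers} -> {beer}
print(support({"diapers", "beer"}))       # -> 0.6
print(confidence({"diapers"}, {"beer"}))  # ~0.75
```

Here the rule "diapers implies beer" holds in about 75% of the transactions that contain diapers, which is exactly the percentage the paragraph above describes.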

Clustering
Clustering is often used to find appropriate groupings of elements in a set of data. Unlike the decision trees discussed below, clustering is a kind of undirected knowledge discovery, or unsupervised learning: there is no target field, and the relationships among the data are identified by a bottom-up approach.
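One common bottom-up grouping method is k-means, sketched here in pure Python on toy one-dimensional data (the data points and starting centroids are illustrative assumptions, not from the text):

```python
# Minimal k-means sketch (toy 1-D data, k = 2). The idea: assign each
# point to its nearest centroid, then move each centroid to the mean
# of its assigned points, repeating until the groups stabilize.

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: group points by nearest centroid.
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [sum(ps) / len(ps) for ps in clusters.values() if ps]
    return sorted(centroids)

data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
print(kmeans(data, [1.0, 9.0]))  # -> [1.5, 8.5]
```

Note that no target field is given anywhere: the two groups emerge from the data alone, which is what makes this unsupervised.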

Decision Trees
Decision trees perform classification by constructing a tree from training instances, with the leaves carrying class labels. The tree is traversed for each test instance to find a leaf, and the class of that leaf is the predicted class. This is directed knowledge discovery in the sense that there is a specific field whose value we want to predict.
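The traversal step can be sketched with a small hand-built tree (the weather attributes and labels below are hypothetical; the induction step that would build such a tree from training data, e.g. ID3 or CART, is omitted):

```python
# Internal nodes test one attribute; leaves are plain class labels.
tree = {
    "attr": "outlook",
    "branches": {
        "sunny": {"attr": "humidity",
                  "branches": {"high": "stay in", "normal": "play"}},
        "overcast": "play",
        "rainy": "stay in",
    },
}

def classify(node, instance):
    # Follow the branch matching the instance's value for the tested
    # attribute until a leaf (a bare label string) is reached.
    while isinstance(node, dict):
        node = node["branches"][instance[node["attr"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> play
```

The field being predicted ("play" vs. "stay in") is fixed in advance, which is what makes this directed, in contrast to clustering.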

Neural Network
A neural network is often represented as a layered set of interconnected processors. These processor nodes are frequently referred to as neurodes, to indicate a relationship with the neurons of the brain. Each node has a weighted connection to several other nodes in adjacent layers. Individual nodes take the input received from connected nodes and combine it with the weights to compute output values.
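A single neurode's computation, a weighted sum of its inputs passed through an activation function, can be sketched as follows (the weights and inputs are arbitrary illustrative values, not trained ones):

```python
import math

def sigmoid(x):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    """One neurode: weighted sum of inputs plus bias, then activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(total)

# Two input nodes feeding one output node.
out = neuron([0.5, 0.9], weights=[0.4, -0.2], bias=0.1)
print(round(out, 2))  # ~0.53
```

A full network simply wires many such nodes into layers, feeding each layer's outputs forward as the next layer's inputs; training then adjusts the weights.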



The immediate future

The complexity of data mining must be hidden from end users before it can take true center stage in an organization. Business use cases can be designed, with tight constraints, around data mining algorithms.




Implications for database administrators, developers, and the general public

For database administrators:
Database administrators will have a chance to explore new knowledge, as their bosses may add the extra responsibility of doing data mining research. In a large corporation, database administrators will work with research analysts or statisticians, assisting them in acquiring data sets. In a small firm, database administrators may take on the extra responsibility of doing the research on their own.

For database developers:
Many of the leading database vendors have already introduced a data mining module in their database systems. Oracle 9i Data Mining, IBM Intelligent Miner, and SAS Enterprise Miner are all powerful tools that let companies dig into corporate databases and discover new insights for business applications. Database developers have to keep up with, and even conduct research on, new data mining techniques that enhance their clients' productivity. They then have to design interfaces that let businesspeople, who usually do not have a statistics or computer science background, perform data mining analysis easily. The program or interface design should be simple, intuitive, flexible, and easy to understand, so that novices pick it up easily.

For the general public:
The good news is that data mining can lead to better customized services. You will receive less spam, because advertising will target potential customers with new precision. You won't have to spend hours looking for a book you are interested in, because a book recommender system will already have figured out your next favorite by studying your borrowing record. However, we lose some privacy while enjoying the benefits of data mining. When our personal data is easily collected by private companies, we may ultimately be exploited by their new pricing strategies.