What
is Data Mining?
One of the emerging data architecture repository is Data warehouse. Huge volumes of data is being generated beyond data warehouses. So competitive pressure rises up to provide better customized services. Here we are going to discuss about Data mining concepts and techniques.
There are various definitions for
data mining. But all reveals the same meaning. It’s one of the process of
extracting useful data from large data sets to identify patterns, trends, behaviors
using techniques like artificial intelligence, machine learning, statistics
etc. The Final motto is to extract the data and transform into an
understandable structure for future use. It is one of the crucial step in the
process of predictive analytics.
Data Mining architecture |
Data Mining is also called as KDD-
Knowledge Discovery in databases. It involves some basic steps here data has to
be cleaned first before undergoing the mining process. Some of the methods are:
Ø
Generalization
Ø
Characterization
Ø
Classification
Ø
Clustering
association
Ø
Data
visualization
KDD process consists of sequential
steps as mentioned below:
1.
Data
Cleaning – It’s the process of cleaning the data to remove inconsistency and
noise.
2.
Data
Integration – In this process data from various sources are combined and
integrated.
3.
Data
Selection – In this method, data which is relevant o the analysis task are retrieved
from the database.
4.
Data
Transformation – Here data is transformed and consolidated into many forms that
is appropriate for mining by performing summary operations.
5.
Data
mining – Data patterns are extracted where intelligent methods are applied to
data sets.
6.
Pattern
evaluation – To identify the interesting patterns representing knowledge.
From steps 1- 4 it’s
for data pre-processing where data are prepared for mining. In the data mining step,
interaction is done with users. The knowledge patterns are presented to the
user and stored in the knowledge database.
Knowledge Discovery in Database process |
KDD
process stages:
1. Learning the application domain: includes
relevant prior knowledge and the goals of the application
2. Creating a target dataset: includes selecting
a dataset or focusing on a subset of variables or data samples on which
discovery is to be performed
3. Data cleaning and preprocessing: includes
basic operations, such as removing noise or outliers if appropriate,
collecting the necessary information to model or account for noise, deciding on
strategies for handling missing data fields, and accounting for time sequence
information and known changes, as well as deciding DBMS issues, such as data
types, schema, and mapping of missing and unknown values
4. Data reduction and projection: includes
finding useful features to represent the data, depending on the goal of the
task, and using dimensionality reduction or transformation methods to reduce
the effective number of variables under consideration or to find invariant
representations for the data
5. Choosing the function of data mining:
includes deciding the purpose of the model derived by the data mining algorithm
(e.g., summarization, classification, regression, and clustering)
6. Choosing the data mining algorithm(s):
includes selecting method(s) to be used for searching for patterns in the data,
such as deciding which models and parameters may be appropriate (e.g., models
for categorical data are different from models on vectors over reals) and
matching a particular data mining method with the overall criteria of the KDD
process (e.g., the user may be more interested in understanding the model than
in its predictive capabilities)
7. Data mining: includes searching for patterns
of interest in a particular representational form or a set of such
representations, including classification rules or trees, regression,
clustering, sequence modeling, dependency, and line analysis
8. Interpretation: includes interpreting the
discovered patterns and possibly returning to any of the previous steps, as
well as possible visualization of the extracted patterns, removing redundant or
irrelevant patterns, and translating the useful ones into terms understandable
by users
9. Using discovered knowledge: includes incorporating
this knowledge into the performance system, taking actions based on the
knowledge, or simply documenting it and reporting it to interested parties, as
well as checking for and resolving potential conflicts with previously believed
(or extracted) knowledge
Reference: http://www.ceine.cl/the-kdd-process-for-extracting-useful-knowledge-from-volumes-of-data/
What
Kind of data can be mined?
For what kind of data, does data
mining can be applied? Generally, it indicates as long as the data is
meaningful for a target application. The most basic forms of data for mining
applications are
·
Database
data
·
Data
warehouse data
· Transaction data
·
Data
streams,
·
Spatial
data
·
Text
data.
Goals
of Data Mining:
·
Prediction:
Predicting and forecasting how data attributes will behave in the future.
·
Identification:
Identify the existence of an item, an event or an activity
·
Classification:
Trying to classify the data into categories
·
Optimization:
Optimize the use of limited resources.
Applications
of Data Mining:
·
Market
Analysis
·
Risk
Analysis and management
·
Manufacturing
and production
·
Fraud
detection
·
Detection
of unusual patterns like Outliers
Commercial
tools
·
Oracle
Data miner(Oracle)
·
Data
to Knowledge(D2K)
·
SAS(SPSS)
·
Clementine(IBM)
References:
https://www.youtube.com/watch?v=EHTmxmuhZ10 - How data will transform business?