Wednesday, June 29, 2016

Data Mining Concepts

What is Data Mining?


One of the emerging data architecture repository is Data warehouse. Huge volumes of data is being generated beyond data warehouses. So competitive pressure rises up to provide better customized services. Here we are going to discuss about Data mining concepts and techniques.
There are various definitions for data mining. But all reveals the same meaning. It’s one of the process of extracting useful data from large data sets to identify patterns, trends, behaviors using techniques like artificial intelligence, machine learning, statistics etc. The Final motto is to extract the data and transform into an understandable structure for future use. It is one of the crucial step in the process of predictive analytics.

Data Mining architecture
Data Mining is also called as KDD- Knowledge Discovery in databases. It involves some basic steps here data has to be cleaned first before undergoing the mining process. Some of the methods are:
Ø  Generalization
Ø  Characterization
Ø  Classification
Ø  Clustering association
Ø  Data visualization
KDD process consists of sequential steps as mentioned below:
1.       Data Cleaning – It’s the process of cleaning the data to remove inconsistency and noise.
2.       Data Integration – In this process data from various sources are combined and integrated.
3.       Data Selection – In this method, data which is relevant o the analysis task are retrieved from the database.
4.       Data Transformation – Here data is transformed and consolidated into many forms that is appropriate for mining by performing summary operations.
5.       Data mining – Data patterns are extracted where intelligent methods are applied to data sets.
6.       Pattern evaluation – To identify the interesting patterns representing knowledge.
From steps 1- 4 it’s for data pre-processing where data are prepared for mining. In the data mining step, interaction is done with users. The knowledge patterns are presented to the user and stored in the knowledge database.
Knowledge Discovery in Database process
KDD process stages:
1. Learning the application domain: includes relevant prior knowledge and the goals of the application
2. Creating a target dataset: includes selecting a dataset or focusing on a subset of variables or data samples on which discovery is to be performed
3. Data cleaning and preprocessing: includes basic operations, such as removing noise or outliers if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time sequence information and known changes, as well as deciding DBMS issues, such as data types, schema, and mapping of missing and unknown values
4. Data reduction and projection: includes finding useful features to represent the data, depending on the goal of the task, and using dimensionality reduction or transformation methods to reduce the effective number of variables under consideration or to find invariant representations for the data
5. Choosing the function of data mining: includes deciding the purpose of the model derived by the data mining algorithm (e.g., summarization, classification, regression, and clustering)
6. Choosing the data mining algorithm(s): includes selecting method(s) to be used for searching for patterns in the data, such as deciding which models and parameters may be appropriate (e.g., models for categorical data are different from models on vectors over reals) and matching a particular data mining method with the overall criteria of the KDD process (e.g., the user may be more interested in understanding the model than in its predictive capabilities)
7. Data mining: includes searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, clustering, sequence modeling, dependency, and line analysis
8. Interpretation: includes interpreting the discovered patterns and possibly returning to any of the previous steps, as well as possible visualization of the extracted patterns, removing redundant or irrelevant patterns, and translating the useful ones into terms understandable by users
9. Using discovered knowledge: includes incorporating this knowledge into the performance system, taking actions based on the knowledge, or simply documenting it and reporting it to interested parties, as well as checking for and resolving potential conflicts with previously believed (or extracted) knowledge


What Kind of data can be mined?
For what kind of data, does data mining can be applied? Generally, it indicates as long as the data is meaningful for a target application. The most basic forms of data for mining applications are
·         Database data
·         Data warehouse data
·         Transaction data
·         Data streams,
·         Spatial data
·         Text data.

Goals of Data Mining:
·         Prediction: Predicting and forecasting how data attributes will behave in the future.
·         Identification: Identify the existence of an item, an event or an activity
·         Classification: Trying to classify the data into categories
·         Optimization: Optimize the use of limited resources.

Applications of Data Mining:
·         Market Analysis
·         Risk Analysis and management
·         Manufacturing and production
·         Fraud detection
·         Detection of unusual patterns like Outliers

Commercial tools
·         Oracle Data miner(Oracle)
·         Data to Knowledge(D2K)
·         SAS(SPSS)
·         Clementine(IBM)

References:
https://www.youtube.com/watch?v=EHTmxmuhZ10  - How data will transform business?

No comments:

Post a Comment