Hi All,
This is my first
blogging experience and I hope that something you will gain from what I am
trying to elucidate through this blog. Before starting let me give a short Introduction
about me.
Personal Introduction:
I am Aarathy Ravi who is currently pursuing MS
in Management Information Systems at Illinois Institute of Technology,
Chicago.
I have
2.5 years of work experience in India where I worked as a SAP Technical
Consultant in Enzen Technologies
Pvt Ltd. My roles and Responsibilities as below:
- Involved in
creating various reports (classical & interactive) and reports using
ALV’S.
- Prepared of
Technical Design documents/ Response documents.
- Assist in unit
testing of ABAP programs.
- Review
program codes and correct ABAP program errors
- Troubleshoot
production problems in a timely manner
Where I want to be?
Currently I am seeking Full time opportunity as Data Analyst at a company where I
can grow and take on new challenges over time. Ultimately, I’d like to assume
more management responsibilities like a senior manager guiding everyone towards
organizational goals. To be frank, I have become passionate on playing with data
and processing the analysis in accordance with clients requests. In upfront I
am just waiting eagerly where I am ready to put my analytical and decision
making skills together.
Let’s dive
into the subject -Basic Concepts Data warehousing. In this first blog we will
come across basic terms and concepts in data warehousing. I am trying to write
in simple explanation, hope so a layman can
too understand this!!!
Before going into the definition of data warehousing first we should try to
understand the differences between Data Analytics and Data Warehousing.
1 Data Warehousing Vs Data
Analytics
Data Analytics - As the term 'analytics' states that, it
helps us to analyze raw data
mainly to draw conclusions about that information. Here Data is
unstructured for ex: the tweets and FB posts
that we do on daily basis. It provides great insight
on analyzing patterns, trends from large sets of data.
Data Warehousing - A central repository of data
from multiple sources in an integrated and organized manner for reporting and
analysis purposes. A data warehouse is built to enable data analytics for
Predictive analysis, trend etc.
2 Introduction to basic terms in Data Warehousing
2.1 Data
Warehousing:
There are umpteen definitions in google for
the definition of data warehousing, but every link cites the definition given
by two pioneers Kimball and Inmon.
As per Inmon, a data warehouse should be
- Subject Oriented -
For example in an organization there are many units/departments like HR,
Sales, Marketing etc. that holds information related to that units. These
data sets provides information for that particular application.
- Non Volatile -
Once the historical data is loaded in the warehouse, the data can neither
be modified nor deleted.
- Integrated -
When data is loaded from multiple sources, we must ensure that the data is
structured in a consistent manner by
removing inconsistencies.
- Time Variant -
To identify new trends, analysts require huge amount of data that
might change with respect to time. Every data structure in the data warehouse contains a
time element, to track the changes in the data over time. So when required
reports can be shown for current time and past time.
As Per Kimball ' A copy of
transaction data specifically structured for query and analysis’ .
He introduced a bottom up approach model where data marts are built first
by prioritizing important business aspects or departments. As and when required
these data marts can be combined into a larger data warehouse. We call this
Kimball’s architecture as Data ware house bus
There are applications
where data warehousing plays a crucial role. Ex: financial
Services, Banking system, Retail sectors etc.
2.2 What are all the
process involved and tools used?
The Data warehouse stores the information
historically where it involves processes like data cleaning, data integration and
data consolidations. The purpose of the data warehouse tool is to
- Extract of data
- Cleaning of Data
- Transformation of
Data
- Data Loading
- Data Refreshing
2.3 OLAP vs OLTP
OLAP - Online Analytic
processing is a term where it helps in analyzing data related to
specific data base. The standard applications of OLAP are Business Intelligence (BI),
reporting through data mining operations.
The main component is 'Data cube or OLAP cube'.
Its purpose is to combine the DW's structures
like facts and dimensions. When the cubes are merged they create a multi-dimensional
data warehouse.
OLAP operations:
The data in the OLAP deals
with archived information. The key feature of the system is; it works with
SELECT statements for fast data retrieval. Therefore, it highly used for data
analysis and reporting operations. Let’s take a real time example. In a retail
store like Walmart, the store
manager might want to view a report on out of stock materials, bestselling
products for the particular month etc. For this OLAP operations are applied to DW that contains historical information
and the results are obtained through complex queries. Then the report is given
to the manager for analysis.
OLTP -
Online Transaction processing is a process that initiates
transaction system to collect business data. These systems collect
information and store them in DB in a large scale.
OLTP operations: The main focus is on
updating the data in the database with fast and effective queried to the
database. The most commonly used DB operations are INSERT, UPDATE and DELETE
where data is normalized and stored in DB.
In a
retail store like Walmart, once the
purchase is done we pay the final amount at the counter, the sales person
records all the data into a machine. That data is regarded as transaction data
and the system used for that is OLTP.
2.4 How data from source
system is integrated to target system in Data Warehouse?
Let's see how this works. Basically data in source system will be
different in terms of naming conventions, physical attributes of the data, data
types etc. Let’s consider a hospital data base system that has several branches
in several countries, has millions of patient's records
and their details. The details might be Doctor's appointments, patient’s
medical history, billing information etc.
Source System data:
SourceSystem
Name
|
Attribute
|
Column Name
|
Data Type
|
Hospital A
|
Patient ID Number
|
PATIENT_ID_NUMBER
|
NUMERIC (12,0)
|
Hospital B
|
Patient ID Number
|
PATIENT_ID
|
NUMERIC (12,0)
|
Hospital C
|
Patient ID
|
PAT_ID_NO
|
NUMERIC (12,0)
|
In the above example the column name,
attributes are different for each source systems. This shows data is stored in
an inconsistent manner in the database. Before integrating it to target system
in the data warehouse these data inconsistencies should be removed
TargetSystem
Name
|
Attribute
|
Column Name
|
Data Type
|
Record A
|
Patient ID Number
|
PATIENT_ID_NUMBER
|
NUMERIC (12,0)
|
Record B
|
Patient ID Number
|
PATIENT_ID_NUMBER
|
NUMERIC (12,0)
|
Record C
|
Patient ID Number
|
PATIENT_ID_NUMBER
|
NUMERIC (12,0)
|
In
the above Target System Database, all records have same attribute, column name,
data types are integrated from the Source system in a methodical approach. This
shows how data from the source system is integrated with target system stored
accurately in the data warehouse.
3. ETL concepts
ETL is a tool that
is used in accessing and transforming source data and loading it into the
target database (data warehousing). Some examples of ETL tools are Informatica,
Oracle Warehouse Builder.
ETL – Extraction,
Transformation(Transportation), Loading. Let’s discuss the process in ETL
First
Process: In the first step of ETL process, mapping of data between source
and target system (data warehouse or data mart) is performed.
Second process: In the second step,
transformation of data is done. For ex: cleansing, reformatting, normalization
of data is done before loading it into the DW
Third Process: In the final step the transformed data is loaded into the
target system.
Introduction to business
Intelligence:
Business Intelligence is a process for
analyzing raw data and taking necessary actions to help an organization with
strategic business decisions. It helps in creating reports, dashboards and
analytic visualizations.
Why we go for Business
intelligence?
·
To
support better decision making by understanding the past and predicting the future
trends.
·
To
contribute to business success by making strategic and tactical decisions.
·
To
proactively monitor market trends and patterns.
What are the tools used in BI?
The most commonly used tools for BI
purpose is:
· Excel
– It’s one of the commonly used tool for decades. Mainly used for Tracking and
business reporting functionalities.
· Reporting
tool – There are 2 reporting tools used. One is OLAP tool that helps the users
to look at the data from multi- dimensional point of view. Next is Data Mining
tool that is used for correlation purpose.
I take this opportunity to thank my professor William Favre Slater who initiated the concept of blog writing for students. I have
tried my level best in presenting the basics terms and concepts in data warehousing.
I welcome comments and views on my blog and wish to learn something that helps
me to enhance my knowledge. See you with my next blog soon on this.
No comments:
Post a Comment