Wednesday, June 8, 2016

My first blog writing on Technical topic - Data Warehousing

Hi All,

This is my first blogging experience and I hope that something you will gain from what I am trying to elucidate through this blog. Before starting let me give a short Introduction about me.

Personal Introduction: 
I am Aarathy Ravi who is currently pursuing MS in Management Information Systems at Illinois Institute of Technology, Chicago. 
I have 2.5 years of work experience in India where I worked as a SAP Technical Consultant in Enzen Technologies Pvt Ltd. My roles and Responsibilities as below:
  • Involved in creating various reports (classical & interactive) and reports using ALV’S.
  • Prepared of Technical Design documents/ Response documents.
  • Assist in unit testing of ABAP programs.
  • Review program codes and correct ABAP program errors
  • Troubleshoot production problems in a timely manner
   Where I want to be?
     Currently I am seeking Full time opportunity as Data Analyst at a company where I can grow and take on new challenges over time. Ultimately, I’d like to assume more management responsibilities like a senior manager guiding everyone towards organizational goals. To be frank, I have become passionate on playing with data and processing the analysis in accordance with clients requests. In upfront I am just waiting eagerly where I am ready to put my analytical and decision making skills together.

      Let’s dive into the subject -Basic Concepts Data warehousing. In this first blog we will come across basic terms and concepts in data warehousing. I am trying to write in simple explanation, hope so a layman can too understand this!!!

       Before going into the definition of data warehousing first we should try to understand the differences between Data Analytics and Data Warehousing. 

      1  Data Warehousing Vs Data Analytics

       Data Analytics - As the term 'analytics' states that, it helps us to analyze raw data mainly to draw      conclusions about that information. Here Data is unstructured for ex: the tweets and FB posts that we   do on daily basis. It provides great insight on analyzing patterns, trends from large sets of data.  
     Data Warehousing - A central repository of data from multiple sources in an integrated and organized manner for reporting and analysis purposes. A data warehouse is built to enable data analytics for Predictive analysis, trend etc.

        2  Introduction to basic terms in Data Warehousing
  
   2.1 Data Warehousing:

     There are umpteen definitions in google for the definition of data warehousing, but every link cites the definition given by two pioneers Kimball and Inmon.

       As per Inmon, a data warehouse should be 
  •   Subject Oriented - For example in an organization there are many units/departments like HR, Sales, Marketing etc. that holds information related to that units. These data sets provides information for that particular application.
  •  Non Volatile - Once the historical data is loaded in the warehouse, the data can neither be modified nor deleted.
  •  Integrated - When data is loaded from multiple sources, we must ensure that the data is structured in a consistent manner by removing inconsistencies.
  •  Time Variant -  To identify new trends, analysts require huge amount of data that might change with respect to time. Every data structure in the data warehouse contains a time element, to track the changes in the data over time. So when required reports can be shown for current time and past time.
      As Per Kimball ' A copy of transaction data specifically structured for query and analysis’ .
He introduced a bottom up approach model where data marts are built first by prioritizing important business aspects or departments. As and when required these data marts can be combined into a larger data warehouse. We call this Kimball’s architecture as Data ware house bus

There are applications where data warehousing plays a crucial role. Ex: financial Services, Banking system, Retail sectors etc.

2.2 What are all the process involved and tools used?

       The Data warehouse stores the information historically where it involves processes like data cleaning, data integration and data consolidations. The purpose of the data warehouse tool is to 
  •     Extract of data
  •     Cleaning of Data
  •     Transformation of Data
  •     Data Loading
  •     Data Refreshing
2.3  OLAP vs OLTP
       OLAP - Online Analytic processing is a term where it helps in analyzing data related to specific data base. The standard applications of OLAP are Business Intelligence (BI), reporting through data mining operations.
       The main component is 'Data cube or OLAP cube'. Its purpose is to combine the DW's structures like facts and dimensions. When the cubes are merged they create a multi-dimensional data warehouse.

      OLAP operations: The data in the OLAP deals with archived information. The key feature of the system is; it works with SELECT statements for fast data retrieval. Therefore, it highly used for data analysis and reporting operations. Let’s take a real time example. In a retail store like Walmart, the store manager might want to view a report on out of stock materials, bestselling products for the particular month etc. For this OLAP operations are applied to DW that contains historical information and the results are obtained through complex queries. Then the report is given to the manager for analysis.

       OLTP -  Online Transaction processing is a process that initiates transaction system to collect business data. These systems collect information and store them in DB in a large scale. 

     OLTP operations: The main focus is on updating the data in the database with fast and effective queried to the database. The most commonly used DB operations are INSERT, UPDATE and DELETE where data is normalized and stored in DB.
     In a retail store like Walmart, once the purchase is done we pay the final amount at the counter, the sales person records all the data into a machine. That data is regarded as transaction data and the system used for that is OLTP.

2.4 How data from source system is integrated to target system in Data Warehouse?

      Let's see how this works. Basically data in source system will be different in terms of naming conventions, physical attributes of the data, data types etc. Let’s consider a hospital data base system that has several branches in several countries, has millions of patient's records and their details. The details might be Doctor's appointments, patient’s medical history, billing information etc.

     Source System data:
     
 SourceSystem     Name
           Attribute 
            Column Name
            Data Type
       Hospital A
   Patient ID Number
      PATIENT_ID_NUMBER
      NUMERIC (12,0)
       Hospital B
    Patient ID Number
      PATIENT_ID
       NUMERIC (12,0)
       Hospital C
     Patient ID
       PAT_ID_NO
       NUMERIC (12,0)

 In the above example the column name, attributes are different for each source systems. This shows data is stored in an inconsistent manner in the database. Before integrating it to target system in the data warehouse these data inconsistencies should be removed

TargetSystem Name
           Attribute
         Column Name               
          Data Type
       Record A
   Patient ID Number
       PATIENT_ID_NUMBER
      NUMERIC (12,0)
       Record B
    Patient ID Number
       PATIENT_ID_NUMBER
      NUMERIC (12,0)
       Record C 
   Patient ID Number
       PATIENT_ID_NUMBER
      NUMERIC (12,0)

      In the above Target System Database, all records have same attribute, column name, data types are integrated from the Source system in a methodical approach. This shows how data from the source system is integrated with target system stored accurately in the data warehouse.


      3. ETL concepts

ETL is a tool that is used in accessing and transforming source data and loading it into the target database (data warehousing). Some examples of ETL tools are Informatica, Oracle Warehouse Builder.
ETL – Extraction, Transformation(Transportation), Loading. Let’s discuss the process in ETL
      First Process: In the first step of ETL process, mapping of data between source and target system (data warehouse or data mart) is performed.
Second process:  In the second step, transformation of data is done. For ex: cleansing, reformatting, normalization of data is done before loading it into the DW
Third Process: In the final step the transformed data is loaded into the target system.

Introduction to business Intelligence:
      Business Intelligence is a process for analyzing raw data and taking necessary actions to help an organization with strategic business decisions. It helps in creating reports, dashboards and analytic visualizations.

Why we go for Business intelligence?
·         To support better decision making by understanding the past and predicting the future trends.
·         To contribute to business success by making strategic and tactical decisions.
·         To proactively monitor market trends and patterns.

What are the tools used in BI?
The most commonly used tools for BI purpose is:
·    Excel – It’s one of the commonly used tool for decades. Mainly used for Tracking and business reporting functionalities.
·       Reporting tool – There are 2 reporting tools used. One is OLAP tool that helps the users to look at the data from multi- dimensional point of view. Next is Data Mining tool that is used for correlation purpose.

 I take this opportunity to thank my professor William Favre Slater who initiated the concept of blog writing for students. I have tried my level best in presenting the basics terms and concepts in data warehousing. I welcome comments and views on my blog and wish to learn something that helps me to enhance my knowledge. See you with my next blog soon on this.

No comments:

Post a Comment