Last time I wrote about the basic terms and definitions in data warehousing. Today, while searching for my next topic, I came across an interesting one: big data. Let's look at the impact of big data on the corporate sector and at which alternative to choose, a data warehouse or Hadoop, when storing massive amounts of data.
Introduction to Big data
In today's fast-moving technology landscape, organizations generate huge volumes of data with high velocity and variety (log files, video, images, text, etc.). With data at this scale, storage capacity plays a crucial role. The days when traditional relational database systems could handle everything are gone; on their own, those methods no longer cope with this kind of data. New technologies and platforms with the right analytic tools have been introduced, and they give top technology companies a boost.
Big data: it's considered to be the new generation of data management. As the name suggests, the data is "big" in terms of its volume, its velocity (the speed at which it is generated and has to be processed) and the variety of forms in which it arrives and is analyzed. That's why it's called big data. For example, when people tweet, a huge amount of data is generated. This data is captured, stored and analyzed with the right analytical tools to promote business growth.
What is Apache Hadoop?
It's an open source Java framework that is primarily used for storing and analyzing big data. Hadoop helps in processing big data sets by splitting the data into smaller parts and distributing them across the nodes of a cluster. Many major tech companies like Yahoo, IBM and Google use the Hadoop framework for advertising, search engine optimization, etc.
MapReduce: it's the data processing framework of Apache Hadoop, in which the application logic splits the data so that it can be processed in parallel on the nodes of a large cluster. The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks.
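To make the idea concrete, here is a minimal sketch of the map, shuffle and reduce steps as a word count in plain Python. It runs on a single machine and does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own and only illustrate the programming model.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values of one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # On a real cluster, each chunk of lines would be mapped on a different node.
    # Here everything runs sequentially in one process.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "hadoop stores big data"])
print(counts)
```

In Hadoop itself, the framework would distribute the map and reduce calls across nodes and rerun any that fail; the sketch only shows how the work is divided.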
Benefits of using Hadoop over other technologies
· Apache Hadoop is considered to be a faster and cheaper analytical tool for big data.
· Data can be stored in structured or unstructured form without formatting it first, whereas relational databases require the data to be defined with a proper schema before it is stored.
· It's cost-effective: the cost of storage per terabyte is low, and it still delivers fast computation.
· It's fault tolerant: data is replicated across the cluster, so it can be recovered in case of a failure.
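The second point is often called "schema on read". Here is a small sketch of it in plain Python, not in Hadoop itself: records are stored exactly as they arrive, and a structure is applied only when a query reads them. The sample records and the function name `read_with_schema` are made up for illustration.

```python
import json

# Hypothetical raw records, stored as-is with no schema enforced on write.
raw_store = [
    '{"user": "alice", "text": "hello", "likes": 3}',
    '{"user": "bob", "text": "big data"}',   # the "likes" field is missing
    'plain log line without any structure',  # not JSON at all
]


def read_with_schema(raw_records):
    """Apply a (user, likes) schema at read time and skip records that do not fit."""
    rows = []
    for raw in raw_records:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unstructured record: it stays in storage, this query ignores it
        if not isinstance(record, dict):
            continue
        rows.append({"user": record.get("user"), "likes": int(record.get("likes", 0))})
    return rows


rows = read_with_schema(raw_store)
print(rows)
```

A relational database would have rejected the second and third records at insert time unless the table schema allowed them. Here they are all stored, and each query decides what to do with them.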
Why Big data?
Simply collecting and storing huge volumes of data doesn't generate any value for the organization. Value is created only when the data is analyzed and acted upon. We must make sure that the stored data can be analyzed and that the results of that analysis deliver real value to the business: for decision-making strategies, for improving customer engagement, for product development, and for search engine optimization in the digital marketing world.
Some takeaway points are:
· Increased storage capacity
· Stores and analyzes all structured and unstructured data
· Deep data exploration for analysts
· Flexible, and adapts to changing business trends
Some popular tools used in Big data
· NoSQL: HBase, Cassandra
· MapReduce: Hadoop, Hive, Pig, MapR
· Storage: S3, Hadoop Distributed File System (HDFS)
What is the big difference between big data (Hadoop) and DW?
Nowadays IT organizations face tremendous challenges in deciding whether to use big data or a data warehouse (DW) to promote business growth. Many organizations are confused about when to use which alternative.
The major advantage of Hadoop lies in handling two difficult problems:
· Capacity to handle large data sets
· Running complex analytics
The diagram below shows how Hadoop works well alongside a DW in the two aspects mentioned above.
Below is a table that highlights the major differences between Hadoop and data warehousing.
| | Hadoop | DWH |
| Data | All forms of data (structured, semi-structured and unstructured) | Only structured data, with a well-defined schema required before the data is stored |
| Application | Newly adopted concept in the corporate sector. Ex: health care, retail | Traditional approach, already established in organizations |
| Tooling | New tools, ex: MapReduce; business users work with SQL queries or BI tools | Already installed, with good knowledge and experience available |
| Costs | Low (per GB) | High (per GB) |
| Access | Batch processing in parallel | Interactive and batch |
In conclusion, Hadoop and DW share a symbiotic relationship. Consider implementing Hadoop when your current setup cannot solve your business problem. Keep a check on security, governance and performance. Some differences are clear, but the choice mostly depends on your organization and your use cases. Analyze your business and technical requirements carefully to ensure the best business outcome.
Potential value of Big data:
Below are some insights into how big data is driving tremendous market growth:
· Big data could generate $300 billion in potential annual value for US health care.
· According to a Forbes report, Michael Dell says big data analytics is the next trillion-dollar market. IDC has a more modest and specific prediction, forecasting that the market for big data technology and services will grow at a 23.1% compound annual growth rate, reaching $48.6 billion in 2019.
· The McKinsey Global Institute estimates that data volume is growing 40% per year and will grow 44x between 2009 and 2020.
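As a rough sanity check on the last figure, the two numbers can be compared with simple compound growth arithmetic. The sketch below assumes that "between 2009 and 2020" means 11 years of compounding, which is my reading and not something the estimate spells out.

```python
# Assumption: 2009 to 2020 is treated as 11 compounding periods.
years = 2020 - 2009

# Growth factor if volume really grows 40% every year.
factor_at_40_percent = 1.40 ** years

# Annual growth rate that would be needed to reach exactly 44x in the same period.
implied_rate_for_44x = 44 ** (1 / years) - 1

print(f"40% per year for {years} years gives about {factor_at_40_percent:.1f}x")
print(f"44x over {years} years needs about {implied_rate_for_44x:.1%} per year")
```

Growing 40% a year for 11 years comes out at roughly 40x, so the "40% per year" and "44x" figures are broadly consistent with each other once rounding is taken into account.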
Interesting links and videos to look out for:
1. https://www.youtube.com/watch?v=lz_kIDxbzGA (How we found the worst place to park in New York City, using big data)
2. https://www.youtube.com/watch?v=1RYKgj-QK4I (IBM Big Data and analytics at work in banking)
I hope this helped you. See you soon in my next blog post!