Wednesday, June 29, 2016

Data Mining Concepts

What is Data Mining?


One of the emerging data repository architectures is the data warehouse. Huge volumes of data are now being generated, beyond what data warehouses hold, and competitive pressure is rising to provide better customized services. Here we discuss data mining concepts and techniques.
There are various definitions of data mining, but they all convey the same meaning: it is the process of extracting useful information from large data sets to identify patterns, trends, and behaviors, using techniques from artificial intelligence, machine learning, and statistics. The final goal is to extract information and transform it into an understandable structure for future use. It is a crucial step in the process of predictive analytics.

Data Mining architecture
Data mining is often referred to as KDD (Knowledge Discovery in Databases), although strictly speaking it is one step of that process. The process involves some basic steps, and the data has to be cleaned before it is mined. Some of the methods are:
·         Generalization
·         Characterization
·         Classification
·         Clustering and association
·         Data visualization
The KDD process consists of the sequential steps listed below:
1. Data Cleaning: the process of cleaning the data to remove inconsistencies and noise.
2. Data Integration: data from various sources is combined and integrated.
3. Data Selection: data that is relevant to the analysis task is retrieved from the database.
4. Data Transformation: data is transformed and consolidated into forms that are appropriate for mining, for example by performing summary operations.
5. Data Mining: intelligent methods are applied to the data sets to extract data patterns.
6. Pattern Evaluation: the interesting patterns that represent knowledge are identified.
Steps 1 to 4 are data pre-processing, where the data is prepared for mining. The data mining step may involve interaction with the user. The knowledge patterns are presented to the user and stored in the knowledge base.
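The six steps above can be sketched as a small pipeline. This is only a minimal illustration, assuming made-up shopping records and a simple "items bought together" count as the mining method; the record layout, the function names, and the `min_support` threshold are all hypothetical and not part of any particular tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records from two sources; None marks a missing value.
store_sales = [
    {"customer": "a", "items": ["bread", "milk"], "region": "north"},
    {"customer": "b", "items": ["bread", "butter", "milk"], "region": "north"},
    {"customer": "c", "items": None, "region": "south"},  # noisy record
]
web_sales = [
    {"customer": "d", "items": ["bread", "milk"], "region": "north"},
    {"customer": "e", "items": ["butter"], "region": "south"},
]


def clean(records):
    """Step 1: drop records whose item list is missing or empty."""
    return [r for r in records if r["items"]]


def integrate(*sources):
    """Step 2: combine several sources into one list of records."""
    return [r for source in sources for r in source]


def select(records, region):
    """Step 3: keep only the records relevant to the analysis task."""
    return [r for r in records if r["region"] == region]


def transform(records):
    """Step 4: reduce each record to a sorted tuple of distinct items."""
    return [tuple(sorted(set(r["items"]))) for r in records]


def mine(baskets):
    """Step 5: count how often each pair of items appears together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(basket, 2))
    return counts


def evaluate(pair_counts, n_baskets, min_support):
    """Step 6: keep only pairs whose share of baskets meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }


records = integrate(clean(store_sales), clean(web_sales))
baskets = transform(select(records, "north"))
patterns = evaluate(mine(baskets), len(baskets), min_support=0.6)
print(patterns)
```

In a real project each step would be far more involved, but the order of the calls mirrors the order of the KDD steps.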
Knowledge Discovery in Database process
KDD process stages:
1. Learning the application domain: includes relevant prior knowledge and the goals of the application
2. Creating a target dataset: includes selecting a dataset or focusing on a subset of variables or data samples on which discovery is to be performed
3. Data cleaning and preprocessing: includes basic operations, such as removing noise or outliers if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time sequence information and known changes, as well as deciding DBMS issues, such as data types, schema, and mapping of missing and unknown values
4. Data reduction and projection: includes finding useful features to represent the data, depending on the goal of the task, and using dimensionality reduction or transformation methods to reduce the effective number of variables under consideration or to find invariant representations for the data
5. Choosing the function of data mining: includes deciding the purpose of the model derived by the data mining algorithm (e.g., summarization, classification, regression, and clustering)
6. Choosing the data mining algorithm(s): includes selecting method(s) to be used for searching for patterns in the data, such as deciding which models and parameters may be appropriate (e.g., models for categorical data are different from models on vectors over reals) and matching a particular data mining method with the overall criteria of the KDD process (e.g., the user may be more interested in understanding the model than in its predictive capabilities)
7. Data mining: includes searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, clustering, sequence modeling, dependency, and line analysis
8. Interpretation: includes interpreting the discovered patterns and possibly returning to any of the previous steps, as well as possible visualization of the extracted patterns, removing redundant or irrelevant patterns, and translating the useful ones into terms understandable by users
9. Using discovered knowledge: includes incorporating this knowledge into the performance system, taking actions based on the knowledge, or simply documenting it and reporting it to interested parties, as well as checking for and resolving potential conflicts with previously believed (or extracted) knowledge


What kind of data can be mined?
To what kinds of data can data mining be applied? Generally, to any data that is meaningful for the target application. The most basic forms of data for mining applications are:
·         Database data
·         Data warehouse data
·         Transaction data
·         Data streams
·         Spatial data
·         Text data

Goals of Data Mining:
·         Prediction: Predicting and forecasting how data attributes will behave in the future.
·         Identification: Identifying the existence of an item, an event, or an activity.
·         Classification: Classifying the data into categories.
·         Optimization: Optimizing the use of limited resources.

Applications of Data Mining:
·         Market analysis
·         Risk analysis and management
·         Manufacturing and production
·         Fraud detection
·         Detection of unusual patterns, such as outliers

Commercial tools
·         Oracle Data Miner (Oracle)
·         Data to Knowledge (D2K)
·         SAS
·         Clementine (SPSS, now part of IBM)


Monday, June 27, 2016

Predictive Analytics

Introduction to Predictive Analytics

Let’s dive into the concept of predictive analytics and the difference between analytics and predictive analytics. In simple words, predictive analytics is like driving a car in heavy traffic: you watch the road ahead, check the rear-view mirror for vehicles behind you, and anticipate what other drivers will do, so that you drive smoothly and reach your destination safely.

In a similar fashion, how does predictive analytics create business value? Predictive analytics is the practice of extracting meaningful insights from data sets using data mining, and of staying informed about the business activities that might affect future business outcomes. The term predictive means estimating the probability of future events. For example, a business uses predictive analytics to deliver more relevant information and improve overall profit. The analysis can also examine in depth why customer satisfaction is not being attained.

Features:
·         Identify and observe patterns that reveal unknowns in the past, present, or future.
·         Use data mining techniques to discover hidden insights.

Analytics: This is the foundation stone, where we try to understand the existing data set by comparing trends and patterns. It is one of the first steps towards predictive analytics. Below is a table that shows the primary differences between analytics and predictive analytics.

Analytics | Predictive analytics
Purpose is to understand the past and observe trends. | Purpose is to gain insights that lead to effective decision making oriented towards future goals.
Data used is raw, structured, and compiled. | Information is structured and unstructured.
Benefits: productivity improvements in producing reports and metrics. | Benefits: process improvements leading to enhanced decision making.
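The difference in the table can be shown with a few lines of code. The sketch below assumes five months of made-up sales figures: the "analytics" view summarizes the past with an average, while the "predictive" view fits a straight-line trend by ordinary least squares and extends it one month ahead. The data and the helper name `fit_line` are purely illustrative, and a real forecast would need a model suited to the data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = mean(xs)
    mean_y = mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical monthly sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

# Analytics: describe what has already happened.
average_sales = mean(sales)

# Predictive analytics: use the trend to estimate what comes next.
slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept

print("average so far:", average_sales)
print("forecast for month 6:", forecast_month_6)
```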




The diagram above shows how predictive analytics is viewed today by IT people.
Defining Business Intelligence and its relationship with predictive analytics

With the help of BI, you learn how to use data to understand your customers and the current state of your business. BI aims to identify areas that are underperforming, such as products, customer reach, partners, time, and other business dimensions. As per Gartner, “This knowledge baseline is shaped through descriptive analysis examining past data to extract useful customer/buyer/prospect information”. In predictive analytics, by contrast, you predict how customers are likely to behave in the future. So Business Intelligence is about gaining knowledge, strategy, and infrastructure, while predictive analytics gives business people feedback that signals the likely success or failure of their model by predicting future events.

BI relationship with predictive analytics
Applications using predictive analytics

This section looks at how predictive analytics is used across sectors and how it helps drive business growth.

Benefits:
Improving customer relationship management, and thereby revenue. Customer engagement can be as simple as signing up for a newsletter or clicking on a promotion code. Some vendors, such as Lattice and SAS, help retailers track customer engagement. Top companies like Netflix and Amazon use this to build loyal relationships with their customers, which results in higher customer satisfaction and revenue growth.

How Amazon and Netflix use predictive analytics

Launching new promotional deals that attract customers
All retail stores, from mid-sized to large organizations, depend on promotional deals and discounts to succeed in the market. According to a study by Oracle, 98% of fast-growing merchants feel that segmentation and targeting are important for their online merchandising strategy, yet more than half are not satisfied with the tools they have for promotions. For example, Macy’s has experienced the benefits of predictive analytics by deploying a solution from SAP that helps it retain registered customers. Reports suggest that it has seen an 8-12% increase in online sales by combining browsing behavior within product categories.


Optimizing pricing to maximize profit growth: Predictive analytics plays an important role by supporting real-time pricing that accepts input from sources like:
·         Customer activity
·         Order History and Preferences
·         Historical product pricing
·         Available Inventory

See the following video, which shows how Uber and Airbnb have been setting prices with analytics: https://www.youtube.com/watch?v=-KFe5pGMFbo
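To make the pricing inputs listed above concrete, here is a toy pricing rule. It is a hand-written rule of thumb rather than a trained predictive model, and every name and weight in it (`suggest_price`, the 20% cap, the 5% loyalty discount) is an assumption chosen only for illustration.

```python
def suggest_price(historical_price, recent_views, units_in_stock,
                  loyal_customer=False):
    """Toy real-time pricing rule; all weights are illustrative assumptions."""
    price = historical_price  # historical product pricing is the baseline

    # Customer activity vs. available inventory: when recent views exceed
    # the stock on hand, nudge the price up, capped at +20%.
    if units_in_stock > 0:
        demand_ratio = recent_views / units_in_stock
        price *= 1 + min(0.20, 0.05 * max(0.0, demand_ratio - 1))

    # Order history and preferences: returning customers get 5% off.
    if loyal_customer:
        price *= 0.95

    return round(price, 2)


print(suggest_price(100.0, recent_views=50, units_in_stock=10))
print(suggest_price(100.0, recent_views=5, units_in_stock=10))
print(suggest_price(100.0, recent_views=5, units_in_stock=10,
                    loyal_customer=True))
```

A production system would replace the fixed weights with values learned from historical data, but the inputs would be the same kinds of signals.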

Reducing fraud by detecting it
Fraud and data theft, for example corruption, IP theft, and phishing, occur in all industries, and billions of dollars are lost every year. IBM’s SPSS suite is one of the leading predictive analytics solutions that help retailers analyze browsing patterns, payment types, and purchasing patterns to detect fraud. Leading retailers like Walmart have started using predictive algorithms that help catch fraud before it happens.
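One simple way to look for unusual purchasing patterns is to flag amounts that sit far from the average. The sketch below uses a z-score test on made-up purchase amounts; it is only an illustration of the idea, not how any of the products named above work, and the function name and the threshold of 2.0 are assumptions.

```python
from statistics import mean, stdev


def flag_outliers(amounts, threshold=2.0):
    """Return the amounts that lie more than `threshold` sample standard
    deviations away from the mean."""
    mu = mean(amounts)
    sigma = stdev(amounts)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [x for x in amounts if abs(x - mu) / sigma > threshold]


# Hypothetical purchase amounts for one customer; the last one is unusual.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 21, 500]
print(flag_outliers(amounts))
```

A z-score is easy to explain but is itself distorted by extreme values, so real fraud systems usually combine several signals and more robust statistics.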


Deploying methods for predictive analysis
1. Work with a capable data scientist who is technically and functionally qualified to integrate with e-commerce platforms. Several online predictive tools and plugins are available that you can use to reap the benefits, for example Custora, a tool focused on customer lifetime value.
2. Use an open source predictive analytics product, such as R or PredictionIO, which helps in creating more customized solutions. For this you will need to hire well-qualified programmers.
3. An easy but expensive method is to buy a full-featured suite, such as SAS or SAP, that comes as a complete package. Such offerings include many pre-built products for fraud detection, pricing management, and so on.

Predictive software players

Spotfire, R, SPSS (an IBM company), RapidMiner, SAS.

Limitations
  • Each data point must be planned and collected properly for proper execution
  • Expensive to design


Conclusion

To benefit from predictive analytics, people in the company need to communicate properly with one another, and this is where the difficulty lies. BI professionals describe their output in the language of SQL, whereas executives try to understand it in terms of reports and metrics. So IT people and BI professionals should together try to understand the language of strategy and business models while solving business issues. Organizations depend on predictive analytics for strategic planning, achieving profits and targets, financial outcomes, and staying competitive in the market. With the help of predictive analytics, an organization can rely on timely feedback about its strategic initiatives and get help in answering questions about the future.





Wednesday, June 22, 2016

Big Data in Enterprise Mobility

Data is the foundation of any business, be it a small organization or a conglomerate. Data analysts, CIOs, and DBA engineers in a company think about how to utilize the massive amount of data being generated by their business. To stand out in the crowd, the organization should find the hidden value in its data, which helps it stay competitive in today’s world. This is where IT decision makers bring in Big Data analytics solutions, which help the business adapt to newer products and keep its customers satisfied.
Digital data is the trend in today’s landscape. Four technological megatrends shape the reality of business strategies today: Big Data, mobility, social media, and cloud computing. We use gadgets such as iPads, smartphones, and tablets on a daily basis. EY states that with 900 million+ mobile connections, 100 million+ active mobile data users, and an increasing number of connected devices, the amount of consumer and enterprise data will grow exponentially.

Data is available in myriad forms, such as SMS, text, and photos. This unstructured data has increased the ability to investigate deeply what customers value. For example, if an automobile company like BMW launches a new model, it creates a lot of buzz on social networking sites, and the company tries to capture consumer attention and public reaction to the launch.

Processing of data:
Big Data presents opportunities for business growth. To improve the efficiency and effectiveness of decision making, the data must be processed in a timely fashion, and analytics must pave the way for better business decisions. Analytics applies statistical and quantitative analysis, together with business models, to make predictions. Used effectively, it helps the business meet stakeholders’ demands, manage huge volumes of data, and enhance performance by bringing about change.



What do organizations think about enterprise mobility?
According to the survey, 50% of organizations are keen on making business applications such as ERP and CRM available on mobile devices like tablets and handsets, with tablets preferred for business applications and mobile handsets for access to email and collaboration tools.


In today’s competitive market, almost all travel companies, such as Orbitz, and many retail companies offer mobile-enabled applications for customers to use on a daily basis. Take Expedia Inc., which is considered a global leader in online travel and offers notable deals and packages for flight bookings, hotel reservations, and car rentals. It lets customers book flights and hotels from mobile devices at discounted prices, and it caters to different age groups with all sorts of trips, from beach holidays and cruises to city breaks. Customers choose Expedia.com because more people prefer cheap holiday deals combined with luxury.
Expedia constantly works on making its website more user-friendly, but intense competition and the rise of price-comparison sites were leading to a decline in its value and profitability. After years of growing its business, Expedia found itself at a major crisis point, needing to evolve its online travel site to progress to the next level and better serve its customers. As per the latest report by (suite.searchmetrics.com.2016), Expedia has lost 25% of its search visibility in Google.

Expedia has also recently acquired Orbitz to form a massive travel company. Joining these two travel services could mean access to even more massive volumes of data. The trick lies in handling and processing this huge amount of data efficiently.

This is where Big Data analytics comes in, helping Expedia find ways to resolve its customer handling issues. Big Data and analytics help the company understand its customers better and give them a better customer relationship experience combined with a better travel experience. Predictive models and visual analytics give it clarity on what it lacks, how it can overcome that, and how it can stand out in the market. Finally, Expedia’s goal is to deliver the right data, giving customers effective information to make good travel decisions, and to develop a good rapport with its stakeholders in the market.

Complete solution to handle future problems and exceptions
Big Data techniques are a powerful tool. With Big Data, we can look more closely at the customer. For example, we can analyze average customer satisfaction, the quality track record of products and services, and what customers need from the organization.
1. Apply visualization tools such as Tableau and perform real-time analytics to map out how well services and products match customers’ needs and to what level they satisfy them.
2. Predict what customers want by performing predictive analytics on Big Data using SAS, R, and other related tools. All customer-related data should be collected and examined. For instance, gather customers’ preferences, locations, interests, and recent activities to analyze what they prefer, then provide them with a satisfying service before they request it.
3. Expedia can have a dedicated team that performs customer analytics based on customer preferences and suggests flights and hotels accordingly. In addition, collect customer history, ratings, and reviews, perform analytics on this data, and suggest a better fit for each customer.
4. Improve the services as technology advances, and provide good offers, discounts, or reward points to customers based on loyalty. This encourages customers to use the company’s products for booking flights, hotels, and so on.
5. Integrate technology and data to improve interaction. Data is growing day by day, while the representatives and other staff available to handle customers’ problems and deliver high-quality customer service are limited. With machine learning, organizations can use specialized algorithms to analyze which problems customers often encounter and how they can be satisfied, and then establish an intelligent system or virtual assistant to handle customer problems during periods of high call volume.
6. The hottest trend in the market for achieving the next level of growth is to use Amazon Web Services (AWS) for data analysis. There are plenty of tools available, so the organization must choose the right analytics tool to store and visualize everything in the cloud. Since Expedia is committed to delivering a great experience to customers, it started using AWS to run ESS (Expedia Suggest Service), which has an algorithm to detect the customer’s location and other relevant details when booking through mobile devices. In simple words, it uses SAS Business Analytics to display suggestions when a customer types something. The aim is to optimize online customer experiences and strengthen the relationship with each customer, increasing satisfaction and loyalty while growing revenue.
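The "suggestions when a customer types something" idea in the last point can be illustrated with a tiny prefix lookup. This is a generic autocomplete sketch, not Expedia's actual service: the destination list and the function name `suggest` are hypothetical, and a real suggestion service would also rank results by location, popularity, and customer history.

```python
from bisect import bisect_left


def suggest(prefix, sorted_names, limit=3):
    """Return up to `limit` names from a sorted list that start with prefix."""
    prefix = prefix.lower()
    start = bisect_left(sorted_names, prefix)  # first candidate position
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # sorted order: no later name can match either
        matches.append(name)
        if len(matches) == limit:
            break
    return matches


# Hypothetical destination names, lower-cased and sorted once up front.
destinations = sorted(["london", "los angeles", "lisbon", "lima", "paris"])

print(suggest("lo", destinations))
print(suggest("l", destinations))
```

Keeping the list sorted lets the lookup jump straight to the first possible match instead of scanning every destination on each keystroke.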


Monday, June 20, 2016

Guidance to Big Data Analytics
Today, let’s dive into Big Data analytics and its tools, and into how Big Data companies help organizations with their solutions. Each of these Big Data tools has unique features and distinct functionality, depending on the process it supports. We discuss only the most important Big Data analytics software vendors.
Big Data is creating a revolution in many industries and changing the competitive landscape with its potential benefits. ‘Big Data’ is the emerging discipline of capturing, storing, processing, analyzing, and visualizing huge quantities of information. The data sets may start at a few terabytes and run to many petabytes, far more than traditional data analysis packages can handle.
In any industry, meeting customers’ expectations is a huge responsibility for the organization. Nowadays customer experience plays out on social media, and it is recorded both online and offline, with data being collected from smartphones, mobile apps, and various e-commerce portals. With such a huge volume of data to collect, store, and analyze, the organization is in a position to know what works and what does not work for its business. When the plan is executed well, it can help boost customer loyalty and revenue growth. Conversely, if customers are not pleased with the outcome, the organization will lose their loyalty and will not be able to sustain itself in a growing market. To grow in a competitive landscape, companies must focus on products that are aligned with customer needs and desires.
Selecting a Big Data tool
The market offers a plethora of Big Data tools that save time and money and help you make strategic business decisions and gain insights. Which tools fit depends on the areas your organization wants to focus on: there are tools for storage, extraction, cleaning, mining, visualization, and analysis. Choose the right data tool based on your team’s skill set and the requirements of your project.
Quote of the day: “A good data storage provider should offer you an infrastructure on which to run all your other analytics tools as well as a place to store and query your data.”
Below is a list of data tools that are setting the benchmark in the market:

Hadoop: It’s an open source Java framework that is primarily used for storing and analyzing Big Data. Hadoop helps in processing big data sets by splitting the data into small parts that are distributed across the nodes of a cluster. It is cost effective, with a low storage cost per terabyte, and it delivers fast computation by working on the parts in parallel. Many major tech companies like Yahoo, IBM, and Google use the Hadoop framework for advertising, search engine optimization, and similar workloads.
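The idea of splitting data into parts and processing them separately can be sketched in plain Python with the classic word count. This runs in a single process and does not use Hadoop itself; the function names and the sample lines are assumptions made for illustration, and in a real cluster each chunk would be handled by a different node.

```python
from collections import defaultdict


def split_into_chunks(lines, n_chunks):
    """Split the input into n_chunks parts, as if sending them to nodes."""
    return [lines[i::n_chunks] for i in range(n_chunks)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["big data tools", "big data analytics", "data mining"]
chunks = split_into_chunks(lines, 2)
counts = reduce_phase(shuffle(map_phase(chunk) for chunk in chunks))
print(counts)
```

Because each chunk is mapped independently, the map step can run on many machines at once, which is where the speed-up on large data sets comes from.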


Cloudera
Cloudera is mostly an enterprise solution to help businesses manage their Hadoop ecosystem. Essentially, they do a lot of the hard work of administering Hadoop for you. They will also deliver a certain amount of data security, which is highly important if you’re storing any sensitive or personal data. They can help your business build an enterprise data hub, to allow people in your organization better access to the data you are storing.
IBM Big Data Analytics
Like many other Big Data companies, IBM builds its offerings on Hadoop, so it’s fast, affordable, and open source. It allows businesses to capture, manage, and analyze structured and unstructured data with its BigInsights product. This is also available in the cloud (BigInsights on Cloud), providing Hadoop as a service with the benefits of outsourced storage and processing.
HP Big Data
In HP’s words, ‘when you’re ready to transform your infrastructure, HP can help you develop an IT architecture that provides the capacity to manage the volume, velocity, variety, veracity, and value of your data.’ The platform itself is based on Hadoop.
Microsoft
Microsoft’s Big Data solutions run on Hadoop and can be used either in the cloud or natively on Windows. Business users can use Hadoop to gain insights into their data using standard tools including Excel or Office 365. It can be integrated with core databases to analyze both structured and unstructured data and create sophisticated 3D visualizations.
Amazon Web Services
Amazon supports products like Hadoop, Pig, Hive, and Spark, enabling you to build your own solution on their platform and create your own Big Data stack. Amazon is a huge name in providing web hosting and other services, and the benefits of using them are unparalleled economies of scale and uptime. Amazon tends to offer a basic framework for customers to use, without providing much in the way of customer support.
Teradata
Teradata calls its Big Data product a ‘data warehouse system’, which stores and manages data. The different server nodes share nothing, each having its own memory and processing power, and each new node increases storage capacity. They also work with unstructured data gathered from online interactions.
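The "share nothing" idea can be illustrated with a few lines of code: every row is sent to exactly one node based on a hash of its key, and each node keeps its own private storage. This is a generic sketch of hash partitioning under assumed names (`Node`, `node_for`, `store`, `lookup`), not a description of how Teradata distributes data internally.

```python
import zlib


class Node:
    """One server with its own private storage (nothing is shared)."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


def node_for(key, nodes):
    """Pick the owning node from a stable hash of the row key."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def store(key, row, nodes):
    """Write the row to the single node that owns its key."""
    node_for(key, nodes).rows[key] = row


def lookup(key, nodes):
    """Read the row back from the owning node, or None if absent."""
    return node_for(key, nodes).rows.get(key)


nodes = [Node("node-0"), Node("node-1"), Node("node-2")]
for i in range(10):
    store(f"customer-{i}", {"id": i}, nodes)

print(lookup("customer-7", nodes))
print({node.name: len(node.rows) for node in nodes})
```

With this naive modulo scheme, adding a node changes the owner of most keys, so real systems use more careful redistribution strategies when the cluster grows.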
Tableau
Tableau offers significant flexibility over how you work with data. Using Tableau’s own servers and Desktop visualization with your existing big data storage makes it a versatile and powerful system. Tableau supports more than 30 databases and formats, and is easy to connect to and manage.
Informatica Big Data
Informatica has several options that make life easy by giving you access to the functionality and allow you to integrate all types of data efficiently without having to learn Hadoop itself. This makes for a fantastically versatile solution that is still simple enough to be used without intensive training.
QlikView
QlikView offers two Big Data solutions, enabling users to switch between them as they require. This offers exceptional performance, and other features further enhance response rates and make exploring very large data sets extremely fast.
Benefits of Big Data Tools:
1. Big Data tools like Hadoop allow businesses to store massive volumes of data at a much lower price.
2. One of the major advantages of Big Data analytics is that it gives businesses access to data that was previously unavailable or difficult to access.
3. Efficient data analysis and business modeling help optimize sales order delivery and store opening hours.
4. HR departments in some companies use Big Data analytics to optimize the workforce by hiring the right candidates with the right skill sets. For example, Xerox used Big Data to reduce the attrition rate in its call centers by 20%, as stated by InformationWeek.com.
Use Cases for Big Data Analysis

Many industries show how Big Data is used. Below is a diagram that shows the impact of Big Data in industries like healthcare and marketing.

Conclusion
Big Data isn’t just an emerging phenomenon. It’s already here and is being used by major companies to drive their business forward. As ever, which tool you decide on will depend on a number of factors. These include not just the nature of the data you are working with, but also organizational budgets, infrastructure, and the skill set of your team, amongst other things. Some tools are simple enough to use without intensive training, while others are intended to be more flexible but should only be used by those with coding expertise. With the rapid growth of technology, keeping pace with customers’ needs is a major challenge for companies. Their goals and services change according to the market. To cope with this trend, an organization must continually adapt its services and be flexible towards new business models that cater to the needs of its customers in all aspects. According to InformationWeek.com, Avis Budget has committed to doing all of this: it implemented an integrated strategy to increase market share, which has yielded hundreds of millions of dollars in additional revenue.

So any company that wants to thrive in the market must rely on customer feedback surveys and engage with social media to track and monitor customer satisfaction, which benefits both the business and its customers.
See you soon with more information on this in my next blog.