Professional Data Science

1. Introduction to Big Data Analytics:

1.1 Big Data Overview

1.1.1 Data Structures

1.1.2 Analyst Perspective on Data Repositories

1.2 State of the Practice in Analytics

1.2.1 BI Versus Data Science

1.2.2 Current Analytical Architecture

1.2.3 Drivers of Big Data

1.2.4 Emerging Big Data Ecosystem and a New Approach to Analytics

1.3 Key Roles for the New Big Data Ecosystem

1.4 Examples of Big Data Analytics

 

2. Data Analytics Lifecycle:

2.1 Data Analytics Lifecycle Overview

2.1.1 Key Roles for a Successful Analytics Project

2.1.2 Background and Overview of Data Analytics Lifecycle

2.2 Phase 1: Discovery

2.2.1 Learning the Business Domain

2.2.2 Resources

2.2.3 Framing the Problem

2.2.4 Identifying Key Stakeholders

2.2.5 Interviewing the Analytics Sponsor

2.2.6 Developing Initial Hypotheses

2.2.7 Identifying Potential Data Sources

2.3 Phase 2: Data Preparation

2.3.1 Preparing the Analytic Sandbox

2.3.2 Performing ETLT

2.3.3 Learning About the Data

2.3.4 Data Conditioning

2.3.5 Survey and Visualize

2.3.6 Common Tools for the Data Preparation Phase

2.4 Phase 3: Model Planning

2.4.1 Data Exploration and Variable Selection

2.4.2 Model Selection

2.4.3 Common Tools for the Model Planning Phase

2.5 Phase 4: Model Building

2.5.1 Common Tools for the Model Building Phase

2.6 Phase 5: Communicate Results

2.7 Phase 6: Operationalize

2.8 Case Study: Global Innovation Network and Analysis (GINA)

2.8.1 Phase 1: Discovery

2.8.2 Phase 2: Data Preparation

2.8.3 Phase 3: Model Planning

2.8.4 Phase 4: Model Building

2.8.5 Phase 5: Communicate Results

2.8.6 Phase 6: Operationalize

 

3. Review of Basic Data Analytic Methods Using R:

3.1 Introduction to R

3.1.1 R Graphical User Interfaces

3.1.2 Data Import and Export

3.1.3 Attribute and Data Types

3.1.4 Descriptive Statistics

3.2 Exploratory Data Analysis

3.2.1 Visualization Before Analysis

3.2.2 Dirty Data

3.2.3 Visualizing a Single Variable

3.2.4 Examining Multiple Variables

3.2.5 Data Exploration Versus Presentation

3.3 Statistical Methods for Evaluation

3.3.1 Hypothesis Testing

3.3.2 Difference of Means

3.3.3 Wilcoxon Rank-Sum Test

3.3.4 Type I and Type II Errors

3.3.5 Power and Sample Size

3.3.6 ANOVA

 

4. Advanced Analytical Theory and Methods: Clustering:

4.1 Overview of Clustering

4.2 K-means

4.2.1 Use Cases

4.2.2 Overview of the Method

4.2.3 Determining the Number of Clusters

4.2.4 Diagnostics

4.2.5 Reasons to Choose and Cautions

4.3 Additional Algorithms

 

5. Advanced Analytical Theory and Methods: Association Rules:

5.1 Overview

5.2 Apriori Algorithm

5.3 Evaluation of Candidate Rules

5.4 Applications of Association Rules

5.5 An Example: Transactions in a Grocery Store

5.5.1 The Groceries Dataset

5.5.2 Frequent Itemset Generation

5.5.3 Rule Generation and Visualization

5.6 Validation and Testing

5.7 Diagnostics

 

6. Advanced Analytical Theory and Methods: Regression:

6.1 Linear Regression

6.1.1 Use Cases

6.1.2 Model Description

6.1.3 Diagnostics

6.2 Logistic Regression

6.2.1 Use Cases

6.2.2 Model Description

6.2.3 Diagnostics

6.3 Reasons to Choose and Cautions

6.4 Additional Regression Models

 

7. Advanced Analytical Theory and Methods: Classification:

7.1 Decision Trees

7.1.1 Overview of a Decision Tree

7.1.2 The General Algorithm

7.1.3 Decision Tree Algorithms

7.1.4 Evaluating a Decision Tree

7.1.5 Decision Trees in R

7.2 Naïve Bayes

7.2.1 Bayes’ Theorem

7.2.2 Naïve Bayes Classifier

7.2.3 Smoothing

7.2.4 Diagnostics

7.2.5 Naïve Bayes in R

7.3 Diagnostics of Classifiers

7.4 Additional Classification Methods

 

8. Advanced Analytical Theory and Methods: Time Series Analysis:

8.1 Overview of Time Series Analysis

8.1.1 Box-Jenkins Methodology

8.2 ARIMA Model

8.2.1 Autocorrelation Function (ACF)

8.2.2 Autoregressive Models

8.2.3 Moving Average Models

8.2.4 ARMA and ARIMA Models

8.2.5 Building and Evaluating an ARIMA Model

8.2.6 Reasons to Choose and Cautions

8.3 Additional Methods

9. Advanced Analytical Theory and Methods: Text Analysis:

9.1 Text Analysis Steps

9.2 A Text Analysis Example

9.3 Collecting Raw Text

9.4 Representing Text

9.5 Term Frequency—Inverse Document Frequency (TFIDF)

9.6 Categorizing Documents by Topics

9.7 Determining Sentiments

9.8 Gaining Insights

 

10. Advanced Analytics - Technology and Tools: MapReduce and Hadoop:

10.1 Analytics for Unstructured Data

10.1.1 Use Cases

10.1.2 MapReduce

10.1.3 Apache Hadoop

10.2 The Hadoop Ecosystem

10.2.1 Pig

10.2.2 Hive

10.2.3 HBase

10.2.4 Mahout

10.3 NoSQL

 

11. Advanced Analytics - Technology and Tools: In-Database Analytics:

11.1 SQL Essentials

11.1.1 Joins

11.1.2 Set Operations

11.1.3 Grouping Extensions

11.2 In-Database Text Analysis

11.3 Advanced SQL

11.3.1 Window Functions

11.3.2 User-Defined Functions and Aggregates

11.3.3 Ordered Aggregates

11.3.4 MADlib

 

12. The Endgame, or Putting It All Together:

12.1 Communicating and Operationalizing an Analytics Project

12.2 Creating the Final Deliverables

12.2.1 Developing Core Material for Multiple Audiences

12.2.2 Project Goals

12.2.3 Main Findings

12.2.4 Approach

12.2.5 Model Description

12.2.6 Key Points Supported with Data

12.2.7 Model Details

12.2.8 Recommendations

12.2.9 Additional Tips on Final Presentation

12.2.10 Providing Technical Specifications and Code

12.3 Data Visualization Basics

12.3.1 Key Points Supported with Data

12.3.2 Evolution of a Graph

12.3.3 Common Representation Methods

12.3.4 How to Clean Up a Graphic

12.3.5 Additional Considerations