Professional Data Science


1. Exploratory Data Analysis
Elements of Structured Data
Further Reading
Rectangular Data
Data Frames and Indexes
Nonrectangular Data Structures
Further Reading
Estimates of Location
Median and Robust Estimates
Further Reading
Estimates of Variability
Standard Deviation and Related Estimates
Estimates Based on Percentiles
Further Reading
Exploring the Data Distribution
Percentiles and Boxplots
Frequency Table and Histograms
Density Estimates
Further Reading
Exploring Binary and Categorical Data
Expected Value
Further Reading
Exploring Two or More Variables
Hexagonal Binning and Contours (Plotting Numeric versus Numeric Data)
Two Categorical Variables
Categorical and Numeric Data
Visualizing Multiple Variables

2. Data and Sampling Distributions
Random Sampling and Sample Bias
Random Selection
Size versus Quality: When Does Size Matter?
Sample Mean versus Population Mean
Further Reading
Selection Bias
Regression to the Mean
Further Reading
Sampling Distribution of a Statistic
Central Limit Theorem
Standard Error
Further Reading
The Bootstrap
Resampling versus Bootstrapping
Further Reading
Confidence Intervals
Further Reading
Normal Distribution
Standard Normal and QQ-Plots
Long-Tailed Distributions
Further Reading
Student’s t-Distribution
Further Reading
Binomial Distribution
Further Reading
Poisson and Related Distributions
Poisson Distributions
Exponential Distribution
Estimating the Failure Rate
Weibull Distribution

3. Statistical Experiments and Significance Testing
A/B Testing
Why Have a Control Group?
Why Just A/B? Why Not C, D…?
For Further Reading
Hypothesis Tests
The Null Hypothesis
Alternative Hypothesis
One-Way, Two-Way Hypothesis Test
Further Reading
Permutation Test
Exhaustive and Bootstrap Permutation Test
Permutation Tests: The Bottom Line for Data Science
For Further Reading
Statistical Significance and P-Values
Type 1 and Type 2 Errors
Data Science and P-Values
Further Reading
Multiple Testing
Further Reading
Degrees of Freedom
Further Reading
Chi-Square Test
Chi-Square Test: A Resampling Approach
Chi-Square Test: Statistical Theory
Fisher’s Exact Test
Relevance for Data Science
Further Reading
Multi-Arm Bandit Algorithm
Further Reading
Power and Sample Size
Sample Size

4. Regression and Prediction
Simple Linear Regression
The Regression Equation
Fitted Values and Residuals
Least Squares
Prediction versus Explanation (Profiling)
Further Reading
Multiple Linear Regression
Assessing the Model
Model Selection and Stepwise Regression
Weighted Regression
Further Reading
Prediction Using Regression
The Dangers of Extrapolation
Confidence and Prediction Intervals
Factor Variables in Regression
Dummy Variables Representation
Factor Variables with Many Levels
Ordered Factor Variables
Interpreting the Regression Equation
Correlated Predictors
Confounding Variables
Interactions and Main Effects
Testing the Assumptions: Regression Diagnostics
Influential Values
Heteroskedasticity, Non-Normality and Correlated Errors
Partial Residual Plots and Nonlinearity
Polynomial and Spline Regression
Generalized Additive Models

5. Classification
Naive Bayes
Why Exact Bayesian Classification Is Impractical
The Naive Solution
Numeric Predictor Variables
Further Reading
Discriminant Analysis
Covariance Matrix
Fisher’s Linear Discriminant
A Simple Example
Further Reading
Logistic Regression
Logistic Response Function and Logit
Logistic Regression and the GLM
Generalized Linear Models
Predicted Values from Logistic Regression
Interpreting the Coefficients and Odds Ratios
Linear and Logistic Regression: Similarities and Differences
Assessing the Model
Further Reading
Evaluating Classification Models
Confusion Matrix
The Rare Class Problem
Precision, Recall, and Specificity
ROC Curve
Further Reading
Strategies for Imbalanced Data
Oversampling and Up/Down Weighting
Data Generation
Cost-Based Classification
Exploring the Predictions

6. Statistical Machine Learning
K-Nearest Neighbors
A Small Example: Predicting Loan Default
Distance Metrics
One Hot Encoder
Standardization (Normalization, Z-Scores)
Choosing K
KNN as a Feature Engine
Tree Models
A Simple Example
The Recursive Partitioning Algorithm
Measuring Homogeneity or Impurity
Stopping the Tree from Growing
Predicting a Continuous Value
How Trees Are Used
Further Reading
Bagging and the Random Forest
Random Forest
Variable Importance
The Boosting Algorithm
Regularization: Avoiding Overfitting
Hyperparameters and Cross-Validation

7. Unsupervised Learning
Principal Components Analysis
A Simple Example
Computing the Principal Components
Interpreting Principal Components
Further Reading
K-Means Clustering
A Simple Example
K-Means Algorithm
Interpreting the Clusters
Selecting the Number of Clusters
Hierarchical Clustering
A Simple Example
The Dendrogram
The Agglomerative Algorithm
Measures of Dissimilarity
Model-Based Clustering
Multivariate Normal Distribution
Mixtures of Normals
Selecting the Number of Clusters
Further Reading
Scaling and Categorical Variables
Scaling the Variables
Dominant Variables
Categorical Data and Gower’s Distance
Problems with Clustering Mixed Data


1. The Machine Learning Landscape
What Is Machine Learning?
Why Use Machine Learning?
Types of Machine Learning Systems
Supervised/Unsupervised Learning
Batch and Online Learning
Instance-Based Versus Model-Based Learning
Main Challenges of Machine Learning
Insufficient Quantity of Training Data
Nonrepresentative Training Data
Poor-Quality Data
Irrelevant Features
Overfitting the Training Data
Underfitting the Training Data
Stepping Back
Testing and Validating

2. End-to-End Machine Learning Project
Working with Real Data
Look at the Big Picture
Frame the Problem
Select a Performance Measure
Check the Assumptions
Get the Data
Create the Workspace
Download the Data
Take a Quick Look at the Data Structure
Create a Test Set
Discover and Visualize the Data to Gain Insights
Visualizing Geographical Data
Looking for Correlations
Experimenting with Attribute Combinations
Prepare the Data for Machine Learning Algorithms
Data Cleaning
Handling Text and Categorical Attributes
Custom Transformers
Feature Scaling
Transformation Pipelines
Select and Train a Model
Training and Evaluating on the Training Set
Better Evaluation Using Cross-Validation
Fine-Tune Your Model
Grid Search
Randomized Search
Ensemble Methods
Analyze the Best Models and Their Errors
Evaluate Your System on the Test Set
Launch, Monitor, and Maintain Your System

3. Classification
Training a Binary Classifier
Performance Measures
Measuring Accuracy Using Cross-Validation
Confusion Matrix
Precision and Recall
Precision/Recall Tradeoff
The ROC Curve
Multiclass Classification
Error Analysis
Multilabel Classification
Multioutput Classification

4. Training Models
Linear Regression
The Normal Equation
Computational Complexity
Gradient Descent
Batch Gradient Descent
Stochastic Gradient Descent
Mini-batch Gradient Descent
Polynomial Regression
Learning Curves
Regularized Linear Models
Ridge Regression
Lasso Regression
Elastic Net
Early Stopping
Logistic Regression
Estimating Probabilities
Training and Cost Function
Decision Boundaries
Softmax Regression

5. Support Vector Machines
Linear SVM Classification
Soft Margin Classification
Nonlinear SVM Classification
Polynomial Kernel
Adding Similarity Features
Gaussian RBF Kernel
Computational Complexity
SVM Regression
Under the Hood
Decision Function and Predictions
Training Objective
Quadratic Programming
The Dual Problem
Kernelized SVM
Online SVMs

6. Decision Trees
Training and Visualizing a Decision Tree
Making Predictions
Estimating Class Probabilities
The CART Training Algorithm
Computational Complexity
Gini Impurity or Entropy?
Regularization Hyperparameters

7. Ensemble Learning and Random Forests
Voting Classifiers
Bagging and Pasting
Bagging and Pasting in Scikit-Learn
Out-of-Bag Evaluation
Random Patches and Random Subspaces
Random Forests
Feature Importance
Gradient Boosting

8. Dimensionality Reduction
The Curse of Dimensionality
Main Approaches for Dimensionality Reduction
Manifold Learning
Preserving the Variance
Principal Components
Projecting Down to d Dimensions
Using Scikit-Learn
Explained Variance Ratio
Choosing the Right Number of Dimensions
PCA for Compression
Incremental PCA
Randomized PCA
Kernel PCA
Selecting a Kernel and Tuning Hyperparameters

9. Up and Running with TensorFlow
Creating Your First Graph and Running It in a Session
Managing Graphs
Lifecycle of a Node Value
Linear Regression with TensorFlow
Implementing Gradient Descent
Manually Computing the Gradients
Using autodiff
Using an Optimizer
Feeding Data to the Training Algorithm
Saving and Restoring Models
Visualizing the Graph and Training Curves Using TensorBoard
Name Scopes
Sharing Variables

10. Artificial Neural Networks
From Biological to Artificial Neurons
Biological Neurons
Logical Computations with Neurons
The Perceptron
Multi-Layer Perceptron and Backpropagation
Training an MLP with TensorFlow’s High-Level API
Training a DNN Using Plain TensorFlow
Construction Phase
Execution Phase
Using the Neural Network
Fine-Tuning Neural Network Hyperparameters
Number of Hidden Layers
Number of Neurons per Hidden Layer
Activation Functions

11. Training Deep Neural Nets
Vanishing/Exploding Gradients Problems
Xavier and He Initialization
Nonsaturating Activation Functions
Batch Normalization
Gradient Clipping
Reusing Pretrained Layers
Reusing a TensorFlow Model
Reusing Models from Other Frameworks
Freezing the Lower Layers
Caching the Frozen Layers
Tweaking, Dropping, or Replacing the Upper Layers
Model Zoos
Unsupervised Pretraining
Pretraining on an Auxiliary Task
Faster Optimizers
Momentum Optimization
Nesterov Accelerated Gradient
Adam Optimization
Learning Rate Scheduling
Avoiding Overfitting Through Regularization
Early Stopping
ℓ1 and ℓ2 Regularization
Max-Norm Regularization
Data Augmentation
Practical Guidelines

12. Distributing TensorFlow Across Devices and Servers
Multiple Devices on a Single Machine
Managing the GPU RAM
Placing Operations on Devices
Parallel Execution
Control Dependencies
Multiple Devices Across Multiple Servers
Opening a Session
The Master and Worker Services
Pinning Operations Across Tasks
Sharding Variables Across Multiple Parameter Servers
Sharing State Across Sessions Using Resource Containers
Asynchronous Communication Using TensorFlow Queues
Loading Data Directly from the Graph
Parallelizing Neural Networks on a TensorFlow Cluster
One Neural Network per Device
In-Graph Versus Between-Graph Replication
Model Parallelism
Data Parallelism

13. Convolutional Neural Networks
The Architecture of the Visual Cortex
Convolutional Layer
Stacking Multiple Feature Maps
TensorFlow Implementation
Memory Requirements
Pooling Layer
CNN Architectures

14. Recurrent Neural Networks
Recurrent Neurons
Memory Cells
Input and Output Sequences
Basic RNNs in TensorFlow
Static Unrolling Through Time
Dynamic Unrolling Through Time
Handling Variable-Length Input Sequences
Handling Variable-Length Output Sequences
Training RNNs
Training a Sequence Classifier
Training to Predict Time Series
Creative RNN
Deep RNNs
Distributing a Deep RNN Across Multiple GPUs
Applying Dropout
The Difficulty of Training over Many Time Steps
Peephole Connections
GRU Cell
Natural Language Processing
Word Embeddings
An Encoder–Decoder Network for Machine Translation

15. Autoencoders
Efficient Data Representations
Performing PCA with an Undercomplete Linear Autoencoder
Stacked Autoencoders
TensorFlow Implementation
Tying Weights
Training One Autoencoder at a Time
Visualizing the Reconstructions
Visualizing Features
Unsupervised Pretraining Using Stacked Autoencoders
Denoising Autoencoders
TensorFlow Implementation
Sparse Autoencoders
TensorFlow Implementation
Variational Autoencoders
Generating Digits
Other Autoencoders

16. Reinforcement Learning
Learning to Optimize Rewards
Policy Search
Introduction to OpenAI Gym
Neural Network Policies
Evaluating Actions: The Credit Assignment Problem
Policy Gradients
Markov Decision Processes
Temporal Difference Learning and Q-Learning
Exploration Policies
Approximate Q-Learning and Deep Q-Learning
Learning to Play Ms. Pac-Man Using the DQN Algorithm


1 Linear Algebra
Scalars, Vectors, Matrices and Tensors
Multiplying Matrices and Vectors
Identity and Inverse Matrices
Linear Dependence and Span
Special Kinds of Matrices and Vectors
Singular Value Decomposition
The Moore-Penrose Pseudoinverse
The Trace Operator
The Determinant

2 Probability and Information Theory
Why Probability?
Random Variables
Probability Distributions
Marginal Probability
Conditional Probability
The Chain Rule of Conditional Probabilities
Independence and Conditional Independence
Expectation, Variance and Covariance
Common Probability Distributions
Useful Properties of Common Functions
Bayes’ Rule
Technical Details of Continuous Variables
Information Theory
Structured Probabilistic Models

3 Numerical Computation
Overflow and Underflow
Poor Conditioning
Gradient-Based Optimization
Constrained Optimization

4 Machine Learning Basics
Learning Algorithms
Capacity, Overfitting and Underfitting
Hyperparameters and Validation Sets
Estimators, Bias and Variance
Maximum Likelihood Estimation
Bayesian Statistics
Supervised Learning Algorithms
Unsupervised Learning Algorithms
Stochastic Gradient Descent
Building a Machine Learning Algorithm
Challenges Motivating Deep Learning

5 Deep Feedforward Networks
Gradient-Based Learning
Hidden Units
Architecture Design
Back-Propagation and Other Differentiation Algorithms
Historical Notes

6 Regularization for Deep Learning
Parameter Norm Penalties
Norm Penalties as Constrained Optimization
Regularization and Under-Constrained Problems
Dataset Augmentation
Noise Robustness
Semi-Supervised Learning
Multi-Task Learning
Early Stopping
Parameter Tying and Parameter Sharing
Sparse Representations
Bagging and Other Ensemble Methods
Adversarial Training
Tangent Distance, Tangent Prop, and Manifold Tangent Classifier

7 Optimization for Training Deep Models
How Learning Differs from Pure Optimization
Challenges in Neural Network Optimization
Basic Algorithms
Parameter Initialization Strategies
Algorithms with Adaptive Learning Rates
Approximate Second-Order Methods
Optimization Strategies and Meta-Algorithms

8 Convolutional Networks
The Convolution Operation
Convolution and Pooling as an Infinitely Strong Prior
Variants of the Basic Convolution Function
Structured Outputs
Data Types
Efficient Convolution Algorithms
Random or Unsupervised Features
The Neuroscientific Basis for Convolutional Networks
Convolutional Networks and the History of Deep Learning

9 Sequence Modeling: Recurrent and Recursive Nets
Unfolding Computational Graphs
Recurrent Neural Networks
Bidirectional RNNs
Encoder-Decoder Sequence-to-Sequence Architectures
Deep Recurrent Networks
Recursive Neural Networks
The Challenge of Long-Term Dependencies
Echo State Networks
Leaky Units and Other Strategies for Multiple Time Scales
The Long Short-Term Memory and Other Gated RNNs
Optimization for Long-Term Dependencies
Explicit Memory

10 Practical Methodology
Performance Metrics
Default Baseline Models
Determining Whether to Gather More Data
Selecting Hyperparameters
Debugging Strategies

11 Applications
Large-Scale Deep Learning
Computer Vision
Speech Recognition
Natural Language Processing

12 Linear Factor Models
Probabilistic PCA and Factor Analysis
Independent Component Analysis (ICA)
Slow Feature Analysis
Sparse Coding
Manifold Interpretation of PCA

13 Autoencoders
Undercomplete Autoencoders
Regularized Autoencoders
Representational Power, Layer Size and Depth
Stochastic Encoders and Decoders
Denoising Autoencoders
Learning Manifolds with Autoencoders
Contractive Autoencoders
Predictive Sparse Decomposition
Applications of Autoencoders

14 Representation Learning
Greedy Layer-Wise Unsupervised Pretraining
Transfer Learning and Domain Adaptation
Semi-Supervised Disentangling of Causal Factors
Distributed Representation
Exponential Gains from Depth
Providing Clues to Discover Underlying Causes

15 Structured Probabilistic Models for Deep Learning
The Challenge of Unstructured Modeling
Using Graphs to Describe Model Structure
Sampling from Graphical Models
Advantages of Structured Modeling
Learning about Dependencies
Inference and Approximate Inference
The Deep Learning Approach to Structured Probabilistic Models

16 Monte Carlo Methods
Sampling and Monte Carlo Methods
Importance Sampling
Markov Chain Monte Carlo Methods
Gibbs Sampling
The Challenge of Mixing between Separated Modes

17 Confronting the Partition Function
The Log-Likelihood Gradient
Stochastic Maximum Likelihood and Contrastive Divergence
Score Matching and Ratio Matching
Denoising Score Matching
Noise-Contrastive Estimation
Estimating the Partition Function

18 Approximate Inference
Inference as Optimization
Expectation Maximization
MAP Inference and Sparse Coding
Variational Inference and Learning
Learned Approximate Inference

19 Deep Generative Models
Boltzmann Machines
Restricted Boltzmann Machines
Deep Belief Networks
Deep Boltzmann Machines
Boltzmann Machines for Real-Valued Data
Convolutional Boltzmann Machines
Boltzmann Machines for Structured or Sequential Outputs
Other Boltzmann Machines
Back-Propagation through Random Operations
Directed Generative Nets
Drawing Samples from Autoencoders
Generative Stochastic Networks
Other Generation Schemes
Evaluating Generative Models


1 Basic Image Handling and Processing
PIL – the Python Imaging Library

2 Local Image Descriptors
Harris corner detector
SIFT – Scale-Invariant Feature Transform
Matching Geotagged Images

3 Image to Image Mappings
Warping images
Creating Panoramas

4 Camera Models and Augmented Reality
The Pin-hole Camera Model
Camera Calibration
Pose Estimation from Planes and Markers
Augmented Reality

5 Multiple View Geometry
Epipolar Geometry
Computing with Cameras and 3D Structure
Multiple View Reconstruction
Stereo Images

6 Clustering Images
K-means Clustering
Hierarchical Clustering
Spectral Clustering

7 Searching Images
Content-based Image Retrieval
Visual Words
Indexing Images
Searching the Database for Images
Ranking Results using Geometry
Building Demos and Web Applications

8 Classifying Image Content
K-Nearest Neighbors
Bayes Classifier
Support Vector Machines
Optical Character Recognition

9 Image Segmentation
Graph Cuts
Segmentation using Clustering
Variational Methods

10 OpenCV
The OpenCV Python Interface
OpenCV Basics
Processing Video


1. Up and Running with TensorFlow
2. Understanding TensorFlow Basics
3. Convolutional Neural Networks
4. Working with Text and Sequences, and TensorBoard Visualization
5. Word Vectors, Advanced RNN and Embedding Visualization
6. TensorFlow Abstractions and Simplifications
7. Queues, Threads and Reading Data
8. Distributed TensorFlow
9. Exporting and Serving Models with TensorFlow


1. Language Processing and Python
2. Accessing Text Corpora and Lexical Resources
3. Processing Raw Text
4. Writing Structured Programs
5. Categorizing and Tagging Words
6. Learning to Classify Text
7. Extracting Information from Text
8. Analyzing Sentence Structure
9. Building Feature-Based Grammars
10. Analyzing the Meaning of Sentences
11. Managing Linguistic Data


1. Rosenblatt’s Perceptron
2. Model Building through Regression
3. The Least-Mean-Square Algorithm
4. Multilayer Perceptrons
5. Kernel Methods and Radial-Basis Function Networks
6. Support Vector Machines
7. Regularization Theory
8. Principal-Components Analysis
9. Self-Organizing Maps
10. Information-Theoretic Learning Models
11. Stochastic Methods Rooted in Statistical Mechanics
12. Dynamic Programming
14. Bayesian Filtering for State Estimation of Dynamic Systems
15. Dynamically Driven Recurrent Networks