## Data Science with R and Python Course Details

With artificial intelligence (AI) continuing to be a prominent buzzword in 2019, the race to build AI and machine learning into products and services across every industry has caused a job boom in the field. Most organizations across the globe have already begun to see the capabilities of AI, using it to enhance human intelligence and gain real value from their data.

The global machine learning (ML) market is estimated to grow from $1.4 billion in 2017 to $8.8 billion by 2022. AI is projected to create 2.3 million related jobs by 2020. Artificial Intelligence (AI) will define the next generation of software solutions. Human-like capabilities such as understanding natural language, speech, vision, and making inferences from knowledge will extend software beyond the app.

## Course Curriculum

#### Basic Probability and Terms

Events and their Probabilities | Rules of Probability | Conditional Probability and Independence | Permutations and Combinations | Bayes' Theorem | Descriptive Statistics | Compound Probability.

#### Probability Distributions

Types of Distributions | Functions of Random Variables | Probability Distribution Graphs | Confidence Intervals.

#### Data Transformations and Quality Analysis

Merge, Rollup, Transpose and Append | Missing Value Analysis and Treatment | Outlier Analysis and Treatment

#### Exploratory Data Analysis

Summarizing and Visualizing the Important Characteristics of Data | Hypothesis Testing | Visualizations | Univariates, Bivariates | Crosstabs, Correlation

#### Linear Regression

Implementing Simple & Multiple Linear Regression with Python | Making Sense of Result Parameters | Model Validation | Handling Other Issues/Assumptions in Linear Regression: Handling Outliers, Categorical Variables, Autocorrelation, Multicollinearity, Heteroskedasticity | Prediction and Confidence Intervals | Use Cases

#### Logistic Regression

Implementing Logistic Regression with Python | Making Sense of Result Parameters: Wald Test, Likelihood Ratio Test Statistic, Chi-Square Test | Goodness of Fit Measures | Model Validation: Cross Validation, ROC Curve, Confusion Matrix | Use Cases

#### Decision Trees

Implementing Decision Trees using Python | Homogeneity | Entropy and Information Gain | Gini Index | Standard Deviation Reduction | Visualizing & Pruning a Tree | Implementing Random Forests using Python | Random Forest Algorithm | Important Hyperparameters of Random Forest for Tuning the Model | Variable Importance | Out-of-Bag Errors

#### Pandas

Introduction to Pandas | IO Tools | Basics of NumPy | NumPy Functions | Pandas – Series and DataFrames

#### Scikit-learn

Introduction to Scikit-learn | Load Data into Scikit-learn | Run Machine Learning Algorithms for Both Supervised and Unsupervised Data | Supervised Methods: Classification & Regression | Unsupervised Methods: Clustering, Gaussian Mixture Models | Decide What’s the Best Model for Every Scenario.

#### 3 Projects

Linear Regression + Logistic Regression + Decision Trees

#### Keras

Keras for Classification and Regression in Typical Data Science Problems | Setting up Keras | Different Layers in Keras | Creating a Neural Network | Training Models and Monitoring | Artificial Neural Network

#### TensorFlow

Introducing TensorFlow | Neural Networks using TensorFlow | Debugging and Monitoring | Convolutional Neural Networks | Unsupervised Learning

#### 2 Projects

ANN + CNN

#### Introduction to R

What is R? | What is Open Source? | Capabilities of R | GUI for R | R IDE – RStudio | Using R

#### Programming in R

Data Types | Operators in R | Data Input and Output | R Data Frames | R Statistics – Mean, Median, Mode etc. | Data Manipulation in R – Counting, Merging, Append, Sort, Subset, Filter, New Variable Creation etc. | R Logical Statements – If/Else, Loops etc. | Plotting – Graphs and Charts | Packages in R – Details of the Most Commonly Used Packages | Functions in R (High Level) | R – Best Practices.

#### Introduction to Statistics

What is Statistics? | Data Types | Qualitative vs. Quantitative | Basic Operations Based on Data Type | Variables | Measurement Scales | Measures of Variance | Measures of Central Tendency | Correlation vs. Causation (Correlational vs. Experimental Research) | Sampling – Usage of Sampling | Distributions | Central Limit Theorem | Hypothesis Testing | Types of Hypothesis Testing | Introduction to ANOVA and Basics of Regression/Classification.

#### Linear Regression

Implementing Simple & Multiple Linear Regression with R | Making Sense of Result Parameters | Model Validation | Handling Other Issues/Assumptions in Linear Regression: Handling Outliers, Categorical Variables, Autocorrelation, Multicollinearity, Heteroskedasticity | Prediction and Confidence Intervals | Use Cases.

#### Logistic Regression

Implementing Logistic Regression with R | Making Sense of Result Parameters: Wald Test, Likelihood Ratio Test Statistic, Chi-Square Test | Goodness of Fit Measures | Model Validation: Cross Validation, ROC Curve, Confusion Matrix | Use Cases.

#### Decision Trees

Implementing Decision Trees using R | Homogeneity | Entropy and Information Gain | Gini Index | Standard Deviation Reduction | Visualizing & Pruning a Tree | Implementing Random Forests using R | Random Forest Algorithm | Important Hyperparameters of Random Forest for Tuning the Model | Variable Importance | Out-of-Bag Errors.

#### 3 Projects

Linear Regression + Logistic Regression + Decision Trees