Introduction to MS Excel
Functionalities for a Data Scientist
Data Analysis in Excel
Basic Data Manipulation Functions (Mean, Maximum, Round, Sum, etc.) | Statistical Functions | Filter, Sort, Lookup | Using Pivot Tables | Plotting in Excel – Usage of Visualization Capabilities.
Introduction to R
What is R? | What is Open Source? | Capabilities of R | GUI for R | R IDE – RStudio | Using R
Programming in R
Data Types | Operators in R | Data Input and Output | R Data Frames | R Statistics – Mean, Median, Mode, etc. | Data Manipulation in R – Counting, Merging, Appending, Sorting, Subsetting, Filtering, New Variable Creation, etc. | R Logical Statements – If/Else, Loops, etc. | Plotting – Graphs and Charts | Packages in R – Details of the Most Commonly Used Packages | Functions in R (High Level) | R Best Practices.
Analytics in R and Statistics
What is Statistics? | Data Types | Qualitative vs. Quantitative | Basic Operations Based on Data Type | Variables | Measurement Scales | Measures of Variance | Measures of Central Tendency | Correlation vs. Causation (Correlational vs. Experimental Research) | Sampling – Usage of Sampling | Distributions | Central Limit Theorem | Hypothesis Testing | Types of Hypothesis Testing | Introduction to ANOVA and Basics of Regression/Classification.
Introduction to Simple Linear Regression | Graphical Understanding of Regression (Scatter Plot, Box Plot, Density Plot) | Example Problem | Mathematics behind Regression | Assumptions for Linear Regression | Correlation (Linear and Non-Linear) | Introduction to Multiple Linear Regression | Building a Regression Model (Steps to Establish a Regression) | Data Preparation – Data Audit, Missing Values and Outliers
Building the Model | Linear Regression – Interpretation of Output and Diagnostics | Assessing the Coefficients | P-Value – Checking for Statistical Significance | R-Squared and Adjusted R-Squared | Standard Error and F-Statistic | How to Know Whether the Model Is the Best Fit for Your Data | Using a Linear Model for Predictions | Checking Accuracy and Error Rates | Heteroskedasticity | Model Improvement | Over-fitting and Cross-Validation | Multicollinearity and VIF | Do it Yourself Case | Flavor of Advanced Regression Models.
Why Logistic Regression? | Introduction to Classification and Challenges with Linear Regression | Event Rate and Class Bias | Example Problem (Some Real-World Examples of Binary Classification Problems) | Mechanics and Mathematics behind Logistic Regression | Assumptions for Logistic Regression | Building a Logistic Regression Model | Data Preparation – Data Audit, Missing Values and Outliers | Variable Importance and Feature Extraction | Creating WOE (Weight of Evidence) for Categorical Variables | Computing Information Value | Multicollinearity (VIF) | Building Logit Models | Predictions | Logistic Regression – Interpretation of Output | Coefficients | Variable Importance | Model Diagnostics | Misclassification Error and Confusion Matrix | ROC Curve | Accuracy | Specificity, Sensitivity and F-Score | Lift/Gain Charts and KS Curve | Model Improvement | Over-fitting and Cross-Validation | Flavor of Advanced Classification Concepts – Classification of Unstructured Data | Do it Yourself Case.
Time Series Modeling
Introduction to Time Series | Difference between Time Series, Cross-Sectional and Pooled Data | Example Problem (Some Real-World Examples of Time Series Problems) | Mechanics and Mathematical Fundamentals of Time Series Analysis | Assumptions for Time Series Analysis | Understanding Time Series Data | Visualizing Time Series Data | Stationary vs. Non-Stationary Data | Trend vs. Seasonality vs. White Noise | Decomposing Time Series Data | Decomposing Non-Seasonal Data | Decomposing Seasonal Data | Seasonal Adjustment | Forecasts Using Exponential Smoothing | Simple Exponential Smoothing | Holt’s Exponential Smoothing | Holt-Winters Exponential Smoothing | Challenges with Smoothing | ARIMA Models | Concept of Autocorrelation and Partial Autocorrelation | Differencing a Time Series | Selecting a Candidate ARIMA Model | Forecasting Using an ARIMA Model | Predictions and Diagnostics | Advanced Time Series Concepts | Do it Yourself Case.
Market Basket Analysis
Supervised, Unsupervised and Semi-supervised Algorithms | Concept of a Recommendation Engine | Example Problem (Real-World Examples of MBA Applications) | MBA Hyperparameters | Lift | Confidence | Support | Generating Output Using Association Rules | Filtering Rules | Removing Redundant Rules | Controlling the Rules | Finding Rules for a Particular Entity | Visualizing Rules | Challenges with Association Rules and Ways to Overcome Them | Advanced Recommendation Engine Concepts | Do it Yourself Case
Types of Classification Algorithms | Fundamentals of Tree-Based Systems | Concept of Impurity Measures | Building a Decision Tree Model | Prediction Using Decision Trees | Over-fitting and Cross-Validation | Flavor of Advanced Concepts in Trees (Random Forests) | Decision Boundary of Tree-Based Algorithms | Types of Tree Algorithms.
Unsupervised Algorithms and Introduction to Clustering | Example Problem (Some Real-World Examples of Clustering Applications) | Assumptions for Clustering | Mechanics of Clustering | Creating Clusters | Understanding the Output | Advanced Clustering Concepts | Do it Yourself Case.
Python for Analytics
Understanding Python | Categorization in Python | Visualization | Model Evaluation
Linear Regression + Logistic Regression + Time Series + Market Basket Analysis + Decision Trees