# Data Science with R Training

The Data Science with R training provides the skills required to work with real data sets and an opportunity to use data to make strategic and tactical recommendations. It introduces techniques such as linear and logistic regression, ANOVA, segmentation, ensemble models, SVM, and machine learning in big data. In addition to technical skills, the program helps students build the leadership and communication skills needed to advance their careers after graduation.

The training allows the learner to:

• Explore R data structures and syntax
• Read and write data from sources ranging from local files to cloud-hosted databases
• Work with data, get summaries, and transform the data to fit specific needs
• Explore R language fundamentals, including basic syntax, variables, and types
• Create functions and use control flow
• Read, write and work with data in R

## Target audience

• Professionals working as Data and Business Analysts
• Software professionals looking to move into the analytics field
• Individuals with an interest in the field of Data Science
• Graduates looking to build a career in Analytics and Data Science

## Prerequisites

The following are the prerequisites for joining Data Science with R training:

• Hold a Graduate, Master’s, or PhD degree in a STEM field
• Know the fundamentals of programming
• Know the basics of SQL
• Be familiar with basic math and statistics concepts

## Module 1: Essentials of R Programming

• An Introduction to R
• History of R
• The R environment
• What is Statistical Programming?
• Why use a command line?
• Introduction to the R language
• Starting and quitting R
• Basic features of R
• Calculating with R
• Named storage
• Functions
• Exact or approximate?
• R is case-sensitive
• Listing the objects in the workspace
• Vectors
• Extracting elements from vectors
• Vector arithmetic
• Simple patterned vectors
• Missing values and other special values
• Character vectors
• Factors
• More on extracting elements from vectors
• Matrices and arrays
• Data frames
• Dates and times
• Import and Export data in R
• Importing data into R
• CSV File
• Excel File
• Import data from text table
• SAS and SPSS datasets
• Exporting Data from R
• CSV File
• Text Table
• Excel File
• SAS dataset
• Merge / Join
• Inner Join
• Left Join
• Right Join
• Full Join
• Anti Join
• Semi Join
• Programming statistical graphics
• High-level plots
• Bar charts and dot charts
• Pie charts
• Histograms
• Box plots
• Scatterplots
• QQ plots
• Density Plot
• Choosing a high-level graphic
• Low-level graphics functions
• The plotting region and margins
• Setting graphical parameters
• Programming with R
• Flow control
• The for() loop
• The if() statement
• The while() loop
• The repeat loop, and the break and next statements
• apply()
• sapply()
• lapply()
• Managing complexity through functions
• What are functions?
• Scope of variables

## Module 2: Data Manipulation Techniques using R programming

• Data in R
• Modes and Classes
• Data Storage in R
• Testing for Modes and Classes
• Structure of R Objects
• Conversion of Objects
• Missing Values
• Working with Missing Values
• Comma- and Tab-Delimited Input Files
• Fixed-Width Input Files
• Extracting Data from R Objects
• Connections
• Generating Data
• Sequences
• Random Numbers
• Permutations
• Random Permutations
• Enumerating All Permutations
• Working with Sequences
• Spreadsheets
• The RODBC Package on Windows
• The gdata Package (All Platforms)
• Working with Binary Files
• Writing R Objects to Files in ASCII Format
• The write Function
• The write.table Function
• Reading Data from Other Programs
• Dates
• as.Date
• The chron Package
• POSIX Classes
• Working with Dates
• Time Intervals
• Time Sequences
• Current time
• Present date
• Factors
• Using Factors
• Numeric Factors
• Manipulating Factors
• Creating Factors from Continuous Variables
• Subscripting
• Basics of Subscripting
• Numeric Subscripts
• Character Subscripts
• Logical Subscripts
• Subscripting Matrices and Arrays
• Specialized Functions for Matrices
• Lists
• Subscripting Data Frames
• Character Manipulation
• Basics of Character Data
• Displaying and Concatenating Character
• Working with Parts of Character Values
• Regular Expressions in R
• Basics of Regular Expressions
• Breaking Apart Character Values
• Using Regular Expressions in R
• Substitutions and Tagging
• Reshaping Data
• Modifying Data Frame Variables
• Recoding Variables
• The recode Function
• Reshaping Data Frames
• The reshape Package
• Combining Data Frames
• Data Manipulation
• Random Selection of rows and columns
• Summarization
• Sort, Arrange
• Group by
• Filter
• Missing Value and Outlier
• Identify Missing values
• Impute missing values
• Identify Outliers
• Capping outliers

## Data Science

• Introduction to Statistics
• Types of Statistics
• Types of Data
• Descriptive Statistics
• Measures of Central Tendency
• Measures of Central Tendency – Usage Chart
• Measures of Dispersion / Variability
• Measures of Shape
• Application of Variance/Std Deviation
• Hypothesis Testing
• Applications of Hypothesis Testing (t-test and z-test)
• Steps in Hypothesis Testing
• ANOVA (Analysis of Variance)
• What is ANOVA
• ANOVA Steps
• Simple One-Way ANOVA
• Simple Two-Way ANOVA with Multiple Variables
• Chi Square Tests
• What is Chi-Square
• Applications of Chi-Square
• Correlation
• Types of Correlation
• Properties of Correlation
• Methods of Calculating Correlation
• Steps to Calculate Correlation
• Regression Analysis
• What is Regression
• Types of Regression Analysis
• Properties of The Regression Line
• Validating the Model
• Regression Assumptions
• Data Transformation for Regression
• Dummy Variable Analysis
• Variable Selection Procedure for Regression
• Forward Selection Procedure
• Backward Elimination Procedure
• Stepwise Regression Method
• Logistic Regression
• Likelihood Profiling
• Assumptions
• Variable Selection Method: WOE (Weight of Evidence) and IV (Information Value)
• Model Validation
• Model Performance
• Prediction
• Cluster Analysis
• What is a cluster
• Application of clustering
• Types of clustering
• K Means
• Dendrogram
• Validation of Cluster
• Decision Tree
• What is a decision tree
• How a decision tree works
• CART
• Pruning
• Overfitting
• Underfitting
• Model validation
• Model performance
• Market Basket Analysis (MBA)
• What is MBA
• Applications of MBA
• Support
• Confidence
• Lift
• Rules
• Random Forest
• What is a random forest
• Applications of random forest
• Tune parameters
• How to tune parameters
• Model validation
• Model performance
• Support Vector Machine
• What is a support vector machine
• Why use SVM
• Hyperplane
• Kernel
• Cost
• Gamma
• Model validation
• Model performance
• Naïve Bayes
• What is Naïve Bayes
• Bayes' theorem
• Conditional probability
• Prior probability
• Posterior probability
• Applications of Naïve Bayes
• Model validation
• Model performance
• ARIMA
• What is time series
• What is ARIMA
• Stationarity
• Seasonality
• Trend
• What are p, d, q
• How to find p, d, q
• Find best model
• Forecasting
• GBM
