# R Fundamentals Training Course

bspkrintro

## Duration

21 hours (usually 3 days including breaks)

## Requirements

A good understanding of statistics.

## Overview

R is a free, open-source programming language for statistical computing, data analysis, and graphics. It is used by a growing number of managers and data analysts in corporations and academia. R has also found followers among statisticians, engineers, and scientists without computer programming skills, who find it easy to use. Its popularity stems from the increasing use of data mining for goals such as setting ad prices, finding new drugs more quickly, or fine-tuning financial models. R has a wide variety of packages for data mining.

## Course Outline

### Introduction and preliminaries

• Making R more friendly, R and available GUIs
• RStudio
• Related software and documentation
• R and statistics
• Using R interactively
• An introductory session
• Getting help with functions and features
• R commands, case sensitivity, etc.
• Recall and correction of previous commands
• Executing commands from or diverting output to a file
• Data permanency and removing objects

### Simple manipulations; numbers and vectors

• Vectors and assignment
• Vector arithmetic
• Generating regular sequences
• Logical vectors
• Missing values
• Character vectors
• Index vectors; selecting and modifying subsets of a data set
• Other types of objects
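
To give a flavour of the topics in this section, here is a minimal sketch in base R. The variable names and values are invented for illustration and are not taken from the course materials.

```r
# Vectors and assignment (values are illustrative)
x <- c(10.4, 5.6, 3.1, 6.4, 21.7)

# Vector arithmetic is element-wise
y <- 2 * x + 1

# Generating regular sequences
s <- seq(from = 0, to = 1, by = 0.25)

# Logical vectors, used as index vectors to select a subset
big <- x > 6
x_big <- x[big]

# Missing values: NA propagates unless explicitly removed
z <- c(1, NA, 3)
z_mean <- mean(z, na.rm = TRUE)

# Character vectors
labels <- paste("item", 1:3, sep = "_")
```

Indexing with a logical vector, as in `x[big]`, keeps only the elements for which the condition is `TRUE`.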

### Objects, their modes and attributes

• Intrinsic attributes: mode and length
• Changing the length of an object
• Getting and setting attributes
• The class of an object

### Ordered and unordered factors

• A specific example
• The function tapply() and ragged arrays
• Ordered factors

### Arrays and matrices

• Arrays
• Array indexing. Subsections of an array
• Index matrices
• The array() function
• Mixed vector and array arithmetic. The recycling rule
• The outer product of two arrays
• Generalized transpose of an array
• Matrix facilities
• Matrix multiplication
• Linear equations and inversion
• Eigenvalues and eigenvectors
• Singular value decomposition and determinants
• Least squares fitting and the QR decomposition
• Forming partitioned matrices, cbind() and rbind()
• The concatenation function, c(), with arrays
• Frequency tables from factors
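
A few of the matrix facilities listed above can be sketched in base R as follows. The matrix `A` and vector `b` are arbitrary illustrative values.

```r
# A 2 x 2 matrix, filled column by column (values are illustrative)
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)

# Matrix multiplication uses %*%; plain * would be element-wise
AA <- A %*% A

# Solve the linear system A x = b
x <- solve(A, b)

# Eigenvalues of the symmetric matrix A
ev <- eigen(A, symmetric = TRUE)$values

# Partitioned matrices with cbind() and rbind()
Ab <- cbind(A, b)        # 2 rows, 3 columns
stacked <- rbind(A, A)   # 4 rows, 2 columns

# Outer product of two vectors: o[i, j] is i * j
o <- outer(1:3, 1:2)
```

Calling `solve(A, b)` directly is generally preferable to computing the inverse with `solve(A)` and then multiplying by `b`.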

### Lists and data frames

• Lists
• Constructing and modifying lists
• Concatenating lists
• Data frames
• Making data frames
• attach() and detach()
• Working with data frames
• Attaching arbitrary lists
• Managing the search path

### Data manipulation

• Selecting, subsetting observations and variables
• Filtering, grouping
• Recoding, transformations
• Aggregation, combining data sets
• Character manipulation, stringr package

### Reading and exporting data

• Txt files
• CSV files
• XLS, XLSX files
• SPSS, SAS, Stata, … and other data formats
• Exporting data to txt, csv and other formats
• Accessing data from databases using SQL

### Probability distributions

• R as a set of statistical tables
• Examining the distribution of a set of data
• One- and two-sample tests

### Grouping, loops and conditional execution

• Grouped expressions
• Control statements
• Conditional execution: if statements
• Repetitive execution: for loops, repeat and while
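
The control statements listed above can be sketched as follows. The helper `classify()` and the numbers used are hypothetical, chosen only for illustration.

```r
# Conditional execution with if / else
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

# for loop: accumulate the sum of squares of 1..5
total <- 0
for (i in 1:5) {
  total <- total + i^2
}

# while loop: count how many halvings bring 100 below 1
n <- 100
steps <- 0
while (n >= 1) {
  n <- n / 2
  steps <- steps + 1
}

# repeat runs until an explicit break
k <- 0
repeat {
  k <- k + 1
  if (k >= 3) break
}
```

In practice, many loops of this kind can be replaced by vectorised expressions such as `sum((1:5)^2)`.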

### Writing your own functions

• Simple examples
• Defining new binary operators
• Named arguments and defaults
• The '...' argument
• Assignments within functions
• Efficiency factors in block designs
• Dropping all names in a printed array
• Recursive numerical integration
• Scope
• Customizing the environment
• Classes, generic functions and object orientation
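
As a minimal sketch of named arguments, defaults, user-defined binary operators, and the `...` argument: the functions `scale_by()`, `mean_of()`, and the operator `%+%` below are hypothetical names invented for this example.

```r
# A function with a named argument that has a default value
scale_by <- function(x, multiplier = 2) {
  x * multiplier
}

# A user-defined binary operator; its name must have the form %name%
"%+%" <- function(a, b) paste(a, b)

# The '...' argument forwards extra arguments to another function
mean_of <- function(x, ...) {
  mean(x, ...)
}

doubled <- scale_by(1:3)                    # uses the default multiplier
tenfold <- scale_by(1:3, multiplier = 10)   # overrides it by name
joined  <- "data" %+% "analysis"
avg     <- mean_of(c(1, NA, 3), na.rm = TRUE)  # na.rm travels through '...'
```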

### Statistical analysis in R

• Linear regression models
• Generic functions for extracting model information
• Updating fitted models
• Generalized linear models
• Families
• The glm() function
• Classification
• Logistic Regression
• Linear Discriminant Analysis
• Unsupervised learning
• Principal Components Analysis
• Clustering Methods (k-means, hierarchical clustering, k-medoids)
• Survival analysis
• Survival objects in R
• Kaplan-Meier estimate
• Confidence bands
• Cox PH models, constant covariates
• Cox PH models, time-dependent covariates

### Graphical procedures

• High-level plotting commands
• The plot() function
• Displaying multivariate data
• Display graphics
• Arguments to high-level plotting functions
• Basic visualisation graphs
• Multivariate relations with the lattice and ggplot2 packages
• Using graphics parameters
• Graphics parameters list

### Automated and interactive reporting

• Combining output from R with text
• Creating HTML and PDF documents
