Course Leader: Dr Tomasz Szubert
Home Institution: WSB University
Course prerequisite: Basic mathematics
Course Overview
The course has two main goals: to show how to work in R (an advanced yet completely free environment for mathematical and statistical computing), and to present the most important analytical techniques. These range from describing the distribution of a single variable, through searching for relationships between several variables, modeling cause-and-effect relationships and creating dynamic forecasts, up to selected methods of multivariate analysis (e.g. cluster analysis and classification trees). After completing the course, students will be able to perform professional statistical analyses on various types of data, from simple single-variable series to multidimensional tables and even spatial data.
Learning Outcomes
After completing the course, the student will be able to:
- import data in various formats (XLS, CSV, etc.) into R, transform variables (aggregate and recode them), define selection conditions, write their own calculation functions, present data in charts, and export results for use in other programs
- produce a concise summary description of the analyzed variables (using measures of central tendency and variability, measures of asymmetry and concentration, and elements of statistical inference)
- model causal relationships using linear and non-linear regression models, and use these models to predict values of the dependent variable
- model time series by decomposing them into trend, cyclical, seasonal and random components, and use these components to forecast future values of the analyzed variable
- classify data using, e.g., cluster analysis, classification trees or discriminant analysis
- present spatial data in the form of maps and cartograms
- create interactive R-Shiny applications that use buttons, sliders, drop-down lists, etc. to present graphs professionally, and publish these applications on websites
Course Content
Basics of R
program installation, license, help system, working modes, basic commands, importing data and saving results, installing packages, basic calculations, graphics, control-flow statements, data types and data structures, an overview of applications of the R environment in business process modelling
Structural analysis of data
measures of central tendency and variability, measures of asymmetry and concentration, sample statistics as estimators of population parameters, confidence intervals for means and proportions, statistical hypothesis testing
Regression and correlation
correlation and regression analysis of two quantitative variables (Pearson correlation coefficient and simple linear regression), multiple linear correlation and regression, non-linear models
Forecasting in time series
methods of extracting linear and non-linear trends from time series, analysis of seasonal fluctuations, the moving average method, exponential smoothing methods
Spatial data analysis
downloading maps and their attached data as SHP (map geometry) and DBF (attribute database) files, presenting the intensity of individual variables across the studied areas, checking for spatial autocorrelation, and identifying spatial regimes (groups of objects with similar properties)
R-Shiny application
creating graphs, tables and dynamic reports using interactive data visualization controls (buttons, sliders, drop-down lists, etc.), publishing the finished application on websites
Instructional Method
Learning will initially be based on running basic R code (such basic procedures can be found on many websites, including the official CRAN website). Students will then learn to modify this code to suit their needs, and finally they will write procedures entirely from scratch, without relying on ready-made templates. All necessary materials are available online, and the lecturer will also provide his own materials.
Assessment
Students will have to design and carry out an analysis of their own data set, different from those shown in class. This is not a difficult challenge: a key advantage of R is that analyses are performed with procedures written as scripts, so simply replacing the data set name, the variable names and the necessary parameters quickly produces the expected results. The points awarded will depend on the complexity of the methods applied.