Teaching Computational Geosciences with MATLAB, Part 4: Univariate Stats

In a series of blog posts, I will tell you a little about how I teach computational geosciences with MATLAB. After the introduction to MATLAB on the first day, I teach univariate statistics on the morning of the second day of the five-day course.

Univariate statistical methods are described in Chapter 3 of the MRES book. We first need to describe the characteristics of the sample using statistical parameters and to compute an empirical distribution (descriptive statistics). I give a brief introduction to the most important statistical parameters (such as the measures of central tendency and dispersion), followed by MATLAB examples. Over years of teaching, I have found that shortening the theory in favor of hands-on practice works much better than lengthy lecturing. Therefore, the only MATLAB example I use during this first part of the statistics course is the one described in the blog post “Sample Size: How many is enough?“.
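For readers who want to try the idea outside MATLAB, here is a minimal Python sketch (an analogue written for illustration, not the code from the course or from the "Sample Size" post). It computes common measures of central tendency and dispersion for samples of increasing size drawn from a hypothetical normal population. The population mean and standard deviation, the sample sizes and the random seed are arbitrary assumptions.

```python
import random
import statistics

def describe(sample):
    """Return common measures of central tendency and dispersion for a sample."""
    q1, _, q3 = statistics.quantiles(sample, n=4)  # first and third quartiles
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        "std": statistics.stdev(sample),  # sample standard deviation (divides by n - 1)
        "range": max(sample) - min(sample),
        "iqr": q3 - q1,  # interquartile range
    }

# Hypothetical population (assumption): normal with mean 3.0 and standard deviation 1.5.
MU, SIGMA = 3.0, 1.5
rng = random.Random(42)  # fixed seed so the run is reproducible

results = {}
for n in (5, 50, 500, 5000):
    sample = [rng.gauss(MU, SIGMA) for _ in range(n)]
    results[n] = describe(sample)
    d = results[n]
    print(f"n={n:5d}  mean={d['mean']:.3f}  median={d['median']:.3f}  "
          f"std={d['std']:.3f}  iqr={d['iqr']:.3f}")
```

In MATLAB the same parameters come from functions such as mean, median and std. The point of the exercise is that the estimates scatter widely for small samples and settle close to the population values as the sample size grows.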

During the second part of the course on univariate statistics, I introduce a selection of theoretical distributions that show characteristics similar to those of the empirical distribution. We then try to draw conclusions from the sample that can be applied to the larger population of interest (hypothesis testing). Next, I introduce the most important statistical tests for applications in the earth sciences. The final section of this chapter introduces methods used to fit distributions to our own data sets. The MATLAB example I use here is the Chi2-test, introduced by Karl Pearson (1900), which compares distributions and allows us to test whether two distributions were derived from the same population. I think the Chi2-test, described in Chapter 3.9 of the book, is a good example to show how statistical tests work. After introducing the Chi2-test, I briefly demonstrate how to use distribution fitting techniques.
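The following minimal Python sketch (again an illustration, not the MATLAB code from the book) combines distribution fitting with Pearson's test. It fits a normal distribution by estimating the mean and standard deviation from the data, bins the data, and compares observed with expected counts. Because two parameters are estimated from the data, the degrees of freedom are reduced by two. The helper names chi2_sf and chi2_normal_gof, the six-bin default and the synthetic data sets are assumptions for illustration. chi2_sf computes the chi-square upper-tail probability in closed form for integer degrees of freedom, so the sketch needs only the standard library.

```python
import math
import random
import statistics

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square distribution with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2:  # odd df: start from df = 1, where P = erfc(sqrt(x/2))
        q, a = math.erfc(math.sqrt(y)), 0.5
    else:       # even df: start from the limit df = 0, where P = 0
        q, a = 0.0, 0.0
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q

def chi2_normal_gof(data, nbins=6):
    """Chi-square goodness-of-fit test of data against a normal distribution
    whose mean and standard deviation are estimated from the data.
    Assumes the data are not all identical (non-zero bin width)."""
    n = len(data)
    fitted = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    lo, hi = min(data), max(data)
    width = (hi - lo) / nbins
    edges = [lo + i * width for i in range(nbins + 1)]
    # Observed counts per bin; the maximum value falls into the last bin.
    observed = [0] * nbins
    for v in data:
        observed[min(int((v - lo) / width), nbins - 1)] += 1
    # Expected counts from the fitted normal; the outer bins are open-ended
    # so that the expected counts sum to n.
    cdf = [0.0] + [fitted.cdf(e) for e in edges[1:-1]] + [1.0]
    expected = [n * (cdf[i + 1] - cdf[i]) for i in range(nbins)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = nbins - 1 - 2  # two parameters (mean, std) were estimated from the data
    return chi2, df, chi2_sf(chi2, df)

# Synthetic data (assumption): one normal and one strongly skewed sample.
rng = random.Random(1)  # fixed seed so the run is reproducible
normal_data = [rng.gauss(10.0, 2.0) for _ in range(1000)]
skewed_data = [rng.expovariate(0.5) for _ in range(1000)]

for name, data in (("normal", normal_data), ("skewed", skewed_data)):
    chi2, df, p = chi2_normal_gof(data)
    verdict = "reject" if p < 0.05 else "cannot reject"
    print(f"{name}: chi2={chi2:.2f}, df={df}, p={p:.4g} -> {verdict} normality at 5%")
```

With real data you would also check the common rule of thumb that each bin should have an expected count of at least about five. The skewed sample departs so strongly from a normal distribution that its outer bins are sparse, but the test should still reject normality clearly.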

This morning course on univariate statistics has worked well for many years. The students go through a complete statistical analysis workflow without being burdened by too many different methods designed for specific types of data and tasks. They do learn about other methods, but only about how these differ from the method demonstrated. Let me know how you teach univariate statistics, with or without MATLAB, in your classes!