MATLAB Example to Illustrate John Aitchison’s Log-Ratio Transformation, Part 3

A while back I wrote a post about John Aitchison’s Log-Ratio Transformation, Part 1, in the time domain, followed by Part 2 in the frequency domain. Here’s Part 3, a MATLAB demonstration of a nice Aitchison example presented in an extended abstract by Pawlowsky-Glahn and Egozcue (2013).

In their article (in German), the authors demonstrate spurious correlations, and the countermeasures proposed by J. Aitchison, in a three-component system of Sand, Lehm, Ton (English: sand, loam, clay), with a dilution effect caused by Wasser (English: water). To reproduce the example in MATLAB, we first clear the workspace and create colors for plotting.

clear, clc, close all

colors = [
   0 114 189
   217 83 25
   237 177 32
   126 47 142
]./255;

According to the authors, the samples are in the rows of the arrays. The columns are Sand, Lehm, Ton and Wasser in the data set of Scientist A, but only Sand, Lehm and Ton in the data set of Scientist B. Both A and B measure the same amounts of Sand, Lehm and Ton, but after each data set is normalized to a sum of one, Sand and Lehm are positively correlated in A and negatively correlated in B.

My interpretation is as follows: because the Wasser content decreases from sample 1 to sample 3 in the analysis of Scientist A, the proportions of Sand and Lehm both tend to increase, which gives them a joint positive trend. Once the Wasser content is removed by drying and the remaining components are renormalized in the analysis of Scientist B, this trend is slightly rotated towards a negative one. In other words, normalizing to 1 has a different effect depending on whether Wasser is included or excluded.

A = [
   0.1 0.2 0.1 0.6
   0.2 0.1 0.2 0.5
   0.3 0.3 0.1 0.3
];

B = [
   0.25 0.50 0.25
   0.40 0.20 0.40
   0.43 0.43 0.14
];
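The relationship between the two arrays can be checked directly: B appears to be the first three columns of A rescaled so that each row sums to one, then rounded to two decimals. The following is a minimal sketch of that check and not part of the original example. The names S, B_exact and max_dev are introduced here for illustration, and the row-wise division assumes MATLAB R2016b or later (implicit expansion).

```matlab
% Sketch: close the Sand, Lehm, Ton columns of A to a unit row sum
% and compare the result with the rounded values in B.
% S, B_exact and max_dev are illustrative names, not from the paper.
A = [
   0.1 0.2 0.1 0.6
   0.2 0.1 0.2 0.5
   0.3 0.3 0.1 0.3
];
B = [
   0.25 0.50 0.25
   0.40 0.20 0.40
   0.43 0.43 0.14
];

S = A(:,1:3);               % drop the Wasser column
B_exact = S ./ sum(S,2);    % divide each row by its row sum

% Largest absolute deviation between the exact closure and B
max_dev = max(abs(B_exact(:) - B(:)))
```

The deviation stays below 0.005, which is what rounding to two decimals would produce.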

We can calculate the correlation coefficients by typing

corr(A(:,1),A(:,2))
corr(B(:,1),B(:,2))

which shows the different signs of the correlation coefficients in the data set of Scientist A and Scientist B.

ans =

   0.5000

ans =

   -0.5583
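Note that corr is part of the Statistics and Machine Learning Toolbox. If that toolbox is not available, the same coefficient can be read from the off-diagonal element of the matrix returned by corrcoef, which ships with base MATLAB. A minimal sketch for the data set of Scientist A, where R and r_A are names chosen here for illustration:

```matlab
% Sketch: Sand-Lehm correlation in A without the toolbox function corr.
% R and r_A are illustrative names.
A = [
   0.1 0.2 0.1 0.6
   0.2 0.1 0.2 0.5
   0.3 0.3 0.1 0.3
];

R = corrcoef(A(:,1:2));   % 2-by-2 correlation matrix of Sand and Lehm
r_A = R(1,2)              % off-diagonal element, about 0.5
```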

We can display the data by typing

figure('Position',[100 1000 800 300])
axes('Position',[0.1 0.15 0.35 0.75],...
   'Box','On',...
   'YLim',[0 0.7])
line(1:3,A(:,1),...
   'Color',colors(2,:),...
   'LineWidth',1,...
   'LineStyle','--')
line(1:3,A(:,2),...
   'Color',colors(3,:),'LineWidth',1,...
   'LineStyle','-.')
line(1:3,A(:,3),...
   'Color',colors(4,:),'LineWidth',1,...
   'LineStyle','-.')
line(1:3,A(:,4),...
   'Color',colors(1,:),...
   'LineWidth',1,...
   'LineStyle','--')
legend('Sand','Lehm','Ton','Wasser',...
   'Location','NorthWest',...
   'Box','Off')
axes('Position',[0.55 0.15 0.35 0.75],...
   'Box','On',...
   'YLim',[0 0.7])
line(1:3,B(:,1),...
   'Color',colors(2,:),...
   'LineWidth',1,...
   'LineStyle','--')
line(1:3,B(:,2),...
   'Color',colors(3,:),...
   'LineWidth',1,...
   'LineStyle','-.')
line(1:3,B(:,3),...
   'Color',colors(4,:),...
   'LineWidth',1,...
   'LineStyle','-.')
legend('Sand','Lehm','Ton',...
   'Location','SouthWest',...
   'Box','Off')

We can see this effect in all pairs of components by calculating the full correlation matrices (the values in Table 2, B, of the paper are slightly different).

corr_A = corrcoef(A)
corr_B = corrcoef(B)

which yields

corr_A =

    1.0000    0.5000    0.0000   -0.9820
    0.5000    1.0000   -0.8660   -0.6547
    0.0000   -0.8660    1.0000    0.1890
   -0.9820   -0.6547    0.1890    1.0000


corr_B =

    1.0000   -0.5583   -0.0675
   -0.5583    1.0000   -0.7901
   -0.0675   -0.7901    1.0000

As noted above, the joint positive trend of Sand and Lehm in the analysis of Scientist A disappears once Scientist B dries the samples and thereby removes the negative trend in Wasser content. We correct for the dilution effect by calculating the log-ratios according to the principles of John Aitchison

lrA(:,1) = log(A(:,1)./A(:,2));
lrA(:,2) = log(A(:,1)./A(:,3));
lrA(:,3) = log(A(:,2)./A(:,3));

lrB(:,1) = log(B(:,1)./B(:,2));
lrB(:,2) = log(B(:,1)./B(:,3));
lrB(:,3) = log(B(:,2)./B(:,3));

and display the data by typing

figure('Position',[100 600 800 300])
axes('Position',[0.1 0.15 0.35 0.75],...
   'Box','On',...
   'YLim',[-1 1.5])   % log-ratios can be negative
line(1:3,lrA(:,1),...
   'Color',colors(2,:),...
   'LineWidth',1,...
   'LineStyle','--')
line(1:3,lrA(:,2),...
   'Color',colors(3,:),...
   'LineWidth',1,...
   'LineStyle','-.')
line(1:3,lrA(:,3),...
   'Color',colors(4,:),...
   'LineWidth',1,...
   'LineStyle','-.')
legend('ln(Sand/Lehm)','ln(Sand/Ton)','ln(Lehm/Ton)',...
   'Location','NorthWest',...
   'Box','Off')
axes('Position',[0.55 0.15 0.35 0.75],...
   'Box','On',...
   'YLim',[-1 1.5])   % log-ratios can be negative
line(1:3,lrB(:,1),...
   'Color',colors(2,:),...
   'LineWidth',1,...
   'LineStyle','--')
line(1:3,lrB(:,2),...
   'Color',colors(3,:),...
   'LineWidth',1,...
   'LineStyle','-.')
line(1:3,lrB(:,3),...
   'Color',colors(4,:),...
   'LineWidth',1,...
   'LineStyle','-.')
legend('ln(Sand/Lehm)','ln(Sand/Ton)','ln(Lehm/Ton)',...
   'Location','SouthWest',...
   'Box','Off')

As we can see, the two data sets are now nearly identical, and so are their correlations. We can confirm this by calculating the correlation matrices of the log-ratios, which are almost identical: we get the same (true) correlations. The small remaining differences arise because the values in B are rounded to two decimals.

corr_lrA = corrcoef(lrA)
corr_lrB = corrcoef(lrB)

which yields

corr_lrA =

    1.0000         0   -0.7377
         0    1.0000    0.6751
   -0.7377    0.6751    1.0000


corr_lrB =

    1.0000         0   -0.7306
         0    1.0000    0.6828
   -0.7306    0.6828    1.0000
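The reason the log-ratios agree is that a ratio of two components does not change when every component of a sample is divided by the same row sum. As a minimal sketch, assuming B is the closure of the first three columns of A, we can repeat the calculation with the unrounded closure and check that the log-ratios and their correlation matrices match to floating-point precision. The names pairs, B_exact, lrA_chk, lrB_chk, max_diff and max_corr_diff are introduced here for illustration, and the row-wise division assumes MATLAB R2016b or later.

```matlab
% Sketch: log-ratios are unchanged by closure to a unit row sum.
% All variable names below are illustrative.
A = [
   0.1 0.2 0.1 0.6
   0.2 0.1 0.2 0.5
   0.3 0.3 0.1 0.3
];

S = A(:,1:3);               % Sand, Lehm, Ton as measured with Wasser
B_exact = S ./ sum(S,2);    % unrounded closure after removing Wasser

% Component pairs: Sand/Lehm, Sand/Ton, Lehm/Ton
pairs = [1 2; 1 3; 2 3];
lrA_chk = log(S(:,pairs(:,1)) ./ S(:,pairs(:,2)));
lrB_chk = log(B_exact(:,pairs(:,1)) ./ B_exact(:,pairs(:,2)));

% Largest differences; both should be at the level of rounding error
max_diff = max(abs(lrA_chk(:) - lrB_chk(:)))
max_corr_diff = max(max(abs(corrcoef(lrA_chk) - corrcoef(lrB_chk))))
```

With the unrounded closure, the differences seen between corr_lrA and corr_lrB above should vanish up to numerical precision.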

References:

Aitchison, J., 1986, 2003, The Statistical Analysis of Compositional Data. Blackburn Press, 460 pages.

Aitchison, J., 1999, Logratios and natural laws in compositional data analysis, Mathematical Geology, 31, 563-580.

Pawlowsky-Glahn, V., Egozcue, J.J., Tolosana-Delgado, R., 2007, Lecture Notes on Compositional Data Analysis.

Pawlowsky-Glahn, V., Egozcue, J.J., 2013, Statistische Analyse von Kompositionsdaten. 58. Berg- und Hüttenmännischer Tag: GIS – Geowissenschaftliche Anwendungen und Entwicklungen. Abstract Volume, 253–360.