During the current MATLAB data analysis course, we discussed the safest way to import data into MATLAB. Like all the years before, I still recommend saving the data in ASCII format, checking it carefully and then importing the data into MATLAB – despite the availability of functions such as xlsread in MATLAB.
Unfortunately we often get data in Microsoft Excel format, as has just happened, our MSCL results from Ethiopian sediment cores. These are files with 55,000 lines and 60-80 columns! In such cases, careful checking is required, first in the Excel spreadsheets, then in the ASCII files generated from them, and then you can run a few scripts in MATLAB to detect errors.
I do not use Excel. It is usually annoying me by a variety of automatic processes, e.g. in the creation of graphics, but also in the treatment of numbers. It constantly tries to guess what number format it is when I enter numbers. The worst example of this is converting a number to a date. The internet is full of people who have this problem and are upset about it. And of course there is a solution from Microsoft, but that’s a pretty ridiculous suggestions. Instead of really fixing the problem, users are requested to place spaces or apostrophes in before entering a number.
Having very large data sets some of the mistakes may be overlooked. A few years ago, Mark Ziemann and colleagues published a paper about gene names automatically converted to dates and floating-point numbers. The paper received much attention in the media, both scientific and popular. The first author also wrote an interesting blog post some time after the publication of the paper. According to their analysis approximately one-fifth of the papers with supplementary Excel gene lists published in the leading genomics journals contain erroneous gene name conversions.
I think that’s just one more reason not to use Excel and try using more advanced tools like MATLAB, Python or R instead – that’s my way how to stop Excel automatically changing numbers to dates.
Ziemann, M., Eren, Y., El-Osta, A., 2016, Gene name errors are widespread in the scientific literature, Genome Biology, 17, 177, DOI 10.1186/s13059-016-1044-7.