Using DATASTORE to Work with Large Collections of Data

The function datastore can be used to handle large collections of data. For example, modern multisensor core loggers, which automatically determine physical and chemical properties of drill cores, generate huge amounts of data within a very short time. An array with 80 variables, each with 50,000 measured values, is not unusual. In addition, the corresponding files mix the actual numeric readings with character strings for core section names, measurement dates and times, and comments from the laboratory technician who carried out the measurements.

As an example, we import the contents of the file geoxrf.txt into the MATLAB workspace. The file looks like this in the editor:
Location   Core  Depth       K      Si
ChewBahir    2A   0.30  0.0975  0.1576
ChewBahir    2A   0.60  0.2785  0.9706
ChewBahir    2A   0.90  0.5469  0.9572
ChewBahir    2A   0.12  0.9575  0.4854
ChewBahir    2A   0.15  0.9649  0.8003
As we can see, the first line contains the header of the data. The first and second columns contain the location name and the name of the drill core. The third column contains the depth in the core. The fourth and fifth columns contain numerical values from X-ray fluorescence measurements of potassium (K) and silicon (Si). The columns are separated by one or more spaces. We use the function datastore to create a datastore object named data, which references the file without actually importing its contents.
clear
data = datastore('geoxrf.txt');
Typing
data
yields a list of properties of data. We can read all of the data from the datastore using
dataxrf = readall(data)
which yields
dataxrf =

  5×5 table

     Location      Core    Depth      K         Si  
    ___________    ____    _____    ______    ______

    'ChewBahir'    '2A'     0.3     0.0975    0.1576
    'ChewBahir'    '2A'     0.6     0.2785    0.9706
    'ChewBahir'    '2A'     0.9     0.5469    0.9572
    'ChewBahir'    '2A'    0.12     0.9575    0.4854
    'ChewBahir'    '2A'    0.15     0.9649    0.8003
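Before reading everything, it can help to inspect the datastore first. The following is a minimal sketch, assuming a small demo file with the same layout as geoxrf.txt. The temporary file name fname and the variables names and firstrows are hypothetical and are introduced only for illustration. The sketch queries the VariableNames property and uses preview to look at the first rows without reading the whole file.

```matlab
% Write a small demo file with the same layout as geoxrf.txt
% (hypothetical temporary file, values taken from the table above).
fname = [tempname,'.txt'];
fid = fopen(fname,'w');
fprintf(fid,'Location Core Depth K Si\n');
fprintf(fid,'ChewBahir 2A %.2f %.4f %.4f\n',...
    [0.30 0.60 0.90 0.12 0.15;...
     0.0975 0.2785 0.5469 0.9575 0.9649;...
     0.1576 0.9706 0.9572 0.4854 0.8003]);
fclose(fid);

% Create the datastore; no data are imported at this point.
data = datastore(fname);

% Names of the variables detected from the header line.
names = data.VariableNames

% First rows of the file (at most eight), returned as a table.
firstrows = preview(data)

% Remove the demo file.
delete(fname)
```

Since preview returns only a few rows, it is a cheap way to check that the header and the column types were detected as expected before calling readall.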
If the file is too large to be loaded into memory, we can import only part of it. First we select the variables Depth and K that we want to load, and then we use readall again by typing
data.SelectedVariableNames = {'Depth','K'};
dataxrf = readall(data)
which yields
dataxrf =

  5×2 table

    Depth      K
    _____    ______

     0.3     0.0975
     0.6     0.2785
     0.9     0.5469
    0.12     0.9575
    0.15     0.9649
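If even the selected variables do not fit into memory, readall is not an option. One possibility, sketched here under the assumption that datastore returns a tabular text datastore for the file, is to read the file in blocks using the ReadSize property together with hasdata and read. The demo file, the block size of two rows, and the variable names ksum, n, and kmean are hypothetical choices for illustration.

```matlab
% Write a small demo file with the same layout as geoxrf.txt
% (hypothetical temporary file, values taken from the table above).
fname = [tempname,'.txt'];
fid = fopen(fname,'w');
fprintf(fid,'Location Core Depth K Si\n');
fprintf(fid,'ChewBahir 2A %.2f %.4f %.4f\n',...
    [0.30 0.60 0.90 0.12 0.15;...
     0.0975 0.2785 0.5469 0.9575 0.9649;...
     0.1576 0.9706 0.9572 0.4854 0.8003]);
fclose(fid);

% Create the datastore and select the variables of interest.
data = datastore(fname);
data.SelectedVariableNames = {'Depth','K'};

% Read at most two rows per call to read (tiny block size for the demo).
data.ReadSize = 2;

% Accumulate the sum and the number of K values block by block.
ksum = 0;
n = 0;
while hasdata(data)
    block = read(data);            % table with the next rows
    ksum = ksum + sum(block.K);
    n = n + height(block);
end

% Mean of K computed without holding the full table in memory.
kmean = ksum/n

% Remove the demo file.
delete(fname)
```

Only one block is held in memory at a time, so the same loop can be applied to much larger files. In practice ReadSize would be set to a far larger value than two.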
Now we can use dot notation and the functions associated with tables to convert the data into simple double-precision arrays, e.g., by typing
depth = dataxrf.Depth
potassium = dataxrf.K
which yields
depth =

    0.3000
    0.6000
    0.9000
    0.1200
    0.1500
and
potassium =

    0.0975
    0.2785
    0.5469
    0.9575
    0.9649
Alternatively, we can use the function table2array to convert the table to a double precision array by typing
dataxrfarray = table2array(dataxrf)
which yields
dataxrfarray =

    0.3000    0.0975
    0.6000    0.2785
    0.9000    0.5469
    0.1200    0.9575
    0.1500    0.9649
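The conversion with table2array works here because Depth and K are both numeric. The variables of a table must be compatible for horizontal concatenation, so text and numeric variables generally cannot be converted together. As a hedged sketch, assuming MATLAB R2016b or later where vartype is available, the numeric variables of a mixed table can be selected before the conversion. The small table T is built by hand and serves only as an illustration.

```matlab
% Hypothetical mixed table with two text and two numeric variables.
T = table({'ChewBahir';'ChewBahir'},{'2A';'2A'},...
    [0.3;0.6],[0.0975;0.2785],...
    'VariableNames',{'Location','Core','Depth','K'});

% Keep only the numeric variables, then convert them to a double array.
Tnum = T(:,vartype('numeric'));
A = table2array(Tnum)

% Equivalent extraction using curly-brace indexing with variable names.
B = T{:,{'Depth','K'}};
```

Both A and B are 2-by-2 double arrays with the depth in the first column and the potassium values in the second.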