[16]:
import numpy as np
import glob
import os
from opac_mixer.read import ReadOpac
import scipy.constants as const
import h5py

%matplotlib inline

Tutorial: Add custom k-tables

In this tutorial we are going to explain what you need in order load in your custom k-table format. This is a very important step, if you want to be able to use the DeepSet mixing in your GCM/application.

The opac_mixer code uses an abstraction of the k-tables and as such you need to make sure that you read in the opacity in the correct format. When done correctly, you will not have any issues to perform the mixing correctly and emulate the process.

The basis of this abstraction is the parent class ReadOpac (from opac_mixer/read.py). You can draw inspiration from the ReadOpacChubb class, which inherits the ReadOpac class and is the abstraction of the pRT format (binned-down) k-tables from Katy Chubb on exomolOP.

A k-table grid requires a few things: pressure, temperature, wavelength/frequency/wavenumber and \(g\) values. We require all of these for every species as well! Furthermore, we need to ensure that these match for all species. Note: We need to load multiple opacity species, because we want to mix multiple species. When we use multiple species, we may however, encounter that we have different temperatures and different pressures at which the k-table is defined. We have methods to deal with that by interpolation (see below).

ExomolOP pRT k-tables

We will now explain how we have build the ReadOpacChubb (see opac_mixer/read.py) class by rebuilding it

[17]:
base = f'{os.environ["pRT_input_data_path"]}/opacities/lines/corr_k'  # directory, where I stored the opacity files
demofile = os.path.join(base, 'H2O_Exomol_R_S1/H2O_Exomol_R_S1.h5')

with h5py.File(demofile) as f:
    weights = np.array(f['weights'])
    tgrid = np.array(f['t'])
    pgrid = np.array(f['p'])
    bin_edges = np.array(f['bin_edges'])
    kcoeff = np.array(f['kcoeff'])

    print('Available data:', list(f.keys()))
    print('---------------')
    print(f'temperature points ({len(tgrid)}):', tgrid)
    print(f'pressure points ({len(pgrid)}):', pgrid)
    print('---------------')
    print(f'wavenumber - edges ({len(bin_edges)})', bin_edges)
    print(f'weights ({len(weights)})', weights)

    print('---------------')
    print('---------------')
    print(f'actual ktable - shape', kcoeff.shape)
Available data: ['DOI', 'Date_ID', 'bin_centers', 'bin_edges', 'kcoeff', 'key_iso_ll', 'method', 'mol_mass', 'mol_name', 'ngauss', 'p', 'samples', 't', 'temperature_grid_type', 'weights', 'wlrange', 'wnrange']
---------------
temperature points (27): [ 100.  200.  300.  400.  500.  600.  700.  800.  900. 1000. 1100. 1200.
 1300. 1400. 1500. 1600. 1700. 1800. 1900. 2000. 2200. 2400. 2600. 2800.
 3000. 3200. 3400.]
pressure points (22): [1.00000000e-05 2.15443469e-05 4.64158883e-05 1.00000000e-04
 2.15443469e-04 4.64158883e-04 1.00000000e-03 2.15443469e-03
 4.64158883e-03 1.00000000e-02 2.15443469e-02 4.64158883e-02
 1.00000000e-01 2.15443469e-01 4.64158883e-01 1.00000000e+00
 2.15443469e+00 4.64158883e+00 1.00000000e+01 2.15443469e+01
 4.64158883e+01 1.00000000e+02]
---------------
wavenumber - edges (12) [3.33333333e+01 5.00000000e+02 1.14942529e+03 2.27272727e+03
 2.85714286e+03 4.00000000e+03 4.95049505e+03 7.57575758e+03
 1.17647059e+04 1.63934426e+04 2.38095238e+04 3.84615385e+04]
weights (16) [0.04555284 0.10007147 0.14116799 0.1632077  0.1632077  0.14116799
 0.10007147 0.04555284 0.00506143 0.01111905 0.01568533 0.01813419
 0.01813419 0.01568533 0.01111905 0.00506143]
---------------
---------------
actual ktable - shape (22, 27, 11, 16)

\(g\) values and weights

We operate on weights instead of \(g\) values. It is, however, easy to switch between them using these two functions:

[18]:
def compute_ggrid(w, Ng):
    """Helper function that calculates the ggrid for given weights. Works on a halfinteger grid."""
    cum_sum = 0.0
    gcomp = np.empty(Ng)

    for i in range(Ng):
        gcomp[i] = cum_sum + 0.5 * w[i]
        cum_sum = cum_sum + w[i]

    return gcomp

def compute_weights(g, Ng):
    """Calculate weights from g values"""
    weights = np.empty(Ng)

    cum_sum = 0.0
    for i in range(Ng):
        weights[i] = 2.0*(g[i] - cum_sum)
        cum_sum = cum_sum + weights[i]

    return weights

# Verify that both functions are compatible
np.testing.assert_allclose(weights, compute_weights(compute_ggrid(weights,len(weights)),len(weights)))

Building a reader class

The reader class only needs to define a read in function and pass important metadata to the constructor of the parent class. That’s it.

The constructor (__init__) needs to call the parent constructor with the following arguments:

  1. ls (int): number of species that are read in

  2. lp (array(ls)): array that holds the number of pressure grid points for each species

  3. lt (array(ls)): array that holds the number of temperature grid points for each species

  4. lf (array(ls)): array that holds the number of frequency grid points for each species

  5. lg (array(ls)): array that holds the number of \(g\) grid points for each species

Note, that we require that ``lf[0]==lf[i]`` and ``lg[0]==lg[i]`` for all i in number of species

The read in function (read_opac) has to fill the following arrays:

  1. self.spec (array(ls): array holding the names of the opacity species

  2. self.T (array(ls, max(lt))): array holding the temperature in K at which the k-table grid is defined

  3. self.p (array(ls, max(lp))): array holding the pressure values in bar at which the k-table grid is defined

  4. self.bin_edges (array(ls, lf[0]+1)): array holding the wave number (\(1/\lambda\)) values in 1/cm of the edges of the wavenumber grid at which the k-table grid is defined

  5. self.bin_center (array(ls, lf[0])): array holding the wave number (\(1/\lambda\)) values in 1/cm of the center of the wavenumber grid at which the k-table grid is defined.

  6. self.weights (array(ls, lg[0])): array holding the weights (\(\Delta g\)) of the k-tables (see above for conversion from \(g\) values)

  7. self.kcoeff (array(ls, max(lp), max(lt), lf[0], lg[0]): array holding the actual values of the k-table grid in cm2/g.

Note, the data arrays are initialized with space up unto the maximum number of temperature and pressure grid points, hence the max(lt) and max(lp).

We can now build the opac reader (see also in opac_mixer/read.py):

[19]:
class ReadOpacChubb(ReadOpac):
    """A ktable grid reader for the ExomolOP-pRT k-table format"""
    def __init__(self, files) -> None:
        """
        Construct the chubb reader for the ExomolOP-pRT k-table format.

        Parameters
        ----------
        files (list):
            A list of filenames of the h5 files in which the k-tables are stored.
        """
        ls = len(files)             # Number of opacity species is the number of k-table grid files
        self._files = files         # This is custom to this reader, since we do the readin later

        # initialize the arrays that hold the dimensions
        # of pressure, temperature, frequency and g values for each species
        lp, lt, lf, lg = (
            np.empty(ls, dtype=int),
            np.empty(ls, dtype=int),
            np.empty(ls, dtype=int),
            np.empty(ls, dtype=int),
        )

        # read in this metadata for all species
        for i, file in enumerate(files):
            with h5py.File(file) as f:
                lp[i], lt[i], lf[i], lg[i] = f["kcoeff"].shape

        # call the parent constructor with the metadata
        super().__init__(ls, lp, lt, lf, lg)

    def read_opac(self):
        """Read in the kcoeff from h5 file."""
        # initialize some arrays
        bin_edges = np.empty((self.ls, self.lf[0] + 1), dtype=np.float64)
        weights = np.empty((self.ls, self.lg[0]), dtype=np.float64)

        # Iterate over all species and fill in the data
        for i, file in enumerate(self._files):
            with h5py.File(file) as f:
                bin_edges[i, :] = np.array(f["bin_edges"], dtype=np.float64)
                weights[i, :] = np.array(f["weights"], dtype=np.float64)

                # store species name
                self.spec[i] = f["mol_name"][0].decode("ascii")

                # store pressure and temperature of the opacity species
                self.T[i, : self.lt[i]] = np.array(f["t"], dtype=np.float64)
                self.p[i, : self.lp[i]] = np.array(f["p"], dtype=np.float64)

                # convert k-table grid from cm2/mol to cm2/g:
                conversion_factor = 1 / (
                    np.float64(f["mol_mass"][0]) * const.atomic_mass * 1000
                )
                kcoeff = np.array(f["kcoeff"], dtype=np.float64) * conversion_factor

                # store ktable grid
                self.kcoeff[i, : self.lp[i], : self.lt[i], :, :] = kcoeff

        # Do the check if the frequencies and g values are the same for all species
        assert np.all(bin_edges[1:, :] == bin_edges[:-1, :]), "frequency needs to match"
        assert np.all(weights[1:, :] == weights[:-1, :]), "g grid needs to match"

        # store the weights and frequency edges
        self.weights = weights[0, :]
        self.bin_edges = bin_edges[0, :]

        # This removes those frequencies from the grid that have no k-table data (kappa=0)
        self.remove_sparse_frequencies()  # this function also sets self.bin_center

        # Set the read_done switch to true, since we are done with reading in the ktables
        self.read_done = True

There is not more to take into account, since all other functionality is already defined in the parent class.

Reader class in action

Initialize the reader

[20]:
base = f'{os.environ["pRT_input_data_path"]}/opacities/lines/corr_k'
files_S1 = glob.glob(os.path.join(base,f'*_R_S1/*.h5'))
opac = ReadOpacChubb(files_S1)

Read in the files

[21]:
opac.read_opac()

Do the interpolation to a common pressure-temperature grid. Note, that this is done automatically in the emulator class as well.

[22]:
opac.setup_temp_and_pres()

Print the shape of the ktable grid

[23]:
opac.kcoeff.shape
[23]:
(14, 22, 27, 10, 16)