xncml Usage

xncml serves two purposes: modifying NcML files, and opening NcML files as an xarray.Dataset.

[1]:
import xncml
from pathlib import Path
from IPython.display import Code

Modify an NcML document

xncml can add, remove, or rename global attributes, add or remove variable attributes, rename or remove variables, and remove dimensions. It can also create NcML files from scratch. All of this is done through the xncml.Dataset class and its methods.

Create an NcML Dataset object from a local NcML file

The xncml.Dataset class is instantiated by passing the location of an NcML file. Alternatively, it can be created from a string of NcML text with the from_text classmethod.

[2]:
fn = Path(xncml.__file__).parent.parent / "tests" / "data" / "exercise1.ncml"

# Instantiate Dataset class from the file location. An alternative would have been to do
# nc = xncml.Dataset.from_text(fn.read_text())
nc = xncml.Dataset(fn)


# This is just to pretty print the XML
Code(repr(nc), language="XML")
[2]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <dimension name="time" length="2" isUnlimited="true"></dimension>
    <dimension name="lat" length="3"></dimension>
    <dimension name="lon" length="4"></dimension>
    <attribute name="title" type="String" value="Example Data"></attribute>
    <variable name="rh" shape="time lat lon" type="int">
            <attribute name="long_name" type="String" value="relative humidity"></attribute>
            <attribute name="units" type="String" value="percent"></attribute>
    </variable>
    <variable name="T" shape="time lat lon" type="double">
            <attribute name="long_name" type="String" value="surface temperature"></attribute>
            <attribute name="units" type="String" value="C"></attribute>
    </variable>
    <variable name="lat" shape="lat" type="float">
            <attribute name="units" type="String" value="degrees_north"></attribute>
            <values>41.0 40.0 39.0</values>
    </variable>
    <variable name="lon" shape="lon" type="float">
            <attribute name="units" type="String" value="degrees_east"></attribute>
            <values>-109.0 -107.0 -105.0 -103.0</values>
    </variable>
    <variable name="time" shape="time" type="int">
            <attribute name="units" type="String" value="hours"></attribute>
            <values>6 18</values>
    </variable>
</netcdf>
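
An NcML document is plain XML in the http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2 namespace, so its structure can be inspected with the standard library alone. The sketch below is not how xncml parses files; it reads a hand-copied excerpt of the document printed above with xml.etree.ElementTree and collects the dimensions and global attributes.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
NS = {"nc": NCML_NS}

# Excerpt of the exercise1.ncml document printed above
ncml_text = f"""<netcdf location="nc/example1.nc" xmlns="{NCML_NS}">
    <dimension name="time" length="2" isUnlimited="true"/>
    <dimension name="lat" length="3"/>
    <dimension name="lon" length="4"/>
    <attribute name="title" type="String" value="Example Data"/>
</netcdf>"""

root = ET.fromstring(ncml_text)

# Map each dimension name to its integer length
dims = {d.get("name"): int(d.get("length")) for d in root.findall("nc:dimension", NS)}
# Global attributes are <attribute> children of the root <netcdf> element
global_attrs = {a.get("name"): a.get("value") for a in root.findall("nc:attribute", NS)}

print(root.get("location"), dims, global_attrs)
```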

Create an NcML Dataset that modifies an existing netCDF file

Here we create an empty NcML dataset from scratch. Any modifying statements added to it will apply to the existing netCDF dataset identified by the location argument.

[3]:
new = xncml.Dataset(location="nc/example1.nc")
Code(repr(new), language="XML")
[3]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="nc/example1.nc"></netcdf>
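
For comparison, an equivalent empty <netcdf> element can be built with the standard library. This is only an illustration: empty_ncml is a hypothetical helper, not part of xncml, and ElementTree does not emit the XML declaration here.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
# Serialize the NcML namespace as the default namespace (no prefix)
ET.register_namespace("", NCML_NS)


def empty_ncml(location: str) -> ET.Element:
    """Hypothetical helper: a <netcdf> root that only points at an existing file."""
    return ET.Element(f"{{{NCML_NS}}}netcdf", {"location": location})


root = empty_ncml("nc/example1.nc")
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```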

Rename the variable T to Temp

[4]:
nc.rename_variable('T', 'Temp')
Code(repr(nc), language="XML")
[4]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <dimension name="time" length="2" isUnlimited="true"></dimension>
    <dimension name="lat" length="3"></dimension>
    <dimension name="lon" length="4"></dimension>
    <attribute name="title" type="String" value="Example Data"></attribute>
    <variable name="rh" shape="time lat lon" type="int">
            <attribute name="long_name" type="String" value="relative humidity"></attribute>
            <attribute name="units" type="String" value="percent"></attribute>
    </variable>
    <variable name="Temp" shape="time lat lon" type="double" orgName="T">
            <attribute name="long_name" type="String" value="surface temperature"></attribute>
            <attribute name="units" type="String" value="C"></attribute>
    </variable>
    <variable name="lat" shape="lat" type="float">
            <attribute name="units" type="String" value="degrees_north"></attribute>
            <values>41.0 40.0 39.0</values>
    </variable>
    <variable name="lon" shape="lon" type="float">
            <attribute name="units" type="String" value="degrees_east"></attribute>
            <values>-109.0 -107.0 -105.0 -103.0</values>
    </variable>
    <variable name="time" shape="time" type="int">
            <attribute name="units" type="String" value="hours"></attribute>
            <values>6 18</values>
    </variable>
</netcdf>
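
The output shows how a rename is expressed in NcML: the variable element takes the new name and keeps the original one in orgName. A minimal ElementTree sketch of the same idea, with rename_variable as a hypothetical helper (not xncml's code); when the variable is not declared in the NcML at all, it adds an element that only carries the rename.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
NS = {"nc": NCML_NS}


def rename_variable(root: ET.Element, old: str, new: str) -> ET.Element:
    """Hypothetical helper: rename a variable the NcML way, via orgName."""
    for var in root.findall("nc:variable", NS):
        if var.get("name") == old:
            var.set("name", new)
            var.set("orgName", old)  # orgName records the name in the source file
            return var
    # Variable not declared in the NcML yet: add an element that only renames it
    return ET.SubElement(root, f"{{{NCML_NS}}}variable", {"name": new, "orgName": old})


root = ET.fromstring(
    f'<netcdf xmlns="{NCML_NS}"><variable name="T" shape="time lat lon" type="double"/></netcdf>'
)
renamed = rename_variable(root, "T", "Temp")
added = rename_variable(root, "rh", "RH")
```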

Remove the variable Temp from the dataset

[5]:
nc.remove_variable('Temp')
Code(repr(nc), language="XML")
[5]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <dimension name="time" length="2" isUnlimited="true"></dimension>
    <dimension name="lat" length="3"></dimension>
    <dimension name="lon" length="4"></dimension>
    <attribute name="title" type="String" value="Example Data"></attribute>
    <variable name="rh" shape="time lat lon" type="int">
            <attribute name="long_name" type="String" value="relative humidity"></attribute>
            <attribute name="units" type="String" value="percent"></attribute>
    </variable>
    <variable name="Temp" shape="time lat lon" type="double" orgName="T">
            <attribute name="long_name" type="String" value="surface temperature"></attribute>
            <attribute name="units" type="String" value="C"></attribute>
    </variable>
    <variable name="lat" shape="lat" type="float">
            <attribute name="units" type="String" value="degrees_north"></attribute>
            <values>41.0 40.0 39.0</values>
    </variable>
    <variable name="lon" shape="lon" type="float">
            <attribute name="units" type="String" value="degrees_east"></attribute>
            <values>-109.0 -107.0 -105.0 -103.0</values>
    </variable>
    <variable name="time" shape="time" type="int">
            <attribute name="units" type="String" value="hours"></attribute>
            <values>6 18</values>
    </variable>
    <remove name="Temp" type="variable"></remove>
</netcdf>
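
In the output above, the Temp declaration is still present and a <remove> element is appended instead. A minimal stdlib sketch of that pattern, assuming a hypothetical add_remove helper: the parent argument sets the scope, so the same function covers removing a variable from the dataset (parent is the root) and removing an attribute from a variable (parent is the <variable> element).

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
NS = {"nc": NCML_NS}


def add_remove(parent: ET.Element, name: str, kind: str) -> ET.Element:
    """Hypothetical helper: append <remove name=... type=...> to `parent`."""
    return ET.SubElement(parent, f"{{{NCML_NS}}}remove", {"name": name, "type": kind})


root = ET.fromstring(f'<netcdf xmlns="{NCML_NS}"><variable name="Temp"/></netcdf>')
add_remove(root, "Temp", "variable")  # dataset level: drop the variable
temp = root.find("nc:variable[@name='Temp']", NS)
add_remove(temp, "units", "attribute")  # variable level: drop one of its attributes
```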

Remove the attribute units from the variable Temp

[6]:
nc.remove_variable_attribute(variable='Temp', key='units')
Code(repr(nc), language="XML")
[6]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <dimension name="time" length="2" isUnlimited="true"></dimension>
    <dimension name="lat" length="3"></dimension>
    <dimension name="lon" length="4"></dimension>
    <attribute name="title" type="String" value="Example Data"></attribute>
    <variable name="rh" shape="time lat lon" type="int">
            <attribute name="long_name" type="String" value="relative humidity"></attribute>
            <attribute name="units" type="String" value="percent"></attribute>
    </variable>
    <variable name="Temp" shape="time lat lon" type="double" orgName="T">
            <attribute name="long_name" type="String" value="surface temperature"></attribute>
            <attribute name="units" type="String" value="C"></attribute>
            <remove name="units" type="attribute"></remove>
    </variable>
    <variable name="lat" shape="lat" type="float">
            <attribute name="units" type="String" value="degrees_north"></attribute>
            <values>41.0 40.0 39.0</values>
    </variable>
    <variable name="lon" shape="lon" type="float">
            <attribute name="units" type="String" value="degrees_east"></attribute>
            <values>-109.0 -107.0 -105.0 -103.0</values>
    </variable>
    <variable name="time" shape="time" type="int">
            <attribute name="units" type="String" value="hours"></attribute>
            <values>6 18</values>
    </variable>
    <remove name="Temp" type="variable"></remove>
</netcdf>

Remove the global title attribute from the dataset

[7]:
nc.remove_dataset_attribute('title')
Code(repr(nc), language="XML")
[7]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <dimension name="time" length="2" isUnlimited="true"></dimension>
    <dimension name="lat" length="3"></dimension>
    <dimension name="lon" length="4"></dimension>
    <attribute name="title" type="String" value="Example Data"></attribute>
    <variable name="rh" shape="time lat lon" type="int">
            <attribute name="long_name" type="String" value="relative humidity"></attribute>
            <attribute name="units" type="String" value="percent"></attribute>
    </variable>
    <variable name="Temp" shape="time lat lon" type="double" orgName="T">
            <attribute name="long_name" type="String" value="surface temperature"></attribute>
            <attribute name="units" type="String" value="C"></attribute>
            <remove name="units" type="attribute"></remove>
    </variable>
    <variable name="lat" shape="lat" type="float">
            <attribute name="units" type="String" value="degrees_north"></attribute>
            <values>41.0 40.0 39.0</values>
    </variable>
    <variable name="lon" shape="lon" type="float">
            <attribute name="units" type="String" value="degrees_east"></attribute>
            <values>-109.0 -107.0 -105.0 -103.0</values>
    </variable>
    <variable name="time" shape="time" type="int">
            <attribute name="units" type="String" value="hours"></attribute>
            <values>6 18</values>
    </variable>
    <remove name="Temp" type="variable"></remove>
    <remove name="title" type="attribute"></remove>
</netcdf>
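
Because removals are now recorded as <remove> elements at two levels, it can help to list them all before passing the file to another tool. A sketch with list_removals as a hypothetical helper: it walks the root and each <variable> and reports the scope of every <remove>. The input mirrors the three removals in the output above.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
NS = {"nc": NCML_NS}


def list_removals(root: ET.Element) -> list[tuple[str, str, str]]:
    """Hypothetical helper: return (scope, name, type) for every <remove> element."""
    found = [("dataset", r.get("name"), r.get("type")) for r in root.findall("nc:remove", NS)]
    for var in root.findall("nc:variable", NS):
        for r in var.findall("nc:remove", NS):
            found.append((f"variable {var.get('name')}", r.get("name"), r.get("type")))
    return found


ncml_text = f"""<netcdf xmlns="{NCML_NS}">
  <variable name="Temp"><remove name="units" type="attribute"/></variable>
  <remove name="Temp" type="variable"/>
  <remove name="title" type="attribute"/>
</netcdf>"""
removals = list_removals(ET.fromstring(ncml_text))
```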

Add a global Conventions attribute

[8]:
nc.add_dataset_attribute(key='Conventions', value='CF-2.0')
Code(repr(nc), language="XML")
[8]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <dimension name="time" length="2" isUnlimited="true"></dimension>
    <dimension name="lat" length="3"></dimension>
    <dimension name="lon" length="4"></dimension>
    <attribute name="title" type="String" value="Example Data"></attribute>
    <attribute name="Conventions" type="String" value="CF-2.0"></attribute>
    <variable name="rh" shape="time lat lon" type="int">
            <attribute name="long_name" type="String" value="relative humidity"></attribute>
            <attribute name="units" type="String" value="percent"></attribute>
    </variable>
    <variable name="Temp" shape="time lat lon" type="double" orgName="T">
            <attribute name="long_name" type="String" value="surface temperature"></attribute>
            <attribute name="units" type="String" value="C"></attribute>
            <remove name="units" type="attribute"></remove>
    </variable>
    <variable name="lat" shape="lat" type="float">
            <attribute name="units" type="String" value="degrees_north"></attribute>
            <values>41.0 40.0 39.0</values>
    </variable>
    <variable name="lon" shape="lon" type="float">
            <attribute name="units" type="String" value="degrees_east"></attribute>
            <values>-109.0 -107.0 -105.0 -103.0</values>
    </variable>
    <variable name="time" shape="time" type="int">
            <attribute name="units" type="String" value="hours"></attribute>
            <values>6 18</values>
    </variable>
    <remove name="Temp" type="variable"></remove>
    <remove name="title" type="attribute"></remove>
</netcdf>
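
In the output, the new Conventions attribute lands directly after title and ahead of the variables. A sketch of that placement with ElementTree, assuming a hypothetical add_global_attribute helper that always writes type="String"; the CF-1.8 value is only an example.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
NS = {"nc": NCML_NS}


def add_global_attribute(root: ET.Element, key: str, value: str) -> ET.Element:
    """Hypothetical helper: insert a String attribute after the existing global attributes."""
    attr = ET.Element(f"{{{NCML_NS}}}attribute", {"name": key, "type": "String", "value": value})
    children = list(root)
    existing = root.findall("nc:attribute", NS)
    if existing:
        # Right after the last global attribute
        index = children.index(existing[-1]) + 1
    else:
        # Otherwise right after the dimensions, or first if there are none
        dims = root.findall("nc:dimension", NS)
        index = children.index(dims[-1]) + 1 if dims else 0
    root.insert(index, attr)
    return attr


root = ET.fromstring(
    f'<netcdf xmlns="{NCML_NS}">'
    '<dimension name="time" length="2"/>'
    '<attribute name="title" type="String" value="Example Data"/>'
    '<variable name="rh"/>'
    '</netcdf>'
)
add_global_attribute(root, "Conventions", "CF-1.8")
order = [child.get("name") for child in root]
```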

Rename a global attribute

[9]:
nc.rename_dataset_attribute(old_name="Source", new_name="source")
Code(repr(nc), language="XML")
[9]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <dimension name="time" length="2" isUnlimited="true"></dimension>
    <dimension name="lat" length="3"></dimension>
    <dimension name="lon" length="4"></dimension>
    <attribute name="title" type="String" value="Example Data"></attribute>
    <attribute name="Conventions" type="String" value="CF-2.0"></attribute>
    <attribute name="source">
            <orgName>Source</orgName>
    </attribute>
    <variable name="rh" shape="time lat lon" type="int">
            <attribute name="long_name" type="String" value="relative humidity"></attribute>
            <attribute name="units" type="String" value="percent"></attribute>
    </variable>
    <variable name="Temp" shape="time lat lon" type="double" orgName="T">
            <attribute name="long_name" type="String" value="surface temperature"></attribute>
            <attribute name="units" type="String" value="C"></attribute>
            <remove name="units" type="attribute"></remove>
    </variable>
    <variable name="lat" shape="lat" type="float">
            <attribute name="units" type="String" value="degrees_north"></attribute>
            <values>41.0 40.0 39.0</values>
    </variable>
    <variable name="lon" shape="lon" type="float">
            <attribute name="units" type="String" value="degrees_east"></attribute>
            <values>-109.0 -107.0 -105.0 -103.0</values>
    </variable>
    <variable name="time" shape="time" type="int">
            <attribute name="units" type="String" value="hours"></attribute>
            <values>6 18</values>
    </variable>
    <remove name="Temp" type="variable"></remove>
    <remove name="title" type="attribute"></remove>
</netcdf>
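
The document now records renames in two forms: orgName as an XML attribute on the Temp variable, and as a child <orgName> element on the source attribute. A sketch that collects both into one mapping, with collect_renames as a hypothetical helper written against the output shown here rather than against xncml's internals.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
NS = {"nc": NCML_NS}


def collect_renames(root: ET.Element) -> dict[tuple[str, str], str]:
    """Hypothetical helper: map (kind, original name) to the new name."""
    renames = {}
    # Global attributes and variables are direct children of <netcdf>
    for el in root.findall("nc:attribute", NS) + root.findall("nc:variable", NS):
        # Accept orgName as an XML attribute or as a child element
        org = el.get("orgName") or el.findtext("nc:orgName", namespaces=NS)
        if org:
            kind = el.tag.split("}")[1]  # strip the namespace from the tag
            renames[(kind, org)] = el.get("name")
    return renames


ncml_text = f"""<netcdf xmlns="{NCML_NS}">
  <attribute name="title" type="String" value="Example Data"/>
  <attribute name="source"><orgName>Source</orgName></attribute>
  <variable name="Temp" type="double" orgName="T"/>
  <variable name="lat" type="float"/>
</netcdf>"""
renames = collect_renames(ET.fromstring(ncml_text))
```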

Add variable attributes

[10]:
nc.add_variable_attribute(variable='Temp', key='units', value='Kelvin')
nc.add_variable_attribute(variable='Temp', key='Fill_value', value=-999999999.)
Code(repr(nc), language="XML")
[10]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
    <dimension name="time" length="2" isUnlimited="true"></dimension>
    <dimension name="lat" length="3"></dimension>
    <dimension name="lon" length="4"></dimension>
    <attribute name="title" type="String" value="Example Data"></attribute>
    <attribute name="Conventions" type="String" value="CF-2.0"></attribute>
    <attribute name="source">
            <orgName>Source</orgName>
    </attribute>
    <variable name="rh" shape="time lat lon" type="int">
            <attribute name="long_name" type="String" value="relative humidity"></attribute>
            <attribute name="units" type="String" value="percent"></attribute>
    </variable>
    <variable name="Temp" shape="time lat lon" type="double" orgName="T">
            <attribute name="long_name" type="String" value="surface temperature"></attribute>
            <attribute name="units" type="String" value="Kelvin"></attribute>
            <attribute name="Fill_value" type="String" value="-999999999.0"></attribute>
            <remove name="units" type="attribute"></remove>
    </variable>
    <variable name="lat" shape="lat" type="float">
            <attribute name="units" type="String" value="degrees_north"></attribute>
            <values>41.0 40.0 39.0</values>
    </variable>
    <variable name="lon" shape="lon" type="float">
            <attribute name="units" type="String" value="degrees_east"></attribute>
            <values>-109.0 -107.0 -105.0 -103.0</values>
    </variable>
    <variable name="time" shape="time" type="int">
            <attribute name="units" type="String" value="hours"></attribute>
            <values>6 18</values>
    </variable>
    <remove name="Temp" type="variable"></remove>
    <remove name="title" type="attribute"></remove>
</netcdf>
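
Note that Fill_value is written with type="String" in the XML above even though a float was passed. NcML attributes carry an explicit type (for example String, int, long, float, double), so tools that read the XML will treat that value as text. A sketch of picking a numeric type from the Python value; ncml_type and ncml_attribute are hypothetical helpers and the mapping is an assumption. The example uses _FillValue, which is how the CF conventions spell the fill-value attribute.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"


def ncml_type(value) -> str:
    """Pick an NcML attribute type for a Python scalar (assumed mapping)."""
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return "int" if -2**31 <= value < 2**31 else "long"
    if isinstance(value, float):
        return "double"  # Python floats are 64-bit
    if isinstance(value, str):
        return "String"
    raise TypeError(f"unsupported attribute value: {value!r}")


def ncml_attribute(name: str, value) -> ET.Element:
    """Hypothetical helper: build an <attribute> element with an explicit type."""
    attrib = {"name": name, "type": ncml_type(value), "value": str(value)}
    return ET.Element(f"{{{NCML_NS}}}attribute", attrib)


fill = ncml_attribute("_FillValue", -999999999.0)
units = ncml_attribute("units", "Kelvin")
```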

Write the Dataset back to an NcML file

[11]:
import tempfile
tmp_fn = Path(tempfile.mkdtemp()) / "exercise1_modified.ncml"
nc.to_ncml(tmp_fn)
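
A quick way to confirm that a written file is well-formed is to parse it back. The sketch below does the write and read with xml.etree.ElementTree on a small hand-built document rather than calling xncml, and uses tempfile.TemporaryDirectory so the directory is cleaned up afterwards; the file name roundtrip.ncml is arbitrary.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NCML_NS)

root = ET.Element(f"{{{NCML_NS}}}netcdf", {"location": "nc/example1.nc"})
ET.SubElement(root, f"{{{NCML_NS}}}attribute",
              {"name": "title", "type": "String", "value": "Example Data"})

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "roundtrip.ncml"
    # Write UTF-8 with an <?xml ...?> declaration
    ET.ElementTree(root).write(out, encoding="utf-8", xml_declaration=True)
    # Parse the file back to confirm it is well-formed
    reread = ET.parse(out).getroot()
    header = out.read_text(encoding="utf-8").splitlines()[0]
```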

Export metadata to a dictionary

Dataset has a to_cf_dict method that returns a dictionary following the CF-JSON specification. The output may not be fully CF-JSON compliant, because the NcML files used to create virtual datasets do not always include all the information CF-JSON expects.

[12]:
nc.to_cf_dict()
[12]:
OrderedDict([('@location', 'nc/example1.nc'),
             ('@xmlns',
              {'': 'http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2'}),
             ('dimensions',
              OrderedDict([('time', 2), ('lat', 3), ('lon', 4)])),
             ('attributes',
              OrderedDict([('title', 'Example Data'),
                           ('Conventions', 'CF-2.0'),
                           ('source', None)])),
             ('variables',
              OrderedDict([('lat',
                            OrderedDict([('shape', ['lat']),
                                         ('type', 'float'),
                                         ('attributes',
                                          OrderedDict([('units',
                                                        'degrees_north')])),
                                         ('data', [41.0, 40.0, 39.0])])),
                           ('lon',
                            OrderedDict([('shape', ['lon']),
                                         ('type', 'float'),
                                         ('attributes',
                                          OrderedDict([('units',
                                                        'degrees_east')])),
                                         ('data',
                                          [-109.0, -107.0, -105.0, -103.0])])),
                           ('time',
                            OrderedDict([('shape', ['time']),
                                         ('type', 'int'),
                                         ('attributes',
                                          OrderedDict([('units', 'hours')])),
                                         ('data', [6, 18])])),
                           ('rh',
                            OrderedDict([('shape', ['time', 'lat', 'lon']),
                                         ('type', 'int'),
                                         ('attributes',
                                          OrderedDict([('long_name',
                                                        'relative humidity'),
                                                       ('units',
                                                        'percent')]))])),
                           ('Temp',
                            OrderedDict([('shape', ['time', 'lat', 'lon']),
                                         ('type', 'double'),
                                         ('attributes',
                                          OrderedDict([('long_name',
                                                        'surface temperature'),
                                                       ('units', 'Kelvin'),
                                                       ('Fill_value',
                                                        -999999999.0)]))]))]))])
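
Since the result is made of (ordered) dictionaries, lists, strings, numbers and None, it can be serialized with the standard json module. The sketch below uses a hand-copied excerpt of the output above instead of calling xncml. Keys starting with "@" are carried over from the XML rather than being CF-JSON sections, so this sketch drops them before serializing; whether a consumer wants them is an assumption.

```python
import json
from collections import OrderedDict

# Excerpt copied from the to_cf_dict() output above
cf = OrderedDict([
    ("@location", "nc/example1.nc"),
    ("dimensions", OrderedDict([("time", 2), ("lat", 3), ("lon", 4)])),
    ("attributes", OrderedDict([("title", "Example Data"), ("source", None)])),
    ("variables", OrderedDict([
        ("time", OrderedDict([("shape", ["time"]), ("type", "int"),
                              ("attributes", OrderedDict([("units", "hours")])),
                              ("data", [6, 18])])),
    ])),
])

# Keep only the CF-JSON sections; None becomes null in the JSON text
payload = {k: v for k, v in cf.items() if not k.startswith("@")}
text = json.dumps(payload, indent=2)
```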

Open an NcML document as an xarray.Dataset

xncml can parse NcML instructions to create an xarray.Dataset. Calling the close method on the returned dataset will close all underlying netCDF files referred to by the NcML document. Note that a few NcML instructions are not yet supported.

[13]:
xncml.open_ncml(fn)
[13]:
<xarray.Dataset>
Dimensions:  (time: 2, lat: 3, lon: 4)
Coordinates:
  * lat      (lat) float32 41.0 40.0 39.0
  * lon      (lon) float32 -109.0 -107.0 -105.0 -103.0
  * time     (time) int32 6 18
Data variables:
    rh       (time, lat, lon) int32 1 2 3 4 5 6 7 8 ... 25 26 27 28 29 30 31 32
    T        (time, lat, lon) float64 1.0 2.0 3.0 4.0 2.0 ... 7.5 15.0 22.5 30.0
Attributes:
    title:    Example Data
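
The dtypes in this repr line up with the NcML type attributes: float became float32, int became int32, and double became float64. A stdlib sketch of that mapping that reads the <values> text of the coordinate variables from a copy of the NcML shown earlier; NCML_DTYPES and read_values are hypothetical names, and the table only covers the types used in this example.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
NS = {"nc": NCML_NS}

# NcML type -> (NumPy dtype name, Python converter), limited to this example's types
NCML_DTYPES = {
    "int": ("int32", int),
    "float": ("float32", float),
    "double": ("float64", float),
}


def read_values(root: ET.Element) -> dict[str, tuple[str, list]]:
    """Hypothetical helper: parse whitespace-separated <values> for each variable."""
    out = {}
    for var in root.findall("nc:variable", NS):
        text = var.findtext("nc:values", namespaces=NS)
        if text is None:
            continue  # data lives in the referenced netCDF file, not in the NcML
        dtype, convert = NCML_DTYPES[var.get("type")]
        out[var.get("name")] = (dtype, [convert(v) for v in text.split()])
    return out


ncml_text = f"""<netcdf xmlns="{NCML_NS}">
  <variable name="T" shape="time lat lon" type="double"/>
  <variable name="lat" shape="lat" type="float"><values>41.0 40.0 39.0</values></variable>
  <variable name="time" shape="time" type="int"><values>6 18</values></variable>
</netcdf>"""
coords = read_values(ET.fromstring(ncml_text))
```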