xncml Usage
xncml serves two purposes: modifying NcML files, and opening NcML files as an xarray.Dataset.
[1]:
import xncml
from pathlib import Path
from IPython.display import Code
Modify an NcML document
xncml can add or remove global and variable attributes, and remove variables and dimensions. It can also rename variables and global attributes, and be used to create NcML files from scratch. This is all done using the xncml.Dataset class and its methods.
Create an NcML Dataset object from a local NcML file
The xncml.Dataset class is instantiated by passing the NcML file location. Alternatively, the class can be created using its from_text classmethod.
[2]:
fn = Path(xncml.__file__).parent.parent / "tests" / "data" / "exercise1.ncml"
# Instantiate Dataset class from the file location. An alternative would have been to do
# nc = xncml.Dataset.from_text(fn.read_text())
nc = xncml.Dataset(fn)
# This is just to pretty print the XML
Code(repr(nc), language="XML")
[2]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<dimension name="time" length="2" isUnlimited="true"></dimension>
<dimension name="lat" length="3"></dimension>
<dimension name="lon" length="4"></dimension>
<attribute name="title" type="String" value="Example Data"></attribute>
<variable name="rh" shape="time lat lon" type="int">
<attribute name="long_name" type="String" value="relative humidity"></attribute>
<attribute name="units" type="String" value="percent"></attribute>
</variable>
<variable name="T" shape="time lat lon" type="double">
<attribute name="long_name" type="String" value="surface temperature"></attribute>
<attribute name="units" type="String" value="C"></attribute>
</variable>
<variable name="lat" shape="lat" type="float">
<attribute name="units" type="String" value="degrees_north"></attribute>
<values>41.0 40.0 39.0</values>
</variable>
<variable name="lon" shape="lon" type="float">
<attribute name="units" type="String" value="degrees_east"></attribute>
<values>-109.0 -107.0 -105.0 -103.0</values>
</variable>
<variable name="time" shape="time" type="int">
<attribute name="units" type="String" value="hours"></attribute>
<values>6 18</values>
</variable>
</netcdf>
Create an NcML Dataset modifying a netCDF file
Here we’re creating an empty NcML dataset from scratch, in which we can include modifying statements that will apply to an existing netCDF dataset identified by the location argument.
[3]:
new = xncml.Dataset(location="nc/example1.nc")
Code(repr(new), language="XML")
[3]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="nc/example1.nc"></netcdf>
Rename the variable T to Temp
[4]:
nc.rename_variable('T', 'Temp')
Code(repr(nc), language="XML")
[4]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<dimension name="time" length="2" isUnlimited="true"></dimension>
<dimension name="lat" length="3"></dimension>
<dimension name="lon" length="4"></dimension>
<attribute name="title" type="String" value="Example Data"></attribute>
<variable name="rh" shape="time lat lon" type="int">
<attribute name="long_name" type="String" value="relative humidity"></attribute>
<attribute name="units" type="String" value="percent"></attribute>
</variable>
<variable name="Temp" shape="time lat lon" type="double" orgName="T">
<attribute name="long_name" type="String" value="surface temperature"></attribute>
<attribute name="units" type="String" value="C"></attribute>
</variable>
<variable name="lat" shape="lat" type="float">
<attribute name="units" type="String" value="degrees_north"></attribute>
<values>41.0 40.0 39.0</values>
</variable>
<variable name="lon" shape="lon" type="float">
<attribute name="units" type="String" value="degrees_east"></attribute>
<values>-109.0 -107.0 -105.0 -103.0</values>
</variable>
<variable name="time" shape="time" type="int">
<attribute name="units" type="String" value="hours"></attribute>
<values>6 18</values>
</variable>
</netcdf>
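The output above shows how a rename is expressed in NcML: the name attribute takes the new name and orgName keeps the original one. As a rough illustration, the same transformation can be sketched with only the standard library's xml.etree.ElementTree. The helper rename_variable_element is hypothetical and is not a description of how xncml is implemented.

```python
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NS)  # serialize without a namespace prefix

# Trimmed-down NcML document containing only the variable of interest.
NCML = f"""<netcdf location="nc/example1.nc" xmlns="{NS}">
  <variable name="T" shape="time lat lon" type="double"/>
</netcdf>"""


def rename_variable_element(root, old, new):
    """Hypothetical helper: mark the variable `old` as renamed to `new`."""
    for var in root.findall(f"{{{NS}}}variable"):
        if var.get("name") == old:
            var.set("name", new)      # new name seen by readers of the NcML
            var.set("orgName", old)   # name of the variable in the source file
            return var
    raise KeyError(old)


root = ET.fromstring(NCML)
renamed = rename_variable_element(root, "T", "Temp")
print(ET.tostring(root, encoding="unicode"))
```

The underlying netCDF file is never touched; only the NcML description changes.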
Remove the variable Temp from the dataset
[5]:
nc.remove_variable('Temp')
Code(repr(nc), language="XML")
[5]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<dimension name="time" length="2" isUnlimited="true"></dimension>
<dimension name="lat" length="3"></dimension>
<dimension name="lon" length="4"></dimension>
<attribute name="title" type="String" value="Example Data"></attribute>
<variable name="rh" shape="time lat lon" type="int">
<attribute name="long_name" type="String" value="relative humidity"></attribute>
<attribute name="units" type="String" value="percent"></attribute>
</variable>
<variable name="Temp" shape="time lat lon" type="double" orgName="T">
<attribute name="long_name" type="String" value="surface temperature"></attribute>
<attribute name="units" type="String" value="C"></attribute>
</variable>
<variable name="lat" shape="lat" type="float">
<attribute name="units" type="String" value="degrees_north"></attribute>
<values>41.0 40.0 39.0</values>
</variable>
<variable name="lon" shape="lon" type="float">
<attribute name="units" type="String" value="degrees_east"></attribute>
<values>-109.0 -107.0 -105.0 -103.0</values>
</variable>
<variable name="time" shape="time" type="int">
<attribute name="units" type="String" value="hours"></attribute>
<values>6 18</values>
</variable>
<remove name="Temp" type="variable"></remove>
</netcdf>
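Note that the Temp variable element is still present in the output above: the removal is recorded as a remove element appended to the root, to be applied when the NcML document is interpreted. A minimal sketch of that bookkeeping with xml.etree.ElementTree follows. The helper add_remove_instruction is hypothetical and only mimics the XML shown above, not xncml's internals.

```python
import xml.etree.ElementTree as ET

NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NS)  # serialize without a namespace prefix

root = ET.fromstring(
    f'<netcdf location="nc/example1.nc" xmlns="{NS}">'
    '<variable name="Temp" shape="time lat lon" type="double" orgName="T"/>'
    "</netcdf>"
)


def add_remove_instruction(parent, name, kind):
    """Hypothetical helper: append <remove name=... type=.../> to `parent`."""
    return ET.SubElement(parent, f"{{{NS}}}remove", {"name": name, "type": kind})


# Record the removal; the <variable> element itself is left in place.
add_remove_instruction(root, "Temp", "variable")
removals = [(el.get("name"), el.get("type")) for el in root.iter(f"{{{NS}}}remove")]
print(removals)
```

The same pattern applies to attributes: passing a variable element as the parent and "attribute" as the kind yields the nested remove element shown in the next step.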
Remove the attribute units from the variable Temp
[6]:
nc.remove_variable_attribute(variable='Temp', key='units')
Code(repr(nc), language="XML")
[6]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<dimension name="time" length="2" isUnlimited="true"></dimension>
<dimension name="lat" length="3"></dimension>
<dimension name="lon" length="4"></dimension>
<attribute name="title" type="String" value="Example Data"></attribute>
<variable name="rh" shape="time lat lon" type="int">
<attribute name="long_name" type="String" value="relative humidity"></attribute>
<attribute name="units" type="String" value="percent"></attribute>
</variable>
<variable name="Temp" shape="time lat lon" type="double" orgName="T">
<attribute name="long_name" type="String" value="surface temperature"></attribute>
<attribute name="units" type="String" value="C"></attribute>
<remove name="units" type="attribute"></remove>
</variable>
<variable name="lat" shape="lat" type="float">
<attribute name="units" type="String" value="degrees_north"></attribute>
<values>41.0 40.0 39.0</values>
</variable>
<variable name="lon" shape="lon" type="float">
<attribute name="units" type="String" value="degrees_east"></attribute>
<values>-109.0 -107.0 -105.0 -103.0</values>
</variable>
<variable name="time" shape="time" type="int">
<attribute name="units" type="String" value="hours"></attribute>
<values>6 18</values>
</variable>
<remove name="Temp" type="variable"></remove>
</netcdf>
Remove the global title attribute from the dataset
[7]:
nc.remove_dataset_attribute('title')
Code(repr(nc), language="XML")
[7]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<dimension name="time" length="2" isUnlimited="true"></dimension>
<dimension name="lat" length="3"></dimension>
<dimension name="lon" length="4"></dimension>
<attribute name="title" type="String" value="Example Data"></attribute>
<variable name="rh" shape="time lat lon" type="int">
<attribute name="long_name" type="String" value="relative humidity"></attribute>
<attribute name="units" type="String" value="percent"></attribute>
</variable>
<variable name="Temp" shape="time lat lon" type="double" orgName="T">
<attribute name="long_name" type="String" value="surface temperature"></attribute>
<attribute name="units" type="String" value="C"></attribute>
<remove name="units" type="attribute"></remove>
</variable>
<variable name="lat" shape="lat" type="float">
<attribute name="units" type="String" value="degrees_north"></attribute>
<values>41.0 40.0 39.0</values>
</variable>
<variable name="lon" shape="lon" type="float">
<attribute name="units" type="String" value="degrees_east"></attribute>
<values>-109.0 -107.0 -105.0 -103.0</values>
</variable>
<variable name="time" shape="time" type="int">
<attribute name="units" type="String" value="hours"></attribute>
<values>6 18</values>
</variable>
<remove name="Temp" type="variable"></remove>
<remove name="title" type="attribute"></remove>
</netcdf>
Add a global Conventions attribute
[8]:
nc.add_dataset_attribute(key='Conventions', value='CF-2.0')
Code(repr(nc), language="XML")
[8]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<dimension name="time" length="2" isUnlimited="true"></dimension>
<dimension name="lat" length="3"></dimension>
<dimension name="lon" length="4"></dimension>
<attribute name="title" type="String" value="Example Data"></attribute>
<attribute name="Conventions" type="String" value="CF-2.0"></attribute>
<variable name="rh" shape="time lat lon" type="int">
<attribute name="long_name" type="String" value="relative humidity"></attribute>
<attribute name="units" type="String" value="percent"></attribute>
</variable>
<variable name="Temp" shape="time lat lon" type="double" orgName="T">
<attribute name="long_name" type="String" value="surface temperature"></attribute>
<attribute name="units" type="String" value="C"></attribute>
<remove name="units" type="attribute"></remove>
</variable>
<variable name="lat" shape="lat" type="float">
<attribute name="units" type="String" value="degrees_north"></attribute>
<values>41.0 40.0 39.0</values>
</variable>
<variable name="lon" shape="lon" type="float">
<attribute name="units" type="String" value="degrees_east"></attribute>
<values>-109.0 -107.0 -105.0 -103.0</values>
</variable>
<variable name="time" shape="time" type="int">
<attribute name="units" type="String" value="hours"></attribute>
<values>6 18</values>
</variable>
<remove name="Temp" type="variable"></remove>
<remove name="title" type="attribute"></remove>
</netcdf>
Rename a global attribute
[9]:
nc.rename_dataset_attribute(old_name="Source", new_name="source")
Code(repr(nc), language="XML")
[9]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<dimension name="time" length="2" isUnlimited="true"></dimension>
<dimension name="lat" length="3"></dimension>
<dimension name="lon" length="4"></dimension>
<attribute name="title" type="String" value="Example Data"></attribute>
<attribute name="Conventions" type="String" value="CF-2.0"></attribute>
<attribute name="source">
<orgName>Source</orgName>
</attribute>
<variable name="rh" shape="time lat lon" type="int">
<attribute name="long_name" type="String" value="relative humidity"></attribute>
<attribute name="units" type="String" value="percent"></attribute>
</variable>
<variable name="Temp" shape="time lat lon" type="double" orgName="T">
<attribute name="long_name" type="String" value="surface temperature"></attribute>
<attribute name="units" type="String" value="C"></attribute>
<remove name="units" type="attribute"></remove>
</variable>
<variable name="lat" shape="lat" type="float">
<attribute name="units" type="String" value="degrees_north"></attribute>
<values>41.0 40.0 39.0</values>
</variable>
<variable name="lon" shape="lon" type="float">
<attribute name="units" type="String" value="degrees_east"></attribute>
<values>-109.0 -107.0 -105.0 -103.0</values>
</variable>
<variable name="time" shape="time" type="int">
<attribute name="units" type="String" value="hours"></attribute>
<values>6 18</values>
</variable>
<remove name="Temp" type="variable"></remove>
<remove name="title" type="attribute"></remove>
</netcdf>
Add a variable attribute
[10]:
nc.add_variable_attribute(variable='Temp', key='units', value='Kelvin')
nc.add_variable_attribute(variable='Temp', key='Fill_value', value=-999999999.)
Code(repr(nc), language="XML")
[10]:
<?xml version="1.0" encoding="utf-8"?>
<netcdf location="nc/example1.nc" xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
<dimension name="time" length="2" isUnlimited="true"></dimension>
<dimension name="lat" length="3"></dimension>
<dimension name="lon" length="4"></dimension>
<attribute name="title" type="String" value="Example Data"></attribute>
<attribute name="Conventions" type="String" value="CF-2.0"></attribute>
<attribute name="source">
<orgName>Source</orgName>
</attribute>
<variable name="rh" shape="time lat lon" type="int">
<attribute name="long_name" type="String" value="relative humidity"></attribute>
<attribute name="units" type="String" value="percent"></attribute>
</variable>
<variable name="Temp" shape="time lat lon" type="double" orgName="T">
<attribute name="long_name" type="String" value="surface temperature"></attribute>
<attribute name="units" type="String" value="Kelvin"></attribute>
<attribute name="Fill_value" type="String" value="-999999999.0"></attribute>
<remove name="units" type="attribute"></remove>
</variable>
<variable name="lat" shape="lat" type="float">
<attribute name="units" type="String" value="degrees_north"></attribute>
<values>41.0 40.0 39.0</values>
</variable>
<variable name="lon" shape="lon" type="float">
<attribute name="units" type="String" value="degrees_east"></attribute>
<values>-109.0 -107.0 -105.0 -103.0</values>
</variable>
<variable name="time" shape="time" type="int">
<attribute name="units" type="String" value="hours"></attribute>
<values>6 18</values>
</variable>
<remove name="Temp" type="variable"></remove>
<remove name="title" type="attribute"></remove>
</netcdf>
Write the Dataset back to an NcML file
[11]:
import tempfile
tmp_fn = Path(tempfile.mkdtemp()) / "exercise1_modified.ncml"
nc.to_ncml(tmp_fn)
Export metadata to a dictionary
Dataset has a to_cf_dict method that returns a dictionary following the CF-JSON specification. The output may not always be fully compliant with that specification, because NcML files used to create virtual datasets do not always include all the information that CF-JSON expects.
[12]:
nc.to_cf_dict()
[12]:
OrderedDict([('@location', 'nc/example1.nc'),
('@xmlns',
{'': 'http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2'}),
('dimensions',
OrderedDict([('time', 2), ('lat', 3), ('lon', 4)])),
('attributes',
OrderedDict([('title', 'Example Data'),
('Conventions', 'CF-2.0'),
('source', None)])),
('variables',
OrderedDict([('lat',
OrderedDict([('shape', ['lat']),
('type', 'float'),
('attributes',
OrderedDict([('units',
'degrees_north')])),
('data', [41.0, 40.0, 39.0])])),
('lon',
OrderedDict([('shape', ['lon']),
('type', 'float'),
('attributes',
OrderedDict([('units',
'degrees_east')])),
('data',
[-109.0, -107.0, -105.0, -103.0])])),
('time',
OrderedDict([('shape', ['time']),
('type', 'int'),
('attributes',
OrderedDict([('units', 'hours')])),
('data', [6, 18])])),
('rh',
OrderedDict([('shape', ['time', 'lat', 'lon']),
('type', 'int'),
('attributes',
OrderedDict([('long_name',
'relative humidity'),
('units',
'percent')]))])),
('Temp',
OrderedDict([('shape', ['time', 'lat', 'lon']),
('type', 'double'),
('attributes',
OrderedDict([('long_name',
'surface temperature'),
('units', 'Kelvin'),
('Fill_value',
-999999999.0)]))]))]))])
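Because the result is a nested OrderedDict of plain Python values, it can presumably be serialized with the standard json module. The sketch below uses a hand-written excerpt with the same shape as the output above rather than calling xncml, so the dictionary contents are an assumption for illustration only.

```python
import json
from collections import OrderedDict

# Hand-written excerpt mirroring the structure of the to_cf_dict() output above.
cf = OrderedDict(
    [
        ("dimensions", OrderedDict([("time", 2), ("lat", 3), ("lon", 4)])),
        ("attributes", OrderedDict([("title", "Example Data"), ("source", None)])),
        (
            "variables",
            OrderedDict(
                [
                    (
                        "lat",
                        OrderedDict(
                            [
                                ("shape", ["lat"]),
                                ("type", "float"),
                                ("data", [41.0, 40.0, 39.0]),
                            ]
                        ),
                    )
                ]
            ),
        ),
    ]
)

text = json.dumps(cf, indent=2)  # None is written as JSON null
restored = json.loads(text)      # round trip back to plain dicts and lists
print(text)
```

Key order is preserved in the JSON text, and the renamed source attribute, which has no value in the NcML, comes out as null.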
Open an NcML document as an xarray.Dataset
xncml can parse NcML instructions to create an xarray.Dataset. Calling the close method on the returned dataset will close all underlying netCDF files referred to by the NcML document. Note that a few NcML instructions are not yet supported.
[13]:
xncml.open_ncml(fn)
[13]:
<xarray.Dataset>
Dimensions:  (time: 2, lat: 3, lon: 4)
Coordinates:
  * lat      (lat) float32 41.0 40.0 39.0
  * lon      (lon) float32 -109.0 -107.0 -105.0 -103.0
  * time     (time) int32 6 18
Data variables:
    rh       (time, lat, lon) int32 1 2 3 4 5 6 7 8 ... 25 26 27 28 29 30 31 32
    T        (time, lat, lon) float64 1.0 2.0 3.0 4.0 2.0 ... 7.5 15.0 22.5 30.0
Attributes:
    title:    Example Data