xncml package

Tools for manipulating NcML (NetCDF Markup Language) files with/for xarray

Submodules

xncml.core module

Core features of xncml.

This module exposes the Dataset class, which is used to manipulate NcML files.

class xncml.core.AggregationType(value)

Bases: Enum

Type of aggregation.

FORECAST_MODEL_RUN_COLLECTION = 'forecastModelRunCollection'
FORECAST_MODEL_RUN_SINGLE_COLLECTION = 'forecastModelRunSingleCollection'
JOIN_EXISTING = 'joinExisting'
JOIN_NEW = 'joinNew'
TILED = 'tiled'
UNION = 'union'
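
Because the members are ordinary `Enum` values, a string taken from an NcML `type` attribute can be converted with a lookup by value. The sketch below redefines the enumeration locally, copying the members listed above, so that it runs without xncml installed. It is an illustration rather than the package source.

```python
from enum import Enum


class AggregationType(Enum):
    """Local copy of the members listed above, for illustration only."""

    FORECAST_MODEL_RUN_COLLECTION = 'forecastModelRunCollection'
    FORECAST_MODEL_RUN_SINGLE_COLLECTION = 'forecastModelRunSingleCollection'
    JOIN_EXISTING = 'joinExisting'
    JOIN_NEW = 'joinNew'
    TILED = 'tiled'
    UNION = 'union'


# Look up a member from the string used in an NcML document.
agg = AggregationType('joinExisting')
print(agg.name, agg.value)

# A string that is not a member value raises ValueError.
try:
    AggregationType('notAnAggregation')
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```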

class xncml.core.Dataset(filepath: str = None, location: str = None)

Bases: object

A class for reading and manipulating NcML files.

Note that NcML documents are used for two distinct purposes:
  • as an XML description of NetCDF structure and metadata;

  • as a definition of virtual NetCDF datasets, e.g. an aggregation of multiple files.

This class supports both types of uses.

add_aggregation(dim_name: str, type_: str, recheck_every: str = None, time_units_change: bool = None)

Add aggregation.

Parameters:
  • dim_name (str) – Dimension name.

  • type_ (str) – Aggregation type.

  • recheck_every (str) – Time interval for rechecking the aggregation. Only used if type_ is AggregationType.scan.

  • time_units_change (bool) – Whether the time units change. Only used if type_ is AggregationType.scan.

add_dataset_attribute(key, value, type_='String')

Add dataset attribute.

Parameters:
  • key (str) – Attribute name.

  • value (object) – Attribute value. Must be a serializable Python Object.

  • type_ (str, default: 'String') – String describing attribute type.

add_scan(dim_name: str, location: str, reg_exp: str = None, suffix: str = None, subdirs: bool = True, older_than: str = None, date_format_mark: str = None, enhance: bool = None)

Add scan element.

Parameters:
  • dim_name (str) – Dimension name.

  • location (str) – Location of the files to scan.

  • reg_exp (str) – Regular expression to match the full pathname of files.

  • suffix (str) – File suffix.

  • subdirs (bool) – Whether to scan subdirectories.

  • older_than (str) – Older than time interval.

  • date_format_mark (str) – Date format mark.

  • enhance (bool) – Whether to enhance the scan.
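
To make the result of add_aggregation and add_scan concrete, the following sketch builds the kind of `<aggregation>` and `<scan>` elements these methods describe, using only the standard library. The XML attribute names (`dimName`, `type`, `recheckEvery`, `location`, `suffix`, `subdirs`) follow the NcML schema as I understand it, and the paths and values are hypothetical. It does not call xncml and may differ from what the package writes.

```python
import xml.etree.ElementTree as ET

# Root <netcdf> element of a hypothetical NcML document.
netcdf = ET.Element("netcdf")

# Roughly what add_aggregation("time", "joinExisting", recheck_every="1 hour") describes.
agg = ET.SubElement(
    netcdf, "aggregation", dimName="time", type="joinExisting", recheckEvery="1 hour"
)

# Roughly what add_scan("time", location="/data/model/", suffix=".nc") describes.
# Booleans are written as strings, since XML attributes are text.
ET.SubElement(agg, "scan", location="/data/model/", suffix=".nc", subdirs="true")

text = ET.tostring(netcdf, encoding="unicode")
print(text)
```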

add_variable_agg(dim_name: str, name: str)

Add variable aggregation.

Parameters:
  • dim_name (str) – Dimension name for the aggregation.

  • name (str) – Variable name.

add_variable_attribute(variable, key, value, type_='String')

Add variable attribute.

Parameters:
  • variable (str) – Variable name.

  • key (str) – Attribute name.

  • value (object) – Attribute value. Must be a serializable Python object.

  • type_ (str, default: 'String') – String describing attribute type.

classmethod from_text(xml: str)

Create Dataset from xml string.
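
For orientation, here is a minimal sketch of the kind of XML string from_text accepts. It is parsed with the standard library instead of xncml so that it runs anywhere. The dataset location, attribute, and variable names are hypothetical, and the namespace URI is the NcML-2.2 namespace as I understand it.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"

# A small NcML document: one global attribute and one variable attribute.
xml = f"""<netcdf xmlns="{NCML_NS}" location="example.nc">
  <attribute name="title" value="Example dataset"/>
  <variable name="tas">
    <attribute name="units" value="K"/>
  </variable>
</netcdf>"""

root = ET.fromstring(xml)
ns = {"ncml": NCML_NS}

# Direct children only: global attributes and top-level variables.
global_attrs = {a.get("name"): a.get("value") for a in root.findall("ncml:attribute", ns)}
variables = [v.get("name") for v in root.findall("ncml:variable", ns)]
print(global_attrs, variables)
```

With xncml installed, the same string would be passed to `Dataset.from_text(xml)`.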

remove_dataset_attribute(key)

Remove dataset attribute.

Parameters:

key (str) – Name of the attribute to remove.

remove_variable(variable)

Remove dataset variable.

Parameters:

variable (str) – Name of the variable to remove.

remove_variable_attribute(variable, key)

Remove variable attribute.

rename_dataset_attribute(old_name, new_name)

Rename dataset attribute.

Parameters:
  • old_name (str) – Original attribute name.

  • new_name (str) – New attribute name.

rename_dimension(dimension, new_name)

Rename dimension.

Parameters:
  • dimension (str) – Original dimension name.

  • new_name (str) – New dimension name.

rename_variable(variable, new_name)

Rename variable.

Parameters:
  • variable (str) – Original variable name.

  • new_name (str) – New variable name.

rename_variable_attribute(variable, old_name, new_name)

Rename variable attribute.

Parameters:
  • variable (str) – Variable name.

  • old_name (str) – Original attribute name.

  • new_name (str) – New attribute name.

to_cf_dict()

Convert internal representation to a CF-JSON dictionary.

The CF-JSON specification includes data for variables, but if the data is not within the NcML, it cannot be included in the JSON representation.

Returns:

Dictionary with dimensions and variables keys. May also optionally include an attributes key and a groups key. Additional keys prefixed with @ may be included for <netcdf> tag attributes, for example @location.

References

http://cf-json.org/specification
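
The dictionary described above might look like the hand-written sketch below. The top-level keys come from the description. The per-variable layout (`shape`, `attributes`) follows my reading of the CF-JSON specification, and all names and values are hypothetical. The exact output of to_cf_dict may differ.

```python
import json

# Hand-written example of a CF-JSON-style dictionary without variable data.
cf = {
    "@location": "example.nc",            # <netcdf> tag attribute, prefixed with @
    "attributes": {"title": "Example dataset"},
    "dimensions": {"time": 3},
    "variables": {
        "tas": {
            "shape": ["time"],
            "attributes": {"units": "K"},
        }
    },
}

# Keys that the description says are always present.
has_required_keys = {"dimensions", "variables"} <= cf.keys()

# Collect the <netcdf> tag attributes by their @ prefix.
tag_attrs = {k: v for k, v in cf.items() if k.startswith("@")}

# The structure uses only JSON-compatible types.
serialized = json.dumps(cf)
```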

to_ncml(path=None)

Write NcML file to disk.

Parameters:

path (str) – Path to write NcML document.

xncml.core.preparse(obj: dict) -> dict
  • Remove None values from dictionary.

  • Convert booleans to strings.
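
The two steps above can be sketched as follows. This is a stand-in named `preparse_sketch`, not the package implementation. Writing booleans as lowercase `'true'`/`'false'` is an assumption based on XML conventions, and the `@`-prefixed keys are hypothetical.

```python
def preparse_sketch(obj: dict) -> dict:
    """Drop None values and turn booleans into lowercase strings."""
    cleaned = {}
    for key, value in obj.items():
        if value is None:
            continue  # step 1: remove None values
        if isinstance(value, bool):
            value = str(value).lower()  # step 2: True -> 'true' (assumed casing)
        cleaned[key] = value
    return cleaned


result = preparse_sketch(
    {"@dimName": "time", "@recheckEvery": None, "@timeUnitsChange": True}
)
print(result)
```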

xncml.parser module

# NcML parser for xarray

The open_ncml function parses an XML document compliant with the NcML-2.2 schema and returns an xarray Dataset.

The XML is parsed into Python objects using xsdata. The parser converts XML elements into instances of the classes defined in an autogenerated data model (generated.ncml_2_2). This data model was created using:

`xsdata generate -ds NumPy --compound-fields -mll=119 --postponed-annotations schemas/ncml-2.2.xsd`

The code below converts the NcML instructions into xarray instructions. Not all NcML instructions are currently supported.

## TODO

Support for these elements is missing:
  • <cacheVariable>

  • <logicalReduce>

  • <logicalSection>

  • <logicalSlice>

  • <promoteGlobalAttribute>

  • <scanFmrc>

Support for these attributes is missing:
  • dateFormatMark

  • olderThan

  • tiled aggregations

xncml.parser.build_scalar_variable(var_name: str, values_tag: Values, var_type: str) -> Variable

Build an xr.Variable for scalar variables.

Parameters:
  • var_name (str) – The variable name.

  • values_tag (Values instance) – <values> object description.

  • var_type (str) – The variable expected type.

Returns:

An xr.Variable filled with values from the <values> element.

Return type:

xr.Variable

Raises:

ValueError – If the <values> tag is not a valid scalar.

xncml.parser.cast(obj: Attribute) -> tuple | str

Cast attribute value to the appropriate type.
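
As an illustration of what such a cast can involve, the sketch below converts an attribute's text value according to its declared type, returning a string for string attributes and a tuple otherwise. `AttributeSketch`, `cast_sketch`, and the type table are hypothetical stand-ins. The real function works on the generated Attribute class and may handle more types.

```python
from dataclasses import dataclass


@dataclass
class AttributeSketch:
    """Hypothetical stand-in for an NcML <attribute> element."""

    name: str
    value: str
    type: str = "String"


# Assumed mapping from NcML type names to Python converters.
_CONVERTERS = {"short": int, "int": int, "long": int, "float": float, "double": float}


def cast_sketch(attr: AttributeSketch):
    """Return a str for string attributes, else a tuple of converted values."""
    if attr.type in ("String", "string"):
        return attr.value
    convert = _CONVERTERS[attr.type]
    # Numeric values are whitespace-separated in the attribute text.
    return tuple(convert(token) for token in attr.value.split())


valid_range = cast_sketch(AttributeSketch("valid_range", "0 100", "int"))
units = cast_sketch(AttributeSketch("units", "K"))
```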

xncml.parser.filter_by_class(iterable, klass)

Return generator filtering on class.
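
A generator of this kind can be written in a few lines. The version below, named `filter_by_class_sketch` to mark it as an illustration, yields the items that are instances of the given class.

```python
def filter_by_class_sketch(iterable, klass):
    """Yield only the items of `iterable` that are instances of `klass`."""
    for item in iterable:
        if isinstance(item, klass):
            yield item


mixed = [1, "a", 2.5, "b", 3]
strings = list(filter_by_class_sketch(mixed, str))
ints = list(filter_by_class_sketch(mixed, int))
```

In the parser, this pattern is useful for picking out one kind of child element (for example all variables) from a mixed list of parsed objects.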

xncml.parser.nctype(typ: DataType) -> type

Return Python type corresponding to the NcML DataType of object.

xncml.parser.open_ncml(ncml: str | Path, group: str = '/') -> Dataset

Convert NcML document to a dataset.

Parameters:
  • ncml (str | Path) – Path to NcML file.

  • group (str) – Path of the group to parse within the ncml. The special value * opens every group and flattens the variables into a single dataset, renaming variables and dimensions if conflicting names are found.

Returns:

Dataset holding variables and attributes defined in NcML document.

Return type:

xr.Dataset

xncml.parser.parse(path: Path) -> Netcdf

Parse NcML file using NetCDF datamodel based on NcML-2.2 Schema.

Parameters:

path (Path) – Path to NcML file.

Returns:

Object description of NcML content.

Return type:

Netcdf instance.

xncml.parser.read_aggregation(target: Dataset, obj: Aggregation, ncml: Path) -> Dataset

Return merged or concatenated content of <aggregation> element.

Parameters:
  • target (xr.Dataset) – Target dataset to be updated with <netcdf>’s content.

  • obj (Aggregation) – <aggregation> object description.

  • ncml (Path) – Path to NcML document, sometimes required to follow relative links.

Returns:

Dataset holding variables and attributes defined in <aggregation> element.

Return type:

xr.Dataset

xncml.parser.read_attribute(target: Dataset | Variable, obj: Attribute, ref: Dataset = None)

Update target dataset in place with new or modified attribute.

Parameters:
  • target (xr.Dataset | xr.Variable) – Target dataset to be updated.

  • obj (Attribute instance) – <attribute> object description.

  • ref (xr.Dataset) – Reference dataset.

xncml.parser.read_coord_value(nc: Netcdf, agg: Aggregation, dtypes: list = ())

Read coordValue attribute of <netcdf> element.

Parameters:
  • nc (Netcdf instance) – <netcdf> object description.

  • agg (Aggregation instance) – <aggregation> object description.

  • dtypes (tuple) – Preferred types for the coordinate value.

Returns:

Coordinate value cast to proper type.

Return type:

str, np.array, scalar

Notes

The casting logic is most likely not up to spec.

xncml.parser.read_dimension(obj: Dimension) -> Dimension

Return dimension object with its length cast to an integer.
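
The cast is needed because lengths read from XML arrive as text. A minimal sketch, using a hypothetical `DimensionSketch` dataclass in place of the generated Dimension class, could look like this. Whether the real function copies or modifies the object in place is not stated here, so the copy is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass
class DimensionSketch:
    """Hypothetical stand-in for a parsed <dimension> element."""

    name: str
    length: object  # text when read from XML, int after the cast


def read_dimension_sketch(dim: DimensionSketch) -> DimensionSketch:
    """Return a copy of `dim` with its length converted to int."""
    return replace(dim, length=int(dim.length))


dim = read_dimension_sketch(DimensionSketch("time", "12"))
```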

xncml.parser.read_ds(obj: Netcdf, ncml: Path) -> Dataset

Return dataset defined in <netcdf> element.

Parameters:
  • obj (Netcdf) – <netcdf> object description.

  • ncml (Path) – Path to NcML document, sometimes required to follow relative links.

Returns:

Dataset defined at the location attribute of the <netcdf> element.

Return type:

xr.Dataset

xncml.parser.read_enum(obj: EnumTypedef) -> dict[str, list]

Parse <enumTypedef> element.

Parameters:

obj (EnumTypedef) – <enumTypedef> object.

Returns:

A dictionary describing the Enum.

Return type:

dict

xncml.parser.read_group(target: Dataset, ref: Dataset | None, obj: Group | Netcdf, groups_to_read: list[str], parent_group_path: str = '/', dims: dict = None, enums: dict = None) -> Dataset

Parse <group> items, typically <dimension>, <variable>, <attribute> and <remove> elements.

Parameters:
  • target (xr.Dataset) – Target dataset to be updated.

  • ref (xr.Dataset | None) – Reference dataset used to copy content into target.

  • obj (Group | Netcdf) – <group> or <netcdf> object description.

  • groups_to_read (list[str]) – List of groups that must be read and included in target.

  • parent_group_path (str) – Path of parent group, by default the root group ‘/’.

  • dims (dict[str, Dimension]) – Dictionary of the dimensions of this dataset.

Returns:

Dataset holding variables and attributes defined in <netcdf> element.

Return type:

xr.Dataset

xncml.parser.read_netcdf(target: Dataset, ref: Dataset, obj: Netcdf, ncml: Path, group: str) -> Dataset

Return content of <netcdf> element.

Parameters:
  • target (xr.Dataset) – Target dataset to be updated with <netcdf>’s content.

  • ref (xr.Dataset) – Reference dataset used to copy content into target.

  • obj (Netcdf) – <netcdf> object description.

  • ncml (Path) – Path to NcML document, sometimes required to follow relative links.

  • group (str) – Path of the group to parse within the ncml. The special value * opens every group and flattens the variables into a single dataset.

Returns:

Dataset holding variables and attributes defined in <netcdf> element.

Return type:

xr.Dataset

xncml.parser.read_remove(target: Dataset | Variable, obj: Remove) -> Dataset

Remove item from dataset.

Parameters:
  • target (xr.Dataset | xr.Variable) – Target dataset or variable to be updated.

  • obj (Remove instance) – <remove> object description.

Returns:

Dataset with attribute, variable or dimension removed, or variable with attribute removed.

Return type:

xr.Dataset or xr.Variable

xncml.parser.read_scan(obj: Scan, ncml: Path) -> list[Dataset]

Return list of datasets defined in <scan> element.

Parameters:
  • obj (Aggregation.Scan instance) – <scan> object description.

  • ncml (Path) – Path to NcML document, sometimes required to follow relative links.

Returns:

List of datasets found by scan.

Return type:

list
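
The file-discovery part of a scan can be sketched with the standard library alone. The function below, `scan_sketch`, is hypothetical: it returns matching paths, whereas read_scan returns opened datasets, and it applies the regular expression to the full path with `re.search`, which is an assumption about matching semantics.

```python
import re
import tempfile
from pathlib import Path


def scan_sketch(location, suffix=None, reg_exp=None, subdirs=True):
    """Return sorted file paths under `location` that match the filters."""
    root = Path(location)
    candidates = root.rglob("*") if subdirs else root.glob("*")
    pattern = re.compile(reg_exp) if reg_exp else None
    matches = []
    for path in candidates:
        if not path.is_file():
            continue
        if suffix and not path.name.endswith(suffix):
            continue
        if pattern and not pattern.search(path.as_posix()):
            continue
        matches.append(path)
    return sorted(matches)


# Try it on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    for name in ("a_2001.nc", "b_2002.nc", "notes.txt", "sub/c_2003.nc"):
        (base / name).touch()
    all_nc = [p.name for p in scan_sketch(base, suffix=".nc")]
    top_only = [p.name for p in scan_sketch(base, suffix=".nc", subdirs=False)]
    only_2002 = [p.name for p in scan_sketch(base, reg_exp=r"_2002\.nc$")]
```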

xncml.parser.read_values(var_name: str, expected_size: int, values_tag: Values) -> list

Read values for <variable> element.

Parameters:
  • var_name (str) – The variable name.

  • expected_size (int) – The variable expected size.

  • values_tag (Values instance) – <values> object description.

Returns:

A list filled with values from <values> element.

Return type:

list
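
A `<values>` element can either list its values as text or describe an evenly spaced sequence. The sketch below handles both cases with the standard library. The `start`, `increment`, `npts`, and `separator` attribute names follow the NcML schema as I understand it, and `read_values_sketch` is a hypothetical stand-in that leaves listed values as strings instead of converting them to the variable type.

```python
import xml.etree.ElementTree as ET


def read_values_sketch(var_name, expected_size, values_tag):
    """Return the values of a <values> element as a list."""
    start = values_tag.get("start")
    increment = values_tag.get("increment")
    if start is not None and increment is not None:
        # Evenly spaced sequence; npts defaults to the expected size.
        npts = int(values_tag.get("npts", expected_size))
        return [float(start) + i * float(increment) for i in range(npts)]

    # Listed values: split on the separator, or on whitespace by default.
    separator = values_tag.get("separator")
    tokens = [t.strip() for t in (values_tag.text or "").split(separator)]
    if len(tokens) != expected_size:
        raise ValueError(
            f"Expected {expected_size} values for '{var_name}', got {len(tokens)}."
        )
    return tokens


explicit = read_values_sketch("lat", 3, ET.fromstring("<values>10 20 30</values>"))
generated = read_values_sketch(
    "time", 4, ET.fromstring('<values start="0" increment="6"/>')
)
```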

xncml.parser.read_variable(target: Dataset, ref: Dataset, obj: Variable, dimensions: dict, enums: dict[str, dict[str, int]], group_path: str) -> Dataset

Parse <variable> element.

Parameters:
  • target (xr.Dataset) – Target dataset to be updated.

  • ref (xr.Dataset) – Reference dataset used to copy content into target.

  • obj (Variable) – <variable> object description.

  • dimensions (dict) – Dimension attributes keyed by name.

  • enums (dict[str, dict]) – The enum types that have been read in the parent groups.

  • group_path (str) – Path to the parent group.

Returns:

Dataset holding variable defined in <variable> element.

Return type:

xr.Dataset

xncml.parser.rename_dimension(target: Dataset, ref: Dataset, obj: Dimension) -> Dataset

Rename dimension in target dataset.