xncml package

Tools for manipulating NcML (NetCDF Markup Language) files with/for xarray

Subpackages

xncml.generated package

Submodules

xncml.core module

Core features of xncml.

This module exposes the Dataset class, which is used to manipulate NcML files.

class xncml.core.AggregationType(value)[source]

Bases: Enum

Type of aggregation.

FORECAST_MODEL_RUN_COLLECTION = 'forecastModelRunCollection'

FORECAST_MODEL_RUN_SINGLE_COLLECTION = 'forecastModelRunSingleCollection'

JOIN_EXISTING = 'joinExisting'

JOIN_NEW = 'joinNew'

TILED = 'tiled'

UNION = 'union'
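As a quick illustration of how these members map to the strings used in an NcML `type` attribute, the sketch below re-creates the enumeration locally with the standard library. It is a standalone copy of the values listed above, not an import from xncml.

```python
from enum import Enum


class AggregationType(Enum):
    """Standalone copy of the members listed above (not imported from xncml)."""

    FORECAST_MODEL_RUN_COLLECTION = "forecastModelRunCollection"
    FORECAST_MODEL_RUN_SINGLE_COLLECTION = "forecastModelRunSingleCollection"
    JOIN_EXISTING = "joinExisting"
    JOIN_NEW = "joinNew"
    TILED = "tiled"
    UNION = "union"


# Look a member up from the string found in an NcML "type" attribute.
agg = AggregationType("joinExisting")
print(agg.name)   # JOIN_EXISTING
print(agg.value)  # joinExisting
```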

class xncml.core.Dataset(filepath: str = None, location: str = None)[source]

Bases: object

A class for reading and manipulating NcML files.

Note that NcML documents are used for two distinct purposes:

to provide an XML description of NetCDF structure and metadata;
to create virtual NetCDF datasets, e.g. an aggregation of multiple files.

This class supports both types of uses.
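The two uses can be told apart by looking at the document itself. The sketch below uses only the standard library; the file names (example.nc, jan.nc, feb.nc), the variable tas and the helper is_aggregation are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"nc": "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"}

# Use 1: describe the structure and metadata of a single (hypothetical) file.
DESCRIPTION = """
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="example.nc">
  <dimension name="time" length="2"/>
  <variable name="tas" shape="time" type="float">
    <attribute name="units" value="K"/>
  </variable>
</netcdf>
"""

# Use 2: define a virtual dataset by aggregating several (hypothetical) files.
AGGREGATION = """
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <netcdf location="jan.nc"/>
    <netcdf location="feb.nc"/>
  </aggregation>
</netcdf>
"""


def is_aggregation(ncml_text: str) -> bool:
    """Return True if the root <netcdf> element has an <aggregation> child."""
    root = ET.fromstring(ncml_text.strip())
    return root.find("nc:aggregation", NS) is not None


print(is_aggregation(DESCRIPTION))  # False
print(is_aggregation(AGGREGATION))  # True
```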

add_aggregation(dim_name: str, type_: str, recheck_every: str = None, time_units_change: bool = None)[source]

Add aggregation.

Parameters:

dim_name (str) – Dimension name.
type_ (str) – Aggregation type.
recheck_every (str) – Time interval for rechecking the aggregation. Only used if type_ is AggregationType.scan.
time_units_change (bool) – Whether the time units change. Only used if type_ is AggregationType.scan.

add_dataset_attribute(key, value, type_='String')[source]

Add dataset attribute.

Parameters:

key (str) – Attribute name.
value (object) – Attribute value. Must be a serializable Python Object.
type_ (str, default: 'String') – String describing attribute type.
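In NcML terms, a dataset attribute is an `<attribute>` element with name, type and value attributes. The sketch below shows that mapping with the standard library only. The helper add_attribute is hypothetical, ignores XML namespaces for brevity, and is not how xncml stores its internal representation.

```python
import xml.etree.ElementTree as ET


def add_attribute(parent: ET.Element, key: str, value: object, type_: str = "String") -> ET.Element:
    """Hypothetical helper: append an <attribute> child to `parent`."""
    # XML attribute values are text, so the Python value is converted with str().
    return ET.SubElement(parent, "attribute", {"name": key, "type": type_, "value": str(value)})


root = ET.Element("netcdf")
add_attribute(root, "title", "Surface temperature")
add_attribute(root, "version", 2, type_="int")

for child in root:
    print(child.attrib)
```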

add_scan(dim_name: str, location: str, reg_exp: str = None, suffix: str = None, subdirs: bool = True, older_than: str = None, date_format_mark: str = None, enhance: bool = None)[source]

Add scan element.

Parameters:

dim_name (str) – Dimension name.
location (str) – Location of the files to scan.
reg_exp (str) – Regular expression to match the full pathname of files.
suffix (str) – File suffix.
subdirs (bool) – Whether to scan subdirectories.
older_than (str) – Older than time interval.
date_format_mark (str) – Date format mark.
enhance (bool) – Whether to enhance the scan.
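The keyword arguments above correspond to the camelCase attributes of the NcML `<scan>` element. The sketch below shows one plausible mapping using the standard library. The helper scan_element and the example path are hypothetical, and the lowercase rendering of booleans is an assumption rather than documented xncml behaviour.

```python
import xml.etree.ElementTree as ET


def scan_element(location, reg_exp=None, suffix=None, subdirs=True,
                 older_than=None, date_format_mark=None, enhance=None):
    """Hypothetical helper building a <scan> element from keyword arguments."""
    raw = {
        "location": location,
        "regExp": reg_exp,
        "suffix": suffix,
        "subdirs": subdirs,
        "olderThan": older_than,
        "dateFormatMark": date_format_mark,
        "enhance": enhance,
    }
    attrs = {}
    for name, value in raw.items():
        if value is None:
            continue  # unset options are left out of the element
        if isinstance(value, bool):
            value = str(value).lower()  # assumption: XML-style "true"/"false"
        attrs[name] = value
    return ET.Element("scan", attrs)


scan = scan_element("/data/model_runs", suffix=".nc", older_than="5 min")
print(scan.attrib)
```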

add_variable_agg(dim_name: str, name: str)[source]

Add variable aggregation.

Parameters:

dim_name (str) – Dimension name for the aggregation.
name (str) – Variable name.

add_variable_attribute(variable, key, value, type_='String')[source]

Add variable attribute.

Parameters:

variable (str) – Variable name.
key (str) – Attribute name.
value (object) – Attribute value. Must be a serializable Python Object.
type_ (str, default: 'String') – String describing attribute type.

classmethod from_text(xml: str)[source]

Create a Dataset from an XML string.

remove_dataset_attribute(key)[source]

Remove dataset attribute.

Parameters:

key (str) – Name of the attribute to remove.

remove_variable(variable)[source]

Remove dataset variable.

Parameters:

variable (str) – Name of the variable to remove.

remove_variable_attribute(variable, key)[source]

Remove variable attribute.

rename_dataset_attribute(old_name, new_name)[source]

Rename dataset attribute.

Parameters:

old_name (str) – Original attribute name.
new_name (str) – New attribute name.

rename_dimension(dimension, new_name)[source]

Rename dimension.

Parameters:

dimension (str) – Original dimension name.
new_name (str) – New dimension name.

rename_variable(variable, new_name)[source]

Rename variable.

Parameters:

variable (str) – Original variable name.
new_name (str) – New variable name.

rename_variable_attribute(variable, old_name, new_name)[source]

Rename variable attribute.

Parameters:

variable (str) – Variable name.
old_name (str) – Original attribute name.
new_name (str) – New attribute name.

to_cf_dict()[source]

Convert internal representation to a CF-JSON dictionary.

The CF-JSON specification includes data for variables, but if the data is not within the NcML, it cannot be included in the JSON representation.

Returns:

Dictionary with dimensions and variables keys. May also optionally include an attributes key and a
groups key. Additional keys prefixed with @ may be included for <netcdf> tag attributes,
for example @location.

References

http://cf-json.org/specification
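To make the return value more concrete, here is a hand-written dictionary with the top-level keys described above. The contents (tas, time, example.nc) are hypothetical, and the per-variable keys shape, type and attributes are assumed from the CF-JSON layout rather than taken from actual to_cf_dict output. The sketch only uses the standard library.

```python
import json

# Hand-written example; not produced by xncml.
cf_dict = {
    "@location": "example.nc",  # <netcdf> tag attribute, prefixed with @
    "dimensions": {"time": 2},
    "variables": {
        "tas": {
            "shape": ["time"],
            "type": "float",
            "attributes": {"units": "K"},
            # no "data" key: the values are not stored in the NcML itself
        }
    },
    "attributes": {"title": "Surface temperature"},
}

# Variables whose data could not be included in the JSON representation.
missing_data = [name for name, var in cf_dict["variables"].items() if "data" not in var]

print(json.dumps(cf_dict, indent=2))
print(missing_data)  # ['tas']
```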

to_ncml(path=None)[source]

Write NcML file to disk.

Parameters:

path (str) – Path to write NcML document.

xncml.core.preparse(obj: dict) → dict[source]

Remove None values from dictionary.
Convert booleans to strings.
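The behaviour described above can be sketched in a few lines. The function below is named preparse_sketch to make clear that it is an illustration and not the library's code. The lowercase "true"/"false" spelling is an assumption based on XML conventions.

```python
def preparse_sketch(obj: dict) -> dict:
    """Drop None values and turn booleans into strings (illustrative only)."""
    out = {}
    for key, value in obj.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = str(value).lower()  # assumption: XML-style "true"/"false"
        out[key] = value
    return out


print(preparse_sketch({"subdirs": True, "suffix": None, "location": "data/"}))
# {'subdirs': 'true', 'location': 'data/'}
```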

xncml.parser module

NcML parser for xarray

The open_ncml function parses an XML document compliant with the NcML-2.2 schema and returns an xarray Dataset.

The XML is parsed into Python objects using xsdata. The parser converts XML elements into instances of classes defined in an autogenerated data model (generated.ncml_2_2). This data model was created using:

`xsdata generate -ds NumPy --compound-fields -mll=119 --postponed-annotations schemas/ncml-2.2.xsd`

The code below converts the NcML instructions into xarray instructions. Not all NcML instructions are currently supported.

TODO

Support for these elements is missing:

- <cacheVariable>
- <logicalReduce>
- <logicalSection>
- <logicalSlice>
- <promoteGlobalAttribute>
- <scanFmrc>

Support for these attributes is missing:

- dateFormatMark
- olderThan
- tiled aggregations

xncml.parser.build_scalar_variable(var_name: str, values_tag: Values, var_type: str) → Variable[source]

Build an xr.Variable for scalar variables.

Parameters:

var_name (str) – The variable name.
values_tag (Values instance) – <values> object description.
var_type (str) – The variable expected type.

Returns:

An xr.Variable filled with values from the <values> element.

Return type:

xr.Variable

Raises:

ValueError – If the <values> tag is not a valid scalar.

xncml.parser.cast(obj: Attribute) → tuple | str[source]

Cast attribute value to the appropriate type.

xncml.parser.filter_by_class(iterable, klass)[source]

Return generator filtering on class.
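A generator that filters on class can be written as below. This is a minimal sketch assuming an isinstance check. The Dimension and Attribute dataclasses are hypothetical stand-ins for the generated data model, and filter_by_class_sketch is not the library function.

```python
from dataclasses import dataclass


@dataclass
class Dimension:  # hypothetical stand-in for a generated model class
    name: str


@dataclass
class Attribute:  # hypothetical stand-in for a generated model class
    name: str


def filter_by_class_sketch(iterable, klass):
    """Yield only the items that are instances of `klass`."""
    for item in iterable:
        if isinstance(item, klass):
            yield item


children = [Dimension("time"), Attribute("units"), Dimension("lat")]
print([d.name for d in filter_by_class_sketch(children, Dimension)])  # ['time', 'lat']
```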

xncml.parser.nctype(typ: DataType) → type[source]

Return Python type corresponding to the NcML DataType of object.

xncml.parser.open_ncml(ncml: str | Path, group: str = '/') → Dataset[source]

Convert NcML document to a dataset.

Parameters:

ncml (str | Path) – Path to NcML file.
group (str) – Path of the group to parse within the ncml. The special value * opens every group and flattens the variables into a single dataset, renaming variables and dimensions if conflicting names are found.

Returns:

Dataset holding variables and attributes defined in NcML document.

Return type:

xr.Dataset

xncml.parser.parse(path: Path) → Netcdf[source]

Parse NcML file using NetCDF datamodel based on NcML-2.2 Schema.

Parameters:

path (Path) – Path to NcML file.

Returns:

Object description of NcML content.

Return type:

Netcdf instance

xncml.parser.read_aggregation(target: Dataset, obj: Aggregation, ncml: Path) → Dataset[source]

Return merged or concatenated content of <aggregation> element.

Parameters:

target (xr.Dataset) – Target dataset to be updated with <netcdf>’s content.
obj (Aggregation) – <aggregation> object description.
ncml (Path) – Path to NcML document, sometimes required to follow relative links.

Returns:

Dataset holding variables and attributes defined in <aggregation> element.

Return type:

xr.Dataset

xncml.parser.read_attribute(target: Dataset | Variable, obj: Attribute, ref: Dataset = None)[source]

Update target dataset in place with new or modified attribute.

Parameters:

target (xr.Dataset | xr.Variable) – Target dataset to be updated.
obj (Attribute instance) – <attribute> object description.
ref (xr.Dataset) – Reference dataset.

xncml.parser.read_coord_value(nc: Netcdf, agg: Aggregation, dtypes: list = ())[source]

Read coordValue attribute of <netcdf> element.

Parameters:

nc (Netcdf instance) – <netcdf> object description.
agg (Aggregation instance) – <aggregation> object description.
dtypes (tuple) – Preferred types for the coordinate value.

Returns:

Coordinate value cast to proper type.

Return type:

str, np.array, scalar

Notes

The casting logic is most likely not up to spec.
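One simple way to cast a coordValue string is to try each preferred type in order and fall back to the raw string. The sketch below illustrates that idea only. The function name cast_coord_value and the default type order are assumptions, and the library's actual logic may differ.

```python
def cast_coord_value(text: str, dtypes=(int, float)):
    """Try each preferred type in order; fall back to the raw string."""
    for typ in dtypes:
        try:
            return typ(text)
        except ValueError:
            continue  # this type does not fit, try the next one
    return text


print(cast_coord_value("12"))          # 12
print(cast_coord_value("1.5"))         # 1.5
print(cast_coord_value("2020-01-01"))  # 2020-01-01
```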

xncml.parser.read_dimension(obj: Dimension) → Dimension[source]

Return dimension object with its length cast to an integer.

xncml.parser.read_ds(obj: Netcdf, ncml: Path) → Dataset[source]

Return dataset defined in <netcdf> element.

Parameters:

obj (Netcdf) – <netcdf> object description.
ncml (Path) – Path to NcML document, sometimes required to follow relative links.

Returns:

Dataset defined at the <netcdf> element's location attribute.

Return type:

xr.Dataset

xncml.parser.read_enum(obj: EnumTypedef) → dict[str, list][source]

Parse <enumTypeDef> element.

Parameters:

obj (EnumTypedef) – <enumTypeDef> object.

Returns:

A dictionary describing the Enum.

Return type:

dict

Examples

xncml.parser.read_group(target: Dataset, ref: Dataset | None, obj: Group | Netcdf, groups_to_read: list[str], parent_group_path: str = '/', dims: dict = None, enums: dict = None) → Dataset[source]

Parse <group> items, typically <dimension>, <variable>, <attribute> and <remove> elements.

Parameters:

target (xr.Dataset) – Target dataset to be updated.
ref (xr.Dataset | None) – Reference dataset used to copy content into target.
obj (Group | Netcdf) – <group> or <netcdf> object description.
groups_to_read (list[str]) – List of groups that must be read and included in target.
parent_group_path (str) – Path of parent group, by default the root group ‘/’.
dims (dict[str, Dimension]) – Dictionary of the dimensions of this dataset.

Returns:

Dataset holding variables and attributes defined in <netcdf> element.

Return type:

xr.Dataset

xncml.parser.read_netcdf(target: Dataset, ref: Dataset, obj: Netcdf, ncml: Path, group: str) → Dataset[source]

Return content of <netcdf> element.

Parameters:

target (xr.Dataset) – Target dataset to be updated with <netcdf>’s content.
ref (xr.Dataset) – Reference dataset used to copy content into target.
obj (Netcdf) – <netcdf> object description.
ncml (Path) – Path to NcML document, sometimes required to follow relative links.
group (str) – Path of the group to parse within the ncml. The special value * opens every group and flattens the variables into a single dataset.

Returns:

Dataset holding variables and attributes defined in <netcdf> element.

Return type:

xr.Dataset

xncml.parser.read_remove(target: Dataset | Variable, obj: Remove) → Dataset[source]

Remove item from dataset.

Parameters:

target (xr.Dataset | xr.Variable) – Target dataset or variable to be updated.
obj (Remove instance) – <remove> object description.

Returns:

Dataset with attribute, variable or dimension removed, or variable with attribute removed.

Return type:

xr.Dataset or xr.Variable

xncml.parser.read_scan(obj: Scan, ncml: Path) → list[Dataset][source]

Return list of datasets defined in <scan> element.

Parameters:

obj (Aggregation.Scan instance) – <scan> object description.
ncml (Path) – Path to NcML document, sometimes required to follow relative links.

Returns:

List of datasets found by scan.

Return type:

list

xncml.parser.read_values(var_name: str, expected_size: int, values_tag: Values) → list[source]

Read values for <variable> element.

Parameters:

var_name (str) – The variable name.
expected_size (int) – The variable expected size.
values_tag (Values instance) – <values> object description.

Returns:

A list filled with values from <values> element.

Return type:

list
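An NcML `<values>` element can either list its values as text or describe an evenly spaced sequence through start and increment attributes. The sketch below handles both cases with the standard library. The helper values_from_tag, its length check and the choice to return listed values as uncast strings are assumptions made for illustration, not the library's behaviour.

```python
import xml.etree.ElementTree as ET


def values_from_tag(tag: ET.Element, expected_size: int) -> list:
    """Hypothetical helper reading a <values> element into a list."""
    start = tag.get("start")
    increment = tag.get("increment")
    if start is not None and increment is not None:
        # Evenly spaced values: start, start + increment, ...
        return [float(start) + i * float(increment) for i in range(expected_size)]

    # Listed values: split on the separator attribute, or on whitespace if absent.
    sep = tag.get("separator")
    items = [item.strip() for item in (tag.text or "").split(sep) if item.strip()]
    if len(items) != expected_size:
        raise ValueError(f"expected {expected_size} values, found {len(items)}")
    return items  # still strings; casting to the variable type is left to the caller


spaced = ET.fromstring('<values start="0" increment="10"/>')
listed = ET.fromstring("<values>1 2 3</values>")

print(values_from_tag(spaced, 3))  # [0.0, 10.0, 20.0]
print(values_from_tag(listed, 3))  # ['1', '2', '3']
```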

xncml.parser.read_variable(target: Dataset, ref: Dataset, obj: Variable, dimensions: dict, enums: dict[str, dict[str, int]], group_path: str) → Dataset[source]

Parse <variable> element.

Parameters:

target (xr.Dataset) – Target dataset to be updated.
ref (xr.Dataset) – Reference dataset used to copy content into target.
obj (Variable) – <variable> object description.
dimensions (dict) – Dimension attributes keyed by name.
enums (dict[str, dict]) – The enums types that have been read in the parent groups.
group_path (str) – Path to the parent group.

Returns:

Dataset holding variable defined in <variable> element.

Return type:

xr.Dataset

xncml.parser.rename_dimension(target: Dataset, ref: Dataset, obj: Dimension) → Dataset[source]

Rename dimension in target dataset.