xncml package
Tools for manipulating NcML (NetCDF Markup Language) files with/for xarray
Subpackages
- xncml.generated package
  - Aggregation: Aggregation.Meta, Aggregation.Scan, Aggregation.ScanFmrc, Aggregation.VariableAgg, Aggregation.cache_variable, Aggregation.choice, Aggregation.dim_name, Aggregation.fmrc_definition, Aggregation.netcdf, Aggregation.promote_global_attribute, Aggregation.recheck_every, Aggregation.scan, Aggregation.scan_fmrc, Aggregation.time_units_change, Aggregation.type, Aggregation.variable_agg
  - AggregationType
  - Attribute
  - CacheVariable
  - DataType
  - Dimension
  - EnumTypedef
  - Group
  - LogicalReduce
  - LogicalSection
  - LogicalSlice
  - Netcdf
  - ObjectType
  - PromoteGlobalAttribute
  - Remove
  - Values
  - Variable
Submodules
xncml.core module
Core features of xncml.
This module exposes the Dataset class, which is used to manipulate NcML files.
- class xncml.core.AggregationType(value)
Bases: Enum
Type of aggregation.
- FORECAST_MODEL_RUN_COLLECTION = 'forecastModelRunCollection'
- FORECAST_MODEL_RUN_SINGLE_COLLECTION = 'forecastModelRunSingleCollection'
- JOIN_EXISTING = 'joinExisting'
- JOIN_NEW = 'joinNew'
- TILED = 'tiled'
- UNION = 'union'
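The member values are the strings that NcML uses in the type attribute of an <aggregation> element, so a member can be looked up from the raw attribute text. A minimal sketch, using a local stand-in enum that mirrors the listing above instead of importing xncml:

```python
from enum import Enum


class AggregationType(Enum):
    """Local stand-in mirroring the members listed above (not imported from xncml)."""

    FORECAST_MODEL_RUN_COLLECTION = "forecastModelRunCollection"
    FORECAST_MODEL_RUN_SINGLE_COLLECTION = "forecastModelRunSingleCollection"
    JOIN_EXISTING = "joinExisting"
    JOIN_NEW = "joinNew"
    TILED = "tiled"
    UNION = "union"


# Look up a member from the text found in an NcML `type` attribute.
agg_type = AggregationType("joinExisting")
print(agg_type.name)   # JOIN_EXISTING
print(agg_type.value)  # joinExisting
```

Looking up a string that is not a member value raises ValueError, which is a convenient way to reject unknown aggregation types early.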
- class xncml.core.Dataset(filepath: str = None, location: str = None)
Bases: object
A class for reading and manipulating NcML files.
Note that NcML documents are used for two distinct purposes:
- to provide an XML description of NetCDF structure and metadata;
- to create virtual NetCDF datasets, e.g. an aggregation of multiple files.
This class supports both types of use.
- add_aggregation(dim_name: str, type_: str, recheck_every: str = None, time_units_change: bool = None)
Add aggregation.
- Parameters:
dim_name (str) – Dimension name.
type_ (str) – Aggregation type.
recheck_every (str) – Time interval for rechecking the aggregation. Only used if type_ is AggregationType.scan.
time_units_change (bool) – Whether the time units change. Only used if type_ is AggregationType.scan.
- add_dataset_attribute(key, value, type_='String')
Add dataset attribute.
- Parameters:
key (str) – Attribute name.
value (object) – Attribute value. Must be a serializable Python Object.
type_ (str, default: 'String') – String describing attribute type.
- add_scan(dim_name: str, location: str, reg_exp: str = None, suffix: str = None, subdirs: bool = True, older_than: str = None, date_format_mark: str = None, enhance: bool = None)
Add scan element.
- Parameters:
dim_name (str) – Dimension name.
location (str) – Location of the files to scan.
reg_exp (str) – Regular expression to match the full pathname of files.
suffix (str) – File suffix.
subdirs (bool) – Whether to scan subdirectories.
older_than (str) – Older than time interval.
date_format_mark (str) – Date format mark.
enhance (bool) – Whether to enhance the scan.
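The parameters of add_aggregation and add_scan resemble attributes of the NcML <aggregation> and <scan> elements (dimName, type, location, suffix, subdirs and so on). The sketch below builds such a fragment with the standard library only, to show the kind of XML involved. It does not call xncml, and the location value is a placeholder; whether xncml serializes the document in exactly this form is an assumption.

```python
import xml.etree.ElementTree as ET

NCML_NS = "http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2"
ET.register_namespace("", NCML_NS)  # serialize NcML as the default namespace


def tag(name):
    """Qualify an element name with the NcML namespace."""
    return f"{{{NCML_NS}}}{name}"


root = ET.Element(tag("netcdf"))

# <aggregation>: join the scanned files along the existing "time" dimension.
agg = ET.SubElement(root, tag("aggregation"), {"dimName": "time", "type": "joinExisting"})

# <scan>: collect every .nc file below a (placeholder) directory.
ET.SubElement(agg, tag("scan"), {"location": "data/", "suffix": ".nc", "subdirs": "true"})

ncml_text = ET.tostring(root, encoding="unicode")
print(ncml_text)
```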
- add_variable_agg(dim_name: str, name: str)
Add variable aggregation.
- Parameters:
dim_name (str) – Dimension name for the aggregation.
name (str) – Variable name.
- add_variable_attribute(variable, key, value, type_='String')
Add variable attribute.
- Parameters:
variable (str) – Variable name
key (str) – Attribute name
value (object) – Attribute value. Must be a serializable Python Object
type_ (str, default: 'String') – String describing attribute type.
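In NcML, attributes are written as <attribute> elements carrying name, type and value. A dataset attribute sits directly under <netcdf>, while a variable attribute is nested under its <variable>. A minimal standard-library sketch of that layout follows; the namespace is omitted for brevity, the names are made up, and this is not xncml's own code.

```python
import xml.etree.ElementTree as ET

root = ET.Element("netcdf")

# Dataset-level attribute: a direct child of <netcdf>.
ET.SubElement(root, "attribute", {"name": "title", "type": "String", "value": "Demo"})

# Variable-level attribute: nested under the <variable> it describes.
var = ET.SubElement(root, "variable", {"name": "tas"})
ET.SubElement(var, "attribute", {"name": "units", "type": "String", "value": "K"})

ncml_text = ET.tostring(root, encoding="unicode")
print(ncml_text)
```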
- remove_dataset_attribute(key)
Remove dataset attribute.
- Parameters:
key (str) – Name of the attribute to remove.
- remove_variable(variable)
Remove dataset variable.
- Parameters:
variable (str) – Name of the variable to remove.
- rename_dataset_attribute(old_name, new_name)
Rename dataset attribute.
- Parameters:
old_name (str) – Original attribute name.
new_name (str) – New attribute name.
- rename_dimension(dimension, new_name)
Rename dimension.
- Parameters:
dimension (str) – Original dimension name.
new_name (str) – New dimension name.
- rename_variable(variable, new_name)
Rename variable.
- Parameters:
variable (str) – Original variable name.
new_name (str) – New variable name.
- rename_variable_attribute(variable, old_name, new_name)
Rename variable attribute.
- Parameters:
variable (str) – Variable name.
old_name (str) – Original attribute name.
new_name (str) – New attribute name.
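For context, NcML itself expresses renaming with an orgName attribute holding the original name, and removal with a <remove> element. The sketch below writes such elements with the standard library. The names are made up, the namespace is omitted, and it is an assumption that the rename_* and remove_* methods above produce exactly these elements.

```python
import xml.etree.ElementTree as ET

root = ET.Element("netcdf", {"location": "example.nc"})

# Rename dimension "t" to "time".
ET.SubElement(root, "dimension", {"name": "time", "orgName": "t"})

# Rename variable "temperature" to "tas", and its attribute "unit" to "units".
var = ET.SubElement(root, "variable", {"name": "tas", "orgName": "temperature"})
ET.SubElement(var, "attribute", {"name": "units", "orgName": "unit"})

# Drop a variable from the virtual dataset.
ET.SubElement(root, "remove", {"name": "unused", "type": "variable"})

ncml_text = ET.tostring(root, encoding="unicode")
print(ncml_text)
```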
- to_cf_dict()
Convert internal representation to a CF-JSON dictionary.
The CF-JSON specification includes data for variables, but if the data is not within the NcML, it cannot be included in the JSON representation.
- Returns:
Dictionary with dimensions and variables keys. May also optionally include an attributes key and a
groups key. Additional keys prefixed with @ may be included for <netcdf> tag attributes,
for example @location.
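To make the shape of such a dictionary concrete, here is a rough standard-library sketch that converts a small, namespace-free NcML string into a dictionary with the keys described above. The function name to_cf_like_dict and the per-variable keys (shape, type, attributes) are assumptions made for illustration; this is not the method's actual implementation.

```python
import xml.etree.ElementTree as ET

NCML = """
<netcdf location="example.nc">
  <dimension name="time" length="3"/>
  <attribute name="title" value="Demo"/>
  <variable name="tas" shape="time" type="float">
    <attribute name="units" value="K"/>
  </variable>
</netcdf>
"""


def attrs_of(element):
    """Collect the direct <attribute> children of an element as a dict."""
    return {a.get("name"): a.get("value") for a in element.findall("attribute")}


def to_cf_like_dict(text):
    """Hypothetical helper: build a CF-JSON-like dict from NcML text."""
    root = ET.fromstring(text.strip())
    out = {
        "dimensions": {d.get("name"): int(d.get("length")) for d in root.findall("dimension")},
        "variables": {
            v.get("name"): {
                "shape": v.get("shape", "").split(),
                "type": v.get("type"),
                "attributes": attrs_of(v),
            }
            for v in root.findall("variable")
        },
    }
    global_attrs = attrs_of(root)
    if global_attrs:  # the attributes key is optional
        out["attributes"] = global_attrs
    for key, value in root.attrib.items():  # <netcdf> tag attributes get an @ prefix
        out[f"@{key}"] = value
    return out


cf = to_cf_like_dict(NCML)
print(cf)
```

No data key appears under the variable because the NcML above carries no <values>, which matches the caveat about data stated earlier.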
xncml.parser module
# NcML parser for xarray
The open_ncml function parses an XML document compliant with the NcML-2.2 schema and returns an xarray Dataset.
The XML is parsed into Python objects using xsdata. The parser converts XML elements into instances of classes defined in an autogenerated data model (generated.ncml_2_2). This data model was created using:
`xsdata generate -ds NumPy --compound-fields -mll=119 --postponed-annotations schemas/ncml-2.2.xsd`
The code below converts the NcML instructions into xarray instructions. Not all NcML instructions are currently supported.
## TODO
Support for these elements is missing:
- <cacheVariable>
- <logicalReduce>
- <logicalSection>
- <logicalSlice>
- <promoteGlobalAttribute>
- <scanFmrc>
Support for these attributes is missing:
- dateFormatMark
- olderThan
- tiled aggregations
- xncml.parser.build_scalar_variable(var_name: str, values_tag: Values, var_type: str) → Variable
Build an xr.Variable for scalar variables.
- Parameters:
var_name (str) – The variable name.
values_tag (Values instance) – <values> object description
var_type (str) – The variable expected type.
- Returns:
A xr.Variable filled with values from <values> element.
- Return type:
xr.Variable
- Raises:
ValueError – If the <values> tag is not a valid scalar.
- xncml.parser.cast(obj: Attribute) → tuple | str
Cast attribute value to the appropriate type.
- xncml.parser.nctype(typ: DataType) → type
Return Python type corresponding to the NcML DataType of object.
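The two helpers above can be pictured as a type lookup followed by a conversion of the attribute text. The sketch below is a simplified, standard-library illustration under stated assumptions: the mapping covers only a subset of NcML type names, it maps to built-in Python types (the real nctype may well return NumPy types), and the names NCML_TO_PYTHON, python_type and cast_value are hypothetical.

```python
# Hypothetical mapping from a subset of NcML type names to built-in Python types.
NCML_TO_PYTHON = {
    "byte": int,
    "short": int,
    "int": int,
    "long": int,
    "float": float,
    "double": float,
    "char": str,
    "String": str,
    "string": str,
}


def python_type(ncml_type):
    """Return the built-in type assumed for an NcML type name."""
    return NCML_TO_PYTHON[ncml_type]


def cast_value(value, ncml_type="String", separator=None):
    """Return strings unchanged and numeric text as a tuple of numbers."""
    py_type = python_type(ncml_type)
    if py_type is str:
        return value
    # separator=None splits on any run of whitespace.
    return tuple(py_type(part) for part in value.split(separator))


print(cast_value("1 2 3", "int"))
print(cast_value("hello world"))
```

Returning either a tuple or a str mirrors the return annotation of cast shown above; the rest is a guess at reasonable behaviour.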
- xncml.parser.open_ncml(ncml: str | Path, group: str = '/') → Dataset
Convert NcML document to a dataset.
- Parameters:
ncml (str | Path) – Path to NcML file.
group (str) – Path of the group to parse within the NcML. The special value * opens every group and flattens the variables into a single dataset, renaming variables and dimensions if conflicting names are found.
- Returns:
Dataset holding variables and attributes defined in NcML document.
- Return type:
xr.Dataset
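Flattening groups with conflict renaming can be illustrated on plain dictionaries. In the sketch below, groups are nested dicts with variables and groups keys, and a clashing name receives a numeric suffix. Both the data layout and the suffix rule are invented for illustration; the source does not state which renaming rule open_ncml applies.

```python
def flatten_groups(group, flat=None):
    """Collect variables from a group and its subgroups into one dict.

    A name that is already taken gets a "__<n>" suffix (hypothetical rule).
    """
    flat = {} if flat is None else flat
    for name, value in group.get("variables", {}).items():
        new_name, n = name, 0
        while new_name in flat:
            n += 1
            new_name = f"{name}__{n}"
        flat[new_name] = value
    for child in group.get("groups", {}).values():
        flatten_groups(child, flat)
    return flat


root = {
    "variables": {"tas": "root tas"},
    "groups": {
        "model_a": {"variables": {"tas": "model_a tas", "pr": "model_a pr"}},
    },
}
flat = flatten_groups(root)
print(flat)
```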
- xncml.parser.parse(path: Path) → Netcdf
Parse NcML file using NetCDF datamodel based on NcML-2.2 Schema.
- Parameters:
path (Path) – Path to NcML file.
- Returns:
Object description of NcML content.
- Return type:
Netcdf instance.
- xncml.parser.read_aggregation(target: Dataset, obj: Aggregation, ncml: Path) → Dataset
Return merged or concatenated content of <aggregation> element.
- Parameters:
target (xr.Dataset) – Target dataset to be updated with <netcdf>’s content.
obj (Aggregation) – <aggregation> object description.
ncml (Path) – Path to NcML document, sometimes required to follow relative links.
- Returns:
Dataset holding variables and attributes defined in <aggregation> element.
- Return type:
xr.Dataset
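The difference between merging and concatenating can be shown on toy data. The sketch below represents each dataset as a dict of lists and handles three aggregation types. It is only an illustration of the semantics: the real function operates on xarray datasets, the function name aggregate is hypothetical, and the "first definition wins" rule for union is an assumption.

```python
def aggregate(datasets, agg_type, variable=None):
    """Toy aggregation over datasets represented as {name: list_of_values}."""
    if agg_type == "union":
        # Merge variables from all datasets; the first definition wins (assumed).
        merged = {}
        for ds in datasets:
            for name, values in ds.items():
                merged.setdefault(name, values)
        return merged
    if agg_type == "joinExisting":
        # Concatenate along the existing outer dimension.
        return {variable: [v for ds in datasets for v in ds[variable]]}
    if agg_type == "joinNew":
        # Stack along a new outer dimension, one entry per dataset.
        return {variable: [ds[variable] for ds in datasets]}
    raise NotImplementedError(agg_type)


a = {"tas": [1, 2]}
b = {"tas": [3, 4], "pr": [0.1, 0.2]}
print(aggregate([a, b], "union"))
print(aggregate([a, b], "joinExisting", "tas"))
print(aggregate([a, b], "joinNew", "tas"))
```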
- xncml.parser.read_attribute(target: Dataset | Variable, obj: Attribute, ref: Dataset = None)
Update target dataset in place with new or modified attribute.
- Parameters:
target (xr.Dataset | xr.Variable) – Target dataset to be updated.
obj (Attribute instance) – <attribute> object description.
ref (xr.Dataset) – Reference dataset.
- xncml.parser.read_coord_value(nc: Netcdf, agg: Aggregation, dtypes: list = ())
Read coordValue attribute of <netcdf> element.
- Parameters:
nc (Netcdf instance) – <netcdf> object description.
agg (Aggregation instance) – <aggregation> object description
dtypes (tuple) – Preferred types for the coordinate value.
- Returns:
Coordinate value cast to proper type.
- Return type:
str, np.array, scalar
Notes
The casting logic is most likely not up to spec.
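One plausible reading of "preferred types" is to try each type in order and keep the first that converts every token. The sketch below does that with the standard library. The function name cast_coord_value, the comma and whitespace splitting, and the fallback to the raw string are assumptions, not the library's documented behaviour.

```python
def cast_coord_value(text, dtypes=(int, float)):
    """Try each preferred type in order; fall back to the raw string."""
    parts = text.replace(",", " ").split()
    for dtype in dtypes:
        try:
            values = [dtype(part) for part in parts]
        except ValueError:
            continue  # this type does not fit, try the next one
        # A single token becomes a scalar, several tokens become a list.
        return values[0] if len(values) == 1 else values
    return text


print(cast_coord_value("3"))
print(cast_coord_value("1.5 2.5"))
print(cast_coord_value("2020-01-01"))
```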
- xncml.parser.read_dimension(obj: Dimension) → Dimension
Return dimension object with its length cast to an integer.
- xncml.parser.read_ds(obj: Netcdf, ncml: Path) → Dataset
Return dataset defined in <netcdf> element.
- Parameters:
obj (Netcdf) – <netcdf> object description.
ncml (Path) – Path to NcML document, sometimes required to follow relative links.
- Returns:
Dataset defined at the location attribute of the <netcdf> element.
- Return type:
xr.Dataset
- xncml.parser.read_enum(obj: EnumTypedef) → dict[str, list]
Parse <enumTypedef> element.
- Parameters:
obj (EnumTypedef) – <enumTypedef> object.
- Returns:
A dictionary describing the Enum.
- Return type:
dict
Examples
- xncml.parser.read_group(target: Dataset, ref: Dataset | None, obj: Group | Netcdf, groups_to_read: list[str], parent_group_path: str = '/', dims: dict = None, enums: dict = None) → Dataset
Parse <group> items, typically <dimension>, <variable>, <attribute> and <remove> elements.
- Parameters:
target (xr.Dataset) – Target dataset to be updated.
ref (xr.Dataset | None) – Reference dataset used to copy content into target.
groups_to_read (list[str]) – List of groups that must be read and included in target.
parent_group_path (str) – Path of parent group, by default the root group ‘/’.
dims (dict[str, Dimension]) – Dictionary of the dimensions of this dataset.
- Returns:
Dataset holding variables and attributes defined in <netcdf> element.
- Return type:
xr.Dataset
- xncml.parser.read_netcdf(target: Dataset, ref: Dataset, obj: Netcdf, ncml: Path, group: str) → Dataset
Return content of <netcdf> element.
- Parameters:
target (xr.Dataset) – Target dataset to be updated with <netcdf>’s content.
ref (xr.Dataset) – Reference dataset used to copy content into target.
obj (Netcdf) – <netcdf> object description.
ncml (Path) – Path to NcML document, sometimes required to follow relative links.
group (str) – Path of the group to parse within the NcML. The special value * opens every group and flattens the variables into a single dataset.
- Returns:
Dataset holding variables and attributes defined in <netcdf> element.
- Return type:
xr.Dataset
- xncml.parser.read_remove(target: Dataset | Variable, obj: Remove) → Dataset
Remove item from dataset.
- Parameters:
target (xr.Dataset | xr.Variable) – Target dataset or variable to be updated.
obj (Remove instance) – <remove> object description.
- Returns:
Dataset with attribute, variable or dimension removed, or variable with attribute removed.
- Return type:
xr.Dataset or xr.Variable
- xncml.parser.read_scan(obj: Scan, ncml: Path) → list[Dataset]
Return list of datasets defined in <scan> element.
- Parameters:
obj (Aggregation.Scan instance) – <scan> object description.
ncml (Path) – Path to NcML document, sometimes required to follow relative links.
- Returns:
List of datasets found by scan.
- Return type:
list
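The file-discovery part of a scan can be sketched with pathlib. The helper below walks a directory, optionally recursing into subdirectories, and keeps files that match a suffix and, if given, a regular expression applied to the full pathname. The name find_files and the choice of a full match for the regular expression are assumptions; opening the files as datasets is left out.

```python
import re
import tempfile
from pathlib import Path


def find_files(location, suffix=None, reg_exp=None, subdirs=True):
    """Return sorted paths under `location` that match the scan criteria."""
    root = Path(location)
    candidates = root.rglob("*") if subdirs else root.glob("*")
    pattern = re.compile(reg_exp) if reg_exp else None
    matches = []
    for path in candidates:
        if not path.is_file():
            continue
        if suffix and not path.name.endswith(suffix):
            continue
        if pattern and not pattern.fullmatch(str(path)):
            continue
        matches.append(path)
    return sorted(matches)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    for name in ("a.nc", "b.txt", "sub/c.nc"):
        (Path(tmp) / name).touch()
    found = [p.relative_to(tmp).as_posix() for p in find_files(tmp, suffix=".nc")]
    top_only = [p.relative_to(tmp).as_posix() for p in find_files(tmp, suffix=".nc", subdirs=False)]
    print(found)
    print(top_only)
```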
- xncml.parser.read_values(var_name: str, expected_size: int, values_tag: Values) → list
Read values for <variable> element.
- Parameters:
var_name (str) – The variable name.
expected_size (int) – The variable expected size.
values_tag (Values instance) – <values> object description
- Returns:
A list filled with values from <values> element.
- Return type:
list
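NcML allows a <values> element either to list its values as text or to generate them from start and increment attributes. The sketch below covers both cases with plain Python. The function name expand_values, its flat argument list (instead of a Values object) and the size check that raises ValueError are assumptions made for illustration.

```python
def expand_values(expected_size, text=None, start=None, increment=None, separator=None):
    """Return the list of values described by a <values>-like specification."""
    if start is not None and increment is not None:
        # Generated values: start, start + increment, ...
        values = [start + i * increment for i in range(expected_size)]
    else:
        # Listed values: split the element text (on whitespace by default).
        values = (text or "").split(separator)
    if len(values) != expected_size:
        raise ValueError(f"expected {expected_size} values, got {len(values)}")
    return values


print(expand_values(3, text="1 2 3"))
print(expand_values(4, start=0.0, increment=0.5))
```

Listed values are returned as strings here; converting them to the variable's type would be a separate step.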
- xncml.parser.read_variable(target: Dataset, ref: Dataset, obj: Variable, dimensions: dict, enums: dict[str, dict[str, int]], group_path: str) → Dataset
Parse <variable> element.
- Parameters:
target (xr.Dataset) – Target dataset to be updated.
ref (xr.Dataset) – Reference dataset used to copy content into target.
obj (Variable) – <variable> object description.
dimensions (dict) – Dimension attributes keyed by name.
enums (dict[str, dict]) – The enums types that have been read in the parent groups.
group_path (str) – Path to the parent group.
- Returns:
Dataset holding variable defined in <variable> element.
- Return type:
xr.Dataset