Data versioning

Note

In verta==0.16.0, the dataset versioning interface was overhauled to be more flexible, robust, and consistent with other ModelDB entities. If you have been using an older version of the client, please refer to this quick guide on updating your code.

Classes

class verta._dataset_versioning.dataset.Dataset

Object representing a ModelDB dataset.

Changed in version 0.16.0: The dataset versioning interface was updated for flexibility, robustness, and consistency with other ModelDB entities.

This class provides read/write functionality for dataset metadata and access to its versions.

There should not be a need to instantiate this class directly; please use Client.create_dataset().

Variables:
  • id (str) – ID of this dataset.
  • name (str) – Name of this dataset.
  • workspace (str) – Workspace containing this dataset.
  • versions (DatasetVersions) – Versions of this dataset.
set_description(desc)

Sets the description of this dataset.

Parameters:desc (str) – Description to set.
get_description()

Gets the description of this dataset.

Returns:str – Description of this dataset.
add_tag(tag)

Adds a tag to this dataset.

Parameters:tag (str) – Tag to add.
add_tags(tags)

Adds multiple tags to this dataset.

Parameters:tags (list of str) – Tags to add.
get_tags()

Gets all tags from this dataset.

Returns:list of str – All tags.
del_tag(tag)

Deletes a tag from this dataset.

This method will not raise an error if the tag does not exist.

Parameters:tag (str) – Tag to delete.
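
Because del_tag() does not raise when the tag is absent, a tag can be swapped without checking for it first. The following is a minimal sketch, not part of the client: replace_tag is a hypothetical helper, and entity is assumed to be any object exposing the add_tag(), del_tag(), and get_tags() methods described here.

```python
def replace_tag(entity, old_tag, new_tag):
    """Swap old_tag for new_tag on a dataset-like object (hypothetical helper)."""
    # Safe even when old_tag was never set: del_tag() is documented
    # not to raise for a missing tag.
    entity.del_tag(old_tag)
    entity.add_tag(new_tag)
    return entity.get_tags()
```

The same helper should apply to a dataset version, since DatasetVersion documents the same tag methods.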
add_attribute(key, value)

Adds an attribute to this dataset.

Parameters:
  • key (str) – Name of the attribute.
  • value (one of {None, bool, float, int, str, list, dict}) – Value of the attribute.
add_attributes(attrs)

Adds potentially multiple attributes to this dataset.

Parameters:attrs (dict of str to {None, bool, float, int, str, list, dict}) – Attributes.
get_attribute(key)

Gets the attribute with name key from this dataset.

Parameters:key (str) – Name of the attribute.
Returns:one of {None, bool, float, int, str} – Value of the attribute.
get_attributes()

Gets all attributes from this dataset.

Returns:dict of str to {None, bool, float, int, str} – Names and values of all attributes.
del_attribute(key)

Deletes the attribute with name key from this dataset.

This method will not raise an error if the attribute does not exist.

Parameters:key (str) – Name of the attribute.
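
Attribute values are limited to the types listed for add_attribute(). The sketch below checks values locally before calling add_attributes(). It is an illustration only: both function names are hypothetical, and checking nested list items and dict keys recursively is an assumption, since this page only lists the top-level types.

```python
# Scalar types listed in the add_attribute() documentation.
_SCALARS = (type(None), bool, float, int, str)


def is_valid_attribute_value(value):
    """Return True if value uses only the documented attribute types."""
    if isinstance(value, _SCALARS):
        return True
    if isinstance(value, list):
        # Assumption: list items must themselves be valid values.
        return all(is_valid_attribute_value(item) for item in value)
    if isinstance(value, dict):
        # Assumption: nested dicts need str keys and valid values.
        return all(
            isinstance(k, str) and is_valid_attribute_value(v)
            for k, v in value.items()
        )
    return False


def add_checked_attributes(entity, attrs):
    """Validate attrs, then pass them on to entity.add_attributes()."""
    bad = sorted(k for k, v in attrs.items() if not is_valid_attribute_value(v))
    if bad:
        raise TypeError("unsupported attribute value for: " + ", ".join(bad))
    entity.add_attributes(attrs)
```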
create_version(content, desc=None, tags=None, attrs=None, date_created=None)

Creates a dataset version.

Parameters:
  • content (dataset blob subclass) – Dataset content.
  • desc (str, optional) – Description of the dataset version.
  • tags (list of str, optional) – Tags of the dataset version.
  • attrs (dict of str to {None, bool, float, int, str}, optional) – Attributes of the dataset version.
Returns:

DatasetVersion

Examples

from verta.dataset import Path
version = dataset.create_version(Path("data.csv"))
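
The optional desc, tags, and attrs parameters can be supplied in the same call. The following is a minimal sketch: create_described_version is a hypothetical wrapper, and dataset is assumed to be a Dataset obtained elsewhere (for example from Client.create_dataset()).

```python
def create_described_version(dataset, content, desc, tags=(), attrs=None):
    """Create a dataset version with its metadata attached in one call."""
    return dataset.create_version(
        content,
        desc=desc,
        tags=list(tags),          # documented as a list of str
        attrs=dict(attrs or {}),  # documented as a dict of str to value
    )
```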
get_version(id)

Gets the specified dataset version.

Parameters:id (str) – Dataset version ID.
Returns:DatasetVersion
get_latest_version()

Gets the latest dataset version.

Returns:DatasetVersion
delete()

Deletes this dataset.

class verta._dataset_versioning.dataset_version.DatasetVersion

Object representing a ModelDB dataset version.

Changed in version 0.16.0: The dataset versioning interface was updated for flexibility, robustness, and consistency with other ModelDB entities.

This class provides read/write functionality for dataset version metadata and access to its content.

There should not be a need to instantiate this class directly; please use Dataset.create_version().

Variables:
  • id (str) – ID of this dataset version.
  • version (int) – Version number of this dataset version.
  • dataset_id (str) – ID of this version’s dataset.
  • parent_id (str) – ID of this version’s preceding version.
get_content()

Returns the content of this dataset version.

Returns:dataset blob subclass – Dataset content.
list_components()

Shorthand for get_content().list_components().

set_description(desc)

Sets the description of this dataset version.

Parameters:desc (str) – Description to set.
get_description()

Gets the description of this dataset version.

Returns:str – Description of this dataset version.
add_tag(tag)

Adds a tag to this dataset version.

Parameters:tag (str) – Tag to add.
add_tags(tags)

Adds multiple tags to this dataset version.

Parameters:tags (list of str) – Tags to add.
get_tags()

Gets all tags from this dataset version.

Returns:list of str – All tags.
del_tag(tag)

Deletes a tag from this dataset version.

This method will not raise an error if the tag does not exist.

Parameters:tag (str) – Tag to delete.
add_attribute(key, value)

Adds an attribute to this dataset version.

Parameters:
  • key (str) – Name of the attribute.
  • value (one of {None, bool, float, int, str, list, dict}) – Value of the attribute.
add_attributes(attrs)

Adds potentially multiple attributes to this dataset version.

Parameters:attrs (dict of str to {None, bool, float, int, str, list, dict}) – Attributes.
get_attribute(key)

Gets the attribute with name key from this dataset version.

Parameters:key (str) – Name of the attribute.
Returns:one of {None, bool, float, int, str} – Value of the attribute.
get_attributes()

Gets all attributes from this dataset version.

Returns:dict of str to {None, bool, float, int, str} – Names and values of all attributes.
del_attribute(key)

Deletes the attribute with name key from this dataset version.

This method will not raise an error if the attribute does not exist.

Parameters:key (str) – Name of the attribute.
delete()

Deletes this dataset version.

Collections

class verta._dataset_versioning.datasets.Datasets

list-like object containing Datasets.

This class provides functionality for filtering and sorting its contents.

There should not be a need to instantiate this class directly; please use Client.datasets.

Examples

datasets = client.datasets.find("tags ~= census")
dataset = datasets.sort("time_created", descending=True)[0]  # most recent
with_workspace(workspace_name=None)

Returns datasets in the specified workspace.

Parameters:workspace_name (str or None, default None) – Workspace name. If None, uses personal workspace.
Returns:Datasets – Filtered datasets.
with_ids(ids)

Returns datasets with the specified IDs.

Parameters:ids (list of str) – Dataset IDs.
Returns:Datasets – Filtered datasets.
find(*args)

Gets the results from this collection that match input predicates.

A predicate is a string containing a simple boolean expression consisting of:

  • a dot-delimited property such as metrics.accuracy
  • a Python comparison operator such as >=
  • a literal value such as .8
Parameters:*args (strs) – Predicates specifying results to get.
Returns:The same type of object given in the input.

Examples

runs.find("hyperparameters.hidden_size == 256",
          "metrics.accuracy >= .8")
# <ExperimentRuns containing 3 runs>
# alternatively:
runs.find(["hyperparameters.hidden_size == 256",
           "metrics.accuracy >= .8"])
# <ExperimentRuns containing 3 runs>
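
To make the three parts of a predicate concrete, the sketch below splits a predicate string into its property, operator, and literal. It illustrates the format only and is not how the client parses predicates. split_predicate is a hypothetical name, and the operator list is an assumption beyond ==, >=, and ~=, which appear in the examples on this page.

```python
import re

# ==, >= and ~= appear in this page's examples; the other operators are assumed.
_PREDICATE = re.compile(r"^\s*([\w.]+)\s*(~=|==|!=|>=|<=|>|<)\s*(.+?)\s*$")


def split_predicate(predicate):
    """Split 'property operator literal' into a 3-tuple of strings."""
    match = _PREDICATE.match(predicate)
    if match is None:
        raise ValueError("not a simple predicate: {!r}".format(predicate))
    return match.groups()


print(split_predicate("metrics.accuracy >= .8"))
# ('metrics.accuracy', '>=', '.8')
```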
set_page_limit(limit)

Sets the number of entities to fetch per backend call during iteration.

By default, each call fetches a batch of 100 entities, but lowering this value may be useful for substantially larger responses.

Parameters:limit (int) – Number of entities to fetch per call.

Examples

runs = proj.expt_runs
runs.set_page_limit(10)
for run in runs:  # fetches 10 runs per backend call
    print(run.get_metric("accuracy"))
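
The effect of the page limit can be pictured as batched iteration: the collection requests one page at a time and yields entities from it until a short page signals the end. The generator below is a sketch of that idea under stated assumptions, not the client's implementation. fetch_page is a hypothetical callable standing in for the backend call.

```python
def iterate_in_pages(fetch_page, page_limit=100):
    """Yield entities one by one while fetching them page_limit at a time.

    fetch_page(page_number, page_limit) is a hypothetical stand-in for the
    backend call; it must return a list with at most page_limit entities.
    """
    page_number = 1
    while True:
        batch = fetch_page(page_number, page_limit)
        for entity in batch:
            yield entity
        if len(batch) < page_limit:
            # A short (or empty) page means there is nothing left to fetch.
            return
        page_number += 1
```

With 25 entities and a page limit of 10, this makes three calls, so a smaller limit trades more round trips for smaller responses.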
sort(key, descending=False)

Sorts the results from this collection by key.

A key is a string containing a dot-delimited property such as metrics.accuracy.

Parameters:
  • key (str) – Dot-delimited property.
  • descending (bool, default False) – Order in which to return sorted results.
Returns:

The same type of object given in the input.

Examples

runs.sort("metrics.accuracy")
# <ExperimentRuns containing 3 runs>
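
A dot-delimited key names a nested property. The sketch below shows how such a key could be resolved and used for ordering on plain dictionaries. It only illustrates the key format: sort_by_key and resolve_key are hypothetical names, and nothing here is claimed to mirror how the client or backend performs the sort.

```python
def resolve_key(record, key):
    """Follow a dot-delimited key such as 'metrics.accuracy' into nested dicts."""
    value = record
    for part in key.split("."):
        value = value[part]
    return value


def sort_by_key(records, key, descending=False):
    """Return a new list of records ordered by the dot-delimited key."""
    return sorted(records, key=lambda r: resolve_key(r, key), reverse=descending)
```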
class verta._dataset_versioning.dataset_versions.DatasetVersions

list-like object containing DatasetVersions.

This class provides functionality for filtering and sorting its contents.

There should not be a need to instantiate this class directly; please use Dataset.versions.

Examples

versions = dataset.versions.find("tags ~= normalized")
version = versions.sort("time_created", descending=True)[0]  # most recent
with_dataset(dataset)

Returns versions of the specified dataset.

Parameters:dataset (Dataset or None) – Dataset. If None, returns versions across all datasets.
Returns:DatasetVersions – Filtered dataset versions.
find(*args)

Gets the results from this collection that match input predicates.

A predicate is a string containing a simple boolean expression consisting of:

  • a dot-delimited property such as metrics.accuracy
  • a Python comparison operator such as >=
  • a literal value such as .8
Parameters:*args (strs) – Predicates specifying results to get.
Returns:The same type of object given in the input.

Examples

runs.find("hyperparameters.hidden_size == 256",
          "metrics.accuracy >= .8")
# <ExperimentRuns containing 3 runs>
# alternatively:
runs.find(["hyperparameters.hidden_size == 256",
           "metrics.accuracy >= .8"])
# <ExperimentRuns containing 3 runs>
set_page_limit(limit)

Sets the number of entities to fetch per backend call during iteration.

By default, each call fetches a batch of 100 entities, but lowering this value may be useful for substantially larger responses.

Parameters:limit (int) – Number of entities to fetch per call.

Examples

runs = proj.expt_runs
runs.set_page_limit(10)
for run in runs:  # fetches 10 runs per backend call
    print(run.get_metric("accuracy"))
sort(key, descending=False)

Sorts the results from this collection by key.

A key is a string containing a dot-delimited property such as metrics.accuracy.

Parameters:
  • key (str) – Dot-delimited property.
  • descending (bool, default False) – Order in which to return sorted results.
Returns:

The same type of object given in the input.

Examples

runs.sort("metrics.accuracy")
# <ExperimentRuns containing 3 runs>