Data versioning
Note
In verta==0.16.0, the dataset versioning interface was overhauled to be more flexible, robust, and consistent with other ModelDB entities. If you have been using an older version of the client, please refer to this quick guide on updating your code.
Classes
- class verta._dataset_versioning.dataset.Dataset
Object representing a ModelDB dataset.
Changed in version 0.16.0: The dataset versioning interface was updated for flexibility, robustness, and consistency with other ModelDB entities.
This class provides read/write functionality for dataset metadata and access to its versions.
There should not be a need to instantiate this class directly; please use Client.create_dataset().
Variables:
- id (str) – ID of this dataset.
- name (str) – Name of this dataset.
- workspace (str) – Workspace containing this dataset.
- versions (DatasetVersions) – Versions of this dataset.
- set_description(desc)
Sets the description of this dataset.
Parameters: desc (str) – Description to set.
- get_description()
Gets the description of this dataset.
Returns: str – Description of this dataset.
- add_tag(tag)
Adds a tag to this dataset.
Parameters: tag (str) – Tag to add.
- add_tags(tags)
Adds multiple tags to this dataset.
Parameters: tags (list of str) – Tags to add.
- get_tags()
Gets all tags from this dataset.
Returns: list of str – All tags.
- del_tag(tag)
Deletes a tag from this dataset.
This method will not raise an error if the tag does not exist.
Parameters: tag (str) – Tag to delete.
- add_attribute(key, value)
Adds an attribute to this dataset.
Parameters:
- key (str) – Name of the attribute.
- value (one of {None, bool, float, int, str, list, dict}) – Value of the attribute.
- add_attributes(attrs)
Adds one or more attributes to this dataset.
Parameters: attrs (dict of str to {None, bool, float, int, str, list, dict}) – Attributes.
- get_attribute(key)
Gets the attribute with name key from this dataset.
Parameters: key (str) – Name of the attribute.
Returns: one of {None, bool, float, int, str} – Value of the attribute.
- get_attributes()
Gets all attributes from this dataset.
Returns: dict of str to {None, bool, float, int, str} – Names and values of all attributes.
- del_attribute(key)
Deletes the attribute with name key from this dataset.
This method will not raise an error if the attribute does not exist.
Parameters: key (str) – Name of the attribute.
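The attribute methods above compose into a simple write, read, and delete round trip. The sketch below is only an illustration: `InMemoryEntity` and `record_provenance` are hypothetical names, and the stand-in keeps attributes in a local dictionary so the example runs without a ModelDB backend. Only the method names and the "no error on a missing key" behavior of del_attribute() are taken from the reference above.

```python
class InMemoryEntity:
    """Hypothetical stand-in that mimics the documented attribute methods.

    Not part of verta; a real Dataset stores its attributes in ModelDB.
    """

    def __init__(self):
        self._attrs = {}

    def add_attribute(self, key, value):
        self._attrs[key] = value

    def add_attributes(self, attrs):
        for key, value in attrs.items():
            self.add_attribute(key, value)

    def get_attributes(self):
        return dict(self._attrs)

    def del_attribute(self, key):
        # Documented behavior: no error if the attribute does not exist.
        self._attrs.pop(key, None)


def record_provenance(entity, source, row_count):
    """Attach provenance attributes through the documented interface."""
    entity.add_attributes({"source": source, "row_count": row_count})
    return entity.get_attributes()


dataset = InMemoryEntity()
attrs = record_provenance(dataset, "census-2020", 1000)
dataset.del_attribute("row_count")
dataset.del_attribute("row_count")  # deleting again is a no-op
remaining = dataset.get_attributes()
```

Because `record_provenance` relies only on the documented method names, it should work unchanged when handed an object returned by Client.create_dataset(), although that has not been verified here.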
- create_version(content, desc=None, tags=None, attrs=None, date_created=None)
Creates a dataset version.
Parameters:
- content (dataset blob subclass) – Dataset content.
- desc (str, optional) – Description of the dataset version.
- tags (list of str, optional) – Tags of the dataset version.
- attrs (dict of str to {None, bool, float, int, str}, optional) – Attributes of the dataset version.
Returns: DatasetVersion
Examples
    from verta.dataset import Path
    version = dataset.create_version(Path("data.csv"))
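The optional desc, tags, and attrs parameters let the metadata be attached in the same call that creates the version. The following is a minimal sketch under stated assumptions: `create_described_version` and `RecordingDataset` are hypothetical names, and the stub simply records its arguments instead of contacting ModelDB. Only the create_version() signature is taken from the reference above.

```python
def create_described_version(dataset, content, source_name):
    """Create a version and attach its metadata in a single call."""
    return dataset.create_version(
        content,
        desc="Snapshot of {}".format(source_name),
        tags=["raw"],
        attrs={"source": source_name},
    )


class RecordingDataset:
    """Hypothetical stub: records the call rather than talking to ModelDB."""

    def create_version(self, content, desc=None, tags=None, attrs=None,
                       date_created=None):
        return {"content": content, "desc": desc, "tags": tags, "attrs": attrs}


# A plain string stands in for a dataset blob such as Path("data.csv").
version = create_described_version(RecordingDataset(), "data.csv", "data.csv")
```

With a real client, `dataset` would come from Client.create_dataset() and `content` would be a dataset blob such as Path("data.csv").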
- get_version(id)
Gets the specified dataset version.
Parameters: id (str) – Dataset version ID.
Returns: DatasetVersion
- get_latest_version()
Gets the latest dataset version.
Returns: DatasetVersion
- delete()
Deletes this dataset.
- class verta._dataset_versioning.dataset_version.DatasetVersion
Object representing a ModelDB dataset version.
Changed in version 0.16.0: The dataset versioning interface was updated for flexibility, robustness, and consistency with other ModelDB entities.
This class provides read/write functionality for dataset version metadata and access to its content.
There should not be a need to instantiate this class directly; please use Dataset.create_version().
Variables:
- id (str) – ID of this dataset version.
- version (int) – Version number of this dataset version.
- dataset_id (str) – ID of this version’s dataset.
- parent_id (str) – ID of this version’s preceding version.
- get_content()
Returns the content of this dataset version.
Returns: dataset blob subclass – Dataset content.
- list_components()
Shorthand for get_content().list_components().
- set_description(desc)
Sets the description of this dataset version.
Parameters: desc (str) – Description to set.
- get_description()
Gets the description of this dataset version.
Returns: str – Description of this dataset version.
- add_tag(tag)
Adds a tag to this dataset version.
Parameters: tag (str) – Tag to add.
- add_tags(tags)
Adds multiple tags to this dataset version.
Parameters: tags (list of str) – Tags to add.
- get_tags()
Gets all tags from this dataset version.
Returns: list of str – All tags.
- del_tag(tag)
Deletes a tag from this dataset version.
This method will not raise an error if the tag does not exist.
Parameters: tag (str) – Tag to delete.
- add_attribute(key, value)
Adds an attribute to this dataset version.
Parameters:
- key (str) – Name of the attribute.
- value (one of {None, bool, float, int, str, list, dict}) – Value of the attribute.
- add_attributes(attrs)
Adds one or more attributes to this dataset version.
Parameters: attrs (dict of str to {None, bool, float, int, str, list, dict}) – Attributes.
- get_attribute(key)
Gets the attribute with name key from this dataset version.
Parameters: key (str) – Name of the attribute.
Returns: one of {None, bool, float, int, str} – Value of the attribute.
- get_attributes()
Gets all attributes from this dataset version.
Returns: dict of str to {None, bool, float, int, str} – Names and values of all attributes.
- del_attribute(key)
Deletes the attribute with name key from this dataset version.
This method will not raise an error if the attribute does not exist.
Parameters: key (str) – Name of the attribute.
- delete()
Deletes this dataset version.
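Since each DatasetVersion exposes a parent_id and Dataset.get_version() looks a version up by ID, the history of a version can in principle be walked back one step at a time. The sketch below illustrates that idea under stated assumptions: `version_lineage`, `StubVersion`, and `StubDataset` are hypothetical names, and it assumes parent_id is falsy (None or an empty string) for a version with no predecessor, which the reference above does not specify.

```python
def version_lineage(dataset, version):
    """Return versions from `version` back to its earliest ancestor."""
    lineage = [version]
    # Assumption: parent_id is falsy when there is no preceding version.
    while lineage[-1].parent_id:
        lineage.append(dataset.get_version(lineage[-1].parent_id))
    return lineage


class StubVersion:
    """Hypothetical stand-in carrying the documented variables."""

    def __init__(self, id, version, parent_id):
        self.id = id
        self.version = version
        self.parent_id = parent_id


class StubDataset:
    """Hypothetical stand-in offering only get_version(id)."""

    def __init__(self, versions):
        self._by_id = {v.id: v for v in versions}

    def get_version(self, id):
        return self._by_id[id]


stub = StubDataset([
    StubVersion("a", 1, None),
    StubVersion("b", 2, "a"),
    StubVersion("c", 3, "b"),
])
chain = version_lineage(stub, stub.get_version("c"))
numbers = [v.version for v in chain]
```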
Collections
- class verta._dataset_versioning.datasets.Datasets
list-like object containing Dataset objects.
This class provides functionality for filtering and sorting its contents.
There should not be a need to instantiate this class directly; please use Client.datasets.
Examples
    datasets = client.datasets.find("tags ~= census")
    dataset = datasets.sort("time_created", descending=True)[0]  # most recent
- with_workspace(workspace_name=None)
Returns datasets in the specified workspace.
Parameters: workspace_name (str or None, default None) – Workspace name. If None, uses personal workspace.
Returns: Datasets – Filtered datasets.
- with_ids(ids)
Returns datasets with the specified IDs.
Parameters: ids (list of str) – Dataset IDs.
Returns: Datasets – Filtered datasets.
- find(*args)
Gets the results from this collection that match input predicates.
A predicate is a string containing a simple boolean expression consisting of:
- a dot-delimited property such as metrics.accuracy
- a Python comparison operator such as >=
- a literal value such as .8
Parameters: *args (strs) – Predicates specifying results to get.
Returns: The same type of object given in the input.
Examples
    runs.find("hyperparameters.hidden_size == 256", "metrics.accuracy >= .8")
    # <ExperimentRuns containing 3 runs>
    # alternatively:
    runs.find(["hyperparameters.hidden_size == 256", "metrics.accuracy >= .8"])
    # <ExperimentRuns containing 3 runs>
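To make the three parts of a predicate concrete, the sketch below splits a predicate string into its property, operator, and literal value. This is not the library's parser: `split_predicate` and the operator list are hypothetical, chosen only to cover the operators that appear in the examples on this page.

```python
import ast
import re

# Operators recognized by this sketch. Two-character operators come
# first so that ">=" is not matched as ">".
_OPERATORS = ["==", "!=", ">=", "<=", "~=", ">", "<"]
_PATTERN = re.compile(
    r"^\s*([\w.]+)\s*("
    + "|".join(re.escape(op) for op in _OPERATORS)
    + r")\s*(.+?)\s*$"
)


def split_predicate(predicate):
    """Split 'property operator value' into a 3-tuple."""
    match = _PATTERN.match(predicate)
    if match is None:
        raise ValueError("not a simple predicate: {!r}".format(predicate))
    prop, op, raw = match.groups()
    try:
        # Numbers and quoted strings become Python values.
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Bare words such as `census` are kept as plain strings.
        value = raw
    return prop, op, value


accuracy = split_predicate("metrics.accuracy >= .8")
hidden = split_predicate("hyperparameters.hidden_size == 256")
tagged = split_predicate("tags ~= census")
```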
- set_page_limit(limit)
Sets the number of entities to fetch per backend call during iteration.
By default, each call fetches a batch of 100 entities, but lowering this value may be useful for substantially larger responses.
Parameters: limit (int) – Number of entities to fetch per call.
Examples
    runs = proj.expt_runs
    runs.set_page_limit(10)
    for run in runs:  # fetches 10 runs per backend call
        print(run.get_metric("accuracy"))
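The effect of the page limit on the number of backend calls can be illustrated with a small paging loop. This is a sketch of the general technique, not the library's implementation: `iterate_in_pages` and `fake_fetch` are hypothetical, and the "backend" is just a local list.

```python
def iterate_in_pages(fetch_page, page_limit=100):
    """Yield entities one at a time, fetching `page_limit` per call."""
    page_number = 0
    while True:
        page = fetch_page(page_number, page_limit)
        if not page:
            return
        for entity in page:
            yield entity
        if len(page) < page_limit:
            # A short page means there is nothing left to fetch.
            return
        page_number += 1


# Hypothetical backend holding 25 entities; `calls` records each request.
entities = list(range(25))
calls = []


def fake_fetch(page_number, page_limit):
    calls.append(page_number)
    start = page_number * page_limit
    return entities[start:start + page_limit]


fetched = list(iterate_in_pages(fake_fetch, page_limit=10))
```

With 25 entities and a limit of 10, the loop makes three calls; a lower limit means more calls, each returning a smaller response.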
- sort(key, descending=False)
Sorts the results from this collection by key.
A key is a string containing a dot-delimited property such as metrics.accuracy.
Parameters:
- key (str) – Dot-delimited property.
- descending (bool, default False) – Order in which to return sorted results.
Returns: The same type of object given in the input.
Examples
    runs.sort("metrics.accuracy")
    # <ExperimentRuns containing 3 runs>
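The idea of a dot-delimited sort key can be illustrated locally on nested dictionaries. This is a sketch only: `resolve` and `sort_by` are hypothetical helpers, the records are made up for illustration, and the library itself may well delegate sorting to the backend rather than sort in memory.

```python
from functools import reduce


def resolve(record, key):
    """Follow a dot-delimited key through nested dictionaries."""
    return reduce(lambda node, part: node[part], key.split("."), record)


def sort_by(records, key, descending=False):
    """Return a new list of records ordered by the dot-delimited key."""
    return sorted(records, key=lambda r: resolve(r, key), reverse=descending)


# Illustrative records only; real results would be entity objects.
runs = [
    {"name": "a", "metrics": {"accuracy": 0.81}},
    {"name": "b", "metrics": {"accuracy": 0.93}},
    {"name": "c", "metrics": {"accuracy": 0.77}},
]
ranked = sort_by(runs, "metrics.accuracy", descending=True)
best_first = [r["name"] for r in ranked]
```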
- class verta._dataset_versioning.dataset_versions.DatasetVersions
list-like object containing DatasetVersion objects.
This class provides functionality for filtering and sorting its contents.
There should not be a need to instantiate this class directly; please use Dataset.versions.
Examples
    versions = dataset.versions.find("tags ~= normalized")
    version = versions.sort("time_created", descending=True)[0]  # most recent
- with_dataset(dataset)
Returns versions of the specified dataset.
Parameters: dataset (Dataset or None) – Dataset. If None, returns versions across all datasets.
Returns: DatasetVersions – Filtered dataset versions.
- find(*args)
Gets the results from this collection that match input predicates.
A predicate is a string containing a simple boolean expression consisting of:
- a dot-delimited property such as metrics.accuracy
- a Python comparison operator such as >=
- a literal value such as .8
Parameters: *args (strs) – Predicates specifying results to get.
Returns: The same type of object given in the input.
Examples
    runs.find("hyperparameters.hidden_size == 256", "metrics.accuracy >= .8")
    # <ExperimentRuns containing 3 runs>
    # alternatively:
    runs.find(["hyperparameters.hidden_size == 256", "metrics.accuracy >= .8"])
    # <ExperimentRuns containing 3 runs>
- set_page_limit(limit)
Sets the number of entities to fetch per backend call during iteration.
By default, each call fetches a batch of 100 entities, but lowering this value may be useful for substantially larger responses.
Parameters: limit (int) – Number of entities to fetch per call.
Examples
    runs = proj.expt_runs
    runs.set_page_limit(10)
    for run in runs:  # fetches 10 runs per backend call
        print(run.get_metric("accuracy"))
- sort(key, descending=False)
Sorts the results from this collection by key.
A key is a string containing a dot-delimited property such as metrics.accuracy.
Parameters:
- key (str) – Dot-delimited property.
- descending (bool, default False) – Order in which to return sorted results.
Returns: The same type of object given in the input.
Examples
    runs.sort("metrics.accuracy")
    # <ExperimentRuns containing 3 runs>