Versioning

ExperimentRun

See ExperimentRun’s Versioning section.

Repository

Client.get_or_create_repository(name=None, workspace=None, id=None)

Gets or creates a Repository by name and workspace, or gets a Repository by id.

Parameters:
  • name (str) – Name of the Repository. This parameter cannot be provided alongside id.
  • workspace (str, optional) – Workspace under which the Repository with name name exists. If not provided, the current user’s personal workspace will be used.
  • id (str, optional) – ID of the Repository, to be provided instead of name.
Returns:

Repository – Specified Repository.
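The name/id rules above amount to a small argument check. The following is a minimal sketch of that check, not the client's implementation. `check_repository_args` is a hypothetical helper. Two choices in it are assumptions: it uses `None` to stand for the personal workspace, and it treats passing neither `name` nor `id` as an error.

```python
from typing import Optional, Tuple


def check_repository_args(
    name: Optional[str] = None,
    workspace: Optional[str] = None,
    id: Optional[str] = None,
) -> Tuple[str, str, Optional[str]]:
    """Hypothetical helper mirroring the documented name/id rules.

    Returns ("id", id, None) for a lookup by ID, or ("name", name, workspace)
    for a get-or-create by name. A workspace of None stands for the current
    user's personal workspace (an assumption of this sketch).
    """
    if name is not None and id is not None:
        # Documented: name cannot be provided alongside id.
        raise ValueError("cannot specify both `name` and `id`")
    if id is not None:
        return ("id", id, None)
    if name is None:
        # Assumption: at least one way of identifying the Repository is required.
        raise ValueError("must specify either `name` or `id`")
    return ("name", name, workspace)
```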

class verta._repository.Repository(conn, id_)

ModelDB Repository.

There should not be a need to instantiate this class directly; please use Client.get_or_create_repository().

Variables:
  • id (str) – ID of the Repository.
  • name (str) – Name of the Repository.
get_commit(branch=None, tag=None, id=None)

Returns the Commit with the specified branch, tag, or id.

If no arguments are passed, branch="master" is the default.

Parameters:
  • branch (str, optional) – Branch of the Commit.
  • tag (str, optional) – Tag of the Commit.
  • id (str, optional) – ID of the Commit.
Returns:

Commit – Specified Commit.
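The selection rule for get_commit() can be sketched as a pure function. `resolve_commit_ref` is hypothetical. The documented part is the fallback to branch "master" when no arguments are given. Rejecting more than one kind of reference is an assumption made here for clarity.

```python
from typing import Optional, Tuple


def resolve_commit_ref(
    branch: Optional[str] = None,
    tag: Optional[str] = None,
    id: Optional[str] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: decide which reference get_commit() should look up."""
    given = [
        (kind, value)
        for kind, value in (("branch", branch), ("tag", tag), ("id", id))
        if value is not None
    ]
    if not given:
        # Documented default when no arguments are passed.
        return ("branch", "master")
    if len(given) > 1:
        # Assumption: only one kind of reference may be specified per call.
        raise ValueError("specify only one of branch, tag, or id")
    return given[0]
```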

class verta._repository.commit.Commit(conn, repo, commit_msg, branch_name=None)

Commit within a ModelDB Repository.

There should not be a need to instantiate this class directly; please use Repository.get_commit().

Variables: id (str or None) – ID of the Commit, or None if the Commit has not yet been saved.
walk()

Generates folder names and blob names in this commit by walking through its folder tree.

Similar to the Python standard library’s os.walk(), the yielded folder_names can be modified in place to prune subfolders from upcoming iterations or to change the order in which they are visited.

Note that, also similar to os.walk(), folder_names and blob_names are simply the names of those entities, and not their full paths.

Yields:
  • folder_path (str) – Path to current folder.
  • folder_names (list of str) – Names of subfolders in folder_path.
  • blob_names (list of str) – Names of blobs in folder_path.
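The following minimal sketch shows the os.walk()-style contract described above. It walks a hypothetical in-memory tree rather than a real Commit: nested dicts are folders, and any other value stands in for a blob. `walk_tree` is not part of verta. Yielding an empty string for the root folder is an assumption.

```python
import posixpath
from typing import Any, Dict, Iterator, List, Tuple


def walk_tree(
    tree: Dict[str, Any], folder_path: str = ""
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (folder_path, folder_names, blob_names) top-down, like os.walk()."""
    # Subfolders are nested dicts; any other value stands in for a blob.
    folder_names = sorted(k for k, v in tree.items() if isinstance(v, dict))
    blob_names = sorted(k for k, v in tree.items() if not isinstance(v, dict))
    yield folder_path, folder_names, blob_names
    # folder_names is read only after the caller resumes the generator, so
    # in-place edits (removals, reordering) decide which subfolders are visited.
    for name in folder_names:
        yield from walk_tree(tree[name], posixpath.join(folder_path, name))


# Usage: skip every "scratch" folder by pruning folder_names in place.
tree = {
    "data": {"train.csv": "blob-1", "scratch": {"tmp.csv": "blob-2"}},
    "env": {"python": "blob-3"},
}
visited = []
for folder_path, folder_names, blob_names in walk_tree(tree):
    visited.append(folder_path)
    if "scratch" in folder_names:
        folder_names.remove("scratch")
```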
update(path, blob)

Adds blob to this Commit at path.

If path is already in this Commit, it will be updated to the new blob.

Parameters:
  • path (str) – Location to add blob to.
  • blob (Blob) – ModelDB versioning blob.
get(path)

Retrieves the blob at path from this Commit.

Parameters: path (str) – Location of a blob.
Returns: blob (Blob) – ModelDB versioning blob.
Raises: LookupError – If path is not in this Commit.
remove(path)

Deletes the blob at path from this Commit.

Parameters: path (str) – Location of a blob.
Raises: LookupError – If path is not in this Commit.
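The path semantics of update(), get(), and remove() can be modeled with a plain dict. The following is a minimal sketch. `ToyCommit` is a hypothetical stand-in, not the verta Commit class, and it ignores saving and server round-trips.

```python
from typing import Any, Dict


class ToyCommit:
    """Hypothetical in-memory stand-in for a Commit's path -> blob mapping."""

    def __init__(self) -> None:
        self._blobs: Dict[str, Any] = {}

    def update(self, path: str, blob: Any) -> None:
        # Adding a blob at an existing path replaces the old blob.
        self._blobs[path] = blob

    def get(self, path: str) -> Any:
        try:
            return self._blobs[path]
        except KeyError:
            raise LookupError("Commit does not contain path {}".format(path)) from None

    def remove(self, path: str) -> None:
        try:
            del self._blobs[path]
        except KeyError:
            raise LookupError("Commit does not contain path {}".format(path)) from None
```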
save(message)

Saves this Commit to ModelDB.

Note

If this Commit contains new S3 datasets to be versioned by ModelDB, their contents may first be downloaded to a temporary location before being uploaded to ModelDB, which can require a very large temporary download.

Parameters: message (str) – Description of this Commit.
tag(tag)

Assigns a tag to this Commit.

Parameters: tag (str) – Tag.
Raises: RuntimeError – If this Commit has not yet been saved.
log()

Yields ancestor commits, starting from this Commit and ending at the root of the Repository.

Analogous to git log.

Yields: commit (Commit) – Ancestor commit.
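Walking ancestors is a graph traversal over parent links. The following minimal sketch uses a hypothetical graph that maps commit IDs to parent IDs. It visits each ancestor once, which matters once merges give a commit two parents. The breadth-first order is an assumption and may differ from the order log() actually produces.

```python
from collections import deque
from typing import Dict, Iterator, List

# Hypothetical commit graph: commit ID -> list of parent IDs (the root has none).
Parents = Dict[str, List[str]]


def ancestors(parents: Parents, start: str) -> Iterator[str]:
    """Yield start and each of its ancestors exactly once, breadth-first."""
    seen = {start}
    queue = deque([start])
    while queue:
        commit_id = queue.popleft()
        yield commit_id
        for parent in parents[commit_id]:
            if parent not in seen:  # shared ancestors of a merge appear only once
                seen.add(parent)
                queue.append(parent)
```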
new_branch(branch)

Creates a branch at this Commit and returns the checked-out branch.

If branch already exists, it will be moved to this Commit.

Parameters: branch (str) – Branch name.
Returns: commit (Commit) – This Commit as the head of branch.
Raises: RuntimeError – If this Commit has not yet been saved.

Examples

master = repo.get_commit(branch="master")
dev = master.new_branch("development")
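A branch is a movable name that points at a commit. Creating an existing branch therefore just moves that pointer. The following minimal sketch keeps branches in a hypothetical dict of branch names to commit IDs. It uses `None` as a stand-in for an unsaved Commit, which is an assumption.

```python
from typing import Dict, Optional


def new_branch(branches: Dict[str, str], branch: str, commit_id: Optional[str]) -> str:
    """Point `branch` at commit_id, creating the branch or moving it if it exists."""
    if commit_id is None:
        # Mirrors the documented RuntimeError for a Commit that has not been saved.
        raise RuntimeError("Commit must be saved before creating a branch from it")
    branches[branch] = commit_id  # an existing entry is overwritten, i.e. moved
    return commit_id
```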
diff_from(reference=None)

Returns the diff from reference to this Commit.

Parameters: reference (Commit, optional) – Commit to be compared to.
Returns: Diff – Commit diff.
Raises: RuntimeError – If this Commit or reference has not yet been saved, or if they do not belong to the same Repository.
apply_diff(diff, message, other_parents=[])

Applies a diff to this Commit.

This method creates a new Commit in ModelDB and assigns a new ID to this object.

Parameters:
  • diff (Diff) – Commit diff.
  • message (str) – Description of the diff.
Raises:

RuntimeError – If this Commit has not yet been saved.
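Conceptually, a diff records which paths changed between two commits, and applying it replays those changes. The following minimal sketch models commit contents as hypothetical path-to-blob dicts. ModelDB's Diff is a richer object. In this sketch, `None` means "path absent", which assumes no blob is ever `None`.

```python
from typing import Any, Dict, Tuple

Snapshot = Dict[str, Any]              # path -> blob
Diff = Dict[str, Tuple[Any, Any]]      # path -> (old blob, new blob); None = absent


def diff_from(reference: Snapshot, current: Snapshot) -> Diff:
    """Changes that turn `reference` into `current`."""
    paths = set(reference) | set(current)
    return {
        p: (reference.get(p), current.get(p))
        for p in paths
        if reference.get(p) != current.get(p)
    }


def apply_diff(snapshot: Snapshot, diff: Diff) -> Snapshot:
    """Return a new snapshot with `diff` applied; the input is left unchanged."""
    result = dict(snapshot)
    for path, (_old, new) in diff.items():
        if new is None:
            result.pop(path, None)  # path was deleted
        else:
            result[path] = new      # path was added or modified
    return result
```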

revert(other=None, message=None)

Reverts other, or this Commit if other is not provided.

This method creates a new Commit in ModelDB and assigns a new ID to this object.

Parameters:
  • other (Commit, optional) – Commit to be reverted. If not provided, this Commit will be reverted.
  • message (str, optional) – Description of the revert. If not provided, a default message will be used.
Raises:

RuntimeError – If this Commit or other has not yet been saved, or if they do not belong to the same Repository.

merge(other, message=None)

Merges a branch headed by other into this Commit.

This method creates a new Commit in ModelDB and assigns a new ID to this object.

Parameters:
  • other (Commit) – Commit to be merged.
  • message (str, optional) – Description of the merge. If not provided, a default message will be used.
Raises:

RuntimeError – If this Commit or other has not yet been saved, or if they do not belong to the same Repository.
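For intuition about what merging two branches involves, here is the textbook three-way merge over hypothetical path-to-blob dicts, where `base` is the common ancestor. This illustrates the general technique only. It is not necessarily how ModelDB merges commits or reports conflicts.

```python
from typing import Any, Dict

Snapshot = Dict[str, Any]  # path -> blob; a missing path means "absent"


def merge_snapshots(base: Snapshot, ours: Snapshot, theirs: Snapshot) -> Snapshot:
    """Three-way merge of path -> blob maps; raise on conflicting edits."""
    merged: Snapshot = {}
    for path in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:
            result = o  # both sides agree (including both deleting the path)
        elif o == b:
            result = t  # only `theirs` changed this path
        elif t == b:
            result = o  # only `ours` changed this path
        else:
            raise ValueError("merge conflict at {}".format(path))
        if result is not None:
            merged[path] = result
    return merged
```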

Blobs

Code

class verta.code._git.Git(repo_url=None, branch=None, tag=None, commit_hash=None, _autocapture=True)

Captures metadata about the git commit with the specified branch, tag, or commit_hash.

Parameters:
  • repo_url (str, optional) – Remote repository URL. If not provided, it will automatically be determined.
  • branch (str, optional) – Branch name. If not provided, it will automatically be determined.
  • tag (str, optional) – Commit tag. If not provided, it will automatically be determined.
  • commit_hash (str, optional) – Commit hash. If not provided, it will automatically be determined.
  • _autocapture (bool, default True) – Whether to enable the automatic capturing behavior of the parameters above.
Raises:

OSError – If git information cannot automatically be determined.

Examples

from verta.code import Git
code1 = Git()
code2 = Git(
    repo_url="git@github.com:VertaAI/modeldb.git",
    tag="client-v0.14.0",
)
code3 = Git(
    commit_hash="e4e0675",
)
class verta.code._notebook.Notebook(notebook_path=None, _autocapture=True)

Captures metadata about the Jupyter Notebook at notebook_path and the current git environment.

Note

If a git environment is detected, then the Notebook’s recorded filepath will be relative to the root of the repository.
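The path rewriting described in the note can be sketched with os.path.relpath. `recorded_notebook_path` is a hypothetical helper, and the repository root is taken as an argument rather than detected. Keeping the path unchanged when no git environment is found is an assumption.

```python
import os
from typing import Optional


def recorded_notebook_path(notebook_path: str, repo_root: Optional[str] = None) -> str:
    """Path as it might be recorded: relative to the git root when one is known."""
    if repo_root is None:
        # Assumption: with no git environment, the path is kept as given.
        return notebook_path
    return os.path.relpath(os.path.abspath(notebook_path), os.path.abspath(repo_root))
```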

Parameters:
  • notebook_path (str, optional) – Filepath of the Jupyter Notebook. If not provided, it will automatically be determined.
  • _autocapture (bool, default True) – Whether to enable the automatic capturing behavior of the parameters above.
Raises:

OSError – If the Notebook filepath cannot automatically be determined.

Examples

from verta.code import Notebook
code1 = Notebook()
code2 = Notebook("Spam-Detection.ipynb")

Configuration

class verta.configuration._hyperparameters.Hyperparameters(hyperparameters=None, hyperparameter_ranges=None, hyperparameter_sets=None)

Captures hyperparameters.

Parameters:
  • hyperparameters (dict of name to value) – Hyperparameter names to individual values.
  • hyperparameter_ranges (dict of name to tuple of (start, stop, step)) – Hyperparameter names to a specified range of values.
  • hyperparameter_sets (dict of name to list of values) – Hyperparameter names to sets of specific values.

Examples

from verta.configuration import Hyperparameters
config1 = Hyperparameters(hyperparameters={
    'C': 1e-4,
    'penalty': 'l2',
})
config2 = Hyperparameters(hyperparameter_ranges={
    'C': (0, 1, 1e-2),
})
config3 = Hyperparameters(hyperparameter_sets={
    'penalty': ['l1', 'l2'],
})
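Hyperparameters only records these values. To show what the three forms describe, the following sketch expands a (start, stop, step) range and takes the grid of all combinations. `expand_range` and `grid` are hypothetical helpers. Treating stop as exclusive, like Python's range(), is an assumption.

```python
import itertools
import math
from typing import Any, Dict, List, Sequence, Tuple


def expand_range(start: float, stop: float, step: float) -> List[float]:
    """Values start, start + step, ... up to but excluding stop (assumed)."""
    # Round the count so float error in (stop - start) / step adds no extra value.
    count = math.ceil(round((stop - start) / step, 9))
    return [start + i * step for i in range(count)]


def grid(
    ranges: Dict[str, Tuple[float, float, float]],
    sets: Dict[str, Sequence[Any]],
) -> List[Dict[str, Any]]:
    """Every combination of the ranged and listed hyperparameter values."""
    axes = {name: expand_range(*r) for name, r in ranges.items()}
    axes.update({name: list(values) for name, values in sets.items()})
    names = sorted(axes)
    return [dict(zip(names, combo)) for combo in itertools.product(*(axes[n] for n in names))]
```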

Dataset

class verta.dataset.Path(paths, base_path=None, enable_mdb_versioning=False)

Captures metadata about files.

Note

If relative paths are passed in, they will not be converted to absolute paths.

Parameters:
  • paths (list of str) – List of filepaths or directory paths.
  • base_path (str, optional) – Directory path to be removed from the beginning of all component paths before they are saved to ModelDB.
  • enable_mdb_versioning (bool, default False) – Whether to upload the data itself to ModelDB to enable managed data versioning.

Examples

from verta.dataset import Path
dataset1 = Path([
    "../datasets/census-train.csv",
    "../datasets/census-test.csv",
])
dataset2 = Path([
    "../datasets",
])
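The effect of base_path can be sketched as a plain string-prefix removal. Consistent with the note above, the sketch does not normalize relative paths. `strip_base_path` is hypothetical. Raising an error for paths outside base_path is an assumption.

```python
from typing import Optional


def strip_base_path(path: str, base_path: Optional[str] = None) -> str:
    """Remove base_path from the start of path (hypothetical illustration)."""
    if base_path is None:
        return path
    base = base_path.rstrip("/") + "/"  # accept base_path with or without a trailing slash
    if not path.startswith(base):
        # Assumption: a component outside base_path is treated as an error.
        raise ValueError("{} is not under {}".format(path, base_path))
    return path[len(base):]
```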
download(component_path=None, download_to_path=None)

Downloads component_path from this dataset if ModelDB-managed versioning was enabled.

Parameters:
  • component_path (str, optional) – Original path of the file or directory in this dataset to download. If not provided, all files will be downloaded.
  • download_to_path (str, optional) – Path to download to. If not provided, the file(s) will be downloaded into a new path in the current directory. If provided and the path already exists, it will be overwritten.
Returns:

downloaded_to_path (str) – Absolute path where file(s) were downloaded to. Matches download_to_path if it was provided as an argument.

list_components()

Returns the components in this dataset.

Returns: components (list of Component) – Components.
list_paths()

Returns the paths of all components in this dataset.

Returns: component_paths (list of str) – Paths of all components.
class verta.dataset.S3(paths, enable_mdb_versioning=False)

Captures metadata about S3 objects.

If your S3 object requires additional information to identify it, such as its version ID, you can use S3.location().

Parameters:
  • paths (list) – List of S3 URLs of the form "s3://<bucket-name>" or "s3://<bucket-name>/<key>", or objects returned by S3.location().
  • enable_mdb_versioning (bool, default False) – Whether to upload the data itself to ModelDB to enable managed data versioning.

Examples

from verta.dataset import S3
dataset1 = S3([
    "s3://verta-starter/census-train.csv",
    "s3://verta-starter/census-test.csv",
])
dataset2 = S3([
    "s3://verta-starter",
])
dataset3 = S3([
    S3.location("s3://verta-starter/census-train.csv",
                version_id="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"),
])
static location(path, version_id=None)

Returns an object describing an S3 location that can be passed into a new S3.

Parameters:
  • path (str) – S3 URL of the form "s3://<bucket-name>" or "s3://<bucket-name>/<key>".
  • version_id (str, optional) – ID of an S3 object version.
Returns:

S3Location – A location in S3.

Raises:

ValueError – If version_id is provided but path represents a bucket rather than a single object.
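The two accepted URL forms and the version_id rule can be checked with urllib.parse. `parse_s3_location` and `S3Loc` are hypothetical and do not reproduce the S3Location object. Also note that urlparse treats "?" and "#" specially, so keys containing those characters would need different handling.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class S3Loc(NamedTuple):
    bucket: str
    key: Optional[str]
    version_id: Optional[str]


def parse_s3_location(path: str, version_id: Optional[str] = None) -> S3Loc:
    """Split "s3://<bucket-name>[/<key>]" and validate version_id."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected s3://<bucket-name>[/<key>], got {!r}".format(path))
    key = parsed.path.lstrip("/") or None  # None means the URL names a whole bucket
    if version_id is not None and key is None:
        # Mirrors the documented ValueError: a bucket has no object version.
        raise ValueError("version_id requires a single object, not a bucket")
    return S3Loc(parsed.netloc, key, version_id)
```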

download(component_path=None, download_to_path=None)

Downloads component_path from this dataset if ModelDB-managed versioning was enabled.

Parameters:
  • component_path (str, optional) – Original path of the file or directory in this dataset to download. If not provided, all files will be downloaded.
  • download_to_path (str, optional) – Path to download to. If not provided, the file(s) will be downloaded into a new path in the current directory. If provided and the path already exists, it will be overwritten.
Returns:

downloaded_to_path (str) – Absolute path where file(s) were downloaded to. Matches download_to_path if it was provided as an argument.

list_components()

Returns the components in this dataset.

Returns: components (list of Component) – Components.
list_paths()

Returns the paths of all components in this dataset.

Returns: component_paths (list of str) – Paths of all components.
class verta.dataset._dataset.Component

A dataset component returned by dataset.list_components().

Variables:
  • path (str) – File path.
  • size (int) – File size.
  • last_modified (int) – Unix time when this file was last modified.
  • sha256 (str) – SHA-256 checksum.
  • md5 (str) – MD5 checksum.
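The Component fields correspond to standard file metadata. The following sketch computes them for a local file with hashlib and os.stat. `describe_file` is hypothetical. Three details are assumptions about the format ModelDB stores: hex-encoded digests, size in bytes, and whole-second Unix times.

```python
import hashlib
import os
from typing import Any, Dict


def describe_file(path: str, chunk_size: int = 1 << 20) -> Dict[str, Any]:
    """Collect Component-like metadata for a local file."""
    sha256, md5 = hashlib.sha256(), hashlib.md5()
    with open(path, "rb") as f:
        # Hash in chunks so large files are not read into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            md5.update(chunk)
    stat = os.stat(path)
    return {
        "path": path,
        "size": stat.st_size,                 # bytes
        "last_modified": int(stat.st_mtime),  # Unix time, whole seconds
        "sha256": sha256.hexdigest(),
        "md5": md5.hexdigest(),
    }
```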

Environment

class verta.environment.Python(requirements=None, constraints=None, env_vars=None, _autocapture=True)

Captures metadata about Python, installed packages, and system environment variables.

Parameters:
  • requirements (list of str, optional) – List of PyPI package names. If not provided, all packages currently installed through pip will be captured.
  • constraints (list of str, optional) – List of PyPI package names with version specifiers. If not provided, nothing will be captured.
  • env_vars (list of str, optional) – Names of environment variables to capture. If not provided, nothing will be captured.
  • _autocapture (bool, default True) – Whether to enable the automatic capturing behavior of the parameters above.

Examples

from verta.environment import Python
env1 = Python(requirements=Python.read_pip_file("../requirements.txt"))
env2 = Python(
    requirements=["tensorflow"],
    env_vars=["CUDA_VISIBLE_DEVICES"],
)
static read_pip_file(filepath)

Reads a pip requirements file into a list that can be passed into a new Python.

Parameters: filepath (str) – Path to a pip requirements or constraints file.
Returns: list of str – Requirement specifiers.
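Reading a requirements file mostly means dropping comments, blank lines, and pip options. The following minimal sketch follows pip's comment rule: a line starting with "#", or whitespace followed by "#", begins a comment. `read_requirements` is hypothetical, and read_pip_file() may behave differently. In particular, this sketch skips options such as -r and -c and does not handle line continuations.

```python
import re
from typing import List


def read_requirements(filepath: str) -> List[str]:
    """Return requirement specifiers from a pip requirements or constraints file."""
    specifiers = []
    with open(filepath) as f:
        for line in f:
            # Strip a comment that starts the line or follows whitespace.
            line = re.sub(r"(^|\s)#.*", "", line).strip()
            if not line or line.startswith("-"):
                continue  # blank line, or an option such as -r / -c / --index-url
            specifiers.append(line)
    return specifiers
```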
static read_pip_environment()

Reads package versions from pip into a list that can be passed into a new Python.

Returns: list of str – Requirement specifiers.
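A similar "name==version" list can be built without calling pip. This sketch deliberately swaps in the standard library's importlib.metadata (Python 3.8+) in place of querying pip. Its output resembles `pip freeze`, but read_pip_environment() may format some entries differently, for example editable installs. `installed_requirements` is hypothetical.

```python
from importlib import metadata
from typing import List


def installed_requirements() -> List[str]:
    """Return "name==version" specifiers for every installed distribution."""
    specs = set()
    for dist in metadata.distributions():
        try:
            name, version = dist.metadata["Name"], dist.version
        except KeyError:
            continue  # some Python versions raise KeyError for a missing field
        if name and version:  # skip distributions with incomplete metadata
            specs.add("{}=={}".format(name, version))
    return sorted(specs, key=str.lower)
```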