Welcome to CodeStamper’s documentation!

CodeStamper


CodeStamper helps ensure traceability between ML experiments and the code that produced them.

1.1. Description

When running ML experiments, you want to be able to replicate a past experiment at any point in time. One requirement for this (although not the only one) is being able to run the exact same version of the code.

1.1.1. When things go wrong

An ML experiment is started, but it might not be reproducible in the future because:

Issue: The experiment does not contain any information about the code that produced it.
CodeStamper’s solution:
✅ Logs information about the last git commit.

Issue: Code modifications were staged but not committed, or not all modified files were committed.
CodeStamper’s solution:
✅ Logs any local changes not captured in a commit as patches that can be restored.
✅ Can prevent running experiments until all local modifications are versioned in git.

Issue: The code is committed, but the commits never get pushed.
CodeStamper’s solution:
✅ Can log the contents of commits that have not been pushed yet.

Issue: The experiment does not contain exact information about the Python environment used. Even if all the code is versioned, re-running the same experiment 8 months from now might not behave the same if the Python package versions have changed (APIs or implementations of algorithms might have changed).
CodeStamper’s solution:
✅ Logs the current Python environment state.

1.2. Installing

pip install CodeStamper

1.3. Examples

1.3.1. Enforce a clean workspace

from codestamper import GitStamp

GitStamp().raise_if_dirty()
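
In a training script, the check is usually wrapped so that a dirty workspace stops the run with a readable message. The following is a minimal sketch and not part of CodeStamper: run_if_clean is a hypothetical helper, and it catches a generic Exception so that it works without the package installed. With CodeStamper you would catch codestamper.DirtyWorkspace instead.

```python
import sys
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_if_clean(check_clean: Callable[[], None], experiment: Callable[[], T]) -> Optional[T]:
    """Run experiment() only if check_clean() returns without raising.

    Hypothetical helper: check_clean is expected to be something like
    GitStamp().raise_if_dirty, which raises when the workspace is dirty.
    """
    try:
        check_clean()
    except Exception as exc:  # with CodeStamper: except codestamper.DirtyWorkspace
        print(f"Refusing to start the experiment: {exc}", file=sys.stderr)
        return None
    return experiment()
```

Assuming CodeStamper is installed and the script runs inside a git repository, the call would look like run_if_clean(GitStamp().raise_if_dirty, train), where train is your own function.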

1.3.2. Log the current code state

from codestamper import GitStamp

GitStamp().log_state('./experiment/code_log', modified_as_patch=True, unpushed_as_patch=True)
📁experiment/code_log
|--🗎 code_state.json
|--🗎 mod.patch
|--🗎 unpushed<git-commit>-<git-commit>.patch
|--🗎 pip-packages.txt
|--🗎 conda_env.yaml
  • code_state.json

{
  "date": "03/08/2022 21:10:34",
  "git": {
    "hash": "75c88ba",
    "user": "git-username",
    "email": "your-email-here@gmail.com"
  },
  "node": {
    "username": "gitpod",
    "node": "bmsan-gitstamp",
    "system": "Linux",
    "version": "#44-Ubuntu SMP Wed Jun 22 14:20:53 UTC 2022",
    "release": "5.15.0-41-generic"
  },
  "python": {
    "version": "3.8.13 (default, Jul 26 2022, 01:36:30) \n[GCC 9.4.0]",
    "pip_packages": {
      "argon2-cffi": "21.3.0",
      "argon2-cffi-bindings": "21.2.0",
        
    }
  }
}
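
One use of the logged file is to check, before re-running an old experiment, whether the current environment still matches the recorded package versions. The sketch below assumes only the JSON layout shown above (a "python" object with a "pip_packages" mapping). The helper names are hypothetical and package names are compared verbatim, whereas a stricter comparison would normalize case and separators.

```python
import json
from importlib import metadata
from pathlib import Path
from typing import Dict, Optional, Tuple


def load_logged_packages(folder: str) -> Dict[str, str]:
    """Read python.pip_packages from <folder>/code_state.json."""
    state = json.loads((Path(folder) / "code_state.json").read_text())
    return state["python"]["pip_packages"]


def installed_packages() -> Dict[str, str]:
    """Name -> version for every distribution visible to this interpreter."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            found[name] = version
    return found


def package_drift(
    logged: Dict[str, str], current: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Packages whose logged version differs from, or is missing in, current."""
    return {
        name: (version, current.get(name))
        for name, version in logged.items()
        if current.get(name) != version
    }
```

A typical call would be package_drift(load_logged_packages("./experiment/code_log"), installed_packages()). An empty result means every logged package is installed at the recorded version.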
  • mod.patch

Contains the modifications (staged or unstaged) to git-tracked files.

The modifications can be applied in a workspace on top of the commit hash recorded in code_state.json.

# Make sure we are at the right commit
git checkout <git.hash from code_state.json>

# Add uncommitted changes to the workspace
git apply mod.patch
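
The same two steps can be scripted. This is a minimal sketch under stated assumptions: restore_commands and restore are hypothetical helpers (not CodeStamper functions), the log folder uses the file names shown above, and the apply step is skipped when mod.patch is missing or empty.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def restore_commands(log_dir: str) -> List[List[str]]:
    """Build the git commands that restore the logged workspace state."""
    log_path = Path(log_dir)
    state = json.loads((log_path / "code_state.json").read_text())
    commands = [["git", "checkout", state["git"]["hash"]]]
    patch = log_path / "mod.patch"
    # Assumption: nothing to apply when no patch was logged or it is empty.
    if patch.is_file() and patch.stat().st_size > 0:
        commands.append(["git", "apply", str(patch.resolve())])
    return commands


def restore(log_dir: str, repo_dir: str = ".") -> None:
    """Run the commands inside repo_dir, stopping at the first failure."""
    for command in restore_commands(log_dir):
        subprocess.run(command, cwd=repo_dir, check=True)
```

Building the command list separately from running it makes it possible to print and review the commands before they touch the repository.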
  • unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch

Contains the delta between the current commit and the last pushed commit. This should be used only in the unlikely event that the unpushed commits get lost. It should be considered an experimental, last-resort feature.

# Make sure we are at the right commit
git checkout <last_pushed_commit_hash>

# Apply the changes from the unpushed commits to the workspace
git apply unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch
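
Because the commit to check out is encoded in the patch file name, it can be recovered from the name alone. The sketch below assumes that both hashes are plain hexadecimal strings joined by a single hyphen, as the naming pattern above suggests. parse_unpushed_name is a hypothetical helper, not a CodeStamper function.

```python
import re
from typing import Tuple

# Assumption: names look like "unpushed<hex>-<hex>.patch", e.g. "unpushed1a2b3c4-5d6e7f8.patch".
_UNPUSHED_RE = re.compile(r"^unpushed([0-9a-fA-F]+)-([0-9a-fA-F]+)\.patch$")


def parse_unpushed_name(filename: str) -> Tuple[str, str]:
    """Return (last_pushed_commit_hash, last_unpushed_commit_hash)."""
    match = _UNPUSHED_RE.match(filename)
    if match is None:
        raise ValueError(f"not an unpushed patch file name: {filename!r}")
    return match.group(1), match.group(2)
```

The first hash of the returned pair is the one to check out before running git apply.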

Documentation

High Level API

class codestamper.GitStamp(git_cmd: str = 'git')

Provides ways of logging and retrieving data related to the workspace state and the Python environment

get_state_info(git_usr=True, node_info=True, pip_info=True, conda_info=True)
Returns information related to:
  • git state

  • node(machine) state

  • python env

Parameters
  • git_usr – Include information related to the current git user, by default True

  • node_info – Include information related to the current machine, by default True

  • pip_info – Information related to python packages gathered through pip, by default True

  • conda_info – Information related to python packages in conda envs, by default True
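
To illustrate the kind of data behind node_info=True, the sketch below collects the same five fields that appear under "node" in the sample code_state.json, using only the standard library. It is an assumption about how such data can be gathered, not CodeStamper's implementation, and collect_node_info is a hypothetical name.

```python
import getpass
import platform
from typing import Dict


def collect_node_info() -> Dict[str, str]:
    """Gather user and machine details similar to the "node" section."""
    uname = platform.uname()
    try:
        username = getpass.getuser()
    except Exception:  # no resolvable user, e.g. in some containers
        username = "unknown"
    return {
        "username": username,
        "node": uname.node,
        "system": uname.system,
        "version": uname.version,
        "release": uname.release,
    }
```

These fields can identify a person or a machine, which is one reason to pass git_usr=False or node_info=False when logs are shared outside the team.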

log_state(folder, modified_as_patch=True, unpushed_as_patch=False, git_usr=True, node_info=True, pip_info=True, conda_info=True, poetry_info=True)

Logs Code & Env State. Generates a folder containing logged information.

  • code_state.json - contains git/platform/python information

  • mod.patch - contains diff between last commit and current code

  • unpushed<>-<>.patch - contains diff between last commit and last pushed commit

  • pip-packages.txt

  • conda_env.yaml - if conda is present

  • poetry.lock - if poetry is present

Parameters
  • folder – Folder where the state is logged

  • modified_as_patch – Save code modifications(since last commit) as a patch file [mod.patch], by default True

  • unpushed_as_patch – Save code modifications of unpushed commits as patch file [unpushed<hash1>-<hash2>.patch], by default False

  • git_usr – Save git info related to current git user, by default True

  • node_info – Save information related to the machine that the code is running on, by default True

  • pip_info – Information related to python packages gathered through pip, by default True

  • conda_info – Information related to python packages in conda envs, by default True

  • poetry_info – Information related to python packages in poetry envs, by default True
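
When many experiments are launched from the same repository, each run needs its own log folder. The sketch below shows one possible convention: log_run_state and the timestamped folder layout are hypothetical, and only the log_state keyword names come from the signature documented above. The parent folder is created up front because it is an assumption that log_state creates only the final folder.

```python
from datetime import datetime
from pathlib import Path
from typing import Any, Optional


def log_run_state(
    stamp: Any, base_dir: str, run_name: str, now: Optional[datetime] = None
) -> Path:
    """Log the code state into <base_dir>/<run_name>-<timestamp>/code_log."""
    now = now or datetime.now()
    folder = Path(base_dir) / f"{run_name}-{now:%Y%m%d-%H%M%S}" / "code_log"
    # Assumption: log_state may not create missing parent folders itself.
    folder.parent.mkdir(parents=True, exist_ok=True)
    # Keyword names are taken from the documented log_state signature.
    stamp.log_state(str(folder), modified_as_patch=True, unpushed_as_patch=True)
    return folder
```

With CodeStamper installed, stamp would be a GitStamp() instance, for example log_run_state(GitStamp(), "./experiments", "baseline").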

raise_if_dirty(modified: bool = True, untracked: Union[bool, List[str]] = True)

Raise DirtyWorkspace exception if git workspace is dirty

Parameters
  • modified – check for modified but uncommited files, by default True

  • untracked – check for untracked git files. Can also be a list of file extensions, in which case only untracked files with those extensions are checked, by default True

Raises

DirtyWorkspace

exception codestamper.DirtyWorkspace

Git Workspace contains modified files and/or untracked files

exception codestamper.GitNotFound

Raised when git executable is not found

exception codestamper.LastPushedCommitNA

Cannot find a commit in history that is in sync with the git remote repo

Low Level API

class codestamper.pythonenv.PipEnv

Bases: codestamper.pythonenv.Env

Retrieves python package information from pip

get_env_info() → dict

Get env information

load_env()

Extract env information

save_raw(fname)

Save env information to file

class codestamper.pythonenv.CondaEnv

Bases: codestamper.pythonenv.Env

Retrieves Conda package information

get_env_info()

Get env information

load_env()

Extract env information

save_raw(fname)

Save env information to file

class codestamper.pythonenv.Env

Bases: abc.ABC

Base class for python environment extractors.

get_env_info() → dict

Get env information

abstract load_env()

Extract env information

save_raw(fname)

Save env information to file
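
The relationship between the abstract load_env() and the two concrete methods can be illustrated with a stand-in class. The sketch below mirrors only the documented method names. EnvSketch and StaticEnv are hypothetical, and the lazy loading and JSON output are assumptions rather than a description of CodeStamper's internals.

```python
import json
from abc import ABC, abstractmethod
from typing import Optional


class EnvSketch(ABC):
    """Stand-in mirroring the documented Env methods; not CodeStamper's code."""

    def __init__(self) -> None:
        self._info: Optional[dict] = None

    @abstractmethod
    def load_env(self) -> None:
        """Subclasses extract the environment information here."""

    def get_env_info(self) -> dict:
        if self._info is None:  # load lazily on first access
            self.load_env()
        return self._info

    def save_raw(self, fname: str) -> None:
        with open(fname, "w") as fh:
            json.dump(self.get_env_info(), fh, indent=2)


class StaticEnv(EnvSketch):
    """Hypothetical extractor that reports a fixed package mapping."""

    def __init__(self, packages: dict) -> None:
        super().__init__()
        self._packages = packages

    def load_env(self) -> None:
        self._info = dict(self._packages)
```

Only load_env() differs between extractors in this sketch, so a new package manager would need a single method to be implemented.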

class codestamper.gitutils.Git(git_cmd: str = 'git')

Bases: object

Provides git information

cmd(args: List[str], to_file: Optional[str] = None)

Run a git command.

Parameters
  • args – Parameters to pass to git

  • to_file – Write results to the file named to_file, by default None

Return type

The command result

Raises

GitNotFound – If Git executable is not found
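
The behaviour described for cmd can be approximated with the standard library. The sketch below is an assumption about one way to implement it, not CodeStamper's code: run_git is a hypothetical function, and the local GitNotFound class stands in for codestamper.GitNotFound so that the example runs without the package.

```python
import subprocess
from typing import List, Optional


class GitNotFound(Exception):
    """Local stand-in for codestamper.GitNotFound."""


def run_git(args: List[str], to_file: Optional[str] = None, git_cmd: str = "git") -> str:
    """Run a git command and return its standard output as text."""
    try:
        result = subprocess.run(
            [git_cmd, *args], capture_output=True, text=True, check=True
        )
    except FileNotFoundError as exc:
        # Raised by subprocess when the executable cannot be located.
        raise GitNotFound(f"{git_cmd!r} executable not found") from exc
    if to_file is not None:
        with open(to_file, "w") as fh:
            fh.write(result.stdout)
    return result.stdout
```

For example, run_git(["rev-parse", "--short", "HEAD"]) would return the abbreviated hash of the current commit when called inside a repository. A git command that exits with an error still raises subprocess.CalledProcessError in this sketch.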

gen_mod_diff(out_folder=None, fname=None)

Generate a diff(patch) between the workspace and the last commit.

gen_unpushed_diff(out_folder=None, fname=None)

Generate a diff(patch) between the last commit and the last pushed commit.

get_config(param: str)

Return the value of the param argument from the git config

Parameters

param – Parameter name

Return type

Parameter value

get_hash()

Returns the hash of the latest commit

get_unpushed_start_end() → Tuple[str, str]

Returns the hash of the last pushed commit and the last unpushed commit

Raises

LastPushedCommitNA – If no commits were pushed

git_user_config()

Returns the username and email of the current git user

modified() → List[str]

Returns a list of filenames which are modified since the last commit

untracked(extensions: Optional[List[str]] = None)

Returns a list of untracked files

Parameters
  • extensions – List of targeted extensions. If None, returns all untracked files, by default None

Return type

List of untracked files
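
The effect of the extensions argument can be shown on a plain list of file names. The sketch below is illustrative only: filter_by_extension is a hypothetical helper, and since the documentation does not say whether extensions are written with a leading dot, both "py" and ".py" are accepted here.

```python
from pathlib import PurePath
from typing import Iterable, List, Optional


def filter_by_extension(
    files: Iterable[str], extensions: Optional[List[str]] = None
) -> List[str]:
    """Keep only the files whose suffix is in extensions (all files if None)."""
    if extensions is None:
        return list(files)
    # Normalize "py" and ".py" to the same form and ignore case.
    wanted = {"." + ext.lstrip(".").lower() for ext in extensions}
    return [name for name in files if PurePath(name).suffix.lower() in wanted]
```

Restricting the check to source extensions lets generated artifacts such as logs or checkpoints stay untracked without marking the workspace as dirty.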
