Welcome to CodeStamper’s documentation!¶
CodeStamper¶
CodeStamper aims to help the user in ensuring traceability between ML experiments and code.
1.1. Description¶
When running ML experiments, one wants to be able to replicate a past experiment at any point in time. One prerequisite (although not the only one) is being able to run the exact same code version.
1.1.1. When things go wrong¶
An ML experiment is started, but it might not be reproducible in the future because:
| Issue | CodeStamper’s solution |
|---|---|
| The experiment does not contain any information about the code that produced it | ✅ Logs information about the last git commit |
| Code modifications were staged but not committed, or not all modified files were committed | ✅ Logs any local changes not captured in a commit as patches that can be restored |
| The code is committed but never pushed | ✅ Can log the contents of commits not yet pushed |
| The experiment does not contain exact information about the Python environment used | ✅ Logs the current Python environment state |
1.2. Installing¶
pip install CodeStamper
1.3. Examples¶
1.3.1. Enforce a clean workspace¶
from codestamper import GitStamp
GitStamp().raise_if_dirty()
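To make "dirty" concrete, the following sketch classifies the output of `git status --porcelain` into modified and untracked paths. It illustrates the idea only and is not CodeStamper's implementation; the helper names `split_porcelain` and `is_dirty` are hypothetical, and the sample status text is made up.

```python
from typing import List, Tuple


def split_porcelain(output: str) -> Tuple[List[str], List[str]]:
    """Split `git status --porcelain` (v1) output into (modified, untracked) paths."""
    modified, untracked = [], []
    for line in output.splitlines():
        if len(line) < 4:
            continue  # skip blank or malformed lines
        # Each entry is a two-character status, one space, then the path.
        status, path = line[:2], line[3:]
        if status == "??":
            untracked.append(path)
        elif status != "!!":  # "!!" marks ignored files
            modified.append(path)
    return modified, untracked


def is_dirty(output: str, modified: bool = True, untracked: bool = True) -> bool:
    """Return True if any of the selected kinds of changes are present."""
    mod_files, new_files = split_porcelain(output)
    return (modified and bool(mod_files)) or (untracked and bool(new_files))


# Made-up status text: one unstaged edit, one staged addition, one untracked file.
sample = " M train.py\nA  model.py\n?? notes.txt\n"
print(split_porcelain(sample))
print(is_dirty(sample, modified=False, untracked=False))
```

The two boolean switches mirror the `modified` and `untracked` arguments documented for `raise_if_dirty`; a real check would raise an exception instead of returning a flag.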
1.3.2. Log the current code state¶
from codestamper import GitStamp
GitStamp().log_state('./experiment/code_log', modified_as_patch=True, unpushed_as_patch=True)
📁experiment/code_log
|--🗎 code_state.json
|--🗎 mod.patch
|--🗎 unpushed<git-commit>-<git-commit>.patch
|--🗎 pip-packages.txt
|--🗎 conda_env.yaml
code_state.json
{
"date": "03/08/2022 21:10:34",
"git": {
"hash": "75c88ba",
"user": "git-username",
"email": "your-email-here@gmail.com"
},
"node": {
"username": "gitpod",
"node": "bmsan-gitstamp",
"system": "Linux",
"version": "#44-Ubuntu SMP Wed Jun 22 14:20:53 UTC 2022",
"release": "5.15.0-41-generic"
},
"python": {
"version": "3.8.13 (default, Jul 26 2022, 01:36:30) \n[GCC 9.4.0]",
"pip_packages": {
"argon2-cffi": "21.3.0",
"argon2-cffi-bindings": "21.2.0"
}
}
}
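A logged code_state.json can be checked against the current workspace before rerunning an experiment. The sketch below is one possible way to do that, using a trimmed copy of the example above embedded as a string; the helpers `matches_commit` and `package_mismatches` are hypothetical, and the longer hash and the installed versions passed to them are invented for illustration.

```python
import json

# Trimmed copy of the example above; a real run would read code_state.json from disk.
STATE_TEXT = """
{
  "date": "03/08/2022 21:10:34",
  "git": {"hash": "75c88ba"},
  "python": {"pip_packages": {"argon2-cffi": "21.3.0"}}
}
"""


def matches_commit(state: dict, current_hash: str) -> bool:
    """True if the current commit hash starts with the abbreviated hash that was logged."""
    return current_hash.startswith(state["git"]["hash"])


def package_mismatches(state: dict, installed: dict) -> dict:
    """Map package name -> (logged version, installed version) where they differ."""
    logged = state["python"]["pip_packages"]
    return {
        name: (version, installed.get(name))
        for name, version in logged.items()
        if installed.get(name) != version
    }


state = json.loads(STATE_TEXT)
# The full hash and the installed version below are invented inputs.
print(matches_commit(state, "75c88ba0d1e2f3a4"))
print(package_mismatches(state, {"argon2-cffi": "21.2.0"}))
```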
mod.patch
Contains modifications (staged or unstaged) to git-tracked files.
The modifications can be applied in a workspace on top of the commit hash recorded in code_state.json.
# Make sure we are at the right commit
git checkout <git.hash from code_state.json>
# Apply the uncommitted changes to the workspace
git apply mod.patch
unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch
Contains the delta between the current commit and the last pushed commit. This should be used only in the unlikely event that the unpushed commits get lost. Consider it an experimental, last-resort feature.
# Make sure we are at the right commit
git checkout <last_pushed_commit_hash>
# Apply the changes from the unpushed commits to the workspace
git apply unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch
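Because the patch file name encodes both hashes, the checkout target can be recovered from the name alone. The sketch below assumes the hashes are lowercase hexadecimal strings and that the name follows the documented `unpushed<last_pushed_commit_hash>-<last_unpushed_commit_hash>.patch` pattern exactly; the helper names and the sample file name are hypothetical.

```python
import re
from typing import List, Tuple

# Pattern follows the documented name: unpushed<last_pushed>-<last_unpushed>.patch
_UNPUSHED_RE = re.compile(r"^unpushed([0-9a-f]+)-([0-9a-f]+)\.patch$")


def parse_unpushed_name(fname: str) -> Tuple[str, str]:
    """Return (last_pushed_hash, last_unpushed_hash) encoded in the file name."""
    match = _UNPUSHED_RE.match(fname)
    if match is None:
        raise ValueError(f"not an unpushed patch file name: {fname!r}")
    return match.group(1), match.group(2)


def restore_commands(fname: str) -> List[List[str]]:
    """Build the two git invocations shown above, as argument lists."""
    last_pushed, _ = parse_unpushed_name(fname)
    return [["git", "checkout", last_pushed], ["git", "apply", fname]]


# Invented file name, used only to show the resulting commands.
for argv in restore_commands("unpushed1a2b3c4-75c88ba.patch"):
    print(" ".join(argv))
```

The commands are returned as argument lists rather than executed, so they can be reviewed before being passed to a process runner.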
Documentation¶
High Level API¶
- class codestamper.GitStamp(git_cmd: str = 'git')¶
Provides ways of logging and retrieving data related to the workspace state and the Python environment
- get_state_info(git_usr=True, node_info=True, pip_info=True, conda_info=True)¶
- Returns information related to:
git state
node(machine) state
python env
- Parameters
git_usr – Include information related to the current git user, by default True
node_info – Include information related to the current machine, by default True
pip_info – Information related to python packages gathered through pip, by default True
conda_info – Information related to python packages in conda envs, by default True
- log_state(folder, modified_as_patch=True, unpushed_as_patch=False, git_usr=True, node_info=True, pip_info=True, conda_info=True, poetry_info=True)¶
Logs Code & Env State. Generates a folder containing logged information.
code_state.json - contains git/platform/python information
mod.patch - contains diff between last commit and current code
unpushed<>-<>.patch - contains diff between last commit and last pushed commit
pip-packages.txt
conda_env.yaml - if conda is present
poetry.lock - if poetry is present
- Parameters
folder – Folder where the state is logged
modified_as_patch – Save code modifications (since the last commit) as a patch file [mod.patch], by default True
unpushed_as_patch – Save code modifications of unpushed commits as patch file [unpushed<hash1>-<hash2>.patch], by default False
git_usr – Save git info related to current git user, by default True
node_info – Save information related to the machine that the code is running on, by default True
pip_info – Information related to python packages gathered through pip, by default True
conda_info – Information related to python packages in conda envs, by default True
poetry_info – Information related to python packages in poetry envs, by default True
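One way to use these parameters is a small wrapper that stamps every experiment directory the same way. The sketch below is an assumption about how such a wrapper might look, not part of CodeStamper: `stamp_run` is a hypothetical helper, and `RecordingStamper` is a stand-in that only records calls so the example runs without a git repository.

```python
from pathlib import Path


class RecordingStamper:
    """Hypothetical stand-in exposing the same two methods as GitStamp, for a dry run."""

    def __init__(self):
        self.calls = []

    def raise_if_dirty(self, modified=True, untracked=True):
        self.calls.append(("raise_if_dirty", modified, untracked))

    def log_state(self, folder, modified_as_patch=True, unpushed_as_patch=False, **info_flags):
        self.calls.append(("log_state", folder, modified_as_patch, unpushed_as_patch))


def stamp_run(stamper, run_dir, strict=False):
    """Log the code state under <run_dir>/code_log; optionally refuse dirty workspaces."""
    if strict:
        stamper.raise_if_dirty()
    folder = str(Path(run_dir) / "code_log")
    # Unpushed commits are not saved by default, so opt in explicitly.
    stamper.log_state(folder, modified_as_patch=True, unpushed_as_patch=True)
    return folder


stamper = RecordingStamper()
print(stamp_run(stamper, "experiment", strict=True))
print(stamper.calls)
```

With the library installed and the script running inside a git repository, passing `GitStamp()` instead of the stand-in should make the same two calls against the real workspace.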
- raise_if_dirty(modified: bool = True, untracked: Union[bool, List[str]] = True)¶
Raise DirtyWorkspace exception if git workspace is dirty
- Parameters
modified – Check for modified but uncommitted files, by default True
untracked – Check for untracked git files. Pass a list of file extensions to check only untracked files with those extensions, by default True
- Raises
DirtyWorkspace – If the git workspace is dirty
- exception codestamper.DirtyWorkspace¶
Git Workspace contains modified files and/or untracked files
- exception codestamper.GitNotFound¶
Raised when git executable is not found
- exception codestamper.LastPushedCommitNA¶
Cannot find a commit in history that is in sync with the git remote repo
Low Level API¶
- class codestamper.pythonenv.PipEnv¶
Bases:
codestamper.pythonenv.Env
Retrieves Python package information from pip
- get_env_info() dict ¶
Get env information
- load_env()¶
Extract env information
- save_raw(fname)¶
Save env information to file
- class codestamper.pythonenv.CondaEnv¶
Bases:
codestamper.pythonenv.Env
Retrieves Conda package information
- get_env_info()¶
Get env information
- load_env()¶
Extract env information
- save_raw(fname)¶
Save env information to file
- class codestamper.pythonenv.Env¶
Bases:
abc.ABC
Base class for Python environment extractors.
- get_env_info() dict ¶
Get env information
- abstract load_env()¶
Extract env information
- save_raw(fname)¶
Save env information to file
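The three documented methods suggest a template pattern: a subclass supplies `load_env`, and the base class handles retrieval and saving. The sketch below mimics that shape with a self-contained stand-in. `EnvSketch`, its `_info` attribute, the lazy loading, and the JSON output format are all assumptions made for illustration; the real `Env` internals are not described here.

```python
import abc
import json
import platform


class EnvSketch(abc.ABC):
    """Hypothetical stand-in with the same three method names as the documented Env."""

    def __init__(self):
        self._info = {}  # assumed storage; not taken from the library

    @abc.abstractmethod
    def load_env(self):
        """Extract env information into self._info."""

    def get_env_info(self) -> dict:
        # Load lazily on first access (an assumption of this sketch).
        if not self._info:
            self.load_env()
        return self._info

    def save_raw(self, fname):
        with open(fname, "w", encoding="utf-8") as handle:
            json.dump(self.get_env_info(), handle, indent=2, sort_keys=True)


class InterpreterEnv(EnvSketch):
    """Example extractor that records basic facts about the running interpreter."""

    def load_env(self):
        self._info["version"] = platform.python_version()
        self._info["implementation"] = platform.python_implementation()


env = InterpreterEnv()
print(env.get_env_info())
```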
- class codestamper.gitutils.Git(git_cmd: str = 'git')¶
Bases:
object
Provides git information
- cmd(args: List[str], to_file: Optional[str] = None)¶
Run a git command.
- Parameters
args – Parameters to pass to git
to_file – Write results to the file named to_file, by default None
- Return type
The command result
- Raises
GitNotFound – If Git executable is not found
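The behaviour described for `cmd` can be approximated with the standard library. The sketch below is an assumption about how such a wrapper could work, not the library's code: `run_git` and `GitNotFoundSketch` are hypothetical names, and returning the captured standard output as a string is a choice made for this example.

```python
import subprocess
from typing import List, Optional


class GitNotFoundSketch(RuntimeError):
    """Hypothetical stand-in for codestamper.GitNotFound."""


def run_git(args: List[str], to_file: Optional[str] = None, git_cmd: str = "git") -> str:
    """Run `git_cmd` with `args`; return stdout, optionally also writing it to `to_file`."""
    try:
        result = subprocess.run(
            [git_cmd, *args],
            check=True,  # a non-zero exit status raises CalledProcessError
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
    except FileNotFoundError as exc:
        # Raised by subprocess when the executable itself cannot be found.
        raise GitNotFoundSketch(f"executable not found: {git_cmd}") from exc
    if to_file is not None:
        with open(to_file, "w", encoding="utf-8") as handle:
            handle.write(result.stdout)
    return result.stdout


# Deliberately use an executable name that should not exist.
try:
    run_git(["--version"], git_cmd="no-such-git-executable")
except GitNotFoundSketch as err:
    print(err)
```

Translating the missing-executable error into a dedicated exception lets callers distinguish "git is not installed" from "git ran and failed".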
- gen_mod_diff(out_folder=None, fname=None)¶
Generate a diff (patch) between the workspace and the last commit.
- gen_unpushed_diff(out_folder=None, fname=None)¶
Generate a diff (patch) between the last commit and the last pushed commit.
- get_config(param: str)¶
Return the value of the param argument from the git config
- Parameters
param – Parameter name
- Return type
Parameter value
- get_hash()¶
Returns the hash of the latest commit
- get_unpushed_start_end() Tuple[str, str] ¶
Returns the hash of the last pushed commit and the last unpushed commit
- Raises
LastPushedCommitNA – If no commits were pushed
- git_user_config()¶
Returns the username and email of the current git user
- modified() List[str] ¶
- Return type
A list of file names that have been modified since the last commit
- untracked(extensions: Optional[List[str]] = None)¶
Returns a list of untracked files
- Parameters
extensions – List of targeted extensions. If None returns all untracked files, by default None
- Return type
List of untracked files
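Filtering by extension can be sketched as a pure function over a list of paths. This is an illustration under stated assumptions, not the library's logic: `filter_by_extension` is a hypothetical helper, it accepts extensions with or without a leading dot, and the file list is made up.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def filter_by_extension(files: List[str], extensions: Optional[List[str]] = None) -> List[str]:
    """Keep files whose suffix is in `extensions`; None keeps everything."""
    if extensions is None:
        return list(files)
    # Normalise "py" and ".py" to the same form.
    wanted = {ext if ext.startswith(".") else "." + ext for ext in extensions}
    # Git reports paths with forward slashes, so parse them as POSIX paths.
    return [name for name in files if PurePosixPath(name).suffix in wanted]


untracked = ["notes.txt", "scripts/run.py", "data/sample.csv", "Makefile"]
print(filter_by_extension(untracked, [".py", "csv"]))
```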