Basic demo#
This notebook gives a basic demonstration of how to use pydoit-nb.
Imports#
from collections.abc import Iterable
from pathlib import Path
from attrs import field, frozen
import pydoit_nb
import pydoit_nb.serialization
from pydoit_nb.attrs_helpers import make_attrs_validator_compatible_single_input
from pydoit_nb.config_handling import get_config_for_step_id
from pydoit_nb.config_helpers import assert_step_config_ids_are_unique
from pydoit_nb.notebook import ConfiguredNotebook, UnconfiguredNotebook
from pydoit_nb.notebook_step import UnconfiguredNotebookBasedStep
print(f"You are using pydoit_nb version {pydoit_nb.__version__}")
You are using pydoit_nb version 0.3.5a0
Notebooks#
Unconfigured#
The basic object is an UnconfiguredNotebook. This stores raw information about the notebook.
unconfigured = UnconfiguredNotebook(
    notebook_path=Path("/to") / "somewhere",
    raw_notebook_ext=".py",
    summary="Great notebook that does something",
    doc="More details can go here",
)
unconfigured
UnconfiguredNotebook(notebook_path=PosixPath('/to/somewhere'), raw_notebook_ext='.py', summary='Great notebook that does something', doc='More details can go here')
Configured#
To actually run the notebook, we have to configure it. You can hard-code this configuration, as shown below.
configured = ConfiguredNotebook(
    unconfigured_notebook=unconfigured,
    dependencies=(Path("/to") / "some" / "file.txt",),
    targets=(Path("/to") / "some" / "file" / "the" / "notebook" / "creates.csv",),
    step_config_id="id-that-defines-which-step-in-the-workflow-this-notebook-belongs-to",
    config_file=Path("/to") / "config" / "file" / "to" / "pass" / "into" / "the" / "notebook.yaml",
    # For more on parameterising notebooks,
    # see papermill's documentation: https://papermill.readthedocs.io/en/latest/index.html
)
configured
ConfiguredNotebook(unconfigured_notebook=UnconfiguredNotebook(notebook_path=PosixPath('/to/somewhere'), raw_notebook_ext='.py', summary='Great notebook that does something', doc='More details can go here'), dependencies=(PosixPath('/to/some/file.txt'),), targets=(PosixPath('/to/some/file/the/notebook/creates.csv'),), config_file=PosixPath('/to/config/file/to/pass/into/the/notebook.yaml'), step_config_id='id-that-defines-which-step-in-the-workflow-this-notebook-belongs-to', configuration=None)
However, this configuration of dependencies, targets etc.
is normally done at run time because many details are only known then
(e.g. where we actually want to write the outputs
usually depends on the exact environment we’re running in).
As a result, we tend to use UnconfiguredNotebookBasedStep instead.
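To illustrate why these details are deferred, here is a minimal sketch using only the standard library. It is not part of pydoit-nb: the helper `resolve_run_paths` and the environment variable `DEMO_OUTPUT_ROOT` are hypothetical names chosen for illustration. It shows dependencies and targets being derived from a root directory that is only known once the run starts.

```python
import os
from pathlib import Path


def resolve_run_paths(run_id: str) -> dict[str, tuple[Path, ...]]:
    """Build dependency and target paths from the environment (hypothetical helper)."""
    # DEMO_OUTPUT_ROOT is a made-up variable name; fall back to a local directory
    root = Path(os.environ.get("DEMO_OUTPUT_ROOT", "output")) / run_id
    return {
        "dependencies": (root / "data" / "file.txt",),
        "targets": (root / "results" / "creates.csv",),
    }


paths = resolve_run_paths("notebook_example")
print(paths["targets"])
```

Tuples like these could then be passed as the `dependencies` and `targets` of a ConfiguredNotebook.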
Defining a notebook-based step#
To make the flow from UnconfiguredNotebook to ConfiguredNotebook a bit clearer and more flexible,
we generally use UnconfiguredNotebookBasedStep objects as part of our workflow.
These allow us to define sets of notebooks which make up different steps and to configure them at run time.
The example below shows how you can go from an UnconfiguredNotebook
and a configuration function to a task specification
which can be understood by pydoit.
The first step in this process is defining your configuration.
At its simplest, you need to be able to create a configuration object
and a configuration bundle which looks like the below (for more information,
see pydoit_nb.typing.ConfigBundleLike).
This is then used for configuring notebooks
and keeping the configuration for a run all in one place.
As you can see, it is extremely flexible and project specific,
which is why we don’t provide generalised tooling for it.
@frozen  # using frozen makes the class hashable, which is handy
class PlotConfig:
    """
    Each step will have its own configuration

    For example, this would define the configuration for
    the plotting step.
    """

    step_config_id: str
    """
    An ID which defines this configuration for the step, unique within the workflow
    """

    colourscheme: str
    """
    Colourscheme

    This is just an example of how configuration can be stored
    """


@frozen
class Config:
    """
    Configuration

    This is passed to all the notebooks. It can contain
    whatever you want.
    """

    plot: list[PlotConfig] = field(
        validator=[
            # This validator can help you avoid confusing clashes
            # which are hard to debug later
            make_attrs_validator_compatible_single_input(assert_step_config_ids_are_unique)
        ]
    )
    """Configurations to use with the plotting step"""


@frozen
class ConfigBundle:
    """
    Configuration bundle

    Has all key components in one place
    """

    run_id: str
    """ID for the run"""

    config_hydrated: Config
    """Hydrated config"""

    config_hydrated_path: Path
    """Path in/from which to read/write ``config_hydrated``"""

    root_dir_output_run: Path
    """Root output directory for this run"""
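To make the purpose of the uniqueness validator concrete, here is a minimal sketch of the kind of check it guards against, written with only the standard library. This is not pydoit-nb's implementation of assert_step_config_ids_are_unique: the names `DemoStepConfig` and `check_ids_unique`, and the choice of `ValueError`, are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoStepConfig:
    """Stand-in for a step configuration such as PlotConfig (hypothetical)."""

    step_config_id: str


def check_ids_unique(step_configs: list[DemoStepConfig]) -> None:
    """Raise ValueError if any step_config_id appears more than once."""
    counts = Counter(c.step_config_id for c in step_configs)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError(f"step_config_id values must be unique, duplicated: {duplicates}")


# Two different IDs: the check passes silently
check_ids_unique([DemoStepConfig("blue"), DemoStepConfig("red")])

# The same ID twice: the clash is reported immediately
try:
    check_ids_unique([DemoStepConfig("blue"), DemoStepConfig("blue")])
except ValueError as exc:
    print(exc)
```

Failing when the configuration is created, rather than when a notebook later picks up the wrong entry, is what makes such clashes easier to debug.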
The next step is to define the function that will configure the notebooks. Writing this function yourself gives you full flexibility to configure the notebooks however you need.
def configure_notebooks(
    unconfigured_notebooks: Iterable[UnconfiguredNotebook],
    config_bundle: ConfigBundle,
    step_name: str,
    step_config_id: str,
) -> list[ConfiguredNotebook]:
    """
    Configure the notebooks based on runtime information

    Parameters
    ----------
    unconfigured_notebooks
        Unconfigured notebooks

    config_bundle
        Configuration bundle from which to take configuration values

    step_name
        Name of the step

    step_config_id
        Step config ID to use when configuring the notebook

    Returns
    -------
        Configured notebooks
    """
    uc_nbs_dict = {nb.notebook_path: nb for nb in unconfigured_notebooks}

    config = config_bundle.config_hydrated
    config_step = get_config_for_step_id(config=config, step=step_name, step_config_id=step_config_id)

    configured_notebooks = [
        ConfiguredNotebook(
            unconfigured_notebook=uc_nbs_dict[Path("/to") / "somewhere"],
            # Dependencies and targets can come from config, other functions,
            # whatever.
            dependencies=(),
            targets=(),
            configuration=(config_step.colourscheme,),
            config_file=config_bundle.config_hydrated_path,
            step_config_id=step_config_id,
        ),
    ]

    return configured_notebooks
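The lookup performed by get_config_for_step_id in the function above can be pictured with the following sketch. It is an illustration under assumptions, not the library's code: `find_step_config`, `DemoPlotConfig` and `DemoConfig` are hypothetical names, and the error raised for a missing or ambiguous ID is our own choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoPlotConfig:
    """Stand-in for PlotConfig (hypothetical)."""

    step_config_id: str
    colourscheme: str


@dataclass(frozen=True)
class DemoConfig:
    """Stand-in for Config: one attribute per step, holding a list of configs."""

    plot: list[DemoPlotConfig]


def find_step_config(config: DemoConfig, step: str, step_config_id: str) -> DemoPlotConfig:
    """Return the entry of ``config.<step>`` whose step_config_id matches."""
    candidates = getattr(config, step)
    matches = [c for c in candidates if c.step_config_id == step_config_id]
    if len(matches) != 1:
        raise KeyError(
            f"Expected exactly one {step!r} config with id {step_config_id!r}, "
            f"found {len(matches)}"
        )
    return matches[0]


demo_config = DemoConfig(
    plot=[
        DemoPlotConfig(step_config_id="blue", colourscheme="tab:blue"),
        DemoPlotConfig(step_config_id="orange", colourscheme="tab:orange"),
    ]
)
selected = find_step_config(demo_config, step="plot", step_config_id="blue")
print(selected.colourscheme)
```

This is also why unique step config IDs matter: with duplicates, a lookup like this has no single correct answer.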
The last step is to put it all together in an UnconfiguredNotebookBasedStep.
unconfigured_step = UnconfiguredNotebookBasedStep(
    step_name="plot",
    unconfigured_notebooks=[unconfigured],
    configure_notebooks=configure_notebooks,
)
This unconfigured step is now ready to be configured at run time.
An example of this is below.
config_hydrated = Config(plot=[PlotConfig(step_config_id="blue", colourscheme="tab:blue")])
# This is normally hydrated based on run-time variables.
# Providing a complete working project is in our to-do list.
# In the meantime, see `tests/test-data/example-project`.

config_bundle = ConfigBundle(
    config_hydrated=config_hydrated,
    run_id="notebook_example",
    root_dir_output_run=Path("/to") / "output" / "directory",
    config_hydrated_path=Path("/to")
    / "output"
    / "directory"
    / "location-in-which-to-write-config-for-the-run.yaml",
)

notebook_tasks_generator = unconfigured_step.gen_notebook_tasks(
    config_bundle=config_bundle,
    root_dir_raw_notebooks=Path("notebooks"),
    converter=pydoit_nb.serialization.converter_yaml,
    # a pre-configured option, you can obviously make your own too
)
notebook_tasks_generator
<generator object UnconfiguredNotebookBasedStep.gen_notebook_tasks at 0x7f28b2b30040>
The result is a generator, so we have to empty it to see what it actually does.
for task in notebook_tasks_generator:
    print(f"Task base name: {task['basename']}")
    print(f"Task name: {task['name']}")
    print(task)
    print()
Task base name: (/to/somewhere) Great notebook that does something
Task name: None
{'basename': '(/to/somewhere) Great notebook that does something', 'name': None, 'doc': 'More details can go here'}
Task base name: (/to/somewhere) Great notebook that does something
Task name: blue
{'basename': '(/to/somewhere) Great notebook that does something', 'name': 'blue', 'doc': "More details can go here. step_config_id='blue'", 'actions': [(<function run_notebook at 0x7f28b2e3c860>, [], {'raw_notebook': PosixPath('/to/somewhere.py'), 'unexecuted_notebook': PosixPath('/to/output/directory/notebooks-executed/plot/blue/somewhere_unexecuted.ipynb'), 'executed_notebook': PosixPath('/to/output/directory/notebooks-executed/plot/blue/somewhere.ipynb'), 'notebook_parameters': {'config_file': '/to/output/directory/location-in-which-to-write-config-for-the-run.yaml', 'step_config_id': 'blue'}})], 'targets': (), 'file_dep': [PosixPath('/to/somewhere.py')], 'clean': True, 'uptodate': (<doit.tools.config_changed object at 0x7f28dc44ab50>,)}
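One thing to keep in mind is that a Python generator can only be iterated once, so after the loop above the generator is exhausted. The sketch below demonstrates this with a stand-in generator; `gen_demo_tasks` and its dictionaries are hypothetical and not real pydoit-nb output.

```python
from collections.abc import Iterator


def gen_demo_tasks() -> Iterator[dict]:
    """Yield two stand-in task dictionaries (hypothetical)."""
    yield {"basename": "demo", "name": None}
    yield {"basename": "demo", "name": "blue"}


tasks = gen_demo_tasks()
first_pass = list(tasks)  # consumes the generator
second_pass = list(tasks)  # nothing is left to yield

print(len(first_pass), len(second_pass))
```

If you want to inspect the tasks and still use them afterwards, storing them in a list first (or calling the generating function again) avoids the problem.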
Each task yielded by the generator is a dictionary which can be passed straight to pydoit.
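As a rough sketch of what passing tasks to pydoit can look like, the example below follows doit's convention that functions named `task_*` in a `dodo.py` file are task creators and may yield task dictionaries. The stand-in generator `gen_plot_tasks` and its contents are hypothetical; in a real project it would be replaced by a call such as `unconfigured_step.gen_notebook_tasks(...)`.

```python
from collections.abc import Iterator


def gen_plot_tasks() -> Iterator[dict]:
    """Stand-in for a notebook task generator (hypothetical)."""
    # An entry with name=None carries only the documentation for the task group
    yield {"basename": "plot notebook", "name": None, "doc": "More details can go here"}
    # One runnable sub-task per step_config_id
    yield {
        "basename": "plot notebook",
        "name": "blue",
        "actions": [(print, ["running the blue notebook"])],
        "targets": [],
        "file_dep": [],
    }


def task_plot() -> Iterator[dict]:
    """Task creator: doit collects functions whose names start with ``task_``."""
    yield from gen_plot_tasks()


for task in task_plot():
    print(task["basename"], task["name"])
```

The entry whose name is None mirrors the first task printed above: it has no actions and only documents the group of sub-tasks that share the same basename.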