photon_mosaic.dataset_discovery

Dataset discovery module.

This module provides functions to discover datasets using regex patterns. Datasets are selected and excluded by regex matching, and dataset names are transformed through regex substitutions.

Functions

discover_datasets(base_path[, pattern, ...])

Discover datasets and their TIFF files in a directory using regex patterns.

photon_mosaic.dataset_discovery.discover_datasets(base_path, pattern='.*', exclude_patterns=None, substitutions=None, tiff_patterns=['*.tif'])

Discover datasets and their TIFF files in a directory using regex patterns.

Parameters:
  • base_path (str or Path) – Base path to search for datasets.

  • pattern (str, optional) – Regex pattern to match dataset names, defaults to ".*" (all directories).

  • exclude_patterns (List[str], optional) – List of regex patterns for datasets to exclude.

  • substitutions (List[Dict[str, str]], optional) – List of regex substitution pairs to transform dataset names. Each dict should have "pattern" and "repl" keys for re.sub().

  • tiff_patterns (list, optional) – List of glob patterns for TIFF files. Each pattern corresponds to a session. Defaults to ["*.tif"] for a single session.
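The name-handling parameters above can be illustrated with a minimal sketch. It assumes that `pattern` is applied with `re.match`, that a name is dropped when any entry of `exclude_patterns` matches it via `re.search`, and that `substitutions` are applied in list order after filtering. The helper `select_and_rename` is hypothetical and is not part of photon_mosaic; the library's exact matching semantics may differ.

```python
import re
from typing import Dict, List, Optional, Tuple


def select_and_rename(
    names: List[str],
    pattern: str = ".*",
    exclude_patterns: Optional[List[str]] = None,
    substitutions: Optional[List[Dict[str, str]]] = None,
) -> List[Tuple[str, str]]:
    """Hypothetical helper: filter dataset names and apply regex substitutions.

    Returns (original, transformed) pairs. The matching rules below are
    assumptions, not photon_mosaic's documented behavior.
    """
    exclude_patterns = exclude_patterns or []
    substitutions = substitutions or []
    pairs = []
    for name in names:
        # Assumption: the inclusion pattern must match at the start of the name.
        if not re.match(pattern, name):
            continue
        # Assumption: a name is excluded if any exclusion regex matches anywhere in it.
        if any(re.search(ex, name) for ex in exclude_patterns):
            continue
        new_name = name
        # Each dict supplies the "pattern" and "repl" arguments of re.sub().
        for sub in substitutions:
            new_name = re.sub(sub["pattern"], sub["repl"], new_name)
        pairs.append((name, new_name))
    return pairs


pairs = select_and_rename(
    ["mouse01_day1", "mouse02_day1", "test_run"],
    pattern=r"mouse",
    exclude_patterns=[r"02"],
    substitutions=[{"pattern": r"_day(\d+)", "repl": r"-d\1"}],
)
print(pairs)  # [('mouse01_day1', 'mouse01-d1')]
```

Here "test_run" fails the inclusion pattern, "mouse02_day1" is removed by the exclusion pattern, and the remaining name is rewritten by the substitution.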

Returns:

  • List of original dataset names (sorted)

  • List of transformed dataset names (sorted)

  • Dictionary mapping original dataset names to their TIFF files by session (session index as key)

  • List of all TIFF files found across all datasets

Return type:

Tuple[List[str], List[str], Dict[str, Dict[int, List[str]]], List[str]]
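To make the nested return type concrete, the sketch below builds a value of that shape by hand and walks it. The dataset names and file paths are invented for illustration, and `describe_result` is a hypothetical helper; in real use the tuple would come from `discover_datasets`.

```python
from typing import Dict, List, Tuple

# Alias mirroring the documented return type of discover_datasets.
DiscoveryResult = Tuple[List[str], List[str], Dict[str, Dict[int, List[str]]], List[str]]


def describe_result(result: DiscoveryResult) -> List[str]:
    """Hypothetical helper: one summary line per original dataset and session."""
    originals, _transformed, tiff_files, _all_tiffs = result
    lines = []
    for name in originals:
        # Session indices are the keys of the inner dict (0-based, in tiff_patterns order).
        for session_idx, files in sorted(tiff_files[name].items()):
            lines.append(f"{name} session {session_idx}: {len(files)} file(s)")
    return lines


# Illustrative value only; not real output of discover_datasets.
example: DiscoveryResult = (
    ["mouse01_day1"],
    ["mouse01-d1"],
    {"mouse01_day1": {0: ["mouse01_day1/a.tif", "mouse01_day1/b.tif"], 1: []}},
    ["mouse01_day1/a.tif", "mouse01_day1/b.tif"],
)

for line in describe_result(example):
    print(line)
# mouse01_day1 session 0: 2 file(s)
# mouse01_day1 session 1: 0 file(s)
```

Note that the dictionary is keyed by the original dataset names, not the transformed ones.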

Notes

  • Datasets without any TIFF files are automatically excluded from the results

  • Both original and transformed dataset lists are sorted alphabetically

  • Sessions are numbered starting from 0 based on the order in tiff_patterns

  • Within a returned dataset, sessions whose pattern matches no files are still included, mapped to an empty list
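The session rules in these notes can be sketched with `pathlib.Path.glob`. This is a minimal sketch, assuming that dataset directories are the immediate subdirectories of `base_path` and that each glob pattern is applied inside a dataset directory; `collect_sessions` is hypothetical and may differ from photon_mosaic's implementation.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def collect_sessions(base_path: Path, tiff_patterns: List[str]) -> Dict[str, Dict[int, List[str]]]:
    """Hypothetical helper: map dataset name -> {session index: sorted TIFF paths}."""
    result: Dict[str, Dict[int, List[str]]] = {}
    # Assumption: every immediate subdirectory of base_path is a candidate dataset.
    for dataset_dir in sorted(p for p in Path(base_path).iterdir() if p.is_dir()):
        sessions: Dict[int, List[str]] = {}
        # Session i is defined by tiff_patterns[i]; numbering starts at 0.
        for idx, glob_pattern in enumerate(tiff_patterns):
            # Sessions with no matching files are kept as empty lists.
            sessions[idx] = sorted(str(f) for f in dataset_dir.glob(glob_pattern))
        # Datasets with no TIFF files in any session are dropped.
        if any(sessions.values()):
            result[dataset_dir.name] = sessions
    return result


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "ds_a").mkdir()
    (base / "ds_a" / "run1.tif").touch()
    (base / "ds_b").mkdir()  # no TIFF files, so this dataset is excluded
    found = collect_sessions(base, ["run1*.tif", "run2*.tif"])

counts = {name: {i: len(files) for i, files in s.items()} for name, s in found.items()}
print(counts)  # {'ds_a': {0: 1, 1: 0}}
```

Session 1 of "ds_a" appears with an empty list, while "ds_b" is absent because none of its sessions matched any file.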