create_dataset_splits

These scripts are not super tidy, but they only need to be reproducible.

_create_train_val_test_split_dict(paths: list[Path], group_ids: ndarray[int] | list[int] | None = None, split_percentages: tuple[float, float, float] = (0.8, 0.1, 0.1), processes: int = 1, all_test: bool = False)

Create a train, val, test split from the dataset.

Parameters:
  • paths – List of paths to the files containing .zarr labels.

  • group_ids – List of group ids to use for the split. If None, the ids are generated from the paths.

  • split_percentages – Percentages of the dataset to use for train, val and test.

  • processes – Number of processes to use for counting scf iterations.

  • all_test – If True, the entire dataset is placed in the test split.

Returns:

train, val, test split.
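
For orientation, a hedged usage sketch: the labels directory is a placeholder, and the return value is assumed to be a mapping with train/val/test entries.

    from pathlib import Path

    # Hedged sketch: collect the .zarr label files and request an 80/10/10
    # split; "dataset/labels" is a placeholder path.
    paths = sorted(Path("dataset/labels").glob("*.zarr"))
    split_dict = _create_train_val_test_split_dict(
        paths,
        group_ids=None,                     # derive ids from the paths
        split_percentages=(0.8, 0.1, 0.1),
        processes=4,                        # workers for counting scf iterations
    )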

_iteration_count(path: Path) → int

Opens a zarr file and returns the number of scf iterations saved in it.

Parameters:

path – The path to the zarr file.

Returns:

The number of scf iterations.
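
A minimal sketch of such a count, assuming the iterations lie along the first axis of an array named "scf"; the real store layout may differ.

    import zarr
    from pathlib import Path

    def iteration_count_sketch(path: Path) -> int:
        # Open the store read-only; the key "scf" and the axis convention
        # are assumptions, not the module's actual schema.
        store = zarr.open(str(path), mode="r")
        return store["scf"].shape[0]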

_train_val_test_split(ids: ndarray, split_percentages: tuple[float, float, float]) → tuple[list, list, list]

Split the ids into train, val and test sets.

check_paths_and_save(yaml_dict: dict, yaml_path: Path, pickle_path: Path, override: bool = False)

Check whether a split file already exists; if so, check whether it is the same.

If it is the same, or if override is False, return; otherwise, save the new split file to the given paths.
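
A hedged sketch of this check-then-save logic, assuming the split is persisted both as YAML and as a pickle:

    import pickle
    from pathlib import Path

    import yaml

    def check_paths_and_save_sketch(yaml_dict: dict, yaml_path: Path,
                                    pickle_path: Path, override: bool = False):
        # If a split file already exists, compare it with the new split;
        # keep it when it is identical or when overriding is disallowed.
        if yaml_path.exists():
            existing = yaml.safe_load(yaml_path.read_text())
            if existing == yaml_dict or not override:
                return
        # Otherwise write the new split to both output paths.
        yaml_path.write_text(yaml.safe_dump(yaml_dict))
        with open(pickle_path, "wb") as f:
            pickle.dump(yaml_dict, f)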

create_split_file(dataset: str, yaml_path: str | Path | None, pickle_path: str | Path | None, override: bool, split_percentages: tuple[float, float, float], processes: int, all_test: bool = False)

Create a split file for any dataset by reading the zarr files inside the labels directory.
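
An illustrative invocation with placeholder dataset name and output paths:

    # Hypothetical call: build and persist an 80/10/10 split, writing both a
    # YAML and a pickle version of the split file.
    create_split_file(
        dataset="my_dataset",
        yaml_path="splits/my_dataset.yaml",
        pickle_path="splits/my_dataset.pkl",
        override=False,
        split_percentages=(0.8, 0.1, 0.1),
        processes=4,
    )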

get_path_list_for_split_file(paths: list[Path], iterations: ndarray) → list[list]

Turn the lists of paths and iterations into the required (dataset, name, iterations) format.
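
A sketch of the assumed format, deriving the dataset from the parent directory and the name from the file stem (both derivations are assumptions):

    from pathlib import Path

    import numpy as np

    def get_path_list_sketch(paths: list[Path], iterations: np.ndarray) -> list[list]:
        # Pair each file's assumed dataset/name with its iteration count.
        return [[p.parent.name, p.stem, int(n)] for p, n in zip(paths, iterations)]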

split_non_grouped(ids: ndarray, split_percentages: tuple[float, float, float]) → tuple

Split ids into train, val and test sets based on the split percentages.
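
A minimal sketch of such a percentage-based split, assuming a seeded shuffle followed by cuts at the cumulative boundaries; the module's actual shuffling and rounding may differ.

    import numpy as np

    def split_non_grouped_sketch(ids: np.ndarray,
                                 split_percentages: tuple[float, float, float]) -> tuple:
        # Shuffle, then cut at the cumulative train/val boundaries.
        rng = np.random.default_rng(0)  # fixed seed keeps the split reproducible
        shuffled = rng.permutation(ids)
        n_train = int(split_percentages[0] * len(ids))
        n_val = int(split_percentages[1] * len(ids))
        return (shuffled[:n_train],
                shuffled[n_train:n_train + n_val],
                shuffled[n_train + n_val:])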