API Reference

This section provides an overview of the Simod API.

Usage

To use Simod in your Python code, import the main components:

from pathlib import Path

from simod.event_log.event_log import EventLog
from simod.settings.simod_settings import SimodSettings
from simod.simod import Simod

# Initialize 'output' folder and read configuration file
output = Path("<path>/<to>/<outputs>/<folder>")
configuration_path = Path("<path>/<to>/<configuration>.yml")
settings = SimodSettings.from_path(configuration_path)

# Read and preprocess event log
event_log = EventLog.from_path(
    log_ids=settings.common.log_ids,
    train_log_path=settings.common.train_log_path,
    test_log_path=settings.common.test_log_path,
    preprocessing_settings=settings.preprocessing,
    need_test_partition=settings.common.perform_final_evaluation,
)

# Instantiate and run SIMOD
simod = Simod(settings=settings, event_log=event_log, output_dir=output)
simod.run()

Modules Overview

Simod’s codebase is organized into several key modules:

  • simod: The main class that orchestrates the overall functionality.

  • settings: Handles the parsing and validation of configuration files.

  • event_log: Manages the I/O operations of an event log as well as its preprocessing.

  • control_flow: Utilities to discover and manage the control-flow model of a BPS model.

  • resource_model: Utilities to discover and manage the resource model of a BPS model.

  • extraneous_delays: Utilities to discover and manage the extraneous delays model of a BPS model.

  • simulation: Manages the data model of a BPS model and its simulation and quality assessment.

Detailed Module Documentation

Below is the detailed documentation for each module:

SIMOD class

class simod.simod.Simod(settings: SimodSettings, event_log: EventLog, output_dir: Path | None = None)

Class to run the full pipeline of SIMOD in order to discover a BPS model from an event log.

settings

Configuration to run SIMOD and all its stages.

Type:

SimodSettings

event_log

EventLog class storing the preprocessed training, validation, and (optionally) test partitions.

Type:

EventLog

output_dir

Path to the folder where all SIMOD outputs are written.

Type:

Path

final_bps_model

Instance of the best BPS model discovered by SIMOD.

Type:

BPSModel

run(runtimes: RuntimeMeter | None = None)

Executes the SIMOD pipeline to discover the BPS model that best reflects the behavior recorded in the input event log, based on the specified configuration.

Parameters:

runtimes (RuntimeMeter, optional) – Instance for tracking the runtime of the different stages in the SIMOD pipeline. When provided, SIMOD pipeline stages will be tracked and reported along with stages previously tracked in the instance (e.g., preprocessing). If not provided, the runtime tracking reported will only contain SIMOD stages.

Returns:

The method performs in-place execution of the pipeline and does not return a value.

Return type:

None

Notes

  • This method generates all output files under the folder [output_dir]/<latest_run>/best_result/.

  • This method updates internal attributes of the class, such as final_bps_model, with the best BPS model found during the pipeline execution.
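
The output layout described in the first note can be navigated with a small helper. This is a minimal sketch and not part of the Simod API: find_best_result_dir is a hypothetical name, and it assumes that each run creates its own subfolder directly under the output directory and that the latest run is the most recently modified one.

```python
from pathlib import Path
from typing import Optional


def find_best_result_dir(output_dir: Path) -> Optional[Path]:
    """Return <output_dir>/<latest_run>/best_result, or None if it is missing.

    Assumption (not confirmed by the Simod docs): the latest run is the
    most recently modified subfolder of output_dir.
    """
    if not output_dir.is_dir():
        return None
    run_dirs = [p for p in output_dir.iterdir() if p.is_dir()]
    if not run_dirs:
        return None
    # Pick the run folder with the newest modification time
    latest_run = max(run_dirs, key=lambda p: p.stat().st_mtime)
    best_result = latest_run / "best_result"
    return best_result if best_result.is_dir() else None
```

After simod.run() finishes, calling find_best_result_dir(output) with the same folder passed as output_dir would return the folder holding the final outputs, provided the assumptions above hold.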

Settings Module

SIMOD settings

class simod.settings.simod_settings.SimodSettings(*, common: simod.settings.common_settings.CommonSettings = CommonSettings(train_log_path=PosixPath('default_path.csv'), log_ids=EventLogIDs(case='case_id', activity='activity', resource='resource', start_time='start_time', end_time='end_time', enabled_time='enabled_time', enabling_activity='enabling_activity', available_time='available_time', estimated_start_time='estimated_start_time', batch_id='batch_instance_id', batch_type='batch_instance_type'), test_log_path=None, process_model_path=None, perform_final_evaluation=False, num_final_evaluations=10, evaluation_metrics=[], use_observed_arrival_distribution=False, clean_intermediate_files=True, discover_data_attributes=False), preprocessing: simod.settings.preprocessing_settings.PreprocessingSettings = PreprocessingSettings(multitasking=False, enable_time_concurrency_threshold=0.5, concurrency_thresholds=ConcurrencyThresholds(df=0.75, l2l=0.9, l1l=0.9)), control_flow: simod.settings.control_flow_settings.ControlFlowSettings = ControlFlowSettings(optimization_metric=<Metric.THREE_GRAM_DISTANCE: 'three_gram_distance'>, num_iterations=10, num_evaluations_per_iteration=3, gateway_probabilities=<GatewayProbabilitiesDiscoveryMethod.DISCOVERY: 'discovery'>, mining_algorithm=<ProcessModelDiscoveryAlgorithm.SPLIT_MINER_V1: 'sm1'>, epsilon=(0.0, 1.0), eta=(0.0, 1.0), discover_branch_rules=False, f_score=0.7, replace_or_joins=False, prioritize_parallelism=False), resource_model: simod.settings.resource_model_settings.ResourceModelSettings = ResourceModelSettings(optimization_metric=<Metric.CIRCADIAN_EMD: 'circadian_event_distribution'>, num_iterations=10, num_evaluations_per_iteration=3, discovery_type=<CalendarType.UNDIFFERENTIATED: 'undifferentiated'>, granularity=(15, 60), confidence=(0.5, 0.85), support=(0.01, 0.3), participation=0.4, discover_prioritization_rules=False, discover_batching_rules=False, fuzzy_angle=(0.1, 0.9)), extraneous_activity_delays: simod.settings.extraneous_delays_settings.ExtraneousDelaysSettings | None = None, version: int = 5)

SIMOD configuration v5 with the settings for all the stages and optimizations. If configuration is provided in v2 or v4, it is automatically translated to v5.

common

General configuration parameters of SIMOD and parameters common to all pipeline stages.

Type:

CommonSettings

preprocessing

Configuration parameters for the preprocessing stage of SIMOD.

Type:

PreprocessingSettings

control_flow

Configuration parameters for the control-flow model discovery stage.

Type:

ControlFlowSettings

resource_model

Configuration parameters for the resource model discovery stage.

Type:

ResourceModelSettings

extraneous_activity_delays

Configuration parameters for the extraneous delays model discovery stage. If not provided, the extraneous delays are not discovered.

Type:

ExtraneousDelaysSettings

version

Version of the SIMOD configuration format (5 by default).

Type:

int

static default() -> SimodSettings

Default configuration for SIMOD.

Returns:

Instance of the SIMOD configuration with the default values.

Return type:

SimodSettings

static from_path(file_path: Path) -> SimodSettings

Instantiates the SIMOD configuration from a YAML file.

Parameters:

file_path (Path) – Path to the YAML file storing the configuration.

Returns:

Instance of the SIMOD configuration for the specified YAML file.

Return type:

SimodSettings

static from_yaml(config: dict, config_dir: Path | None = None) -> SimodSettings

Instantiates the SIMOD configuration from a dictionary following the expected YAML structure.

Parameters:
  • config (dict) – Dictionary with the configuration values for each of the SIMOD elements.

  • config_dir (Path, optional) – If the path to the event log(s) is relative, [config_dir] is used as the base directory to resolve it. If None, relative paths are resolved against the current directory.

Returns:

Instance of the SIMOD configuration for the specified dictionary values.

Return type:

SimodSettings
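
The path handling described for config_dir can be sketched as follows. This is an illustration under stated assumptions, not Simod's actual code: resolve_log_path is a hypothetical helper, and it assumes absolute paths are left untouched.

```python
from pathlib import Path
from typing import Optional


def resolve_log_path(log_path: str, config_dir: Optional[Path] = None) -> Path:
    """Complete a relative log path with config_dir, or the current directory.

    Hypothetical helper: absolute paths are returned unchanged (assumption).
    """
    path = Path(log_path)
    if path.is_absolute():
        return path
    base = config_dir if config_dir is not None else Path.cwd()
    return base / path
```

Passing the folder that contains the YAML file as config_dir keeps a configuration portable, since the log paths inside it can then be written relative to the file itself.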

static one_shot() -> SimodSettings

Configuration for SIMOD one-shot mode. This mode runs SIMOD without optimizing each BPS model component (i.e., it directly discovers each BPS model component with default parameters).

Returns:

Instance of the SIMOD configuration for one-shot mode.

Return type:

SimodSettings

to_dict() -> dict

Translate the SIMOD configuration stored in this instance into a dictionary.

Returns:

Python dictionary storing this configuration.

Return type:

dict

to_yaml(output_dir: Path) -> Path

Saves the configuration to a YAML file in the provided output directory.

Parameters:

output_dir (Path) – Path to the output directory in which to store the YAML file with the configuration.

Returns:

Path to the YAML file with the configuration.

Return type:

Path

Common settings

class simod.settings.common_settings.CommonSettings(*, train_log_path: pathlib.Path = PosixPath('default_path.csv'), log_ids: pix_framework.io.event_log.EventLogIDs = EventLogIDs(case='case_id', activity='activity', resource='resource', start_time='start_time', end_time='end_time', enabled_time='enabled_time', enabling_activity='enabling_activity', available_time='available_time', estimated_start_time='estimated_start_time', batch_id='batch_instance_id', batch_type='batch_instance_type'), test_log_path: pathlib.Path | None = None, process_model_path: pathlib.Path | None = None, perform_final_evaluation: bool = False, num_final_evaluations: int = 10, evaluation_metrics: typing.List[simod.settings.common_settings.Metric] = <factory>, use_observed_arrival_distribution: bool = False, clean_intermediate_files: bool = True, discover_data_attributes: bool = False)

General configuration parameters of SIMOD and parameters common to all pipeline stages.

train_log_path

Path to the training log (the one used to discover the BPS model).

Type:

Path

log_ids

Dataclass storing the mapping between the column names in the CSV and their role (case_id, activity, etc.).

Type:

EventLogIDs

test_log_path

Path to the event log to perform the final evaluation of the discovered BPS model (if desired).

Type:

Path, optional

process_model_path

Path to the BPMN model for the control-flow (skip its discovery and use this one).

Type:

Path, optional

perform_final_evaluation

Boolean indicating whether to perform the final evaluation of the discovered BPS model. If true, either use the event log in [test_log_path] if specified, or split the training log to obtain a testing set.

Type:

bool

num_final_evaluations

Number of replications of the final evaluation to perform.

Type:

int

evaluation_metrics

List of metrics (Metric instances) to use in the final evaluation.

Type:

list

use_observed_arrival_distribution

Boolean indicating whether to use the distribution of observed case arrival times (true), or to discover a probability distribution function to model them (false).

Type:

bool

clean_intermediate_files

Boolean indicating whether to delete all intermediate files created during the run.

Type:

bool

discover_data_attributes

Boolean indicating whether to discover data attributes and their creation/update rules.

Type:

bool

static from_dict(config: dict, config_dir: Path | None = None) -> CommonSettings

Instantiates the SIMOD common configuration from a dictionary.

Parameters:
  • config (dict) – Dictionary with the configuration values for the SIMOD common parameters.

  • config_dir (Path, optional) – If the path to the event log(s) is relative, [config_dir] is used as the base directory to resolve it. If None, relative paths are resolved against the current directory.

Returns:

Instance of the SIMOD common configuration for the specified dictionary values.

Return type:

CommonSettings

to_dict() -> dict

Translate the common configuration stored in this instance into a dictionary.

Returns:

Python dictionary storing this configuration.

Return type:

dict

class simod.settings.common_settings.Metric(value)

Enum class storing the metrics used to evaluate the quality of a BPS model.

DL

Control-flow Log Distance metric based on the Damerau-Levenshtein distance.

Type:

str
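
For reference, the Damerau-Levenshtein distance between two traces can be computed as sketched below. This shows only the restricted (optimal string alignment) variant applied to a single pair of sequences; how Simod pairs traces and aggregates the result into a log-level metric is not covered here, and the function name is illustrative.

```python
from typing import Sequence


def damerau_levenshtein(a: Sequence, b: Sequence) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions, and transpositions of
    adjacent elements needed to turn sequence a into sequence b.
    """
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i leading elements of a
    for j in range(cols):
        dist[0][j] = j  # insert all j leading elements of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                # transposition of two adjacent elements
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]
```

Because the function works on any sequence, a trace can be passed as a list of activity labels: swapping two adjacent activities, as in ["A", "B", "C", "D"] versus ["A", "C", "B", "D"], costs a single edit.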

TWO_GRAM_DISTANCE

Two-gram distance metric.

Type:

str

THREE_GRAM_DISTANCE

Three-gram distance metric.

Type:

str

CIRCADIAN_EMD

Earth Mover’s Distance (EMD) for circadian event distribution.

Type:

str

CIRCADIAN_WORKFORCE_EMD

EMD for circadian workforce distribution.

Type:

str

ARRIVAL_EMD

EMD for arrival event distribution.

Type:

str

RELATIVE_EMD

EMD for relative event distribution.

Type:

str

ABSOLUTE_EMD

EMD for absolute event distribution.

Type:

str

CYCLE_TIME_EMD

EMD for cycle time distribution.

Type:

str

classmethod from_str(value: str | List[str]) -> Metric | List[Metric]

Converts a string (or list of strings) representing metric names into an instance (or list of instances) of the Metric enum.

Parameters:

value (Union[str, List[str]]) – A string representing a metric name or a list of metric names.

Returns:

An instance of Metric if a single string is provided, or a list of Metric instances if a list of strings is provided.

Return type:

Union[Metric, List[Metric]]

Raises:

ValueError – If the provided string does not match any metric name.
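
The single-value-or-list behaviour of from_str can be illustrated with a stand-in enum. DemoMetric below is hypothetical: it only reuses the two string values that appear in the default settings shown in this reference, and accepting member names as well as values is an assumption rather than documented Simod behaviour.

```python
from enum import Enum
from typing import List, Union


class DemoMetric(str, Enum):
    """Hypothetical stand-in for Metric; only two members are shown."""

    THREE_GRAM_DISTANCE = "three_gram_distance"
    CIRCADIAN_EMD = "circadian_event_distribution"

    @classmethod
    def from_str(
        cls, value: Union[str, List[str]]
    ) -> Union["DemoMetric", List["DemoMetric"]]:
        # A single string yields one member, a list yields a list of members
        if isinstance(value, str):
            return cls._from_single(value)
        return [cls._from_single(item) for item in value]

    @classmethod
    def _from_single(cls, value: str) -> "DemoMetric":
        normalized = value.strip().lower()
        for member in cls:
            # Assumption: both the value and the member name are accepted
            if normalized in (member.value, member.name.lower()):
                return member
        raise ValueError(f"Unknown metric: {value}")
```

Returning the same shape as the input (one member for a string, a list for a list) lets a configuration parser handle optimization_metric and evaluation_metrics with a single call.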

Preprocessing settings

class simod.settings.preprocessing_settings.PreprocessingSettings(*, multitasking: bool = False, enable_time_concurrency_threshold: float = 0.5, concurrency_thresholds: ConcurrencyThresholds = ConcurrencyThresholds(df=0.75, l2l=0.9, l1l=0.9))

Configuration for event log preprocessing.

This class defines the parameters used to preprocess event logs before the main SIMOD pipeline, including concurrency threshold settings and multitasking options.

multitasking

Whether to preprocess the event log to handle resources working on more than one activity at a time.

Type:

bool

enable_time_concurrency_threshold

Threshold for determining concurrent events (used to compute enabled times), based on the ratio of overlapping executions with respect to their occurrences. Ranges from 0 to 1 (e.g., 0.3 means that two activities are considered concurrent when their executions overlap in 30% or more of the cases).

Type:

float
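
The threshold check described above can be read as a simple ratio test. The sketch below is illustrative only: are_concurrent and its arguments are hypothetical names, and the choice of denominator (cases in which both activities occur) is an assumption about how the ratio is computed.

```python
def are_concurrent(
    overlapping_cases: int, co_occurring_cases: int, threshold: float = 0.5
) -> bool:
    """Decide whether two activities are treated as concurrent.

    overlapping_cases: cases where the two executions overlap in time.
    co_occurring_cases: cases where both activities occur (assumed denominator).
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if co_occurring_cases == 0:
        return False  # no evidence to consider the activities concurrent
    return overlapping_cases / co_occurring_cases >= threshold
```

With a threshold of 0.3, two activities overlapping in 3 out of 10 cases would be considered concurrent, while 2 out of 10 would not.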

concurrency_thresholds

Thresholds for the computation of the start times (if missing) based on the Heuristics miner algorithm, including direct-follows (df), length-2-loops (l2l), and length-1-loops (l1l).

Type:

ConcurrencyThresholds

static from_dict(config: dict) -> PreprocessingSettings

Instantiates SIMOD preprocessing configuration from a dictionary.

Parameters:

config (dict) – Dictionary with the configuration values for the preprocessing parameters.

Returns:

Instance of SIMOD preprocessing configuration for the specified dictionary values.

Return type:

PreprocessingSettings

to_dict() -> dict

Translate the preprocessing configuration stored in this instance into a dictionary.

Returns:

Python dictionary storing this configuration.

Return type:

dict

Control-flow model settings

class simod.settings.control_flow_settings.ControlFlowSettings(*, optimization_metric: Metric = Metric.THREE_GRAM_DISTANCE, num_iterations: int = 10, num_evaluations_per_iteration: int = 3, gateway_probabilities: GatewayProbabilitiesDiscoveryMethod | List[GatewayProbabilitiesDiscoveryMethod] = GatewayProbabilitiesDiscoveryMethod.DISCOVERY, mining_algorithm: ProcessModelDiscoveryAlgorithm | None = ProcessModelDiscoveryAlgorithm.SPLIT_MINER_V1, epsilon: float | Tuple[float, float] | None = (0.0, 1.0), eta: float | Tuple[float, float] | None = (0.0, 1.0), discover_branch_rules: bool | None = False, f_score: float | Tuple[float, float] | None = 0.7, replace_or_joins: bool | List[bool] | None = False, prioritize_parallelism: bool | List[bool] | None = False)

Control-flow model configuration parameters.

This class defines the ranges of the configurable parameters for optimizing the control-flow structure of a discovered process model, including metric selection, iteration settings, and various discovery algorithm parameters. In each iteration of the optimization process, the parameters are sampled from these values or ranges.
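
The distinction between fixed values and ranges can be sketched as below. This is a simplification under stated assumptions: sample_param is a hypothetical helper that draws uniformly at random, whereas the optimizer documented later in this reference relies on TPE-based hyperparameter optimization to choose values.

```python
import random
from typing import Any


def sample_param(spec: Any, rng: random.Random) -> Any:
    """Pick the value of one parameter for a single iteration.

    - tuple (low, high): treated as a range, sampled uniformly (assumption)
    - list: treated as a set of candidate values, one is chosen
    - anything else: treated as a fixed value and returned unchanged
    """
    if isinstance(spec, tuple):
        low, high = spec
        return rng.uniform(low, high)
    if isinstance(spec, list):
        return rng.choice(spec)
    return spec
```

Under this reading, epsilon=(0.0, 1.0) would produce a different value in each iteration, f_score=0.7 would stay constant, and replace_or_joins=[True, False] would alternate between the listed options.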

optimization_metric

The metric used to evaluate process model quality at each iteration of the optimization process (i.e., loss function).

Type:

Metric

num_iterations

The number of optimization iterations to perform.

Type:

int

num_evaluations_per_iteration

The number of replications for the evaluations of each iteration.

Type:

int

gateway_probabilities

Fixed method or list of methods to use in each iteration to discover gateway probabilities.

Type:

Union[GatewayProbabilitiesDiscoveryMethod, List[GatewayProbabilitiesDiscoveryMethod]]

mining_algorithm

The process model discovery algorithm to use.

Type:

ProcessModelDiscoveryAlgorithm, optional

epsilon

Fixed number or range for the number of concurrent relations between events to be captured in the discovery algorithm (between 0.0 and 1.0).

Type:

Union[float, Tuple[float, float]], optional

eta

Fixed number or range for the threshold for filtering the incoming and outgoing edges in the discovery algorithm (between 0.0 and 1.0).

Type:

Union[float, Tuple[float, float]], optional

replace_or_joins

Fixed value or list for whether to replace non-trivial OR joins.

Type:

Union[bool, List[bool]], optional

prioritize_parallelism

Fixed value or list for whether to prioritize parallelism over loops.

Type:

Union[bool, List[bool]], optional

discover_branch_rules

Whether to discover branch rules for gateways.

Type:

bool, optional

f_score

Fixed value or range for the minimum f-score value to consider the discovered data-aware branching rules.

Type:

Union[float, Tuple[float, float]], optional

static from_dict(config: dict) -> ControlFlowSettings

Instantiates the control-flow model configuration from a dictionary.

Parameters:

config (dict) – Dictionary with the configuration values for the control-flow model parameters.

Returns:

Instance of the control-flow model configuration for the specified dictionary values.

Return type:

ControlFlowSettings

static one_shot() -> ControlFlowSettings

Instantiates the control-flow model configuration for the one-shot mode (i.e., no optimization, one single iteration).

Returns:

Instance of the control-flow model configuration for the one-shot mode.

Return type:

ControlFlowSettings

to_dict() -> dict

Translate the control-flow model configuration stored in this instance into a dictionary.

Returns:

Python dictionary storing this configuration.

Return type:

dict

class simod.settings.control_flow_settings.ProcessModelDiscoveryAlgorithm(value)

Enumeration of process model discovery algorithms.

This enum defines the available algorithms for discovering process models from event logs.

SPLIT_MINER_V1

Represents the first version of the Split Miner algorithm (“sm1”).

Type:

str

SPLIT_MINER_V2

Represents the second version of the Split Miner algorithm (“sm2”).

Type:

str

classmethod from_str(value: str) -> ProcessModelDiscoveryAlgorithm

Converts a string representation of a process model discovery algorithm into the corresponding ProcessModelDiscoveryAlgorithm instance.

This method allows flexible input formats for each algorithm, supporting multiple variations of their names.

Parameters:

value (str) – A string representing a process model discovery algorithm.

Returns:

The corresponding enum instance for the given algorithm name.

Return type:

ProcessModelDiscoveryAlgorithm

Raises:

ValueError – If the provided string does not match any known algorithm.

Resource model settings

class simod.settings.resource_model_settings.ResourceModelSettings(*, optimization_metric: Metric = Metric.CIRCADIAN_EMD, num_iterations: int = 10, num_evaluations_per_iteration: int = 3, discovery_type: CalendarType = CalendarType.UNDIFFERENTIATED, granularity: int | Tuple[int, int] | None = (15, 60), confidence: float | Tuple[float, float] | None = (0.5, 0.85), support: float | Tuple[float, float] | None = (0.01, 0.3), participation: float | Tuple[float, float] | None = 0.4, discover_prioritization_rules: bool = False, discover_batching_rules: bool = False, fuzzy_angle: float | Tuple[float, float] | None = (0.1, 0.9))

Configuration settings for resource model optimization.

This class defines parameters for optimizing resource allocation and scheduling in process simulations, including optimization metrics, discovery methods, and statistical thresholds. In each iteration of the optimization process, the parameters are sampled from these values or ranges.

optimization_metric

The metric used to evaluate the quality of resource model optimization in each iteration (i.e., loss function).

Type:

Metric

num_iterations

The number of optimization iterations to perform.

Type:

int

num_evaluations_per_iteration

The number of replications for the evaluations of each iteration.

Type:

int

discovery_type

Type of calendar discovery method used for resource modeling.

Type:

CalendarType

granularity

Fixed value or range for the time granularity for calendar discovery, measured in minutes per granule (e.g., 60 implies discovering resource calendars with slots of 1 hour). Must be a divisor of 1,440 (the number of minutes in a day).

Type:

Union[int, Tuple[int, int]], optional
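
The constraint on granularity means that a day (1,440 minutes) must split into a whole number of slots, as it does for the default bounds 15 and 60. A minimal check, using the hypothetical helper name slots_per_day, could look like this:

```python
MINUTES_PER_DAY = 1_440


def slots_per_day(granularity: int) -> int:
    """Return the number of calendar slots in a day for a given granularity.

    Raises ValueError when a day cannot be split into whole slots, i.e.
    when granularity (in minutes) does not divide 1,440 evenly.
    """
    if granularity <= 0 or MINUTES_PER_DAY % granularity != 0:
        raise ValueError(
            f"granularity must be a positive divisor of {MINUTES_PER_DAY}, "
            f"got {granularity}"
        )
    return MINUTES_PER_DAY // granularity
```

For instance, a granularity of 60 gives 24 slots per day and 15 gives 96, while a value such as 7 is rejected.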

confidence

Fixed value or range for the minimum confidence of the intervals in the discovered calendar of a resource or set of resources (between 0.0 and 1.0).

Type:

Union[float, Tuple[float, float]], optional

support

Fixed value or range for the minimum support of the intervals in the discovered calendar of a resource or set of resources (between 0.0 and 1.0).

Type:

Union[float, Tuple[float, float]], optional

participation

Fixed value or range for the minimum participation of a resource in the process required to discover an individual calendar for it; resources below this value are grouped together (between 0.0 and 1.0).

Type:

Union[float, Tuple[float, float]], optional

fuzzy_angle

Fixed value or range for the angle of the fuzzy trapezoid when computing the availability probability for an activity (angle from start to end).

Type:

Union[float, Tuple[float, float]], optional

discover_prioritization_rules

Whether to discover case prioritization rules.

Type:

bool

discover_batching_rules

Whether to discover batching rules for resource allocation.

Type:

bool

static from_dict(config: dict) -> ResourceModelSettings

Instantiates the resource model configuration from a dictionary.

Parameters:

config (dict) – Dictionary with the configuration values for the resource model parameters.

Returns:

Instance of the resource model configuration for the specified dictionary values.

Return type:

ResourceModelSettings

static one_shot() -> ResourceModelSettings

Instantiates the resource model configuration for the one-shot mode (i.e., no optimization, one single iteration).

Returns:

Instance of the resource model configuration for the one-shot mode.

Return type:

ResourceModelSettings

to_dict() -> dict

Translate the resource model configuration stored in this instance into a dictionary.

Returns:

Python dictionary storing this configuration.

Return type:

dict

Extraneous delays settings

class simod.settings.extraneous_delays_settings.ExtraneousDelaysSettings(*, optimization_metric: OptimizationMetric = OptimizationMetric.RELATIVE_EMD, discovery_method: DiscoveryMethod = DiscoveryMethod.COMPLEX, num_iterations: int = 1, num_evaluations_per_iteration: int = 3)

Configuration settings for extraneous delay optimization.

This class defines parameters for discovering and optimizing extraneous delays in process simulations, including optimization metrics, discovery methods, and iteration settings. In each iteration of the optimization process, the parameters are sampled from these values or ranges.

optimization_metric

The metric used to evaluate process model quality at each iteration of the optimization process (i.e., loss function).

Type:

ExtraneousDelaysOptimizationMetric

num_iterations

The number of optimization iterations to perform.

Type:

int

num_evaluations_per_iteration

The number of replications for the evaluations of each iteration.

Type:

int

discovery_method

The method used to discover extraneous delays.

Type:

ExtraneousDelaysDiscoveryMethod

static from_dict(config: dict) -> ExtraneousDelaysSettings

Instantiates the extraneous delays model configuration from a dictionary.

Parameters:

config (dict) – Dictionary with the configuration values for the extraneous delays model parameters.

Returns:

Instance of the extraneous delays model configuration for the specified dictionary values.

Return type:

ExtraneousDelaysSettings

to_dict() -> dict

Translate the extraneous delays model configuration stored in this instance into a dictionary.

Returns:

Python dictionary storing this configuration.

Return type:

dict

Event Log Module

class simod.event_log.event_log.EventLog(log_train: DataFrame, log_validation: DataFrame, log_train_validation: DataFrame, log_test: DataFrame, log_ids: EventLogIDs, process_name: str | None = None)

Represents an event log containing process execution data and its partitioned subsets.

This class provides functionality for storing and managing an event log, including training, validation, and test partitions. It also supports exporting logs to XES format and loading event logs from files.

train_partition

DataFrame containing the training partition of the event log.

Type:

pandas.DataFrame

validation_partition

DataFrame containing the validation partition of the event log.

Type:

pandas.DataFrame

train_validation_partition

DataFrame containing both training and validation data.

Type:

pandas.DataFrame

test_partition

DataFrame containing the test partition of the event log, if available.

Type:

pandas.DataFrame

log_ids

Identifiers for mapping column names in the event log.

Type:

EventLogIDs

process_name

The name of the business process associated with the event log, primarily used for file naming.

Type:

str

static from_path(train_log_path: Path, log_ids: EventLogIDs, preprocessing_settings: PreprocessingSettings = PreprocessingSettings(multitasking=False, enable_time_concurrency_threshold=0.5, concurrency_thresholds=ConcurrencyThresholds(df=0.75, l2l=0.9, l1l=0.9)), need_test_partition: bool | None = False, process_name: str | None = None, test_log_path: Path | None = None, split_ratio: float = 0.8) -> EventLog

Loads an event log from a file and performs partitioning into training, validation, and test subsets.

Parameters:
  • train_log_path (pathlib.Path) – Path to the training event log file (CSV or CSV.GZ).

  • log_ids (EventLogIDs) – Identifiers for mapping column names in the event log.

  • preprocessing_settings (PreprocessingSettings, optional) – Settings for preprocessing the event log.

  • need_test_partition (bool, optional) – Whether to create a test partition if a separate test log is not provided.

  • process_name (str, optional) – Name of the business process. If not provided, it is inferred from the file name.

  • test_log_path (pathlib.Path, optional) – Path to the test event log file (CSV or CSV.GZ). If provided, the test log is loaded separately.

  • split_ratio (float, default=0.8) – Ratio for splitting training and validation partitions.

Returns:

An instance of EventLog with training, validation, and test partitions.

Return type:

EventLog

Raises:

ValueError – If the specified training or test log has an unsupported file extension.
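
The effect of split_ratio can be pictured with a case-level split. The sketch below is an assumption-laden illustration, not the partitioning code used by Simod: split_cases is a hypothetical helper, it assumes the ratio is the share of cases assigned to training, and it assumes cases are ordered by their start time before splitting.

```python
from typing import Dict, List, Tuple


def split_cases(
    case_start_times: Dict[str, float], split_ratio: float = 0.8
) -> Tuple[List[str], List[str]]:
    """Split case IDs into (training, validation) lists.

    case_start_times maps each case ID to the timestamp of its first event.
    The earliest split_ratio share of cases goes to training (assumption);
    the cut index is truncated to an integer.
    """
    if not 0.0 < split_ratio < 1.0:
        raise ValueError("split_ratio must be strictly between 0 and 1")
    ordered = sorted(case_start_times, key=case_start_times.get)
    cut = int(len(ordered) * split_ratio)
    return ordered[:cut], ordered[cut:]
```

Splitting by whole cases rather than by individual events keeps every trace intact in exactly one partition.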

test_to_xes(path: Path, only_complete_events: bool = False)

Saves the test log to an XES file.

Parameters:
  • path (pathlib.Path) – Destination path for the XES file.

  • only_complete_events (bool) – If true, generate XES file containing only events corresponding to the end of each activity instance.

train_to_xes(path: Path, only_complete_events: bool = False)

Saves the training log to an XES file.

Parameters:
  • path (pathlib.Path) – Destination path for the XES file.

  • only_complete_events (bool) – If true, generate XES file containing only events corresponding to the end of each activity instance.

train_validation_to_xes(path: Path, only_complete_events: bool = False)

Saves the combined training and validation log to an XES file.

Parameters:
  • path (pathlib.Path) – Destination path for the XES file.

  • only_complete_events (bool) – If true, generate XES file containing only events corresponding to the end of each activity instance.

validation_to_xes(path: Path, only_complete_events: bool = False)

Saves the validation log to an XES file.

Parameters:
  • path (pathlib.Path) – Destination path for the XES file.

  • only_complete_events (bool) – If true, generate XES file containing only events corresponding to the end of each activity instance.

class simod.event_log.preprocessor.Preprocessor(log: DataFrame, log_ids: EventLogIDs)

Handles event log pre-processing by executing various transformations to estimate missing timestamps and adjust data for multitasking.

This class modifies an input event log based on the specified settings and returns the pre-processed log.

log

The event log stored as a DataFrame.

Type:

pandas.DataFrame

log_ids

Identifiers for mapping column names in the event log.

Type:

EventLogIDs

run(multitasking: bool = False, concurrency_thresholds: ConcurrencyThresholds = ConcurrencyThresholds(df=0.9, l2l=0.9, l1l=0.9), enable_time_concurrency_threshold: float = 0.75) -> DataFrame

Executes event log pre-processing steps based on the specified parameters.

This includes estimating missing start times, adjusting timestamps for multitasking scenarios, and computing enabled times.

Parameters:
  • multitasking (bool) – Whether to adjust the timestamps for multitasking.

  • concurrency_thresholds (ConcurrencyThresholds, optional) – Thresholds for the Heuristics Miner to estimate start times.

  • enable_time_concurrency_threshold (float) – Threshold for estimating enabled times.

Returns:

The pre-processed event log.

Return type:

pandas.DataFrame

Control-flow Model Module

class simod.control_flow.settings.HyperoptIterationParams(output_dir: Path, provided_model_path: Path | None, project_name: str, optimization_metric: Metric, gateway_probabilities_method: GatewayProbabilitiesDiscoveryMethod, mining_algorithm: ProcessModelDiscoveryAlgorithm, epsilon: float | None, eta: float | None, replace_or_joins: bool | None, prioritize_parallelism: bool | None, f_score: float | None = None)

Parameters for a single iteration of the Control-Flow optimization process.

This class defines the configuration settings used during an iteration of the optimization process, including process model discovery, optimization metric, and gateway probability discovery.

output_dir

Directory where all output files for the current iteration will be stored.

Type:

pathlib.Path

provided_model_path

Path to a provided BPMN model, if available (no discovery needed).

Type:

pathlib.Path, optional

project_name

Name of the project, mainly used for file naming.

Type:

str

optimization_metric

Metric used to evaluate the candidate process model in this iteration.

Type:

Metric

gateway_probabilities_method

Method for discovering gateway probabilities.

Type:

GatewayProbabilitiesDiscoveryMethod

mining_algorithm

Algorithm used for process model discovery, if necessary.

Type:

ProcessModelDiscoveryAlgorithm

epsilon

Number of concurrent relations between events to be captured in the discovery algorithm (between 0.0 and 1.0).

Type:

float, optional

eta

Threshold for filtering the incoming and outgoing edges in the discovery algorithm (between 0.0 and 1.0).

Type:

float, optional

replace_or_joins

Whether to replace non-trivial OR joins in the discovered model.

Type:

bool, optional

prioritize_parallelism

Whether to prioritize parallelism over loops for model discovery.

Type:

bool, optional

f_score

Minimum f-score value to consider the discovered data-aware branching rules.

Type:

float, optional

Notes

  • If provided_model_path is specified, process model discovery will be skipped.

to_dict() -> dict

Converts the instance into a dictionary representation of the optimization parameters.

The returned dictionary is structured based on whether a process model needs to be discovered or if a pre-existing model is provided.

Returns:

A dictionary containing the optimization parameters for this iteration.

Return type:

dict
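
The conditional structure of the returned dictionary can be sketched with a reduced stand-in class. DemoIterationParams and its dictionary keys are hypothetical and cover only a few of the documented attributes; the actual keys produced by HyperoptIterationParams.to_dict() may differ.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class DemoIterationParams:
    """Hypothetical, reduced stand-in for HyperoptIterationParams."""

    output_dir: Path
    provided_model_path: Optional[Path]
    epsilon: Optional[float] = None
    eta: Optional[float] = None

    def to_dict(self) -> dict:
        result = {"output_dir": str(self.output_dir)}
        if self.provided_model_path is None:
            # A model has to be discovered: report the discovery parameters
            result["epsilon"] = self.epsilon
            result["eta"] = self.eta
        else:
            # A model was provided: discovery parameters do not apply
            result["provided_model_path"] = str(self.provided_model_path)
        return result
```

Leaving out the discovery parameters when a model is provided avoids reporting values that had no influence on the iteration.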

class simod.control_flow.optimizer.ControlFlowOptimizer(event_log: EventLog, bps_model: BPSModel, settings: ControlFlowSettings, base_directory: Path)[source]

Optimizes the control-flow of a business process model using hyperparameter optimization.

This class performs iterative optimization to refine the structure of a process model and discover optimal gateway probabilities. It evaluates different configurations to improve the process model based on a given metric.

The search space is built from the parameter ranges defined in settings.

event_log

Event log containing train and validation partitions.

Type:

EventLog

initial_bps_model

Business process simulation (BPS) model to use as a base, by replacing its control-flow model with the discovered one in each iteration.

Type:

BPSModel

settings

Configuration settings to build the search space for the optimization process.

Type:

ControlFlowSettings

base_directory

Root directory where output files will be stored.

Type:

pathlib.Path

best_bps_model

Best discovered BPS model after the optimization process.

Type:

BPSModel, optional

evaluation_measurements

Quality measures recorded for each hyperopt iteration.

Type:

pandas.DataFrame

Notes

  • If no process model is provided, a discovery method will be used.

  • Optimization is performed using Tree-structured Parzen Estimator (TPE) hyperparameter optimization.

run() HyperoptIterationParams[source]

Runs the control-flow optimization process.

This method defines the hyperparameter search space and executes a TPE-hyperparameter optimization process to discover the best control-flow model. It evaluates multiple iterations and selects the best-performing set of parameters for its discovery.

Returns:

The parameters of the best iteration of the optimization process.

Return type:

HyperoptIterationParams

Raises:

AssertionError – If the best discovered process model path does not exist after optimization.
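
The overall shape of such an optimization loop (sample a candidate from the search space, evaluate it, keep the best) can be sketched as follows. This is not Simod's implementation: plain random search stands in for TPE to keep the example dependency-free, and toy_loss, random_search, and the ranges are hypothetical.

```python
import random


def toy_loss(params: dict) -> float:
    # Hypothetical objective: distance to an arbitrary "ideal" configuration.
    return abs(params["epsilon"] - 0.2) + abs(params["eta"] - 0.6)


def random_search(space: dict, num_iterations: int, seed: int = 0) -> dict:
    """Random search over (low, high) ranges; a swapped-in substitute for TPE."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(num_iterations):
        # Draw one candidate uniformly from each parameter range.
        candidate = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        loss = toy_loss(candidate)
        # Keep the candidate only if it improves on the best seen so far.
        if loss < best_loss:
            best_params, best_loss = candidate, loss
    return {"params": best_params, "loss": best_loss}


search_space = {"epsilon": (0.0, 1.0), "eta": (0.0, 1.0)}
best = random_search(search_space, num_iterations=50)
```

TPE differs from this sketch in that it uses the results of earlier iterations to decide where to sample next, rather than sampling uniformly.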

simod.control_flow.discovery.discover_process_model(log_path: Path, output_model_path: Path, params: HyperoptIterationParams)[source]

Runs the specified process model discovery algorithm to extract a process model from an event log and save it to the given output path.

This function supports Split Miner V1 and Split Miner V2 as discovery algorithms.

Parameters:
  • log_path (pathlib.Path) – Path to the event log in XES format, required for Split Miner algorithms.

  • output_model_path (pathlib.Path) – Path to save the discovered process model.

  • params (HyperoptIterationParams) – Configuration containing the process model discovery algorithm and its parameters.

Raises:

ValueError – If the specified process model discovery algorithm is unknown.
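
The dispatch on the discovery algorithm, including the ValueError for unknown values, might look like the sketch below. DiscoveryAlgorithmSketch and its values are hypothetical stand-ins for ProcessModelDiscoveryAlgorithm, and the returned strings replace the actual Split Miner invocations.

```python
from enum import Enum


class DiscoveryAlgorithmSketch(Enum):
    """Hypothetical stand-in for ProcessModelDiscoveryAlgorithm."""

    SPLIT_MINER_V1 = "sm1"
    SPLIT_MINER_V2 = "sm2"


def select_discovery_runner(algorithm: object) -> str:
    # Each branch would launch the corresponding miner; here it only names it.
    if algorithm is DiscoveryAlgorithmSketch.SPLIT_MINER_V1:
        return "run Split Miner V1"
    if algorithm is DiscoveryAlgorithmSketch.SPLIT_MINER_V2:
        return "run Split Miner V2"
    # Anything else is rejected explicitly instead of being silently ignored.
    raise ValueError(f"Unknown process model discovery algorithm: {algorithm!r}")


chosen = select_discovery_runner(DiscoveryAlgorithmSketch.SPLIT_MINER_V2)
```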

simod.control_flow.discovery.post_process_bpmn_self_loops(bpmn_model_path: Path)[source]

Resource Model Module

class simod.resource_model.settings.HyperoptIterationParams(output_dir: Path, process_model_path: Path, project_name: str, optimization_metric: Metric, calendar_discovery_params: CalendarDiscoveryParameters, discover_prioritization_rules: bool = False, discover_batching_rules: bool = False)[source]

Parameters for a single iteration of the Resource Model optimization process.

This class defines the necessary parameters for optimizing the resource model of the BPS model. It includes the parameter values for the discovery of resource profiles, calendars, etc.

output_dir

Directory where all files of the current iteration will be stored.

Type:

pathlib.Path

process_model_path

Path to the BPMN process model used for optimization.

Type:

pathlib.Path

project_name

Name of the project for file naming purposes.

Type:

str

optimization_metric

Metric used to evaluate the quality of the current iteration’s candidate.

Type:

Metric

calendar_discovery_params

Parameters for the resource calendar (i.e., working schedules) discovery.

Type:

CalendarDiscoveryParameters

discover_prioritization_rules

Whether to attempt discovering prioritization rules (default: False).

Type:

bool, optional

discover_batching_rules

Whether to attempt discovering batching rules (default: False).

Type:

bool, optional

to_dict() dict[source]

Converts the parameters of the current iteration into a dictionary format.

Returns:

A dictionary containing the iteration parameters.

Return type:

dict

class simod.resource_model.optimizer.ResourceModelOptimizer(event_log: EventLog, bps_model: BPSModel, settings: ResourceModelSettings, base_directory: Path, model_activities: list[str] | None = None)[source]

Optimizes the resource model of a business process model using hyperparameter optimization.

This class performs iterative optimization to refine the resource model and discover optimal resource profiles and availability calendars. It evaluates different configurations to improve the process model based on a given metric.

The search space is built from the parameter ranges defined in settings.

event_log

Event log containing train and validation partitions.

Type:

EventLog

initial_bps_model

Business process simulation (BPS) model to use as a base, by replacing its resource model with the discovered one in each iteration.

Type:

BPSModel

settings

Configuration settings to build the search space for the optimization process.

Type:

ResourceModelSettings

base_directory

Root directory where output files will be stored.

Type:

pathlib.Path

best_bps_model

Best discovered BPS model after the optimization process.

Type:

BPSModel, optional

evaluation_measurements

Quality measures recorded for each hyperopt iteration.

Type:

pandas.DataFrame

Notes

  • Optimization is performed using TPE-hyperparameter optimization.

run() HyperoptIterationParams[source]

Runs the resource model optimization process.

This method defines the hyperparameter search space and executes a TPE-hyperparameter optimization process to discover the best resource model. It evaluates multiple iterations and selects the best-performing set of parameters for its discovery.

Returns:

The parameters of the best iteration of the optimization process.

Return type:

HyperoptIterationParams

Extraneous Delays Model Module

class simod.extraneous_delays.optimizer.ExtraneousDelaysOptimizer(event_log: EventLog, bps_model: BPSModel, settings: ExtraneousDelaysSettings, base_directory: Path)[source]

Optimizer for the discovery of the extraneous delays model.

This class performs either a direct discovery of the extraneous delays of the process, or launches an iterative optimization that first discovers the extraneous delays and then adjusts their size to better reflect reality.

event_log

The event log containing the train and validation data.

Type:

EventLog

bps_model

The business process simulation model to enhance with extraneous delays, including the BPMN representation.

Type:

BPSModel

settings

Configuration settings for extraneous delay discovery.

Type:

ExtraneousDelaysSettings

base_directory

Directory where output files will be stored.

Type:

pathlib.Path

run() List[ExtraneousDelay][source]

Executes the extraneous delay discovery process.

This method configures the optimization process, applies either a direct enhancement or a hyperparameter optimization approach to identify delays, and returns the best detected delays as a list of ExtraneousDelay objects.

Returns:

A list of detected extraneous delays, each containing activity names, delay IDs, and their corresponding duration distributions.

Return type:

List[ExtraneousDelay]

class simod.extraneous_delays.types.ExtraneousDelay(activity_name: str, delay_id: str, duration_distribution: DurationDistribution)[source]

Represents an extraneous delay within a business process activity.

This class encapsulates the details of an identified extraneous delay, including the affected activity, a unique delay identifier, and the duration distribution of the delay.

activity_name

The name of the activity where the extraneous delay occurs.

Type:

str

delay_id

A unique identifier for the delay event.

Type:

str

duration_distribution

The statistical distribution representing the delay duration.

Type:

DurationDistribution

static from_dict(delay: dict) ExtraneousDelay[source]

Creates an ExtraneousDelay instance from a dictionary.

This method reconstructs an ExtraneousDelay object from a dictionary containing activity name, delay identifier, and duration distribution.

Parameters:

delay (dict) – A dictionary representation of an extraneous delay.

Returns:

An instance of ExtraneousDelay with the extracted attributes.

Return type:

ExtraneousDelay

to_dict() dict[source]

Converts the extraneous delay into a dictionary format.

The dictionary representation is compatible with the Prosimos simulation engine, containing activity details, a unique event identifier, and the delay duration distribution.

Returns:

A dictionary representation of the extraneous delay.

Return type:

dict
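
A round trip through to_dict() and from_dict() can be sketched with a stand-in class. ExtraneousDelaySketch is hypothetical, the dictionary keys ("activity", "event_id", "duration_distribution") are assumptions, and a plain dict replaces DurationDistribution.

```python
from dataclasses import dataclass


@dataclass
class ExtraneousDelaySketch:
    """Hypothetical stand-in for ExtraneousDelay (not Simod's class)."""

    activity_name: str
    delay_id: str
    duration_distribution: dict  # plain dict standing in for DurationDistribution

    @staticmethod
    def from_dict(delay: dict) -> "ExtraneousDelaySketch":
        # Rebuild the object from the (assumed) dictionary keys.
        return ExtraneousDelaySketch(
            activity_name=delay["activity"],
            delay_id=delay["event_id"],
            duration_distribution=dict(delay["duration_distribution"]),
        )

    def to_dict(self) -> dict:
        # Serialize using the same (assumed) keys that from_dict reads.
        return {
            "activity": self.activity_name,
            "event_id": self.delay_id,
            "duration_distribution": dict(self.duration_distribution),
        }


original = ExtraneousDelaySketch("Check invoice", "delay_1", {"name": "fix", "value": 300.0})
restored = ExtraneousDelaySketch.from_dict(original.to_dict())
```

Keeping the two methods symmetric, as here, means that a serialized delay can be reloaded without loss.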

simod.extraneous_delays.utilities.add_timers_to_bpmn_model(process_model: Path, delays: List[ExtraneousDelay], timer_placement: TimerPlacement = TimerPlacement.BEFORE)[source]

Enhances a BPMN model by adding timers before or after specified activities.

This function modifies a given BPMN process model by inserting timers before or after activities that have identified extraneous delays.

Parameters:
  • process_model (pathlib.Path) – Path to the BPMN process model file to enhance.

  • delays (List[ExtraneousDelay]) – A list of extraneous delays, where each delay specifies an activity and the corresponding timer configuration.

  • timer_placement (TimerPlacement, optional) – Specifies whether the timers should be placed BEFORE (indicating the delay happens before an activity instance) or AFTER (indicating the delay happens afterward). Default is TimerPlacement.BEFORE.

Notes

  • This function modifies the BPMN file in place.

  • The method searches for tasks within the BPMN model and inserts timers based on the provided delays.

Raises:

ValueError – If the BPMN model file does not contain any tasks.
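
To make the BEFORE placement concrete, the sketch below inserts a timer event in front of a task in a tiny in-memory BPMN document using only the standard library. It is an illustration under assumptions, not Simod's implementation: add_timer_before, the element IDs, and the sample model are hypothetical, and the timer carries no duration.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
NS = {"bpmn": BPMN_NS}

BPMN_XML = f"""<definitions xmlns="{BPMN_NS}">
  <process id="p">
    <startEvent id="start"/>
    <task id="t1" name="Check invoice"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="t1"/>
    <sequenceFlow id="f2" sourceRef="t1" targetRef="end"/>
  </process>
</definitions>"""


def add_timer_before(root: ET.Element, activity_name: str, timer_id: str) -> None:
    process = root.find("bpmn:process", NS)
    tasks = process.findall("bpmn:task", NS)
    if not tasks:
        raise ValueError("The BPMN model does not contain any tasks")
    for task in tasks:
        if task.get("name") != activity_name:
            continue
        # Create the timer as an intermediate catch event.
        timer = ET.SubElement(process, f"{{{BPMN_NS}}}intermediateCatchEvent", {"id": timer_id})
        ET.SubElement(timer, f"{{{BPMN_NS}}}timerEventDefinition")
        # Redirect every flow that entered the task so it enters the timer instead.
        for flow in process.findall("bpmn:sequenceFlow", NS):
            if flow.get("targetRef") == task.get("id"):
                flow.set("targetRef", timer_id)
        # Connect the timer to the task with a new flow.
        ET.SubElement(
            process,
            f"{{{BPMN_NS}}}sequenceFlow",
            {"id": f"{timer_id}_flow", "sourceRef": timer_id, "targetRef": task.get("id")},
        )


root = ET.fromstring(BPMN_XML)
add_timer_before(root, "Check invoice", "timer_1")
flows = {
    flow.get("id"): (flow.get("sourceRef"), flow.get("targetRef"))
    for flow in root.iter(f"{{{BPMN_NS}}}sequenceFlow")
}
```

An AFTER placement would mirror this by redirecting the task's outgoing flows (matching on sourceRef) to start from the timer.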

Simulation Module

class simod.simulation.parameters.BPS_model.BPSModel(process_model: Path | None = None, gateway_probabilities: List[GatewayProbabilities] | None = None, case_arrival_model: CaseArrivalModel | None = None, resource_model: ResourceModel | None = None, extraneous_delays: List[ExtraneousDelay] | None = None, case_attributes: List[CaseAttribute] | None = None, global_attributes: List[GlobalAttribute] | None = None, event_attributes: List[EventAttribute] | None = None, prioritization_rules: List[PrioritizationRule] | None = None, batching_rules: List[BatchingRule] | None = None, branch_rules: List[BranchRules] | None = None, calendar_granularity: int | None = None)[source]

Represents a Business Process Simulation (BPS) model containing all necessary components to simulate a business process.

This class manages various elements such as the BPMN process model, resource configurations, extraneous delays, case attributes, and prioritization/batching rules. It provides methods to convert the model into a format compatible with Prosimos and handle activity ID mappings.

process_model

Path to the BPMN process model file.

Type:

pathlib.Path, optional

gateway_probabilities

Probabilities for gateway-based process routing.

Type:

List[GatewayProbabilities], optional

case_arrival_model

Model for the arrival of new cases in the simulation.

Type:

CaseArrivalModel, optional

resource_model

Model for the resources involved in the process, their working schedules, etc.

Type:

ResourceModel, optional

extraneous_delays

A list of delays representing extraneous waiting times before/after activities.

Type:

List[ExtraneousDelay], optional

case_attributes

Case-level attributes and their update rules.

Type:

List[CaseAttribute], optional

global_attributes

Global attributes and their update rules.

Type:

List[GlobalAttribute], optional

event_attributes

Event-level attributes and their update rules.

Type:

List[EventAttribute], optional

prioritization_rules

A set of case prioritization rules for process execution.

Type:

List[PrioritizationRule], optional

batching_rules

Rules defining how activities are batched together.

Type:

List[BatchingRule], optional

branch_rules

Branching rules defining conditional flow behavior in decision points.

Type:

List[BranchRules], optional

calendar_granularity

Granularity of the resource calendar, expressed in minutes.

Type:

int, optional

Notes

  • to_prosimos_format transforms the model into a dictionary format used by Prosimos.

  • replace_activity_names_with_ids modifies activity references to use BPMN IDs instead of names.

deep_copy() BPSModel[source]

Creates a deep copy of the current BPSModel instance.

This ensures that modifying the copied instance does not affect the original.

Returns:

A new, independent copy of the current BPSModel instance.

Return type:

BPSModel

Notes

This method uses Python’s copy.deepcopy() to create a full recursive copy of the model.
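
The independence that a deep copy provides can be demonstrated with a small stand-in. ModelSketch is hypothetical and holds only one nested attribute; the point is that changes to nested objects in the copy do not reach the original.

```python
import copy
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelSketch:
    """Hypothetical stand-in for BPSModel with a single nested attribute."""

    gateway_probabilities: List[dict] = field(default_factory=list)

    def deep_copy(self) -> "ModelSketch":
        # Recursively copies nested lists and dicts as well as the object itself.
        return copy.deepcopy(self)


original = ModelSketch([{"gateway": "g1", "probability": 0.7}])
clone = original.deep_copy()

# Mutating a nested object of the clone leaves the original untouched.
clone.gateway_probabilities[0]["probability"] = 0.1
```

A shallow copy (copy.copy) would share the inner dictionaries, so the same mutation would also change the original.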

replace_activity_names_with_ids()[source]

Replaces activity names with their corresponding IDs from the BPMN process model.

Prosimos requires activity references to be identified by their BPMN node IDs instead of activity labels. This method updates:

  • Resource associations in the resource profiles.

  • Activity-resource distributions.

  • Event attributes referencing activity names.

Raises:

KeyError – If an activity name does not exist in the BPMN model.

Notes

  • This method modifies the model in place.

  • It ensures compatibility with Prosimos by aligning activity references with BPMN IDs.
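
The name-to-ID replacement and its KeyError behavior can be sketched with a plain mapping. The helper replace_names_with_ids and the sample IDs are hypothetical; in Simod the mapping comes from the BPMN process model.

```python
from typing import Dict, List


def replace_names_with_ids(activity_names: List[str], name_to_id: Dict[str, str]) -> List[str]:
    # A missing name raises KeyError, matching the documented failure mode.
    return [name_to_id[name] for name in activity_names]


name_to_id = {"Check invoice": "node_1", "Approve payment": "node_2"}
activity_ids = replace_names_with_ids(["Approve payment", "Check invoice"], name_to_id)
```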

to_json(output_dir: Path, process_name: str) Path[source]

Saves the BPS model in a Prosimos-compatible JSON format.

This method generates a structured JSON file containing all necessary simulation parameters, ensuring that the model can be directly used by the Prosimos engine.

Parameters:
  • output_dir (pathlib.Path) – The directory where the JSON file should be saved.

  • process_name (str) – The name of the process, used for naming the output file.

Returns:

The full path to the generated JSON file.

Return type:

pathlib.Path

Notes

  • The JSON file is created in output_dir with a filename based on process_name.

  • Uses json.dump() to serialize the model into a structured format.

  • Ensures all attributes are converted into a valid Prosimos format before writing.
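
The write step can be sketched as follows, assuming the file is named after process_name with a .json suffix (the exact naming pattern is an assumption). save_parameters and the "model_type" key are hypothetical, and a temporary directory is used so the example leaves no files behind.

```python
import json
import tempfile
from pathlib import Path


def save_parameters(parameters: dict, output_dir: Path, process_name: str) -> Path:
    # Create the target directory if needed, then serialize with json.dump().
    output_dir.mkdir(parents=True, exist_ok=True)
    json_path = output_dir / f"{process_name}.json"  # naming pattern is an assumption
    with json_path.open("w") as handle:
        json.dump(parameters, handle, indent=2)
    return json_path


with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_parameters({"model_type": "CRISP"}, Path(tmp) / "best_result", "demo")
    reloaded = json.loads(saved_path.read_text())
```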

to_prosimos_format() dict[source]

Converts the BPS model into a dictionary format compatible with the Prosimos simulation engine.

This method extracts all relevant process simulation attributes, including resource models, delays, prioritization rules, and activity mappings, and structures them in a format understood by Prosimos.

Returns:

A dictionary representation of the BPS model, ready for simulation in Prosimos.

Return type:

dict

Notes

  • If the resource model contains a fuzzy calendar, the model type is set to “FUZZY”; otherwise, it defaults to “CRISP”.

  • The function ensures activity labels are properly linked to their respective BPMN IDs.

class simod.simulation.prosimos.ProsimosSettings(bpmn_path: Path, parameters_path: Path, output_log_path: Path, num_simulation_cases: int, simulation_start: Timestamp)[source]

Configuration settings for running a Prosimos simulation.

bpmn_path

Path to the BPMN process model.

Type:

pathlib.Path

parameters_path

Path to the Prosimos simulation parameters JSON file.

Type:

pathlib.Path

output_log_path

Path to store the generated simulation log.

Type:

pathlib.Path

num_simulation_cases

Number of cases to simulate.

Type:

int

simulation_start

Start timestamp for the simulation.

Type:

pandas.Timestamp

simod.simulation.prosimos.simulate(settings: ProsimosSettings)[source]

Runs a Prosimos simulation with the provided settings.

Parameters:

settings (ProsimosSettings) – Configuration settings containing paths and parameters for the simulation.

Notes

  • The function prints the simulation settings and invokes run_simulation().

  • The labels of the start event, end event, and event timers are not recorded in the output log.

  • The simulation generates a process log stored in settings.output_log_path.

simod.simulation.prosimos.simulate_and_evaluate(process_model_path: Path, parameters_path: Path, output_dir: Path, simulation_cases: int, simulation_start_time: Timestamp, validation_log: DataFrame, validation_log_ids: EventLogIDs, metrics: List[Metric], num_simulations: int = 1) List[dict][source]

Simulates a process model using Prosimos multiple times and evaluates the results.

This function runs the simulation num_simulations times in parallel, compares the generated logs with a validation log, and evaluates them using provided metrics.

Parameters:
  • process_model_path (pathlib.Path) – Path to the BPMN process model.

  • parameters_path (pathlib.Path) – Path to the Prosimos simulation parameters JSON file.

  • output_dir (pathlib.Path) – Directory where simulated logs will be stored.

  • simulation_cases (int) – Number of cases to simulate per run.

  • simulation_start_time (pandas.Timestamp) – Start timestamp for the simulation.

  • validation_log (pandas.DataFrame) – The actual event log to compare against.

  • validation_log_ids (EventLogIDs) – Column mappings for identifying events in the validation log.

  • metrics (List[Metric]) – A list of metrics used to evaluate the simulated logs.

  • num_simulations (int, optional) – Number of parallel simulation runs (default is 1).

Returns:

A list of evaluation results, one for each simulated log.

Return type:

List[dict]

Notes

  • Uses multiprocessing to speed up simulation when num_simulations > 1.

  • Simulated logs are automatically compared with validation_log.
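
The run-in-parallel-then-evaluate pattern can be sketched with stand-ins. Nothing here calls Prosimos: simulate_once fabricates cycle times from a seeded generator, evaluate is a hypothetical metric, and a thread pool replaces multiprocessing to keep the example portable.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import List


def simulate_once(run_index: int, num_cases: int) -> List[float]:
    # Stand-in for one simulation run: seeded, so each run is reproducible.
    rng = random.Random(run_index)
    return [rng.uniform(1.0, 10.0) for _ in range(num_cases)]


def evaluate(simulated: List[float], validation: List[float]) -> dict:
    # Hypothetical metric: absolute difference between mean cycle times.
    return {"mean_cycle_time_error": abs(mean(simulated) - mean(validation))}


def simulate_and_evaluate_sketch(
    num_cases: int, validation: List[float], num_simulations: int = 1
) -> List[dict]:
    def one_run(run_index: int) -> dict:
        result = evaluate(simulate_once(run_index, num_cases), validation)
        result["run_num"] = run_index
        return result

    # pool.map returns results in submission order, one dict per run.
    with ThreadPoolExecutor(max_workers=num_simulations) as pool:
        return list(pool.map(one_run, range(num_simulations)))


results = simulate_and_evaluate_sketch(100, [4.0, 5.0, 6.0], num_simulations=3)
```

The result is one dictionary per run, which mirrors the documented return type of List[dict].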