API Reference
This section provides an overview of the Simod API.
Usage
To use Simod in your Python code, import the main components:
from pathlib import Path
from simod.event_log.event_log import EventLog
from simod.settings.simod_settings import SimodSettings
from simod.simod import Simod
# Initialize 'output' folder and read configuration file
output = Path("<path>/<to>/<outputs>/<folder>")
configuration_path = Path("<path>/<to>/<configuration>.yml")
settings = SimodSettings.from_path(configuration_path)
# Read and preprocess event log
event_log = EventLog.from_path(
    log_ids=settings.common.log_ids,
    train_log_path=settings.common.train_log_path,
    test_log_path=settings.common.test_log_path,
    preprocessing_settings=settings.preprocessing,
    need_test_partition=settings.common.perform_final_evaluation,
)
# Instantiate and run SIMOD
simod = Simod(settings=settings, event_log=event_log, output_dir=output)
simod.run()
Modules Overview
Simod’s codebase is organized into several key modules:
simod: The main class that orchestrates the overall functionality.
settings: Handles the parsing and validation of configuration files.
event_log: Manages the IO operations of an event log as well as its preprocessing.
control_flow: Utilities to discover and manage the control-flow model of a BPS model.
resource_model: Utilities to discover and manage the resource model of a BPS model.
extraneous_delays: Utilities to discover and manage the extraneous delays model of a BPS model.
simulation: Manages the data model of a BPS model and its simulation and quality assessment.
Detailed Module Documentation
Below is the detailed documentation for each module:
SIMOD class
- class simod.simod.Simod(settings: SimodSettings, event_log: EventLog, output_dir: Path | None = None)[source]
Class to run the full pipeline of SIMOD in order to discover a BPS model from an event log.
- settings
Configuration to run SIMOD and all its stages.
- Type:
SimodSettings
- event_log
EventLog class storing the preprocessed training, validation, and (optionally) test partitions.
- Type:
EventLog
- run(runtimes: RuntimeMeter | None = None)[source]
Executes the SIMOD pipeline to discover the BPS model that best reflects the behavior recorded in the input event log, based on the specified configuration.
- Parameters:
runtimes (RuntimeMeter, optional) – Instance for tracking the runtime of the different stages in the SIMOD pipeline. When provided, SIMOD pipeline stages will be tracked and reported along with stages previously tracked in the instance (e.g., preprocessing). If not provided, the runtime tracking reported will only contain SIMOD stages.
- Returns:
The method performs in-place execution of the pipeline and does not return a value.
- Return type:
None
Notes
This method generates all output files under the folder [output_dir]/<latest_run>/best_result/.
This method updates internal attributes of the class, such as final_bps_model, with the best BPS model found during the pipeline execution.
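The output layout described in the notes above can be navigated with a small helper. This is a minimal sketch, not part of Simod: the function name find_best_result_dir is hypothetical, and it assumes that the "latest run" is the most recently modified subfolder of output_dir.

```python
from pathlib import Path


def find_best_result_dir(output_dir: Path) -> Path:
    """Return <output_dir>/<latest_run>/best_result for the newest run folder.

    Assumption (not confirmed by the Simod docs): the latest run is the
    subfolder of output_dir with the most recent modification time.
    """
    run_dirs = [p for p in output_dir.iterdir() if p.is_dir()]
    if not run_dirs:
        raise FileNotFoundError(f"No run folders found under {output_dir}")
    latest_run = max(run_dirs, key=lambda p: p.stat().st_mtime)
    return latest_run / "best_result"
```

The helper only builds the path; it does not check that best_result exists, so callers may want to test the returned path before reading from it.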
Settings Module
SIMOD settings
- class simod.settings.simod_settings.SimodSettings(*, common: ~simod.settings.common_settings.CommonSettings = CommonSettings(train_log_path=PosixPath('default_path.csv'), log_ids=EventLogIDs(case='case_id', activity='activity', resource='resource', start_time='start_time', end_time='end_time', enabled_time='enabled_time', enabling_activity='enabling_activity', available_time='available_time', estimated_start_time='estimated_start_time', batch_id='batch_instance_id', batch_type='batch_instance_type'), test_log_path=None, process_model_path=None, perform_final_evaluation=False, num_final_evaluations=10, evaluation_metrics=[], use_observed_arrival_distribution=False, clean_intermediate_files=True, discover_data_attributes=False), preprocessing: ~simod.settings.preprocessing_settings.PreprocessingSettings = PreprocessingSettings(multitasking=False, enable_time_concurrency_threshold=0.5, concurrency_thresholds=ConcurrencyThresholds(df=0.75, l2l=0.9, l1l=0.9)), control_flow: ~simod.settings.control_flow_settings.ControlFlowSettings = ControlFlowSettings(optimization_metric=<Metric.THREE_GRAM_DISTANCE: 'three_gram_distance'>, num_iterations=10, num_evaluations_per_iteration=3, gateway_probabilities=<GatewayProbabilitiesDiscoveryMethod.DISCOVERY: 'discovery'>, mining_algorithm=<ProcessModelDiscoveryAlgorithm.SPLIT_MINER_V1: 'sm1'>, epsilon=(0.0, 1.0), eta=(0.0, 1.0), discover_branch_rules=False, f_score=0.7, replace_or_joins=False, prioritize_parallelism=False), resource_model: ~simod.settings.resource_model_settings.ResourceModelSettings = ResourceModelSettings(optimization_metric=<Metric.CIRCADIAN_EMD: 'circadian_event_distribution'>, num_iterations=10, num_evaluations_per_iteration=3, discovery_type=<CalendarType.UNDIFFERENTIATED: 'undifferentiated'>, granularity=(15, 60), confidence=(0.5, 0.85), support=(0.01, 0.3), participation=0.4, discover_prioritization_rules=False, discover_batching_rules=False, fuzzy_angle=(0.1, 0.9)), extraneous_activity_delays: 
~simod.settings.extraneous_delays_settings.ExtraneousDelaysSettings | None = None, version: int = 5)[source]
SIMOD configuration v5 with the settings for all the stages and optimizations. If configuration is provided in v2 or v4, it is automatically translated to v5.
- common
General configuration parameters of SIMOD and parameters common to all pipeline stages.
- Type:
- preprocessing
Configuration parameters for the preprocessing stage of SIMOD.
- Type:
- control_flow
Configuration parameters for the control-flow model discovery stage.
- Type:
- resource_model
Configuration parameters for the resource model discovery stage.
- Type:
- extraneous_activity_delays
Configuration parameters for the extraneous delays model discovery stage. If not provided, the extraneous delays are not discovered.
- Type:
- static default() SimodSettings[source]
Default configuration for SIMOD.
- Returns:
Instance of the SIMOD configuration with the default values.
- Return type:
- static from_path(file_path: Path) SimodSettings[source]
Instantiates the SIMOD configuration from a YAML file.
- Parameters:
file_path (Path) – Path to the YAML file storing the configuration.
- Returns:
Instance of the SIMOD configuration for the specified YAML file.
- Return type:
- static from_yaml(config: dict, config_dir: Path | None = None) SimodSettings[source]
Instantiates the SIMOD configuration from a dictionary following the expected YAML structure.
- Parameters:
- Returns:
Instance of the SIMOD configuration for the specified dictionary values.
- Return type:
- static one_shot() SimodSettings[source]
Configuration for SIMOD one-shot. This mode runs SIMOD without optimizing each BPS model component (i.e., directly discover each BPS model component with default parameters).
- Returns:
Instance of the SIMOD configuration for one-shot mode.
- Return type:
Common settings
- class simod.settings.common_settings.CommonSettings(*, train_log_path: ~pathlib.Path = PosixPath('default_path.csv'), log_ids: ~pix_framework.io.event_log.EventLogIDs = EventLogIDs(case='case_id', activity='activity', resource='resource', start_time='start_time', end_time='end_time', enabled_time='enabled_time', enabling_activity='enabling_activity', available_time='available_time', estimated_start_time='estimated_start_time', batch_id='batch_instance_id', batch_type='batch_instance_type'), test_log_path: ~pathlib.Path | None = None, process_model_path: ~pathlib.Path | None = None, perform_final_evaluation: bool = False, num_final_evaluations: int = 10, evaluation_metrics: ~typing.List[~simod.settings.common_settings.Metric] = <factory>, use_observed_arrival_distribution: bool = False, clean_intermediate_files: bool = True, discover_data_attributes: bool = False)[source]
General configuration parameters of SIMOD and parameters common to all pipeline stages.
- log_ids
Dataclass storing the mapping between the column names in the CSV and their role (case_id, activity, etc.).
- Type:
EventLogIDs
- test_log_path
Path to the event log to perform the final evaluation of the discovered BPS model (if desired).
- Type:
Path, optional
- process_model_path
Path to the BPMN model for the control-flow (skip its discovery and use this one).
- Type:
Path, optional
- perform_final_evaluation
Boolean indicating whether to perform the final evaluation of the discovered BPS model. If true, either use the event log in [test_log_path] if specified, or split the training log to obtain a testing set.
- Type:
- use_observed_arrival_distribution
Boolean indicating whether to use the distribution of observed case arrival times (true), or to discover a probability distribution function to model them (false).
- Type:
- clean_intermediate_files
Boolean indicating whether to delete all intermediate created files.
- Type:
- discover_data_attributes
Boolean indicating whether to discover data attributes and their creation/update rules.
- Type:
- static from_dict(config: dict, config_dir: Path | None = None) CommonSettings[source]
Instantiates the SIMOD common configuration from a dictionary.
- Parameters:
config (dict) – Dictionary with the configuration values for the SIMOD common parameters.
config_dir (Path, optional) – If the path to the event log(s) is specified in a relative manner, [config_dir] is used to complete such paths. If None, relative paths are completed with the current directory.
- Returns:
Instance of the SIMOD common configuration for the specified dictionary values.
- Return type:
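The config_dir rule of CommonSettings.from_dict (relative paths are completed with config_dir, or with the current directory when it is None) can be sketched as follows. The helper name resolve_log_path is hypothetical and this is an illustration of the documented behavior, not Simod's own code.

```python
from pathlib import Path
from typing import Optional


def resolve_log_path(raw_path: str, config_dir: Optional[Path] = None) -> Path:
    """Complete a relative path with config_dir, or the current directory if None."""
    path = Path(raw_path)
    if path.is_absolute():
        # Absolute paths are used as given.
        return path
    base = config_dir if config_dir is not None else Path.cwd()
    return base / path
```

With this rule, a configuration file can reference event logs relative to its own location, which keeps a configuration and its logs portable as a folder.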
- class simod.settings.common_settings.Metric(value)[source]
Enum class storing the metrics used to evaluate the quality of a BPS model.
- classmethod from_str(value: str | List[str]) Metric | List[Metric][source]
Converts a string (or list of strings) representing metric names into an instance (or list of instances) of the Metric enum.
- Parameters:
value (Union[str, List[str]]) – A string representing a metric name or a list of metric names.
- Returns:
An instance of Metric if a single string is provided, or a list of Metric instances if a list of strings is provided.
- Return type:
- Raises:
ValueError – If the provided string does not match any metric name.
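The contract of Metric.from_str (a string yields one member, a list yields a list, an unknown name raises ValueError) can be illustrated with a stand-in enum. MetricSketch below is hypothetical and is not Simod's Metric: it contains only the two metric values that appear in the signatures on this page, and the whitespace and case normalization is an assumption.

```python
from enum import Enum
from typing import List, Union


class MetricSketch(Enum):
    """Stand-in enum with the two metric values visible in this reference."""

    THREE_GRAM_DISTANCE = "three_gram_distance"
    CIRCADIAN_EMD = "circadian_event_distribution"

    @classmethod
    def parse_one(cls, name: str) -> "MetricSketch":
        # Assumption: names are matched after trimming and lower-casing.
        key = name.strip().lower()
        for member in cls:
            if member.value == key:
                return member
        raise ValueError(f"Unknown metric: {name!r}")

    @classmethod
    def from_str(
        cls, value: Union[str, List[str]]
    ) -> Union["MetricSketch", List["MetricSketch"]]:
        # A single string maps to one member, a list maps element by element.
        if isinstance(value, str):
            return cls.parse_one(value)
        return [cls.parse_one(item) for item in value]
```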
Preprocessing settings
- class simod.settings.preprocessing_settings.PreprocessingSettings(*, multitasking: bool = False, enable_time_concurrency_threshold: float = 0.5, concurrency_thresholds: ConcurrencyThresholds = ConcurrencyThresholds(df=0.75, l2l=0.9, l1l=0.9))[source]
Configuration for event log preprocessing.
This class defines parameters used to preprocess event logs before SIMOD main pipeline, including concurrency threshold settings and multitasking options.
- multitasking
Whether to preprocess the event log to handle resources working in more than one activity at a time.
- Type:
- enable_time_concurrency_threshold
Threshold for determining concurrent events (for computing enabled times) based on the ratio of their overlapping executions w.r.t. their occurrences. Ranges from 0 to 1 (0.3 means that two activities will be considered concurrent when their execution overlaps in 30% or more of the cases).
- Type:
- concurrency_thresholds
Thresholds for the computation of the start times (if missing) based on the Heuristics miner algorithm, including direct-follows (df), length-2-loops (l2l), and length-1-loops (l1l).
- Type:
ConcurrencyThresholds
- static from_dict(config: dict) PreprocessingSettings[source]
Instantiates SIMOD preprocessing configuration from a dictionary.
- Parameters:
config (dict) – Dictionary with the configuration values for the preprocessing parameters.
- Returns:
Instance of SIMOD preprocessing configuration for the specified dictionary values.
- Return type:
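The meaning of enable_time_concurrency_threshold in PreprocessingSettings can be made concrete with a small sketch. The function are_concurrent and its two count arguments are hypothetical; how Simod actually counts overlaps and occurrences is not specified on this page, so the ratio below is an assumed reading of "overlaps in 30% or more of the cases".

```python
def are_concurrent(
    overlapping_cases: int, co_occurring_cases: int, threshold: float = 0.5
) -> bool:
    """Decide concurrency from the share of cases in which two activities overlap.

    Assumption: the ratio is (cases where both activities overlap in time)
    divided by (cases where both activities occur).
    """
    if co_occurring_cases == 0:
        # Activities that never occur together cannot be concurrent.
        return False
    return overlapping_cases / co_occurring_cases >= threshold
```

For example, with a threshold of 0.3, two activities that overlap in 3 out of 10 shared cases are treated as concurrent, while 2 out of 10 is not enough.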
Control-flow model settings
- class simod.settings.control_flow_settings.ControlFlowSettings(*, optimization_metric: Metric = Metric.THREE_GRAM_DISTANCE, num_iterations: int = 10, num_evaluations_per_iteration: int = 3, gateway_probabilities: GatewayProbabilitiesDiscoveryMethod | List[GatewayProbabilitiesDiscoveryMethod] = GatewayProbabilitiesDiscoveryMethod.DISCOVERY, mining_algorithm: ProcessModelDiscoveryAlgorithm | None = ProcessModelDiscoveryAlgorithm.SPLIT_MINER_V1, epsilon: float | Tuple[float, float] | None = (0.0, 1.0), eta: float | Tuple[float, float] | None = (0.0, 1.0), discover_branch_rules: bool | None = False, f_score: float | Tuple[float, float] | None = 0.7, replace_or_joins: bool | List[bool] | None = False, prioritize_parallelism: bool | List[bool] | None = False)[source]
Control-flow model configuration parameters.
This class defines the ranges of the configurable parameters for optimizing the control-flow structure of a discovered process model, including metric selection, iteration settings, and various discovery algorithm parameters. In each iteration of the optimization process, the parameters are sampled from these values or ranges.
- optimization_metric
The metric used to evaluate process model quality at each iteration of the optimization process (i.e., loss function).
- Type:
- num_evaluations_per_iteration
The number of replications for the evaluations of each iteration.
- Type:
- gateway_probabilities
Fixed method or list of methods to use in each iteration to discover gateway probabilities.
- Type:
Union[GatewayProbabilitiesDiscoveryMethod, List[GatewayProbabilitiesDiscoveryMethod]]
- mining_algorithm
The process model discovery algorithm to use.
- Type:
ProcessModelDiscoveryAlgorithm, optional
- epsilon
Fixed number or range for the number of concurrent relations between events to be captured in the discovery algorithm (between 0.0 and 1.0).
- eta
Fixed number or range for the threshold for filtering the incoming and outgoing edges in the discovery algorithm (between 0.0 and 1.0).
- replace_or_joins
Fixed value or list for whether to replace non-trivial OR joins.
- prioritize_parallelism
Fixed value or list for whether to prioritize parallelism over loops.
- f_score
Fixed value or range for the minimum f-score value to consider the discovered data-aware branching rules.
- static from_dict(config: dict) ControlFlowSettings[source]
Instantiates the control-flow model configuration from a dictionary.
- Parameters:
config (dict) – Dictionary with the configuration values for the control-flow model parameters.
- Returns:
Instance of the control-flow model configuration for the specified dictionary values.
- Return type:
- static one_shot() ControlFlowSettings[source]
Instantiates the control-flow model configuration for the one-shot mode (i.e., no optimization, one single iteration).
- Returns:
Instance of the control-flow model configuration for the one-shot mode.
- Return type:
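Several ControlFlowSettings attributes accept either a fixed value, a (low, high) range, or a list of options, and each optimization iteration draws from them. The sketch below illustrates that convention only. It swaps in plain uniform random sampling for the TPE-based search that the optimizer documentation mentions, and sample_parameter is a hypothetical helper, not a Simod function.

```python
import random
from typing import Any


def sample_parameter(spec: Any, rng: random.Random) -> Any:
    """Draw one value for an iteration from a fixed value, a range, or a list.

    - None stays None (parameter not used).
    - A (low, high) tuple is sampled uniformly (stand-in for TPE sampling).
    - A list is treated as a set of options to choose from.
    - Anything else is a fixed value and is returned unchanged.
    """
    if spec is None:
        return None
    if isinstance(spec, tuple):
        low, high = spec
        return rng.uniform(low, high)
    if isinstance(spec, list):
        return rng.choice(spec)
    return spec
```

Under this reading, epsilon=(0.0, 1.0) yields a different value per iteration, whereas f_score=0.7 stays constant across all iterations.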
- class simod.settings.control_flow_settings.ProcessModelDiscoveryAlgorithm(value)[source]
Enumeration of process model discovery algorithms.
This enum defines the available algorithms for discovering process models from event logs.
- classmethod from_str(value: str) ProcessModelDiscoveryAlgorithm[source]
Converts a string representation of a process model discovery algorithm into the corresponding ProcessModelDiscoveryAlgorithm instance.
This method allows flexible input formats for each algorithm, supporting multiple variations of their names.
- Parameters:
value (str) – A string representing a process model discovery algorithm.
- Returns:
The corresponding enum instance for the given algorithm name.
- Return type:
- Raises:
ValueError – If the provided string does not match any known algorithm.
Resource model settings
- class simod.settings.resource_model_settings.ResourceModelSettings(*, optimization_metric: Metric = Metric.CIRCADIAN_EMD, num_iterations: int = 10, num_evaluations_per_iteration: int = 3, discovery_type: CalendarType = CalendarType.UNDIFFERENTIATED, granularity: int | Tuple[int, int] | None = (15, 60), confidence: float | Tuple[float, float] | None = (0.5, 0.85), support: float | Tuple[float, float] | None = (0.01, 0.3), participation: float | Tuple[float, float] | None = 0.4, discover_prioritization_rules: bool = False, discover_batching_rules: bool = False, fuzzy_angle: float | Tuple[float, float] | None = (0.1, 0.9))[source]
Configuration settings for resource model optimization.
This class defines parameters for optimizing resource allocation and scheduling in process simulations, including optimization metrics, discovery methods, and statistical thresholds. In each iteration of the optimization process, the parameters are sampled from these values or ranges.
- optimization_metric
The metric used to evaluate the quality of resource model optimization in each iteration (i.e., loss function).
- Type:
Metric
- num_evaluations_per_iteration
The number of replications for the evaluations of each iteration.
- Type:
- discovery_type
Type of calendar discovery method used for resource modeling.
- Type:
CalendarType
- granularity
Fixed value or range for the time granularity for calendar discovery, measured in minutes per granule (e.g., 60 will imply discovering resource calendars with slots of 1 hour). Must be a divisor of 1,440 (number of minutes in a day).
- confidence
Fixed value or range for the minimum confidence of the intervals in the discovered calendar of a resource or set of resources (between 0.0 and 1.0).
- support
Fixed value or range for the minimum support of the intervals in the discovered calendar of a resource or set of resources (between 0.0 and 1.0).
- participation
Fixed value or range for the participation of a resource in the process to discover a calendar for them, gathered together otherwise (between 0.0 and 1.0).
- fuzzy_angle
Fixed value or range for the angle of the fuzzy trapezoid when computing the availability probability for an activity (angle from start to end).
- static from_dict(config: dict) ResourceModelSettings[source]
Instantiates the resource model configuration from a dictionary.
- Parameters:
config (dict) – Dictionary with the configuration values for the resource model parameters.
- Returns:
Instance of the resource model configuration for the specified dictionary values.
- Return type:
- static one_shot() ResourceModelSettings[source]
Instantiates the resource model configuration for the one-shot mode (i.e., no optimization, one single iteration).
- Returns:
Instance of the resource model configuration for the one-shot mode.
- Return type:
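The granularity attribute of ResourceModelSettings ties the slot length to the 1,440 minutes of a day. The sketch below reads that constraint as "the granule length must split a day into a whole number of slots", an interpretation consistent with the default range (15, 60). The helper slots_per_day is hypothetical and not part of Simod.

```python
MINUTES_PER_DAY = 1_440


def slots_per_day(granularity: int) -> int:
    """Return how many calendar slots a day has for a granularity in minutes."""
    if granularity <= 0 or MINUTES_PER_DAY % granularity != 0:
        raise ValueError(
            f"granularity={granularity} does not split {MINUTES_PER_DAY} minutes evenly"
        )
    return MINUTES_PER_DAY // granularity
```

A granularity of 60 gives 24 one-hour slots and 15 gives 96 quarter-hour slots, while a value such as 7 is rejected.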
Extraneous delays settings
- class simod.settings.extraneous_delays_settings.ExtraneousDelaysSettings(*, optimization_metric: OptimizationMetric = OptimizationMetric.RELATIVE_EMD, discovery_method: DiscoveryMethod = DiscoveryMethod.COMPLEX, num_iterations: int = 1, num_evaluations_per_iteration: int = 3)[source]
Configuration settings for extraneous delay optimization.
This class defines parameters for discovering and optimizing extraneous delays in process simulations, including optimization metrics, discovery methods, and iteration settings. In each iteration of the optimization process, the parameters are sampled from these values or ranges.
- optimization_metric
The metric used to evaluate process model quality at each iteration of the optimization process (i.e., loss function).
- Type:
ExtraneousDelaysOptimizationMetric
- num_evaluations_per_iteration
The number of replications for the evaluations of each iteration.
- Type:
- discovery_method
The method used to discover extraneous delays.
- Type:
ExtraneousDelaysDiscoveryMethod
- static from_dict(config: dict) ExtraneousDelaysSettings[source]
Instantiates the extraneous delays model configuration from a dictionary.
- Parameters:
config (dict) – Dictionary with the configuration values for the extraneous delays model parameters.
- Returns:
Instance of the extraneous delays model configuration for the specified dictionary values.
- Return type:
Event Log Module
- class simod.event_log.event_log.EventLog(log_train: DataFrame, log_validation: DataFrame, log_train_validation: DataFrame, log_test: DataFrame, log_ids: EventLogIDs, process_name: str | None = None)[source]
Represents an event log containing process execution data and its partitioned subsets.
This class provides functionality for storing and managing an event log, including training, validation, and test partitions. It also supports exporting logs to XES format and loading event logs from files.
- train_partition
DataFrame containing the training partition of the event log.
- Type:
- validation_partition
DataFrame containing the validation partition of the event log.
- Type:
- train_validation_partition
DataFrame containing both training and validation data.
- Type:
- test_partition
DataFrame containing the test partition of the event log, if available.
- Type:
- log_ids
Identifiers for mapping column names in the event log.
- Type:
EventLogIDs
- process_name
The name of the business process associated with the event log, primarily used for file naming.
- Type:
- static from_path(train_log_path: Path, log_ids: EventLogIDs, preprocessing_settings: PreprocessingSettings = PreprocessingSettings(multitasking=False, enable_time_concurrency_threshold=0.5, concurrency_thresholds=ConcurrencyThresholds(df=0.75, l2l=0.9, l1l=0.9)), need_test_partition: bool | None = False, process_name: str | None = None, test_log_path: Path | None = None, split_ratio: float = 0.8) EventLog[source]
Loads an event log from a file and performs partitioning into training, validation, and test subsets.
- Parameters:
train_log_path (pathlib.Path) – Path to the training event log file (CSV or CSV.GZ).
log_ids (EventLogIDs) – Identifiers for mapping column names in the event log.
preprocessing_settings (PreprocessingSettings, optional) – Settings for preprocessing the event log.
need_test_partition (bool, optional) – Whether to create a test partition if a separate test log is not provided.
process_name (str, optional) – Name of the business process. If not provided, it is inferred from the file name.
test_log_path (pathlib.Path, optional) – Path to the test event log file (CSV or CSV.GZ). If provided, the test log is loaded separately.
split_ratio (float, default=0.8) – Ratio for splitting training and validation partitions.
- Returns:
An instance of EventLog with training, validation, and test partitions.
- Return type:
- Raises:
ValueError – If the specified training or test log has an unsupported file extension.
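Since from_path accepts only CSV or CSV.GZ files and raises ValueError otherwise, a caller may want to validate paths before starting a long run. The check below is a minimal sketch of such a pre-check, not Simod's implementation: check_log_extension is a hypothetical name and the case-insensitive comparison is an assumption.

```python
from pathlib import Path

SUPPORTED_SUFFIXES = (".csv", ".csv.gz")


def check_log_extension(log_path: Path) -> None:
    """Raise ValueError unless the file name ends in .csv or .csv.gz."""
    # Assumption: extensions are compared case-insensitively.
    name = log_path.name.lower()
    if not name.endswith(SUPPORTED_SUFFIXES):
        raise ValueError(
            f"Unsupported event log extension for {log_path}; "
            f"expected one of {SUPPORTED_SUFFIXES}"
        )
```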
- test_to_xes(path: Path, only_complete_events: bool = False)[source]
Saves the test log to an XES file.
- Parameters:
path (pathlib.Path) – Destination path for the XES file.
only_complete_events (bool) – If true, generate XES file containing only events corresponding to the end of each activity instance.
- train_to_xes(path: Path, only_complete_events: bool = False)[source]
Saves the training log to an XES file.
- Parameters:
path (pathlib.Path) – Destination path for the XES file.
only_complete_events (bool) – If true, generate XES file containing only events corresponding to the end of each activity instance.
- train_validation_to_xes(path: Path, only_complete_events: bool = False)[source]
Saves the combined training and validation log to an XES file.
- Parameters:
path (pathlib.Path) – Destination path for the XES file.
only_complete_events (bool) – If true, generate XES file containing only events corresponding to the end of each activity instance.
- validation_to_xes(path: Path, only_complete_events: bool = False)[source]
Saves the validation log to an XES file.
- Parameters:
path (pathlib.Path) – Destination path for the XES file.
only_complete_events (bool) – If true, generate XES file containing only events corresponding to the end of each activity instance.
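The effect of only_complete_events in the *_to_xes methods can be pictured as follows: each activity instance carries a start and an end timestamp, and the export emits either both lifecycle events or only the one for the end. This is a sketch under assumptions, not Simod's exporter: to_lifecycle_events is hypothetical, rows are plain dictionaries, and the "start"/"complete" labels follow the common XES lifecycle convention.

```python
from typing import Any, Dict, List


def to_lifecycle_events(
    instances: List[Dict[str, Any]], only_complete_events: bool = False
) -> List[Dict[str, Any]]:
    """Expand activity instances into start/complete events.

    Each instance is a dict with 'activity', 'start_time' and 'end_time' keys.
    """
    events: List[Dict[str, Any]] = []
    for instance in instances:
        if not only_complete_events:
            events.append(
                {
                    "activity": instance["activity"],
                    "timestamp": instance["start_time"],
                    "lifecycle": "start",
                }
            )
        # The event for the end of the activity instance is always kept.
        events.append(
            {
                "activity": instance["activity"],
                "timestamp": instance["end_time"],
                "lifecycle": "complete",
            }
        )
    return events
```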
- class simod.event_log.preprocessor.Preprocessor(log: DataFrame, log_ids: EventLogIDs)[source]
Handles event log pre-processing by executing various transformations to estimate missing timestamps and adjust data for multitasking.
This class modifies an input event log based on the specified settings and returns the pre-processed log.
- log
The event log stored as a DataFrame.
- Type:
- log_ids
Identifiers for mapping column names in the event log.
- Type:
EventLogIDs
- run(multitasking: bool = False, concurrency_thresholds: ConcurrencyThresholds = ConcurrencyThresholds(df=0.9, l2l=0.9, l1l=0.9), enable_time_concurrency_threshold: float = 0.75) DataFrame[source]
Executes event log pre-processing steps based on the specified parameters.
This includes estimating missing start times, adjusting timestamps for multitasking scenarios, and computing enabled times.
- Parameters:
- Returns:
The pre-processed event log.
- Return type:
Control-flow Model Module
- class simod.control_flow.settings.HyperoptIterationParams(output_dir: Path, provided_model_path: Path | None, project_name: str, optimization_metric: Metric, gateway_probabilities_method: GatewayProbabilitiesDiscoveryMethod, mining_algorithm: ProcessModelDiscoveryAlgorithm, epsilon: float | None, eta: float | None, replace_or_joins: bool | None, prioritize_parallelism: bool | None, f_score: float | None = None)[source]
Parameters for a single iteration of the Control-Flow optimization process.
This class defines the configuration settings used during an iteration of the optimization process, including process model discovery, optimization metric, and gateway probability discovery.
- output_dir
Directory where all output files for the current iteration will be stored.
- Type:
- provided_model_path
Path to a provided BPMN model, if available (no discovery needed).
- Type:
pathlib.Path, optional
- optimization_metric
Metric used to evaluate the candidate process model in this iteration.
- Type:
Metric
- gateway_probabilities_method
Method for discovering gateway probabilities.
- Type:
GatewayProbabilitiesDiscoveryMethod
- mining_algorithm
Algorithm used for process model discovery, if necessary.
- Type:
ProcessModelDiscoveryAlgorithm
- epsilon
Number of concurrent relations between events to be captured in the discovery algorithm (between 0.0 and 1.0).
- Type:
float, optional
- eta
Threshold for filtering the incoming and outgoing edges in the discovery algorithm (between 0.0 and 1.0).
- Type:
float, optional
- replace_or_joins
Whether to replace non-trivial OR joins in the discovered model.
- Type:
bool, optional
- prioritize_parallelism
Whether to prioritize parallelism over loops for model discovery.
- Type:
bool, optional
- f_score
Minimum f-score value to consider the discovered data-aware branching rules.
- Type:
float, optional (default: None)
Notes
If provided_model_path is specified, process model discovery will be skipped.
- to_dict() dict[source]
Converts the instance into a dictionary representation of the optimization parameters.
The returned dictionary is structured based on whether a process model needs to be discovered or if a pre-existing model is provided.
- Returns:
A dictionary containing the optimization parameters for this iteration.
- Return type:
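The to_dict behavior described above (a different structure depending on whether a model is provided or must be discovered) can be sketched with a reduced stand-in class. IterationParamsSketch is hypothetical, it keeps only a subset of the documented fields, and the dictionary keys are illustrative assumptions rather than the keys Simod actually emits.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class IterationParamsSketch:
    """Reduced stand-in for the control-flow HyperoptIterationParams."""

    optimization_metric: str
    gateway_probabilities_method: str
    provided_model_path: Optional[Path] = None
    mining_algorithm: Optional[str] = None
    epsilon: Optional[float] = None
    eta: Optional[float] = None

    def to_dict(self) -> dict:
        common = {
            "optimization_metric": self.optimization_metric,
            "gateway_probabilities_method": self.gateway_probabilities_method,
        }
        if self.provided_model_path is not None:
            # A model was provided: discovery parameters are irrelevant.
            return {**common, "provided_model_path": str(self.provided_model_path)}
        # No model provided: report the discovery algorithm and its parameters.
        return {
            **common,
            "mining_algorithm": self.mining_algorithm,
            "epsilon": self.epsilon,
            "eta": self.eta,
        }
```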
- class simod.control_flow.optimizer.ControlFlowOptimizer(event_log: EventLog, bps_model: BPSModel, settings: ControlFlowSettings, base_directory: Path)[source]
Optimizes the control-flow of a business process model using hyperparameter optimization.
This class performs iterative optimization to refine the structure of a process model and discover optimal gateway probabilities. It evaluates different configurations to improve the process model based on a given metric.
The search space is built based on the parameters ranges in [settings].
- event_log
Event log containing train and validation partitions.
- Type:
EventLog
- initial_bps_model
Business process simulation (BPS) model to use as a base, by replacing its control-flow model with the discovered one in each iteration.
- Type:
BPSModel
- settings
Configuration settings to build the search space for the optimization process.
- Type:
ControlFlowSettings
- base_directory
Root directory where output files will be stored.
- Type:
- best_bps_model
Best discovered BPS model after the optimization process.
- Type:
BPSModel, optional
- evaluation_measurements
Quality measures recorded for each hyperopt iteration.
- Type:
Notes
If no process model is provided, a discovery method will be used.
Optimization is performed using TPE-hyperparameter optimization.
- run() HyperoptIterationParams[source]
Runs the control-flow optimization process.
This method defines the hyperparameter search space and executes a TPE-hyperparameter optimization process to discover the best control-flow model. It evaluates multiple iterations and selects the best-performing set of parameters for its discovery.
- Returns:
The parameters of the best iteration of the optimization process.
- Return type:
- Raises:
AssertionError – If the best discovered process model path does not exist after optimization.
- simod.control_flow.discovery.discover_process_model(log_path: Path, output_model_path: Path, params: HyperoptIterationParams)[source]
Runs the specified process model discovery algorithm to extract a process model from an event log and save it to the given output path.
This function supports Split Miner V1 and Split Miner V2 as discovery algorithms.
- Parameters:
log_path (pathlib.Path) – Path to the event log in XES format, required for Split Miner algorithms.
output_model_path (pathlib.Path) – Path to save the discovered process model.
params (HyperoptIterationParams) – Configuration containing the process model discovery algorithm and its parameters.
- Raises:
ValueError – If the specified process model discovery algorithm is unknown.
Resource Model Module
- class simod.resource_model.settings.HyperoptIterationParams(output_dir: Path, process_model_path: Path, project_name: str, optimization_metric: Metric, calendar_discovery_params: CalendarDiscoveryParameters, discover_prioritization_rules: bool = False, discover_batching_rules: bool = False)[source]
Parameters for a single iteration of the Resource Model optimization process.
This class defines the necessary parameters for optimizing the resource model of the BPS model. It includes the parameter values for the discovery of resource profiles, calendars, etc.
- output_dir
Directory where all files of the current iteration will be stored.
- Type:
- process_model_path
Path to the BPMN process model used for optimization.
- Type:
- optimization_metric
Metric used to evaluate the quality of the current iteration’s candidate.
- Type:
- calendar_discovery_params
Parameters for the resource calendar (i.e., working schedules) discovery.
- Type:
CalendarDiscoveryParameters
- discover_prioritization_rules
Whether to attempt discovering prioritization rules (default: False).
- Type:
bool, optional
- class simod.resource_model.optimizer.ResourceModelOptimizer(event_log: EventLog, bps_model: BPSModel, settings: ResourceModelSettings, base_directory: Path, model_activities: list[str] | None = None)[source]
Optimizes the resource model of a business process model using hyperparameter optimization.
This class performs iterative optimization to refine the resource model and discover optimal resource profiles and availability calendars. It evaluates different configurations to improve the process model based on a given metric.
The search space is built based on the parameters ranges in [settings].
- initial_bps_model
Business process simulation (BPS) model to use as a base, by replacing its resource model with the discovered one in each iteration.
- Type:
- settings
Configuration settings to build the search space for the optimization process.
- Type:
- base_directory
Root directory where output files will be stored.
- Type:
- evaluation_measurements
Quality measures recorded for each hyperopt iteration.
- Type:
Notes
Optimization is performed using TPE-hyperparameter optimization.
- run() HyperoptIterationParams[source]
Runs the resource model optimization process.
This method defines the hyperparameter search space and executes a TPE-hyperparameter optimization process to discover the best resource model. It evaluates multiple iterations and selects the best-performing set of parameters for its discovery.
- Returns:
The parameters of the best iteration of the optimization process.
- Return type:
HyperoptIterationParams
Extraneous Delays Model Module
- class simod.extraneous_delays.optimizer.ExtraneousDelaysOptimizer(event_log: EventLog, bps_model: BPSModel, settings: ExtraneousDelaysSettings, base_directory: Path)[source]
Optimizer for the discovery of the extraneous delays model.
This class performs either a direct discovery of the extraneous delays of the process, or launches an iterative optimization that first discovers the extraneous delays and then adjusts their size to better reflect reality.
- bps_model
The business process simulation model to enhance with extraneous delays, including the BPMN representation.
- Type:
BPSModel
- settings
Configuration settings for extraneous delay discovery.
- Type:
ExtraneousDelaysSettings
- base_directory
Directory where output files will be stored.
- Type:
pathlib.Path
- run() List[ExtraneousDelay][source]
Executes the extraneous delay discovery process.
This method configures the optimization process, applies either a direct enhancement or a hyperparameter optimization approach to identify delays, and returns the best detected delays as a list of ExtraneousDelay objects.
- Returns:
A list of detected extraneous delays, each containing activity names, delay IDs, and their corresponding duration distributions.
- Return type:
List[ExtraneousDelay]
- class simod.extraneous_delays.types.ExtraneousDelay(activity_name: str, delay_id: str, duration_distribution: DurationDistribution)[source]
Represents an extraneous delay within a business process activity.
This class encapsulates the details of an identified extraneous delay, including the affected activity, a unique delay identifier, and the duration distribution of the delay.
- duration_distribution
The statistical distribution representing the delay duration.
- Type:
DurationDistribution
- static from_dict(delay: dict) ExtraneousDelay[source]
Creates an ExtraneousDelay instance from a dictionary.
This method reconstructs an ExtraneousDelay object from a dictionary containing activity name, delay identifier, and duration distribution.
- Parameters:
delay (dict) – A dictionary representation of an extraneous delay.
- Returns:
An instance of ExtraneousDelay with the extracted attributes.
- Return type:
ExtraneousDelay
- to_dict() dict[source]
Converts the extraneous delay into a dictionary format.
The dictionary representation is compatible with the Prosimos simulation engine, containing activity details, a unique event identifier, and the delay duration distribution.
- Returns:
A dictionary representation of the extraneous delay.
- Return type:
dict
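The from_dict/to_dict pair above can be illustrated with a minimal round-trip sketch. DelaySketch is a hypothetical stand-in rather than Simod's ExtraneousDelay, the dictionary keys are assumptions made for illustration, and a plain dict replaces DurationDistribution so that the example needs no third-party imports.

```python
from dataclasses import dataclass


@dataclass
class DelaySketch:
    """Hypothetical stand-in for ExtraneousDelay (not Simod's class)."""

    activity_name: str
    delay_id: str
    # Simod stores a DurationDistribution here; a plain dict is an assumption.
    duration_distribution: dict

    def to_dict(self) -> dict:
        # Key names are assumptions chosen for this sketch.
        return {
            "activity": self.activity_name,
            "event_id": self.delay_id,
            "duration_distribution": dict(self.duration_distribution),
        }

    @staticmethod
    def from_dict(delay: dict) -> "DelaySketch":
        # Rebuild the object from the three entries written by to_dict().
        return DelaySketch(
            activity_name=delay["activity"],
            delay_id=delay["event_id"],
            duration_distribution=dict(delay["duration_distribution"]),
        )


original = DelaySketch(
    activity_name="Check invoice",
    delay_id="timer_1",
    duration_distribution={"name": "fixed", "seconds": 600.0},
)
restored = DelaySketch.from_dict(original.to_dict())
```

Under these assumptions, restored compares equal to original while being a separate object, which is the property a serialization round trip should preserve.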
- simod.extraneous_delays.utilities.add_timers_to_bpmn_model(process_model: Path, delays: List[ExtraneousDelay], timer_placement: TimerPlacement = TimerPlacement.BEFORE)[source]
Enhances a BPMN model by adding timers before or after specified activities.
This function modifies a given BPMN process model by inserting timers before or after activities that have identified extraneous delays.
- Parameters:
process_model (pathlib.Path) – Path to the BPMN process model file to enhance.
delays (List[ExtraneousDelay]) – A list of extraneous delays, where each delay specifies an activity and the corresponding timer configuration.
timer_placement (TimerPlacement, optional) – Specifies whether the timers should be placed BEFORE (indicating the delay happens before an activity instance) or AFTER (indicating the delay happens afterward). Default is TimerPlacement.BEFORE.
Notes
This function modifies the BPMN file in place.
The method searches for tasks within the BPMN model and inserts timers based on the provided delays.
- Raises:
ValueError – If the BPMN model file does not contain any tasks.
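As a rough illustration of what placing a timer before an activity involves, the sketch below rewires a tiny BPMN document with the standard library. It is not Simod's implementation: add_timer_before, the element IDs, and the sample model are hypothetical, and the sketch edits an in-memory tree instead of a file on disk.

```python
import xml.etree.ElementTree as ET

BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"
NS = {"bpmn": BPMN_NS}


def add_timer_before(root: ET.Element, activity_name: str, timer_id: str) -> None:
    """Insert a timer event in front of the task labelled activity_name (in place)."""
    process = root.find("bpmn:process", NS)
    tasks = process.findall("bpmn:task", NS)
    if not tasks:
        raise ValueError("The BPMN model does not contain any tasks")
    task = next((t for t in tasks if t.get("name") == activity_name), None)
    if task is None:
        raise KeyError(activity_name)
    task_id = task.get("id")
    # Create the timer as an intermediate catch event.
    timer = ET.SubElement(process, f"{{{BPMN_NS}}}intermediateCatchEvent", {"id": timer_id})
    ET.SubElement(timer, f"{{{BPMN_NS}}}timerEventDefinition")
    # Redirect every flow that entered the task so that it enters the timer.
    for flow in process.findall("bpmn:sequenceFlow", NS):
        if flow.get("targetRef") == task_id:
            flow.set("targetRef", timer_id)
    # Connect the timer to the task with a new sequence flow.
    ET.SubElement(
        process,
        f"{{{BPMN_NS}}}sequenceFlow",
        {"id": f"flow_{timer_id}", "sourceRef": timer_id, "targetRef": task_id},
    )


BPMN_XML = f"""<definitions xmlns="{BPMN_NS}">
  <process id="p1">
    <startEvent id="start"/>
    <task id="t1" name="Check invoice"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="t1"/>
    <sequenceFlow id="f2" sourceRef="t1" targetRef="end"/>
  </process>
</definitions>"""

root = ET.fromstring(BPMN_XML)
add_timer_before(root, "Check invoice", "timer_t1")
flows = {
    f.get("id"): (f.get("sourceRef"), f.get("targetRef"))
    for f in root.iter(f"{{{BPMN_NS}}}sequenceFlow")
}
```

After the call, the flow that used to reach the task ends at the timer, and a new flow links the timer to the task. Placing a timer after an activity would mirror this by redirecting the outgoing flows instead.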
Simulation Module
- class simod.simulation.parameters.BPS_model.BPSModel(process_model: Path | None = None, gateway_probabilities: List[GatewayProbabilities] | None = None, case_arrival_model: CaseArrivalModel | None = None, resource_model: ResourceModel | None = None, extraneous_delays: List[ExtraneousDelay] | None = None, case_attributes: List[CaseAttribute] | None = None, global_attributes: List[GlobalAttribute] | None = None, event_attributes: List[EventAttribute] | None = None, prioritization_rules: List[PrioritizationRule] | None = None, batching_rules: List[BatchingRule] | None = None, branch_rules: List[BranchRules] | None = None, calendar_granularity: int | None = None)[source]
Represents a Business Process Simulation (BPS) model containing all necessary components to simulate a business process.
This class manages various elements such as the BPMN process model, resource configurations, extraneous delays, case attributes, and prioritization/batching rules. It provides methods to convert the model into a format compatible with Prosimos and handle activity ID mappings.
- process_model
Path to the BPMN process model file.
- Type:
pathlib.Path, optional
- gateway_probabilities
Probabilities for gateway-based process routing.
- Type:
List[GatewayProbabilities], optional
- case_arrival_model
Model for the arrival of new cases in the simulation.
- Type:
CaseArrivalModel, optional
- resource_model
Model for the resources involved in the process, their working schedules, etc.
- Type:
ResourceModel, optional
- extraneous_delays
A list of delays representing extraneous waiting times before/after activities.
- Type:
List[ExtraneousDelay], optional
- case_attributes
Case-level attributes and their update rules.
- Type:
List[CaseAttribute], optional
- global_attributes
Global attributes and their update rules.
- Type:
List[GlobalAttribute], optional
- event_attributes
Event-level attributes and their update rules.
- Type:
List[EventAttribute], optional
- prioritization_rules
A set of case prioritization rules for process execution.
- Type:
List[PrioritizationRule], optional
- batching_rules
Rules defining how activities are batched together.
- Type:
List[BatchingRule], optional
- branch_rules
Branching rules defining conditional flow behavior in decision points.
- Type:
List[BranchRules], optional
- calendar_granularity
Granularity of the resource calendar, expressed in minutes.
- Type:
int, optional
Notes
to_prosimos_format transforms the model into a dictionary format used by Prosimos.
replace_activity_names_with_ids modifies activity references to use BPMN IDs instead of names.
- deep_copy() BPSModel[source]
Creates a deep copy of the current BPSModel instance.
This ensures that modifying the copied instance does not affect the original.
- Returns:
A new, independent copy of the current BPSModel instance.
- Return type:
BPSModel
Notes
This method uses Python’s copy.deepcopy() to create a full recursive copy of the model.
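The difference between a deep copy and a shallow copy is easy to see on a small example. ModelSketch below is a hypothetical stand-in for BPSModel with a single mutable field; only copy.copy() and copy.deepcopy() are real library calls.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ModelSketch:
    """Hypothetical stand-in for BPSModel with one mutable attribute."""

    extraneous_delays: list = field(default_factory=list)

    def deep_copy(self) -> "ModelSketch":
        # Recursively copies nested lists and dicts as well.
        return copy.deepcopy(self)


original = ModelSketch(extraneous_delays=[{"activity": "A", "seconds": 60}])
shallow = copy.copy(original)   # shares the nested list with the original
deep = original.deep_copy()     # fully independent

deep.extraneous_delays[0]["seconds"] = 120    # the original is unaffected
shallow.extraneous_delays[0]["seconds"] = 30  # the original changes too
```

Here original ends up holding 30 seconds because of the shallow copy, while the deep copy keeps its own value of 120.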
- replace_activity_names_with_ids()[source]
Replaces activity names with their corresponding IDs from the BPMN process model.
Prosimos requires activity references to be identified by their BPMN node IDs instead of activity labels. This method updates:
Resource associations in the resource profiles.
Activity-resource distributions.
Event attributes referencing activity names.
- Raises:
KeyError – If an activity name does not exist in the BPMN model.
Notes
This method modifies the model in place.
It ensures compatibility with Prosimos by aligning activity references with BPMN IDs.
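A minimal sketch of the name-to-ID replacement, assuming the references are plain dicts with an "activity" key. The function name, the key, and the sample mapping are hypothetical; Simod's own data structures are richer.

```python
def replace_names_with_ids(references: list[dict], name_to_id: dict[str, str]) -> None:
    """Rewrite each entry's 'activity' value from label to BPMN node ID, in place."""
    for entry in references:
        # An unknown label raises KeyError, matching the documented behavior.
        entry["activity"] = name_to_id[entry["activity"]]


# Hypothetical mapping extracted from a BPMN model: activity label -> node ID.
name_to_id = {"Check invoice": "node_1", "Approve payment": "node_2"}

resource_associations = [
    {"activity": "Check invoice", "resource": "Clerk"},
    {"activity": "Approve payment", "resource": "Manager"},
]
replace_names_with_ids(resource_associations, name_to_id)
```

Because the list is modified in place, callers that still need the label-based version should copy the model first.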
- to_json(output_dir: Path, process_name: str) Path[source]
Saves the BPS model in a Prosimos-compatible JSON format.
This method generates a structured JSON file containing all necessary simulation parameters, ensuring that the model can be directly used by the Prosimos engine.
- Parameters:
output_dir (pathlib.Path) – The directory where the JSON file should be saved.
process_name (str) – The name of the process, used for naming the output file.
- Returns:
The full path to the generated JSON file.
- Return type:
pathlib.Path
Notes
The JSON file is created in output_dir with a filename based on process_name.
Uses json.dump() to serialize the model into a structured format.
Ensures all attributes are converted into a valid Prosimos format before writing.
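The serialization step can be sketched with the standard library alone. save_parameters is a hypothetical helper, and both the "<process_name>.json" filename pattern and the "model_type" key are assumptions for illustration; the exact names Simod uses are not stated here.

```python
import json
import tempfile
from pathlib import Path


def save_parameters(parameters: dict, output_dir: Path, process_name: str) -> Path:
    """Write a parameters dict as JSON inside output_dir and return the file path."""
    output_dir.mkdir(parents=True, exist_ok=True)
    # Filename pattern is an assumption made for this sketch.
    json_path = output_dir / f"{process_name}.json"
    with json_path.open("w") as handle:
        json.dump(parameters, handle, indent=2)
    return json_path


with tempfile.TemporaryDirectory() as tmp:
    path = save_parameters({"model_type": "CRISP"}, Path(tmp), "loan_application")
    saved_name = path.name
    reloaded = json.loads(path.read_text())
```

Returning the path lets the caller pass the generated file straight to the next step, for example as the parameters file of a simulation run.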
- to_prosimos_format() dict[source]
Converts the BPS model into a dictionary format compatible with the Prosimos simulation engine.
This method extracts all relevant process simulation attributes, including resource models, delays, prioritization rules, and activity mappings, and structures them in a format understood by Prosimos.
- Returns:
A dictionary representation of the BPS model, ready for simulation in Prosimos.
- Return type:
dict
Notes
If the resource model contains a fuzzy calendar, the model type is set to “FUZZY”; otherwise, it defaults to “CRISP”.
The function ensures activity labels are properly linked to their respective BPMN IDs.
- class simod.simulation.prosimos.ProsimosSettings(bpmn_path: Path, parameters_path: Path, output_log_path: Path, num_simulation_cases: int, simulation_start: Timestamp)[source]
Configuration settings for running a Prosimos simulation.
- bpmn_path
Path to the BPMN process model.
- Type:
pathlib.Path
- parameters_path
Path to the Prosimos simulation parameters JSON file.
- Type:
pathlib.Path
- output_log_path
Path to store the generated simulation log.
- Type:
pathlib.Path
- simulation_start
Start timestamp for the simulation.
- Type:
pandas.Timestamp
- simod.simulation.prosimos.simulate(settings: ProsimosSettings)[source]
Runs a Prosimos simulation with the provided settings.
- Parameters:
settings (ProsimosSettings) – Configuration settings containing paths and parameters for the simulation.
Notes
The function prints the simulation settings and invokes run_simulation().
The labels of the start event, end event, and event timers are not recorded in the output log.
The simulation generates a process log stored in settings.output_log_path.
- simod.simulation.prosimos.simulate_and_evaluate(process_model_path: Path, parameters_path: Path, output_dir: Path, simulation_cases: int, simulation_start_time: Timestamp, validation_log: DataFrame, validation_log_ids: EventLogIDs, metrics: List[Metric], num_simulations: int = 1) List[dict][source]
Simulates a process model using Prosimos multiple times and evaluates the results.
This function runs the simulation num_simulations times in parallel, compares the generated logs with a validation log, and evaluates them using provided metrics.
- Parameters:
process_model_path (pathlib.Path) – Path to the BPMN process model.
parameters_path (pathlib.Path) – Path to the Prosimos simulation parameters JSON file.
output_dir (pathlib.Path) – Directory where simulated logs will be stored.
simulation_cases (int) – Number of cases to simulate per run.
simulation_start_time (pandas.Timestamp) – Start timestamp for the simulation.
validation_log (pandas.DataFrame) – The actual event log to compare against.
validation_log_ids (EventLogIDs) – Column mappings for identifying events in the validation log.
metrics (List[Metric]) – A list of metrics used to evaluate the simulated logs.
num_simulations (int, optional) – Number of parallel simulation runs (default is 1).
- Returns:
A list of evaluation results, one for each simulated log.
- Return type:
List[dict]
Notes
Uses multiprocessing to speed up simulation when num_simulations > 1.
Simulated logs are automatically compared with validation_log.
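The simulate-then-evaluate pattern can be sketched as follows. This is not Simod's code: simulate_once and evaluate are hypothetical stand-ins that produce and compare fake cycle times, and a thread pool from concurrent.futures is swapped in for the multiprocessing mentioned above so that the example stays self-contained.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def simulate_once(run_index: int, num_cases: int) -> list[float]:
    """Stand-in for one simulation run: returns fake cycle times in seconds."""
    rng = random.Random(run_index)  # seeded so every run is reproducible
    return [rng.uniform(50.0, 150.0) for _ in range(num_cases)]


def evaluate(simulated: list[float], validation: list[float]) -> dict:
    """Toy metric: absolute difference between the mean cycle times."""
    return {
        "metric": "mean_cycle_time_diff",
        "distance": abs(mean(simulated) - mean(validation)),
    }


def simulate_and_evaluate_sketch(
    validation: list[float], num_cases: int, num_simulations: int = 1
) -> list[dict]:
    """Run num_simulations simulations concurrently and evaluate each one."""

    def one_run(run_index: int) -> dict:
        simulated = simulate_once(run_index, num_cases)
        return {"run_num": run_index, **evaluate(simulated, validation)}

    with ThreadPoolExecutor(max_workers=num_simulations) as pool:
        # map() returns results in submission order, one dict per run.
        return list(pool.map(one_run, range(num_simulations)))


results = simulate_and_evaluate_sketch([100.0, 110.0, 90.0], num_cases=20, num_simulations=3)
```

Each entry of results corresponds to one simulated log, so downstream code can average the distances or keep the per-run values to inspect the variance between runs.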