Evaluation

The fdsim package has built-in functionality to analyze simulation log files and extract useful performance measures. This functionality is provided in one flexible class that can be configured to output a wide variety of performance indicators.

For example, you can filter the deployments or incidents that are taken into account when calculating performance, and you can choose which descriptors (such as quantile values) of a performance measure (e.g., the response time) to calculate. In addition, you can configure the Evaluator class to use multiple performance metrics and reuse the same object across different simulation setups to compare the results.

class evaluation.Evaluator(response_time_col='response_time', target_col='target', run_col='run', prio_col='priority', location_col='location', vehicle_col='vehicle_type', incident_type_col='incident_type', object_col='object_function', incident_id_col='t', datetime_col='time', by_run=True, confidence=0.95, verbose=True)

Class that evaluates simulation runs, i.e., extracts metrics from the simulation log.

Multiple metrics can be set up in one Evaluator object, so that all metrics are calculated upon every call to Evaluator.evaluate. This way, the evaluator only has to be initialized once in order to run simulation experiments with multiple input configurations.

Parameters:
  • response_time_col, target_col, run_col, prio_col, location_col, vehicle_col, incident_type_col, object_col (str, optional) – The columns in the simulation log(s) that refer to the corresponding aspects of an incident or deployment. Respectively, they hold: the response time, the response time target, the simulation run/iteration, the priority of the incident, the demand location id of the incident, the vehicle type, the incident type, and the object function. Defaults are “response_time”, “target”, “run”, “priority”, “location”, “vehicle_type”, “incident_type”, and “object_function”, respectively.
  • by_run (boolean, optional, default=True) – Whether to calculate metrics per simulation run (True) or over the whole dataset.
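The difference between the two by_run settings can be illustrated with a minimal, standard-library sketch. The log rows and helper names below are hypothetical and are not part of fdsim; the sketch only shows how grouping by run changes the result, not how the Evaluator combines per-run values internally.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical log rows: (run, response_time in seconds). Not real fdsim output.
log = [
    (0, 300.0), (0, 420.0), (0, 360.0),
    (1, 500.0),
]

def mean_per_run(rows):
    """Group response times by run and average within each run."""
    grouped = defaultdict(list)
    for run, response_time in rows:
        grouped[run].append(response_time)
    return {run: mean(values) for run, values in sorted(grouped.items())}

def mean_pooled(rows):
    """Average over all rows, ignoring which run they came from."""
    return mean(response_time for _, response_time in rows)

per_run = mean_per_run(log)  # one value per simulation run
pooled = mean_pooled(log)    # a single value for the whole dataset
print(per_run)
print(pooled)
```

Because run 1 contains a single slow deployment, it weighs as much as run 0 in the per-run view but contributes only one of four observations to the pooled mean.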

Notes

The Evaluator class was developed with flexibility as one of the most important criteria. To support this flexibility while maintaining a simple API, metrics are not defined upon initialization but are added afterwards with the .add_metric() method.
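The register-once, evaluate-many pattern described in these notes can be sketched as follows. MetricRegistry is a hypothetical stand-in written for illustration, not fdsim's implementation: it supports only a mean and a priority filter, and it takes a list of dicts instead of a pd.DataFrame.

```python
from statistics import mean

class MetricRegistry:
    """Hypothetical stand-in that mimics the add_metric / evaluate pattern."""

    def __init__(self, prio_col="priority"):
        self.prio_col = prio_col
        self._metric_sets = []

    def add_metric(self, measure, name=None, prios=None):
        # Fall back to a numbered default name when none is given.
        if name is None:
            name = f"metric set {len(self._metric_sets) + 1}"
        self._metric_sets.append((name, measure, prios))

    def evaluate(self, log):
        # Every registered metric set is computed on each call.
        results = {}
        for name, measure, prios in self._metric_sets:
            rows = [r for r in log if prios is None or r[self.prio_col] in prios]
            results[name] = mean(r[measure] for r in rows)
        return results

registry = MetricRegistry()
registry.add_metric("response_time")
registry.add_metric("response_time", name="prio 1", prios=[1])

# Two made-up logs standing in for two simulation setups.
log_a = [{"response_time": 300.0, "priority": 1},
         {"response_time": 500.0, "priority": 2}]
log_b = [{"response_time": 450.0, "priority": 1},
         {"response_time": 250.0, "priority": 2}]

results_a = registry.evaluate(log_a)
results_b = registry.evaluate(log_b)
print(results_a)
print(results_b)
```

Keeping metric definitions out of the constructor means the same configured object can be applied to any number of logs, which makes results from different setups directly comparable.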

add_metric(measure, name=None, description=None, count=True, mean=True, std=True, missing=True, quantiles=[0.5, 0.75, 0.9, 0.95, 0.98, 0.99], prios=None, locations=None, vehicles=None, incident_types=None, objects=None, hours=None, days_of_week=None, first_only=False)

Add metrics that should be evaluated.

Parameters:
  • measure (str, one of ["response_time", "on_time", "delay"]) – The measure to evaluate.
  • name (str, optional, default=None) – How to name the set of metrics for reference in outputs. If None, a standard name is given (i.e., ‘metric set 1’, ‘metric set 2’).
  • description (str, optional, default=None) – A description of the set of evaluation metrics. This can be used to explain, e.g., the applied filtering in a more elaborate way, whereas the ‘name’ property should be kept concise.
  • count, mean, std, missing (boolean, optional, default=True) – Whether to describe the measure by its count, mean, standard deviation, and proportion of missing (NaN) values. Note that a missing response time means the response was carried out by an external vehicle.
  • quantiles (array(float), optional, default=[0.5, 0.75, 0.90, 0.95, 0.98, 0.99]) – Which quantiles to describe the measure with. Set to None to not use any quantiles.
  • prios (int or array-like of ints, optional, default=None) – Which priority levels to include during evaluation. If None, uses all levels.
  • locations, vehicles, incident_types, objects (optional, default=None) – Which locations, vehicle types, incident types, and object functions to include during evaluation. If None, uses all values.
  • hours (array-like of ints or None, optional, default=None) – Which hours of the day to incorporate during evaluation. Values must be integers in [0, 23].
  • days_of_week (array-like of ints or None, optional, default=None) – Which days of the week to incorporate during evaluation. Monday = 0, …, Sunday = 6.
  • first_only (boolean, optional, default=False) – Whether to calculate the metrics for only the first arriving vehicle per incident (True) or to evaluate all vehicles (False).
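To make the hours, days_of_week, and first_only filters concrete, here is a minimal standard-library sketch. It is not fdsim's code: the records are made up, filter_deployments is a hypothetical helper, and it assumes that the "first arriving vehicle" is the deployment with the smallest response time per incident id and that first_only is applied after the time filters. The day numbering (Monday = 0, …, Sunday = 6) matches Python's datetime.weekday().

```python
from datetime import datetime

# Hypothetical deployment records. The keys mirror the documented column
# defaults ("t", "time", "response_time"); the values are invented.
deployments = [
    {"t": 1, "time": datetime(2024, 1, 1, 8, 15), "response_time": 410.0},   # Monday
    {"t": 1, "time": datetime(2024, 1, 1, 8, 15), "response_time": 365.0},   # Monday
    {"t": 2, "time": datetime(2024, 1, 6, 23, 40), "response_time": 520.0},  # Saturday
    {"t": 3, "time": datetime(2024, 1, 2, 14, 5), "response_time": 300.0},   # Tuesday
]

def filter_deployments(rows, hours=None, days_of_week=None, first_only=False):
    """Keep rows matching the time filters; optionally one row per incident."""
    selected = [
        row for row in rows
        if (hours is None or row["time"].hour in hours)
        and (days_of_week is None or row["time"].weekday() in days_of_week)
    ]
    if first_only:
        # Assumption: the first arriving vehicle has the smallest response time.
        fastest = {}
        for row in selected:
            best = fastest.get(row["t"])
            if best is None or row["response_time"] < best["response_time"]:
                fastest[row["t"]] = row
        selected = list(fastest.values())
    return selected

weekday_daytime = filter_deployments(
    deployments, hours=range(7, 19), days_of_week=[0, 1, 2, 3, 4]
)
first_arrivals = filter_deployments(deployments, first_only=True)
print([row["response_time"] for row in weekday_daytime])
print([row["response_time"] for row in first_arrivals])
```

The sketch does not handle missing (NaN) response times; how those interact with first_only in fdsim is not covered here.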

evaluate(log)

Evaluate a given simulation output on all set metrics.

Parameters:
  • log (pd.DataFrame) – The raw simulation output/log.

Returns: metrics – The calculated metrics.

Return type: pd.DataFrame

plot(metric_set_name, *datasets, return_fig=True, labels=None, **kwargs)

Plot the distributions of a measure in various simulation results logs.

Parameters:
  • metric_set_name (str) – The name of the metric set to plot. The filters specified for this metric set are applied. Note that the metrics themselves are not computed; instead, the filtered measure is plotted as a continuous variable. Hence, the measure of the metric set should be either response time or delay and cannot be ‘on time’.
  • *datasets (pd.DataFrames) – Simulation logs from experiments that should be plotted in the same chart.
  • return_fig (boolean, optional, default=True) – Whether to return the figure object (True) or to plot directly (False).