Predictors

class predictors.BaseIncidentPredictor(load_forecast=True, fc_dir='data/forecasts', verbose=True)

Base class for incident predictors. Not useful to instantiate on its own.

create_sampling_dict(start_time=None, end_time=None, incident_types=None)

Create a dictionary that can conveniently be used for sampling random incidents based on the forecast.

Parameters:

start_time (Timestamp or str convertible to Timestamp) – The earliest time that should be included in the dictionary.
end_time (Timestamp or str convertible to Timestamp) – The latest time that should be included in the dictionary.
incident_types (array-like of strings) – The incident types to forecast for. Defaults to None. If None, uses all incident types in the forecast.

Returns:	sampling_dict – The sampling dictionary as described below.
Return type:	dict

Notes

Stores three results:

- self.sampling_dict, a dictionary like {t -> {‘type_distribution’ -> probs, ‘beta’ -> expected interarrival time in minutes, ‘time’ -> the timestamp corresponding to start_time + t}}, where t is an integer representing the number of time units since start_time.
- self.sampling_start_time, the timestamp of the earliest time in the dictionary.
- self.sampling_end_time, the timestamp of the latest time in the dictionary.
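
As a rough illustration of the structure described above, the following sketch builds a small hand-made dictionary of the same shape and draws one incident from it. The incident types, the values, and the `sample_incident` helper are hypothetical. The sketch also assumes that `probs` is a list aligned with a list of incident types and that interarrival times are exponentially distributed with mean `beta`; neither is stated in the documentation, so the actual class may differ.

```python
import random
from datetime import datetime, timedelta

# Hypothetical incident types and a hand-made sampling dict with hourly time units.
incident_types = ["fire", "rescue", "alarm"]
start_time = datetime(2020, 1, 1, 0, 0)
sampling_dict = {
    t: {
        "type_distribution": [0.2, 0.3, 0.5],     # probabilities per incident type
        "beta": 30.0 + 10.0 * t,                   # expected interarrival time in minutes
        "time": start_time + timedelta(hours=t),   # timestamp corresponding to start_time + t
    }
    for t in range(3)
}


def sample_incident(entry, types, rng):
    """Draw an interarrival time (minutes) and an incident type from one entry."""
    # Assumption: exponential interarrival times with mean beta (rate = 1 / beta).
    minutes_to_next = rng.expovariate(1.0 / entry["beta"])
    incident_type = rng.choices(types, weights=entry["type_distribution"], k=1)[0]
    return minutes_to_next, incident_type


rng = random.Random(42)
minutes, inc_type = sample_incident(sampling_dict[0], incident_types, rng)
print(f"next incident in {minutes:.1f} min, type: {inc_type}")
```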

static evaluate(y_true, y_predict, metric='RMSE')

Evaluate a given prediction.

Parameters:

y_true (array) – The ground truth values.
y_predict (array) – The predicted labels / forecasted values.
metric (str, one of ['MAE', 'RMSE'], optional (default: 'RMSE')) – The evaluation metric. Uses the Root Mean Squared Error for ‘RMSE’ and the Mean Absolute Error for ‘MAE’.

Returns:	score – The error score(s) of the prediction. If the input is multi-output, returns a list of scores (one score per variable).
Return type:	float or list of floats
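
For reference, the two metrics can be written out in a few lines. This is a pure-Python sketch of the standard RMSE and MAE definitions, not the library's implementation; `evaluate_sketch` is a hypothetical name, and treating each row as a list of output variables is an assumption about what "multi-output" means here.

```python
import math


def rmse(y_true, y_predict):
    """Root Mean Squared Error for two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_predict)) / len(y_true))


def mae(y_true, y_predict):
    """Mean Absolute Error for two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_predict)) / len(y_true)


def evaluate_sketch(y_true, y_predict, metric="RMSE"):
    """Return one score, or one score per variable if the rows are sequences."""
    score_fn = {"RMSE": rmse, "MAE": mae}[metric]
    if isinstance(y_true[0], (list, tuple)):
        # Multi-output: transpose rows into columns and score each variable.
        return [score_fn(t_col, p_col) for t_col, p_col in zip(zip(*y_true), zip(*y_predict))]
    return score_fn(y_true, y_predict)


y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 7.0]
print(evaluate_sketch(y_true, y_pred, metric="RMSE"))
print(evaluate_sketch(y_true, y_pred, metric="MAE"))
```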

fit(data): Fit the model on the data.

get_forecast(): Return the DataFrame with the forecast.

get_sampling_dict(): Return the dictionary from which to sample.

predict(data): Predict using the fitted model.

save_forecast(): Save forecasted incident rate to csv.

ts_cross_validate(data, n_splits=5, types=None, last_n_years=True, metric='MAE'): Perform n-fold time series cross validation to evaluate the forecast method.

class predictors.BasicLambdaForecaster(ignore_dates=None, id_col='dim_incident_id', type_col='dim_incident_incident_type', date_col='dim_datum_datum', month_col='dim_datum_maand_nr', month_day_col='dim_datum_maand_dag_nr', day_name_col='dim_datum_dag_naam_nl', hour_col='dim_tijd_uur', file_name='basic_lambda_forecast.csv', **kwargs)

Forecast arrival rates of incidents based on historic averages.

Arrival rates are obtained for every hour in the week, per month, per type of incident. So, different weeks in the same month always get the same arrival rates, but weeks in different months have different rates. Rates are determined as the average number of arrivals in a similar period.

For example, the rate for a Monday in January between 8:00 and 9:00 is calculated as the average number of incidents between 8:00 and 9:00 of all Mondays in January in the time range of the data.
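
The Monday-in-January example can be worked through with a few made-up timestamps. The sketch below uses only the standard library and hypothetical data. It divides the number of arrivals in each (month, weekday, hour) slot by the number of matching days in the data range, and it leaves out the handling of `ignore_dates` and New Year's Eve. It illustrates the averaging idea and is not the class's actual code.

```python
from collections import Counter
from datetime import date, datetime, timedelta

# Hypothetical incident timestamps; real data would come from the incident DataFrame.
incidents = [
    datetime(2018, 1, 1, 8, 15),   # Monday
    datetime(2018, 1, 1, 8, 40),   # Monday
    datetime(2018, 1, 8, 8, 5),    # Monday
    datetime(2018, 1, 15, 9, 30),  # Monday, different hour
]
first_day, last_day = date(2018, 1, 1), date(2018, 1, 31)

# Count incidents per (month, weekday, hour) slot; weekday 0 is Monday.
arrivals = Counter((ts.month, ts.weekday(), ts.hour) for ts in incidents)

# Count how often each (month, weekday) combination occurs in the data range.
n_days = (last_day - first_day).days + 1
all_days = [first_day + timedelta(days=i) for i in range(n_days)]
occurrences = Counter((d.month, d.weekday()) for d in all_days)

# Average arrivals per hour-long slot: total arrivals / number of such days.
rates = {
    (month, weekday, hour): count / occurrences[(month, weekday)]
    for (month, weekday, hour), count in arrivals.items()
}

print(rates[(1, 0, 8)])  # 0.6 (3 incidents over 5 Mondays in January 2018)
```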

Parameters:

ignore_dates (array-like of datetime objects) – Dates that are considered ‘out of the ordinary’ in terms of number of incidents and should not be taken into account when calculating average incident rates. Typically, this list includes days with storms and impactful events such as New Year’s Eve and perhaps King’s Day.
id_col, date_col, month_col, day_name_col, hour_col (str) – The column names indicating, respectively, the id of the incident, the date, the month number, the name of the weekday, and the hour of day in [0, 24).
**kwargs (dict) – Parameters passed to BaseIncidentPredictor.

fit(data, last_n_years=8, fit_nye=True)

Obtain arrival rates from the data.

Fits arrival rates per incident type, month, day of the week, and hour of the day. Saves the results under self.lambdas and self.nye_lambdas (if fit_nye == True). Sets self.fitted = True when the fit procedure is completed.

Parameters:

data (pd.DataFrame) – The incident data.
last_n_years (int, optional (default: 8)) – How many years to use to estimate the arrival rates. Uses the latest ‘last_n_years’ years.
fit_nye (boolean, optional (default: True)) – Whether to fit New Year’s Eve separately (True) or to treat it as a regular day (False).

predict(start, end, predict_nye=True, save=False)

Forecast arrival rates for a given future period and save it under ‘self.forecast’.

Parameters:

start, end – The start and end dates and times (rounded to the whole hour) of the period to forecast.
predict_nye (boolean, optional (default: True)) – Whether to predict NYE with high activity like in reality (True) or ignore it and forecast a regular day instead (False).

class predictors.ProphetIncidentPredictor(**kwargs)

Class that forecasts incident rate for different incident types.

Uses Facebook’s Prophet to create a forecast of the incident rate. It does so by calculating the hourly arrivals per incident type, then treating this as a signal/time series and decomposing it into trend, yearly pattern, weekly pattern, and daily pattern.
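
The first step, turning raw incidents into an hourly arrival series per type, might look roughly like the sketch below. It uses only the standard library and hypothetical records. The `ds` and `y` keys mirror the column names that Prophet expects in its input frame, but how the class actually prepares its data is not shown in this documentation.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Hypothetical (timestamp, incident_type) records.
records = [
    (datetime(2020, 3, 1, 0, 10), "fire"),
    (datetime(2020, 3, 1, 0, 50), "fire"),
    (datetime(2020, 3, 1, 2, 5), "fire"),
    (datetime(2020, 3, 1, 1, 30), "rescue"),
]

# Count arrivals per type and per whole hour.
counts = defaultdict(Counter)
for ts, inc_type in records:
    counts[inc_type][ts.replace(minute=0, second=0, microsecond=0)] += 1

# Build a gap-free hourly series per type, filling empty hours with 0.
start = datetime(2020, 3, 1, 0)
hours = [start + timedelta(hours=i) for i in range(3)]
series = {
    inc_type: [{"ds": hour, "y": per_hour[hour]} for hour in hours]
    for inc_type, per_hour in counts.items()
}

for row in series["fire"]:
    print(row["ds"], row["y"])
```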

Example

>>> predictor = ProphetIncidentPredictor(load_forecast=False)
>>> predictor.fit(incident_data)
>>> predictor.predict(periods=365*24, freq="H", save=True)
>>> forecast = predictor.get_forecast()
>>> forecast.head()

Parameters:

load_forecast (boolean) – Whether to load a pre-existing forecast from disk. Defaults to True, since recomputing forecasts is costly.
fc_dir (str) – The directory in which forecasts should be saved and from which they should be loaded if applicable. Defaults to ‘./data/forecasts/’.
verbose (boolean) – Whether to print what is happening. Defaults to True.

fit(data, types=None)

Perform time series decomposition using Prophet.

This function first prepares the data and saves the prepared data as ‘self.incidents’. Then it creates a dictionary of Prophet() objects, where the keys are the incident types and each model is fitted to the data of its type. The dictionary of models is stored as ‘self.models_dict’ and used when predict is called.

Notes

This function does not return anything.

Parameters:

data (pd.DataFrame) – The incidents to train the models on.
types (Sequence(str)) – The incident types to fit models for. If None, uses all incident types in the data, except ‘nan’ and ‘NVT’. Defaults to None.

predict(periods=8760, freq='H', save=False, future=None)

Forecast the incident rate using Prophet.

Notes

Can only be called after calling ‘.fit()’; raises an AssertionError otherwise. Does not return anything, since its main use cases are sampling directly from this predictor and saving predictions to file. The result of this method can be obtained by calling ‘get_forecast()’ afterwards.

Parameters:

periods (int) – The number of periods to forecast.
freq (str) – The frequency to predict the incident rates at. Accepts any valid frequency for pd.date_range, such as ‘H’ (default), ‘D’, or ‘M’.
save (boolean) – Whether to save the forecast to a csv file. Optional, defaults to False.

class predictors.YearSplitter(n_splits=3, obs_per_year=8760): Split data on whole years to provide a consistent evaluation metric.
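
The documentation does not describe how the splits are formed, so the following is only one plausible reading. It assumes that the last `n_splits` whole years each serve once as a test fold, and that each fold trains on all observations preceding its test year. The `year_splits` function is hypothetical and is not the class's API.

```python
def year_splits(n_obs, n_splits=3, obs_per_year=8760):
    """Yield (train_indices, test_indices) with one whole year per test fold.

    Assumption: the last `n_splits` years are used as test folds, and each fold
    trains on all observations that precede its test year.
    """
    if n_obs < (n_splits + 1) * obs_per_year:
        raise ValueError("need at least n_splits + 1 whole years of data")
    for fold in range(n_splits, 0, -1):
        test_start = n_obs - fold * obs_per_year
        test_end = test_start + obs_per_year
        yield list(range(test_start)), list(range(test_start, test_end))


# Toy example: 5 "years" of 4 observations each.
for train, test in year_splits(n_obs=20, n_splits=3, obs_per_year=4):
    print(len(train), test)
```

Because every test fold covers exactly one year, scores from different folds are computed over the same number of observations and the same seasonal cycle, which makes them easier to compare.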