
Fine tuning

Overview

To be generally applicable to many time series datasets, the pretrained Granite-TimeSeries model is trained in a channel-independent way. Under channel independence, the model learns a common dynamics model across all channels without considering interactions between them. In many cases, however, a time series dataset carries important information about how the different channels interact. To capture these dependencies, we need to fine-tune the model and enable channel mixing in the decoder.

The fine-tuning process requires the following steps:

  1. Load and prepare data using the time series preprocessor. Since we are fine-tuning the model, we have more options to include things like cross-channel interactions, exogenous features, and categorical values.
  2. Load the model and freeze the backbone.
  3. Create a Hugging Face Trainer and train the model.
  4. Evaluate the model.

For the example code snippets below, we assume that a dataset is available with a timestamp column (“date”) and three value columns (“value1”, “value2”, “value3”). The columns “value1” and “value2” will serve as the forecasting targets, while “value3” will serve as a control column. A control column is a type of exogenous column whose value is assumed to be known into the future.

Prerequisites

Please make sure your Python environment is properly set up. The code snippets below are illustrative examples of how various components work with our Granite-TimeSeries models, and they require the Granite TSFM library to be installed. Please see the setup instructions.
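
The snippets assume imports along the following lines. The pandas, torch, and transformers imports are standard; the remaining names come from the Granite TSFM library (tsfm_public), and exact import paths may vary between releases, so treat this as a sketch and defer to the setup instructions.

import math

import pandas as pd
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

# Names from the Granite TSFM library; import paths may differ between releases.
from tsfm_public import (
    TimeSeriesPreprocessor,
    TinyTimeMixerForPrediction,
    TrackingCallback,
    get_datasets,
)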

Data loading

The preprocessing and forecasting pipeline components expect pandas DataFrames. We simply read the CSV into a pandas DataFrame, making sure to parse the timestamp column properly.

# dataset_path is assumed to point to a CSV file with the columns described above.
data = pd.read_csv(
    dataset_path,
    parse_dates=["date"],
)
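
If you do not have a CSV at hand, a small synthetic frame with the assumed schema is enough to exercise the rest of the snippets. The example below is purely illustrative (the column values are arbitrary) and can stand in for the data loaded above.

import numpy as np

# Hypothetical stand-in data: 2000 hourly observations with the assumed schema.
n = 2000
rng = np.random.default_rng(0)
data = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-01", periods=n, freq="h"),
        "value1": np.sin(np.arange(n) / 24) + 0.1 * rng.standard_normal(n),
        "value2": np.cos(np.arange(n) / 24) + 0.1 * rng.standard_normal(n),
        "value3": (np.arange(n) % 24) / 24.0,  # control signal, known into the future
    }
)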

Data preparation

In the zero-shot case, we can only handle datasets with id columns and one or more target columns; advanced use cases involving exogenous or categorical features are not yet supported. In the fine-tuning case, we are free to specify additional columns, like control_columns. We first specify the parameters of the preprocessor, enabling scaling (normalization) of the data. Then we use the preprocessor to extract train, validation, and test datasets from the original data, as sketched after the snippet below.

tsp = TimeSeriesPreprocessor(
    id_columns=[],
    timestamp_column="date",
    target_columns=["value1", "value2"],
    control_columns=["value3"],
    prediction_length=96,
    context_length=512,
    scaling=True,
    scaling_type="standard",
)
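
One way to produce the train, validation, and test datasets used by the training code below is the get_datasets helper from the TSFM toolkit. The fractional split shown here is only an illustration; choose boundaries appropriate for your data.

# Illustrative split: 70% train, 20% test, remainder used for validation.
split_config = {"train": 0.7, "test": 0.2}
dset_train, dset_val, dset_test = get_datasets(tsp, data, split_config)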

Loading the pretrained Granite time series model

Here we load the model using from_pretrained, but pass some extra configuration arguments. Note the presence of “decoder_mode” and the other index-related arguments. The TimeSeriesPreprocessor provides helpers that make passing these parameters easier.

model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm-granite/granite-timeseries-ttm-v1",
    num_input_channels=tsp.num_input_channels,
    prediction_channel_indices=tsp.prediction_channel_indices,
    exogenous_channel_indices=tsp.exogenous_channel_indices,
    decoder_mode="mix_channel",
)

# Freeze the backbone so only the decoder and head are updated during fine-tuning.
for param in model.backbone.parameters():
    param.requires_grad = False

The key here is the “mix_channel” decoder mode, which enables the decoder to consider inter-channel dependencies.
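
As a quick sanity check (plain PyTorch, nothing specific to the TSFM library), you can confirm that only a fraction of the parameters remain trainable after freezing the backbone:

trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable_params:,} of {total_params:,}")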

Creating a Hugging Face Trainer and training the model

We create a Hugging Face Trainer, passing the train and validation datasets that were created earlier. The first step is to define the training arguments:

finetune_forecast_args = TrainingArguments(
    output_dir="train_output",
    overwrite_output_dir=True,
    learning_rate=0.001,
    num_train_epochs=100,
    do_eval=True,
    evaluation_strategy="epoch",
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
)

Many of the arguments above (e.g., learning rate and batch size) are dataset-specific and will need to be adjusted for your specific use case. Next we define the callbacks. We add early stopping, which stops training when no improvement is seen, and a tracking callback, which reports training time statistics.

early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience=10,  # number of epochs with no improvement after which to stop
    early_stopping_threshold=0.0,  # minimum improvement required to count as improvement
)
tracking_callback = TrackingCallback()

Next we create the optimizer and the learning rate scheduler. We use AdamW as the optimizer and the OneCycle learning rate policy (OneCycleLR), which adapts the learning rate quickly.

# learning_rate, num_epochs, and batch_size should match the TrainingArguments above
# (0.001, 100, and 64 in this example).
optimizer = AdamW(model.parameters(), lr=learning_rate)
scheduler = OneCycleLR(
    optimizer,
    learning_rate,
    epochs=num_epochs,
    steps_per_epoch=math.ceil(len(dset_train) / batch_size),
)

Finally, we construct the trainer, using the components we created above:

finetune_forecast_trainer = Trainer(
    model=model,
    args=finetune_forecast_args,
    train_dataset=dset_train,
    eval_dataset=dset_val,
    callbacks=[early_stopping_callback, tracking_callback],
    optimizers=(optimizer, scheduler),
)
finetune_forecast_trainer.train()

Evaluating the fine-tuned model

We use the trainer to evaluate on the test dataset created earlier. In the output of the command below, “eval_loss” refers to the mean squared error on the normalized (scaled) test data.

finetune_output = finetune_forecast_trainer.evaluate(dset_test)
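
The call returns a dictionary of metrics that can be inspected directly, for example:

# Includes "eval_loss" (MSE on the scaled test data) plus runtime statistics from the Trainer.
print(finetune_output)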