Create a kernel file for an inference service
What is a kernel file
A model kernel is a driver that loads a model and controls its behavior when the inference service is started, when an inference request is processed, and when the inference service is stopped. The model kernel works together with the model file and weight file, and it must be specified when creating a new inference service. Only published models can be used as an inference service, and these models must specify a kernel file and environment attributes, which are used during model initialization.
High level functions
Kernel files can be written in Python or C++. The kernel file must inherit from the kernel base class and include key functions that interface with the deep learning kernel.
on_kernel_start
: a function that is called during the inference service start phase; it typically loads the model for later inference.

on_task_invoke
: a function that is called when a real inference request comes in. After the model is started by on_kernel_start, it runs inference for one or more samples that the model can handle.

on_kernel_shutdown
: a function that is called when the inference service stops; it cleans up any temporary working files if needed.

Example:

    class MatchKernel(Kernel):
        def on_kernel_start(self, kernel_context): ...
        def on_task_invoke(self, task_context): ...
        def on_kernel_shutdown(self): ...
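The lifecycle above can be sketched with stand-in classes. This is only an illustration of the call order; the real Kernel base class comes from the redhareapi SDK, and the mock base class here is an assumption for running the sketch outside the service:

```python
class Kernel:
    """Stand-in for the redhareapi Kernel base class (assumed; SDK not required here)."""
    pass

class MatchKernel(Kernel):
    def __init__(self):
        self.events = []

    def on_kernel_start(self, kernel_context):
        # Load the model once, before any requests arrive.
        self.events.append("start")

    def on_task_invoke(self, task_context):
        # Run inference for each incoming request.
        self.events.append("invoke")

    def on_kernel_shutdown(self):
        # Clean up temporary files, handles, and so on.
        self.events.append("shutdown")

# The inference service driver calls the hooks in this order:
kernel = MatchKernel()
kernel.on_kernel_start(None)
kernel.on_task_invoke(None)
kernel.on_kernel_shutdown()
print(kernel.events)  # ['start', 'invoke', 'shutdown']
```

In the real service, on_task_invoke may be called many times between the single start and shutdown calls.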
Kernel file data structure
The kernel context and the task context pass parameters to a user-defined model kernel, each serving the particular purpose of its kernel function.
- Kernel context:
  Context information for model initialization, with important parameters such as the model weight file. It is defined during inference service creation and can be updated while the inference service is stopped.
- Task context:
  Context information that includes the actual user input data as well as a container for the inference output. The task context can include input or output data for multiple tasks, so it provides interfaces to iterate through those tasks. Each inference request can specify a different input context.
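The task-context behavior described above can be sketched with a mock object. The linked-list traversal via next() and the get_input_data/set_output_data interfaces follow the SDK usage shown later in this topic; the MockTaskContext class itself is a hypothetical stand-in for local experimentation:

```python
import json

class MockTaskContext:
    """Hypothetical stand-in for the SDK task context: tasks are chained
    and walked with next(), input/output are JSON strings."""
    def __init__(self, input_data, next_ctx=None):
        self._input = json.dumps(input_data)
        self._next = next_ctx
        self.output = None

    def get_input_data(self):
        return self._input

    def set_output_data(self, data):
        self.output = data

    def next(self):
        return self._next

# Two batched tasks chained together, walked the way a kernel would walk them.
t2 = MockTaskContext({"id": "req-2", "inputs": []})
t1 = MockTaskContext({"id": "req-1", "inputs": []}, next_ctx=t2)

ctx = t1
ids = []
while ctx is not None:
    ids.append(json.loads(ctx.get_input_data())["id"])
    ctx.set_output_data(json.dumps({"id": ids[-1], "outputs": []}))
    ctx = ctx.next()

print(ids)  # ['req-1', 'req-2']
```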
How to create a kernel for an inference service
- Create your kernel class. All model kernels must inherit from the base Kernel class defined in the elastic distributed inference SDK (redhareapi).

      import redhareapiversion
      from redhareapi import Kernel

      class MatchKernel(Kernel):
          ...

      if __name__ == "__main__":
          kernel = MatchKernel()
          kernel.run()
- Considerations for model start: When starting an inference service, the model kernel initializes the model with the weight file, attributes, and environments through the following interface:

      def on_kernel_start(self, kernel_context)

  The input kernel_context passes parameters that can be used to initialize the model:

      model_desc = json.loads(kernel_context.get_model_description())

  The primary work is to have the PyTorch resnet18 model load the model file:

      model_file = os.path.join(model_desc['model_path'], "model_epoch_final.pth")
      self.model = models.__dict__["resnet18"]()
      self.model.fc = nn.Linear(512, 10)
      self.model.load_state_dict(torch.load(model_file))
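The model-description handling in the start phase can be tried without PyTorch. The JSON payload below is a hypothetical example of what kernel_context.get_model_description() might return; the actual fields depend on how the inference service was created:

```python
import json
import os.path

# Hypothetical model description, shaped like the JSON string that
# kernel_context.get_model_description() returns (fields assumed).
model_desc_json = json.dumps({
    "name": "resnet18-cifar10",
    "model_path": "/models/resnet18-cifar10",
    "attributes": {"num_classes": 10},
})

model_desc = json.loads(model_desc_json)
# Resolve the weight file relative to the deployed model directory.
model_file = os.path.join(model_desc["model_path"], "model_epoch_final.pth")
print(model_file)
```

Keeping all paths relative to model_desc["model_path"] means the kernel does not need to know where the service deploys the published model.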
- Considerations for model invoke: Once the inference service is started, users can submit inference requests, which are handled as tasks in the model kernel through the following interface:

      def on_task_invoke(self, task_context)

  The intention of this interface is to do the actual inference job with the model loaded during the model start phase. A generic flow to achieve this purpose follows.
  - Loop through the task context to handle all the combined inference tasks:

        while task_context != None:
            ...
            task_context_vec.append(task_context)
            task_context = task_context.next()
  - For each task, retrieve the input data and any task-specific attributes, then send the prediction results back:

        input_data = json.loads(task_context.get_input_data())
        req_id = input_data['id']
        d = input_data["inputs"][0]
        #input_name = d["name"]
        data = d["data"]
        image_data = base64.b64decode(data)
        image_data = BytesIO(image_data)
        image_t = data_transforms(Image.open(image_data)).float()
        image_t = image_t.unsqueeze(0)
        outs = self.model(image_t)
        output_data = {"name": "output0", "datatype": "FP32", "shape": [1, 10],
                       "data": outs.data[0].numpy().tolist()}
        task_context.set_output_data(json.dumps({"id": req_id, "outputs": [output_data]}))
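The response-packaging step above can be exercised without PyTorch by feeding a plain list of scores. The build_output helper is hypothetical, and the "output0"/"FP32" names simply follow the example rather than a fixed SDK contract:

```python
import json

def build_output(req_id, scores):
    """Hypothetical helper: package raw class scores into the per-task
    response shape used in the invoke example."""
    output_data = {
        "name": "output0",        # output tensor name, as in the example
        "datatype": "FP32",       # element type of the scores
        "shape": [1, len(scores)],  # batch of one sample
        "data": list(scores),
    }
    return json.dumps({"id": req_id, "outputs": [output_data]})

# A fake 10-class score vector standing in for outs.data[0].numpy().tolist().
resp = build_output("req-1", [0.1] * 10)
parsed = json.loads(resp)
print(parsed["outputs"][0]["shape"])  # [1, 10]
```

Echoing the request id back in the output JSON lets the caller match responses to batched requests.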
- Considerations for model shutdown: If you want to do anything during the inference service stop phase, add it to the on_kernel_shutdown method. In this example, the kernel shutdown is recorded in the log file:

      def on_kernel_shutdown(self):
          Kernel.log_info('on_kernel_shutdown')
- Considerations for kernel logging: The following log interfaces are available. Logs are saved at <EGO_path>/dlim/logs:

      log_info(msg)
      log_warn(msg)
      log_debug(msg)
      log_error(msg)
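When developing a kernel outside the service, you might shim the four log interfaces onto Python's standard logging module so the same calls work locally. This mapping is an assumption for local testing only; inside the service, the SDK's own log functions write to the dlim log directory:

```python
import logging

records = []

class ListHandler(logging.Handler):
    """Capture log records in memory so a local run can inspect them."""
    def emit(self, record):
        records.append((record.levelname, record.getMessage()))

_logger = logging.getLogger("kernel")
_logger.setLevel(logging.DEBUG)
_logger.addHandler(ListHandler())

# Map the four SDK-style log interfaces onto standard logging levels.
log_info = _logger.info
log_warn = _logger.warning
log_debug = _logger.debug
log_error = _logger.error

log_info("model loaded")
log_error("bad input")
print(records)  # [('INFO', 'model loaded'), ('ERROR', 'bad input')]
```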