Training and saving a model

Below is a short script that simulates some data and estimates a model. It also creates a function for preprocessing the data. Preprocessing isn’t strictly needed in this case, but it is common in practice (e.g. creating dummy variables, or scaling features for estimators that are sensitive to large numerical ranges).

Finally, the script stores the preprocessing function and the trained model as two “rds” files, R’s counterpart to Python’s pickle.

library(dplyr)
library(magrittr)

df <-
  tibble(x = rnorm(100)*100) %>% 
  mutate(y = 5 + x + rnorm(100))

preprocessData <-
  function(df) {
    df$x_scaled <- df$x / 100
    return(df)
  }

model <- 
  df %>% 
  preprocessData() %>% 
  lm(y~x, data=.)

saveRDS(model, "model.rds")
saveRDS(preprocessData, "preprocess_data.rds")
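As an aside, the Python analogue of saveRDS/readRDS mentioned above is pickle. A minimal sketch, where the “model” is just a hypothetical dict of coefficients standing in for a fitted object:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: just its coefficients.
model = {"intercept": 5.0, "slope": 1.0}

path = os.path.join(tempfile.gettempdir(), "model.pkl")

# Serialize to disk (analogous to saveRDS) ...
with open(path, "wb") as f:
    pickle.dump(model, f)

# ... and read it back (analogous to readRDS).
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```

Like rds files, pickle round-trips arbitrary objects, so the same pattern works for real fitted model objects.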

Setting up a webservice

Plumber and docker

We next want to use the model to generate predictions in an API. To do this we first create two local scripts. The first defines the contents of the API and is named “score.R”.

Comments in this script must follow Plumber’s annotation syntax, but the gist of it is that the script:

  • Loads the required libraries (dplyr and magrittr)
  • Imports the serialized model and preprocessing function
  • Defines the in/out parameters for the API
  • Defines a function get_predicted_score that
    • Collects the parameters in a data frame
    • Transforms the data frame using the preprocessing function
    • Returns the predicted value for the given x along with the scoring date
library(dplyr)
library(magrittr)

model <- readRDS("model.rds")
preprocessData <- readRDS("preprocess_data.rds")

#' @get /get_predicted_score
#' @param x The value of x to score

get_predicted_score <- 
  function(
    x
  ){
    
    df <- 
      data.frame(x = as.numeric(x)) %>% 
      preprocessData()

    data.frame(
      scoring_date = Sys.Date(),
      predY = predict(model, newdata = df)
    )
  }

The next file, called “main.R”, starts a local web service exposed on port 80.

library(plumber)
r <- plumb("score.R")
r$run(port=80, host="0.0.0.0")

The scripts above won’t run directly on a Windows machine due to the encoding of the “ø” character. Luckily, we can use Docker to sidestep this issue.

Define a simple dockerfile that:

  • copies the model files and scripts into the image
  • installs the required packages (dplyr and magrittr)
  • runs the main.R script and exposes port 80

This file is simply called dockerfile, with no extension. It needs to be located in the same folder as main.R, score.R, model.rds and preprocess_data.rds.

FROM trestletech/plumber
MAINTAINER Hong Ooi <hongooi@microsoft.com>

RUN R -e 'install.packages(c("dplyr", "magrittr"))'

RUN mkdir /data
COPY model.rds /data
COPY preprocess_data.rds /data
COPY score.R /data
COPY main.R /data
WORKDIR /data

EXPOSE 80
ENTRYPOINT ["Rscript", "main.R"]

From cmd in Windows, we navigate to the folder containing the dockerfile and issue the command below, which builds the Docker image and names it “modelimage”:

docker build -t modelimage .

Once the image is built (this takes a few minutes), we can test the service locally. First, start a container from the image by issuing the command below in cmd:

docker run --rm -p 80:80 modelimage

Testing response

I test the response with Python.

import requests
payload = {
    "x": 200
}
r = requests.get("http://127.0.0.1/get_predicted_score", params = payload)
r.json()

This returns the desired response:

[{'scoring_date': '2021-10-31', 'predY': 205.0965}]
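Since the data were simulated as y = 5 + x + noise and the model was fit on 100 observations, the fitted coefficients should be close to 5 and 1, so the prediction for x = 200 should land near 205. A quick sanity check on the returned value:

```python
# predY returned by the API for x = 200 (from the response above)
pred_y = 205.0965

# The data-generating process is y = 5 + x + noise,
# so the prediction should be close to 5 + 200 = 205.
expected = 5 + 200
print(abs(pred_y - expected) < 1)  # True
```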

Playing a bit with the parameters confirms that the scoring works as intended:

payload = {
    "x": -2000
}
r = requests.get("http://127.0.0.1/get_predicted_score", params = payload)
r.json()
[{'scoring_date': '2021-10-31', 'predY': -1995.6117}]

For a serious deployment, I would consider the following extensions:

  • The script ought to be made more robust with respect to unexpected input, with more informative error handling
  • Optional arguments could be added to the API for model type and model version.
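On the first point, part of the burden can also be carried client-side. As a sketch, a hypothetical helper (validate_payload is not part of the service, just an illustration) that rejects a payload before it ever reaches the API:

```python
def validate_payload(payload):
    """Reject payloads where x is missing or not coercible to a number."""
    try:
        float(payload["x"])
    except (KeyError, TypeError, ValueError):
        return False
    return True

print(validate_payload({"x": 200}))    # True
print(validate_payload({"x": "abc"}))  # False
print(validate_payload({}))            # False
```

With the current score.R, a non-numeric x is silently coerced to NA by as.numeric(), so catching bad input early gives a clearer failure mode.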

In principle, the Docker image can also be deployed to Azure to set up a hosted web service.