Four commonly used architectures for ML deployment

There are four commonly used architectures for deploying an ML model:

  1. Model Embedded in Application (detailed post)
  2. Dedicated model API (detailed post)
  3. Model published as Data (Streaming)
  4. Offline predictions

The choice depends on the type of final application and on the utility, flexibility, and ease of use required. No one architecture is inherently better than the others, although offline predictions have started to become slightly outdated.
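As a rough illustration of how the four options differ, here is a minimal Python sketch that serves the same toy model in each style. Everything in it is an assumption made for illustration: the linear `predict` function, the `WEIGHTS` values, and names such as `model_service` and `StreamingApp` are hypothetical. The network call and the stream are simulated in-process with JSON strings and a `queue.Queue`. The third option is read here as "the model itself is published as a message that the application consumes at runtime", which is one plausible interpretation.

```python
import json
import queue

# Toy stand-in for a trained model: a linear scorer (hypothetical).
def predict(weights, features):
    return sum(w * x for w, x in zip(weights, features))

WEIGHTS = [0.5, -1.0]  # illustrative values, not from a real model

# 1. Model embedded in application: the app calls the model in-process.
def embedded_app(features):
    return predict(WEIGHTS, features)

# 2. Dedicated model API: the model lives behind a separate service.
#    The network hop is simulated by passing JSON strings around.
def model_service(request_body):
    features = json.loads(request_body)["features"]
    return json.dumps({"prediction": predict(WEIGHTS, features)})

def api_client_app(features):
    response = model_service(json.dumps({"features": features}))
    return json.loads(response)["prediction"]

# 3. Model published as data: new model versions arrive as messages
#    on a stream (simulated with a queue) and are swapped in at runtime.
model_stream = queue.Queue()
model_stream.put(json.dumps({"weights": WEIGHTS}))

class StreamingApp:
    def __init__(self):
        self.weights = None

    def poll(self, stream):
        # Drain pending messages; the latest published model wins.
        while not stream.empty():
            self.weights = json.loads(stream.get_nowait())["weights"]

    def predict(self, features):
        return predict(self.weights, features)

# 4. Offline predictions: a batch job scores everything ahead of time,
#    and the app only looks up stored results.
def batch_job(rows):
    return {row_id: predict(WEIGHTS, feats) for row_id, feats in rows.items()}

PRECOMPUTED = batch_job({"user_1": [2.0, 1.0]})

def offline_app(row_id):
    return PRECOMPUTED[row_id]
```

In all four cases the prediction logic is identical. What changes is where the model lives and when it runs: inside the application, behind a service boundary, delivered as a message, or executed ahead of time in a batch.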