To benchmark results, and in particular training performance in terms of execution time, I also recreated the same CNN model with TensorFlow 2.0.

The Python code snippet below illustrates the same model architecture in TF, along with a summary of the output shapes of each layer.
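Since that snippet isn't reproduced here, the sketch below shows what a LeNet-style MNIST CNN looks like in TF 2.x Keras. The exact filter counts and dense-layer sizes are my assumptions for illustration, not necessarily the values used in the article:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# LeNet-style CNN for 28x28 grayscale MNIST digits.
# Layer sizes here are illustrative assumptions.
model = models.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, (5, 5), activation="relu"),  # -> (24, 24, 32)
    layers.MaxPooling2D((2, 2)),                   # -> (12, 12, 32)
    layers.Conv2D(64, (5, 5), activation="relu"),  # -> (8, 8, 64)
    layers.MaxPooling2D((2, 2)),                   # -> (4, 4, 64)
    layers.Flatten(),                              # -> (1024,)
    layers.Dense(500, activation="relu"),
    layers.Dense(10, activation="softmax"),        # one score per digit class
])

# Prints the per-layer output shapes and parameter counts.
model.summary()
```

The `summary()` call is what produces the per-layer output-shape table the article refers to.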

You can see here that the layers, layer shapes, convolution filters, and pooling sizes are exactly the same as in the Core ML model, which we created on-device with the SwiftCoreMLTools library.

Important Note — The SwiftCoreMLTools library also provides a similar programmatic API for adding layers to a model at runtime, very much like in Keras, but in the source code shared in this article, I’m using the DSL / function builder approach. Please reference the SwiftCoreMLTools GitHub project for further documentation.

Before looking at the training execution time, it’s important to note that both the Core ML and the TensorFlow model trained for the same number of epochs (10), with the same hyperparameters, obtaining very similar accuracy on the same 10,000 test images.

In particular, you can see from the Python code snippet below that the TensorFlow model, trained with the same Adam optimizer and categorical cross-entropy loss function, reached a final accuracy on the test set greater than 0.98.
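As a hedged sketch of the compile/fit/evaluate flow (using small random stand-in data instead of the real MNIST arrays, and a deliberately simplified model body, so the example runs quickly anywhere; the article trains the full CNN on the real 60,000/10,000 split):

```python
import numpy as np
import tensorflow as tf

# Random stand-in data shaped like MNIST, for illustration only.
rng = np.random.default_rng(0)
x_train = rng.random((256, 28, 28, 1), dtype=np.float32)
y_train = tf.keras.utils.to_categorical(rng.integers(0, 10, 256), 10)
x_test = rng.random((64, 28, 28, 1), dtype=np.float32)
y_test = tf.keras.utils.to_categorical(rng.integers(0, 10, 64), 10)

# Simplified body for the sketch; the article's model is the LeNet-style CNN.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",                  # same optimizer as in the article
              loss="categorical_crossentropy",   # same loss as in the article
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=10, batch_size=64, verbose=0)
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(f"test accuracy: {acc:.3f}")
```

On the real MNIST data and the full CNN, this same flow is what yields the >0.98 test accuracy reported above.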

For the Core ML model, you can see from the iPhone app screenshot below that, training and testing with the same optimizer, loss function, and of course the same train and test datasets, it also reaches a final accuracy greater than 0.98.

MNIST LeNet CNN training result with Core ML on iPhone App

For on-device Core ML model training, I ran tests on macOS, on the iOS Simulator, and on real Apple devices. Doing this, I noticed once again that training Core ML models on modern iPhone/iPad devices is much better optimized than on a MacBook Pro with an i7 CPU, a Radeon GPU, and plenty of memory.

To provide some real numbers about how promising on-device training is on current-generation iPhones, I was able to train on the 60,000 MNIST samples for 10 epochs in about:

  • 248 seconds on an iPhone 11 with the Core ML model
  • 158 seconds with TensorFlow 2.0 on an i7 MacBook Pro (CPU only, of course)

Of course, there is a real gap between 248 seconds and 158 seconds: the iPhone run took roughly 57% longer than the CPU-only MacBook run, even before considering the GPU. But the real point here is not to compare apples with oranges; it is to get a glimpse of what mobile and wearable devices can do in the context of training locally, on-device, on very sensitive and personal data.
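For clarity, the relative gap between the two timings works out as follows:

```python
iphone_s, macbook_s = 248.0, 158.0  # total training time, 10 epochs

# How much longer the on-device run took, relative to the MacBook.
extra_vs_macbook = (iphone_s - macbook_s) / macbook_s   # ~0.57 -> ~57% longer

# How much faster the desktop run was, relative to the iPhone.
saving_vs_iphone = (iphone_s - macbook_s) / iphone_s    # ~0.36 -> ~36% faster

print(f"iPhone took {extra_vs_macbook:.0%} longer; "
      f"MacBook was {saving_vs_iphone:.0%} faster")
```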

In particular, I think it's important to reflect that training a single epoch of a model with 585,958 parameters on 60,000 data points took roughly 25 seconds (248 seconds over 10 epochs) on a mobile device.
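The per-epoch figure follows directly from the totals reported above:

```python
total_s, epochs, samples = 248.0, 10, 60_000  # iPhone 11 Core ML run

per_epoch_s = total_s / epochs        # 24.8 seconds per epoch on-device
throughput = samples / per_epoch_s    # ~2,419 training samples per second

print(f"{per_epoch_s:.1f} s/epoch, ~{throughput:.0f} samples/s")
```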

Considering scenarios such as distributed training, and in particular federated learning, I really think these are very promising numbers. I’ll continue testing more on my long journey towards this federated learning platform.

By the way, if you want to contribute in any way—for example, by testing or implementing missing functionalities on the SwiftCoreMLTools library—please be my guest.

One final note here about how easy it is to integrate Core ML training with a powerful user interface tool such as SwiftUI + Combine.

Jupyter Notebooks, and even other tools like TensorFlow.js, are very good for building real-time experiments, but I have to say that the opportunity Core ML + SwiftUI offers for real on-device experimentation is amazing.

In my very simple use case for this article, training on the MNIST dataset on an iPhone, it was very easy to add a minimal touch interface that lets users draw new digits directly on the screen and test them live.

The SwiftCoreMLTools library, whose Swift DSL is implemented with the same Swift function builders used by SwiftUI, offers a coherent, familiar approach for building the model and for experimenting with the interaction between the UI and the model in real-time scenarios.