Label new images with a Core ML model through Vision on iOS and macOS

Overview

This article is useful because it shows the exact handoff point between a trained `.mlmodel` file and runnable app code.

Part 1 of the series trains a small day-and-night classifier. This article answers the next practical question: how do you actually feed a new image into that model and get a label back inside Swift?

The answer on Apple platforms is not to call the model directly in isolation. The tutorial wraps it in Vision, uses VNCoreMLRequest to define the job, and then executes that request through a VNImageRequestHandler.

Source focus: The example model classifies anime scenes as day or night, but the real value of the article is the reusable Vision pipeline, not the toy label set.

Series Context

This post sits between training the model and using the result for wallpaper automation.

The original series flows like this:

1. Train a Create ML classifier from labeled images.
2. Run that model against new images through Vision.
3. Use the detected day or night label to change wallpapers by time of day.

The companion posts are: Part 1 and Part 3.

Why Vision

`Vision` is the layer that turns your model into an image-analysis request with a standard Apple-side result format.

The post introduces Vision as a general-purpose computer-vision framework that can do far more than classification. It also detects text, rectangles, faces, and barcodes such as QR codes. For this article, the important piece is that Vision knows how to drive a Core ML image model and return structured results.

The API shape the article highlights is:

init(model: VNCoreMLModel, completionHandler: VNRequestCompletionHandler?)

That matters because VNCoreMLRequest is not the image container. It is the request definition plus the completion callback that receives the result.

Step 1

Copy the trained `.mlmodel` into the app target, then wrap the generated model class in `VNCoreMLModel`.

Xcode generates a Swift class for the model when the file is part of the target. The article uses that generated class as the bridge into Vision:

guard let model = try? VNCoreMLModel(for: AnimeDayNight().model) else {
    fatalError("Cannot load the ML model")
}

Replace AnimeDayNight with whatever generated class name your own model uses. The original tutorial uses a Japanese placeholder name at this point, and AnimeDayNight stands in for it here, because the tutorial is explaining the pattern rather than hard-coding one exact project name.
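If you would rather not depend on the generated class, the same wrapping step can be sketched with the compiled model resource. This is a minimal sketch, not the article's code: it assumes Xcode has compiled the `.mlmodel` into a `.mlmodelc` resource in the app bundle, and `loadVisionModel` and `ModelLoadingError` are hypothetical names introduced here for illustration.

```swift
import CoreML
import Foundation
import Vision

/// Hypothetical error type for this sketch.
enum ModelLoadingError: Error {
    case missingResource(String)
}

/// Loads a compiled Core ML model from a bundle and wraps it for Vision.
/// Assumes the build step produced "<name>.mlmodelc" inside the bundle.
func loadVisionModel(named name: String, in bundle: Bundle = .main) throws -> VNCoreMLModel {
    guard let url = bundle.url(forResource: name, withExtension: "mlmodelc") else {
        throw ModelLoadingError.missingResource(name)
    }
    // Default configuration; Core ML chooses the compute units.
    let mlModel = try MLModel(contentsOf: url, configuration: MLModelConfiguration())
    return try VNCoreMLModel(for: mlModel)
}
```

Because the function throws instead of calling fatalError, the caller can decide whether a missing or broken model should stop the app or only disable the feature.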

Step 2

Build a `VNCoreMLRequest`, then read the top `VNClassificationObservation` in the completion handler.

The request callback is where the prediction becomes useful. The result list is cast to [VNClassificationObservation], the highest-confidence result is read, and its identifier becomes the label your app reacts to.

let request = VNCoreMLRequest(model: model) { request, error in
    guard let results = request.results as? [VNClassificationObservation],
          let topResult = results.first else {
        fatalError("No classification result")
    }

    let detectedResult = topResult.identifier

    if detectedResult == "day" {
        // Handle daytime image
    } else if detectedResult == "night" {
        // Handle night image
    }
}

Important detail: The identifier strings must match the labels used when the model was trained. If your model was trained with non-English labels, compare against those exact strings instead of English placeholders.
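One way to keep those string comparisons in a single place is to map the identifier onto an enum and ignore weak predictions. The sketch below is an illustration under assumptions, not part of the original tutorial: `SceneLabel`, `LabelScore`, `sceneLabel(from:minimumConfidence:)`, and the 0.6 threshold are all hypothetical, and the raw values assume the model was trained with the labels "day" and "night".

```swift
/// Hypothetical app-side labels; the raw values must match the training labels exactly.
enum SceneLabel: String {
    case day
    case night
}

/// A plain copy of the two fields this sketch reads from a VNClassificationObservation.
struct LabelScore {
    let identifier: String
    let confidence: Float
}

/// Returns the label of the highest-confidence score, or nil when the list is empty,
/// the best score is below the threshold, or the identifier is not a known label.
func sceneLabel(from scores: [LabelScore], minimumConfidence: Float = 0.6) -> SceneLabel? {
    guard let best = scores.max(by: { $0.confidence < $1.confidence }),
          best.confidence >= minimumConfidence else {
        return nil
    }
    return SceneLabel(rawValue: best.identifier)
}
```

Inside the completion handler, the observations could be converted with `results.map { LabelScore(identifier: $0.identifier, confidence: $0.confidence) }` before calling the helper. Keeping the helper free of Vision types also makes it easy to unit test without a model.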

Step 3

Use `VNImageRequestHandler` to provide the actual image, then perform the request off the main thread.

This is the part this article emphasizes most clearly: the request is not the image. The handler is the object that carries image data into the Vision pipeline.

The article points out three common initializer forms:

init(cgImage: CGImage, options: [VNImageOption: Any])
init(ciImage: CIImage, options: [VNImageOption: Any])
init(cvPixelBuffer: CVPixelBuffer, options: [VNImageOption: Any])

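It also notes that image orientation can be supplied when appropriate. Each of these initializers has a variant that takes an orientation argument of type CGImagePropertyOrientation. For the concrete sample, the article shows getting a CIImage from a UIImage: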

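// UIImage.ciImage is nil unless the image was created from a CIImage, so convert explicitly.
let ciImage = CIImage(image: UIImage(named: "test.png")!)!

Then the request is executed like this:

let handler = VNImageRequestHandler(ciImage: ciImage)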
DispatchQueue.global(qos: .userInteractive).async {
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}

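The call to perform is synchronous. While it runs, Vision invokes the completion handler from the previous step with the classification result, so the handler executes on the background queue. Switch back to the main queue before updating any UI.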
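The three steps can also be folded into one helper. The following is a minimal sketch rather than the article's code: it relies on `perform` being synchronous and reads `request.results` after it returns instead of using a completion handler on the request. The function names and the `.userInitiated` quality of service are assumptions made for this example, and `model` is whatever `VNCoreMLModel` you built in Step 1.

```swift
import Foundation
import ImageIO
import Vision

/// Runs one classification synchronously and returns the first observation, if any.
/// Vision lists classification results with the highest confidence first.
func topClassification(for cgImage: CGImage,
                       orientation: CGImagePropertyOrientation = .up,
                       using model: VNCoreMLModel) throws -> VNClassificationObservation? {
    let request = VNCoreMLRequest(model: model)
    let handler = VNImageRequestHandler(cgImage: cgImage,
                                        orientation: orientation,
                                        options: [:])
    try handler.perform([request]) // Blocks the calling thread until Vision is done.
    return (request.results as? [VNClassificationObservation])?.first
}

/// Runs the classification on a background queue and reports back on the main queue.
func classifyInBackground(_ cgImage: CGImage,
                          using model: VNCoreMLModel,
                          completion: @escaping (Result<VNClassificationObservation?, Error>) -> Void) {
    DispatchQueue.global(qos: .userInitiated).async {
        let result = Result { try topClassification(for: cgImage, using: model) }
        DispatchQueue.main.async { completion(result) }
    }
}
```

Delivering the result on the main queue means the caller can update labels or image views directly in the completion closure. Errors from Vision arrive as a `.failure` value instead of being printed and dropped.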

Other Models

If you swap in a different classifier, the biggest code change is usually your result handling.

The article makes a useful point here: once the request pipeline is in place, most of the structure stays the same across models. What changes most often is the shape of the output and the label names you compare against in the completion handler.
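To make that point concrete, here is a hedged sketch of result handling that branches on the observation type Vision returns. `summarize` is a hypothetical helper, not something from the article, and it covers only three observation classes that Core ML requests commonly produce. It assumes a deployment target where `VNRecognizedObjectObservation` is available (iOS 12 or macOS 10.14 and later).

```swift
import CoreML
import Vision

/// Describes the first observation in a result list, depending on its concrete type.
func summarize(_ observations: [VNObservation]) -> String {
    guard let first = observations.first else { return "no observations" }
    switch first {
    case let classification as VNClassificationObservation:
        // Classifier models: a label plus a confidence value.
        return "label \(classification.identifier) (confidence \(classification.confidence))"
    case let object as VNRecognizedObjectObservation:
        // Object detectors: a bounding box plus ranked labels.
        let label = object.labels.first?.identifier ?? "unlabeled"
        return "object \(label) at \(object.boundingBox)"
    case let feature as VNCoreMLFeatureValueObservation:
        // Other models: a raw MLFeatureValue the app must interpret itself.
        return "raw feature value of type \(feature.featureValue.type.rawValue)"
    default:
        return "unhandled observation type \(type(of: first))"
    }
}
```

The request and handler code does not change between these cases. Only the branch that interprets the observations does.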

Xcode helps by showing the model's input and output details when you click the model file in the project navigator.

Figure: Xcode model inspector showing the inputs and outputs for a Core ML model file. Xcode's model inspector is the fastest way to confirm what the model expects and what it returns.

Demo Apps

This article ends with working iOS and macOS sample projects, plus a warning that the tutorial model itself is intentionally low quality.

The author notes that the example classifier was trained on only about fifteen images, so it is not meant as a serious production model. It is there to demonstrate the mechanics of the Vision pipeline.

Figure: Demo app selecting an image and running a Vision Core ML classification request. The sample app picks an image from the gallery and shows the model-driven result.

The linked sample code from the article:

iOS code file: ViewController.swift
iOS project: SwiftVision iOS sample
macOS code file: ViewController.swift
macOS project: SwiftVision macOS sample

Wrap Up

The practical takeaway is that the pipeline is three steps: wrap the model, define the request, then feed image data through a handler.

That is the durable part of the article. The labels can change, the model can change, and the UI can change, but the Vision integration path stays mostly the same across image-classification apps on Apple platforms.

The next article in the original series takes exactly that output and uses it to change wallpapers by time of day. This page stops one step earlier, at the point where raw images become structured labels.
