Displaying ML Model Probabilities in iOS: Building Trust with Confidence Scores

Machine learning classifiers don’t just output a label - they also compute a probability for each possible label. When building ML-powered features in iOS apps, surfacing these confidence scores to users can make your app feel more transparent and trustworthy.

Why Show Prediction Probabilities?

For Users:

  • Transparency: Users see when the model is uncertain
  • Trust: Showing “85% confident this is a clothing question” builds credibility
  • Fallback options: When confidence is low, you can offer manual alternatives

For Developers:

  • Debug classification issues: Identify edge cases where the model struggles
  • Set confidence thresholds: Decide when to show predictions vs. fallbacks
  • Improve training data: Find patterns in low-confidence predictions

The Problem: CreateML Text Classifiers Don’t Expose Probabilities

If you’ve searched for how to get probabilities from CreateML text classifiers, you’ve probably seen code like this:

let prediction = try model.prediction(text: text)
let confidence = prediction.labelProbability[prediction.label] ?? 0.0

This doesn’t work. When you add a CreateML Text Classifier to Xcode and inspect the auto-generated Swift interface, you’ll find:

class ClothingQuestionClassifierOutput {
    var label: String { ... }  // Only this is available
    // No labelProbability property!
}

The model only provides the predicted label - no confidence scores.

The Solution: Wrap CoreML in NLModel

Apple’s Natural Language framework provides NLModel, a wrapper designed specifically for text classification. While the raw CoreML interface only exposes the top label, NLModel provides:

  • predictedLabel(for:) - Get the top prediction
  • predictedLabelHypotheses(for:maximumCount:) - Get all labels with probabilities

This isn’t a workaround - it’s the intended API for text classification models created with CreateML.
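As a minimal sketch, here is what those two calls look like when loading a compiled model directly from the app bundle (the model name and example output are illustrative, not guaranteed):

import NaturalLanguage

// Assumes a compiled text classifier named ClothingQuestionClassifier.mlmodelc
// is bundled with the app; NLModel(contentsOf:) accepts a compiled-model URL.
let url = Bundle.main.url(forResource: "ClothingQuestionClassifier",
                          withExtension: "mlmodelc")!
let model = try NLModel(contentsOf: url)

// Top label only (String?)
let label = model.predictedLabel(for: "Do you have size M?")

// All labels with probabilities ([String: Double]),
// e.g. something like ["valid": 0.91, "invalid": 0.09]
let hypotheses = model.predictedLabelHypotheses(for: "Do you have size M?",
                                                maximumCount: 2)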

What is NLModel?

NLModel is part of the Natural Language framework (import NaturalLanguage). It wraps CoreML models that were trained for text classification tasks and exposes NLP-specific functionality that CoreML’s generic interface doesn’t provide.

When you create a text classifier in CreateML, the underlying model architecture is optimized for the Natural Language framework. The probability distribution is computed during inference, but CoreML’s auto-generated interface simply doesn’t expose it. NLModel does.

Step 1: Train Your CreateML Model

Create a text classifier in CreateML with your training data:

clothing_question_training_data.csv:

text,label
Do you have size M?,valid
Any Nike?,valid
Got hoodies?,valid
What time is it?,invalid
Hello,invalid
Do you sell laptops?,invalid

Train the model in CreateML (Xcode → Open Developer Tool → Create ML), export the .mlmodel file, and add it to your Xcode project.
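If you prefer to script the training step instead of using the Create ML app, the same model can be produced with the CreateML framework on macOS. A sketch, assuming the CSV above sits next to the script (paths are hypothetical):

import CreateML
import Foundation

// Load the labeled CSV and train a text classifier (macOS only).
let data = try MLDataTable(contentsOf:
    URL(fileURLWithPath: "clothing_question_training_data.csv"))
let classifier = try MLTextClassifier(trainingData: data,
                                      textColumn: "text",
                                      labelColumn: "label")

// Export the .mlmodel for Xcode to compile into the app bundle.
try classifier.write(to:
    URL(fileURLWithPath: "ClothingQuestionClassifier.mlmodel"))

Either route produces the same .mlmodel; the GUI is easier for experimenting, while the script version fits into a retraining pipeline.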

Step 2: Create a Classifier Using NLModel

Here’s a clean implementation that provides real confidence scores:

import Foundation
import CoreML
import NaturalLanguage

class ClothingQuestionDetector {

    // MARK: - Types

    enum ClassificationLabel: String {
        case valid
        case invalid

        var description: String {
            switch self {
            case .valid: return "Valid clothing question"
            case .invalid: return "Not a clothing question"
            }
        }

        var icon: String {
            switch self {
            case .valid: return "✅"
            case .invalid: return "❌"
            }
        }
    }

    struct ClassificationResult {
        let label: ClassificationLabel
        let confidence: Double
        let allPredictions: [String: Double]

        var isValid: Bool {
            label == .valid
        }

        var confidencePercentage: String {
            String(format: "%.0f%%", confidence * 100)
        }

        var displayText: String {
            "\(label.icon) \(label.description) (\(confidencePercentage))"
        }
    }

    enum ClassificationError: Error {
        case modelNotLoaded
        case noLabelPredicted
    }

    // MARK: - Properties

    private let nlModel: NLModel

    // MARK: - Initialization

    init() throws {
        let config = MLModelConfiguration()
        let coreMLModel = try ClothingQuestionClassifier(configuration: config)

        guard let nlModel = try? NLModel(mlModel: coreMLModel.model) else {
            throw ClassificationError.modelNotLoaded
        }

        self.nlModel = nlModel
    }

    // MARK: - Public API

    func classify(_ text: String) -> Result<ClassificationResult, ClassificationError> {
        guard let predictedLabel = nlModel.predictedLabel(for: text) else {
            return .failure(.noLabelPredicted)
        }

        let result = buildClassificationResult(predictedLabel: predictedLabel, text: text)
        return .success(result)
    }

    func isClothingQuestion(_ text: String, minimumConfidence: Double = 0.6) -> Bool {
        guard case .success(let result) = classify(text) else {
            return false
        }
        return result.isValid && result.confidence >= minimumConfidence
    }

    // MARK: - Private Helpers

    private func buildClassificationResult(predictedLabel: String, text: String) -> ClassificationResult {
        let label = ClassificationLabel(rawValue: predictedLabel) ?? .invalid

        // This is the key: predictedLabelHypotheses already returns
        // a [String: Double] of every label with its probability
        let allPredictions = nlModel.predictedLabelHypotheses(for: text, maximumCount: 10)
        let confidence = allPredictions[predictedLabel] ?? 0.0

        return ClassificationResult(
            label: label,
            confidence: confidence,
            allPredictions: allPredictions
        )
    }
}

Key points:

  1. The init throws - If the model can’t load, you know immediately. No silent failures with fake confidence values.

  2. nlModel is non-optional - Once initialized, the detector is guaranteed to work. No optional unwrapping throughout the code.

  3. classify returns Result - Explicit success/failure handling. The caller decides how to handle errors.

  4. predictedLabelHypotheses - This is the key method. It returns a [String: Double] dictionary mapping each label to its probability, capped at maximumCount entries.

Step 3: Displaying Probabilities in SwiftUI

Simple Confidence Badge

struct ConfidenceBadge: View {
    let confidence: Double

    var confidenceColor: Color {
        switch confidence {
        case 0.8...1.0: return .green
        case 0.5..<0.8: return .orange
        default: return .red
        }
    }

    var body: some View {
        HStack(spacing: 4) {
            Image(systemName: "checkmark.seal.fill")
                .font(.caption)
            Text("\(Int(confidence * 100))%")
                .font(.caption)
                .fontWeight(.semibold)
        }
        .foregroundColor(confidenceColor)
        .padding(.horizontal, 8)
        .padding(.vertical, 4)
        .background(confidenceColor.opacity(0.15))
        .cornerRadius(8)
    }
}

Full Prediction View

struct ClassificationPredictionView: View {
    let result: ClothingQuestionDetector.ClassificationResult

    var topPredictions: [(label: String, confidence: Double)] {
        result.allPredictions
            .sorted { $0.value > $1.value }
            .map { (label: $0.key, confidence: $0.value) }
    }

    var body: some View {
        VStack(alignment: .leading, spacing: 16) {
            // Top prediction
            HStack {
                Text(result.label.description)
                    .font(.title3)
                    .fontWeight(.semibold)
                Spacer()
                ConfidenceBadge(confidence: result.confidence)
            }

            if !result.allPredictions.isEmpty {
                Divider()

                // All predictions breakdown
                ForEach(topPredictions, id: \.label) { prediction in
                    HStack {
                        Text(prediction.label.capitalized)
                        Spacer()
                        Text("\(Int(prediction.confidence * 100))%")
                            .foregroundColor(.secondary)
                    }
                }
            }
        }
        .padding()
        .background(Color(.systemBackground))
        .cornerRadius(12)
    }
}

Example Usage

struct ContentView: View {
    @State private var userInput = ""
    @State private var result: Result<ClothingQuestionDetector.ClassificationResult, ClothingQuestionDetector.ClassificationError>?

    private let detector: ClothingQuestionDetector?

    init() {
        detector = try? ClothingQuestionDetector()
    }

    var body: some View {
        VStack(spacing: 20) {
            TextField("Ask about clothing...", text: $userInput)
                .textFieldStyle(.roundedBorder)
                .padding()

            Button("Classify") {
                guard let detector else { return }
                result = detector.classify(userInput)
            }
            .buttonStyle(.borderedProminent)

            if case .success(let classification) = result {
                ClassificationPredictionView(result: classification)
                    .padding()
            }

            Spacer()
        }
    }
}

Setting Confidence Thresholds

Use confidence thresholds to decide when to trust the model:

func handleUserInput(_ query: String) {
    guard let detector else { return }

    switch detector.classify(query) {
    case .success(let result) where result.confidence >= 0.8:
        // High confidence - act on prediction
        if result.isValid {
            showClothingProducts(for: query)
        } else {
            showGeneralHelp()
        }

    case .success(let result) where result.confidence >= 0.5:
        // Medium confidence - ask for confirmation
        showSuggestion("Is this a clothing question?", confidence: result.confidence)

    case .success:
        // Low confidence - offer options
        showMultipleOptions(["Search clothing", "Browse categories", "Contact support"])

    case .failure:
        // Classification failed
        showFallbackUI()
    }
}

Performance

Using NLModel has minimal overhead:

  • Inference time: <10ms on modern iPhones (same as direct CoreML)
  • Memory: Negligible additional overhead
  • The probabilities are computed during inference anyway - NLModel just exposes them
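
If you want to verify the latency claim on your own devices, a rough spot-check is easy to add (this assumes a `detector` instance from Step 2; for reliable numbers, profile with Instruments instead):

import Foundation

// Crude wall-clock measurement of a single classification call.
let start = CFAbsoluteTimeGetCurrent()
_ = detector.classify("Do you have size M?")
let elapsedMs = (CFAbsoluteTimeGetCurrent() - start) * 1000
print(String(format: "Inference took %.2f ms", elapsedMs))

Note that the first call after model load is typically slower than subsequent ones, so measure a warm run.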

Key Takeaways

  1. CreateML text classifiers don’t expose labelProbability through the auto-generated CoreML interface
  2. Wrap your model in NLModel to access predictedLabelHypotheses(for:maximumCount:)
  3. Use Result type for explicit error handling
  4. Make nlModel non-optional - if initialization fails, throw immediately
  5. Set confidence thresholds to determine how much to trust predictions

Further Reading