Displaying ML Model Probabilities in iOS: Building Trust with Confidence Scores
Machine learning models don’t just make predictions - they also assign a probability to each possible outcome. When building ML-powered features in iOS apps, showing these confidence scores to users can make your app feel more transparent and trustworthy.
Why Show Prediction Probabilities?
For Users:
- Transparency: Users see when the model is uncertain
- Trust: Showing “85% confident this is a clothing question” builds credibility
- Fallback options: When confidence is low, you can offer manual alternatives
For Developers:
- Debug classification issues: Identify edge cases where the model struggles
- Set confidence thresholds: Decide when to show predictions vs. fallbacks
- Improve training data: Find patterns in low-confidence predictions
The Problem: CreateML Text Classifiers Don’t Expose Probabilities
If you’ve searched for how to get probabilities from CreateML text classifiers, you’ve probably seen code like this:
let prediction = try model.prediction(text: text)
let confidence = prediction.labelProbability[prediction.label] ?? 0.0
This doesn’t work. When you add a CreateML Text Classifier to Xcode and inspect the auto-generated Swift interface, you’ll find:
class ClothingQuestionClassifierOutput {
    var label: String { ... }  // Only this is available
    // No labelProbability property!
}
The model only provides the predicted label - no confidence scores.
The Solution: Wrap CoreML in NLModel
Apple’s Natural Language framework provides NLModel, a wrapper designed specifically for text classification. While the raw CoreML interface only exposes the top label, NLModel provides:
- predictedLabel(for:) - Get the top prediction
- predictedLabelHypotheses(for:maximumCount:) - Get all labels with probabilities
This isn’t a workaround - it’s the intended API for text classification models created with CreateML.
What is NLModel?
NLModel is part of the Natural Language framework (import NaturalLanguage). It wraps CoreML models that were trained for text classification tasks and exposes NLP-specific functionality that CoreML’s generic interface doesn’t provide.
When you create a text classifier in CreateML, the underlying model architecture is optimized for the Natural Language framework. The probability distribution is computed during inference, but CoreML’s auto-generated interface simply doesn’t expose it. NLModel does.
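As a quick sketch of the two calls (assuming ClothingQuestionClassifier is the class Xcode auto-generates from the model described in the next step):

```swift
import CoreML
import NaturalLanguage

// Load the auto-generated CoreML class, then wrap its underlying MLModel
let coreMLModel = try ClothingQuestionClassifier(configuration: MLModelConfiguration())
let nlModel = try NLModel(mlModel: coreMLModel.model)

// Top label only - the same information the raw CoreML interface gives you
let topLabel = nlModel.predictedLabel(for: "Do you have size M?")

// All labels with their probabilities, e.g. ["valid": 0.91, "invalid": 0.09]
let hypotheses = nlModel.predictedLabelHypotheses(
    for: "Do you have size M?",
    maximumCount: 2
)
```

The rest of this article builds this pattern into a reusable detector class.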
Step 1: Train Your CreateML Model
Create a text classifier in CreateML with your training data:
clothing_question_training_data.csv:
text,label
Do you have size M?,valid
Any Nike?,valid
Got hoodies?,valid
What time is it?,invalid
Hello,invalid
Do you sell laptops?,invalid
Train the model in CreateML (Xcode → Open Developer Tool → Create ML), export the .mlmodel file, and add it to your Xcode project.
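If you prefer scripting over the Create ML app, the same model can be trained with the CreateML framework (macOS only). A minimal sketch - the file paths are placeholders for your own locations:

```swift
import CreateML
import Foundation

// Placeholder paths; adjust to your project layout
let csvURL = URL(fileURLWithPath: "clothing_question_training_data.csv")
let trainingData = try MLDataTable(contentsOf: csvURL)

// Column names match the CSV header shown above
let classifier = try MLTextClassifier(
    trainingData: trainingData,
    textColumn: "text",
    labelColumn: "label"
)

// Export the trained model for use in the iOS app
try classifier.write(to: URL(fileURLWithPath: "ClothingQuestionClassifier.mlmodel"))
```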
Step 2: Create a Classifier Using NLModel
Here’s a clean implementation that provides real confidence scores:
import Foundation
import CoreML
import NaturalLanguage

class ClothingQuestionDetector {

    // MARK: - Types

    enum ClassificationLabel: String {
        case valid
        case invalid

        var description: String {
            switch self {
            case .valid: return "Valid clothing question"
            case .invalid: return "Not a clothing question"
            }
        }

        var icon: String {
            switch self {
            case .valid: return "✓"
            case .invalid: return "✗"
            }
        }
    }

    struct ClassificationResult {
        let label: ClassificationLabel
        let confidence: Double
        let allPredictions: [String: Double]

        var isValid: Bool {
            label == .valid
        }

        var confidencePercentage: String {
            String(format: "%.0f%%", confidence * 100)
        }

        var displayText: String {
            "\(label.icon) \(label.description) (\(confidencePercentage))"
        }
    }

    enum ClassificationError: Error {
        case modelNotLoaded
        case noLabelPredicted
    }

    // MARK: - Properties

    private let nlModel: NLModel

    // MARK: - Initialization

    init() throws {
        let config = MLModelConfiguration()
        let coreMLModel = try ClothingQuestionClassifier(configuration: config)
        guard let nlModel = try? NLModel(mlModel: coreMLModel.model) else {
            throw ClassificationError.modelNotLoaded
        }
        self.nlModel = nlModel
    }

    // MARK: - Public API

    func classify(_ text: String) -> Result<ClassificationResult, ClassificationError> {
        guard let predictedLabel = nlModel.predictedLabel(for: text) else {
            return .failure(.noLabelPredicted)
        }
        let result = buildClassificationResult(predictedLabel: predictedLabel, text: text)
        return .success(result)
    }

    func isClothingQuestion(_ text: String, minimumConfidence: Double = 0.6) -> Bool {
        guard case .success(let result) = classify(text) else {
            return false
        }
        return result.isValid && result.confidence >= minimumConfidence
    }

    // MARK: - Private Helpers

    private func buildClassificationResult(predictedLabel: String, text: String) -> ClassificationResult {
        let label = ClassificationLabel(rawValue: predictedLabel) ?? .invalid
        // This is the key: predictedLabelHypotheses already returns [String: Double]
        // with every label and its probability
        let allPredictions = nlModel.predictedLabelHypotheses(for: text, maximumCount: 10)
        let confidence = allPredictions[predictedLabel] ?? 0.0
        return ClassificationResult(
            label: label,
            confidence: confidence,
            allPredictions: allPredictions
        )
    }
}
Key points:
- init throws - if the model can’t load, you know immediately. No silent failures with fake confidence values.
- nlModel is non-optional - once initialized, the detector is guaranteed to work. No optional unwrapping throughout the code.
- classify returns Result - explicit success/failure handling. The caller decides how to handle errors.
- predictedLabelHypotheses(for:maximumCount:) - this is the magic method. It returns a [String: Double] dictionary with all labels and their probabilities.
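Putting it together, a hypothetical call site might look like this (the exact probabilities depend on your trained model, so the output comments are illustrative only):

```swift
// Hypothetical usage of the ClothingQuestionDetector defined above
let detector = try ClothingQuestionDetector()

switch detector.classify("Any Nike hoodies in size L?") {
case .success(let result):
    print(result.displayText)      // e.g. "✓ Valid clothing question (92%)"
    print(result.allPredictions)   // e.g. ["valid": 0.92, "invalid": 0.08]
case .failure(let error):
    print("Classification failed: \(error)")
}

// Or use the convenience gate with the default 0.6 threshold:
if detector.isClothingQuestion("Any Nike hoodies in size L?") {
    // route to clothing search
}
```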
Step 3: Displaying Probabilities in SwiftUI
Simple Confidence Badge
import SwiftUI

struct ConfidenceBadge: View {
    let confidence: Double

    var confidenceColor: Color {
        switch confidence {
        case 0.8...1.0: return .green
        case 0.5..<0.8: return .orange
        default: return .red
        }
    }

    var body: some View {
        HStack(spacing: 4) {
            Image(systemName: "checkmark.seal.fill")
                .font(.caption)
            Text("\(Int(confidence * 100))%")
                .font(.caption)
                .fontWeight(.semibold)
        }
        .foregroundColor(confidenceColor)
        .padding(.horizontal, 8)
        .padding(.vertical, 4)
        .background(confidenceColor.opacity(0.15))
        .cornerRadius(8)
    }
}
Full Prediction View
struct ClassificationPredictionView: View {
    let result: ClothingQuestionDetector.ClassificationResult

    var topPredictions: [(label: String, confidence: Double)] {
        result.allPredictions
            .sorted { $0.value > $1.value }
            .map { (label: $0.key, confidence: $0.value) }
    }

    var body: some View {
        VStack(alignment: .leading, spacing: 16) {
            // Top prediction
            HStack {
                Text(result.label.description)
                    .font(.title3)
                    .fontWeight(.semibold)
                Spacer()
                ConfidenceBadge(confidence: result.confidence)
            }

            if !result.allPredictions.isEmpty {
                Divider()

                // All predictions breakdown
                ForEach(topPredictions, id: \.label) { prediction in
                    HStack {
                        Text(prediction.label.capitalized)
                        Spacer()
                        Text("\(Int(prediction.confidence * 100))%")
                            .foregroundColor(.secondary)
                    }
                }
            }
        }
        .padding()
        .background(Color(.systemBackground))
        .cornerRadius(12)
    }
}
Example Usage
struct ContentView: View {
    @State private var userInput = ""
    @State private var result: Result<ClothingQuestionDetector.ClassificationResult, ClothingQuestionDetector.ClassificationError>?

    private let detector: ClothingQuestionDetector?

    init() {
        detector = try? ClothingQuestionDetector()
    }

    var body: some View {
        VStack(spacing: 20) {
            TextField("Ask about clothing...", text: $userInput)
                .textFieldStyle(RoundedBorderTextFieldStyle())
                .padding()

            Button("Classify") {
                guard let detector else { return }
                result = detector.classify(userInput)
            }
            .buttonStyle(.borderedProminent)

            if case .success(let classification) = result {
                ClassificationPredictionView(result: classification)
                    .padding()
            }

            Spacer()
        }
    }
}
Setting Confidence Thresholds
Use confidence thresholds to decide when to trust the model:
func handleUserInput(_ query: String) {
    guard let detector else { return }

    switch detector.classify(query) {
    case .success(let result) where result.confidence >= 0.8:
        // High confidence - act on the prediction
        if result.isValid {
            showClothingProducts(for: query)
        } else {
            showGeneralHelp()
        }
    case .success(let result) where result.confidence >= 0.5:
        // Medium confidence - ask for confirmation
        showSuggestion("Is this a clothing question?", confidence: result.confidence)
    case .success:
        // Low confidence - offer options
        showMultipleOptions(["Search clothing", "Browse categories", "Contact support"])
    case .failure:
        // Classification failed
        showFallbackUI()
    }
}
Performance
Using NLModel has minimal overhead:
- Inference time: <10 ms on modern iPhones (same as direct CoreML)
- Memory: negligible additional overhead
- The probabilities are computed during inference anyway - NLModel just exposes them
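If you want to verify the latency for your own model, here is a rough sketch using CFAbsoluteTimeGetCurrent and the detector from Step 2 (run it on a device; simulator numbers can differ):

```swift
import Foundation

let detector = try ClothingQuestionDetector()

// Warm up once - the first prediction can include one-time setup cost
_ = detector.classify("warm up")

let start = CFAbsoluteTimeGetCurrent()
_ = detector.classify("Do you have size M?")
let elapsedMs = (CFAbsoluteTimeGetCurrent() - start) * 1000

print(String(format: "Inference took %.2f ms", elapsedMs))
```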
Key Takeaways
- CreateML text classifiers don’t expose labelProbability through the auto-generated CoreML interface
- Wrap your model in NLModel to access predictedLabelHypotheses(for:maximumCount:)
- Use the Result type for explicit error handling
- Make nlModel non-optional - if initialization fails, throw immediately
- Set confidence thresholds to determine how much to trust predictions