{"id":1901,"date":"2025-02-18T07:02:19","date_gmt":"2025-02-18T07:02:19","guid":{"rendered":"https:\/\/mailitics.com\/index.php\/2025\/02\/18\/on-device-machine-learning-in-spatial-computing\/"},"modified":"2025-02-18T07:02:19","modified_gmt":"2025-02-18T07:02:19","slug":"on-device-machine-learning-in-spatial-computing","status":"publish","type":"post","link":"https:\/\/mailitics.com\/index.php\/2025\/02\/18\/on-device-machine-learning-in-spatial-computing\/","title":{"rendered":"On-Device Machine Learning in Spatial Computing"},"content":{"rendered":"<div>\n<p class=\"wp-block-paragraph\" id=\"edcd\">The landscape of computing is undergoing a profound transformation with the emergence of spatial computing platforms (VR and AR). As we step into this new era, the intersection of virtual reality, <a href=\"https:\/\/towardsdatascience.com\/tag\/augmented-reality\/\" title=\"Augmented Reality\">Augmented Reality<\/a>, and on-device machine learning presents unprecedented opportunities for developers to create experiences that seamlessly blend digital content with the physical world.<\/p>\n<p class=\"wp-block-paragraph\" id=\"91bd\">The introduction of\u00a0<strong>visionOS<\/strong>\u00a0marks a significant milestone in this evolution. Apple\u2019s <a href=\"https:\/\/towardsdatascience.com\/tag\/spatial-computing\/\" title=\"Spatial Computing\">Spatial Computing<\/a> platform combines sophisticated hardware capabilities with powerful development frameworks, enabling developers to build applications that can understand and interact with the physical environment in real time. 
This convergence of spatial awareness and on-device machine learning capabilities opens up new possibilities for object recognition and tracking applications that were previously challenging to implement.<\/p>\n<hr class=\"wp-block-separator has-alpha-channel-opacity is-style-dotted\">\n<h2 class=\"wp-block-heading\" id=\"7cea\">What We\u2019re Building<\/h2>\n<p class=\"wp-block-paragraph\" id=\"5e6b\">In this guide, we\u2019ll be building an app that showcases the power of on-device machine learning in visionOS. We\u2019ll create an app that can recognize and track a diet soda can in real time, overlaying visual indicators and information directly in the user\u2019s field of view.<\/p>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"960\" height=\"540\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_jbhMhmmAiyBzSETEM8LtTg.gif?resize=960%2C540&#038;ssl=1\" alt=\"\" class=\"wp-image-598018\"><\/figure>\n<p class=\"wp-block-paragraph\" id=\"4972\">Our app will leverage several key technologies in the visionOS ecosystem. When a user runs the app, they\u2019re presented with a window containing a rotating 3D model of our target object along with usage instructions. As they look around their environment, the app continuously scans for diet soda cans. Upon detection, it displays dynamic bounding lines around the can and places a floating text label above it, all while maintaining precise tracking as the object or user moves through space.<\/p>\n<p class=\"wp-block-paragraph\" id=\"ab47\">Before we begin development, let\u2019s ensure we have the necessary tools and understanding in place. 
This tutorial requires:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">The latest version of Xcode 16 with the visionOS SDK installed<\/li>\n<li class=\"wp-block-list-item\">visionOS 2.0 or later running on an Apple Vision Pro device<\/li>\n<li class=\"wp-block-list-item\">Basic familiarity with SwiftUI and the Swift programming language<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\" id=\"cf2d\">The development process will take us through several key stages, from capturing a 3D model of our target object to implementing real-time tracking and visualization. Each stage builds upon the previous one, giving you a thorough understanding of developing features powered by on-device machine learning for visionOS.<\/p>\n<h2 class=\"wp-block-heading\" id=\"f2a8\">Building the Foundation: 3D Object Capture<\/h2>\n<p class=\"wp-block-paragraph\" id=\"fed8\">The first step in creating our object recognition system involves capturing a detailed 3D model of our target object. Apple provides a powerful app for this purpose:\u00a0<a href=\"https:\/\/apps.apple.com\/us\/app\/reality-composer\/id1462358802\" rel=\"noreferrer noopener\" target=\"_blank\"><strong>Reality Composer<\/strong><\/a>, available for iOS through the App Store.<\/p>\n<p class=\"wp-block-paragraph\" id=\"718b\">When capturing a 3D model, environmental conditions play a crucial role in the quality of our results. Setting up the capture environment properly ensures we get the best possible data for our machine learning model. A well-lit space with consistent lighting helps the capture system accurately detect the object\u2019s features and dimensions. 
The diet soda can should be placed on a surface with good contrast, making it easier for the system to distinguish the object\u2019s boundaries.<\/p>\n<p class=\"wp-block-paragraph\" id=\"35bb\">The capture process begins by launching the\u00a0<strong>Reality Composer<\/strong>\u00a0app and selecting \u201cObject Capture\u201d from the available options. The app guides us through positioning a bounding box around our target object. This bounding box is critical as it defines the spatial boundaries of our capture volume.<\/p>\n<figure class=\"wp-block-image alignwide size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"3f3b36\" data-has-transparency=\"true\" style=\"--dominant-color: #3f3b36;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"500\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_57iTM2hbc9E7RKn06slksg-1024x500.webp?resize=1024%2C500&#038;ssl=1\" alt=\"\" class=\"wp-image-598019 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_57iTM2hbc9E7RKn06slksg-1024x500.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_57iTM2hbc9E7RKn06slksg-300x147.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_57iTM2hbc9E7RKn06slksg-768x375.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_57iTM2hbc9E7RKn06slksg-1536x750.webp 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_57iTM2hbc9E7RKn06slksg-2048x1000.webp 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Reality Composer \u2014 Object Capture Flow \u2014 Image By Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"a2b6\">Once we\u2019ve captured all the details of the soda can with the help of the in-app guide and processed the images, a\u00a0<strong>.usdz<\/strong>\u00a0file containing our 3D model will be created. 
This file format is specifically designed for AR\/VR applications and contains not just the visual representation of our object, but also important information that will be used in the training process.<\/p>\n<h2 class=\"wp-block-heading\" id=\"653f\">Training the Reference Model<\/h2>\n<p class=\"wp-block-paragraph\" id=\"5d54\">With our 3D model in hand, we move to the next crucial phase: training our recognition model using\u00a0<strong>Create ML<\/strong>. Apple\u2019s\u00a0<strong>Create ML<\/strong>\u00a0application provides a straightforward interface for training machine learning models, including specialized templates for spatial computing applications.<\/p>\n<p class=\"wp-block-paragraph\" id=\"d899\">To begin the training process, we launch\u00a0<strong>Create ML<\/strong>\u00a0and select the \u201cObject Tracking\u201d template from the spatial category. This template is specifically designed for training models that can recognize and track objects in three-dimensional space.<\/p>\n<figure class=\"wp-block-image alignwide size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"929293\" data-has-transparency=\"false\" style=\"--dominant-color: #929293;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"341\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_dONrH3vKlYdzLfZAKOXJig-1024x341.webp?resize=1024%2C341&#038;ssl=1\" alt=\"\" class=\"wp-image-598020 not-transparent\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_dONrH3vKlYdzLfZAKOXJig-1024x341.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_dONrH3vKlYdzLfZAKOXJig-300x100.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_dONrH3vKlYdzLfZAKOXJig-768x256.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_dONrH3vKlYdzLfZAKOXJig-1536x512.webp 1536w, 
https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_dONrH3vKlYdzLfZAKOXJig-2048x683.webp 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Create ML Project Setup \u2014 Image By Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"aed4\">After creating a new project, we import our\u00a0<strong>.usdz\u00a0<\/strong>file into Create ML. The system automatically analyzes the 3D model and extracts key features that will be used for recognition. The interface provides options for configuring how our object should be recognized in space, including viewing angles and tracking preferences.<\/p>\n<p class=\"wp-block-paragraph\" id=\"ae34\">Once you\u2019ve imported the 3D model and inspected it from various angles, click \u201cTrain\u201d.\u00a0<strong>Create ML<\/strong>\u00a0will process our model and begin the training phase. During this phase, the system learns to recognize our object from various angles and under different conditions. 
The training process can take several hours as the system builds a comprehensive understanding of our object\u2019s characteristics.<\/p>\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"262626\" data-has-transparency=\"true\" style=\"--dominant-color: #262626;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"599\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_OXwbNRwiYM1UQpXrNISaVw-1024x599.webp?resize=1024%2C599&#038;ssl=1\" alt=\"\" class=\"wp-image-598021 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_OXwbNRwiYM1UQpXrNISaVw-1024x599.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_OXwbNRwiYM1UQpXrNISaVw-300x176.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_OXwbNRwiYM1UQpXrNISaVw-768x449.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_OXwbNRwiYM1UQpXrNISaVw.webp 1400w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Create ML Training Process \u2014 Image By Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"60da\">The output of this training process is a\u00a0<strong>.referenceobject<\/strong>\u00a0file, which contains the trained model data optimized for real-time object detection in visionOS. This file encapsulates all the learned features and recognition parameters that will enable our app to identify diet soda cans in the user\u2019s environment.<\/p>\n<p class=\"wp-block-paragraph\" id=\"64b4\">The successful creation of our reference object marks an important milestone in our development process. 
We now have a trained model capable of recognizing our target object in real-time, setting the stage for implementing the actual detection and visualization functionality in our visionOS application.<\/p>\n<h2 class=\"wp-block-heading\" id=\"53ce\">Initial Project Setup<\/h2>\n<p class=\"wp-block-paragraph\">Now that we have our trained reference object, let\u2019s set up our visionOS project. Launch\u00a0<strong>Xcode<\/strong>\u00a0and select \u201cCreate a new Xcode project\u201d. In the template selector, choose visionOS under the platforms filter and select \u201cApp\u201d. This template provides the basic structure needed for a visionOS application.<\/p>\n<figure class=\"wp-block-image alignwide size-large\"><img data-recalc-dims=\"1\" data-dominant-color=\"262623\" data-has-transparency=\"true\" style=\"--dominant-color: #262623;\" loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"387\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_5aZNx3L9-ivocxdOPsNfw-1024x387.webp?resize=1024%2C387&#038;ssl=1\" alt=\"\" class=\"wp-image-598022 has-transparency\" srcset=\"https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_5aZNx3L9-ivocxdOPsNfw-1024x387.webp 1024w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_5aZNx3L9-ivocxdOPsNfw-300x113.webp 300w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_5aZNx3L9-ivocxdOPsNfw-768x290.webp 768w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_5aZNx3L9-ivocxdOPsNfw-1536x580.webp 1536w, https:\/\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_5aZNx3L9-ivocxdOPsNfw-2048x774.webp 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\"><figcaption class=\"wp-element-caption\">Xcode visionOS Project Setup \u2014 Image By Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"9d30\">In the project configuration dialog, configure your project with these primary settings:<\/p>\n<ul 
class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Product Name: SodaTracker<\/li>\n<li class=\"wp-block-list-item\">Initial Scene: Window<\/li>\n<li class=\"wp-block-list-item\">Immersive Space Renderer: RealityKit<\/li>\n<li class=\"wp-block-list-item\">Immersive Space: Mixed<\/li>\n<\/ul>\n<p class=\"wp-block-paragraph\" id=\"a17b\">After project creation, we need to make a few essential modifications. First, delete the file named\u00a0<strong>ToggleImmersiveSpaceButton.swift<\/strong>\u00a0as we won\u2019t be using it in our implementation.<\/p>\n<p class=\"wp-block-paragraph\" id=\"7590\">Next, we\u2019ll add our previously created assets to the project. In Xcode\u2019s Project Navigator, locate the \u201c<strong>RealityKitContent.rkassets<\/strong>\u201d folder and add the 3D object file (\u201c<strong>SodaModel.usdz<\/strong>\u201d file). This 3D model will be used in our informative view. Create a new group named \u201c<strong>ReferenceObjects<\/strong>\u201d and add the \u201c<strong>Diet Soda.referenceobject<\/strong>\u201d file we generated using Create ML.<\/p>\n<p class=\"wp-block-paragraph\" id=\"42a8\">The final setup step is to configure the necessary permission for object tracking. Open your project\u2019s\u00a0<strong>Info.plist<\/strong>\u00a0file and add a new key:\u00a0<strong>NSWorldSensingUsageDescription<\/strong>. Set its value to \u201cUsed to track diet sodas\u201d. This permission is required for the app to detect and track objects in the user\u2019s environment.<\/p>\n<p class=\"wp-block-paragraph\" id=\"86a3\">With these setup steps complete, we have a properly configured visionOS project ready for implementing our object tracking functionality.<\/p>\n<h2 class=\"wp-block-heading\" id=\"80d0\">Entry Point Implementation<\/h2>\n<p class=\"wp-block-paragraph\" id=\"4b8f\">Let\u2019s start with\u00a0<strong>SodaTrackerApp.swift<\/strong>, which was automatically created when we set up our visionOS project. 
We need to modify this file to support our object tracking functionality. Replace the default implementation with the following code:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">import SwiftUI\n\n\/**\n SodaTrackerApp is the main entry point for the application.\n It configures the app's window and immersive space, and manages\n the initialization of object detection capabilities.\n \n The app automatically launches into an immersive experience\n where users can see Diet Soda cans being detected and highlighted\n in their environment.\n *\/\n@main\nstruct SodaTrackerApp: App {\n    \/\/\/ Shared model that manages object detection state\n    @StateObject private var appModel = AppModel()\n    \n    \/\/\/ System environment value for launching immersive experiences\n    @Environment(\\.openImmersiveSpace) var openImmersiveSpace\n    \n    var body: some Scene {\n        WindowGroup {\n            ContentView()\n                .environmentObject(appModel)\n                .task {\n                    \/\/ Load the reference object first so detection can start\n                    \/\/ as soon as the immersive space opens\n                    await appModel.initializeDetector()\n                    \/\/ Then launch directly into the immersive experience\n                    await openImmersiveSpace(id: appModel.immersiveSpaceID)\n                }\n        }\n        .windowStyle(.plain)\n        .windowResizability(.contentSize)\n        \n        \/\/ Configure the immersive space for object detection\n        ImmersiveSpace(id: appModel.immersiveSpaceID) {\n            ImmersiveView()\n                .environment(appModel)\n        }\n        \/\/ Use mixed immersion to blend virtual content with reality\n        .immersionStyle(selection: .constant(.mixed), in: .mixed)\n        \/\/ Hide system UI for a more immersive experience\n        .persistentSystemOverlays(.hidden)\n    }\n}\n<\/code><\/pre>\n<p 
class=\"wp-block-paragraph\" id=\"5aae\">The key aspect of this implementation is the initialization and management of our object detection system. When the app launches, we initialize our\u00a0<strong>AppModel<\/strong>\u00a0which handles the\u00a0<strong>ARKit<\/strong>\u00a0session and object tracking setup. The initialization sequence is crucial:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">.task {\n    await appModel.initializeDetector()\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"7b20\">This asynchronous initialization loads our trained reference object and prepares the\u00a0<strong>ARKit<\/strong>\u00a0session for object tracking. We ensure this happens before opening the immersive space where the actual detection will occur.<\/p>\n<p class=\"wp-block-paragraph\" id=\"399b\">The immersive space configuration is particularly important for object tracking:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">.immersionStyle(selection: .constant(.mixed), in: .mixed)<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"e74c\">The mixed immersion style is essential for our object tracking implementation as it allows\u00a0<strong>RealityKit<\/strong>\u00a0to blend our visual indicators (bounding boxes and labels) with the real-world environment where we\u2019re detecting objects. This creates a seamless experience where digital content accurately aligns with physical objects in the user\u2019s space.<\/p>\n<p class=\"wp-block-paragraph\" id=\"138a\">With these modifications to\u00a0<strong>SodaTrackerApp.swift<\/strong>, our app is ready to begin the object detection process, with ARKit, RealityKit, and our trained model working together in the mixed reality environment. 
In the next section, we\u2019ll examine the core object detection functionality in\u00a0<strong>AppModel.swift<\/strong>, another file that was created during project setup.<\/p>\n<h2 class=\"wp-block-heading\" id=\"ca1d\">Core Detection Model Implementation<\/h2>\n<p class=\"wp-block-paragraph\" id=\"c644\"><strong>AppModel.swift<\/strong>, created during project setup, serves as our core detection system. This file manages the\u00a0<strong>ARKit<\/strong>\u00a0session, loads our trained model, and coordinates the object tracking process. Let\u2019s examine its implementation:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">import SwiftUI\nimport RealityKit\nimport ARKit\n\n\/**\n AppModel serves as the core model for the soda can detection application.\n It manages the ARKit session, handles object tracking initialization,\n and maintains the state of object detection throughout the app's lifecycle.\n \n This model is designed to work with visionOS's object tracking capabilities,\n specifically optimized for detecting Diet Soda cans in the user's environment.\n *\/\n@MainActor\n@Observable\nclass AppModel: ObservableObject {\n    \/\/\/ Unique identifier for the immersive space where object detection occurs\n    let immersiveSpaceID = \"SodaTracking\"\n    \n    \/\/\/ ARKit session instance that manages the core tracking functionality\n    \/\/\/ This session coordinates with visionOS to process spatial data\n    private var arSession = ARKitSession()\n    \n    \/\/\/ Dedicated provider that handles the real-time tracking of soda cans\n    \/\/\/ This maintains the state of currently tracked objects\n    private var sodaTracker: ObjectTrackingProvider?\n    \n    \/\/\/ Collection of reference objects used for detection\n    \/\/\/ These objects contain the trained model data for recognizing soda cans\n    private var targetObjects: [ReferenceObject] = []\n    \n    \/**\n     Initializes the object detection system by loading and 
preparing\n     the reference object (Diet Soda can) from the app bundle.\n     \n     This method loads a pre-trained model that contains spatial and\n     visual information about the Diet Soda can we want to detect.\n     *\/\n    func initializeDetector() async {\n        guard let objectURL = Bundle.main.url(forResource: \"Diet Soda\", withExtension: \"referenceobject\") else {\n            print(\"Error: Failed to locate reference object in bundle - ensure Diet Soda.referenceobject exists\")\n            return\n        }\n        \n        do {\n            let referenceObject = try await ReferenceObject(from: objectURL)\n            self.targetObjects = [referenceObject]\n        } catch {\n            print(\"Error: Failed to initialize reference object: \\(error)\")\n        }\n    }\n    \n    \/**\n     Starts the active object detection process using ARKit.\n     \n     This method initializes the tracking provider with loaded reference objects\n     and begins the real-time detection process in the user's environment.\n     \n     Returns: An ObjectTrackingProvider if successfully initialized, nil otherwise\n     *\/\n    func beginDetection() async -&gt; ObjectTrackingProvider? 
{\n        guard !targetObjects.isEmpty else { return nil }\n        \n        let tracker = ObjectTrackingProvider(referenceObjects: targetObjects)\n        do {\n            try await arSession.run([tracker])\n            self.sodaTracker = tracker\n            return tracker\n        } catch {\n            print(\"Error: Failed to initialize tracking: \\(error)\")\n            return nil\n        }\n    }\n    \n    \/**\n     Terminates the object detection process.\n     \n     This method safely stops the ARKit session and cleans up\n     tracking resources when object detection is no longer needed.\n     *\/\n    func endDetection() {\n        arSession.stop()\n    }\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"b687\">At the core of our implementation is\u00a0<strong>ARKitSession<\/strong>, visionOS\u2019s gateway to spatial computing capabilities. The\u00a0<strong>@MainActor<\/strong>\u00a0attribute confines the model\u2019s properties and methods to the main actor, so changes to its state stay in step with the SwiftUI views that read it.<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">private var arSession = ARKitSession()\nprivate var sodaTracker: ObjectTrackingProvider?\nprivate var targetObjects: [ReferenceObject] = []<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"4e0b\">The\u00a0<strong>ObjectTrackingProvider<\/strong>\u00a0is a specialized component in visionOS that handles real-time object detection. It works in conjunction with\u00a0<strong>ReferenceObject<\/strong>\u00a0instances, which contain the spatial and visual information from our trained model. 
We maintain these as private properties to ensure proper lifecycle management.<\/p>\n<p class=\"wp-block-paragraph\" id=\"74c5\">The initialization process is particularly important:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">let referenceObject = try await ReferenceObject(from: objectURL)\nself.targetObjects = [referenceObject]<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"2e51\">Here, we load our trained model (the .referenceobject file we created in Create ML) into a\u00a0<strong>ReferenceObject<\/strong>\u00a0instance. This process is asynchronous because the system needs to parse and prepare the model data for real-time detection.<\/p>\n<p class=\"wp-block-paragraph\" id=\"4bf3\">The beginDetection method sets up the actual tracking process:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">let tracker = ObjectTrackingProvider(referenceObjects: targetObjects)\ntry await arSession.run([tracker])<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"d056\">When we create the\u00a0<strong>ObjectTrackingProvider<\/strong>, we pass in our reference objects. The provider uses these to establish the detection parameters: what to look for, what features to match, and how to track the object in 3D space. The\u00a0<strong>ARKitSession.run<\/strong>\u00a0call activates the tracking system, beginning the real-time analysis of the user\u2019s environment.<\/p>\n<h2 class=\"wp-block-heading\" id=\"3314\">Immersive Experience Implementation<\/h2>\n<p class=\"wp-block-paragraph\" id=\"910d\"><strong>ImmersiveView.swift<\/strong>, provided in our initial project setup, manages the real-time object detection visualization in the user\u2019s space. This view processes the continuous stream of detection data and creates visual representations of detected objects. 
Here\u2019s the implementation:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">import SwiftUI\nimport RealityKit\nimport ARKit\n\n\/**\n ImmersiveView is responsible for creating and managing the augmented reality\n experience where object detection occurs. This view handles the real-time\n visualization of detected soda cans in the user's environment.\n \n It maintains a collection of visual representations for each detected object\n and updates them in real-time as objects are detected, moved, or removed\n from view.\n *\/\nstruct ImmersiveView: View {\n    \/\/\/ Access to the app's shared model for object detection functionality\n    @Environment(AppModel.self) private var appModel\n    \n    \/\/\/ Root entity that serves as the parent for all AR content\n    \/\/\/ This entity provides a consistent coordinate space for all visualizations\n    @State private var sceneRoot = Entity()\n    \n    \/\/\/ Maps unique object identifiers to their visual representations\n    \/\/\/ Enables efficient updating of specific object visualizations\n    @State private var activeVisualizations: [UUID: ObjectVisualization] = [:]\n    \n    var body: some View {\n        RealityView { content in\n            \/\/ Initialize the AR scene with our root entity\n            content.add(sceneRoot)\n            \n            Task {\n                \/\/ Begin object detection and track changes\n                let detector = await appModel.beginDetection()\n                guard let detector else { return }\n                \n                \/\/ Process real-time updates for object detection\n                for await update in detector.anchorUpdates {\n                    let anchor = update.anchor\n                    let id = anchor.id\n                    \n                    switch update.event {\n                    case .added:\n                        \/\/ Object newly detected - create and add visualization\n                        let 
visualization = ObjectVisualization(for: anchor)\n                        activeVisualizations[id] = visualization\n                        sceneRoot.addChild(visualization.entity)\n                        \n                    case .updated:\n                        \/\/ Object moved - update its position and orientation\n                        activeVisualizations[id]?.refreshTracking(with: anchor)\n                        \n                    case .removed:\n                        \/\/ Object no longer visible - remove its visualization\n                        activeVisualizations[id]?.entity.removeFromParent()\n                        activeVisualizations.removeValue(forKey: id)\n                    }\n                }\n            }\n        }\n        .onDisappear {\n            \/\/ Clean up AR resources when view is dismissed\n            cleanupVisualizations()\n        }\n    }\n    \n    \/**\n     Removes all active visualizations and stops object detection.\n     This ensures proper cleanup of AR resources when the view is no longer active.\n     *\/\n    private func cleanupVisualizations() {\n        for (_, visualization) in activeVisualizations {\n            visualization.entity.removeFromParent()\n        }\n        activeVisualizations.removeAll()\n        appModel.endDetection()\n    }\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"4fa8\">The core of our object tracking visualization lies in the detector\u2019s\u00a0<strong>anchorUpdates<\/strong>\u00a0stream. 
This\u00a0<strong>ARKit<\/strong>\u00a0feature provides a continuous flow of object detection events:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">for await update in detector.anchorUpdates {\n    let anchor = update.anchor\n    let id = anchor.id\n    \n    switch update.event {\n    case .added:\n        \/\/ Object first detected\n        break\n    case .updated:\n        \/\/ Object position changed\n        break\n    case .removed:\n        \/\/ Object no longer visible\n        break\n    }\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"fc24\">Each\u00a0<strong>ObjectAnchor\u00a0<\/strong>contains crucial spatial data about the detected soda can, including its position, orientation, and bounding box in 3D space. When a new object is detected (.added event), we create a visualization that\u00a0<strong>RealityKit\u00a0<\/strong>will render in the correct position relative to the physical object. As the object or user moves, the .updated events keep our virtual content aligned with the real world.<\/p>\n<h2 class=\"wp-block-heading\" id=\"418a\">Visual Feedback System<\/h2>\n<p class=\"wp-block-paragraph\" id=\"5d5a\">Create a new file named\u00a0<strong>ObjectVisualization.swift<\/strong>\u00a0for handling the visual representation of detected objects. 
This component is responsible for creating and managing the bounding box and text overlay that appears around detected soda cans:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">import RealityKit\nimport ARKit\nimport UIKit\nimport SwiftUI\n\n\/**\n ObjectVisualization manages the visual elements that appear when a soda can is detected.\n This class handles both the 3D text label that appears above the object and the\n bounding box that outlines the detected object in space.\n *\/\n@MainActor\nclass ObjectVisualization {\n    \/\/\/ Root entity that contains all visual elements\n    var entity: Entity\n    \n    \/\/\/ Entity specifically for the bounding box visualization\n    private var boundingBox: Entity\n    \n    \/\/\/ Width of bounding box lines - 0.003 provides optimal visibility without being too intrusive\n    private let outlineWidth: Float = 0.003\n    \n    init(for anchor: ObjectAnchor) {\n        entity = Entity()\n        boundingBox = Entity()\n        \n        \/\/ Set up the main entity's transform based on the detected object's position\n        entity.transform = Transform(matrix: anchor.originFromAnchorTransform)\n        entity.isEnabled = anchor.isTracked\n        \n        createFloatingLabel(for: anchor)\n        setupBoundingBox(for: anchor)\n        refreshBoundingBoxGeometry(with: anchor)\n    }\n    \n    \/**\n     Creates a floating text label that hovers above the detected object.\n     The text uses Avenir Next font for optimal readability in AR space and\n     is positioned slightly above the object for clear visibility.\n     *\/\n    private func createFloatingLabel(for anchor: ObjectAnchor) {\n        \/\/ 0.06 units provides optimal text size for viewing at typical distances\n        let labelSize: Float = 0.06\n        \n        \/\/ Use Avenir Next for its clarity and modern appearance in AR\n        let font = MeshResource.Font(name: \"Avenir Next\", size: CGFloat(labelSize))!\n        let 
textMesh = MeshResource.generateText(\"Diet Soda\",\n                                               extrusionDepth: labelSize * 0.15,\n                                               font: font)\n        \n        \/\/ Create a material that makes text clearly visible against any background\n        var textMaterial = UnlitMaterial()\n        textMaterial.color = .init(tint: .orange)\n        \n        let textEntity = ModelEntity(mesh: textMesh, materials: [textMaterial])\n        \n        \/\/ Position text above object with enough clearance to avoid intersection\n        textEntity.transform.translation = SIMD3(\n            anchor.boundingBox.center.x - textMesh.bounds.max.x \/ 2,\n            anchor.boundingBox.extent.y + labelSize * 1.5,\n            0\n        )\n        \n        entity.addChild(textEntity)\n    }\n    \n    \/**\n     Creates a bounding box visualization that outlines the detected object.\n     Uses a semi-transparent magenta color to provide a clear\n     but non-distracting visual boundary around the detected soda can.\n     *\/\n    private func setupBoundingBox(for anchor: ObjectAnchor) {\n        \/\/ Unit cube; each edge is scaled to its final size in refreshBoundingBoxGeometry\n        let boxMesh = MeshResource.generateBox(size: [1.0, 1.0, 1.0])\n        \n        \/\/ Create a single material for all edges with magenta color\n        let boundsMaterial = UnlitMaterial(color: .magenta.withAlphaComponent(0.4))\n        \n        \/\/ Create all edges with uniform appearance\n        for _ in 0..&lt;12 {\n            let edge = ModelEntity(mesh: boxMesh, materials: [boundsMaterial])\n            boundingBox.addChild(edge)\n        }\n        \n        entity.addChild(boundingBox)\n    }\n    \n    \/**\n     Updates the visualization when the tracked object moves.\n     This ensures the bounding box and text maintain accurate positioning\n     relative to the physical object being tracked.\n     *\/\n    func refreshTracking(with anchor: ObjectAnchor) {\n        entity.isEnabled = anchor.isTracked\n        guard anchor.isTracked 
else { return }\n        \n        entity.transform = Transform(matrix: anchor.originFromAnchorTransform)\n        refreshBoundingBoxGeometry(with: anchor)\n    }\n    \n    \/**\n     Updates the bounding box geometry to match the detected object's dimensions.\n     Scales and positions the 12 edge entities so the outline follows the\n     anchor's bounding box extent, centered on the bounding box center.\n     *\/\n    private func refreshBoundingBoxGeometry(with anchor: ObjectAnchor) {\n        let extent = anchor.boundingBox.extent\n        boundingBox.transform.translation = anchor.boundingBox.center\n        \n        for (index, edge) in boundingBox.children.enumerated() {\n            guard let edge = edge as? ModelEntity else { continue }\n            \n            switch index {\n            case 0...3:  \/\/ Horizontal edges along width\n                edge.scale = SIMD3(extent.x, outlineWidth, outlineWidth)\n                edge.position = [\n                    0,\n                    extent.y \/ 2 * (index % 2 == 0 ? -1 : 1),\n                    extent.z \/ 2 * (index &lt; 2 ? -1 : 1)\n                ]\n            case 4...7:  \/\/ Vertical edges along height\n                edge.scale = SIMD3(outlineWidth, extent.y, outlineWidth)\n                edge.position = [\n                    extent.x \/ 2 * (index % 2 == 0 ? -1 : 1),\n                    0,\n                    extent.z \/ 2 * (index &lt; 6 ? -1 : 1)\n                ]\n            case 8...11: \/\/ Depth edges\n                edge.scale = SIMD3(outlineWidth, outlineWidth, extent.z)\n                edge.position = [\n                    extent.x \/ 2 * (index % 2 == 0 ? -1 : 1),\n                    extent.y \/ 2 * (index &lt; 10 ? -1 : 1),\n                    0\n                ]\n            default:\n                break\n            }\n        }\n    }\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"3a14\">The bounding box creation is a key aspect of our visualization. 
Rather than rendering a single solid box, we build a wireframe outline from 12 thin edges, each one a unit cube scaled along a single axis. This keeps the object itself visible and gives us direct control over line thickness and color. Each edge is positioned with a SIMD3 vector; for the four depth edges (indices 8 through 11), the position is computed like this:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">edge.position = [\n    extent.x \/ 2 * (index % 2 == 0 ? -1 : 1),\n    extent.y \/ 2 * (index &lt; 10 ? -1 : 1),\n    0\n]<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"a9d2\">Each offset is half of the object\u2019s extent (width, height, or depth) along one axis, with the sign chosen from the edge index. The four edges in each group therefore land on the four corners of the box cross-section, arranged symmetrically around its center point.<\/p>\n<p class=\"wp-block-paragraph\" id=\"9cab\">This visualization system works in conjunction with our\u00a0<strong>ImmersiveView<\/strong>\u00a0to create real-time visual feedback. As the ImmersiveView receives position updates from ARKit, it calls refreshTracking on our visualization, which updates the entity\u2019s transform and the bounding box geometry so the virtual overlays stay aligned with the physical object.<\/p>\n<h2 class=\"wp-block-heading\" id=\"f916\">Informative View<\/h2>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"960\" height=\"720\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_XNMziEMBPrAuffnRTidIoQ.gif?resize=960%2C720&#038;ssl=1\" alt=\"\" class=\"wp-image-598023\"><figcaption class=\"wp-element-caption\">ContentView With Instructions \u2014 Image By Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"a914\"><strong>ContentView.swift<\/strong>, provided in our project template, handles the informational interface for our app. 
Here\u2019s the implementation:<\/p>\n<pre class=\"wp-block-prismatic-blocks\"><code class=\"language-swift\">import SwiftUI\nimport RealityKit\nimport RealityKitContent\n\n\/**\n ContentView provides the main window interface for the application.\n Displays a rotating 3D model of the target object (Diet Soda can)\n along with clear instructions for users on how to use the detection feature.\n *\/\nstruct ContentView: View {\n    \/\/ State to control the continuous rotation animation\n    @State private var rotation: Double = 0\n    \n    var body: some View {\n        VStack(spacing: 30) {\n            \/\/ 3D model display with rotation animation\n            Model3D(named: \"SodaModel\", bundle: realityKitContentBundle)\n                .padding(.vertical, 20)\n                .frame(width: 200, height: 200)\n                .rotation3DEffect(\n                    .degrees(rotation),\n                    axis: (x: 0, y: 1, z: 0)\n                )\n                .onAppear {\n                    \/\/ Create continuous rotation animation\n                    withAnimation(.linear(duration: 5.0).repeatForever(autoreverses: true)) {\n                        rotation = 180\n                    }\n                }\n            \n            \/\/ Instructions for users\n            VStack(spacing: 15) {\n                Text(\"Diet Soda Detection\")\n                    .font(.title)\n                    .fontWeight(.bold)\n                \n                Text(\"Hold your diet soda can in front of you to see it automatically detected and highlighted in your space.\")\n                    .font(.body)\n                    .multilineTextAlignment(.center)\n                    .foregroundColor(.secondary)\n                    .padding(.horizontal)\n            }\n        }\n        .padding()\n        .frame(maxWidth: 400)\n    }\n}<\/code><\/pre>\n<p class=\"wp-block-paragraph\" id=\"2b67\">This implementation displays our 3D-scanned soda model (SodaModel.usdz) 
with a rotating animation, giving users a clear reference for what the system is looking for. The rotation helps users understand how to present the object for detection.<\/p>\n<p class=\"wp-block-paragraph\" id=\"13c1\">With these components in place, our application now provides a complete object detection experience. The system uses our trained model to recognize diet soda cans, creates precise visual indicators in real time, and provides clear user guidance through the informational interface.<\/p>\n<h2 class=\"wp-block-heading\" id=\"75d3\">Conclusion<\/h2>\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" width=\"800\" height=\"450\" src=\"https:\/\/i0.wp.com\/towardsdatascience.com\/wp-content\/uploads\/2025\/02\/1_vavhs822x6lPZSIdmRgtbw.gif?resize=800%2C450&#038;ssl=1\" alt=\"\" class=\"wp-image-598024\"><figcaption class=\"wp-element-caption\">Our Final App \u2014 Image By Author<\/figcaption><\/figure>\n<p class=\"wp-block-paragraph\" id=\"db1b\">In this tutorial, we\u2019ve built a complete object detection system for visionOS that integrates several powerful technologies. Starting from 3D object capture, through ML model training in Create ML, to real-time detection using ARKit and RealityKit, we\u2019ve created an app that detects and tracks objects in the user\u2019s space.<\/p>\n<p class=\"wp-block-paragraph\" id=\"490a\">This implementation represents just the beginning of what\u2019s possible with on-device machine learning in spatial computing. As hardware continues to evolve, with more powerful Neural Engines and dedicated ML accelerators, and as frameworks like Core ML mature, we\u2019ll see increasingly sophisticated applications that can understand and interact with our physical world in real time. 
The combination of spatial computing and on-device ML opens up possibilities for applications ranging from advanced AR experiences to intelligent environmental understanding, all while maintaining user privacy and low latency.<\/p>\n<p>The post <a href=\"https:\/\/towardsdatascience.com\/on-device-machine-learning-in-spatial-computing\/\">On-Device Machine Learning in Spatial Computing<\/a> appeared first on <a href=\"https:\/\/towardsdatascience.com\/\">Towards Data Science<\/a>.<\/p>\n<\/div>\n<p> \t<BR><br \/>\n <BR><\/BR><br \/>\n    Prithiv Dev Devendran<br \/>\n \t<BR><br \/>\n<BR><\/BR><br \/>\n<a href=\"https:\/\/towardsdatascience.com\/on-device-machine-learning-in-spatial-computing\/\">Go to original source<\/a><br \/>\n \t<BR><br \/>\n <BR><\/BR><\/p>\n","protected":false},"excerpt":{"rendered":"<p>On-Device Machine Learning in Spatial Computing The landscape of computing is undergoing a profound transformation with the emergence of spatial computing platforms(VR and AR). 
As we step into this new era, the intersection of virtual reality, Augmented Reality, and on-device machine learning presents unprecedented opportunities for developers to create experiences that seamlessly blend digital content [&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1766,62,1767,221,166,70,1768],"tags":[1769,199,341],"class_list":["post-1901","post","type-post","status-publish","format-standard","hentry","category-3d-object-detection","category-aimldsaimlds","category-augmented-reality","category-computer-vision","category-hands-on-tutorials","category-machine-learning","category-spatial-computing","tag-device","tag-learning","tag-machine"],"_links":{"self":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1901"}],"collection":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/comments?post=1901"}],"version-history":[{"count":0,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/posts\/1901\/revisions"}],"wp:attachment":[{"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/media?parent=1901"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/categories?post=1901"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mailitics.com\/index.php\/wp-json\/wp\/v2\/tags?post=1901"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}