Build a native visionOS app with immersive spaces and tappable 3D objects

This article follows the core tutorial path: turn a plain SwiftUI app into a native visionOS app, control the glass background, package USDZ assets, place them in an immersive space, open a separate volumetric inspection window, and finish with a simple Whac-A-Robot interaction loop.

Figure: visionOS app demo showing a 2D window, immersive 3D content, and model inspection workflows

The first real step is to stop thinking of visionOS as just another iPad destination.

This article starts from an ordinary SwiftUI app and then adds the pieces that make it feel native on Vision Pro: a real Apple Vision target, window styling, Reality Composer Pro assets, an immersive space, a second 3D inspection window, and entity-targeted gestures.

The end result is more than a static demo: it becomes a small room-scale mini-game in which you spawn robots in the room around you and remove them by selecting the correct entity.

Project Links: Sample repository: mszpro/VisionOS101. English video walkthrough: YouTube demo.

Switch from "Designed for iPad" compatibility mode to a real Apple Vision target first.

A SwiftUI app can appear to run on Vision Pro while still behaving like an iPad app in a floating shell. That mode is fine for quick testing, but it does not give you the proper window behavior or the spatial workflow needed for immersive content.

The practical change is to remove the compatibility destination and add the native Apple Vision target. If part of your codebase or package graph is not ready for visionOS yet, fence the new code with platform checks instead of forking the whole app.

Conditional Compilation: Use #if os(visionOS) around visionOS-only code paths when third-party packages or older APIs are not available on the headset.

Figure: Native visionOS window appearance after switching to an Apple Vision target
Once the app is running as a native visionOS target, the window treatment and behavior match the platform much more closely.
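
The platform check described above can be sketched as follows. This is a minimal illustration, not code from the sample project: the function name compiledPlatformName is hypothetical, and the point is only that visionOS-only imports and code paths sit behind #if os(visionOS) so the same file still builds for other destinations.

```swift
#if os(visionOS)
// Only imported when building the native Apple Vision target.
import RealityKit
#endif

/// Hypothetical helper: reports which branch was compiled in.
/// Real apps would put visionOS-only views or asset loading in the first branch.
func compiledPlatformName() -> String {
    #if os(visionOS)
    return "visionOS"
    #elseif os(iOS)
    return "iOS"
    #else
    return "other"
    #endif
}

print("Compiled for: \(compiledPlatformName())")
```

Because the check happens at compile time, code in the inactive branches is never built for that destination, which is what lets a package without visionOS support stay in the project.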

Control the glass effect instead of accepting the default window treatment.

visionOS applies a glass-like background by default. For a purely 2D control window, that can be useful, but it can also work against the interface you want. The two snippets below show a static version, which removes the window's default glass, and a dynamic version, which toggles the effect on the content from a state variable.

WindowGroup {
    ContentView()
}
.windowStyle(.plain)

@State private var showGlassBackground = true

VStack {
    // your 2D controls
}
.padding()
.frame(width: 700)
.glassBackgroundEffect(
    displayMode: showGlassBackground ? .always : .never
)

That combination gives you two levers: .windowStyle(.plain) for the outer window presentation and .glassBackgroundEffect(displayMode:) for the content itself.

Figure: visionOS window showing the effect of disabling the glass background
Once the background effect is under your control, the 2D window can behave more like a deliberate control surface.

Reality Composer Pro gives you a package you can ship and load like code.

The tutorial uses a robot USDZ asset and places it inside a Reality Composer Pro project. One useful detail here is that the Reality Composer Pro project is really a Swift package, so it can be imported into Xcode as a local dependency and treated like a normal bundle source.

Figure: Creating a new Reality Composer Pro project for visionOS assets
Create the Reality Composer Pro project beside the app so it can be added back into Xcode cleanly.

Figure: Reality Composer Pro project folder structure showing it as a Swift package
The package structure matters because the app later imports the generated asset bundle directly.

Figure: Animated robot asset shown inside Reality Composer Pro
Choose a USDZ model with animation if you want to test more than static placement.

#if os(visionOS)
import RealityKit
import DemoAssets

@MainActor
func loadFromRealityComposerProject(
    nodeName: String,
    sceneFileName: String
) async -> Entity? {
    do {
        let scene = try await Entity(
            named: sceneFileName,
            in: demoAssetsBundle
        )
        return scene.findEntity(named: nodeName)
    } catch {
        print("Failed to load \(sceneFileName): \(error)")
        return nil
    }
}
#endif

This is the pattern that unlocks the rest of the demo. Once the asset lives in a bundle and can be found by name, the app can place it in a room, clone it, inspect it in another window, or attach input behavior to it.

Load the entity into a RealityView, scale it, place it, and open that scene from the 2D app window.

The immersive space is where the room-scale part of the demo starts. A dedicated ContentSpace view uses RealityView to load the robot, scale it down, place it in front of the user, and start its first animation if one is available.

ImmersiveSpace(id: "robotSpace") {
    ContentSpace()
}

RealityView { content in
    guard let robot = await loadFromRealityComposerProject(
        nodeName: "robot_walk_idle",
        sceneFileName: "robot_walk_idle.usdz"
    ) else { return }

    robot.scale = .init(x: 0.1, y: 0.1, z: 0.1)
    robot.position = .init(x: 0, y: 0, z: -3)

    if let animation = robot.availableAnimations.first {
        robot.playAnimation(animation.repeat(),
                            transitionDuration: 0,
                            startsPaused: false)
    }

    content.add(robot)
}

The app opens and closes that immersive space through SwiftUI environment values. This article uses a 2D control panel with buttons to open the room, add random robots, close the space, and launch a separate inspection window.
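
The environment values mentioned above are openImmersiveSpace and dismissImmersiveSpace. A minimal sketch of the open and close buttons, assuming the "robotSpace" identifier from the ImmersiveSpace declaration; the view name RobotControlPanel, the button labels, and the isSpaceOpen flag are illustrative and not taken from the sample project:

```swift
#if os(visionOS)
import SwiftUI

struct RobotControlPanel: View {
    // SwiftUI actions for presenting and dismissing an ImmersiveSpace scene.
    @Environment(\.openImmersiveSpace) private var openImmersiveSpace
    @Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace

    // Illustrative flag so each button is only enabled when it makes sense.
    @State private var isSpaceOpen = false

    var body: some View {
        VStack(spacing: 16) {
            Button("Open robot space") {
                Task {
                    // Opening is asynchronous and can fail or be cancelled.
                    switch await openImmersiveSpace(id: "robotSpace") {
                    case .opened:
                        isSpaceOpen = true
                    case .userCancelled, .error:
                        isSpaceOpen = false
                    @unknown default:
                        isSpaceOpen = false
                    }
                }
            }
            .disabled(isSpaceOpen)

            Button("Close robot space") {
                Task {
                    await dismissImmersiveSpace()
                    isSpaceOpen = false
                }
            }
            .disabled(!isSpaceOpen)
        }
        .padding()
    }
}
#endif
```

Both actions are awaited inside a Task because a button action is synchronous while opening and dismissing the space are not.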

Coordinate Reminder: x moves left and right, y moves up and down, and negative z places content in front of the user. If the model seems to be missing, scale is usually the first thing to check.

Figure: Robot model appearing inside an immersive space in front of the user
The immersive space path is what moves the model out of the app window and into the room.

Figure: Animated robot walking inside the immersive space
Once the asset is loaded, starting the first bundled animation is just another RealityKit step.

Use a second volumetric WindowGroup when the user wants to inspect one model up close.

The article does not stop at the immersive room view. It also adds a dedicated model-inspection window so the user can rotate the robot directly, separate from the room-scale scene.

struct ARModelOpenParameter: Identifiable, Hashable, Codable {
    var id: String { "\(modelName)-\(modelNodeName)" }
    var modelName: String
    var modelNodeName: String
    var initialScale: Float
}

WindowGroup(for: ARModelOpenParameter.self) { $object in
    if let object {
        RealityView { content in
            guard let arAsset = await loadFromRealityComposerProject(
                nodeName: object.modelNodeName,
                sceneFileName: object.modelName
            ) else { return }
            arAsset.generateCollisionShapes(recursive: true)
            arAsset.components[InputTargetComponent.self] =
                InputTargetComponent(allowedInputTypes: .all)
            content.add(arAsset)
        }
        .dragRotation()
        .glassBackgroundEffect(displayMode: .always)
    }
}
.windowStyle(.volumetric)
.defaultSize(width: 0.5, height: 0.5, depth: 0.5, in: .meters)

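The button that launches this window is just as important as the window itself: @Environment(\.openWindow) plus openWindow(value: ARModelOpenParameter(...)). Note that .dragRotation() is not a built-in SwiftUI modifier; it is the helper adapted from Apple's documentation samples that is mentioned at the end of this article.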
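
As a rough sketch of that launch path: the struct is repeated from above so the example stands alone, while the view name InspectRobotButton, the button label, and the initialScale value of 0.1 are assumptions made for illustration.

```swift
import Foundation

// Repeated from the article so this sketch is self-contained.
struct ARModelOpenParameter: Identifiable, Hashable, Codable {
    var id: String { "\(modelName)-\(modelNodeName)" }
    var modelName: String
    var modelNodeName: String
    var initialScale: Float
}

#if os(visionOS)
import SwiftUI

// Hypothetical button view; only the openWindow call pattern comes from the article.
struct InspectRobotButton: View {
    @Environment(\.openWindow) private var openWindow

    var body: some View {
        Button("Inspect robot") {
            // The value's type selects the matching WindowGroup(for:) scene.
            openWindow(value: ARModelOpenParameter(
                modelName: "robot_walk_idle.usdz",
                modelNodeName: "robot_walk_idle",
                initialScale: 0.1
            ))
        }
    }
}
#endif
```

The parameter type conforms to Hashable and Codable because WindowGroup(for:) requires both of its presented value.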

Figure: Separate volumetric window used to inspect and rotate a 3D robot model
A dedicated inspection window is useful when room placement and close-up inspection should be separate interactions.

When you need less custom UI, the built-in Model3D path is much lighter.

The custom volumetric window is flexible, but it is not the only option. This article also points out that visionOS can render a model preview directly with Model3D, which is useful when you need a quick viewer instead of a fully custom RealityView-based scene.

Model3D(named: "robot_walk_idle.usdz", bundle: demoAssetsBundle)
    .padding(.bottom, 50)

This is the right tradeoff when you want a built-in preview path and do not need custom gesture handling, scene graph traversal, or extra runtime setup.

To make the robot selectable, give it collisions and an input target, then listen for entity-targeted taps.

In visionOS, the user selects by looking and pinching. That means the entity must opt into input. The article adds collision shapes, installs InputTargetComponent, and then handles a targeted tap gesture from the surrounding RealityView.

loaded3DAsset.generateCollisionShapes(recursive: true)
loaded3DAsset.components[InputTargetComponent.self] =
    InputTargetComponent(allowedInputTypes: .all)

.gesture(
    TapGesture()
        .targetedToAnyEntity()
        .onEnded { tap in
            var node: Entity? = tap.entity
            while node != nil && node?.name != "robot_root_node" {
                node = node?.parent
            }
            node?.removeFromParent()
        }
)

The parent walk matters because the tapped entity may be a child mesh, not the logical root you want to remove. In the finished mini-game, the app clones the robot into random positions, assigns a stable root name, and removes the whole robot when that root is found.
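
The clone-and-name step could look roughly like the sketch below. The function names randomRobotPosition and spawnRobot and the coordinate ranges are assumptions for illustration; only the "robot_root_node" name and the collision and input-target setup come from the snippets above, and the finished project may structure this differently.

```swift
/// Picks a random spot in front of the user.
/// x spreads left/right, y is height, and negative z is forward (ranges are illustrative).
func randomRobotPosition() -> SIMD3<Float> {
    SIMD3<Float>(
        x: Float.random(in: -2...2),
        y: Float.random(in: 0...1.5),
        z: Float.random(in: -4 ... -1)
    )
}

#if os(visionOS)
import RealityKit

/// Clones an already loaded robot and prepares the copy for tap selection.
/// The caller is responsible for adding the returned entity to the scene.
@MainActor
func spawnRobot(from template: Entity) -> Entity {
    let robot = template.clone(recursive: true)
    // Stable root name that the tap handler's parent walk looks for.
    robot.name = "robot_root_node"
    robot.position = randomRobotPosition()
    // The copy needs collision shapes and an input target to receive taps.
    robot.generateCollisionShapes(recursive: true)
    robot.components.set(InputTargetComponent(allowedInputTypes: .all))
    return robot
}
#endif
```

Naming the clone's root rather than its meshes keeps the tap handler independent of how the USDZ file happens to be structured internally.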

Not Hand Tracking: This pattern is about gaze-and-pinch selection on scene entities, not full hand-tracked physics interaction. The article explicitly treats those as different problems.

The finished demo is small, but it covers most of the visionOS basics you need to prototype quickly.

One tutorial ends up touching most of the platform's important seams: platform targeting, window styling, asset packaging, RealityView loading, immersive spaces, volumetric windows, animation playback, and entity-targeted gestures.

This article also includes Apple's sample-code license text because the drag-rotation helper is adapted from Apple's own documentation samples. If you follow the complete project closely, keep that attribution chain intact.

For the full code, use the repository linked above: mszpro/VisionOS101.