Tapping here means the user looks at the object with their eyes and pinches their fingers to select it.
If you want the user to physically touch the object with their hands, you need collision detection and you also need to render the user's hands as entities. That is not covered in this article!
Then, for the 3D RealityView, attach a tap gesture recognizer (yes, just as you detect a tap gesture on 2D views, you can attach one to the RealityKit 3D view):
.gesture(TapGesture()
.targetedToAnyEntity()
.onEnded({ tap in
let tappedNode = tap.entity
}))
Now, our 3D immersive space will contain the following code:
import SwiftUI
import RealityKit
struct ContentSpace: View {
@State private var loaded3DAsset: Entity? = nil
var body: some View {
RealityView { content in
loaded3DAsset = await loadFromRealityComposerProject(nodeName: "robot_walk_idle",
sceneFileName: "robot_walk_idle.usdz")
loaded3DAsset?.name = "robot"
loaded3DAsset?.scale = .init(x: 0.1, y: 0.1, z: 0.1)
loaded3DAsset?.position = .init(x: 0, y: 0, z: -3)
loaded3DAsset?.components[InputTargetComponent.self] = InputTargetComponent(allowedInputTypes: .all)
guard let loaded3DAsset else {
return
}
content.add(loaded3DAsset)
}
.gesture(TapGesture()
.targetedToAnyEntity()
.onEnded({ tap in
let tappedNode = tap.entity
}))
}
}
Now, since one node can contain many child nodes, the user may tap a child node instead of the robot node itself, so we need to walk up the hierarchy until we reach the node we are looking for.
For example, when the user taps the robot we added to the 3D space, they could be tapping the robot's head, which is its own child node. We can use the following helper functions to find a node by name.
/// Checks whether a collision event contains one of a list of named entities.
func eventHasTargets(event: CollisionEvents.Began, matching names: [String]) -> Entity? {
for targetName in names {
if let target = eventHasTarget(event: event, matching: targetName) {
return target
}
}
return nil
}
/// Checks whether a collision event contains an entity that matches the name you supply.
func eventHasTarget(event: CollisionEvents.Began, matching targetName: String) -> Entity? {
let aParentBeam = event.entityA[parentMatching: targetName]
let aChildBeam = event.entityA[descendentMatching: targetName]
let bParentBeam = event.entityB[parentMatching: targetName]
let bChildBeam = event.entityB[descendentMatching: targetName]
if aParentBeam == nil && aChildBeam == nil && bParentBeam == nil && bChildBeam == nil {
return nil
}
var beam: Entity?
if aParentBeam != nil || aChildBeam != nil {
beam = (aParentBeam == nil) ? aChildBeam : aParentBeam
} else if bParentBeam != nil || bChildBeam != nil {
beam = (bParentBeam == nil) ? bChildBeam : bParentBeam
}
return beam
}
func findEntity(within: Entity, matching targetName: String) -> Entity? {
let aParentBeam = within[parentMatching: targetName]
let aChildBeam = within[descendentMatching: targetName]
if aParentBeam == nil && aChildBeam == nil {
return nil
}
var beam: Entity?
if aParentBeam != nil || aChildBeam != nil {
beam = (aParentBeam == nil) ? aChildBeam : aParentBeam
}
return beam
}
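These helper functions rely on the parentMatching and descendentMatching subscripts, which are not part of RealityKit itself; they come from Apple's sample code. If your project does not already include them, here is a minimal sketch of Entity extensions that provide the same behavior (this version matches by substring):
import RealityKit
extension Entity {
    /// Walks up the hierarchy and returns the first ancestor (or self) whose name contains the target name.
    subscript(parentMatching targetName: String) -> Entity? {
        if name.contains(targetName) { return self }
        return parent?[parentMatching: targetName]
    }
    /// Walks down the hierarchy and returns the first descendant (or self) whose name contains the target name.
    subscript(descendentMatching targetName: String) -> Entity? {
        if name.contains(targetName) { return self }
        for child in children {
            if let match = child[descendentMatching: targetName] {
                return match
            }
        }
        return nil
    }
}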
Remember that we named the robot entity "robot"? We can check for that name:
.gesture(TapGesture()
.targetedToAnyEntity()
.onEnded({ tap in
let tappedNode = tap.entity
if let tappedRobotEntity = findEntity(within: tappedNode, matching: "robot") {
// tapped the robot
// TODO
}
}))
If you follow along with this article, you will learn to make a mini game that shows a robot at a random location in the room and lets you tap the robot to hide it.
This article covers:
Adding a VisionOS target to your existing app
Toggling the transparency of the view background
Conditionally running code only in the VisionOS app
Loading a 3D asset (a .usdz file) from an asset bundle
Opening an immersive space to display the 3D asset alongside a 2D view
Opening a new window that displays the 3D asset, with hand controls to rotate and zoom the model
Changing the position and scale of the 3D asset
Applying animation to the 3D asset
Closing the immersive space
Selecting an object by looking at it and pinching
I have also recorded this article as a YouTube video (in English).
GitHub repository:
About my VisionOS journey
I have independently developed and published three VisionOS apps and games so far; one was featured in an App Store home page story and one appeared on a WWDC 2024 keynote slide.
Spatial Dream
A tiny puzzle game in your room. Featured in WWDC 2024 Keynote slide.
SoraSNS for Mastodon, Misskey, Bluesky
A Fediverse SNS client. Featured in an App Store home page story.
Spatial Boxer
Boxing in space!
Of course, I am working on some more exciting VisionOS apps.
Now, I will publish many articles on developing VisionOS apps, so you can make something cool too.
Writing an article like this takes me days to prepare. Please consider following me on Twitter (is that still a thing?) for emotional support, and buying me a coffee for more tangible support.
Let’s get started!
Starting point
We will start from a regular SwiftUI application for iOS:
You will have the file structure shown here and a basic SwiftUI view with the text Hello World.
Now, switch to the project settings for the main target. You might see that the app already has Apple Vision support, but it says Designed for iPad.
This means the app will run on VisionOS exactly as it does on iPad, without support for 3D spatial assets and with an ugly background.
For example, we can run the app in the simulator now by switching the run destination to Apple Vision Pro and tapping the run button:
You can see the view has a solid, non-transparent white background, and the user cannot resize the window at all.
Now, in the list of supported run destinations, remove Apple Vision Pro (Designed for iPad) and add Apple Vision.
Run the app in the simulator again, and you will see that it is now a native Apple Vision Pro app.
💡
If you apply the above changes to your existing iOS app, you might encounter compilation warnings or errors. For example, the third-party packages and libraries you use might not be compatible with VisionOS, or some of your code might not be available on VisionOS. You will need a conditional compilation check to run a separate branch of code on VisionOS, which I will talk about later in this article.
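Here is a minimal sketch of such a check (the constant name is just an illustration):
// Compile a different branch of code depending on the platform.
#if os(visionOS)
let supportsImmersiveSpace = true   // VisionOS-only code path
#else
let supportsImmersiveSpace = false  // iOS and other platforms
#endif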
Control the background glassmorphism effect
By default, a glassmorphism background is applied (it looks like frosted glass, so you can see what is behind the window). We can control that.
First, since our application window displays 2D contents only, we can set the windowStyle to .plain in the main SwiftUI app:
import SwiftUI
@main
struct VisionOSDemoApp: App {
var body: some Scene {
WindowGroup {
ContentView()
}
.windowStyle(.plain)
}
}
Then, in our ContentView, we can set to not show the background glass effect:
struct ContentView: View {
var body: some View {
VStack {
Image(systemName: "swift")
.resizable()
.scaledToFit()
.frame(width: 120, height: 120)
.foregroundStyle(.tint)
Text("I love SwiftUI! From MszPro.")
.font(.largeTitle)
.bold()
}
.padding()
.glassBackgroundEffect(displayMode: .never)
}
}
Now, you will see the following:
You can also control whether the background glass effect is shown dynamically. For example, here we can use a @State variable and a toggle:
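A minimal sketch (the showGlassBackground state variable matches the full ContentView shown later in this article):
struct ContentView: View {
    @State private var showGlassBackground: Bool = true
    var body: some View {
        VStack {
            Text("I love SwiftUI! From MszPro.")
                .font(.largeTitle)
            // Let the user switch the glass background on and off at runtime.
            Toggle("Show glass background", isOn: $showGlassBackground)
        }
        .padding()
        .glassBackgroundEffect(displayMode: showGlassBackground ? .always : .never)
    }
}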
Because we will also talk about animation, pick a model that includes an animation, for example the robot (use the robot if you want to follow along with this tutorial). Right-click and download the .usdz file.
Drag the downloaded file into the Reality Composer Pro project browser, and you will see the animated model playing in the right panel:
Now, in your project in Xcode, import this Reality Composer Pro project like you are importing a Swift Package:
Function to load asset from the bundle
Now, we need to write a function to load a specific asset from the Reality Composer Pro bundle:
#if os(visionOS)
import Foundation
import RealityKit
import DemoAssets
@MainActor
func loadFromRealityComposerProject(nodeName: String, sceneFileName: String) async -> Entity? {
var entity: Entity? = nil
do {
let scene = try await Entity(named: sceneFileName,
in: demoAssetsBundle)
entity = scene.findEntity(named: nodeName)
} catch {
print("Failed to load asset named \(nodeName) within the file \(sceneFileName)")
}
return entity
}
#endif
Notice that we used an #if os(visionOS) check, so this code only compiles when building for VisionOS.
The function takes the name of the node you want and the name of the scene file.
The node name lets you pick one specific entity out of the scene (you could also make the parameter optional and fall back to the scene's root entity when it is nil). If you are not sure what the nodes are called, you can easily list all child nodes, as shown in the sketch below.
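A small sketch of such a listing (printEntityTree is a hypothetical helper, not part of the article's project):
// Recursively print an entity hierarchy so you can find the node names to load.
func printEntityTree(_ entity: Entity, indent: String = "") {
    print("\(indent)\(entity.name)")
    for child in entity.children {
        printEntityTree(child, indent: indent + "  ")
    }
}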
Alternatively, open the .usdz file, export it to SceneKit, and view the list of nodes included in the file:
In the above case, robot_walk_idle is the name of the root (top level) entity.
This is useful if you have many assets within a single .usdz file and only want to load one of them.
💡
Think of .usdz as a read-only format delivered to the end user, and .scn as the editor format used by developers and 3D designers to modify the content of the 3D file. Note that .usdz can be converted to .scn at any time, and it is not encrypted.
Creating an immersive space
To display content in the user's room, you need to create an immersive space. Note that this is different from displaying 3D content inside the application window itself; we will go through both approaches in this article.
In short, an immersive space places entities directly in the user's room and only one can be open at a time, while a window (including a volumetric window) is a separate container that the user can move around like any other app window.
Create a new file called ContentSpace.swift. In this file, we will use a RealityView, load the 3D asset, set it up, and add it to the view:
#if os(visionOS)
import SwiftUI
import RealityKit
struct ContentSpace: View {
@State private var loaded3DAsset: Entity? = nil
var body: some View {
RealityView { content in
loaded3DAsset = await loadFromRealityComposerProject(nodeName: "robot_walk_idle",
sceneFileName: "robot_walk_idle.usdz")
loaded3DAsset?.scale = .init(x: 0.1, y: 0.1, z: 0.1)
loaded3DAsset?.position = .init(x: 0, y: 0, z: -3)
// TODO: Collision
// TODO: Allow user to tap on it
// TODO: add lighting
guard let loaded3DAsset else {
return
}
content.add(loaded3DAsset)
}
}
}
💡
Remember to set a proper scale for the object. Check the size of the object first (shown in meters) and calculate the scale from that. If you cannot see the object, it is usually because it is far too big or far too small.
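One way to check the size at runtime is to print the entity's bounding box, which RealityKit reports in meters (a small sketch):
if let loaded3DAsset {
    // Extents are in meters; use them to pick a sensible scale factor.
    let extents = loaded3DAsset.visualBounds(relativeTo: nil).extents
    print("Asset size in meters: \(extents.x) x \(extents.y) x \(extents.z)")
}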
Now, go to the main app code, and we can declare the above view as an immersive space. We also provide an id to the space, so we can invoke it anywhere in our app. When we invoke it, it will show up in the user’s room.
import SwiftUI
@main
struct VisionOSDemoApp: App {
var body: some Scene {
WindowGroup {
ContentView()
}
.windowStyle(.plain)
ImmersiveSpace(id: "robotSpace") {
ContentSpace()
}
}
}
To open any immersive space, or to dismiss the currently shown immersive space, we will use a SwiftUI environment variable:
@Environment(\.openImmersiveSpace) private var openImmersiveSpace
@Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
Notice that you can only show one immersive space at a time. To show a new one, make sure you dismiss the current one (if there is any).
Now, we can add a button to show the above robotSpace in our 2D view:
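A minimal sketch, using the environment values declared above (the full ContentView appears later in this article):
Button("Open immersive space") {
    Task { @MainActor in
        // Dismiss any currently open space first, then open ours.
        await dismissImmersiveSpace()
        await openImmersiveSpace(id: "robotSpace")
    }
}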
If the robot is too big, you can adjust the scale factor.
Explanation about the coordinate space
You may have noticed that we used loaded3DAsset?.position = .init(x: 0, y: 0, z: -3) to set the initial position of the 3D asset. Positions are expressed in meters.
The X axis runs left and right.
The Y axis runs up and down (toward the ceiling or the floor).
The Z axis runs toward and away from the user: a negative value is in front of the user, a positive value is behind the user.
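For example (a sketch; the exact values are arbitrary):
// Two meters straight in front of the user.
loaded3DAsset?.position = .init(x: 0, y: 0, z: -2)
// One meter to the user's left, one meter up, two meters in front.
loaded3DAsset?.position = .init(x: -1, y: 1, z: -2)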
Adding animation
If your 3D model comes with an animation, you can play that animation:
RealityView { content in
// ...
guard let loaded3DAsset else {
return
}
// animation
if let firstAnimation = loaded3DAsset.availableAnimations.first {
loaded3DAsset.playAnimation(firstAnimation.repeat(),
transitionDuration: 0,
startsPaused: false)
}
// TODO: Collision
// TODO: Allow user to tap on it
// TODO: add lighting
content.add(loaded3DAsset)
}
Now you can see the robot will be walking.
Opening a new window for an interactive robot model
You can also show the robot in a new window (instead of in the user's space) so that the user can pinch to scale the robot, or drag to rotate it and view it from different angles:
First, we need to define a data structure that tells the app which AR model to show:
struct ARModelOpenParameter: Identifiable, Hashable, Codable {
var id: String {
return "\(modelName)-\(modelNodeName)"
}
var modelName: String
var modelNodeName: String
var initialScale: Float
}
To allow the user to drag to rotate the model, we can add the following helper to our code. Note that the code below is taken from a sample project in the Apple Developer documentation.
/*
See the LICENSE.txt file for this sample’s licensing information.
Abstract:
A modifier for turning drag gestures into rotation.
*/
import SwiftUI
import RealityKit
extension View {
/// Enables people to drag an entity to rotate it, with optional limitations
/// on the rotation in yaw and pitch.
func dragRotation(
yawLimit: Angle? = nil,
pitchLimit: Angle? = nil,
sensitivity: Double = 10,
axRotateClockwise: Bool = false,
axRotateCounterClockwise: Bool = false
) -> some View {
self.modifier(
DragRotationModifier(
yawLimit: yawLimit,
pitchLimit: pitchLimit,
sensitivity: sensitivity,
axRotateClockwise: axRotateClockwise,
axRotateCounterClockwise: axRotateCounterClockwise
)
)
}
}
/// A modifier converts drag gestures into entity rotation.
private struct DragRotationModifier: ViewModifier {
var yawLimit: Angle?
var pitchLimit: Angle?
var sensitivity: Double
var axRotateClockwise: Bool
var axRotateCounterClockwise: Bool
@State private var baseYaw: Double = 0
@State private var yaw: Double = 0
@State private var basePitch: Double = 0
@State private var pitch: Double = 0
func body(content: Content) -> some View {
content
.rotation3DEffect(.radians(yaw == 0 ? 0.01 : yaw), axis: .y)
.rotation3DEffect(.radians(pitch == 0 ? 0.01 : pitch), axis: .x)
.gesture(DragGesture(minimumDistance: 0.0)
.targetedToAnyEntity()
.onChanged { value in
// Find the current linear displacement.
let location3D = value.convert(value.location3D, from: .local, to: .scene)
let startLocation3D = value.convert(value.startLocation3D, from: .local, to: .scene)
let delta = location3D - startLocation3D
// Use an interactive spring animation that becomes
// a spring animation when the gesture ends below.
withAnimation(.interactiveSpring) {
yaw = spin(displacement: Double(delta.x), base: baseYaw, limit: yawLimit)
pitch = spin(displacement: Double(delta.y), base: basePitch, limit: pitchLimit)
}
}
.onEnded { value in
// Find the current and predicted final linear displacements.
let location3D = value.convert(value.location3D, from: .local, to: .scene)
let startLocation3D = value.convert(value.startLocation3D, from: .local, to: .scene)
let predictedEndLocation3D = value.convert(value.predictedEndLocation3D, from: .local, to: .scene)
let delta = location3D - startLocation3D
let predictedDelta = predictedEndLocation3D - location3D
// Set the final spin value using a spring animation.
withAnimation(.spring) {
yaw = finalSpin(
displacement: Double(delta.x),
predictedDisplacement: Double(predictedDelta.x),
base: baseYaw,
limit: yawLimit)
pitch = finalSpin(
displacement: Double(delta.y),
predictedDisplacement: Double(predictedDelta.y),
base: basePitch,
limit: pitchLimit)
}
// Store the last value for use by the next gesture.
baseYaw = yaw
basePitch = pitch
}
)
.onChange(of: axRotateClockwise) {
withAnimation(.spring) {
yaw -= (.pi / 6)
baseYaw = yaw
}
}
.onChange(of: axRotateCounterClockwise) {
withAnimation(.spring) {
yaw += (.pi / 6)
baseYaw = yaw
}
}
}
/// Finds the spin for the specified linear displacement, subject to an
/// optional limit.
private func spin(
displacement: Double,
base: Double,
limit: Angle?
) -> Double {
if let limit {
return atan(displacement * sensitivity) * (limit.degrees / 90)
} else {
return base + displacement * sensitivity
}
}
/// Finds the final spin given the current and predicted final linear
/// displacements, or zero when the spin is restricted.
private func finalSpin(
displacement: Double,
predictedDisplacement: Double,
base: Double,
limit: Angle?
) -> Double {
// If there is a spin limit, always return to zero spin at the end.
guard limit == nil else { return 0 }
// Find the projected final linear displacement, capped at 1 more revolution.
let cap = .pi * 2.0 / sensitivity
let delta = displacement + max(-cap, min(cap, predictedDisplacement))
// Find the final spin.
return base + delta * sensitivity
}
}
Then, update your main SwiftUI app code:
@main
struct VisionOSDemoApp: App {
@Environment(\.dismissWindow) private var dismissWindow
var body: some SwiftUI.Scene {
WindowGroup {
ContentView()
}
.windowStyle(.plain)
ImmersiveSpace(id: "robotSpace") {
ContentSpace()
}
WindowGroup(for: ARModelOpenParameter.self) { $object in
// 3D view
if let object {
RealityView { content in
guard let arAsset = await loadFromRealityComposerProject(
nodeName: object.modelNodeName,
sceneFileName: object.modelName
) else {
fatalError("Unable to load beam from Reality Composer Pro project.")
}
arAsset.generateCollisionShapes(recursive: true)
arAsset.position = .init(x: 0, y: 0, z: 0)
arAsset.scale = .init(x: object.initialScale,
y: object.initialScale,
z: object.initialScale)
arAsset.components[InputTargetComponent.self] = InputTargetComponent(allowedInputTypes: .all)
content.add(arAsset)
}
.dragRotation()
.frame(width: 900, height: 900)
.glassBackgroundEffect(displayMode: .always)
}
}
.windowStyle(.volumetric)
.defaultSize(width: 0.5, height: 0.5, depth: 0.5, in: .meters)
}
}
There are several changes:
The type of the app's body changes from Scene to SwiftUI.Scene, because RealityKit also defines a type named Scene and the compiler needs to know which one we mean.
A new WindowGroup is added, which opens based on the data carried by an ARModelOpenParameter value. The new window shows the AR object and lets the user rotate it.
To show that window, use the openWindow environment value and construct an ARModelOpenParameter:
@Environment(\.openWindow) private var openWindow
Button("Inspect robot in a window") {
self.openWindow(value: ARModelOpenParameter(modelName: "robot_walk_idle.usdz", modelNodeName: "robot_walk_idle", initialScale: 0.01))
}
Now, you can tap this button to show a separate window, which contains a rotatable 3D model for inspection:
Use system 3D viewer
The method above lets you design the model preview window yourself. You can also use the model viewer provided by the VisionOS system directly:
Model3D(named: "Robot-Drummer") { model in
model
.resizable()
.aspectRatio(contentMode: .fit)
} placeholder: {
ProgressView()
}
You can also use a model within the package you created:
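For example, a sketch that loads the robot from the Reality Composer Pro package we imported earlier (this assumes the same demoAssetsBundle used in the loading function above):
Model3D(named: "robot_walk_idle", bundle: demoAssetsBundle) { model in
    model
        .resizable()
        .aspectRatio(contentMode: .fit)
} placeholder: {
    ProgressView()
}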
Allow using pinch and eye to select object
In VisionOS, the user can look at an object with their eyes and pinch their fingers to select it. We can enable this for the 3D objects we added to the scene.
💡
Note: This is different from using hands to touch the 3D model, which is not covered in this article because of the complexity of hand tracking; I will cover that in another article.
You can simply add the following to the 3D entity you loaded to allow the user to tap on it (an InputTargetComponent, plus collision shapes so the entity can actually receive the gesture):
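// Generate collision shapes so the entity can receive input events,
// then mark it as an input target so gaze-and-pinch selection works on it.
loaded3DAsset?.generateCollisionShapes(recursive: true)
loaded3DAsset?.components[InputTargetComponent.self] = InputTargetComponent(allowedInputTypes: .all)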
To receive tap events, add the following gesture to your 3D space view (the one that contains the RealityView). tap.entity gives you the node the user tapped, but keep in mind that this might be a child node of the node you added.
.gesture(TapGesture()
.targetedToAnyEntity()
.onEnded({ tap in
let tappedNode = tap.entity
}))
So remember to walk up through the parent nodes to check whether the user tapped the node you are looking for. You can use the node's name as an identifier.
Whac-A-Robot! (because I could not find a free 3D model of a mole)
[Video or GIF demo here]
In the previous step, we stored the loaded 3D model inside the ContentSpace view. To be more flexible and to access that asset anywhere in the app, we can move it into a model object.
The model has a loadAssets function that loads the asset into memory; this function is called before the immersive space appears. It also has an addRobotAtRandomPosition function, which the game uses to add a robot at a random location.
We need to load the asset before the immersive view appears, but at that point there is no world root node (the root of the room environment) to attach the asset to. So we create a global entity, contentSpaceOrigin, attach all loaded assets to it, and when the immersive space appears we simply add contentSpaceOrigin to the scene content.
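The article does not include the model's source here; below is a minimal sketch of what it could look like, assuming an @Observable class. The robotTemplate property, the 0.1 scale, and the random position ranges are my own illustration:
import SwiftUI
import RealityKit
import Observation

// Global root entity; everything we load is parented to this, and the
// immersive space adds it to the scene content when it opens.
let contentSpaceOrigin = Entity()

@Observable
class ContentModel {
    private var robotTemplate: Entity? = nil

    // Load the robot asset into memory before the immersive space opens.
    @MainActor
    func loadAssets() async {
        robotTemplate = await loadFromRealityComposerProject(nodeName: "robot_walk_idle",
                                                             sceneFileName: "robot_walk_idle.usdz")
    }

    // Clone the template, give it the name the tap handler looks for,
    // and place it at a random position in the room.
    @MainActor
    func addRobotAtRandomPosition() {
        guard let robotTemplate else { return }
        let robot = robotTemplate.clone(recursive: true)
        robot.name = "robot_root_node"
        robot.scale = .init(repeating: 0.1)
        robot.position = .init(x: Float.random(in: -2...2),
                               y: Float.random(in: 0.5...1.5),
                               z: Float.random(in: -3 ... -1))
        robot.generateCollisionShapes(recursive: true)
        robot.components[InputTargetComponent.self] = InputTargetComponent(allowedInputTypes: .all)
        contentSpaceOrigin.addChild(robot)
    }
}
You would also need to inject an instance into the SwiftUI environment (for example with .environment(ContentModel()) on your scenes) so that the @Environment(ContentModel.self) lookups in the views below can find it. With that in place, the updated ContentSpace only needs to add contentSpaceOrigin to its content and handle taps: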
import SwiftUI
import RealityKit
struct ContentSpace: View {
@Environment(ContentModel.self) var gameModel
var body: some View {
RealityView { content in
content.add(contentSpaceOrigin)
}
.gesture(TapGesture()
.targetedToAnyEntity()
.onEnded({ tap in
let tappedNode = tap.entity
// walk up the hierarchy until we reach the robot's main node
var foundRobotMainNode: Entity? = tappedNode
while let currentNode = foundRobotMainNode, currentNode.name != "robot_root_node" {
    foundRobotMainNode = currentNode.parent // keep climbing; becomes nil if no robot node is found
}
// removeFromParent() does nothing when foundRobotMainNode is nil
foundRobotMainNode?.removeFromParent()
speak(text: "まだね") // Japanese, roughly "not yet!"
}))
}
}
Notice that in the code above we attach a tap gesture; whenever the user taps a robot, the app removes that robot from the scene.
Now, in our 2D view, we add a button that places a random robot in the room.
import SwiftUI
import RealityKit
struct ContentView: View {
@State private var showGlassBackground: Bool = true
@Environment(ContentModel.self) var gameModel
@Environment(\.openImmersiveSpace) private var openImmersiveSpace
@Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
@Environment(\.openWindow) private var openWindow
var body: some View {
VStack {
Image(systemName: "swift")
.resizable()
.scaledToFit()
.frame(width: 120, height: 120)
.foregroundStyle(.tint)
Text("I love SwiftUI! From MszPro.")
.font(.largeTitle)
.bold()
Toggle("Show glass background", isOn: $showGlassBackground)
Button("Open immersive space") {
Task { @MainActor in
await self.dismissImmersiveSpace()
await self.gameModel.loadAssets()
await self.openImmersiveSpace(id: "robotSpace")
}
}
Button("Add random robot in room") {
self.gameModel.addRobotAtRandomPosition()
speak(text: "ハロー") // Japanese for "hello"
}
Button("Hide robot") {
Task { @MainActor in
await self.dismissImmersiveSpace()
}
}
Button("Inspect robot in a window") {
self.openWindow(value: ARModelOpenParameter(modelName: "robot_walk_idle.usdz", modelNodeName: "robot_walk_idle", initialScale: 0.01))
}
}
.padding()
.frame(width: 700)
.glassBackgroundEffect(displayMode: showGlassBackground ? .always : .never)
}
}
And here is the helper function to speak something:
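The helper is not shown in this section; here is a minimal sketch using AVSpeechSynthesizer, assuming a Japanese system voice since the sample strings are Japanese:
import AVFoundation

// Keep a single synthesizer alive so speech is not cut off mid-utterance.
private let speechSynthesizer = AVSpeechSynthesizer()

/// Speaks the given text out loud using text-to-speech.
func speak(text: String) {
    let utterance = AVSpeechUtterance(string: text)
    utterance.voice = AVSpeechSynthesisVoice(language: "ja-JP")
    speechSynthesizer.speak(utterance)
}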
Follow me on Twitter to get updates: https://twitter.com/mszpro
Apple code license
Copyright (C) 2024 Apple Inc. All Rights Reserved.
IMPORTANT: This Apple software is supplied to you by Apple
Inc. ("Apple") in consideration of your agreement to the following
terms, and your use, installation, modification or redistribution of
this Apple software constitutes acceptance of these terms. If you do
not agree with these terms, please do not use, install, modify or
redistribute this Apple software.
In consideration of your agreement to abide by the following terms, and
subject to these terms, Apple grants you a personal, non-exclusive
license, under Apple's copyrights in this original Apple software (the
"Apple Software"), to use, reproduce, modify and redistribute the Apple
Software, with or without modifications, in source and/or binary forms;
provided that if you redistribute the Apple Software in its entirety and
without modifications, you must retain this notice and the following
text and disclaimers in all such redistributions of the Apple Software.
Neither the name, trademarks, service marks or logos of Apple Inc. may
be used to endorse or promote products derived from the Apple Software
without specific prior written permission from Apple. Except as
expressly stated in this notice, no other rights or licenses, express or
implied, are granted by Apple herein, including but not limited to any
patent rights that may be infringed by your derivative works or by other
works in which the Apple Software may be incorporated.
The Apple Software is provided by Apple on an "AS IS" basis. APPLE
MAKES NO WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION
THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE, REGARDING THE APPLE SOFTWARE OR ITS USE AND
OPERATION ALONE OR IN COMBINATION WITH YOUR PRODUCTS.
IN NO EVENT SHALL APPLE BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL
OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) ARISING IN ANY WAY OUT OF THE USE, REPRODUCTION,
MODIFICATION AND/OR DISTRIBUTION OF THE APPLE SOFTWARE, HOWEVER CAUSED
AND WHETHER UNDER THEORY OF CONTRACT, TORT (INCLUDING NEGLIGENCE),
STRICT LIABILITY OR OTHERWISE, EVEN IF APPLE HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.