<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>AI » MszPro・株式会社Smartソフト</title>
	<atom:link href="https://mszpro.com/category/ai/feed" rel="self" type="application/rss+xml" />
	<link>https://mszpro.com</link>
	<description>iOS visionOS SwiftUI Programming Blog. Dream it, Chase it, Code it.</description>
	<lastBuildDate>Mon, 16 Dec 2024 12:55:07 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.1</generator>

<image>
	<url>https://static-assets.mszpro.com/2024/12/cropped-Unknown-32x32.webp</url>
	<title>AI » MszPro・株式会社Smartソフト</title>
	<link>https://mszpro.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Detect, Extract, and Segment Objects in a Given Image in iOS (ImageAnalysisInteraction, VNGenerateForegroundInstanceMask)</title>
		<link>https://mszpro.com/vision-foreground-instance-mask-request</link>
		
		<dc:creator><![CDATA[msz]]></dc:creator>
		<pubDate>Mon, 16 Dec 2024 07:43:12 +0000</pubDate>
				<category><![CDATA[AI]]></category>
		<category><![CDATA[iOS]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<guid isPermaLink="false">https://mszpro.com/?p=377</guid>

					<description><![CDATA[<p>In the Photos app, you can long-press to extract objects (animals, people) from an image. In this article, we will talk about implementing this same feature using <code>ImageAnalysisInteraction</code>, and we will dig deeper by calling <code>VNGenerateForegroundInstanceMaskRequest</code> to achieve this without using any image views. Who says cats cannot party? (a demo of extracting the foreground [&#8230;]</p>
<p>The post <a href="https://mszpro.com/vision-foreground-instance-mask-request">Detect, Extract, and Segment Objects in a Given Image in iOS (ImageAnalysisInteraction, VNGenerateForegroundInstanceMask)</a> first appeared on <a href="https://mszpro.com">MszPro・株式会社Smartソフト</a>.</p>]]></description>
										<content:encoded><![CDATA[<p>In the Photos app, you can long-press to extract objects (animals, people) from an image. In this article, we will talk about implementing this same feature using <code>ImageAnalysisInteraction</code>, and we will dig deeper by calling <code>VNGenerateForegroundInstanceMaskRequest</code> to achieve this without using any image views.</p>



<h2 class="wp-block-heading">Who says cats cannot party?</h2>



<h3 class="wp-block-heading">(a demo of extracting the foreground cats and replacing the image background)</h3>



<figure class="wp-block-image"><img fetchpriority="high" decoding="async" width="1600" height="2321" src="https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ.jpg" alt="" class="wp-image-384" srcset="https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-207x300.jpg 207w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-706x1024.jpg 706w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-768x1114.jpg 768w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-1059x1536.jpg 1059w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-1412x2048.jpg 1412w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ.jpg 1600w" sizes="(max-width: 1600px) 100vw, 1600px" /></figure>



<p>This article applies to both UIKit and SwiftUI applications. You will learn how to:</p>



<ul class="wp-block-list">
<li>Detect the objects within a given image</li>



<li>Highlight different objects in your code</li>



<li>Get the image of the object</li>
</ul>



<p>As a bonus, I will also show you how to:</p>



<ul class="wp-block-list">
<li>Get the object at tapped position</li>



<li>Replace the image background behind the subjects</li>
</ul>



<p><strong>Notice: The code in this article will not run in the simulator. Use a physical device to test it.</strong></p>



<p><strong>Notice: The SwiftUI code follows the UIKit code.</strong></p>



<p>Let’s get started!</p>



<figure class="wp-block-image"><img decoding="async" width="1600" height="2321" src="https://static-assets.mszpro.com/2024/12/1cb3l9QS_2Z9zxYlUvAHzpQ.jpg" alt="" class="wp-image-381" srcset="https://static-assets.mszpro.com/2024/12/1cb3l9QS_2Z9zxYlUvAHzpQ-207x300.jpg 207w, https://static-assets.mszpro.com/2024/12/1cb3l9QS_2Z9zxYlUvAHzpQ-706x1024.jpg 706w, https://static-assets.mszpro.com/2024/12/1cb3l9QS_2Z9zxYlUvAHzpQ-768x1114.jpg 768w, https://static-assets.mszpro.com/2024/12/1cb3l9QS_2Z9zxYlUvAHzpQ-1059x1536.jpg 1059w, https://static-assets.mszpro.com/2024/12/1cb3l9QS_2Z9zxYlUvAHzpQ-1412x2048.jpg 1412w, https://static-assets.mszpro.com/2024/12/1cb3l9QS_2Z9zxYlUvAHzpQ.jpg 1600w" sizes="(max-width: 1600px) 100vw, 1600px" /></figure>



<h1 class="wp-block-heading">Method 1: Attaching image analysis component to UIImageView image view</h1>



<h2 class="wp-block-heading">Detecting objects within an image</h2>



<p>To perform image analysis, you will need to add an <code>ImageAnalysisInteraction</code> to your <code>UIImageView</code>:</p>



<script src="https://gist.github.com/mszpro/b1b3d96b22e703d61f1b24c47815fca9.js"></script>



<p>Here, you can set the preferred interaction types. If you use Apple&#8217;s Photos app, you will find that you can pick not only objects within the image, but also text and QR codes. This is defined by the <code>preferredInteractionTypes</code> property: provide an array to this property to set which objects the user can interact with in your app&#8217;s image view.</p>



<script src="https://gist.github.com/mszpro/c3e33a7fdf80d007e7b0b4673ab3b42f.js"></script>



<ul class="wp-block-list">
<li><code>.dataDetectors</code>&nbsp;means URLs, email addresses, and physical addresses</li>

<li><code>.imageSubject</code>&nbsp;means objects within the image (the main focus of this article)</li>

<li><code>.textSelection</code>&nbsp;means selecting the text within the image</li>

<li><code>.visualLookUp</code>&nbsp;means objects that the iOS system can show more information about (for example, the breed of a cat or dog)</li>
</ul>



<p>For this article, you can set it to <code>.imageSubject</code> only.</p>
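<p>For instance, assuming <code>interaction</code> is the <code>ImageAnalysisInteraction</code> instance attached above, restricting it to subject lifting is a one-liner:</p>

```swift
// Only allow subject lifting; text selection, data detectors,
// and visual look-up stay disabled.
interaction.preferredInteractionTypes = [.imageSubject]
```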



<h3 class="wp-block-heading">Running image analysis</h3>



<p>To run the image analysis and check which objects are in the image, run the code below:</p>
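<p>As a sketch (assuming <code>imageView</code> and <code>interaction</code> are the image view and interaction set up earlier; the original gist may differ in detail):</p>

```swift
import UIKit
import VisionKit

// Analyze the image, attach the result to the interaction,
// then read back the detected subjects.
func analyzeImage() {
    guard let image = imageView.image else { return }
    Task {
        do {
            let configuration = ImageAnalyzer.Configuration([.visualLookUp])
            let analysis = try await ImageAnalyzer().analyze(image, configuration: configuration)
            interaction.analysis = analysis
            let detectedSubjects = await interaction.subjects
            print("Detected \(detectedSubjects.count) subjects")
        } catch {
            print("Image analysis failed: \(error)")
        }
    }
}
```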



<p>You can also use the <code>interaction.highlightedSubjects</code> property to highlight any or all of the detected objects. In the code above, assigning <code>detectedSubjects</code> to this property highlights all of the detected objects.</p>



<h2 class="wp-block-heading">Reading the data from image analyzer</h2>



<p>You can read the objects and set which subjects are highlighted in your code by using the `interaction.subjects` property and the `interaction.highlightedSubjects` property.</p>



<script src="https://gist.github.com/mszpro/b194c99b2f66d4d977b4ccd00e27b13b.js"></script>



<p>Within each subject (which conforms to <code>ImageAnalysisInteraction.Subject</code>), you can read the size and origin (the bounding box) and extract the image.</p>



<figure class="wp-block-image"><img decoding="async" width="1600" height="527" src="https://static-assets.mszpro.com/2024/12/1R4cIBzJqfm6ZsN3-_R1ngg.png" alt="" class="wp-image-383" srcset="https://static-assets.mszpro.com/2024/12/1R4cIBzJqfm6ZsN3-_R1ngg-300x99.png 300w, https://static-assets.mszpro.com/2024/12/1R4cIBzJqfm6ZsN3-_R1ngg-1024x337.png 1024w, https://static-assets.mszpro.com/2024/12/1R4cIBzJqfm6ZsN3-_R1ngg-768x253.png 768w, https://static-assets.mszpro.com/2024/12/1R4cIBzJqfm6ZsN3-_R1ngg-1536x506.png 1536w, https://static-assets.mszpro.com/2024/12/1R4cIBzJqfm6ZsN3-_R1ngg.png 1600w" sizes="(max-width: 1600px) 100vw, 1600px" /></figure>



<p>To access the size and bounding box:</p>



<script src="https://gist.github.com/mszpro/8786770e2c9c31789649fb1e5aca2929.js"></script>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1600" height="1361" src="https://static-assets.mszpro.com/2024/12/1JhIlM0y8G6lsX0c1Sl8uPA.jpg" alt="" class="wp-image-380" srcset="https://static-assets.mszpro.com/2024/12/1JhIlM0y8G6lsX0c1Sl8uPA-300x255.jpg 300w, https://static-assets.mszpro.com/2024/12/1JhIlM0y8G6lsX0c1Sl8uPA-1024x871.jpg 1024w, https://static-assets.mszpro.com/2024/12/1JhIlM0y8G6lsX0c1Sl8uPA-768x653.jpg 768w, https://static-assets.mszpro.com/2024/12/1JhIlM0y8G6lsX0c1Sl8uPA-1536x1307.jpg 1536w, https://static-assets.mszpro.com/2024/12/1JhIlM0y8G6lsX0c1Sl8uPA.jpg 1600w" sizes="auto, (max-width: 1600px) 100vw, 1600px" /></figure>



<h3 class="wp-block-heading">Getting a single image for all highlighted (selected) object</h3>



<p>You can also get a single image combining any of the objects. For example, I can get one image containing the left-most cat and the right-most cat:</p>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1600" height="2321" src="https://static-assets.mszpro.com/2024/12/1HbltUA8F_ycummv6ccaVXg.jpg" alt="" class="wp-image-378" srcset="https://static-assets.mszpro.com/2024/12/1HbltUA8F_ycummv6ccaVXg-207x300.jpg 207w, https://static-assets.mszpro.com/2024/12/1HbltUA8F_ycummv6ccaVXg-706x1024.jpg 706w, https://static-assets.mszpro.com/2024/12/1HbltUA8F_ycummv6ccaVXg-768x1114.jpg 768w, https://static-assets.mszpro.com/2024/12/1HbltUA8F_ycummv6ccaVXg-1059x1536.jpg 1059w, https://static-assets.mszpro.com/2024/12/1HbltUA8F_ycummv6ccaVXg-1412x2048.jpg 1412w, https://static-assets.mszpro.com/2024/12/1HbltUA8F_ycummv6ccaVXg.jpg 1600w" sizes="auto, (max-width: 1600px) 100vw, 1600px" /></figure>



<script src="https://gist.github.com/mszpro/6e6670a8070cc9731361f0c59ffd89c5.js"></script>



<h1 class="wp-block-heading">SwiftUI compatible view</h1>



<p>If you want to use the above logic in SwiftUI, you can split the code into three files.</p>



<p>Here is the <code>ObservableObject</code>, <code>ImageAnalysisViewModel</code>, which shares data between the SwiftUI view and the compatibility view <code>ObjectPickableImageView</code>:</p>



<script src="https://gist.github.com/mszpro/68cb28027663d2ce10207574c0349f33.js"></script>



<p>Here is the compatibility view:</p>



<script src="https://gist.github.com/mszpro/cb50b68400cc2798a6fb3efc7b51e5c4.js"></script>



<p>Here is the SwiftUI view:</p>



<script src="https://gist.github.com/mszpro/04035ff46a92c9d72f0a9e6d7a09560c.js"></script>



<p>In the SwiftUI view, you can see that we call the analyzer class directly. For example, to highlight an object, we use <code>self.viewModel.interaction.highlightedSubjects.insert(object)</code>.</p>



<p>Here, we use the <code>.environmentObject</code> view modifier to link the <code>ObservableObject</code> to the compatibility view: <code>.environmentObject(viewModel)</code></p>



<h2 class="wp-block-heading">Find the subject at tapped position</h2>



<p>We can also add a feature to detect which object the user tapped on.</p>



<p>First, we will attach a tap gesture recognizer to the image view:</p>



<pre class="wp-block-code"><code>let tapGesture = UITapGestureRecognizer(target: self, action: #selector(handleTap(_:)))<br>imageView.addGestureRecognizer(tapGesture)</code></pre>



<p>In the <code>handleTap</code> function, we check whether there is a subject at the tapped location. Then we can either extract that subject&#8217;s image or toggle its highlight:</p>



<script src="https://gist.github.com/mszpro/ce9ffbac4c017374f15db2c83bb6a4bb.js"></script>



<p>In SwiftUI, we can use the&nbsp;<code>.onTapGesture</code>&nbsp;view modifier directly to read the tapped position:</p>



<script src="https://gist.github.com/mszpro/c291b274e79a322737ed095045f1bbec.js"></script>



<p>Now, you should be able to tap to highlight or remove the highlight of a subject within the image:</p>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="346" height="502" src="https://static-assets.mszpro.com/2024/12/1TKS9IJzwj8GcshRA8IcDEw.gif" alt="" class="wp-image-382"/></figure>



<h1 class="wp-block-heading">Method 2: Using Vision requests</h1>



<p>If you do not want to show an image view, and just want to analyze an image and extract the objects within it, you can use <code>VNGenerateForegroundInstanceMaskRequest</code> directly; it is the underlying API that the feature above builds on.</p>



<p>You can run the analysis as shown below:</p>



<script src="https://gist.github.com/mszpro/38cdccb2d943af235a0ffcac80e4ddde.js"></script>



<p>This function takes a user-selected (or application-supplied) <code>UIImage</code>, converts it to a <code>CIImage</code>, and then runs the Vision foreground object recognition request.</p>
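<p>The flow of that function can be sketched as follows; the helper name and the choice to return a <code>CVPixelBuffer</code> are my assumptions, not necessarily the exact code in the gist:</p>

```swift
import Vision
import CoreImage

// Run the foreground-instance-mask request on a CIImage and return
// one combined mask covering every detected foreground instance.
func foregroundMask(for ciImage: CIImage) throws -> CVPixelBuffer? {
    let request = VNGenerateForegroundInstanceMaskRequest()
    let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])
    try handler.perform([request])
    guard let observation = request.results?.first else { return nil }
    // Pass a subset of `observation.allInstances` to mask individual objects.
    return try observation.generateScaledMaskForImage(
        forInstances: observation.allInstances,
        from: handler)
}
```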



<h2 class="wp-block-heading">Extract the masked image</h2>



<p>We can get a mask of the objects. As shown below, the mask marks the pixels that contain the foreground objects: the white pixels have been detected as foreground, and the black pixels are background.</p>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1600" height="2321" src="https://static-assets.mszpro.com/2024/12/1kKTtYYEwcCC1LphASbeSXw.jpg" alt="" class="wp-image-385" srcset="https://static-assets.mszpro.com/2024/12/1kKTtYYEwcCC1LphASbeSXw-207x300.jpg 207w, https://static-assets.mszpro.com/2024/12/1kKTtYYEwcCC1LphASbeSXw-706x1024.jpg 706w, https://static-assets.mszpro.com/2024/12/1kKTtYYEwcCC1LphASbeSXw-768x1114.jpg 768w, https://static-assets.mszpro.com/2024/12/1kKTtYYEwcCC1LphASbeSXw-1059x1536.jpg 1059w, https://static-assets.mszpro.com/2024/12/1kKTtYYEwcCC1LphASbeSXw-1412x2048.jpg 1412w, https://static-assets.mszpro.com/2024/12/1kKTtYYEwcCC1LphASbeSXw.jpg 1600w" sizes="auto, (max-width: 1600px) 100vw, 1600px" /></figure>



<p>In the code below, the <code>convertMonochromeToColoredImage</code> function helps generate a preview image of the mask, and the <code>apply</code> function applies the mask to the original input image (so we get an image of only the cats, without the background).</p>



<script src="https://gist.github.com/mszpro/aaa77629576f227f1226f37314d76e23.js"></script>



<p>In the <code>apply</code> function, we can also supply a background image. I first scale and crop that background image to fit the original image; then I place it behind the extracted foreground.</p>
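<p>That compositing step can be done with Core Image&#8217;s blend-with-mask filter. A minimal sketch (the function name is mine, and the background is assumed to be already scaled and cropped to the foreground&#8217;s extent):</p>

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Blend the lifted foreground over a replacement background, using the
// Vision mask to decide which pixels come from which image.
func composite(foreground: CIImage, background: CIImage, mask: CIImage) -> CIImage? {
    let filter = CIFilter.blendWithMask()
    filter.inputImage = foreground
    filter.backgroundImage = background
    filter.maskImage = mask
    return filter.outputImage
}
```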



<figure class="wp-block-image"><img fetchpriority="high" decoding="async" width="1600" height="2321" src="https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ.jpg" alt="" class="wp-image-384" srcset="https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-207x300.jpg 207w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-706x1024.jpg 706w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-768x1114.jpg 768w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-1059x1536.jpg 1059w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ-1412x2048.jpg 1412w, https://static-assets.mszpro.com/2024/12/1PE4kbADw0EBAea2QjFy2NQ.jpg 1600w" sizes="(max-width: 1600px) 100vw, 1600px" /></figure>



<p>Yep! That’s how I made those cats party!</p>



<p>You can find the full project code (in SwiftUI) here:&nbsp;<a href="https://github.com/mszpro/LiftObjectFromImage" target="_blank" rel="noreferrer noopener">https://github.com/mszpro/LiftObjectFromImage</a></p>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<p>&#9786;&#65039; Twitter:&nbsp;<a href="https://twitter.com/MszPro" target="_blank" rel="noreferrer noopener">@MszPro</a></p>



<p>&#9786;&#65039; Personal website:&nbsp;<a href="https://mszpro.com/" target="_blank" rel="noreferrer noopener">https://MszPro.com</a></p>



<figure class="wp-block-image"><img loading="lazy" decoding="async" width="1200" height="900" src="https://static-assets.mszpro.com/2024/12/1WR41Ei1wO48T1HY7UnA61g.png" alt="" class="wp-image-379" srcset="https://static-assets.mszpro.com/2024/12/1WR41Ei1wO48T1HY7UnA61g-300x225.png 300w, https://static-assets.mszpro.com/2024/12/1WR41Ei1wO48T1HY7UnA61g-1024x768.png 1024w, https://static-assets.mszpro.com/2024/12/1WR41Ei1wO48T1HY7UnA61g-768x576.png 768w, https://static-assets.mszpro.com/2024/12/1WR41Ei1wO48T1HY7UnA61g.png 1200w" sizes="auto, (max-width: 1200px) 100vw, 1200px" /></figure><p>The post <a href="https://mszpro.com/vision-foreground-instance-mask-request">Detect, extract, Segment objects on given image in iOS (ImageAnalysisInteraction, VNGenerateForegroundInstanceMask)</a> first appeared on <a href="https://mszpro.com">MszPro・株式会社Smartソフト</a>.</p>]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
