January 1st, 2024
Reading and Writing Spatial Video with AVFoundation
- Reading Spatial Video using AVAssetReader
- Reading Spatial Video using AVPlayer
- Writing Spatial Video using AVAssetWriter
If you’ve worked with AVFoundation’s APIs, you’ll be familiar with CVPixelBuffer, an object which represents a single video frame. AVFoundation manages the tasks of reading, writing, and playing video frames, but the process changes when dealing with spatial video (aka MV-HEVC), which features video from two separate angles.
Loading a spatial video into an AVPlayer or AVAssetReader on iOS appears similar to loading a standard video. By default, however, the frames you receive only show one perspective (the “hero” eye view), while the alternate angle, also part of the MV-HEVC file, is never decompressed.
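If you first need to detect whether a file is spatial at all, you can check its tracks for the stereo multiview media characteristic. A minimal sketch, assuming a local, playable file URL:
import AVFoundation

// Returns true if any video track carries stereo multiview (spatial) video.
func isSpatialVideo(at url: URL) async throws -> Bool {
    let asset = AVAsset(url: url)
    let tracks = try await asset.loadTracks(withMediaCharacteristic: .containsStereoMultiviewVideo)
    return !tracks.isEmpty
}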
With iOS 17.2 and macOS 14.2, new AVFoundation APIs were introduced for handling MV-HEVC files. They make it easy to get both angles of a spatial video, but are lacking in documentation¹. Here are a few tips for working with them:
Reading Spatial Video using AVAssetReader
AVAssetReader can read media data faster than realtime, and is good for reading a video file, applying some transform, and writing back to a file using AVAssetWriter (for example, converting a spatial video to an SBS video playable on a Meta Quest / XREAL Air). Reading spatial video requires informing VideoToolbox that we want both angles decompressed from the video, instead of just the “hero” eye.
- Create an AVAssetReader:
let asset = AVAsset(url: <path-to-spatial-video>)
let assetReader = try AVAssetReader(asset: asset)
- Create an AVAssetReaderTrackOutput, specifying that we want both MV-HEVC video layers decompressed:
let output = try await AVAssetReaderTrackOutput(
    track: asset.loadTracks(withMediaType: .video).first!,
    outputSettings: [
        AVVideoDecompressionPropertiesKey: [
            kVTDecompressionPropertyKey_RequestedMVHEVCVideoLayerIDs: [0, 1] as CFArray,
        ],
    ]
)
assetReader.add(output)
- Start copying sample buffers containing both angles:
assetReader.startReading()
while let nextSampleBuffer = output.copyNextSampleBuffer() {
    guard let taggedBuffers = nextSampleBuffer.taggedBuffers else { return }
    let leftEyeBuffer = taggedBuffers.first(where: {
        $0.tags.first(matchingCategory: .stereoView) == .stereoView(.leftEye)
    })?.buffer
    let rightEyeBuffer = taggedBuffers.first(where: {
        $0.tags.first(matchingCategory: .stereoView) == .stereoView(.rightEye)
    })?.buffer
    if let leftEyeBuffer,
       let rightEyeBuffer,
       case let .pixelBuffer(leftEyePixelBuffer) = leftEyeBuffer,
       case let .pixelBuffer(rightEyePixelBuffer) = rightEyeBuffer {
        // do something cool
    }
}
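As one example of “something cool”: for the SBS conversion mentioned above, you could composite the two eye buffers into a single double-wide frame with Core Image. A rough sketch (the function is mine, not an API; rendering the result back into a CVPixelBuffer via CIContext is omitted):
import CoreImage

// Places the right eye frame directly to the right of the left eye frame.
func sideBySideImage(left: CVPixelBuffer, right: CVPixelBuffer) -> CIImage {
    let leftImage = CIImage(cvPixelBuffer: left)
    let rightImage = CIImage(cvPixelBuffer: right)
        .transformed(by: CGAffineTransform(translationX: leftImage.extent.width, y: 0))
    return rightImage.composited(over: leftImage)
}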
Reading Spatial Video using AVPlayer
When dealing with real-time playback, you’ll often want to use AVPlayer, which manages the playback and timing of a video automatically. AVPlayerVideoOutput is the new API added for reading spatial video in real time, and is straightforward to set up.
- Create an AVPlayer:
let asset = AVAsset(url: <path-to-spatial-video>)
let player = AVPlayer(playerItem: AVPlayerItem(asset: asset))
- Create an AVPlayerVideoOutput for outputting stereoscopic video:
let outputSpecification = AVVideoOutputSpecification(
    tagCollections: [.stereoscopicForVideoOutput()]
)
let videoOutput = AVPlayerVideoOutput(specification: outputSpecification)
player.videoOutput = videoOutput
- Add a periodic time observer for reading frames at a specified interval:
// Keep a reference to the returned token so the observer stays alive
// and can later be removed with player.removeTimeObserver(_:).
let timeObserver = player.addPeriodicTimeObserver(
    forInterval: CMTime(value: 1, timescale: 30),
    queue: .main
) { _ in
    guard let taggedBuffers = videoOutput.taggedBuffers(
        forHostTime: CMClockGetTime(.hostTimeClock)
    )?.taggedBufferGroup else { return }
    let leftEyeBuffer = taggedBuffers.first(where: {
        $0.tags.first(matchingCategory: .stereoView) == .stereoView(.leftEye)
    })?.buffer
    let rightEyeBuffer = taggedBuffers.first(where: {
        $0.tags.first(matchingCategory: .stereoView) == .stereoView(.rightEye)
    })?.buffer
    if let leftEyeBuffer,
       let rightEyeBuffer,
       case let .pixelBuffer(leftEyePixelBuffer) = leftEyeBuffer,
       case let .pixelBuffer(rightEyePixelBuffer) = rightEyeBuffer {
        // do something cool
    }
}
- Play!
player.play()
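When you’re done, remove the observer using the timeObserver token retained above and detach the video output:
player.removeTimeObserver(timeObserver)
player.videoOutput = nil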
Writing Spatial Video using AVAssetWriter
Following the advice in Q&A: Building apps for visionOS, the steps for creating a spatial video from two stereo videos are:
- Create an AVAssetWriter:
let assetWriter = try! AVAssetWriter(
    outputURL: <path-to-spatial-video>,
    fileType: .mov
)
- Create and add a video input for the spatial video. It is important to specify kVTCompressionPropertyKey_MVHEVCVideoLayerIDs, kCMFormatDescriptionExtension_HorizontalFieldOfView, and kVTCompressionPropertyKey_HorizontalDisparityAdjustment in the compression properties. Without these, your video will not be read as a spatial video on visionOS:
let input = AVAssetWriterInput(
    mediaType: .video,
    outputSettings: [
        AVVideoWidthKey: 1920,
        AVVideoHeightKey: 1080,
        AVVideoCompressionPropertiesKey: [
            kVTCompressionPropertyKey_MVHEVCVideoLayerIDs: [0, 1] as CFArray,
            kCMFormatDescriptionExtension_HorizontalFieldOfView: 90_000, // asset-specific, in thousandths of a degree
            kVTCompressionPropertyKey_HorizontalDisparityAdjustment: 200, // asset-specific
        ],
        AVVideoCodecKey: AVVideoCodecType.hevc,
    ]
)
assetWriter.add(input)
- Create an AVAssetWriterInputTaggedPixelBufferGroupAdaptor for the video input:
let adaptor = AVAssetWriterInputTaggedPixelBufferGroupAdaptor(assetWriterInput: input)
- Start writing:
assetWriter.startWriting()
assetWriter.startSession(atSourceTime: .zero)
- Start appending frames. Each frame consists of two CMTaggedBuffers:
let left = CMTaggedBuffer(tags: [.stereoView(.leftEye), .videoLayerID(0)], pixelBuffer: leftPixelBuffer)
let right = CMTaggedBuffer(tags: [.stereoView(.rightEye), .videoLayerID(1)], pixelBuffer: rightPixelBuffer)
adaptor.appendTaggedBuffers([left, right], withPresentationTime: <presentation-timestamp>)
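In a real pipeline you’d typically drive the appends from the writer input’s requestMediaDataWhenReady(on:using:) callback, so frames are pulled as fast as the writer can accept them. A rough sketch, assuming 30fps; nextFramePixelBuffers() is a hypothetical stand-in for your own frame source, and the sketch calls markAsFinished() itself (replacing the first line of the next step) once the source runs dry:
let writerQueue = DispatchQueue(label: "spatial-writer")
var frameIndex: Int64 = 0
input.requestMediaDataWhenReady(on: writerQueue) {
    while input.isReadyForMoreMediaData {
        // nextFramePixelBuffers() is hypothetical: returns the next (left, right) pair, or nil at the end.
        guard let (leftPixelBuffer, rightPixelBuffer) = nextFramePixelBuffers() else {
            input.markAsFinished()
            return
        }
        let left = CMTaggedBuffer(tags: [.stereoView(.leftEye), .videoLayerID(0)], pixelBuffer: leftPixelBuffer)
        let right = CMTaggedBuffer(tags: [.stereoView(.rightEye), .videoLayerID(1)], pixelBuffer: rightPixelBuffer)
        adaptor.appendTaggedBuffers([left, right], withPresentationTime: CMTime(value: frameIndex, timescale: 30))
        frameIndex += 1
    }
}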
- Finish writing:
input.markAsFinished()
assetWriter.endSession(atSourceTime: <end-time>)
assetWriter.finishWriting {
    // share assetWriter.outputURL
}
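If visionOS still doesn’t treat the result as spatial, a quick sanity check is to reopen the written file and inspect the video track’s format description extensions, where the field-of-view and related keys should appear. A sketch, assuming an async throwing context:
let written = AVAsset(url: assetWriter.outputURL)
if let track = try await written.loadTracks(withMediaType: .video).first,
   let description = try await track.load(.formatDescriptions).first {
    // Look for HorizontalFieldOfView and the other extensions set above.
    print(CMFormatDescriptionGetExtensions(description) as Any)
}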
Footnotes
1. Since publishing this article, Apple has added sample code for both reading multiview 3D video files and writing multiview HEVC. ↩