The Role of Tracking in Augmented Reality

By Wendy (Klotz) Mlynarek
April 29, 2024
DELMIA
Feature

Tracking is the software process that locates a given product in a live camera feed. It can be used to identify the equipment to be assembled or inspected, and it is the basis of any application that offers an augmented reality experience.

In other words, tracking is:

Locating the camera in relation to specific objects (target objects).
Understanding and mapping the environment.
Recognizing and following specific objects (target objects) in real time as they move.

Several factors determine how effective tracking is: how accurately the 3D model matches reality, the texture and shape of the target objects (in particular, the presence of edges) and the lighting conditions (visibility).

Tracking relies on detection and tracking algorithms. These are trained to recognize and follow the object’s distinctive characteristics, such as its shape, color and contours.

For tracking to work, these elements are required:

The real object being manipulated
The specific object to be tracked
The tracking model (or 3D model) of the object to be tracked

Manual tracking initialization

In the case of manual tracking initialization, the user manually defines the first position to start tracking. The application then stores certain reference images taken during tracking. These reference images are a starting point for the tracking initialization process.

When the user defines the first position, the application captures images that represent the scene from different angles and perspectives. These images are then used as a reference for subsequent tracking. They may contain objects or specific elements of the scene that the user wishes to track. For tracking initialization, the reference images are compared with real-time images captured by the device’s camera. Image matching techniques are used to find correspondences between the pixels of the reference images and those of the real-time images.

Using these correspondences, the application can estimate the transformations required to align the reference images with the real-time images. This determines the initial position and orientation of the objects to be tracked in relation to the camera.
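As a rough illustration of this alignment step, the sketch below estimates the rotation and translation that best map matched reference points onto their live-image counterparts. It is a deliberate simplification: it works in the 2D image plane, whereas a real AR application estimates a full 3D position and orientation. The function name estimate_rigid_transform and the sample coordinates are assumptions made for the example and do not come from any DELMIA API.

```python
import math

def estimate_rigid_transform(reference_points, live_points):
    """Least-squares 2D rotation and translation mapping reference points onto live points.

    Both arguments are lists of (x, y) tuples; pair i in one list is assumed
    to correspond to pair i in the other (the output of image matching).
    """
    n = len(reference_points)
    if n < 2 or n != len(live_points):
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    rcx = sum(x for x, _ in reference_points) / n
    rcy = sum(y for _, y in reference_points) / n
    lcx = sum(x for x, _ in live_points) / n
    lcy = sum(y for _, y in live_points) / n

    # Accumulate dot and cross products of the centered points.
    dot = 0.0
    cross = 0.0
    for (rx, ry), (lx, ly) in zip(reference_points, live_points):
        ax, ay = rx - rcx, ry - rcy
        bx, by = lx - lcx, ly - lcy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    # Closed-form optimal rotation angle in 2D.
    angle = math.atan2(cross, dot)
    c, s = math.cos(angle), math.sin(angle)

    # Translation that moves the rotated reference centroid onto the live centroid.
    tx = lcx - (c * rcx - s * rcy)
    ty = lcy - (s * rcx + c * rcy)
    return angle, (tx, ty)


# Hypothetical matches: the live points are the reference points
# rotated by 90 degrees and shifted by (2, 3).
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
live = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
angle, translation = estimate_rigid_transform(reference, live)
```

With noisy or partly wrong matches, a production system would typically add outlier rejection on top of a fit like this one.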

Once tracking is initialized, the tracking process begins. The application uses the initial information obtained to track objects in the scene in real time, updating their position and orientation throughout the video stream.

This method of tracking initialization depends on the accuracy of the initial position set by the user. If the position is incorrect or imprecise, this can lead to tracking errors later on. However, this approach can be useful in cases where the user wishes to track specific objects and is able to provide a reasonable initial estimate of their position.

Tracking initialization with deep learning

Deep learning-based tracking initialization uses a trained model to perform this task. The model learns to recognize relevant features and patterns in reference images in order to provide a more robust and accurate tracking initialization.

The tracking initialization process based on deep learning typically involves several steps:

Data collection: It is necessary to gather a training dataset that includes reference images taken during tracking. These images must cover a variety of scenes, illuminations and environmental conditions so that the resulting model generalizes well.
Model training: Once the training data has been collected, it is used to train a deep learning model. This model can be a convolutional neural network (CNN) or another architecture adapted to the tracking initialization task. The aim of training is to teach the model to recognize features relevant to tracking initialization, such as objects of interest, distinctive patterns or landmarks in the image.
Validation and adjustment: Once the model has been trained, it is evaluated on a separate validation dataset to measure its performance and effectiveness. If necessary, adjustments can be made to improve model performance, such as increasing the training data, adding regularization or optimizing parameters.
Using the model: Once the model has been sufficiently trained and validated, it can be used to initialize tracking in a real application. When the user starts tracking, the application uses the model to analyze the first images and estimate the initial position of the object or target to be tracked. This initial estimate is then used as the starting point for the continuous tracking process.
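The validation step above can be sketched in a few lines. The example below holds out part of a labeled dataset and measures how far a model's estimated initial positions fall from the known ones. Everything in it is an assumption made for illustration: the names split_dataset and evaluate_initializer, the toy data, and the stand-in predictor that takes the place of a trained network.

```python
import math
import random

def split_dataset(samples, validation_fraction=0.2, seed=0):
    """Shuffle samples reproducibly and split them into (train, validation)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def evaluate_initializer(predict, validation, tolerance):
    """Return (mean position error, fraction of samples within tolerance).

    predict maps an image to an estimated position; validation is a list of
    (image, true_position) pairs, with positions given as coordinate tuples.
    """
    errors = [math.dist(predict(image), true_position)
              for image, true_position in validation]
    mean_error = sum(errors) / len(errors)
    success_rate = sum(e <= tolerance for e in errors) / len(errors)
    return mean_error, success_rate


# Toy data: each "image" is a one-element tuple and its label is a 2D position.
samples = [((i,), (float(i), 0.0)) for i in range(10)]
train, validation = split_dataset(samples)

# Stand-in for a trained model: always off by 0.5 along x.
def toy_predict(image):
    return (image[0] + 0.5, 0.0)

mean_error, success_rate = evaluate_initializer(toy_predict, validation, tolerance=1.0)
```

If the mean error or the success rate falls short of what the application needs, that is the signal to collect more data or adjust the model, as described in the validation step.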

By using deep learning for tracking initialization, we can therefore obtain a method that is more robust and adaptable to changes in illumination, scene background and other visual variations. However, it should be noted that the effectiveness of the model depends on the quality and diversity of the training data, as well as the performance of the learning algorithm used.

Tracking vs. tracking model

Tracking models are used in many fields, such as computer vision, robotics and augmented reality. A tracking model is a model or algorithm used as part of the tracking process described above, more specifically to track and localize objects in image or video sequences.

It is designed to estimate and predict the position, movement and characteristics of an object of interest over time. It can be based on machine learning techniques such as supervised learning, unsupervised learning or reinforcement learning. The aim is to provide precise information on the position and movement of objects, enabling tracking, detection, recognition and analysis.
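To make "estimate and predict over time" concrete, here is a minimal sketch of an alpha-beta filter, a classic non-learning technique that smooths noisy position measurements and keeps a running velocity estimate. It is offered only as an illustration of the idea; the article does not say which method any particular product uses, and the function name alpha_beta_track and the gain values are assumptions.

```python
def alpha_beta_track(measurements, dt=1.0, alpha=0.5, beta=0.1):
    """Track a 1D position over time with an alpha-beta filter.

    measurements: positions observed at a fixed time step dt.
    Returns one (position, velocity) estimate per measurement after the first.
    """
    position = float(measurements[0])
    velocity = 0.0
    estimates = []
    for measured in measurements[1:]:
        # Predict where the object should be, assuming constant velocity.
        predicted = position + velocity * dt
        # Correct the prediction using the new measurement.
        residual = measured - predicted
        position = predicted + alpha * residual
        velocity = velocity + (beta / dt) * residual
        estimates.append((position, velocity))
    return estimates


# An object that does not move keeps its position and zero velocity.
still = alpha_beta_track([5.0, 5.0, 5.0, 5.0])

# An object moving steadily to the right yields a positive velocity estimate.
moving = alpha_beta_track([float(i) for i in range(10)])
```

A higher alpha makes the filter follow measurements more closely, while a lower one smooths more but reacts more slowly to real movement.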

A tracking model should not be confused with tracking itself. Tracking is the overall process, and the tracking model is a component used within it. The two are interdependent, and neither can function without the other.

For model tracking to be possible, several conditions must be met:

3D CAD models of the target objects must be available
The models must be accurate in relation to reality
Each model must contain one or more parts
The associated target object must remain visible to the camera during operations

Tracking is used in the DELMIA Augmented Experience solution to identify the equipment to be assembled or inspected from its 3D model, to locate several elements simultaneously, and to display the digital information required for the industrial process in the right place, at the right time and at the right scale.

About The Author

Wendy (Klotz) Mlynarek is DELMIA Strategic Business Development and Marketing Director at DELMIA Marketing.
