The physical security of data centers is a major challenge for ensuring the availability and protection of critical infrastructure. Data center servers store information that is critical for businesses and individuals, making them high-value targets for hackers and exposing them to cyberattacks, including cyber-physical attacks.
Access to data center rooms is controlled by badges and RFID (radio frequency identification) cards. This access control method has several advantages, including speed and easy deployment [1]. However, it also has limitations, such as the possibility of copying simple cards and carrying out cloning or replay attacks. Hackers can exploit these vulnerabilities to gain unauthorized access to data center rooms and, as a result, to sensitive data, or even to damage equipment.
To address this issue, we propose a machine learning operations architecture based on a convolutional neural network (CNN) model for facial recognition, implemented on the ZYNQ-7000 FPGA electronic board. The system decides whether to open the door for authorized users or keep it closed for unauthorized ones.
The MLOps approach adapts to changes in the dataset, such as the addition of a new user or the removal of an existing one. The model is automatically retrained after each change to the dataset, enabling the system to quickly decide whether to open the door or keep it closed.
In this paper, we considered integrating DevOps principles into the CNN model lifecycle, which naturally led to the adoption of the MLOps paradigm. Increasingly widespread, MLOps is now an essential approach for automating, orchestrating, and ensuring the reliability of the machine learning pipeline as a whole. Based on the spirit and best practices of DevOps, MLOps provides solutions to the limitations of traditional approaches: data version management, continuous training, automated testing, secure deployment, and operational monitoring of models.
Several recent studies have demonstrated the effectiveness of MLOps in a variety of environments. However, few studies explore its implementation on specialized hardware platforms such as ZYNQ-7000 FPGA boards, which are widely used in industrial environments requiring high performance, minimal latency, and robustness. In this context, our work focuses on integrating a complete MLOps pipeline into an embedded architecture, demonstrating its feasibility and advantages for real-time physical security applications.
The paper in ref. [2] investigates the integration of ML (machine learning) techniques into the DevOps framework to automate RCA (root cause analysis), improving incident management through data-driven techniques to detect, diagnose, and resolve incidents. In [3] the authors benefit from AI methods and techniques to optimize and automate the critical steps in ML projects through the utilization of AutoML (automated machine learning) tools. The study in ref. [4] improves the deliverability and quality of ML applications by adopting DevOps practices within the ML workflow. In [5] ML models were integrated into the DevOps workflow to manage data versioning, model testing, continuous delivery, and continuous monitoring for performance degradation and concept drift. The work in ref. [6] presents a way to integrate CI/CD (continuous integration (CI) and continuous delivery (CD)) pipelines for ML applications in development and production environments using the necessary tools with minimal cost. Ref. [7] proposes various AI techniques with the goal of strengthening test automation for machine learning models through the integration of a DevOps-MLOps pipeline. The study in paper [8] integrates ML models into the DevOps workflow in microservices architectures to automate scaling and resource allocation in the application architecture. Ref. [9] provides a deep analysis of the quality and scalability of machine learning pipelines implemented in a Kubernetes cluster for financial services, from data versioning to model monitoring stages. The authors in paper [10] proposed an AutoML pipeline for AML4S Streams that automates the choice of ML models, preprocessing techniques, and the tuning of hyperparameters in cases where datasets change for online machine learning applications.
In [11], Kafka-ML was proposed as a solution for monitoring and managing AI and ML pipelines using data streams, benefiting from the advantages of the open-source Kafka-ML framework, which provides an easier web interface for the end user. The work in ref. [12] describes the development of an MLOps workflow to classify models that use Big Data techniques; this workflow is deployed in a Kubernetes cluster for autoscaling and load balancing. The study in ref. [13] proposes an MLOps architecture addressing development, deployment in the cloud, maintenance, and monitoring to automate the end-to-end soft sensor lifecycle in industrial-scale fed-batch fermentation. The existing studies apply MLOps approaches in various environments, including the cloud, GPU servers, and distributed systems. However, very few studies explore its integration on embedded platforms such as FPGA boards, although they play a key role in many industrial applications requiring high performance, low latency, and high energy efficiency. The paper in ref. [14] presents MLOps pipelines to solve issues related to the deployment of ML models in edge environments. The work in ref. [15] introduces a federated learning-based intrusion detection system for smart buildings connected to 6G networks using MLOps by implementing a zero-touch pipeline.
In this context, our paper aims to propose an MLOps architecture suitable for deploying a CNN model on a Zynq-7000 FPGA board, as well as to describe in detail the implementation steps on this platform. This approach demonstrates the feasibility of a complete MLOps pipeline in an embedded hardware environment and opens up new possibilities for real-time intelligent systems.
The aim of this work is to implement an MLOps pipeline on an embedded electronic system to automate the deployment of CNN models for real-time face detection, improving door security access using flexible, low-cost, and open-source tools.
The architecture of the proposed system consists of several modules and blocks, as illustrated in Figure 1.
HD Camera: captures facial images and transmits them to the ZYNQ FPGA board.
ZYNQ-7000 FPGA board with CNN and MLOps: the embedded hardware platform implementing the MLOps pipeline, which automatically deploys CNN models whenever the dataset changes.
Servomotor and LEDs: The control block determines whether to open the door; the LEDs turn green if access is granted and red if it is denied.
Cloud Data Storage: This component stores training and validation data, particularly the images used for model training, as well as real-time images.
Mobile Application: Used to monitor individual access, and manage users’ roles and data, including the addition and removal of users.
MLflow interface: Responsible for tracking the complete CNN model lifecycle, including training parameters, performance metrics, generated versions, and associated artifacts. MLflow ensures the traceability and reproducibility of the MLOps pipeline.
Grafana interface: This stage enables real-time monitoring of deployment environment performance, including FPGA resource usage, and generates alerts in the event of anomalies or performance degradation.
Continuous Training Trigger: After collecting data via the camera, the FPGA card sends it to the cloud. Retraining of CNN models is triggered when new labeled data becomes available. Once retrained and validated, the updated model is redeployed to the embedded card, ensuring that it always runs the latest optimized model without performing full training locally.
This integrated architecture enables the implementation of a complete MLOps pipeline, from data management to secure deployment on embedded hardware, while ensuring continuous monitoring and improvement of the model in a realistic and constrained environment.
A convolutional neural network is a machine learning algorithm, specifically a deep learning (DL) model. It consists of neurons with adjustable biases and weights; each neuron performs a mathematical operation, known as a dot product, between its input and weights, followed by a non-linear activation function. CNN models are particularly well suited to applications such as recommendation systems, natural language processing, and image classification, thanks to their ability to learn complex features [16].
In addition, CNNs can extract distinguishable features from raw images through a series of layered operations designed to capture and process relevant characteristics, as illustrated in Figure 2 below.
The convolution layer is fundamental to the operation of CNNs, playing an essential role in extracting useful features from the input data and enabling the network to learn and understand complex visual patterns [16]. The layer is essentially a convolution of the image from the previous layer, where the learned weights define the convolution filter.
The input image is given by I(i, j), where each pixel is considered a scalar; the filter is represented as a kernel K(m, n); and the convolved output is given by h(i, j). Figure 3 shows a convolution operation.
h(i, j) = (I ∗ K)(i, j) = ∑_m ∑_n I(i − m, j − n) K(m, n)
(1)
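Equation (1) can be sketched directly in Python; the zero-padding boundary handling below is an assumption, since the equation itself leaves out-of-range indices unspecified.

```python
import numpy as np

def convolve2d(image, kernel):
    # direct implementation of Eq. (1): h(i, j) = sum_m sum_n I(i - m, j - n) K(m, n);
    # out-of-range indices of I are treated as zero (a boundary-handling assumption)
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih, iw))
    for i in range(ih):
        for j in range(iw):
            acc = 0.0
            for m in range(kh):
                for n in range(kw):
                    if 0 <= i - m < ih and 0 <= j - n < iw:
                        acc += image[i - m, j - n] * kernel[m, n]
            out[i, j] = acc
    return out
```

Note that the kernel indices enter with a minus sign, which is what distinguishes true convolution from the cross-correlation that deep learning frameworks typically compute.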
Pooling Layer: The main objective of this layer is to minimize the dimensions, the complexity of the model, and the number of parameters. The most popular pooling techniques are average pooling and max pooling [16].
Max Pool: a pooling operation that returns the maximum value within the receptive field [16].
Min Pool: this operation selects the minimum value within the pooling window [16].
Average Pool: this function returns the average value of the target field, as presented in Figure 4 [16].
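The three pooling variants can be sketched with NumPy; the 2 × 2 non-overlapping window used below is an illustrative choice, not a parameter taken from the paper.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    # non-overlapping pooling over size x size windows of a 2-D feature map;
    # trailing rows/columns that do not fill a full window are dropped
    h, w = x.shape
    blocks = x[:h - h % size, :w - w % size].reshape(h // size, size, w // size, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    if mode == "min":
        return blocks.min(axis=(1, 3))
    return blocks.mean(axis=(1, 3))  # average pooling
```

Each output element summarizes one window, which is how pooling reduces dimensions and parameter counts.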
Fully Connected (FC) Layer: FC layers are located at the end of CNN structures and handle the learning of complex features and the detection of global relationships. Each node in an FC layer is directly linked to every node in the layers above and below it [16]. Figure 5 shows the FC layer.
Activation Layer: The activation layers enable the network to learn highly complex connections between the feature maps. The activation function defines the output of the model, its prediction accuracy, and computing efficiency during training. These characteristics are provided through special hardware logic in the hardware implementation [16].
The ZYNQ-7000 is one of the electronic boards in the Xilinx FPGA product line, based on the Xilinx All Programmable SoC (system on chip) architecture, with interfaces such as a secure digital (SD) card, Ethernet, and USB [16,17]. The ZYNQ-7000 board contains two ARM Cortex-A9 CPU cores with double-precision floating-point support in the processing system (PS), standard programmable logic (PL), and 220 DSP (digital signal processing) slices to accelerate and optimize mathematical operations [16,17,18]. The connection between the PS and PL is programmable and multiple, enabling swift data communication [16]. This architecture makes the Zynq-7000 SoC well suited for applications such as machine learning, image processing, and digital signal processing with high performance [19]. The board supports various hardware description languages and development tools, particularly Xilinx Vivado, which is the principal tool for hardware design, simulation, and deployment [19]. This SoC runs the PetaLinux operating system and the PYNQ framework to generate an overlay layer on the FPGA board, creating a virtual programmable space [16,17,20]. On the overlay layer, we can run Python and install packages and tools such as MLflow and GitLab Runner.
In addition, the Zynq-7000 FPGA board has notable physical characteristics, such as a maximum frequency of approximately 100 MHz, power consumption between 1.73 W and 1.95 W, and support for 32-bit floating-point precision [21].
The Zynq-7000 SoCs feature integrated dual-core Arm Cortex-A9 processors with Artix-7 or Kintex-7 28 nm-based PL for excellent performance per watt and maximum design flexibility [22]. Supporting up to 6.6 million logic cells and available with transceivers ranging from 6.25 Gbit/s to 12.5 Gbit/s, Zynq-7000 SoCs enable highly differentiated designs for a wide range of embedded applications, including multi-camera driver assistance systems and Ultra-HD 4K2K television [22]. Figure 6 presents the architecture of the ZYNQ-7000 SoC:
Table 1 presents the PS features of the ZYNQ-7000:
Table 2 shows the PL features of the ZYNQ-7000:
These characteristics show that the ZYNQ-7000 platform is well suited for artificial intelligence applications, and in particular for implementing an MLOps approach for a CNN model dedicated to facial recognition.
MLOps stands for machine learning operations, the application of DevOps practices to machine learning models. DevOps was first introduced as a working philosophy to resolve communication issues between developers and operations teams and to increase collaboration between these two teams in order to deploy applications effectively; MLOps extends this collaboration to the data science team.
DevOps is designed to optimize the software development lifecycle by integrating continuous integration, delivery, and monitoring [23]. Nevertheless, traditional DevOps pipelines are unsuited to data-driven ML applications and face challenges such as data versioning and drift; MLOps addresses these by handling end-to-end automation and management processes such as data preprocessing, model training, hyperparameter tuning, validation, deployment, monitoring, and retraining [23].
The objective of MLOps is to build a complete set of procedures to efficiently and quickly create machine learning models using DevOps tools, workflows, and processes [24]. It strives to automate software delivery to ensure continuous delivery; for this reason, integrating machine learning into DevOps CI/CD pipelines involves extra steps due to variations in ML procedures [24]. MLOps is vital for operationalizing machine learning solutions; it focuses on the automation and execution of pipelines, relying on platforms such as MLflow, TFX, and Kubeflow to support the operational ML lifecycle, including automation, orchestration, and continuous integration and delivery [25].
Our proposed MLOps pipeline aims to improve image classification and recognition in the context of a door security system, particularly for data centers. It is designed to facilitate the testing, integration, and deployment of machine learning models (CNN models) on Xilinx’s ZYNQ-7000 FPGA board.
The pipeline automates the entire lifecycle of implementing the CNN model on the ZYNQ-7000 board. The main steps in the pipeline include data versioning, model training, testing, building, deployment, and real-time monitoring.
The door security system is monitored and controlled via a mobile app, allowing access to be managed and individual activity to be tracked. When access rights (authorization or denial) are granted to a new person, or modified for an existing user via the app, the pipeline is automatically triggered. It records the images and associated rights in a cloud storage database, and then the model begins the analysis and classification process.
Figure 7 provides an overview of our MLOps pipeline deployed on the ZYNQ-7000 FPGA board.
The MLOps pipeline for the FPGA board consists of five main steps:
Step 1: Data Versioning and Quality: This step aims to ensure the traceability of datasets, as well as the verification of image and class quality.
Step 2: Training, Testing, Building, and Deploying the ML Model: After the dataset has been validated, the model is trained with the new data, tested, built, and then prepared for deployment on the ZYNQ FPGA board.
Step 3: Experiment Tracking with MLflow: This step makes it possible to track the history of trained models, compare hyperparameters, and evaluate performance across different metrics.
Step 4: Security and Validation: This step involves analyzing Python dependencies to detect potential vulnerabilities and running functional tests after deployment to ensure system reliability.
Step 5: Real-Time System Monitoring: This final step allows real-time monitoring of the status of the model and system on the FPGA board to ensure optimal operation and proactive maintenance.
MLflow is an open-source tool designed by Databricks for managing and tracking the lifecycle of ML models, handling experiments, models, and data versioning [26,27]. It offers comprehensive support for traditional machine learning and deep learning workflows, from model versioning and experiment tracking to deployment and monitoring, and optimizes every stage of the ML lifecycle [14,26]. It supports many machine learning libraries and programming languages, and it enables the deployment of models as web services on multiple inference platforms, such as cloud providers or server infrastructures [28].
DVC (Data Version Control) is an open-source tool developed specifically for data versioning; it provides a clear and efficient approach to managing data versions through integration with Git, and stands out for its capacity to maintain flexibility and data portability [27].
DVC offers several advantages, such as tracking experiments to monitor data and modules over time, allowing collaboration by sharing data between team members and ensuring everyone works with the same data version, and supporting reproducibility by capturing the exact data and model versions used in experiments, making it easier to reproduce them later [29].
CI/CD is an abbreviation for Continuous Integration and Continuous Delivery (or Deployment); it is a DevOps practice that enables continuous building, testing, deployment, and monitoring of code changes [30,31]. A CI/CD pipeline automates the workflow that takes software from source code through the build, test, and release stages [32].
GitLab is the first tool and web application capable of handling the entire DevOps lifecycle. Created in 2011 by Valery Sizov and Dmitriy Zaporozhets, it is based on Git for code versioning, provides agile project management features, and enables automation through CI/CD via GitLab Runner [33]. The principal components of GitLab CI/CD are GitLab Runners, which execute the CI/CD jobs defined in the .gitlab-ci.yml file; pipelines, which automate stages such as build, test, and deploy; environments, which represent the destinations where code is deployed, such as staging, production, or testing; and artifacts, which are temporary or persistent files generated during the pipeline and can be used in subsequent pipeline stages [34].
Grafana is an open-source tool for visualizing data, metrics, logs, and alerts extracted from data sources such as CloudWatch, Elasticsearch, Loki, and Prometheus, using queries to create panels and dashboards that present the data through a web interface [35,36].
Prometheus is a monitoring and alerting tool; it collects data via HTTP requests from endpoints that expose metrics, stores the data in time-series format, and evaluates it using the PromQL query language [35].
The deployment of the MLOps architecture on the ZYNQ FPGA board is performed in two separate environments: the first is a test environment, and the second is the production environment, where the MLOps pipeline is implemented directly on the FPGA board.
The configuration of the test environment (QEMU) and the physical board is completed after generating a Hardware Description File using Vivado [37], as shown in Figure 8.
The PetaLinux operating system enables the development of embedded Linux systems from scratch; it helps developers launch projects through a simple scripting language, configure the device tree based on bitstream files imported from Vivado and hardware description files, build a minimal root filesystem, and finally generate the kernel image and the bootloader, ensuring that the system can boot and run without interruption [38].
Implementing a project with PetaLinux requires an .xsa file exported from Vivado to configure the target platform with the bitstream and board information; this sets up the required features and kernel flags for execution on the board [39].
QEMU is an open-source virtualization and simulation tool used for system emulation; it provides a virtual environment of an entire machine (emulated devices, memory, and CPU) to run a target operating system [40].
Xilinx QEMU offers connectivity and supports mixed-simulation environments with the remote-port framework [41]. Xilinx provides a SystemC/TLM interface to connect QEMU, which models the embedded processing system (PS) of any ZYNQ-based SoC architecture, to a template of custom IP blocks integrated into the programmable logic (PL) [41].
In our project, the guest OS is PetaLinux, and QEMU is installed on a CentOS 8 Stream virtual machine.
The dataset used in our project is an open dataset obtained from Kaggle [42]. It contains celebrity images divided into five classes; each class contains the face images of a specific actor. This dataset aligns well with our CNN model and the objective of the proposed system, which aims to implement a face-based access-control mechanism. In this scenario, the system processes facial images captured by the camera, and splitting the dataset into several classes enables the model to perform effectively and correctly recognize individuals, allowing reliable identification of authorized and unauthorized people.
The dataset consists of more than 400 images, which were divided into three subsets:
Training set (70%): for learning and adjusting the model parameters.
Validation set (15%): for improving the hyperparameters and preventing overfitting during training.
Test set (15%): for evaluating and assessing the final performance of the model.
The dataset is relatively limited in size, but it reflects a realistic deployment scenario for embedded access control systems, where the number of users is typically limited. The goal of this work is not to build a large-scale facial recognition system but instead to demonstrate the feasibility of deploying an automated MLOps pipeline on an embedded FPGA platform. However, the dataset has a few limitations, including the limited number of classes and images, which may reduce the model’s ability to generalize in large-scale environments. Future work will therefore focus on larger and more diverse datasets to further validate the scalability of the proposed approach.
To compensate for the limited size of the dataset, real-time data augmentation was applied using the ImageDataGenerator function. Several transformations, including rotation (40°), zoom (0.2), shear (0.2), horizontal flipping, and width/height shifts (0.2), were used to artificially increase the diversity of the training images and improve the generalization ability of the CNN model. Each original image generates seven new images, resulting in eight variations of the same image (1 original + 7 augmented), as illustrated in Figure 9. This approach significantly increases the diversity of the training data and allows the model to learn realistic variations in faces, such as changes in orientation, position, or framing. In our case, the dataset contains 400 images; approximately 70% of these are dedicated to training, corresponding to 280 images. After data augmentation, each image yields eight variations, giving a total of 2240 images used for training (8 × 280).
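The augmentation arithmetic, together with two of the listed transformations, can be sketched with NumPy. This is an illustrative stand-in for Keras's ImageDataGenerator, not its actual implementation; the zero-fill behavior of the shift is an assumption.

```python
import numpy as np

def horizontal_flip(img):
    # mirror an (H, W, C) image along its width axis
    return img[:, ::-1, :]

def width_shift(img, fraction=0.2):
    # shift an (H, W, C) image right by a fraction of its width,
    # padding the vacated columns with zeros (fill behavior is an assumption)
    shift = int(round(img.shape[1] * fraction))
    out = np.zeros_like(img)
    out[:, shift:, :] = img[:, :img.shape[1] - shift, :]
    return out

# dataset arithmetic from the text: 400 images, 70% for training,
# 1 original + 7 augmented variants per training image
train_images = int(400 * 0.70)        # 280 training images
augmented_total = train_images * 8    # 2240 images after augmentation
```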
The aim of our article is to automate the deployment of a convolutional neural network (CNN) model for facial recognition on a ZYNQ FPGA board, using an MLOps pipeline. In this context, we decided to deploy the Faster R-CNN model on the FPGA board.
The choice of the Faster R-CNN model is based on its ability to simultaneously perform face localization and classification in an image thanks to the integration of a region proposal network (RPN), which automatically generates regions of interest and improves detection accuracy.
The Faster R-CNN model was trained with 128, 128, and 384 filters in its three convolutional layers, with kernel sizes of 5, 5, and 3, respectively. It also incorporates two dropout layers with rates of 0.3 and 0.5, a dense layer of 256 units, and a learning rate of 0.01. Table 3 presents the best hyperparameters obtained for the Faster R-CNN model.
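Given these hyperparameters, the parameter counts of the three convolutional layers can be checked by hand. The RGB input and direct layer-to-layer stacking below are assumptions, since the input shape is not stated in the text.

```python
def conv_params(kernel, in_channels, filters):
    # each filter holds kernel * kernel * in_channels weights plus one bias
    return (kernel * kernel * in_channels + 1) * filters

layer1 = conv_params(5, 3, 128)     # assuming a 3-channel RGB input
layer2 = conv_params(5, 128, 128)   # assuming layer 1 feeds layer 2 directly
layer3 = conv_params(3, 128, 384)   # assuming layer 2 feeds layer 3 directly
```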
The implementation of MLOps architecture for image classification in the test environment is primarily based on QEMU technology and the PetaLinux operating system, which enable the creation of a virtual environment similar to the FPGA board. This process comprises several levels, as demonstrated in Figure 10.
In GitLab, we created a dedicated “MLOps_FPGA_ZYNQ” project containing all the scripts executed within the CI/CD pipeline. The “.gitlab-ci.yml” pipeline consists of six main stages: data versioning, model training and MLflow, model testing, model building, model deployment, and security scanning. Each stage of the pipeline uses a specific Python script, responsible for executing the tasks corresponding to its objective:
data_quality.py: this script is dedicated to checking the quality of the dataset. It detects corrupted images, abnormal dimensions, imbalances between classes, and missing values. It mainly uses the os and Image modules from the Pillow library.
train.py: This script is responsible for training the CNN model for image classification and experimental tracking using MLflow. It uses the TensorFlow, Keras, mlflow, and mlflow.keras libraries for training and tracking model parameters.
test_model.py: This script is used to validate the pre-trained model. It generates a random image of the appropriate size (np.random.rand(…)) and uses the loaded model to make a prediction on this fake image, thus verifying the consistency and stability of the model.
convert_to_tflite.py: This script prepares the model for deployment by converting it from .h5 format to TensorFlow Lite (TFLite) format, which is suitable for execution on QEMU and the ZYNQ-7000 FPGA board.
run_model.py: This script is responsible for deploying and running the optimized TFLite model in the target environment.
test_fpga.py: This script validates the operation of the model deployed on the target environment, QEMU for testing and ZYNQ FPGA for production, by verifying that the model generates the expected predictions for a given input set.
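Taken together, these scripts map naturally onto a .gitlab-ci.yml of the following shape; the stage names and job bodies here are a sketch reconstructed from the descriptions above, not the authors' exact file:

```yaml
stages:
  - data_versioning
  - train
  - test
  - build
  - deploy
  - security

data_versioning:
  stage: data_versioning
  script:
    - dvc pull                      # fetch the versioned dataset from remote storage
    - python3 data_quality.py       # reject corrupted images and class imbalances

train:
  stage: train
  script:
    - python3 train.py              # trains the CNN and logs runs to MLflow

test:
  stage: test
  script:
    - python3 test_model.py         # smoke-tests the trained model on a random input

build:
  stage: build
  script:
    - python3 convert_to_tflite.py  # converts the .h5 model to TFLite

deploy:
  stage: deploy
  script:
    - python3 run_model.py          # runs the TFLite model on the target
    - python3 test_fpga.py          # validates predictions on the target

security:
  stage: security
  script:
    - pip-audit                     # scans Python dependencies for vulnerabilities
```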
This pipeline is deployed on an EC2 instance from the AWS cloud provider, running the CentOS 9 operating system. In this virtual machine, we installed DVC to ensure data management and traceability. We then prepared a deployment environment similar to that of the ZYNQ FPGA board by installing PetaLinux version 2020.2 and QEMU. In the VM, we installed and configured GitLab Runner (version 18.3.1) to execute CI/CD pipeline jobs. The environment also includes Python 3 for script execution, as well as pip-audit for security analysis and vulnerability detection in Python dependencies. We installed the MLflow package (version 3.1.4) to automate the tracking and traceability of training sessions, including hyperparameters, metrics, models, and artifacts. The MLflow server was initialized locally, providing a web interface accessible via port 5000 that allows us to compare and view the different experiments.
Regarding system and model monitoring, Prometheus and Grafana have been installed and configured to ensure the collection and visualization of metrics. These include both PetaLinux system metrics (CPU, memory, etc.) and machine learning model metrics (accuracy, predictions, model version, etc.). Grafana is configured to listen on port 3000, while Prometheus collects data on different ports depending on the source: 9090 for the VM, 9091 for PetaLinux, and 8001 for metrics related to the ML model. Machine learning metrics are collected by a Python script running as a background service within the VM.
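As a minimal illustration of what such a background exporter might emit, the Prometheus text exposition format can be produced with plain Python; the metric names below are assumptions, not the script's actual metrics.

```python
def format_prometheus_metrics(metrics):
    # render name -> value pairs in the Prometheus text exposition format,
    # one gauge family per metric; this is what Prometheus scrapes over HTTP
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"

# hypothetical ML metrics a background service could expose on port 8001
payload = format_prometheus_metrics({"model_accuracy": 0.96, "model_version": 3})
```

Serving such a payload from a tiny HTTP endpoint (for example, with the standard library's http.server) is enough for Prometheus to scrape it on the configured port.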
The S3 bucket service from the AWS cloud provider is used as remote storage for project data. The S3 bucket configuration is based on authentication parameters (access_key_id, secret_access_key, endpoint_url) and allows versioned data to be pushed via DVC for effective data lifecycle management. The security and confidentiality of biometric data are ensured by encrypting all data stored in S3 using the AES-256 algorithm and controlling access to it through strict IAM policies. Data transmission between the mobile management application and the cloud uses the TLS protocol. The system is based on a security model that considers potential attacks on the mobile application and cloud storage and incorporates anti-spoofing measures into the facial recognition pipeline.
The objective of implementation in a test environment is to ensure that the system, scripts, and proposed MLOps workflow are functioning properly, while facilitating the detection of anomalies, bugs, and potential bottlenecks prior to final deployment on the FPGA board.
The architecture of the MLOps pipeline deployed on the ZYNQ-7000 FPGA board is similar to that deployed in the QEMU virtual environment. Table 4 presents the features of the ZYNQ-7020 FPGA board used in the implementation.
The CI/CD pipeline is completely preserved; the only difference is that the virtual machine is replaced by the physical FPGA card, as shown in Figure 11.
The necessary tools are installed and configured directly on the FPGA board, thanks to the PetaLinux operating system, which allows the board to be used as an embedded computer while providing the interface between the hardware and software. Under this system, we launched Python 3, MLflow, GitLab Runner, DVC, Prometheus, and Grafana, creating a complete software stack for running, tracking, and monitoring the MLOps pipeline.
PetaLinux supports Python 3 via Yocto packages (python3, python3-pip), enabling the installation of libraries such as TensorFlow, Keras, NumPy, and other dependencies required to run lightweight models on the ARM Cortex-A9 processor of the ZYNQ-7000 board.
However, MLflow is a relatively heavy application, relying on several dependencies such as SQLAlchemy, Flask, and Gunicorn. Considering that the ZYNQ board typically provides only 512 MB to 1 GB of RAM to the Cortex-A9 cores, this can lead to memory overload, slowdowns, and even crashes.
In order to optimize performance, we installed a minimal version of MLflow using the command: pip install mlflow==2.2.2 --no-deps.
Furthermore, in view of the limitations of embedded storage on the board, artifact storage has been outsourced to an S3 instance (bucket) hosted in the cloud.
GitLab Runner is installed using the gitlab-runner-linux-arm binary, which is compatible with the ARM architecture of the ZYNQ board. The runner is configured in Shell mode, which has proven to be the most suitable for our embedded execution context.
For monitoring purposes, we installed Prometheus Node Exporter to collect system metrics from the board. In addition, Python scripts were developed to extract specific metrics related to the PetaLinux operating system and the machine learning model deployed on the board.
In view of the high memory consumption of the Prometheus web interface, it has not been activated. Instead, we have configured Grafana to display the collected metrics directly through dynamic dashboards (panels) optimized for real-time monitoring.
In this implementation, the ZYNQ-7000 board utilizes the Processing System (ARM Cortex-A9) to execute the complete MLOps software stack, including Python 3.9, MLflow 3.1.4, GitLab Runner 17, DVC 3, Prometheus 2, and Grafana 11. CNN model inference, camera communication management, and S3 storage are also handled by the PS. Programmable logic (PL, FPGA) is used for real-time control of the servo motor and LEDs via PYNQ, which provides a simplified Python interface for controlling FPGA hardware peripherals. Thus, PYNQ replaces direct programming of AXI GPIO IP cores while retaining hardware reactivity, while the PS centralizes software execution and data processing.
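For the door actuator, the PL-side PWM control reduces to mapping a servo angle to a duty cycle. The sketch below assumes standard 50 Hz hobby-servo timing (0.5–2.5 ms pulses), a common convention rather than a measured parameter of the hardware used here; the open/closed angles are likewise illustrative.

```python
def servo_duty_percent(angle_deg, pulse_min_ms=0.5, pulse_max_ms=2.5, period_ms=20.0):
    # map an angle in [0, 180] degrees to a PWM duty-cycle percentage
    if not 0.0 <= angle_deg <= 180.0:
        raise ValueError("angle must be within 0-180 degrees")
    pulse = pulse_min_ms + (angle_deg / 180.0) * (pulse_max_ms - pulse_min_ms)
    return 100.0 * pulse / period_ms

door_closed = servo_duty_percent(0)   # illustrative "closed" position
door_open = servo_duty_percent(90)    # illustrative "open" position
```

Under PYNQ, a duty cycle computed this way would be written to the PWM peripheral exposed by the overlay rather than bit-banged from Python.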
In order to evaluate the model’s performance, several standard classification metrics were used, including accuracy, precision, F1-score, recall, false positive rate (FPR), and false negative rate (FNR). These metrics are calculated from the confusion matrix, which consists of the following elements:
True Positive (TP): The model correctly predicts a positive class.
True Negative (TN): The model correctly predicts a negative class.
False Positive (FP): The model incorrectly predicts a positive class.
False Negative (FN): The model incorrectly predicts a negative class.
Accuracy: Proportion of correct predictions among all observations.
$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (2)
Precision: Measures the proportion of correct positive predictions.
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$ (3)
Recall: Measures the model’s ability to correctly detect positive examples.
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$ (4)
False positive rate (FPR):
$\mathrm{FPR} = \dfrac{FP}{FP + TN}$ (5)
False negative rate (FNR):
$\mathrm{FNR} = \dfrac{FN}{FN + TP}$ (6)
F1-score: represents the harmonic mean of precision and recall.
$F1 = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ (7)
The TP (true positive), FP (false positive), TN (true negative), and FN (false negative) metrics are calculated using the one-vs-rest (OvR) approach. In this approach, each class is considered successively as the positive class, while the other classes are grouped together as negative classes.
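The OvR counting described above can be sketched directly from a confusion matrix (rows = true classes, columns = predicted classes). The two-class matrix in the comments is purely illustrative, not the matrix of Figure 13:

```python
# One-vs-rest (OvR) metrics from a confusion matrix: each class in
# turn is treated as positive, all remaining classes as negative.


def ovr_counts(cm, k):
    """TP, FP, FN, TN for class k from confusion matrix cm (rows = true)."""
    n = len(cm)
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(n) if i != k)   # predicted k, wrongly
    fn = sum(cm[k][j] for j in range(n) if j != k)   # true k, missed
    tn = sum(cm[i][j] for i in range(n) for j in range(n)
             if i != k and j != k)
    return tp, fp, fn, tn


def ovr_metrics(cm, k):
    """Per-class precision, recall, F1, FPR, FNR as in Equations (3)-(7)."""
    tp, fp, fn, tn = ovr_counts(cm, k)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0
    fnr = fn / (fn + tp) if fn + tp else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "fpr": fpr, "fnr": fnr}


# Illustrative 2-class example (not the paper's data):
# ovr_metrics([[5, 1], [2, 3]], k=0) gives precision 5/7 and recall 5/6.
```

Averaging these per-class values over all classes yields the macro-averaged scores commonly reported alongside overall accuracy.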
The accuracy obtained on the test set after training is 96%, showing that the model correctly classifies 96% of the images and effectively learns the discriminating features of the faces in the dataset.
Figure 12 illustrates the evolution of accuracy during the training and validation phases across epochs. Training accuracy increases gradually, indicating that the model progressively learns to recognize the characteristics of the images. Validation accuracy follows a similar trend, showing that the model generalizes correctly to unseen data.
Figure 12 also shows the evolution of the loss function during the training and validation phases. We observe a rapid decrease in both training and validation loss during the first epochs, reflecting that the model quickly learns from the data. Validation loss follows a similar downward trend and remains close to the training loss, indicating that the model is effectively generalizing on unseen data.
In order to analyze the model’s performance in more detail, precision, recall, and F1-score metrics were calculated for each class (each person in the dataset). Table 5 shows the results obtained for each class.
Some classes perform better than others; this is linked to similarities between certain faces, which can make the classification task more complex.
In order to better understand the classification errors of the model, a confusion matrix is presented in Figure 13. This matrix allows us to visualize the number of correct and incorrect predictions for each class.
The values on the main diagonal represent correct predictions, while the off-diagonal values correspond to the few classification errors between different classes, reflecting the strong recognition capacity of the model. The matrix demonstrates the effectiveness of the proposed model in distinguishing between the five classes. Most classes, such as Evans, Hemsworth, and Ruffalo, are perfectly recognized with no errors, while Downey and Johansson each have a single misclassified sample. Specifically, one sample of Downey was predicted as Johansson, and one sample of Johansson was predicted as Downey. These rare errors can be attributed to similarities in facial features or variations in poses and lighting conditions within the dataset.
The confusion matrix also allows us to analyze different types of classification errors and identify cases that can be associated with false acceptance rates (FAR) and false rejection rates (FRR), which are important metrics in biometric recognition systems. The FAR corresponds to situations in which the system incorrectly accepts an incorrect identity, i.e., when a sample belonging to one class is classified as belonging to another class. Conversely, the FRR corresponds to cases where the system incorrectly rejects a correct identity, when an individual’s sample is not recognized as belonging to its actual class. In this context, values outside the main diagonal of the confusion matrix represent classification errors that can be interpreted as potential cases of false acceptance or false rejection.
In our study, the number of errors detected is relatively low compared to the total number of predictions, meaning that the model can effectively separate different classes. Although some minor confusion exists between certain visually similar classes, the confusion matrix shows a low overall error rate, suggesting relatively low FAR and FRR values for the proposed system.
Our project proposes a complete MLOps pipeline architecture implemented both in the QEMU virtual environment and on the Zynq-7000 FPGA board. Firstly, the QEMU environment was essential for initial validation, allowing quick performance testing without requiring access to the physical FPGA board. Unfortunately, inference latency in QEMU did not fully reflect real hardware performance because no hardware acceleration was available. However, the deployment on the physical Zynq-7000 board provided lower latency, stable streaming via the USB camera, and higher performance reliability.
In the traditional approach, model deployment relies on a highly manual process involving several successive steps, including model retraining, model conversion, and redeployment on the hardware platform. As a result, the time between a data change and the deployment of a new model version can be relatively long, taking several hours or even days, depending on the complexity of the workflow.
In the proposed approach, the workflow is fully automated. By automating the training and deployment phases, the proposed MLOps pipeline architecture significantly reduces model update latency. In particular, the time required to retrain and redeploy the model after a dataset modification is reduced to approximately one hour, compared to several hours in a traditional workflow, as presented in Table 6. This improvement is mainly due to the reduction in human intervention and the automation of the various stages of the pipeline, which allows tasks to be performed more efficiently and continuously.
The CI/CD pipeline covers the entire model lifecycle, including data cleaning, training, artifact construction, testing, deployment, and monitoring. Figure 14 illustrates the successful execution of the CI/CD pipeline on GitLab, confirming the automated deployment of the workflow in both the test and production environments.
Real-time monitoring of the overall system using Grafana in Figure 15 indicates that the latest version of the deployed model achieves 96% accuracy, despite high data drift of 70%. We compute the data drift score by comparing the distribution of incoming input data with the training dataset. The score is normalized between 0 (no drift) and 1 (maximum drift). A threshold of 0.7 is used to trigger alerts, prompting data inspection and potential model retraining.
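The exact drift formula is not specified beyond comparing the incoming and training distributions with a score normalized to [0, 1]. One common choice consistent with that description is the total variation distance between normalized histograms of a monitored feature; the feature and bin edges below are illustrative assumptions, not the deployed computation:

```python
# Data drift score sketch: total variation distance between the
# normalized histograms of a reference (training) feature and the
# same feature on incoming data. The monitored feature (e.g. per-image
# brightness) and the bin edges are illustrative assumptions.


def histogram(values, edges):
    """Normalized histogram of `values` over bins defined by `edges`."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]


def drift_score(reference, incoming, edges):
    """Total variation distance in [0, 1]: 0 = identical, 1 = disjoint."""
    p = histogram(reference, edges)
    q = histogram(incoming, edges)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


DRIFT_ALERT_THRESHOLD = 0.7  # alert / retraining trigger, as in the text
```

Exporting this score through the metrics endpoint lets Prometheus evaluate the 0.7 alert threshold continuously rather than at retraining time only.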
This confirms that Prometheus is correctly collecting metrics and effectively detecting changes in data distribution over time. In addition, the activation of the red LEDs indicates that the new data comes from individuals without access rights, which reinforces the importance of the integrated detection system. This type of view makes it straightforward to manage both the overall system and the embedded model.
Figure 16 presents model tracking through the MLflow interface, which offers a clear overview of all experiments performed, their count, and all details associated with each run: dataset used, training duration, metrics obtained, and version of the model generated. This centralized view facilitates the comparison of runs, performance analysis, and the selection of the best version of the model to deploy.
The integration of an MLOps pipeline greatly reduced deployment time and ensured version traceability in both environments. Thanks to MLflow, each iteration of the model was tracked, allowing training metrics to be correlated with embedded performance. The results demonstrate that combining MLOps practices with FPGA deployment increases reliability, reproducibility, and maintainability—three aspects that are essential for AI systems at the edge.
Although FPGA platforms are known for low latency and energy efficiency, a direct comparison with CPUs, GPUs, or Jetson platforms is beyond the scope of this work. The main objective is to demonstrate the feasibility of deploying a complete MLOps pipeline on a resource-constrained FPGA system. The results obtained on the Zynq-7000 board confirm stable and near real-time performance. A detailed comparative evaluation will be considered in future work.
This paper demonstrated that the implementation of MLOps on a Zynq FPGA board offers several advantages for door security systems. On one hand, the adaptable architecture of the FPGA enables highly optimized parallel processing, providing low latency and improved energy efficiency compared to GPUs, making it an ideal solution for real-time embedded applications. On the other hand, MLOps integration automates the entire lifecycle of machine learning models, including training, deployment, and monitoring. This combination not only promotes continuous performance improvement but also enhances observability and fast adaptation to new challenges or conditions of use. The combined MLOps–FPGA Zynq approach enables the design of faster, more reliable, scalable, and energy-efficient access control solutions.
In this work, we used open source and lightweight tools that are widely adopted in the technology field and are well suited to an embedded MLOps architecture, along with one cloud instance for data storage and another for the test environment. The PetaLinux and QEMU packages required to create a test environment are particularly large, requiring more than 50 GB of storage. This requirement often blocks virtual machines hosted on local virtualization tools such as Hyper-V and saturates the physical resources of the host machine. To overcome these limitations, we turned to the cloud, which offers increased flexibility by dynamically adjusting storage and computing resources according to deployment needs.
Future work could extend this MLOps architecture to other electronic platforms in order to optimize the lifecycle of models deployed on these embedded systems. It would also be interesting to explore the integration of additional DevOps tools, such as Terraform or Ansible, to automate the management of the cloud infrastructure dedicated to data storage and the deployment of associated services.
Conceptualization, B.K. and M.M.; methodology, B.K., M.M. and R.E.G.; software, B.K. and M.M.; validation, M.M. and R.E.G.; formal analysis, M.M. and R.E.G.; investigation, B.K., M.M. and R.E.G.; resources, B.K. and M.M.; data curation, B.K.; writing—original draft preparation, B.K.; writing—review and editing, B.K. and M.M.; visualization, M.M.; supervision, M.M. and R.E.G.; project administration, R.E.G. All authors have read and agreed to the published version of the manuscript.
This research received no external funding.
The authors declare no conflict of interest.
The following abbreviations are used in this manuscript:
| MLOps | Machine Learning Operations |
| ML | Machine Learning |
| CNN | Convolutional Neural Network |
| RFID | Radio Frequency Identification |
| DL | Deep Learning |
| FC Layer | Fully Connected Layer |
| FPGA | Field-Programmable Gate Array |
Figure 1. The MLOps implementation architecture for deploying a CNN on an FPGA board.

Figure 2. Architecture of convolutional neural network.

Figure 3. 2D direct convolution.

Figure 4. Example of max and average pooling operation.

Figure 5. Fully connected layer.

Figure 6. The ZYNQ 7000 architecture.

Figure 7. The proposed MLOps pipeline.

Figure 8. ZYNQ embedded system pipeline.

Figure 9. Example of data augmentation applied to an image from the dataset.

Figure 10. Integration of MLOps in a QEMU environment.

Figure 11. Integration of MLOps in a ZYNQ-7000 board.

Figure 12. Evolution of training and validation accuracy and loss.

Figure 13. Confusion matrix.

Figure 14. GitLab repository architecture supporting the MLOps chain and reliable CI/CD pipeline execution.

Figure 15. Real time deployment of the MLOps pipeline on the Zynq-7000 FPGA with system monitoring.

Figure 16. Overview of MLflow experiment runs for the CNN face classification pipeline.

Table 1. The PS features of ZYNQ 7000.
| Features | ZYNQ7000 |
|---|---|
| Devices | Z-7010, Z-7015, Z-7020, Z-7030, Z-7035, Z-7045, Z-7100 |
| Processor core | Dual-core Arm Cortex-A9 MPCore |
| Maximum frequency | Up to 866 MHz and 1 GHz |
| External memory support | DDR3, DDR3L, DDR2, LPDDR2 |
| Key peripherals | USB 2.0, Gigabit Ethernet, SD/SDIO |
| Dedicated peripheral pins | Up to 128 |
Table 2. The PL features of ZYNQ 7000.
| Features | ZYNQ7000 |
|---|---|
| Logic Cells (K) | 28 to 444 |
| Block RAM (Mb) | 2.1 to 26.5 |
| DSP Slices | 80 to 2020 |
| Maximum I/O Pins | 100 to 400 |
| Maximum Transceiver Count | 4 to 16 |
Table 3. The best hyperparameter values for the CNN model.
| Hyperparameter | Value |
|---|---|
| Filters_1 | 128 |
| Kernel_size_1 | 5 |
| Filters_2 | 128 |
| Kernel_size_2 | 5 |
| Filters_3 | 384 |
| Kernel_size_3 | 3 |
| Dropout_1 | 0.3 |
| Dense_units | 256 |
| Dropout_2 | 0.5 |
| Learning_rate | 0.01 |
| Training epochs | 100 |
Table 4. ZYNQ 7020 board specifications.
| Device | DSP Slices | Logic Cells | Look-Up Tables | Flip-Flops |
|---|---|---|---|---|
| ZYNQ 7020 | 220 | 85K | 53,200 | 106,400 |
Table 5. Model performance per class.
| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Chris Evans | 0.97 | 0.97 | 0.97 |
| Chris Hemsworth | 0.96 | 0.96 | 0.96 |
| Mark Ruffalo | 0.95 | 0.97 | 0.96 |
| Robert Downey Jr | 0.97 | 0.96 | 0.96 |
| Scarlett Johansson | 0.96 | 0.95 | 0.96 |
Table 6. Model update latency comparison between MLOps-based and traditional approaches.
| Approach | Model Update Process | Estimated Update Latency |
|---|---|---|
| Traditional workflow | Manual training and deployment | 7 h |
| Proposed MLOps pipeline | Automated training and deployment | 1 h |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.