Hardware-Accelerated AI for Windows Apps Using ONNX RT

Sponsored Content

 

By Rajan Mistry, Sr. Applications Engineer with the Qualcomm Developer Network

Today, you can’t help but read the media headlines about AI and the growing sophistication of generative AI models like Stable Diffusion. A great example of a use case for generative AI on Windows is Microsoft 365 Copilot. This AI assistant can perform tasks such as analyzing your spreadsheets, generating content, and organizing your meetings.

And while such intelligence can feel like magic, its capabilities don’t happen magically. They’re built on a foundation of powerful ML models that have been evolving rapidly. The key enabler for these models is the rich set of ML frameworks that let developers experiment and collaborate.

One of these emerging ML frameworks is ONNX Runtime (ONNX RT). The open-source framework’s underlying ONNX format enables ML developers to exchange models, while ONNX RT can execute them from a variety of languages (e.g., Python, C++, and C#) and hardware platforms.
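As a rough illustration, here is a minimal Python sketch of loading and running an ONNX model with ONNX RT. The model file name and input shape are placeholders, not details from this article:

    # Minimal ONNX Runtime inference sketch (Python).
    # "model.onnx" and the 1x3x224x224 input shape are placeholders.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

    input_name = session.get_inputs()[0].name
    dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)

    outputs = session.run(None, {input_name: dummy_input})
    print(outputs[0].shape)

The same model file can be loaded from the C++, C#, or other language bindings without changes to the model itself.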

Our Qualcomm AI Stack now supports ONNX RT and enables hardware-accelerated AI in Windows on Snapdragon apps. In case you haven’t heard, Windows on Snapdragon is the next-generation Windows platform, built on years of evolution in mobile compute. Its key features include heterogeneous compute, up to all-day battery life, and the Qualcomm Hexagon NPU.

Let’s take a closer look at how you can use the Qualcomm AI Stack with ONNX RT for bare-metal, hardware-accelerated AI in your Windows on Snapdragon apps.

 

ONNX Runtime Support in the Qualcomm AI Stack

 
The Qualcomm AI Stack, shown in Figure 1 below, provides the tools and runtimes to take advantage of the NPU at the edge:

 

Figure 1 – The Qualcomm AI Stack provides hardware and software components for AI at the edge across all Snapdragon platforms.

 

At the highest level of the stack sit popular AI frameworks for generating models. These models can then be executed on various AI runtimes, including ONNX RT. ONNX RT includes an Execution Provider that uses the Qualcomm AI Engine Direct SDK for bare-metal inference on various Snapdragon cores, including the Hexagon NPU. Figure 2 shows a more detailed view of the Qualcomm AI Stack components:

 

Figure 2 – Overview of the Qualcomm AI Stack including its runtime framework support and backend libraries.

 

Application-level Integration

 
At the application level, developers compile their applications against an ONNX Runtime build that includes support for the Qualcomm AI Engine Direct SDK. ONNX RT’s Execution Provider then constructs a graph from the ONNX model and executes it on a supported backend library.
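As a sketch of what this looks like from Python, the snippet below creates an inference session that requests the QNN Execution Provider (the EP built on the Qualcomm AI Engine Direct SDK) and falls back to CPU if it is unavailable. The backend library name and the "backend_path" option key follow the ONNX Runtime QNN EP documentation; confirm them against the ONNX Runtime build you are using:

    # Sketch: run an ONNX model through the QNN Execution Provider on Windows on Snapdragon.
    # Provider option names (e.g., "backend_path") are taken from the ONNX Runtime QNN EP
    # documentation; verify them for your ONNX Runtime version.
    import onnxruntime as ort

    providers = [
        ("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"}),  # Hexagon NPU backend
        "CPUExecutionProvider",                                    # fallback
    ]

    session = ort.InferenceSession("model.onnx", providers=providers)
    print(session.get_providers())  # shows which providers were actually enabled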

Developers can use the ONNX Runtime APIs, which provide a consistent interface across all Execution Providers. The APIs are also available in several programming languages, including Python, C/C++, C#, Java, and Node.js.

We offer two options for generating context binaries. One is to use the Qualcomm AI Engine Direct toolchain directly. Alternatively, developers can generate the binary through the ONNX RT Execution Provider, which in turn uses the Qualcomm AI Engine Direct APIs. Context binary files help applications reduce network compile time: the binary is created the first time the app runs, and on subsequent runs the model is loaded from the cached context binary file.
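As an illustrative sketch of the second option, recent ONNX Runtime releases expose session configuration entries that ask the QNN EP to generate and reuse a cached context binary. The entry names below ("ep.context_enable", "ep.context_file_path") and the cache file name are assumptions drawn from the QNN EP documentation and may differ between ONNX Runtime versions:

    # Sketch: let the ONNX RT QNN EP create and reuse a cached context binary.
    # Config entry names are assumptions based on the QNN EP docs; verify for your version.
    import onnxruntime as ort

    so = ort.SessionOptions()
    so.add_session_config_entry("ep.context_enable", "1")                   # dump a context model on first run
    so.add_session_config_entry("ep.context_file_path", "model_ctx.onnx")   # where to cache it

    providers = [("QNNExecutionProvider", {"backend_path": "QnnHtp.dll"})]

    # The first run compiles the network and writes the context binary;
    # later runs can load "model_ctx.onnx" instead and skip compilation.
    session = ort.InferenceSession("model.onnx", sess_options=so, providers=providers)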

 

Getting Started

 
When you’re ready to get started, visit the Qualcomm AI Engine Direct SDK page where you can download the SDK and access the documentation.

Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.