The Hailo-10H launch arrives as the edge AI accelerator market is projected to grow from $10.13 billion in 2025 to $113.71 billion by 2034, driven by demand for privacy-first, low-latency AI processing.
Yesterday, Israeli chipmaker Hailo Technologies announced the commercial availability of the Hailo-10H, which it describes as the first discrete AI accelerator purpose-built for generative AI workloads at the edge. By running large language models (LLMs), vision-language models (VLMs), and other multi-modal AI directly on-device, the Hailo-10H removes the need for cloud-based inference for these workloads while, according to the company, delivering high power efficiency and low latency.
“This is the first discrete AI processor to bring real generative AI performance to the edge,” said Orr Danon, CEO and co-founder of Hailo Technologies, speaking to All About Circuits. “We’ve combined high efficiency, cost-effectiveness, and a robust software ecosystem that developers can start using today.”
Generative AI Performance in a 2.5 W Power Envelope
The Hailo-10H is designed around Hailo’s second-generation neural core architecture, which provides 40 tera-operations per second (TOPS) of INT4 performance and 20 TOPS of INT8 at a typical power draw of 2.5 W.
According to Hailo, the chip generates the first token in under one second and sustains 10 tokens per second on 2-billion-parameter LLMs. It can also generate images with Stable Diffusion 2.1 in under five seconds, a notable step forward for offline generative workloads.
The Hailo-10H Edge AI accelerator
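To put the quoted figures in perspective, the short Python sketch below derives efficiency and latency numbers from them. It is a back-of-the-envelope calculation, not a benchmark: it assumes the chip draws its typical 2.5 W while generating tokens (the article does not say so explicitly), and it treats the "under one second" first-token figure as an upper bound of 1.0 s. The function names are illustrative only.

```python
# Figures quoted by Hailo in the article.
INT4_TOPS = 40.0
INT8_TOPS = 20.0
TYPICAL_POWER_W = 2.5
TOKENS_PER_SECOND = 10.0       # sustained rate on a 2-billion-parameter LLM
FIRST_TOKEN_LATENCY_S = 1.0    # upper bound ("under one second")


def tops_per_watt(tops: float, power_w: float) -> float:
    """Peak throughput divided by typical power draw."""
    return tops / power_w


def energy_per_token_joules(power_w: float, tokens_per_s: float) -> float:
    """Energy per generated token, ASSUMING power_w is drawn while decoding."""
    return power_w / tokens_per_s


def response_time_s(n_tokens: int, first_token_s: float, tokens_per_s: float) -> float:
    """Time to the last token: first-token latency, then the sustained rate."""
    if n_tokens <= 0:
        return 0.0
    return first_token_s + (n_tokens - 1) / tokens_per_s


if __name__ == "__main__":
    print(f"INT4: {tops_per_watt(INT4_TOPS, TYPICAL_POWER_W):.1f} TOPS/W")
    print(f"INT8: {tops_per_watt(INT8_TOPS, TYPICAL_POWER_W):.1f} TOPS/W")
    print(f"Energy: {energy_per_token_joules(TYPICAL_POWER_W, TOKENS_PER_SECOND):.2f} J/token")
    print(f"100-token reply: <= {response_time_s(100, FIRST_TOKEN_LATENCY_S, TOKENS_PER_SECOND):.1f} s")
```

Under these assumptions the quoted numbers work out to 16 TOPS/W at INT4, 8 TOPS/W at INT8, about 0.25 J per token, and just under 11 seconds for a 100-token reply.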
Unlike traditional edge accelerators focused primarily on vision tasks, the Hailo-10H’s architecture includes a direct DDR interface for scaling to larger models, which addresses one of the key bottlenecks in LLM and VLM inference at the edge. The chip is fully compatible with TensorFlow, PyTorch, ONNX, and Keras, and is supported by Hailo’s mature software stack, a platform already used by over 10,000 active developers each month.
Hailo AI software suite
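A rough estimate of weight storage shows why a direct DDR interface matters for models of this size. The sketch below computes the memory needed for the weights of a 2-billion-parameter model at several precisions. It is an approximation under stated assumptions: it counts weights only (no KV cache, activations, or runtime buffers), and the idea that such a model would be stored at INT4 or INT8 is inferred from the chip's quoted precisions, not confirmed by Hailo.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Counts weights only; KV cache and activations are NOT included.
    """
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 2e9  # the 2-billion-parameter LLM size cited in the article
    for bits in (4, 8, 16):
        gb = weight_footprint_gb(n_params, bits)
        print(f"{bits:>2}-bit weights: {gb:.1f} GB")
```

Even at 4 bits per weight, the weights alone come to about 1 GB, which is far more than accelerators typically hold in on-chip memory, so external DRAM is the natural place to keep them.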
By processing data locally, the Hailo-10H significantly reduces latency and bandwidth consumption, while also mitigating privacy risks. “Edge AI must deliver cloud-level intelligence without compromising real-time responsiveness or data privacy,” noted Danon.
“With the Hailo-10H, sensitive information stays on-device, and developers can build products that don’t depend on unreliable network connections.”
Applications in Retail, Automotive, and Embedded Systems
The Hailo-10H targets the retail, automotive, telecommunications, security, and personal computing markets. HP has already announced that it is adopting the chip in its HP AI Accelerator M.2 Card, designed for point-of-sale and hospitality systems. The card enables real-time fraud detection, customer personalization, and local AI-powered assistants without the recurring costs of cloud services.
Automotive applications are another core focus. The Hailo-10H is AEC-Q100 Grade 2 qualified and is scheduled to enter production in 2026 in cockpit displays, driver monitoring, and in-vehicle infotainment systems.
With its ability to handle multi-modal workloads that combine voice and vision, the chip enables natural-language interfaces and AI-driven assistance in environments where hands-free interaction is essential.
“We are seeing strong interest from carmakers who want to add generative AI to the user experience,” Danon explained.
“Voice-based interfaces, paired with vision processing, offer a much safer and more intuitive way for drivers to interact with vehicles.”
Hybrid AI Workflows for the Edge
Although generative AI is the Hailo-10H’s headline feature, the device is designed to work in hybrid AI pipelines that blend LLMs or VLMs with more traditional convolutional neural networks (CNNs). Generative models may handle contextual tasks like summarization or translation, while CNNs manage real-time, per-frame detection or event triggers. Because the heavier generative model runs only when it is needed, this hybrid approach conserves power and preserves real-time responsiveness for mission-critical applications like video analytics.
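The hybrid pattern described above can be sketched in a few lines of Python. This is a minimal illustration of the control flow only, not Hailo's implementation and not its API: `detect` stands in for a lightweight CNN that runs on every frame, `describe` stands in for a generative model that runs only when a detection crosses a threshold, and the `Detection` type, `threshold`, and `cooldown_frames` parameters are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Detection:
    label: str
    score: float


def run_hybrid_pipeline(
    frames: Iterable[object],
    detect: Callable[[object], List[Detection]],
    describe: Callable[[object, List[Detection]], str],
    threshold: float = 0.8,
    cooldown_frames: int = 30,
) -> List[str]:
    """Run `detect` on every frame; call `describe` only on triggered frames.

    After a trigger, the next `cooldown_frames` frames cannot trigger again,
    which keeps the expensive generative step from running back to back.
    """
    descriptions: List[str] = []
    frames_since_trigger = cooldown_frames  # allow an immediate first trigger
    for frame in frames:
        hits = [d for d in detect(frame) if d.score >= threshold]
        frames_since_trigger += 1
        if hits and frames_since_trigger > cooldown_frames:
            descriptions.append(describe(frame, hits))
            frames_since_trigger = 0
    return descriptions


if __name__ == "__main__":
    # Stand-in models: a "detector" that fires on every frame and a
    # "generative model" that just formats a string.
    def fake_detect(frame: object) -> List[Detection]:
        return [Detection("person", 0.9)]

    def fake_describe(frame: object, hits: List[Detection]) -> str:
        return f"frame {frame}: {hits[0].label}"

    print(run_hybrid_pipeline(range(6), fake_detect, fake_describe, cooldown_frames=2))
```

The cooldown is one simple way to bound how often the generative step runs. A real deployment might instead trigger on scene changes or on new object identities.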
Danon noted that customers have even begun replacing GPUs for certain workloads, citing the Hailo-10H’s performance-per-watt advantage and lower cost.
“For many edge use cases, GPUs are overkill,” Danon said. “With our device, developers can achieve both cost efficiency and the ability to run complex generative models where they’re needed.”
All images used courtesy of Hailo Technologies.