AIgean, pronounced like the sea, is an open framework to build and deploy machine learning (ML) algorithms on a heterogeneous cluster of devices (CPUs and FPGAs). We leverage two open source projects: Galapagos, for multi-FPGA deployment, and hls4ml, for ...
INFerence-as-a-Service (INFaaS) has become a primary workload in the cloud. However, existing FPGA-based Deep Neural Network (DNN) accelerators are mainly optimized for the fastest speed of a single task, while the multi-tenancy of INFaaS has not been ...
Disaggregated computer architectures eliminate resource fragmentation in next-generation datacenters by enabling virtual machines to employ resources such as CPUs, memory, and accelerators that are physically located on different servers. While this paves ...
Machine learning in the physical layer of communication systems holds the potential to improve performance and simplify design methodology. Many algorithms have been proposed; however, the model complexity is often unfeasible for real-time deployment. The ...
Robotics applications process large amounts of data in real time and require compute platforms that provide high performance and energy efficiency. FPGAs are well suited for many of these applications, but there is a reluctance in the robotics community ...
Field-Programmable Gate Arrays (FPGAs) are widely used for custom hardware implementations, including in many security-sensitive industries, such as defense, communications, transportation, medical, and more. Compiling source hardware descriptions to FPGA ...
Multipliers are used in virtually all Digital Signal Processing (DSP) applications such as image and video processing. Multiplier efficiency has a direct impact on the overall performance of such applications, especially when real-time processing is ...
High-level synthesis (HLS) can be used to create hardware accelerators for compute-intense software parts such as loop structures. Usually, this process requires significant amount of user interaction to steer kernel selection and optimizations. This can ...
In today’s FPGA-based soft-processors, one of the slowest instructions is integer division. Compared to the low single-digit latency of other arithmetic operations, the fixed 32-cycle latency of radix-2 division is substantially longer. Given that today’s ...
In this article, we survey existing academic and commercial efforts to provide Field-Programmable Gate Array (FPGA) acceleration in datacenters and the cloud. The goal is a critical review of existing systems and a discussion of their evolution from ...