Category Archives: HPC

Alveo U280 from XUP

We have just received two Xilinx Alveo U280 cards as a donation from the Xilinx University Program.

Many thanks to Xilinx for this generous donation!

These two cards, together with the Alveo U50 we bought earlier, will be the foundation for the FPGA-accelerated computing infrastructure we are designing within the Hardware Acceleration Lab.

Having two of these advanced accelerator cards will allow us to experiment with our Lattice QCD kernels in a multi-node environment, where networking and efficient kernel-to-kernel communication over the PCIe bus and QSFP+ will be key elements.

First Alveo up and running

A couple of weeks ago we acquired our first Xilinx Alveo U50.

It turns out that setting it up and running the first bitfiles is easy, especially using Alveo-PYNQ and Jupyter notebooks!

It’s only about 15 lines of code to access your accelerator card, configure it with a bitfile, allocate memory, transfer data and run the kernel.
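To give a feeling for what those lines look like, here is a minimal sketch of such a flow with PYNQ on an Alveo card. It is not our actual notebook: the bitfile name `vadd.xclbin`, the kernel instance name `vadd_1` and its argument order (two inputs, one output, element count) are assumptions made for illustration, and the function only works on a machine with XRT, PYNQ and a card installed.

```python
def run_vadd(xclbin_path="vadd.xclbin", n=1024):
    """Program the card, run a vector-add kernel and return the result.

    Assumptions: the bitfile contains a kernel instance called `vadd_1`
    that takes (in_a, in_b, out, element_count). Both names are hypothetical.
    """
    # Imported here so the function can be defined on machines without PYNQ.
    import numpy as np
    import pynq

    # Access the card and configure it with the bitfile.
    overlay = pynq.Overlay(xclbin_path)
    vadd = overlay.vadd_1  # hypothetical kernel instance name

    # Allocate buffers that are visible to both host and device.
    a = pynq.allocate((n,), dtype=np.uint32)
    b = pynq.allocate((n,), dtype=np.uint32)
    out = pynq.allocate((n,), dtype=np.uint32)
    a[:] = np.arange(n, dtype=np.uint32)
    b[:] = 1

    # Transfer the inputs, run the kernel, fetch the output.
    a.sync_to_device()
    b.sync_to_device()
    vadd.call(a, b, out, n)
    out.sync_from_device()
    result = np.array(out)

    # Release device memory and the card.
    for buf in (a, b, out):
        buf.freebuffer()
    overlay.free()
    return result
```

On a configured machine, `run_vadd("path/to/your.xclbin")` would return the element-wise sum computed on the card, provided the kernel matches the assumed interface.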

Below you can find all links required to configure and operate Alveo cards:

  • Xilinx Runtime and Deployment Target Platform: [link]
  • Alveo-PYNQ: [link]

Accelerating HPC

It took some time and a lot of research and development to take this very important step!

Trying to employ the latest technologies, we have implemented, compiled and successfully run an accelerated Conjugate Gradient solver on the Alveo U280 (shell 2019.2) using Vitis, both officially released just a couple of days ago.

In this design we are evaluating techniques such as:

  • Integrated HBM memory
  • Fully streamlined kernel

We have managed to fit 3 instances of the kernel into the device, consuming about 70% of available resources.

Each kernel instance runs with an Iteration Interval of 2 clock cycles at 300 MHz, which gives almost 600 GFLOPS for the entire solution!
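The arithmetic behind that figure can be sketched as follows. The number of kernels, the clock and the Iteration Interval come from the text above; the number of floating-point operations per kernel iteration is not stated in this post, so the script only derives the value implied by "almost 600 GFLOPS". The 1320 used in the last line is an assumption: it is the count often quoted for one Wilson-Dirac site update, not a number we report here.

```python
def sustained_gflops(n_kernels, clock_hz, iteration_interval, flops_per_iteration):
    """Throughput in GFLOPS when every kernel starts a new iteration every II cycles."""
    iterations_per_second = n_kernels * clock_hz / iteration_interval
    return iterations_per_second * flops_per_iteration / 1e9


def implied_flops_per_iteration(total_gflops, n_kernels, clock_hz, iteration_interval):
    """Invert the formula above: FLOPs each iteration must carry to reach total_gflops."""
    iterations_per_second = n_kernels * clock_hz / iteration_interval
    return total_gflops * 1e9 / iterations_per_second


# Figures from the post: 3 kernels, 300 MHz, II = 2, just under 600 GFLOPS.
print(implied_flops_per_iteration(600, 3, 300e6, 2))  # upper bound, roughly 1333

# Assumed value (not from the post): 1320 FLOPs per iteration.
print(sustained_gflops(3, 300e6, 2, 1320))
```

With II = 2 each kernel starts 150 million iterations per second, so three kernels together start 450 million; the quoted performance then requires a little over 1300 floating-point operations per iteration.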

New Paper – Investigating the Dirac operator evaluation on FPGAs

We are pleased to announce that our paper “Investigating the Dirac operator evaluation on FPGAs”, where we describe our research on running accelerated computations on hardware, has been published in Supercomputing Frontiers and Innovations vol. 6 no. 2 2019.

Feel free to check it out at this [link].

There you can find a description of our kernel's performance, evaluated on Xilinx Alveo U250 platforms and developed with the SDAccel software package.

Soon we will present a detailed study of various algorithm architectures, aimed at achieving the highest performance and profiting from the embedded HBM in the Alveo U280.

Supercomputing Frontiers Europe 2019

Our recent research results on implementing Conjugate Gradient as a benchmark for HPC solutions were presented at the Supercomputing Frontiers Europe 2019 conference in Warsaw, 11–13 March.

For more details about the conference click [here].

Click [here] to access the presentation.

The talk covers:

  • Implementation of the Conjugate Gradient computing kernel with Vivado HLS for the Xilinx Alveo U250 platform
  • System design and performance results with the external DDR memory
  • System design and performance results with the embedded memory block

Seminar – Using FPGA devices for Lattice QCD

You are welcome to join the seminar by Dr. Piotr Korcyl on Tuesday, 15th January 2019, at 12:15 in room D-2-02 at the Faculty of Physics, Astronomy and Applied Computer Science of the Jagiellonian University.

The talk will cover:

  • implementation of the Conjugate Gradient algorithm on a Xilinx Zynq MPSoC
  • design methodologies to accelerate computations on FPGA platforms using High Level Synthesis
  • memory management and data transport infrastructure
  • prospects for designing an FPGA High Performance Computing platform

Conjugate Gradient as benchmark for FPGAs in HPC

During the International Conference on Lattice Field Theory in East Lansing, MI, USA we presented a poster describing our hardware-based accelerator for the Dirac matrix inverter. For the first time, FPGA devices were shown to be useful in the HPC context discussed at this conference. Several groups expressed their interest in collaboration, including groups from Michigan State University, Massachusetts Institute of Technology, Brookhaven National Laboratory and China Normal University.

The presented results, together with an overview of further development, will soon be published. In the meantime you can check out the poster in the results section.

HW-based Conjugate Gradient at a Conference

Our innovative solution for accelerating the Conjugate Gradient algorithm in Lattice Quantum Chromodynamics has been accepted for a poster presentation at the 36th Annual International Symposium on Lattice Field Theory in East Lansing, USA.

We have developed an accelerator capable of performing double precision computations with peak performance at the level of 750 GFLOPS, entirely implemented in Programmable Logic. It is a unique project of this type and sets an entry point for the development of a distributed and scalable High-Performance Computing platform.

Lattice Quantum Chromodynamics – Accelerated

We have managed to evaluate the first implementations of the Conjugate Gradient algorithm, an iterative solver for systems of linear equations used in Monte Carlo simulations, on the Zynq MPSoC device.
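For readers who have not met the algorithm, here is a minimal textbook sketch of Conjugate Gradient in plain Python. It is not our FPGA kernel: the small 2x2 symmetric positive-definite matrix below is a stand-in chosen for illustration, whereas in Lattice QCD the operator comes from the Dirac matrix and is far larger. The function and parameter names are ours for this example only.

```python
def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def conjugate_gradient(apply_a, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite A.

    apply_a is a function returning A @ v, so the matrix is never stored.
    Returns the solution and the number of iterations performed.
    """
    x = [0.0] * len(b)
    r = list(b)   # residual b - A x for the initial guess x = 0
    p = list(r)   # first search direction
    rr = dot(r, r)
    iterations = 0
    while iterations < max_iter and rr ** 0.5 > tol:
        ap = apply_a(p)
        alpha = rr / dot(p, ap)                        # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr                             # keeps directions conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
        iterations += 1
    return x, iterations


def apply_example_matrix(v):
    """Stand-in operator: the 2x2 matrix [[4, 1], [1, 3]]."""
    return [4.0 * v[0] + 1.0 * v[1], 1.0 * v[0] + 3.0 * v[1]]


solution, n_iter = conjugate_gradient(apply_example_matrix, [1.0, 2.0])
print(solution, n_iter)
```

Almost all of the work in each iteration is the single call to `apply_a`, which is why the matrix-vector product is the natural part to move into programmable logic.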

Monte Carlo based algorithms are commonly used in theoretical physics. Such simulations are run on supercomputers; members of our team use those located at Forschungszentrum Jülich in Germany and ICM in Poland, both ranked on the TOP500 list (positions 22 and 223 respectively).

Those computing facilities mainly employ Intel Xeon CPUs. How about accelerating those computations with FPGAs? Or even replacing them with new platforms featuring FPGAs and MPSoCs? The enormous amount of resources in the currently available series, support from integrated ARM processors and the soon-to-be-available 3D ICs with integrated High Bandwidth Memory make this even more convincing.

Our first results are promising and show the great potential of this technology. First shot? About a 150x acceleration factor over a single ARM core.

Soon we will present a more detailed study.