Posts by Collection

publications

MLPerf Inference Benchmark

Published in arXiv, 2019

Demand for machine-learning (ML) hardware and software systems is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and four orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf implements a set of rules and practices to ensure comparability across systems with wildly differing architectures. In this paper, we present the method and design principles of the initial MLPerf Inference release. The first call for submissions garnered more than 600 inference-performance measurements from 14 organizations, representing over 30 systems that show a range of capabilities.


talks

Intel Spoken Language Technologies Summit (iSLTS) 2019 Keynote

Published: October 23, 2019

Workloads built on recurrent neural networks (RNNs), such as recommender systems, machine translation, speech synthesis and speech transcription, form a significant proportion of data-center deep-learning inference. Productionized versions of these models typically contain tens to hundreds of millions of parameters, but some have been scaled to billions of parameters given enough data. Increasing the size of a model also increases its compute and memory requirements. Reducing the computational cost of these models translates directly into cost and energy savings for service operators.

Xilinx Developer Forum (XDF) 2019 Europe

Published: November 13, 2019

Myrtle.ai will describe how its RNN accelerator on an Alveo U250 outperforms alternatives in throughput, power and latency. The talk will explain how this has been achieved by exploiting unstructured sparsity and quantisation, implemented on a scalable array of highly optimised MAU Accelerator™ cores. Results from a comparison using a DeepSpeech benchmark will demonstrate this FPGA advantage across a range of applications, including speech-to-text transcription and time-series analysis.

Sam Davis