The past few decades have seen unprecedented growth in the information processing capabilities of electronic systems such as desktops, laptops, and mobile phones. This emergence of advanced data processing systems has revolutionized several industries and has led to the availability of vast amounts of data. Recent advances in machine learning and big data explore ways of deriving useful conclusions from the available data, but at a significant cost in silicon. Hence, it has now become crucial to ask, “what is the best way to build information processing systems for the future?” This session invites researchers working on various aspects of this question, including but not limited to: advances in state-of-the-art digital and analog CMOS-based designs; advances in state-of-the-art computer architectures and compilers; ways of addressing challenges such as high device variability and leakage power; alternative computing paradigms such as bio-neuro-inspired computing or computing using beyond-CMOS devices; alternative storage paradigms such as in-memory computers; and novel memories such as RRAM or MRAM.
Energy-Efficient Edge Computing for AI-driven Applications
February 22, 9:00-9:50, CSL B02
Edge computing near the sensor is preferred over the cloud due to privacy or latency concerns for a wide range of applications including robotics/drones, self-driving cars, smart Internet of Things, and portable/wearable electronics. However, at the sensor there are often stringent constraints on energy consumption and cost in addition to throughput and accuracy requirements. In this talk, we will describe how joint algorithm and hardware design can be used to reduce energy consumption while delivering real-time and robust performance for applications including deep learning, computer vision, autonomous navigation and video/image processing. We will show how energy-efficient techniques that exploit correlation and sparsity to reduce compute, data movement and storage costs can be applied to various AI tasks including object detection, image classification, depth estimation, super-resolution, localization and mapping.
Vivienne Sze is an Associate Professor at MIT in the Electrical Engineering and Computer Science Department. Her research interests include energy-aware signal processing algorithms, and low-power circuit and system design for multimedia applications such as computer vision, autonomous navigation, machine learning and video compression. Prior to joining MIT, she was a Member of Technical Staff in the R&D Center at TI, where she developed algorithms and hardware for the latest video coding standard H.265/HEVC. She is a co-editor of the book entitled “High Efficiency Video Coding (HEVC): Algorithms and Architectures” (Springer, 2014).
Dr. Sze received the B.A.Sc. degree from the University of Toronto in 2004, and the S.M. and Ph.D. degrees from MIT in 2006 and 2010, respectively. In 2011, she was awarded the Jin-Au Kong Outstanding Doctoral Thesis Prize in electrical engineering at MIT for her thesis on “Parallel Algorithms and Architectures for Low Power Video Decoding”. She is a recipient of the 2017 Qualcomm Faculty Award, 2016 Google Faculty Research Award, 2016 AFOSR Young Investigator Award, 2016 3M Non-tenured Faculty Award, 2014 DARPA Young Faculty Award, and 2007 DAC/ISSCC Student Design Contest Award, and a co-recipient of the 2016 MICRO Top Picks Award and 2008 A-SSCC Outstanding Design Award.
More information about our research in the Energy-Efficient Multimedia Systems group can be found at: http://www.rle.mit.edu/eems/
Mochamad Asri, University of Texas at Austin
A Preliminary Study on Integration Tradeoffs of Hardware Accelerators for High-Performance Computing
February 22, 10:00-10:15, CSL B02
Integrating Application-Specific Integrated Circuits (ASICs) with host CPUs or GPUs at different levels of the memory hierarchy has emerged as a possible solution for power-efficient high-end computing. Typical high-performance computing (HPC) applications are composed of different kernels executing on CPUs, GPUs and accelerators, and large amounts of data are often exchanged between kernels and components. Optimizing the data movement and hardware/software coupling between host CPUs and accelerators can significantly impact acceleration benefits, but a detailed quantification of the non-obvious tradeoffs among different architecture and algorithm choices at the level of complete HPC systems and applications has been lacking.
Mochamad Asri received his bachelor's degree from Tokyo Institute of Technology. He is currently pursuing his Ph.D. degree in Electrical and Computer Engineering at The University of Texas at Austin. His research interests include High-Performance Architecture, MicroArchitecture, and HW-SW co-design.
Robust in-memory classifier via Stochastic Gradient Descent
February 22, 10:15-10:30, CSL B02
Embedded sensory systems continuously acquire and process data for inference and decision-making purposes under stringent energy constraints. Such ‘always ON’ systems need to track changing data statistics and environmental conditions such as temperature with minimal energy consumption. Digital inference architectures are not well suited to such energy-constrained sensory systems due to their high energy consumption, which is dominated (>75%) by the energy cost of memory read accesses and digital computations. In-memory architectures significantly reduce the energy cost by embedding pitch-matched analog computations in the periphery of the SRAM bit-cell array (BCA). However, their analog nature combined with stringent area constraints makes these architectures susceptible to process, voltage, and temperature (PVT) variations. This work demonstrates the use of on-chip SGD-based training to compensate for variations in both PVT and data statistics to design a robust in-memory classifier.
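The idea of using SGD-based training to compensate for hardware variations can be illustrated with a minimal software sketch. This is not the speaker's circuit or algorithm: the per-weight gain error and output offset below are a hypothetical stand-in for PVT variations, the synthetic two-class data and all function names (`make_data`, `analog_score`, `sgd_train`, `demo`) are assumptions made for illustration, and the classifier is a plain hinge-loss linear model.

```python
import random

def make_data(n, rng):
    """Synthetic 2-D, two-class data with labels +1 / -1 (illustrative only)."""
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        x = (rng.gauss(1.5 * y, 1.0), rng.gauss(-1.0 * y, 1.0))
        data.append((x, y))
    return data

def analog_score(w, b, x, gain, offset):
    """Dot product as a non-ideal analog block might compute it (assumed model):
    every product is scaled by an unknown gain and the sum is shifted by an offset."""
    return sum(g * wi * xi for g, wi, xi in zip(gain, w, x)) + b + offset

def accuracy(w, b, data, gain, offset):
    hits = sum(1 for x, y in data if y * analog_score(w, b, x, gain, offset) > 0)
    return hits / len(data)

def sgd_train(w, b, data, gain, offset, lr=0.05, epochs=20):
    """Hinge-loss SGD. The forward pass goes through the non-ideal model, but the
    update uses only x and y, so the trainer never needs to know gain or offset."""
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            if y * analog_score(w, b, x, gain, offset) < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b

def demo(seed=0):
    rng = random.Random(seed)
    train, test = make_data(400, rng), make_data(400, rng)
    ideal_gain, ideal_offset = (1.0, 1.0), 0.0
    chip_gain, chip_offset = (0.4, 1.6), 4.0   # hypothetical variation
    # Train assuming ideal hardware, then deploy on the varied hardware.
    w, b = sgd_train([0.0, 0.0], 0.0, train, ideal_gain, ideal_offset)
    acc_before = accuracy(w, b, test, chip_gain, chip_offset)
    # Continue training "on chip", with the variation inside the loop.
    w, b = sgd_train(w, b, train, chip_gain, chip_offset)
    acc_after = accuracy(w, b, test, chip_gain, chip_offset)
    return acc_before, acc_after

if __name__ == "__main__":
    before, after = demo()
    print(f"accuracy on varied hardware: before retraining {before:.2f}, after {after:.2f}")
```

Because the variation sits inside the training loop, the weights and bias absorb the gain and offset errors without those errors ever being measured explicitly, which is the intuition behind training on the deployed hardware rather than only offline.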
Sujan Gonugondla received his Bachelor's and Master's of Technology degrees in Electrical Engineering from the Indian Institute of Technology Madras in 2014. He is currently pursuing his doctoral degree in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign under the guidance of Prof. Naresh Shanbhag. His research interests are in the areas of machine learning hardware, in-memory computing and analog signal processing.
DtCraft: A High-performance Distributed Execution Engine at Scale
February 22, 10:35-10:50, CSL B02
Today, cloud vendors such as IBM, Amazon, and Google have made hardware less of a limiting factor. We now have easy access to clusters without the need to build our own hardware infrastructure. Creating software that readily leverages these platforms has become a critical challenge. However, cluster programming is notoriously difficult due to the subtle, error-prone instructions required for message passing. In this talk I will present DtCraft, a cluster programming system that efficiently streamlines the building of high-performance parallel and distributed applications. With DtCraft, users can focus their efforts on application-level development without being impeded by the details of distributed computing. We have successfully applied DtCraft to machine learning and large-scale VLSI design automation. Relative to existing cluster computing frameworks, DtCraft can speed up the machine learning workload by 10-20x. In an example semiconductor timing analysis problem, DtCraft achieved a 30x speed-up with 17x fewer lines of code on a 40-node Amazon cluster compared to a handcrafted implementation. The potential productivity gain is tremendous.
Tsung-Wei Huang received his PhD degree from the ECE Department at the University of Illinois at Urbana-Champaign (UIUC). During his PhD, he developed OpenTimer, an open-source timing analysis tool that has successfully helped many circuit designers and startups analyze the timing of their designs. He also built DtCraft, a modern C++17-based distributed execution engine that simplifies programming for computer clusters.
Multichannel Signal Processing for Augmented Listening Devices
February 22, 10:50-11:05, CSL B02
Augmented listening devices, such as hearing aids, personal sound amplifiers, and an emerging category of “smart” headphones, promise to enhance human listening by processing the sound we hear to reduce noise and improve understanding. Current state-of-the-art listening devices struggle, however, in noisy and reverberant environments. Large microphone arrays, made possible by recent advances in embedded sensing technology, could dramatically improve the performance of listening technology by allowing devices to separate, process, and enhance multiple sound sources in real time. In this talk, I will discuss the potential benefits and the design challenges of wearable microphone arrays and multichannel augmented listening devices.
Ryan Corey is a PhD student working with Professor Andrew Singer in CSL. A Chicago-area native, Ryan completed his Bachelor’s degree in Electrical Engineering at Princeton University, then returned to Illinois for graduate school. In 2014, he was awarded the National Science Foundation Graduate Research Fellowship. Ryan studies speech and audio signal processing with a particular focus on multichannel processing for hearing aid applications. When he’s not playing with microphones, Ryan enjoys travel, cooking, and silly hats.
[Best IPIS Student Presentation] A Bio-Inspired 27-Band Hyperspectral Imager Integrated with an Off-Chip Programmable Fabric
February 22, 11:10-11:25, CSL B02
As a sensing modality that permits the measurement of the electromagnetic spectrum at every point in a scene, hyperspectral imaging has emerged as a promising technology for capturing information about both the structural and chemical composition of objects. While engineered hyperspectral imagers have incorporated complex optomechanical architectures or computational strategies that inhibit their use in constrained sensor systems, biology has given rise to elegant and efficient vision systems that offer superior performance in acquiring and processing spectral information. Inspired by the compound eye of the mantis shrimp, we have developed a hyperspectral imager that integrates pixelated interference filters with a stacked photodiode imaging sensor to expose 27 spectral channels from ~450 nm to ~750 nm. When this hyperspectral imager is coupled with an off-chip programmable fabric, signal processing and machine learning algorithms can operate directly on these channels to provide a low-latency, low-power sensor system that enables diverse applications from agriculture to medicine.
Steven Blair is a Ph.D. student in the Department of Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign and is pursuing a specialization in biomedical imaging under Dr. Viktor Gruev. He received Bachelor of Science degrees in electrical engineering and computer engineering from Southern Illinois University Carbondale in May 2016. His research seeks to integrate findings from across science and engineering to enable novel spectral imaging systems, and his current work focuses on the development of a hyperspectral imager inspired by the mantis shrimp and applications to classification problems in the fields of agriculture and medicine.
Optimal Statistical Error Compensation and its Application to Energy-efficient Beyond-CMOS Systems
February 22, 11:25-11:40, CSL B02
The von Neumann architecture requires CMOS transistors to operate under practically error-free conditions, with 1 error in 10^15 switching events. Moore’s-law-based scaling reduced CMOS energy and delay while maintaining this very high switching accuracy. Today, the energy and delay reductions from scaling have stagnated, and the impact of nanoscale device process variations on system-level functionality has increased tremendously. Thus, the strict requirement of practically error-free switching events for von Neumann computing is becoming very expensive. Even spin logic devices, among the most promising beyond-CMOS devices, need to operate at switching error rates of 1% in order to be competitive with CMOS, rendering them incompatible with the deterministic nature of the von Neumann architecture despite advantages such as non-volatility and higher logic density. In this work, we propose a novel statistical error compensation technique, referred to as TreeCompensator, to compensate for the inherent device-level process variations and switching errors. We show that the TreeCompensator computes the maximum a-posteriori (MAP) estimate of the correct output in the presence of device-level switching errors. We demonstrate its effectiveness in enhancing the error resiliency of spin-based Boolean implementations. We propose novel architectural-level techniques to shape the error distribution at the output of any given spin-based Boolean implementation, enabling low-complexity error compensation. Our proposed approach achieves higher device error rate tolerance than conventional spin-based designs for a 120-dimensional linear support vector machine (SVM) classifier implementation.
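As a generic illustration of what a MAP estimate under switching errors means, the sketch below estimates a single bit from several independent, unreliable copies. It is not the TreeCompensator itself, whose structure is not described here: the independence assumption, the known per-unit error rates, and the function name `map_bit` are all assumptions made for this example.

```python
import math

def map_bit(observations, error_rates, prior_one=0.5):
    """MAP estimate of one bit from independent noisy copies.

    observations[i] is the bit reported by unit i, which is assumed to flip
    the true bit with known probability error_rates[i] (0 < p < 1).
    prior_one is the assumed prior probability that the true bit is 1.
    """
    # Start from the prior log-odds of the bit being 1.
    llr = math.log(prior_one / (1.0 - prior_one))
    for obs, p in zip(observations, error_rates):
        # A reliable unit (small p) contributes a large weight to the vote.
        weight = math.log((1.0 - p) / p)
        llr += weight if obs == 1 else -weight
    return 1 if llr > 0 else 0

if __name__ == "__main__":
    # One reliable unit (1% error) outvotes two unreliable ones (30% error).
    print(map_bit([1, 0, 0], [0.01, 0.3, 0.3]))
    # With equal error rates the rule reduces to a majority vote.
    print(map_bit([1, 0, 0], [0.1, 0.1, 0.1]))
```

With equal error rates and a uniform prior this rule is ordinary majority voting; unequal error rates turn it into a weighted vote, which is why knowing (or shaping) the error distribution can make compensation cheaper.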
Ameya Patil received the B.Tech. degree from the Department of Electrical Engineering, Indian Institute of Technology Hyderabad, Hyderabad, India, in 2014, and the M.S. degree from the Department of Electrical and Computer Engineering, University of Illinois at Urbana–Champaign, Champaign, IL, USA, in 2016, where he is currently pursuing the Ph.D. degree. His current research interests include the intersection of machine learning, circuits, and architecture. Mr. Patil was a recipient of the Joan and Lalit Bahl Fellowship from the ECE Department, UIUC, in 2015–2016 and 2016–2017.