With the end of Dennard scaling and Moore’s law, IC chips, especially large-scale ones, face growing reliability challenges, and reliability has become one of the principal figures of merit for VLSI designs. In this context, this book presents a built-in on-chip fault-tolerant computing paradigm that seeks to combine fault detection, fault diagnosis, and error recovery in large-scale VLSI design in a unified manner so as to minimize resource overhead and performance penalties. Following this computing paradigm, we propose a holistic solution based on three key components: self-test, self-diagnosis, and self-repair, or “3S” for short. We then explore the use of 3S for general IC designs, general-purpose processors, networks-on-chip (NoCs), and deep learning accelerators, and present prototypes to demonstrate how 3S responds to in-field silicon degradation and recovers from various runtime faults caused by aging, process variations, or radiation particles. Moreover, we demonstrate that 3S not only offers a powerful backbone for various on-chip fault-tolerant designs and implementations, but also has further-reaching implications such as maintaining graceful performance degradation, mitigating the impact of verification blind spots, and improving chip yield. This book is the outcome of extensive fault-tolerant computing research pursued at the State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences over the past decade. The proposed built-in on-chip fault-tolerant computing paradigm has been verified in a broad range of scenarios, from small processors in satellite computers to large processors in HPC systems. We hope it will provide an alternative yet effective solution to the growing reliability challenges for large-scale VLSI designs.
Content distribution, i.e., distributing digital content from one node to one or more other nodes, is the most fundamental function of the Internet. Since Amazon’s launch of EC2 in 2006 and Apple’s release of the iPhone in 2007, Internet content distribution has shown a strong trend toward polarization. On the one hand, considerable investments have been made in creating heavyweight, integrated data centers (“heavy-cloud”) all over the world, in order to achieve economies of scale and high flexibility and efficiency of content distribution. On the other hand, end-user devices (“light-end”) have become increasingly lightweight, mobile, and heterogeneous, creating new demands concerning the traffic usage, energy consumption, bandwidth, latency, reliability, and security of content distribution. Based on comprehensive real-world measurements at scale, we observe that existing content distribution techniques often perform poorly under these new circumstances. Motivated by the trend of “heavy-cloud vs. light-end,” this book is dedicated to uncovering the root causes of today’s mobile networking problems and designing innovative cloud-based solutions that address such problems in practice. Our work has produced not only academic papers published in prestigious conference proceedings like SIGCOMM, NSDI, MobiCom, and MobiSys, but also concrete impact on industrial systems such as Xiaomi Mobile, MIUI OS, Tencent App Store, Baidu PhoneGuard, and WiFi.com. A series of practical takeaways and easy-to-follow testimonials are provided for researchers and practitioners working in mobile networking and cloud computing. In addition, we have released as much of the code and data used in our research as possible to benefit the community.