Position: PhD Candidate
Current Institution: University of Wisconsin, Madison
Abstract: Towards Truly Reliable Distributed Storage
Distributed storage systems have a simple goal: to reliably store and provide access to user data. However, realizing this apparently simple goal is fraught with challenges; for example, a power loss, a system crash, or a faulty storage device can cause distributed systems to lose data or become unavailable. To protect against failures, distributed storage systems traditionally have paid a cost: performance. With a careful understanding of how failures occur in reality, the seemingly conflicting goals of reliability and performance can be realized together. In my thesis, we take a two-pronged approach to improve the reliability of distributed storage systems. First, we build two reliability-testing frameworks to analyze the effects of faulty disks and crash failures; using these tools, we examine ten systems (e.g., ZooKeeper, Cassandra, and Redis). Our analysis leads to new fundamental insights on how current reliability measures fall short. For example, most systems do not effectively use redundancy to recover from a local corruption: a single fault can cause disastrous outcomes such as data loss, silent corruption, and unavailability. Our tools have an immediate practical impact: many vulnerabilities exposed by our tools have been acknowledged and fixed by developers. Second, using these insights, we develop protocol-aware recovery (PAR), a new technique that improves resiliency to faulty storage devices. A key aspect of PAR is that it is not specific to a system; rather, it exploits the properties of protocols common to many distributed systems. For instance, we apply PAR to two different systems that implement a replicated state machine by exploiting their common properties. We show that the PAR versions safely recover from storage faults and provide high availability, while the unmodified versions can lose data or become unavailable; we also show that the PAR versions have little or no performance overhead.
Aishwarya Ganesan is a PhD candidate in Computer Sciences at the University of Wisconsin, Madison. She is advised by Professors Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau. Her research interests are in distributed systems, storage and file systems, and operating systems, with a focus on improving the reliability of distributed systems without compromising their performance. Her research has been recognized with the best-paper award at FAST 18 and a best-paper-award nomination at FAST 17. Before starting her PhD, Aishwarya was a Research Fellow for two years at Microsoft Research India in the Mobility, Networks, and Systems group. At MSR, she built systems that leverage smart glasses to enable new applications such as physical analytics and near-vision communication. She completed her master’s degree at the Indian Institute of Technology – Bombay and her bachelor’s degree at Coimbatore Institute of Technology.