Stabilizing Time-Adaptive Protocols
Shay Kutten Boaz Patt-Shamiry
Dept. of Industrial Engineering & Management College of Computer Science
The Technion Northeastern University
boaz@ccs.neu.edu
and
IBM T.J. Watson Research Center
kutten@ie.technion.ac.il
February 8, 1998
Abstract
We study the scenario where a transient batch of faults hit a minority of the nodes in a distributed
system by corrupting their state. We concentrate on the basic persistent bit problem, where the system is
required to maintain a 0/1 value in the face of transient failures by means of replication. We give an
algorithm to stabilize the value to a correct state quickly; that is, denoting the unknown number of faulty
nodes by f, our algorithm recovers the value of the bit at all nodes in O(f) time units for any f ! n=2,
where n is the number of all nodes. Moreover, complete state quiescence occurs in O(diam) time units, where
diam denotes the actual diameter of the network. This means that the value persists indefinitely so long as
any f ! n=2 faults are followed by \Omega (diam) fault-free time units. (Strict self-stabilization requires
recovery for f * n=2 as well.) We prove matching lower bounds on both the output stabilization time and the
state quiescence time. Using our persistent bit algorithm, we present a transformer which takes a distributed
non-reactive non-stabilizing protocol P, and produces a protocol P0 which solves the problem P solves, with
the additional property that if a batch of faults changes the state of f ! n=2 of the nodes, then the output
is recovered in O(f) time units, and the state stabilizes in O(diam) time units. Our upper and lower bounds
are all proved in the synchronous network model.