Improving Operating System Reliability

Researchers have shown that operating system (OS) reliability can be improved by structuring the OS as a collection of components, each protected by techniques that limit error propagation. Microkernel systems such as MINIX 3 and L4 adopt this approach. When an error occurs in a component that provides an OS service, the suggested recovery strategy is to restart the service to restore it to a correct state. Unfortunately, a restart affects every application currently using the service, because the state information maintained within the service is lost. CuriOS is a novel OS design that uses lightweight partitioning and distribution of OS service state to mitigate state loss during a restart. Services are encapsulated in separate protection domains, and access to state information is assigned on a need-to-know basis. Error propagation within application-related state maintained by the service is also significantly reduced. This state management approach has been implemented for several CuriOS services. Experiments show that it is possible to recover from 87-100% of all manifested errors while maintaining low performance overheads.
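
The difference between keeping client state inside a service and keeping it in separate per-client regions can be illustrated with a toy model. The sketch below is not CuriOS code: the names `StateStore`, `InternalStateService`, `PartitionedService`, and `run_demo` are hypothetical, and ordinary Python objects stand in for protection domains. It only shows why a restarted service can resume work when the state it needs lives outside the service and is looked up one client at a time.

```python
class StateStore:
    """Hypothetical stand-in for per-client state regions kept outside the service."""

    def __init__(self):
        self._regions = {}

    def region_for(self, client_id):
        # Hand out only the region that belongs to the requesting client.
        return self._regions.setdefault(client_id, {})


class InternalStateService:
    """Keeps all client state inside the service; a restart loses it."""

    def __init__(self):
        self._totals = {}

    def handle(self, client_id, amount):
        self._totals[client_id] = self._totals.get(client_id, 0) + amount
        return self._totals[client_id]


class PartitionedService:
    """Keeps no client state of its own; each request touches one client's region."""

    def __init__(self, store):
        self._store = store

    def handle(self, client_id, amount):
        region = self._store.region_for(client_id)  # need-to-know access
        region["total"] = region.get("total", 0) + amount
        return region["total"]


def run_demo():
    # Service with internal state: the restart discards every client's total.
    internal = InternalStateService()
    internal.handle("a", 5)
    internal = InternalStateService()  # simulated restart
    lost = internal.handle("a", 1)

    # Partitioned state: the restarted service picks up where it left off.
    store = StateStore()
    service = PartitionedService(store)
    service.handle("a", 5)
    service.handle("b", 7)
    service = PartitionedService(store)  # simulated restart
    kept = (service.handle("a", 1), service.handle("b", 1))
    return lost, kept
```

Calling `run_demo()` restarts both services after the first requests. The service with internal state starts counting again from zero for client `a`, while the partitioned service continues from the totals stored in each client's region. In a real system the regions would be protected by hardware memory protection rather than by object boundaries, which this model does not capture.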