On November 2, 2018, starting at 19:44 UTC / 14:44 CDT, we experienced a disruption to the server running one instance of our telephony services. This disruption caused all calls anchored on one of our San Jose servers to be disconnected.
Telnyx has a number of High Availability mechanisms in place for the Telnyx Telephony Engine, including:
Telnyx currently lacks an active call recovery mechanism for cases where the server running a given set of Telnyx Telephony Engine applications crashes or otherwise becomes unresponsive. This is what happened in this incident.
Approximately 500 customer calls were disconnected.
The docker-engine process segfaulted at 19:44:35 UTC and recovered at 19:44:38 UTC.