Delay in API response time
Incident Report for Telnyx
Postmortem

Summary

On May 3, 2019, starting at 11:00 UTC, we experienced delayed response times from api.telnyx.com.

On May 8, 2019, starting at 20:00 UTC, we experienced additional response time delays.

From April 22 to the present, we have experienced momentary spikes in API response times and increased 5XX errors, including on:

  • Thursday, April 25

  • Wednesday, May 8

  • Friday, May 10

  • Monday, May 13

Impact

Customers sending API requests to api.telnyx.com experienced increased response times or 5XX errors.

Root Cause

  • Increased transit time for API requests routed to the new API Gateway instances

  • Increased API call queuing and timeouts

Timeline (UTC)

March and April: Telnyx observes instability in Google Cloud Central, which is where some of our infrastructure is located. This results in delayed 200 OK responses and 5XX errors with Telnyx services such as Call Control and Mission Control.

May 3rd, 20:00 UTC: To mitigate the risk of additional Google Cloud Central outages, Telnyx deploys two additional instances of its API Gateway with two different cloud providers in two different regions. API calls are now routed round robin across four API Gateway instances spanning multiple regions and cloud providers. As a result, calls routed to the new instances take longer to reach the API Gateway.
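The effect of round-robin routing on response time can be illustrated with a minimal Python sketch. The gateway names and latency figures below are hypothetical placeholders chosen for illustration; they are not measurements from this incident.

```python
from itertools import cycle
from statistics import mean

# Hypothetical network latencies (ms) from a client to each API Gateway
# instance. Illustrative values only, not data from the incident.
ORIGINAL_GATEWAYS = {"gateway-a": 20, "gateway-b": 20}
ALL_GATEWAYS = {**ORIGINAL_GATEWAYS, "gateway-new-1": 60, "gateway-new-2": 80}


def round_robin_latencies(gateways, num_requests):
    """Return the latency each request sees when routed round robin."""
    rotation = cycle(gateways)  # iterates over gateway names in order, forever
    return [gateways[next(rotation)] for _ in range(num_requests)]


before = round_robin_latencies(ORIGINAL_GATEWAYS, 8)
after = round_robin_latencies(ALL_GATEWAYS, 8)
print("mean before:", mean(before), "ms")
print("mean after:", mean(after), "ms")
```

Because round robin ignores distance, half of the requests in this sketch land on the farther instances, which raises the mean latency even though no single instance is slower than before.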

May 8th, 11:00 UTC: Telnyx migrates database masters from Central to East. API calls that require database lookups now incur added latency from communication between Central and East.
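A rough model of why cross-region lookups matter: if an API call performs its database lookups sequentially, each lookup adds one inter-region round trip. The sketch below assumes this simple additive model, and all numbers in it are hypothetical.

```python
def request_latency_ms(base_ms, db_lookups, db_round_trip_ms):
    """Estimate total latency when each DB lookup is a sequential round trip."""
    return base_ms + db_lookups * db_round_trip_ms


# Hypothetical figures: 10 ms of processing and 5 sequential lookups per call.
same_region = request_latency_ms(base_ms=10, db_lookups=5, db_round_trip_ms=1)
cross_region = request_latency_ms(base_ms=10, db_lookups=5, db_round_trip_ms=30)
print("same region:", same_region, "ms")
print("cross region:", cross_region, "ms")
```

Under these assumptions the added cost scales with the number of lookups per call, so endpoints that query the database several times are affected most.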

Action Items

  • Bypass the Telnyx Legacy Edge Stack for latency-sensitive API commands, such as Call Control

  • Update edge proxy configuration to allow for increased API traffic

  • Enable region-based service-to-service interactions for API-based services

  • Add Call Control API Response times to status.telnyx.com
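One way to picture the region-based service-to-service action item is a selection rule that prefers a healthy instance in the caller's own region and falls back to another region only when none is available. This is a sketch under assumed data shapes; the instance records, field names, and the `pick_instance` helper are hypothetical and do not describe Telnyx's actual implementation.

```python
def pick_instance(caller_region, instances):
    """Prefer a healthy instance in the caller's region, else any healthy one."""
    healthy = [inst for inst in instances if inst["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy instances available")
    local = [inst for inst in healthy if inst["region"] == caller_region]
    # Use a same-region instance when one exists; otherwise fall back.
    return (local or healthy)[0]


# Hypothetical service registry entries.
instances = [
    {"name": "db-proxy-east", "region": "east", "healthy": True},
    {"name": "db-proxy-central", "region": "central", "healthy": True},
]
print(pick_instance("central", instances)["name"])
```

Keeping calls inside one region avoids the inter-region round trips described in the timeline, while the fallback preserves availability during a regional outage.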

Posted May 14, 2019 - 18:55 UTC

Resolved
This incident has been resolved.
Posted May 10, 2019 - 22:41 UTC
Identified
We have identified an issue with our API response time and are currently investigating. We will provide updates shortly.
Posted May 10, 2019 - 22:04 UTC
This incident affected: Mission Control API (US East Region, US Central Region, US West Region).