Degradation while trying to refresh access tokens
Incident Report for Atlassian Developer
Postmortem

SUMMARY

On February 28th, 2022, between 12:30AM and 1:30AM UTC, and on March 7th, 2022, between 12:26AM and 12:42AM UTC, Atlassian customers using Jira native apps and apps developed by partners were unable to refresh access tokens. The event was triggered by a faulty deployment of an OAuth2 service in the Atlassian Identity Platform. The incident was detected within 1 minute by automated monitoring and mitigated by rolling back the offending service, which returned Atlassian systems to a known good state. The total time to resolution was about 42 minutes.

The two incidents are tracked here and this incident review applies to both:

IMPACT

Client requests to the /oauth/token endpoint that expected a GZIP’d response body received an unexpected response, which caused their refresh grant flows to fail.

ROOT CAUSE

A build of an OAuth2 service with broken content negotiation was auto-deployed to production. This build mishandled compression and decompression during content negotiation while performing refresh_token grant flows. Any request to the /oauth/token endpoint that expected a GZIP’d response body while refreshing an access token would have failed, because the server did not correctly honor the request for a compressed response.
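To illustrate the failure mode from the client side, the following is a minimal Python sketch of a defensive decoder for a token response. It assumes the mismatch shows up as a disagreement between the Content-Encoding header and the actual body bytes; the report does not state exactly what clients received, so this is an illustration rather than a description of the real fault. The helper names declares_gzip and decode_token_response are hypothetical, and many HTTP libraries decompress transparently, in which case this logic would live inside the library.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def declares_gzip(headers: dict) -> bool:
    """Return True if the response headers declare Content-Encoding: gzip."""
    for name, value in headers.items():
        if name.lower() == "content-encoding":
            return value.strip().lower() == "gzip"
    return False


def decode_token_response(headers: dict, body: bytes):
    """Parse a token response body, trusting the bytes over the header.

    Returns (payload, mismatch), where mismatch is True when the header
    and the body disagree about whether the payload is gzip-compressed.
    """
    is_gzipped = body[:2] == GZIP_MAGIC
    mismatch = declares_gzip(headers) != is_gzipped
    raw = gzip.decompress(body) if is_gzipped else body
    return json.loads(raw.decode("utf-8")), mismatch
```

A JSON document can never start with the gzip magic bytes, so sniffing the first two bytes is a safe way to decide whether to decompress. The mismatch flag lets a client log the inconsistency instead of silently masking it.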

REMEDIAL ACTIONS PLAN & NEXT STEPS

We are prioritizing the following improvement actions to avoid repeating this type of incident:

  • Code fix to the offending service

Furthermore, we deploy our changes progressively to avoid broad impact, but in this case our continuous tests did not work as expected. To minimise the impact of breaking changes to our environments, we will implement additional preventative measures such as:

  • The offending codepath will be guarded by a feature flag
  • A continuous test will ensure correct content negotiation of compressed payloads in all environments 
  • Information gathering about what was received by clients and how they reacted
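As a rough illustration of what a continuous content-negotiation check could verify, here is a minimal Python sketch. It is not Atlassian's actual test: the function name negotiation_problems is hypothetical, the check operates on an already captured response rather than calling a real endpoint, and it ignores q-values in Accept-Encoding for brevity.

```python
import gzip
import json
import zlib


def negotiation_problems(accept_encoding: str, response_headers: dict, body: bytes) -> list:
    """Return a list of human-readable problems found in one captured response."""
    problems = []
    headers = {name.lower(): value for name, value in response_headers.items()}
    declared = headers.get("content-encoding", "identity").strip().lower()

    # Codings the client offered, with any ";q=..." parameters stripped.
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
        if token.strip()
    }

    # The server should only apply a coding the client said it accepts.
    if declared != "identity" and declared not in offered and "*" not in offered:
        problems.append("server used coding %r that the client did not offer" % declared)

    # If the header says gzip, the body must actually be a valid gzip stream.
    if declared == "gzip":
        try:
            body = gzip.decompress(body)
        except (OSError, EOFError, zlib.error):
            problems.append("Content-Encoding is gzip but the body is not valid gzip")
            return problems

    # After decoding, a token response should be valid JSON.
    try:
        json.loads(body.decode("utf-8"))
    except ValueError:
        problems.append("decoded body is not valid JSON")
    return problems
```

A scheduled job could issue a refresh request with and without Accept-Encoding: gzip in each environment, pass the captured headers and body to this function, and alert when the returned list is non-empty.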

We apologize to customers and partners whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.

Posted Mar 21, 2022 - 22:41 UTC

Resolved
Between 11:26AM AEDT and 11:42AM AEDT on March 7th, 2022, our OAuth services experienced degradation that resulted in empty refresh tokens being obtained upon access token exchange.
These tokens are invalid and cannot be used in future exchanges. Affected apps and users should restart the authorization flow.
Posted Mar 08, 2022 - 23:43 UTC
This incident affected: Authentication and user management.
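
For app developers, the guidance in the resolution update about empty refresh tokens can be sketched as a small guard applied before a token response is stored. This is a hypothetical Python example: the function name next_step and the returned labels are illustrative, and only the refresh_token field name comes from the standard OAuth 2.0 token response.

```python
def next_step(token_response: dict) -> str:
    """Decide what to do with a parsed token response.

    Returns "reauthorize" when the refresh token is missing or empty,
    since such a token cannot be used in a later exchange; otherwise
    returns "store".
    """
    refresh_token = token_response.get("refresh_token")
    if not isinstance(refresh_token, str) or not refresh_token.strip():
        return "reauthorize"
    return "store"
```

Checking the value before persisting it avoids overwriting a previously valid refresh token with an empty one.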