Atlassian Marketplace Outage
Incident Report for Atlassian Developer
Resolved
Marketplace has been stable for several days now and we are confident that everything is fully operational. The search issue that was reported earlier in the day was unrelated to any of this and was handled separately. We are now closing this incident as resolved.
Posted Mar 31, 2020 - 16:07 UTC
Update
We have now re-enabled install/download counts on both marketplace.atlassian.com and the in-product Marketplace. We will continue to monitor the various systems involved and will post updates as new information becomes available.
Posted Mar 30, 2020 - 13:48 UTC
Update
The Marketplace site has remained stable since the last update, and system stability and performance metrics look healthy. The team has made improvements to our deployment process that will help reduce the likelihood of a recurrence by limiting resource usage and contention during deployments.

We are reducing the impact to Minor because the system has remained stable for over two weeks with analytics event processing enabled. The only functionality not yet restored is analytics-related functionality such as install counts. We caught up on our backlog of unprocessed analytics events 4 days ago and are now carefully reviewing the numbers for discrepancies. Install counts will be made visible once we are confident the numbers are accurate.
Posted Mar 26, 2020 - 01:07 UTC
Update
Marketplace has now been up continuously for 2 weeks, and we have been closely monitoring system health for this period.

We have now caught up with the backlog of analytics events that built up during the incident. Over the next few days, we will review these analytics carefully and then make active install counts visible once we are confident the numbers are accurate.
Posted Mar 20, 2020 - 17:39 UTC
Update
Marketplace has remained up for 12 days. We have made changes to analytics processing that we expect will reduce the load on the system and enable faster processing. We are now gradually catching up on the backlog of events that were not processed last week as a result of our incident response.

Although the system has remained up for 12 days, we are exercising caution before closing the incident because we want to carefully explore the possible contributors to the incident and ensure that we have taken the appropriate steps to avoid a recurrence.

We also expect to have a rough estimate by this Friday of when installation counts will be back on the Marketplace.
Posted Mar 18, 2020 - 04:07 UTC
Update
Marketplace has now been stable in production for over 6 days.

The root cause is not conclusively known, but we have a strong theory that it is related to the increased load from processing analytics events and the related increase in worker cluster size. We have been investigating this theory thoroughly today and have successfully deployed changes that give us more telemetry to understand it.

We are still not processing analytics events fast enough to keep up with the inbound rate, and we want to exercise caution when scaling up analytics processing to avoid affecting the stability of core system functionality.

We successfully deployed a fix to search relevancy rankings that had been affected by our incomplete processing of analytics events.
Posted Mar 12, 2020 - 17:19 UTC
Update
The MPAC service has now been stable in production for almost five days.
The root cause is not conclusively known, but we have a strong theory that it is related to the increased load from processing analytics events and the related increase in worker cluster size.
We have been able to deploy to production safely, although a brief period of intermittent JavaScript errors (unrelated to the previous incident) caused us to roll back one of the deployments.
We have unblocked the last remaining blocked endpoints in Marketplace, so Japanese currency is now showing in server UPM and usage analytics ("phone home") data is being received again.
We have deployed a frontend change that removes the currently incorrect installation count from the UI.
Posted Mar 11, 2020 - 05:42 UTC
Update
Marketplace has been stable and operating all critical functionality since approximately 9am UTC on Friday, processing installs, upgrades and renewals as usual. We are confident it will remain stable with the current mitigations in place, which include increased resources for the service and disabling some non-core functionality.
There are 2 remaining areas of impact as a result of these mitigations:
Server customers in Japan continue to see USD pricing instead of Japanese Yen.
Analytics features have been disabled, resulting in temporarily incorrect active install counts being shown on app listings and in vendor reporting.
The full root cause remains unknown at this stage. A large team, including senior technical leaders from across Atlassian, is working in shifts across time zones to restore 100% of operations, improve profiling and metrics, and carry out additional load testing so that all non-core services can be confidently re-enabled.
Posted Mar 10, 2020 - 04:00 UTC
Update
As part of the ongoing investigations, we have disabled the analytics endpoint.
As a result, installation counts are showing as 0, which has in turn affected vendors' Top Vendor status.

The change in Top Vendor status will have triggered an email notification; please disregard it.
Posted Mar 09, 2020 - 06:31 UTC
Monitoring
We have recovered the services, and all core functionality in Marketplace is operational. We have not yet identified a root cause. We are continuously monitoring the services and investigating further to identify the root cause. The following remains affected: Japanese customers on Server/Data Center are seeing pricing in USD.
Posted Mar 06, 2020 - 10:13 UTC
Update
No root cause identified
Our earlier hypothesis of a JVM heap issue has been proven wrong. Since the last update we have been monitoring and trying various approaches to find out what caused the outage. Although we were unable to reproduce the issue in staging, it has just recurred in production and is causing a partial outage again. We are actively investigating a fix.
Posted Mar 06, 2020 - 06:42 UTC
Update
We are continuing to work on a fix for this issue.
Posted Mar 05, 2020 - 22:13 UTC
Update
Root cause identified
We continue to work on resolving the issue causing app downloads and installations to fail. We have identified the root cause and expect recovery shortly. Vendors are currently unable to add new apps or versions.
Posted Mar 05, 2020 - 18:30 UTC
Update
We are continuing to work on a fix for this issue.
Posted Mar 05, 2020 - 16:53 UTC
Update
We are continuing to work on a fix for this issue.
Posted Mar 05, 2020 - 15:22 UTC
Identified
The issue has been identified and a fix is being implemented.
Posted Mar 05, 2020 - 13:24 UTC
Update
We are continuing to investigate the incident.
Posted Mar 05, 2020 - 09:28 UTC
Investigating
We are currently investigating this issue.
Posted Mar 05, 2020 - 07:37 UTC
This incident affected: Marketplace (App listing management, App listings, App pricing, App submissions, Category landing pages, Evaluations and purchases, In-product Marketplace and app installation (Cloud), In-product Marketplace and app installation (Server), Notifications, Private listings, Reporting APIs and dashboards, Search, Vendor management, Vendor Home Page).