Resolved -
# Postmortem: Salesforce Authentication & Deployment Delays
**Date Range:** January 27-29, 2025
**Status:** Resolved
## Summary
Between January 27 and 29, some customers experienced intermittent deployment failures and delays when Blue Canvas attempted to authenticate with Salesforce. The issues have been fully resolved.
## What Happened
### January 27
We made infrastructure improvements to support customers who require IP allowlisting for Salesforce access. All Blue Canvas traffic to Salesforce is now routed through 5 static IP addresses, making it easier for customers to configure their Salesforce security settings.
We also improved visibility into authentication issues by adding warnings to deployment operations when they are blocked due to Salesforce authentication errors.
### January 28
While implementing the authentication error warnings, we inadvertently introduced a configuration change that caused temporary errors to persist longer than necessary. When Salesforce experienced brief slowdowns, our system would cache these errors for up to 1 hour, preventing automatic recovery even after Salesforce responded normally again.
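The failure mode can be sketched as a small per-organization error cache, assuming a Python service and a single TTL applied to every error. `AuthErrorCache`, `authenticate`, and the injectable `clock` are hypothetical names chosen for illustration, not Blue Canvas's actual code.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AuthErrorCache:
    """Hypothetical per-org cache of the last Salesforce authentication error.

    Every error gets the same TTL, mirroring the January 28 behavior:
    a brief timeout can be remembered for up to an hour.
    """

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def record(self, org_id: str, error: str) -> None:
        # Remember the error along with the time it stops applying.
        self._entries[org_id] = (error, self.clock() + self.ttl_seconds)

    def get(self, org_id: str) -> Optional[str]:
        entry = self._entries.get(org_id)
        if entry is None:
            return None
        error, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop it so the next attempt reaches Salesforce again.
            del self._entries[org_id]
            return None
        return error


def authenticate(cache: AuthErrorCache, org_id: str, salesforce_ok: bool) -> str:
    # A cached error short-circuits the request even if Salesforce has recovered.
    cached = cache.get(org_id)
    if cached is not None:
        return f"failed (cached): {cached}"
    if not salesforce_ok:
        cache.record(org_id, "The read operation timed out")
        return "failed: The read operation timed out"
    return "ok"


if __name__ == "__main__":
    now = [0.0]
    cache = AuthErrorCache(ttl_seconds=3600, clock=lambda: now[0])
    # During the slowdown: prints "failed: The read operation timed out"
    print(authenticate(cache, "org-1", salesforce_ok=False))
    now[0] = 120.0  # two minutes later, Salesforce responds normally again
    # Still blocked: prints "failed (cached): The read operation timed out"
    print(authenticate(cache, "org-1", salesforce_ok=True))
```

Constructing the cache with `ttl_seconds=60`, matching the one-minute TTL adopted in the fix, lets the same call at the two-minute mark reach Salesforce and return `ok`.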
### January 29
Salesforce's authentication service experienced intermittent slowdowns, causing some authentication requests to time out. Because these timeouts were cached for up to an hour, deployments were marked as failed instead of retrying automatically.
## Impact
- Customers experienced intermittent deployment failures with messages such as "Failed to get Salesforce token" or "The read operation timed out"
- Some deployments required a manual retry after Salesforce recovered
## Resolution
We implemented the following fixes:
1. **Smarter Error Handling:** Our system now distinguishes between temporary issues (like timeouts) and permanent errors (like invalid credentials). Temporary issues trigger automatic retry instead of immediate failure.
2. **Faster Recovery:** Temporary errors are now cached for only 1 minute instead of 1 hour, allowing much faster recovery when Salesforce service is restored.
3. **Improved User Communication:** When Salesforce is slow to respond, you'll now see the message: *"Packaging is taking longer because Salesforce is not responding. We will retry automatically."* instead of an immediate failure.
4. **Automatic Retry:** Deployments affected by temporary Salesforce issues will now retry automatically without requiring manual intervention.
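The error classification and automatic retry described above can be sketched as a wrapper that classifies each failure before deciding whether to try again. This is a minimal sketch, assuming Python and treating `TimeoutError` and `ConnectionError` as temporary. `TransientAuthError`, `PermanentAuthError`, `classify`, and `with_retry` are hypothetical names, and the exponential backoff schedule is an assumption rather than the schedule Blue Canvas uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientAuthError(Exception):
    """Temporary failure, such as a timeout while Salesforce is slow."""


class PermanentAuthError(Exception):
    """Failure that retrying cannot fix, such as invalid credentials."""


def classify(exc: Exception) -> Exception:
    # Hypothetical rule: timeouts and dropped connections are temporary;
    # anything else is treated as permanent.
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return TransientAuthError(str(exc))
    return PermanentAuthError(str(exc))


def with_retry(
    fetch_token: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch_token, retrying only temporary failures (hypothetical helper)."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fetch_token()
        except Exception as exc:
            err = classify(exc)
            # Permanent errors fail immediately; so does the last temporary one.
            if isinstance(err, PermanentAuthError) or attempt == attempts - 1:
                raise err from exc
            # Assumed exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_token() -> str:
        # Simulates Salesforce timing out twice before responding normally.
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("The read operation timed out")
        return "token"

    print(with_retry(flaky_token, sleep=lambda _: None), calls["n"])
```

Separating the two error classes keeps invalid credentials from being retried pointlessly, while a slow Salesforce response only delays the deployment instead of failing it.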
## Questions?
If you have any questions or experienced issues during this period that haven't been resolved, please contact our support team.
Jan 29, 16:38 UTC
Monitoring -
We've implemented a fix, and Deployment Requests are building successfully again.
Any deployment requests currently marked as failed with "The read operation timed out" can be safely retried.
Jan 29, 16:27 UTC
Investigating -
Deployment requests are currently failing to package with the message "The read operation timed out".
Our authentication requests to Salesforce appear to be timing out and are not being retried often enough. We're pushing a fix to retry more frequently while we continue to investigate.
Jan 29, 15:57 UTC