The GitHub Blog, June 12, 07:28
GitHub Availability Report: May 2025

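GitHub has published its availability report for May 2025, detailing three incidents that month that affected users. They involved attachment uploads in the Issues service, delayed job starts in GitHub Actions, and a complete outage of the Microsoft Teams GitHub integration service. The report covers the cause, scope of impact, and mitigation of each incident, and describes the follow-up work GitHub is doing to improve service stability and reliability.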

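💥 On May 1, the Issues service was degraded for about an hour, leaving users unable to upload attachments. A new feature added a custom header to client-side HTTP requests, which caused CORS errors; roughly 130k users were affected for about 45 minutes. GitHub mitigated the incident by rolling back the feature and is adding new metrics to monitor changes to client-side requests.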

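⏰ On May 28, GitHub Actions jobs in public repositories using Ubuntu-24 standard hosted runners started late for about 5 hours. A misconfiguration in backend caching behavior after a failover led to duplicate job assignments, delaying approximately 19.7% of Ubuntu-24 hosted runner jobs on public repos. GitHub fixed the configuration, scaled up the runner pools, and is improving failover resiliency and validation.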

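🛑 On May 30, the Microsoft Teams GitHub integration service was completely down for about 7 hours and 50 minutes. An authentication issue with a downstream authentication provider produced a 100% error rate across all functionality except link previews. GitHub restored service by working with the provider and is migrating to more durable authentication methods to reduce the risk of similar issues in the future.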

In May, we experienced three incidents that resulted in degraded performance across GitHub services.

May 1 22:09 UTC (lasting 1 hour and 4 minutes)

On May 1, 2025, from 22:09 UTC to 23:13 UTC, the Issues service was degraded and users weren’t able to upload attachments. The root cause was a new feature that added a custom header to all client-side HTTP requests, causing CORS errors when uploading attachments to our provider. We estimate that ~130k users were impacted by the incident for ~45 minutes.

We mitigated the incident by rolling back the feature flag that added the new header at 22:56 UTC. In order to prevent this from happening again, we are adding new metrics to monitor and ensure the safe rollout of changes to client-side requests. We have since deployed an augmented version of the feature based on learnings from this incident that is performing well in production.
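As a rough illustration of why an extra request header can break a cross-origin upload: under the Fetch standard, any header outside a small safelist triggers a preflight, and the upload is blocked unless the server names that header in `Access-Control-Allow-Headers`. The sketch below is a simplified model of that check, not GitHub's code. The header name `X-Example-Client` and the allow-list value are hypothetical, and value restrictions on safelisted headers and the `*` wildcard are deliberately left out.

```python
# Simplified model of the CORS preflight header check (assumption: names only,
# no value restrictions, no "*" wildcard handling).
SAFELISTED = {"accept", "accept-language", "content-language", "content-type", "range"}


def preflight_blocked_headers(request_headers, allow_headers):
    """Return the request header names a preflight response would not permit.

    request_headers: dict of header name -> value sent by the client.
    allow_headers: the server's Access-Control-Allow-Headers value.
    """
    allowed = {h.strip().lower() for h in allow_headers.split(",") if h.strip()}
    return sorted(
        name.lower()
        for name in request_headers
        if name.lower() not in SAFELISTED and name.lower() not in allowed
    )


# Hypothetical upload request before and after a feature adds a custom header.
before = {"Content-Type": "image/png"}
after = {"Content-Type": "image/png", "X-Example-Client": "1"}
server_allows = "Content-Type, Authorization"  # hypothetical provider config

print(preflight_blocked_headers(before, server_allows))
print(preflight_blocked_headers(after, server_allows))
```

A non-empty result means the browser would refuse to send the actual upload request, which matches the kind of failure the report describes for requests to a third-party provider that was not configured for the new header.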

May 28 09:45 UTC (lasting 5 hours)

On May 28, 2025, from approximately 09:45 UTC to 14:45 UTC, GitHub Actions experienced delayed job starts for workflows in public repos using Ubuntu-24 standard hosted runners. This was caused by a misconfiguration in backend caching behavior after a failover, which led to duplicate job assignments reducing overall capacity in the impacted hosted runner pools. Approximately 19.7% of Ubuntu-24 hosted runner jobs on public repos were delayed. Other hosted runners, self-hosted runners, and private repo workflows were unaffected.

By 12:45 UTC, the configuration issue was fixed through updates to the backend cache. The pools were also scaled up to more quickly work through the backlog of queued jobs until queuing impact was fully mitigated at 14:45 UTC. We are improving failover resiliency and validation to reduce the likelihood of similar issues in the future.
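The report does not describe the assignment logic, so the following is only a toy model of how duplicate job assignments can eat into pool capacity. The `RunnerPool` class and its `remember` flag are hypothetical: `remember=False` stands in for a lookup that fails to see an earlier assignment, for example because a cache is returning stale or missing data.

```python
class RunnerPool:
    """Toy pool in which every assignment occupies one free runner."""

    def __init__(self, size):
        self.free = size
        self.assigned = {}  # job_id -> number of runners holding that job

    def assign(self, job_id, remember=True):
        """Assign job_id to a runner; return True if a runner was consumed.

        With remember=True an already-assigned job is skipped.
        With remember=False the earlier assignment is not seen (assumed
        failure mode), so the same job takes a second runner.
        """
        if remember and job_id in self.assigned:
            return False
        if self.free == 0:
            raise RuntimeError("no free runners")
        self.free -= 1
        self.assigned[job_id] = self.assigned.get(job_id, 0) + 1
        return True


pool = RunnerPool(size=4)
pool.assign("job-a")
pool.assign("job-a")                  # deduplicated, no runner consumed
pool.assign("job-a", remember=False)  # duplicate, second runner consumed
print(pool.free, len(pool.assigned))
```

In this model two of four runners are busy while only one distinct job is running, so other queued jobs wait longer even though the pool size has not changed.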

May 30 08:10 UTC (lasting 7 hours and 50 minutes)

On May 30, 2025, between 08:10 UTC and 16:00 UTC, the Microsoft Teams GitHub integration service experienced a complete service outage.

During this period, the integration was unable to process user requests or deliver notifications, resulting in a 100% error rate across all functionality, with the exception of link previews. This outage was caused by an authentication issue with our downstream authentication provider.

While the appropriate monitoring was in place, the alerting thresholds were not sufficiently sensitive to trigger a timely response, resulting in a delay in incident detection and engagement. Once engaged, our team worked closely with the downstream provider to diagnose and resolve the authentication failure. However, longer-than-expected response times from the provider contributed to the extended duration of the outage.

We mitigated the incident by working with our provider to restore service functionality and are working to migrate to more durable authentication methods to reduce the risk of similar issues in the future.
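The detection delay described above comes down to how an alert rule is tuned. The report does not say what rule was in place, so the sketch below assumes a common shape: fire once the error rate has met a threshold for a number of consecutive samples. The function name and all parameter values are hypothetical.

```python
def first_alert_index(error_rates, threshold, consecutive):
    """Return the index of the sample at which the alert fires, or None.

    The alert fires once `consecutive` samples in a row have an error
    rate at or above `threshold`.
    """
    streak = 0
    for i, rate in enumerate(error_rates):
        streak = streak + 1 if rate >= threshold else 0
        if streak >= consecutive:
            return i
    return None


# Hypothetical series: healthy for two samples, then a full outage.
rates = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]

print(first_alert_index(rates, threshold=0.5, consecutive=1))  # sensitive rule
print(first_alert_index(rates, threshold=0.5, consecutive=4))  # less sensitive rule
```

Both rules eventually catch the outage, but the less sensitive one fires several samples later. With real sampling intervals, that gap is time during which nobody has been paged.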


Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.

The post GitHub Availability Report: May 2025 appeared first on The GitHub Blog.
