The GitHub Blog, two days ago at 05:08
GitHub Availability Report: April 2025

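GitHub experienced three incidents of degraded service performance in April 2025. They were caused, respectively, by a configuration change to an internal dependency, database resource contention, and a configuration change that caused migrations to fail. GitHub responded quickly and restored service by rolling back changes, optimizing the database, improving monitoring, and expanding test coverage. The report describes the scope and duration of each incident and the measures GitHub is taking to prevent similar problems, emphasizing its continued work on service stability and reliability.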

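🗓️ April 11 incident: Codespaces create and start failures. A configuration change to an internal dependency escaped test coverage, causing approximately 75% of Codespaces users to experience create and start failures between 03:05 and 03:44 UTC. GitHub restored service quickly by rolling back the change.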

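🗄️ April 23 incident (1): service degradation caused by database resource contention. Between 07:00 and 07:20 UTC, resource contention on database hosts degraded multiple GitHub services, and 2–5% of requests returned errors. The issue was triggered by an interaction between query load and an ongoing schema change, and it recovered once the schema migration completed.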

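🚚 April 23 incident (2): migration failures caused by a configuration change. Between 19:13:50 and 22:11:00 UTC, GitHub's Migration service failed because a configuration change removed access for repository migration workers, affecting 837 migrations across 57 organizations. After GitHub restored access, normal operations resumed without further interruption.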

In April, we experienced three incidents that resulted in degraded performance across GitHub services.

April 11 03:05 UTC (lasting 39 minutes)

On April 11, 2025, from 03:05 UTC to 03:44 UTC, approximately 75% of Codespaces users faced create and start failures. These were caused by manual configuration changes to an internal dependency that escaped our test coverage. Our monitors and detection mechanism triggered, which helped us triage, revert the changes, and restore service health.

We are working on building additional gates and safer mechanisms for testing and rolling out such configuration changes. We expect no further disruptions.
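As a rough illustration of what a gate on a configuration rollout can look like, the sketch below applies a change to progressively larger fractions of hosts and rolls back at the first failed health check. It is not GitHub's mechanism: `staged_rollout`, its callbacks, and the stage fractions are hypothetical names and values assumed for this example.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[float],
    apply_to_fraction: Callable[[float], None],
    is_healthy: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Apply a change stage by stage; roll back on the first unhealthy check.

    All parameters are hypothetical hooks supplied by the caller:
    - stages: fractions of hosts to cover, in increasing order (e.g. 0.01, 0.1, 1.0)
    - apply_to_fraction: pushes the change to the given fraction of hosts
    - is_healthy: returns True if monitors look normal after a stage
    - rollback: reverts the change everywhere it was applied

    Returns True if every stage passed, False if the change was rolled back.
    """
    for fraction in stages:
        apply_to_fraction(fraction)
        if not is_healthy():
            # Stop widening the blast radius and revert immediately.
            rollback()
            return False
    return True
```

With stages such as `[0.01, 0.1, 1.0]`, a change that breaks at the second stage would reach at most 10% of hosts before being reverted, assuming the health check catches it.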

April 23 07:00 UTC (lasting 20 minutes)

On April 23, 2025, between 07:00 UTC and 07:20 UTC, multiple GitHub services experienced degradation caused by resource contention on database hosts. The resulting error rates, which ranged from 2–5% of total requests, led to intermittent service disruption for users. The issue was triggered by an interaction between query load and an ongoing schema change that led to connection saturation. The incident recovered after the schema migration was completed.

Our prior investments in monitoring and improved playbooks helped us effectively organize our first responder teams, leading to faster triaging of the incident. We have also identified a regression in our schema change tooling that led to increased resource utilization during schema changes, and we have reverted to a previous stable version.

To prevent similar issues in the future, we are reviewing the capacity of the database, improving monitoring and alerting systems, and implementing safeguards to reduce time to detection and mitigation. 
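One common kind of safeguard against this failure mode is to throttle schema-change work while the database looks saturated. The following is a minimal sketch under stated assumptions, not a description of GitHub's tooling: `ThrottleConfig`, `run_throttled_migration`, the 80% utilization limit, and the callback hooks are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ThrottleConfig:
    # Hypothetical values chosen for illustration only.
    max_connection_utilization: float = 0.8  # pause above 80% of the pool
    max_pauses_per_chunk: int = 5            # abort after this many pauses


def run_throttled_migration(
    chunks: Iterable[str],
    apply_chunk: Callable[[str], None],
    connection_utilization: Callable[[], float],
    pause: Callable[[], None],
    config: ThrottleConfig = ThrottleConfig(),
) -> List[str]:
    """Apply migration chunks, pausing while the database looks saturated.

    connection_utilization is a caller-supplied probe returning the fraction
    of the connection pool in use (0.0 to 1.0). Returns the chunks that were
    applied; raises RuntimeError if utilization stays above the limit.
    """
    applied: List[str] = []
    for chunk in chunks:
        pauses = 0
        # Wait for query load to leave headroom before touching the schema.
        while connection_utilization() > config.max_connection_utilization:
            if pauses >= config.max_pauses_per_chunk:
                raise RuntimeError(
                    f"aborting before {chunk!r}: database still saturated"
                )
            pause()
            pauses += 1
        apply_chunk(chunk)
        applied.append(chunk)
    return applied
```

In practice `pause` would sleep for a short interval and `connection_utilization` would read a real metric. Both are injected here so the control flow can be exercised without a database.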

April 23 19:13 UTC (lasting 42 minutes)

On April 23, 2025, between 19:13:50 UTC and 22:11:00 UTC, GitHub’s Migration service experienced elevated failures caused by a configuration change that removed access for repository migration workers. During this time, 837 migrations across 57 organizations were affected. Impacted migrations required a retry after the log message “Git source migration failed. Error message: An error occurred. Please contact support for further assistance.” was displayed. Once access was restored, normal operations resumed without further interruption.
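For callers that script their migrations, a retry wrapper with exponential backoff is one way to handle failures like the one quoted above. This is a sketch under assumptions, not part of any GitHub tool or API: `MigrationFailed`, `retry_with_backoff`, and the default delays are hypothetical, and `attempt` stands in for whatever function actually starts a migration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class MigrationFailed(Exception):
    """Hypothetical error raised when a single migration attempt fails."""


def retry_with_backoff(
    attempt: Callable[[], T],
    max_attempts: int = 4,
    base_delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call attempt() until it succeeds or max_attempts is reached.

    Waits base_delay_seconds * 2**n after the n-th failure (n starting at 0),
    and does not wait after the final failed attempt.
    """
    last_error: Optional[MigrationFailed] = None
    for attempt_number in range(max_attempts):
        try:
            return attempt()
        except MigrationFailed as error:
            last_error = error
            if attempt_number < max_attempts - 1:
                sleep(base_delay_seconds * 2 ** attempt_number)
    raise MigrationFailed(f"gave up after {max_attempts} attempts") from last_error
```

Backoff only helps once the underlying cause is fixed. During an outage like this one, retries would keep failing until access was restored, so a bounded attempt count avoids retrying indefinitely.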

As a result of this incident, we have implemented enhanced test coverage and refined monitoring thresholds to help prevent similar disruptions in the future.


Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.

The post GitHub Availability Report: April 2025 appeared first on The GitHub Blog.
