The GitHub Blog, February 13
GitHub Availability Report: January 2025

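In January 2025, GitHub experienced three incidents that degraded service performance: on January 9, a deployment introduced a problematic query that affected services; on January 13, a configuration change made Git operations unavailable; and on January 30, a hardware failure caused web requests to fail. Each incident was mitigated, and plans for future improvements were set out.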

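💻 January 9: a deployment introduced a problematic query that disrupted services; resolved by rolling back the deployment

🚧 January 13: a configuration change made Git operations unavailable; mitigated by rolling back the configuration

🧰 January 30: a hardware failure caused web requests to fail; hardware was failed over manually and improvements are planned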

In January, we experienced three incidents that resulted in degraded performance across GitHub services.

January 09 1:26 UTC (lasting 31 minutes)

On January 9, 2025, between 01:26 UTC and 01:56 UTC, GitHub experienced widespread disruption to many services, with users receiving 500 responses when trying to access various functionality. This was due to a deployment that introduced a query that saturated a primary database server. The error rate averaged 6% and peaked at 6.85% of update requests.

We mitigated the incident by identifying the source of the problematic query and rolling back the deployment. Our internal tooling and dashboards surfaced the data that helped us quickly identify the problematic query: it took 14 minutes from the time we engaged to find the errant query.

Even so, we are investing in tooling to detect problematic queries before deployment, both to prevent issues like this one and to reduce our time to detection and mitigation in the future.
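To illustrate what detecting a problematic query before deployment could look like, here is a minimal sketch. It is not GitHub's tooling: it uses SQLite from the Python standard library as a stand-in database, and the `repos` table, the `idx_repos_owner` index, and the helper names `find_full_scans` and `build_demo_db` are all hypothetical. The idea is simply to ask the database for each query's plan and flag any query that would scan a whole table.

```python
import sqlite3
from typing import List


def find_full_scans(conn: sqlite3.Connection, queries: List[str]) -> List[str]:
    """Return the queries whose plan includes a full scan (hypothetical helper)."""
    flagged = []
    for sql in queries:
        # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail).
        plan = conn.execute(f"EXPLAIN QUERY PLAN {sql}").fetchall()
        # SQLite describes a full scan as "SCAN ..." and an index lookup as "SEARCH ...".
        if any(row[3].startswith("SCAN") for row in plan):
            flagged.append(sql)
    return flagged


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory schema used only for this illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repos (id INTEGER PRIMARY KEY, owner_id INTEGER, name TEXT)"
    )
    conn.execute("CREATE INDEX idx_repos_owner ON repos (owner_id)")
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    candidate_queries = [
        "SELECT name FROM repos WHERE owner_id = 42",   # can use the index
        "SELECT id FROM repos WHERE name = 'octocat'",  # no index on name
    ]
    for sql in find_full_scans(conn, candidate_queries):
        print("full scan, review before deploying:", sql)
```

A check like this could run in CI against a schema copy; a real system would more likely look at plans and load on the production database engine, which this sketch does not attempt.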

January 13 23:35 UTC (lasting 49 minutes)

On January 13, 2025, between 23:35 UTC and 00:24 UTC, all Git operations were unavailable due to a configuration change related to traffic routing and testing that caused our internal load balancer to drop requests between services that Git relies upon.

We mitigated the incident by rolling back the configuration change.

We are improving our monitoring and deployment practices to reduce our time to detection and automated mitigation for issues like this in the future.

January 30 14:22 UTC (lasting 26 minutes)

On January 30, 2025, between 14:22 UTC and 14:48 UTC, web requests to github.com experienced failures (at peak the error rate was 44%), with the average successful request taking over three seconds to complete.

This outage was caused by a hardware failure in the caching layer that supports rate limiting. The impact was prolonged by the lack of automated failover for the caching layer. Following recovery, we manually failed over the primary to trusted hardware to ensure that the issue would not recur under similar circumstances.

As a result of this incident, we will be moving to a high availability cache configuration and adding resilience to cache failures at this layer, so that requests can still be handled if similar circumstances arise in the future.
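One way to picture resilience to cache failures in a rate limiter is sketched below. This is an assumption-laden illustration, not GitHub's design: the `CacheUnavailable` exception, the `incr` callable standing in for a shared cache client, and the `ResilientRateLimiter` class are all hypothetical. When the shared cache cannot be reached, the limiter falls back to counting in local process memory instead of failing the request.

```python
import time
from typing import Callable, Dict


class CacheUnavailable(Exception):
    """Raised by the (hypothetical) shared cache client when it cannot be reached."""


class ResilientRateLimiter:
    """Fixed-window limiter that degrades to local counters if the cache fails."""

    def __init__(
        self,
        incr: Callable[[str], int],  # hypothetical: increments a shared counter, returns new value
        limit: int,
        window_s: int = 60,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._incr = incr
        self._limit = limit
        self._window_s = window_s
        self._clock = clock
        self._local: Dict[str, int] = {}

    def allow(self, client_id: str) -> bool:
        window = int(self._clock() // self._window_s)
        key = f"{client_id}:{window}"
        try:
            count = self._incr(key)
        except CacheUnavailable:
            # Shared cache is down: keep serving, counting per process instead.
            # Drop counters from earlier windows so the dict stays small.
            suffix = f":{window}"
            self._local = {k: v for k, v in self._local.items() if k.endswith(suffix)}
            count = self._local.get(key, 0) + 1
            self._local[key] = count
        return count <= self._limit


if __name__ == "__main__":
    def broken_incr(key: str) -> int:
        raise CacheUnavailable(key)

    limiter = ResilientRateLimiter(broken_incr, limit=2)
    print([limiter.allow("client-a") for _ in range(3)])
```

The trade-off in this sketch is that local counters are per process, so the effective limit is looser while the cache is down; the alternative of rejecting requests whenever the cache fails would turn a cache outage into a request outage.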


Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.

The post GitHub Availability Report: January 2025 appeared first on The GitHub Blog.
