Gabes Virtual World, June 11, 22:55
vCenter appliance database issue

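This post describes how the author resolved a database problem that kept a vCenter Appliance from starting after a power outage in a homelab environment. By checking the log files, the author found that the database could not be loaded, and eventually restored vCenter by resetting the transaction log with the pg_resetxlog command. The process involved analyzing the database errors, using the command, and recognizing the risk of data loss. The author stresses that the procedure is not officially supported and shares the detailed steps.

💡 **Diagnosis:** The author first looked at `/var/log/vmware/vpxd/vpxd.log` and found that vpxd could not connect to the database, then checked `/storage/db/vpostgres/pg_log/postgresql.log` and found that the database system had been interrupted and had invalid checkpoint records, which prevented the database from starting.

⚠️ **Solution:** For the error "PANIC: could not locate a valid checkpoint record", the author did some research and decided to use the `pg_resetxlog` command to reset the database's write-ahead log.

🛠️ **Steps:** First, the database storage location is found in `/etc/vmware-vpx/embedded_db.cfg`. Then, because the command has to run as the PostgreSQL superuser, the author switched to the vpostgres user with `su vpostgres -s /bin/sh` and ran `pg_resetxlog -f /storage/db/vpostgres`.

✅ **Verification:** After resetting the transaction log, the author tried to start the vpxd service but still ran into problems. Rebooting the vCenter Appliance finally resolved the issue, and vCenter ran normally again.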

Recently I had an issue in my homelab environment. Because of some power outages, my vCenter Appliance hadn't been shut down correctly and vCenter no longer started properly. After some searching I found that the database could not be loaded. In the VMware KBs I couldn't find anything that fixes the startup of the database itself. Most articles are about resetting the database, but even though my environment is quite small, I had VSAN running in it and was afraid of what would happen if I connected a clean vCenter to the existing hosts. So I decided to dive in and try to fix it at the database level.

To see what was going on, I first checked the vpxd.log (/var/log/vmware/vpxd/vpxd.log) and found that a login to the database was not possible:

info vpxd[7FF9A8AD97A0] [Originator@6876 sub=vpxdVdb] [VpxdVdb::SetDBType] Logging in to DSN: VMware VirtualCenter with username vc
error vpxd[7FF9A8AD97A0] [Originator@6876 sub=vpxdVdb] [VpxdVdb::SetDBType] Failed to connect to database: ODBC error: (08001) - [unixODBC]Could not connect to the server; --> Connection refused [127.0.0.1:5432].  Retry attempt: 1 ...
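As a rough sketch of this first check, the snippet below counts how often vpxd reported a failed database connection. The function name `count_db_connect_failures` is hypothetical; the default log path and the search string are taken from the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: count "Failed to connect to database" lines in a vpxd log.
# The default path is the one used in this post; pass another file as $1 to override.
count_db_connect_failures() {
    log_file="${1:-/var/log/vmware/vpxd/vpxd.log}"
    # grep -c prints the number of matching lines (0 when there are none)
    grep -c 'Failed to connect to database' "$log_file"
}
```

Running `count_db_connect_failures` on the appliance prints a single number; anything above zero points at the database rather than at vpxd itself.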
Then I wanted to check if the database was running at all. In the database logs (/storage/db/vpostgres/pg_log/postgresql.log) I saw the following lines:
2016-09-10 19:02:12.294 UTC 57d458b4.21d8 0   LOG:  database system was interrupted; last known up at 2016-05-16 22:58:35 UTC
2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   LOG:  unexpected pageaddr E/C8000000 in log segment 000000010000000E000000CC, offset 0
2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   LOG:  invalid primary checkpoint record
2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   LOG:  unexpected pageaddr E/C8000000 in log segment 000000010000000E000000CC, offset 0
2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   LOG:  invalid secondary checkpoint record
2016-09-10 19:02:14.920 UTC 57d458b4.21d8 0   PANIC:  could not locate a valid checkpoint record
2016-09-10 19:02:14.920 UTC 57d458b1.20bf 0   LOG:  startup process (PID 8664) was terminated by signal 6: Aborted
2016-09-10 19:02:14.920 UTC 57d458b1.20bf 0   LOG:  aborting startup due to startup process failure
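To pull the decisive line out of a long PostgreSQL log, something like the following could be used. It is a minimal sketch: the function name `find_checkpoint_panic` is made up for illustration, and the pattern only matches the exact PANIC message shown above.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent checkpoint PANIC line, if any.
# The default path is the log location mentioned in this post.
find_checkpoint_panic() {
    log_file="${1:-/storage/db/vpostgres/pg_log/postgresql.log}"
    # The log pads "PANIC:" with spaces, so allow any amount of whitespace
    grep 'PANIC:[[:space:]]*could not locate a valid checkpoint record' "$log_file" | tail -n 1
}
```

An empty result means this particular error is not in the log, and the rest of this post probably does not apply.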
Some Google assistance on "PANIC:  could not locate a valid checkpoint record" taught me that a checkpoint had probably not been cleared properly because of the unclean shutdown. The suggested solutions talked about using pg_resetxlog, which resets the write-ahead log and other control information of a PostgreSQL database cluster.
** Warning ** I can't find anything on this command in the VMware KBs, so I want to emphasise that the next steps are unsupported, and I expect resetting the write-ahead log will also cause some data loss. You're on your own from here :-)
The command line for the pg_resetxlog would be:
/opt/vmware/vpostgres/9.3/bin/pg_resetxlog -f  {Location of the database}
First I needed to find out where the database was located. This can be found in /etc/vmware-vpx/embedded_db.cfg on the following line:
EMB_DB_STORAGE='/storage/db/vpostgres'
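The lookup can also be scripted. The sketch below assumes the value is single-quoted exactly as in the line above; the function name `get_db_storage` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of EMB_DB_STORAGE from the embedded DB config.
# Assumes the KEY='value' form shown in this post; other quoting styles are not handled.
get_db_storage() {
    cfg_file="${1:-/etc/vmware-vpx/embedded_db.cfg}"
    # Print only the text between the single quotes on the EMB_DB_STORAGE line
    sed -n "s/^EMB_DB_STORAGE='\(.*\)'\$/\1/p" "$cfg_file"
}
```

On my appliance this would print /storage/db/vpostgres, which is the directory passed to pg_resetxlog.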
Then when running the pg_resetxlog command, I received an error:
/opt/vmware/vpostgres/9.3/bin/pg_resetxlog -f  /storage/db/vpostgres
You must run pg_resetxlog as the PostgreSQL superuser
Hmm, the superuser? When looking at the directory contents of the /storage/db/vpostgres directory, I saw the user vpostgres had rights on this directory. So I tried running the command as the vpostgres user:
su vpostgres -s /bin/sh
/opt/vmware/vpostgres/9.3/bin/pg_resetxlog -f  /storage/db/vpostgres
This returned: Transaction log reset
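Stock PostgreSQL 9.3 documents a -n option for pg_resetxlog that prints the values it would write and then exits without changing anything. I did not use it myself, and I am assuming the binary bundled with the appliance behaves like the stock one, but a dry run before the real reset could look like the sketch below. The function name `preview_reset` is hypothetical, and it would have to be run as the vpostgres user, just like the real command.

```shell
#!/bin/sh
# Hypothetical helper: dry run of pg_resetxlog for a given data directory.
# Assumption: the appliance's pg_resetxlog supports -n like stock PostgreSQL 9.3.
preview_reset() {
    data_dir="$1"
    # Binary path from this post; a different one can be passed as the second argument
    resetxlog_bin="${2:-/opt/vmware/vpostgres/9.3/bin/pg_resetxlog}"
    # -n: show the control values that would be used, modify nothing
    "$resetxlog_bin" -n "$data_dir"
}
```

Calling `preview_reset /storage/db/vpostgres` from the vpostgres shell would then show what the reset is going to do before -f is used for real.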
I then tried to start vpxd again (service vmware-vpxd start), but again it took a long time. I could see in the logs that it was waiting for services on port 8089, and since I had stopped and started a number of services during my troubleshooting, I decided to just reboot the appliance. After the reboot, vCenter was up and running again and I could reconnect without any issues.

