arXiv:2405.14078v2 Announce Type: replace Abstract: Multi-agent reinforcement learning (MARL) has witnessed a remarkable surge in interest, fueled by the empirical success of single-agent reinforcement learning (RL). In this study, we consider a distributed Q-learning scenario in which a number of agents cooperatively solve a sequential decision-making problem without access to the central reward function, which is the average of the local reward functions. In particular, we provide a finite-time analysis of a distributed Q-learning algorithm and establish a new sample complexity bound of $\tilde{\mathcal{O}}\left( \min\left\{ \frac{1}{\epsilon^2}\frac{t_{\text{mix}}}{(1-\gamma)^6 d_{\min}^4},\ \frac{1}{\epsilon}\frac{\sqrt{|\mathcal{S}||\mathcal{A}|}}{(1-\sigma_2(\boldsymbol{W}))(1-\gamma)^4 d_{\min}^3} \right\}\right)$ under the tabular lookup table setting.
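
Below is a minimal sketch of the kind of consensus-based distributed Q-learning the abstract describes: each agent performs a TD update using only its own local reward and then averages its Q-table with its neighbors through a doubly stochastic mixing matrix $\boldsymbol{W}$ (whose second-largest singular value $\sigma_2(\boldsymbol{W})$ appears in the bound above). The toy MDP, the ring-graph $\boldsymbol{W}$, and all constants here are illustrative assumptions, not the paper's exact algorithm or setting.

```python
import numpy as np

N_AGENTS, N_STATES, N_ACTIONS = 4, 5, 3
GAMMA, ALPHA = 0.9, 0.1
rng = np.random.default_rng(0)

# Doubly stochastic mixing matrix W over a ring graph; sigma_2(W) < 1
# controls how fast the agents reach consensus (assumed topology).
W = np.zeros((N_AGENTS, N_AGENTS))
for i in range(N_AGENTS):
    W[i, i] = 0.5
    W[i, (i - 1) % N_AGENTS] = 0.25
    W[i, (i + 1) % N_AGENTS] = 0.25

# Toy MDP: shared transition kernel, but each agent observes only its own reward.
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
local_rewards = rng.uniform(0.0, 1.0, size=(N_AGENTS, N_STATES, N_ACTIONS))

Q = np.zeros((N_AGENTS, N_STATES, N_ACTIONS))
state = 0
for t in range(5000):
    action = rng.integers(N_ACTIONS)                   # uniform exploration policy
    next_state = rng.choice(N_STATES, p=P[state, action])
    # Local TD update: each agent uses only its local reward observation.
    td_target = (local_rewards[:, state, action]
                 + GAMMA * Q[:, next_state, :].max(axis=1))
    Q[:, state, action] += ALPHA * (td_target - Q[:, state, action])
    # Consensus step: mix Q-tables with neighbors via W, so every agent
    # implicitly tracks the average of the local reward functions.
    Q = np.einsum('ij,jsa->isa', W, Q)
    state = next_state
```

Under this kind of scheme, the mixing step drives all agents' Q-tables toward a common estimate for the reward averaged over the network, which is why the spectral gap $1-\sigma_2(\boldsymbol{W})$ enters the sample complexity.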