arXiv:2508.03782v1 Announce Type: cross Abstract: The performance of decoders in Quantum Error Correction (QEC) is key to realizing practical quantum computers. In recent years, Graph Neural Networks (GNNs) have emerged as a promising approach, but their training methodologies are not yet well-established. It is generally expected that transferring theoretical knowledge from classical algorithms like Minimum Weight Perfect Matching (MWPM) to GNNs, a technique known as knowledge distillation, can effectively improve performance. In this work, we test this hypothesis by rigorously comparing two models based on a Graph Attention Network (GAT) architecture that incorporates temporal information as node features. The first is a purely data-driven model (baseline) trained only on ground-truth labels, while the second incorporates a knowledge distillation loss based on the theoretical error probabilities from MWPM. Using public experimental data from Google, our evaluation reveals that while the final test accuracy of the knowledge distillation model was nearly identical to the baseline, its training loss converged more slowly, and the training time increased by a factor of approximately five. This result suggests that modern GNN architectures possess a high capacity to efficiently learn complex error correlations directly from real hardware data, without guidance from approximate theoretical models.