arXiv:2507.05424v1 Announce Type: cross Abstract: Large language models (LLMs) can leverage both contextual and parametric knowledge, but how they prioritize and integrate these sources remains underexplored. We introduce CoPE, a novel evaluation framework that systematically measures contextual knowledge (CK) and parametric knowledge (PK) across models and languages. Using our MultiWikiAtomic dataset in English, Spanish, and Danish, we analyze how LLMs integrate context, prioritize information, and incorporate PK in open-ended question answering. Our analysis uncovers a phenomenon we call lost-in-the-later, in which LLMs tend to overlook or deprioritize information that appears later in a given context, revealing a strong positional bias that affects contextual grounding. We further find that reasoning models, as well as non-reasoning models prompted with chain-of-thought (CoT), use context even less than non-reasoning models without CoT and fail to mitigate the lost-in-the-later effect. CoT prompting, in particular, results in lower recall and shorter responses, leading to degraded contextual grounding. Based on these insights, we design prompt-based methods to leverage input context more effectively. A case study applying CoPE to summarization demonstrates that CK-informed prompting improves factual grounding and reduces hallucination.