arXiv:2507.05307v1 Announce Type: cross Abstract: The integration of Large Language Models (LLMs) into browser extensions has revolutionized web browsing, enabling sophisticated functionalities like content summarization, intelligent translation, and context-aware writing assistance. However, these AI-powered extensions introduce unprecedented challenges in testing and reliability assurance. Traditional browser extension testing approaches fail to address the non-deterministic behavior, context-sensitivity, and complex web environment integration inherent to LLM-powered extensions. Similarly, existing LLM testing methodologies operate in isolation from browser-specific contexts, creating a critical gap in effective evaluation frameworks. To bridge this gap, we present ASSURE, a modular automated testing framework specifically designed for AI-powered browser extensions. ASSURE comprises three principal components: (1) a modular test case generation engine that supports plugin-based extension of testing scenarios, (2) an automated execution framework that orchestrates the complex interactions between web content, extension processing, and AI model behavior, and (3) a configurable validation pipeline that systematically evaluates behavioral consistency and security invariants rather than relying on exact output matching. Our evaluation across six widely-used AI browser extensions demonstrates ASSURE's effectiveness, identifying 531 distinct issues spanning security vulnerabilities, metamorphic relation violations, and content alignment problems. ASSURE achieves 6.4x improved testing throughput compared to manual approaches, detecting critical security vulnerabilities within 12.4 minutes on average. This efficiency makes ASSURE practical for integration into development pipelines, offering a comprehensive solution to the unique challenges of testing AI-powered browser extensions.