MarkTechPost@AI, the day before yesterday, 04:40
An Advanced Coding Implementation: Mastering Browser‑Driven AI in Google Colab with Playwright, browser_use Agent & BrowserContext, LangChain, and Gemini

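This article shows how to build a browser-driven AI agent in Google Colab using Playwright's headless Chromium engine, the browser_use library, Google's Gemini model through LangChain, and pydantic for safe API-key handling. With tools such as getpass and asyncio, it gives you an interactive agent platform without leaving the Colab environment. The setup can be used for tasks such as scraping data and summarizing articles, and it is easy to extend.

💻 Environment setup: The article first shows how to install the required libraries in Google Colab, including Playwright, python-dotenv, the LangChain GoogleGenerativeAI connector, and browser-use, and how to download the necessary browser binaries.

🔑 Secure configuration: The article stresses safe handling of the API key, reading it with getpass and storing it in pydantic's SecretStr. It also disables anonymous telemetry reporting by setting the ANONYMIZED_TELEMETRY environment variable to "false".

🤖 Core components: The article walks through the Browser and BrowserContext setup, including headless browser initialization, waiting for network idle, element highlighting, and session recording. It also introduces the agent_loop function, which wraps the "think-and-browse" cycle that runs the agent and returns its result.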

In this tutorial, we will learn how to harness the power of a browser‑driven AI agent entirely within Google Colab. We will utilize Playwright’s headless Chromium engine, along with the browser_use library’s high-level Agent and BrowserContext abstractions, to programmatically navigate websites, extract data, and automate complex workflows. We will wrap Google’s Gemini model via the langchain_google_genai connector to provide natural‑language reasoning and decision‑making, secured by pydantic’s SecretStr for safe API‑key handling. With getpass managing credentials, asyncio orchestrating non‑blocking execution, and optional .env support via python-dotenv, this setup will give you an end‑to‑end, interactive agent platform without ever leaving your notebook environment.

!apt-get update -qq
!apt-get install -y -qq chromium-browser chromium-chromedriver fonts-liberation
!pip install -qq playwright python-dotenv langchain-google-genai browser-use
!playwright install

We first refresh the system package lists and install headless Chromium, its WebDriver, and the Liberation fonts to enable browser automation. We then install Playwright along with python-dotenv, the LangChain GoogleGenerativeAI connector, and browser-use, and finally download the necessary browser binaries via playwright install.
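Before moving on, it can help to confirm that the installs actually landed in the current runtime. The sketch below is not part of the original notebook: it relies only on the standard library's importlib.util.find_spec, and the helper name check_modules and the REQUIRED list are hypothetical. The listed names are the top-level import names used later in this tutorial.

```python
from importlib.util import find_spec


def check_modules(names):
    """Return {module_name: bool}, True when the top-level module can be located."""
    return {name: find_spec(name) is not None for name in names}


# Top-level import names used later in the tutorial (assumed list, adjust as needed).
REQUIRED = ["playwright", "dotenv", "langchain_google_genai", "browser_use", "pydantic"]

if __name__ == "__main__":
    for name, ok in check_modules(REQUIRED).items():
        print(f"{name:28s} {'ok' if ok else 'MISSING'}")
```

Any module reported as MISSING points at an install step that did not complete, which is easier to diagnose here than from an ImportError several cells later.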

import os
import asyncio
from getpass import getpass
from pydantic import SecretStr
from langchain_google_genai import ChatGoogleGenerativeAI
from browser_use import Agent, Browser, BrowserContextConfig, BrowserConfig
from browser_use.browser.browser import BrowserContext

We bring in the core Python utilities, os for environment management and asyncio for asynchronous execution, plus getpass and pydantic’s SecretStr for secure API‑key input and storage. We then load LangChain’s Gemini wrapper (ChatGoogleGenerativeAI) and the browser_use toolkit (Agent, Browser, BrowserContextConfig, BrowserConfig, and BrowserContext) to configure and drive a headless browser agent.
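One way to combine os and getpass for key handling is to prefer a key that is already in the environment and prompt only when it is missing. This is a minimal sketch under that assumption, not the notebook's own code; the helper name resolve_api_key and its prompt and environ parameters are hypothetical and exist mainly so the function can be exercised without a terminal.

```python
import os
from getpass import getpass


def resolve_api_key(env_var="GEMINI_API_KEY", prompt=getpass, environ=None):
    """Return the key from the environment if set, otherwise ask for it without echo."""
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        # getpass hides the typed characters, so the key does not appear in cell output
        key = prompt(f"Enter your {env_var}: ").strip()
    if not key:
        raise ValueError(f"{env_var} is empty")
    return key
```

If a .env file is used, calling python-dotenv's load_dotenv() before this helper would populate os.environ first. The returned string can then be wrapped as SecretStr(resolve_api_key()) so that it is masked when printed.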

os.environ["ANONYMIZED_TELEMETRY"] = "false"

We disable anonymous usage reporting by setting the ANONYMIZED_TELEMETRY environment variable to “false”. This is the opt-out switch that the browser_use library reads, so setting it before the agent is created keeps browser_use from sending telemetry data back to its maintainers.

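async def setup_browser(headless: bool = True):
    # Launch the browser; headless=True runs it without a visible window
    browser = Browser(config=BrowserConfig(headless=headless))
    context = BrowserContext(
        browser=browser,
        config=BrowserContextConfig(
            wait_for_network_idle_page_load_time=5.0,  # network-idle wait on page load
            highlight_elements=True,                   # outline elements the agent interacts with
            save_recording_path="./recordings",        # where session recordings are written
        )
    )
    return browser, context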

This asynchronous helper initializes a headless (or headed) Browser instance and wraps it in a BrowserContext configured to wait for network‑idle page loads, visually highlight elements during interactions, and save a recording of each session under ./recordings. It then returns both the browser and its ready‑to‑use context for your agent’s tasks.
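The tutorial does not say whether the ./recordings folder is created automatically, so creating it up front is a cheap precaution. The sketch below uses only pathlib; the helper names ensure_dir and list_recordings are hypothetical, and nothing here is part of the browser_use API.

```python
from pathlib import Path


def ensure_dir(path="./recordings"):
    """Create the directory (and any parents) if needed; return it as an absolute Path."""
    target = Path(path).expanduser().resolve()
    target.mkdir(parents=True, exist_ok=True)
    return target


def list_recordings(path="./recordings"):
    """Return the sorted file names currently stored in the recordings directory."""
    return sorted(p.name for p in Path(path).iterdir() if p.is_file())
```

Calling ensure_dir() before setup_browser(), and list_recordings() after a session, gives a quick way to check that a recording was actually saved.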

async def agent_loop(llm, browser_context, query, initial_url=None):
    # If a starting URL was given, open it in a tab before the agent begins
    initial_actions = [{"open_tab": {"url": initial_url}}] if initial_url else None
    agent = Agent(
        task=query,
        llm=llm,
        browser_context=browser_context,
        use_vision=True,
        generate_gif=False,
        initial_actions=initial_actions,
    )
    result = await agent.run()
    # Return the agent's final answer, or None if the run produced nothing
    return result.final_result() if result else None

This async helper encapsulates one “think‐and‐browse” cycle: it spins up an Agent configured with your LLM, the browser context, and optional initial URL tab, leverages vision when available, and disables GIF recording. Once you call agent_loop, it runs the agent through its steps and returns the agent’s final result (or None if nothing is produced).
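An agent run can take a long time on a slow or looping page, so it may be worth bounding each call. The sketch below wraps any coroutine with asyncio.wait_for; it also isolates the initial_actions rule from agent_loop so it can be checked on its own. The names build_initial_actions, run_with_timeout, and fake_agent_loop are hypothetical, and fake_agent_loop is only a stand-in so the example runs without a browser or API key.

```python
import asyncio


def build_initial_actions(initial_url=None):
    """Mirror the tutorial's rule: one open_tab action when a URL is given, else None."""
    return [{"open_tab": {"url": initial_url}}] if initial_url else None


async def run_with_timeout(coro, timeout_s=120.0):
    """Await `coro`; return None if it does not finish within `timeout_s` seconds."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for cancels the wrapped coroutine before raising
        return None


async def fake_agent_loop(delay_s, answer):
    """Stand-in for agent_loop: sleeps, then returns a canned answer."""
    await asyncio.sleep(delay_s)
    return answer
```

In the notebook this would be used as `answer = await run_with_timeout(agent_loop(llm, context, query), timeout_s=180)`. How cleanly the browser context recovers after a cancelled run is not covered by the tutorial, so treat a timeout as a reason to inspect or recreate the context.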

async def main():
    # Read the key without echoing it, then wrap it so it is masked when printed
    raw_key = getpass("Enter your GEMINI_API_KEY: ")
    os.environ["GEMINI_API_KEY"] = raw_key
    api_key = SecretStr(raw_key)
    model_name = "gemini-2.5-flash-preview-04-17"
    llm = ChatGoogleGenerativeAI(model=model_name, api_key=api_key)
    browser, context = await setup_browser(headless=True)
    try:
        # Interactive loop: a blank prompt ends the session
        while True:
            query = input("\nEnter prompt (or leave blank to exit): ").strip()
            if not query:
                break
            url = input("Optional URL to open first (or blank to skip): ").strip() or None
            print("\n Running agent…")
            answer = await agent_loop(llm, context, query, initial_url=url)
            print("\n Search Results\n" + "-"*40)
            print(answer or "No results found")
            print("-"*40)
    finally:
        # Always release the browser, even if a run raised an exception
        print("Closing browser…")
        await browser.close()

await main()

Finally, this main coroutine drives the entire Colab session: it securely prompts for your Gemini API key (using getpass and SecretStr), sets up the ChatGoogleGenerativeAI LLM and a headless Playwright browser context, then enters an interactive loop where it reads your natural‑language prompts (and optional start URL), invokes agent_loop to perform the browser‑driven AI task, and prints the results. The finally block ensures the browser closes cleanly even if a run fails. The trailing await main() relies on the notebook's support for top-level await; in a plain Python script, use asyncio.run(main()) instead.
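The interactive loop suits exploration, but the same pieces can also run a fixed list of tasks unattended. The following is a minimal sketch of such a batch runner, assuming tasks are (query, url) pairs; run_batch and fake_run_one are hypothetical names, and fake_run_one stands in for a real call to agent_loop so the example runs on its own.

```python
import asyncio


async def run_batch(run_one, tasks):
    """Run (query, url) pairs one after another, recording an answer or an error for each."""
    results = []
    for query, url in tasks:
        try:
            answer = await run_one(query, url)
            results.append({"query": query, "answer": answer, "error": None})
        except Exception as exc:  # keep going so one failed task does not stop the batch
            results.append({"query": query, "answer": None, "error": repr(exc)})
    return results


async def fake_run_one(query, url):
    """Stand-in for agent_loop(llm, context, query, initial_url=url)."""
    if not query:
        raise ValueError("empty query")
    return f"{query} @ {url}"
```

In the notebook, run_one could be `lambda q, u: agent_loop(llm, context, q, initial_url=u)`, with the call wrapped in the same try/finally as main() so the browser is still closed at the end. Tasks run sequentially here because they share one browser context.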

In conclusion, by following this guide, you now have a reproducible Colab template that integrates browser automation, LLM reasoning, and secure credential management into a single cohesive pipeline. Whether you’re scraping real‑time market data, summarizing news articles, or automating reporting tasks, the combination of Playwright, browser_use, and LangChain’s Gemini interface provides a flexible foundation for your next AI‑powered project. Feel free to extend the agent’s capabilities, re‑enable GIF recording, add custom navigation steps, or swap in other LLM backends to tailor the workflow precisely to your research or production needs.



The post An Advanced Coding Implementation: Mastering Browser‑Driven AI in Google Colab with Playwright, browser_use Agent & BrowserContext, LangChain, and Gemini appeared first on MarkTechPost.
