MarkTechPost@AI, March 29, 00:15
Tutorial to Create a Data Science Agent: A Code Implementation using gemini-2.0-flash-lite model through Google API, google.generativeai, Pandas and IPython.display for Interactive Data Analysis

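This article shows how to combine Python's Pandas library, Google's google.generativeai package, and a Gemini model (gemini-2.0-flash-lite in the code) to build a data analysis agent. After installing the required libraries, configuring an API key, and importing IPython's display utilities, it walks step by step through converting a DataFrame to Markdown format and using natural language queries to generate insights about the data. The approach highlights the potential of combining traditional data analysis tools with modern AI-driven methods, simplifying data querying and interpretation for data scientists.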

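💻 Environment setup: First, install the Pandas and google-generativeai libraries with pip, laying the groundwork for data manipulation and AI analysis.

🔑 Imports and configuration: Import Pandas, google.generativeai, and Markdown, configure the Google API key, and initialize the Gemini model for the analysis that follows.

📊 Data preparation: Create a Pandas DataFrame of sample sales data with product, category, region, units sold, and price columns to serve as the data source.

🤖 Building the data analysis agent: Define an ask_gemini_about_data function that takes a DataFrame and a natural language query as input and uses the Gemini model to generate the analysis.

💡 Worked examples: Five example queries show how to question the Gemini model in natural language, for example to compute total units sold, find the best-selling product, or compute the average price.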

In this tutorial, we integrate Pandas, Python's data manipulation library, with Google's generative models through the google.generativeai package and the gemini-2.0-flash-lite model. After setting up the environment with the necessary libraries, configuring the Google API key, and importing IPython's display utilities, the code builds, step by step, a data science agent that analyzes a sample sales dataset. The example converts a DataFrame into markdown format and then uses natural language queries to generate insights about the data, highlighting the potential of combining traditional data analysis tools with modern AI-driven methods.

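!pip install pandas google-generativeai tabulate --quiet

First, we quietly install the Pandas and google-generativeai libraries, setting up the environment for data manipulation and AI-powered analysis. The tabulate package is added because DataFrame.to_markdown, which is used later, depends on it; some notebook environments already include it.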

import pandas as pd
import google.generativeai as genai
from IPython.display import Markdown

We import Pandas for data manipulation, google.generativeai for accessing Google’s generative AI capabilities, and Markdown from IPython.display to render markdown-formatted outputs.
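The later examples print the raw response text rather than rendering it. A minimal sketch of how Markdown could be used to render an answer in a notebook; the helper name show_answer is hypothetical, and the fallback to print is an assumption for environments where IPython is not installed:

```python
def show_answer(text: str) -> None:
    """Render text as Markdown via IPython if available, otherwise print it."""
    try:
        from IPython.display import Markdown, display
    except ImportError:
        # IPython is not installed: plain printing is the fallback.
        print(text)
        return
    # Inside a notebook this shows formatted Markdown output.
    display(Markdown(text))


show_answer("**Answer:** example text")
```

In a notebook, passing the string returned by the model to this helper would display tables and bold text as formatted output instead of raw markup.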

GOOGLE_API_KEY = "Use Your API Key Here"
genai.configure(api_key=GOOGLE_API_KEY)
model = genai.GenerativeModel('gemini-2.0-flash-lite')

We assign a placeholder API key, configure the google.generativeai client with it, and initialize the ‘gemini-2.0-flash-lite’ GenerativeModel for generating content.
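A key hardcoded in a notebook is easy to leak when the notebook is shared. A minimal sketch of reading it from an environment variable instead; the helper name load_api_key and the variable name GOOGLE_API_KEY are assumptions for illustration, not part of the original code:

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError if the variable is unset or blank, so the failure
    happens before any API call is attempted.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

The returned value could then be passed along as genai.configure(api_key=load_api_key()).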

data = {'Product': ['Laptop', 'Mouse', 'Keyboard', 'Monitor', 'Webcam', 'Headphones'],
        'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics', 'Electronics'],
        'Region': ['North', 'South', 'East', 'West', 'North', 'South'],
        'Units Sold': [150, 200, 180, 120, 90, 250],
        'Price': [1200, 25, 75, 300, 50, 100]}
sales_df = pd.DataFrame(data)
print("Sample Sales Data:")
print(sales_df)
print("-" * 30)

Here, we create a Pandas DataFrame named sales_df containing sample sales data for various products, and then print the DataFrame followed by a separator line to visually distinguish the output.

def ask_gemini_about_data(dataframe, query):
    """
    Asks the Gemini Pro model a question about the given Pandas DataFrame.

    Args:
        dataframe: The Pandas DataFrame to analyze.
        query: The natural language question about the DataFrame.

    Returns:
        The response from the Gemini Pro model as a string.
    """
    prompt = f"""You are a data analysis agent. Analyze the following pandas DataFrame and answer the question.

    DataFrame:
    ```
    {dataframe.to_markdown(index=False)}
    ```

    Question: {query}

    Answer:
    """
    response = model.generate_content(prompt)
    return response.text

Here, we construct a markdown-formatted prompt from a Pandas DataFrame and a natural language query, then send it to the configured Gemini model (gemini-2.0-flash-lite) and return its analytical response as text.
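To see what the model actually receives, the prompt can be assembled and printed without calling the API. The sketch below is an illustration, not the tutorial's code: build_prompt and table_as_text are hypothetical helpers, and the fallback to to_string is an assumption for environments where the optional tabulate package is missing.

```python
import pandas as pd


def table_as_text(dataframe: pd.DataFrame) -> str:
    """Render a DataFrame as text suitable for embedding in a prompt.

    DataFrame.to_markdown needs the optional 'tabulate' package; when it is
    missing, pandas raises ImportError and we fall back to to_string.
    """
    try:
        return dataframe.to_markdown(index=False)
    except ImportError:
        return dataframe.to_string(index=False)


def build_prompt(dataframe: pd.DataFrame, query: str) -> str:
    """Assemble the prompt text (hypothetical helper, mirrors the tutorial's wording)."""
    return (
        "You are a data analysis agent. "
        "Analyze the following pandas DataFrame and answer the question.\n\n"
        f"DataFrame:\n{table_as_text(dataframe)}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Small demo table so the printed prompt stays short.
demo_df = pd.DataFrame({"Product": ["Laptop", "Mouse"], "Units Sold": [150, 200]})
demo_prompt = build_prompt(demo_df, "Which product sold more units?")
print(demo_prompt)
```

Keeping prompt construction separate from the API call also makes it possible to inspect or test the prompt offline.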

# Query 1: What is the total number of units sold across all products?
query1 = "What is the total number of units sold across all products?"
response1 = ask_gemini_about_data(sales_df, query1)
print(f"Question 1: {query1}")
print(f"Answer 1:\n{response1}")
print("-" * 30)
Query 1 Output
# Query 2: Which product had the highest number of units sold?
query2 = "Which product had the highest number of units sold?"
response2 = ask_gemini_about_data(sales_df, query2)
print(f"Question 2: {query2}")
print(f"Answer 2:\n{response2}")
print("-" * 30)
Query 2 Output
# Query 3: What is the average price of the products?
query3 = "What is the average price of the products?"
response3 = ask_gemini_about_data(sales_df, query3)
print(f"Question 3: {query3}")
print(f"Answer 3:\n{response3}")
print("-" * 30)
Query 3 Output
# Query 4: Show me the products sold in the 'North' region.
query4 = "Show me the products sold in the 'North' region."
response4 = ask_gemini_about_data(sales_df, query4)
print(f"Question 4: {query4}")
print(f"Answer 4:\n{response4}")
print("-" * 30)
Query 4 Output
# Query 5. More complex query: Calculate the total revenue for each product.
query5 = "Calculate the total revenue (Units Sold * Price) for each product and present it in a table."
response5 = ask_gemini_about_data(sales_df, query5)
print(f"Question 5: {query5}")
print(f"Answer 5:\n{response5}")
print("-" * 30)
Query 5 Output
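
Because a language model can make arithmetic mistakes, it may be worth cross-checking its answers against values computed directly in Pandas. The following is a minimal sketch of that check for the five queries above, rebuilding the same sample data so it runs on its own; the variable names are chosen for illustration.

```python
import pandas as pd

# Same sample data as the tutorial's sales_df.
sales_df = pd.DataFrame({
    "Product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Webcam", "Headphones"],
    "Category": ["Electronics"] * 6,
    "Region": ["North", "South", "East", "West", "North", "South"],
    "Units Sold": [150, 200, 180, 120, 90, 250],
    "Price": [1200, 25, 75, 300, 50, 100],
})

# Query 1: total units sold across all products.
total_units = int(sales_df["Units Sold"].sum())

# Query 2: product with the highest number of units sold.
top_product = sales_df.loc[sales_df["Units Sold"].idxmax(), "Product"]

# Query 3: average price of the products.
average_price = float(sales_df["Price"].mean())

# Query 4: products sold in the 'North' region.
north_products = sales_df.loc[sales_df["Region"] == "North", "Product"].tolist()

# Query 5: revenue (Units Sold * Price) per product.
revenue_table = sales_df.assign(Revenue=sales_df["Units Sold"] * sales_df["Price"])[
    ["Product", "Revenue"]
]

print(total_units, top_product, round(average_price, 2), north_products)
print(revenue_table.to_string(index=False))
```

Comparing these values with the model's responses gives a quick sanity check before trusting an answer on larger or less familiar data.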

In conclusion, the tutorial illustrates how Pandas, the google.generativeai package, and a Gemini model can together make data analysis tasks more interactive. The approach simplifies querying and interpreting data and opens up avenues for more advanced use cases such as data cleaning, feature engineering, and exploratory data analysis. By using these tools within the familiar Python ecosystem, data scientists can more easily derive meaningful insights from complex datasets.



The post Tutorial to Create a Data Science Agent: A Code Implementation using gemini-2.0-flash-lite model through Google API, google.generativeai, Pandas and IPython.display for Interactive Data Analysis appeared first on MarkTechPost.
