Advanced AI Large-Model Applications Series (Part 1): Python Basics

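Data Types

Numbers

In Python, numeric types are among the basic data types and represent numeric values. Integers can be written in decimal, binary, octal, or hexadecimal notation:

    a = 10        # decimal integer
    b = 0b1010    # binary integer (equals decimal 10)
    c = 0o12      # octal integer (equals decimal 10)
    d = 0xA       # hexadecimal integer (equals decimal 10)
    print(a, b, c, d)  # Output: 10 10 10 10

Floating-point numbers can be written with a decimal point or in scientific notation:

    x = 3.14
    y = 1.23e-4  # scientific notation: 1.23 × 10⁻⁴, i.e. 0.000123
    print(x, y)  # Output: 3.14 0.000123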
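The literals above can also be produced and parsed at run time. The following is a minimal sketch using only built-in functions and the standard `math` module; the variable names are illustrative, not part of the original text.

```python
import math

n = 10

# Built-in functions render an integer in another base as text.
binary_text = bin(n)   # '0b1010'
octal_text = oct(n)    # '0o12'
hex_text = hex(n)      # '0xa'
print(binary_text, octal_text, hex_text)  # Output: 0b1010 0o12 0xa

# int() accepts a base argument for parsing text back into an integer.
from_binary = int("1010", 2)
from_hex = int("A", 16)
print(from_binary, from_hex)  # Output: 10 10

# Floats are binary approximations, so compare them with a tolerance.
print(0.1 + 0.2 == 0.3)              # Output: False
print(math.isclose(0.1 + 0.2, 0.3))  # Output: True
```

The last two lines show why exact equality on floats is usually avoided.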

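Strings

In Python, strings are a built-in data type used to represent text.

Basic strings

    name = "Xiao Ming"
    age = "18"
    print(f"{name} is {age} years old")  # Output: Xiao Ming is 18 years old

Escape sequences

    text = "escaped \"python\""  # a backslash lets a double quote appear inside a double-quoted string

Triple-quoted strings

A triple-quoted string can span multiple lines and preserves all line breaks and spaces.

    text = """
        Hello,
        Python"""

String concatenation

    name = "Xiao Ming"
    age = 18
    message = name + " is " + str(age) + " years old"  # str() is required: str + int raises TypeError
    print(message)                        # Output: Xiao Ming is 18 years old
    print(f"{name} is {age} years old")   # an f-string converts age automatically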

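Indexing

    name = "python"
    print(name[0])   # first character: p
    print(name[-1])  # last character: n

Slicing

    text = "Hello, world!"
    print(text[0:5])  # Output: Hello (indices 0 through 4; index 5 is excluded)
    print(text[7:])   # Output: world! (from index 7 to the end)
    print(text[:5])   # Output: Hello (from the start through index 4)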

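Methods

lower: converts all characters to lowercase

    text = "Hello, World"
    print(text.lower())  # Output: hello, world

upper: converts all characters to uppercase

    text = "Hello, World"
    print(text.upper())  # Output: HELLO, WORLD

find: returns the index of the first occurrence of a substring (or -1 if it is not found)

    text = "Hello, World"
    print(text.find("World"))  # Output: 7

replace: substitutes one substring for another

    text = "Hello, World"
    print(text.replace("World", "Python"))  # replaces every "World" with "Python"; Output: Hello, Python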

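format: uses {} as placeholders that are filled in by the arguments to format()

    name = "Xiao Ming"
    age = 18
    print("{} is {} years old".format(name, age))  # Output: Xiao Ming is 18 years old

f-string

    name = "Xiao Ming"
    age = 18
    print(f"{name} is {age} years old")  # Output: Xiao Ming is 18 years old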

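encode: converts a string to bytes

decode: converts bytes back to a string

startswith: checks whether a string begins with a given prefix

endswith: checks whether a string ends with a given suffix

isdigit: checks whether every character in a string is a digit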
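The five methods listed above have no example in the text, so here is a small sketch of how they behave. The sample value `filename` is made up for illustration.

```python
filename = "report_2024.txt"  # hypothetical sample value

data = filename.encode("utf-8")   # str -> bytes
restored = data.decode("utf-8")   # bytes -> str

print(data)                            # Output: b'report_2024.txt'
print(restored == filename)            # Output: True
print(filename.startswith("report"))   # Output: True
print(filename.endswith(".txt"))       # Output: True
print("2024".isdigit())                # Output: True
print("20.24".isdigit())               # Output: False (the dot is not a digit)
```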

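Booleans

A Boolean has one of two values: True or False.

Null value

Python represents the absence of a value with None.

    name = None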

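Lists

A list is a built-in data structure: a mutable, ordered sequence that can hold elements of any type.

Creating

    letters = ["a", "b", "c"]
    print(letters[0])   # first element: a
    print(letters[1])   # second element: b
    print(letters[2])   # third element: c
    print(letters[-1])  # last element: c
    print(letters[-2])  # second-to-last element: b
    print(letters[3])   # raises IndexError: list index out of range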

Appending

    letters = ["a", "b", "c"]
    letters.append('d')
    print(letters)  # Output: ['a', 'b', 'c', 'd']

Inserting

    letters = ["a", "b", "c"]
    # Insert an element at the given position
    letters.insert(1, "e")
    print(letters)  # Output: ['a', 'e', 'b', 'c']

Modifying

    letters = ["a", "b", "c"]
    letters[1] = "d"
    print(letters)  # Output: ['a', 'd', 'c']

Deleting

    letters = ["a", "b", "c"]
    item = letters.pop()  # pop() removes and returns the last element
    print(item)     # Output: c
    print(letters)  # Output: ['a', 'b']
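Besides `pop()` with no argument, lists offer a few other ways to delete elements. The sketch below compares them; the list contents are illustrative only.

```python
letters = ["a", "b", "c", "d", "b"]

first = letters.pop(0)   # remove and return the element at index 0
letters.remove("b")      # remove the first occurrence of the value "b"
del letters[-1]          # delete by index without returning the element

print(first)    # Output: a
print(letters)  # Output: ['c', 'd']
```

`remove()` raises ValueError when the value is absent, so it is usually guarded with an `in` check.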

Tuples

A tuple is an immutable sequence type used to store multiple elements.

Creating

    letters = ('a', 'b', 'c')
    print(letters)  # Output: ('a', 'b', 'c')

Accessing

    letters = ('a', 'b', 'c')
    item = letters[1]    # 'b'
    item2 = letters[-1]  # 'c'

Immutability

    letters = ('a', 'b', 'c')
    letters[1] = 'd'  # TypeError: 'tuple' object does not support item assignment

Tuples can contain mutable objects

    items = ('a', ['b', 'c'])
    items[1][0] = 'd'  # the inner list is mutable even though the tuple itself is not
    print(items)  # Output: ('a', ['d', 'c'])

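Dictionaries

A dictionary is a mutable, iterable data structure that stores key-value pairs. Since Python 3.7, dictionaries preserve insertion order.

Creating

    # Option 1: curly braces
    empty_dict = {}
    person = {"name": "zhangsan", "age": 18}

    # Option 2: the dict() constructor
    colors = dict(red="#FF0000", green="#00FF00", blue="#0000FF")

Accessing

    person = {"name": "zhangsan", "age": 18}
    print(person["name"])     # Output: zhangsan
    print(person.get("age"))  # Output: 18 (get() is the safer choice when a key may be missing)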

Adding

    person = {"name": "zhangsan", "age": 18}
    person["city"] = "Beijing"  # add a new key "city"
    print(person)  # Output: {'name': 'zhangsan', 'age': 18, 'city': 'Beijing'}

Modifying

    person = {"name": "zhangsan", "age": 18}
    person["name"] = "lisi"
    print(person)  # Output: {'name': 'lisi', 'age': 18}

Looking up

    person = {"name": "zhangsan", "age": 18}
    print(person["name"])            # Output: zhangsan
    print(person.get("name"))        # Output: zhangsan
    print(person.get("city"))        # Output: None (returned when the key is not found)
    print(person.get("job", "IT"))   # Output: IT (the supplied default)
    print(person["city"])            # raises KeyError: 'city'

Deleting

    person = {"name": "zhangsan", "age": 18}
    name = person.pop("name")
    print(name)    # Output: zhangsan
    print(person)  # Output: {'age': 18}
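Since dictionaries are iterable, a common next step is looping over their contents. The following sketch reuses the `person` dictionary from the examples above; `upper_keys` is an illustrative name.

```python
person = {"name": "zhangsan", "age": 18}

# items() yields (key, value) pairs in insertion order.
for key, value in person.items():
    print(f"{key} = {value}")

keys = list(person.keys())      # ['name', 'age']
values = list(person.values())  # ['zhangsan', 18]

# The in operator tests for keys, not values.
has_city = "city" in person     # False

# A dict comprehension builds a new dictionary from an existing one.
upper_keys = {key.upper(): value for key, value in person.items()}
print(upper_keys)  # Output: {'NAME': 'zhangsan', 'AGE': 18}
```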

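Sets

In Python, a set is an unordered collection of unique elements and is one of the built-in data types.

Creating

    # Create a set with elements
    letters = {"a", "b", "c"}
    print(letters)  # Output: {'a', 'b', 'c'} (the order may vary)

    # Sets remove duplicates automatically
    numbers = {1, 2, 2, 3, 3, 3}
    print(numbers)  # Output: {1, 2, 3}

    # Create a set from a list
    colors = set(["a", "b", "c"])
    print(colors)  # Output: {'a', 'b', 'c'} (the order may vary)

    # Create an empty set (set() is required, because {} is an empty dictionary)
    empty_set = set()
    print(type(empty_set))  # Output: <class 'set'>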

Adding

    letters = {"a", "b", "c"}
    letters.add('d')
    print(letters)  # Output: {'d', 'a', 'c', 'b'} (the order may vary)

Deleting

    letters = {"a", "b", "c"}
    letters.remove('a')
    print(letters)  # Output: {'b', 'c'} (the order may vary)
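Sets are most useful for their algebra: union, intersection, and difference. A short sketch with made-up values follows; `sorted()` is used only to make the printed order deterministic.

```python
a = {1, 2, 3}
b = {3, 4}

union = a | b          # elements in either set
intersection = a & b   # elements in both sets
difference = a - b     # elements in a but not in b

print(sorted(union))         # Output: [1, 2, 3, 4]
print(sorted(intersection))  # Output: [3]
print(sorted(difference))    # Output: [1, 2]

# remove() raises KeyError for a missing element; discard() does not.
a.discard(99)
try:
    a.remove(99)
except KeyError:
    print("99 is not in the set")
```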

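Functions

In Python, a function is an organized, reusable block of code that performs a single task or a set of related tasks. Functions make code more modular, more readable, and easier to maintain.

Defining a function

    def function_name(parameters):
        """Docstring (optional)."""
        # function body
        return expression  # return statement (optional)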
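To make the template concrete, here is a minimal sketch of a function with a docstring, a default parameter value, and a keyword argument at the call site. The name `greet` and its parameters are illustrative.

```python
def greet(name, greeting="Hello"):
    """Return a greeting for name, using an optional custom greeting."""
    return f"{greeting}, {name}!"


print(greet("Alice"))               # Output: Hello, Alice!
print(greet("Bob", greeting="Hi"))  # Output: Hi, Bob!
print(greet.__doc__)                # the docstring is available at run time
```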

Return values

    def calculate(a, b):
        return a + b, a - b, a * b  # multiple values are returned as a tuple

    sum_result, diff_result, prod_result = calculate(10, 5)
    print(sum_result)    # Output: 15
    print(diff_result)   # Output: 5
    print(prod_result)   # Output: 50

Anonymous functions

    square = lambda x: x ** 2
    print(square(4))  # Output: 16

    # Combined with a built-in function
    numbers = [1, 2, 3, 4, 5]
    squared_numbers = list(map(lambda x: x ** 2, numbers))
    print(squared_numbers)  # Output: [1, 4, 9, 16, 25]

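Higher-order functions

A function that accepts other functions as arguments, or returns a function as its result, is called a higher-order function.

    def apply_operation(func, a, b):
        return func(a, b)

    result = apply_operation(lambda x, y: x + y, 3, 4)
    print(result)  # Output: 7

Generator functions

    def count_up_to(n):
        i = 1
        while i <= n:
            yield i  # pause here and hand i to the caller
            i += 1

    counter = count_up_to(3)
    print(next(counter))  # Output: 1
    print(next(counter))  # Output: 2
    print(next(counter))  # Output: 3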
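Calling `next()` by hand is rarely needed in practice. The sketch below repeats the `count_up_to` generator and shows the more common consumption patterns, plus what happens once a generator is exhausted.

```python
def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1


# list() and for loops call next() behind the scenes and stop at StopIteration.
collected = list(count_up_to(3))
print(collected)  # Output: [1, 2, 3]

counter = count_up_to(1)
print(next(counter))          # Output: 1
print(next(counter, "done"))  # Output: done (default returned instead of raising)

try:
    next(counter)
except StopIteration:
    print("generator is exhausted")
```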

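Built-in functions

1. Type conversion

Function	Description	Example
`int(x)`	Converts `x` to an integer	`int("10") → 10`
`float(x)`	Converts `x` to a floating-point number	`float("3.14") → 3.14`
`str(x)`	Converts `x` to a string	`str(42) → "42"`
`bool(x)`	Converts `x` to a Boolean (`True`/`False`)	`bool(0) → False`
`list(x)`	Converts `x` to a list	`list("abc") → ['a', 'b', 'c']`
`tuple(x)`	Converts `x` to a tuple	`tuple([1,2]) → (1,2)`
`dict(x)`	Builds a dictionary from `x` (for example, an iterable of key-value pairs)	`dict([('a',1)]) → {'a':1}`
`set(x)`	Converts `x` to a set	`set([1,1,2]) → {1,2}`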

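2. Math operations

Function	Description	Example
`abs(x)`	Returns the absolute value of `x`	`abs(-5) → 5`
`round(x, n)`	Rounds `x` to `n` decimal places (exact ties round to the nearest even digit)	`round(3.1415, 2) → 3.14`
`max(iterable)`	Returns the largest item in an iterable	`max([1,5,3]) → 5`
`min(iterable)`	Returns the smallest item in an iterable	`min([1,5,3]) → 1`
`sum(iterable)`	Returns the sum of all items in an iterable	`sum([1,2,3]) → 6`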

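3. Iteration and sequence operations

Function	Description	Example
`len(x)`	Returns the length or number of items of an object	`len("abc") → 3`
`sorted(x)`	Returns a new sorted list (the original is unchanged)	`sorted([3,1,2]) → [1,2,3]`
`reversed(x)`	Returns a reverse iterator	`list(reversed([1,2])) → [2,1]`
`range(start, stop, step)`	Generates a sequence of numbers	`list(range(0,5,2)) → [0,2,4]`
`enumerate(iterable)`	Returns an iterator of (index, item) tuples	`list(enumerate("ab")) → [(0,'a'), (1,'b')]`
`zip(*iterables)`	Packs the items of several iterables into tuples	`list(zip([1,2], ['a','b'])) → [(1,'a'), (2,'b')]`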

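4. Input and output

Function	Description	Example
`print(*objects)`	Prints objects to standard output	`print("Hello", 42)`
`input(prompt)`	Reads a string from standard input (with a prompt)	`name = input("Enter your name: ")`

5. File operations

Function	Description	Example
`open(file, mode)`	Opens a file and returns a file object	`f = open("test.txt", "r")`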
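A file opened with `open()` should also be closed. The usual idiom is a `with` block, which closes the file automatically. The sketch below writes to a temporary directory so that it runs anywhere; the file name and contents are illustrative.

```python
import os
import tempfile

# Use a temporary directory so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# "w" creates or truncates the file; the with block closes it on exit.
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")

# "r" opens the file for reading.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()

print(repr(content))  # Output: 'hello\n'
print(f.closed)       # Output: True
```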

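6. Type and object inspection

Function	Description	Example
`type(x)`	Returns the type of an object	`type(42) → <class 'int'>`
`isinstance(x, type)`	Checks whether `x` is an instance of the given type	`isinstance(42, int) → True`
`dir(x)`	Returns the names of an object's attributes and methods	`dir([])` returns the names of all list attributes and methods
`id(x)`	Returns the object's unique identifier (its memory address in CPython)	`id(obj)`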

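7. Advanced built-in functions

Function	Description	Example
`map(func, iterable)`	Applies a function to every item of an iterable	`list(map(lambda x:x*2, [1,2])) → [2,4]`
`filter(func, iterable)`	Keeps the items of an iterable for which the function returns a true value	`list(filter(lambda x:x>0, [-1,2])) → [2]`
`reduce(func, iterable)`	Combines the items of an iterable cumulatively (must be imported from `functools`)	`from functools import reduce` `reduce(lambda x,y:x+y, [1,2,3]) → 6`
`eval(expression)`	Evaluates a string as a Python expression and returns the result (unsafe on untrusted input)	`eval("1+2") → 3`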

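8. Other commonly used functions

Function	Description	Example
`pow(x, y)`	Computes `x` raised to the power `y`	`pow(2, 3) → 8`
`divmod(x, y)`	Returns the quotient and remainder of `x` divided by `y`	`divmod(10, 3) → (3, 1)`
`help([obj])`	Shows the help documentation for an object (interactive environments)	`help(list)`
`globals()`	Returns the current global symbol table (as a dictionary)	`globals()`
`locals()`	Returns the current local symbol table (as a dictionary)	`locals()`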

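Object-Oriented Programming

Object-oriented programming (OOP) is a programming paradigm that bundles data (attributes) and the code that operates on that data (methods) into interrelated "objects", and builds software systems out of the interactions between those objects. Its core idea is to model real-world things as objects in a program, each with its own state and behavior, which improves reusability, maintainability, and extensibility.

Core concepts

Class: a blueprint or prototype for objects; it defines their attributes and methods.

Object: an instance of a class.

Method: an operation that an object can perform; a function defined inside a class.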

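The three pillars

Encapsulation

Encapsulation binds data (attributes) together with the methods that operate on it, hides internal implementation details through access control, and exposes only the necessary interface.

    class BankAccount:
        def __init__(self, balance):
            self.__balance = balance  # "private" attribute: the name is mangled, so account.__balance fails outside the class

        # Deposit method
        def deposit(self, amount):
            self.__balance += amount

        # Balance accessor
        def get_balance(self):
            return self.__balance

    account = BankAccount(200)
    print(account.get_balance())  # Output: 200
    account.deposit(600)
    print(account.get_balance())  # Output: 800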
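It helps to see what "private" means in practice. Python does not enforce privacy; a double leading underscore triggers name mangling instead. The sketch below repeats a trimmed version of `BankAccount` to show both the failed direct access and the mangled name.

```python
class BankAccount:
    def __init__(self, balance):
        self.__balance = balance  # stored as _BankAccount__balance

    def get_balance(self):
        return self.__balance


account = BankAccount(200)

# Direct access fails because the attribute name was mangled.
try:
    account.__balance
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True
print(direct_access_failed)  # Output: True

# The mangled name still works, but using it breaks the class's contract.
mangled_value = account._BankAccount__balance
print(mangled_value)  # Output: 200
```

The double underscore is therefore a strong convention and a guard against accidental clashes in subclasses, not a security boundary.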

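Inheritance

A subclass inherits the attributes and methods of its parent class and can override or extend them, which enables code reuse.

    # Parent class
    class Animal:
        def __init__(self, name):
            self.name = name

        def speak(self):
            raise NotImplementedError("subclasses must implement this method")

    # Subclass
    class Dog(Animal):
        def speak(self):
            return f"{self.name} says woof"

    # Subclass
    class Cat(Animal):
        def speak(self):
            return f"{self.name} says meow"

    # Usage
    dog = Dog("Dahuang")
    cat = Cat("Xiaohua")
    print(dog.speak())  # Output: Dahuang says woof
    print(cat.speak())  # Output: Xiaohua says meow

Polymorphism

Objects of different classes can respond differently to the same message. This is typically achieved through inheritance and shared interfaces.

    class Animal:
        def speak(self):
            return "the animal makes a sound"

    class Dog(Animal):
        def speak(self):
            return "woof"

    class Cat(Animal):
        def speak(self):
            return "meow"

    # Polymorphism: the same method call behaves differently for each subclass
    def make_animal_speak(animal):
        print(animal.speak())

    dog = Dog()
    cat = Cat()
    make_animal_speak(dog)  # Output: woof
    make_animal_speak(cat)  # Output: meow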

Conditionals

Basic `if-else` structure

    x = 10
    if x > 5:
        print("x is greater than 5")  # runs when the condition is true
    else:
        print("x is less than or equal to 5")

Multiple conditions: `elif`

    score = 85
    if score >= 90:
        print("excellent")
    elif score >= 80:
        print("good")  # this branch runs
    elif score >= 60:
        print("pass")
    else:
        print("fail")

Nested conditionals

    age = 20
    is_student = True
    if age >= 18:
        if is_student:
            print("adult student")  # this branch runs
        else:
            print("adult non-student")
    else:
        print("minor")

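Logical operators (`and`, `or`, `not`)

    x = 5
    y = 10
    if x > 0 and y < 20:
        print("both conditions hold")  # this branch runs
    if x > 10 or y > 5:
        print("at least one condition holds")  # this branch runs
    if not x > 10:
        print("x is not greater than 10")  # this branch runs

Membership operators (`in`, `not in`)

    fruits = ["apple", "banana", "cherry"]
    if "apple" in fruits:
        print("the list contains apple")  # this branch runs

    name = "Alice"
    if "li" in name:
        print("the name contains 'li'")  # this branch runs

Conditional (ternary) expression

    x = 10
    result = "even" if x % 2 == 0 else "odd"
    print(result)  # Output: even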

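Truthiness

In Python, the following values are treated as false in a Boolean context (as are other empty containers such as set()); most other values are treated as true: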

False

None

0

0.0

[]

()

''

{}

    name = ""
    if not name:  # an empty string is falsy
        print("name is empty")

    values = []
    if values:  # an empty list is falsy, so this branch does not run
        print("list is not empty")
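The truthiness rules can be checked directly with `bool()`. The following sketch runs a handful of sample values through it; the `samples` list is illustrative and includes a few values that look empty but are not.

```python
samples = [False, None, 0, 0.0, [], (), "", {}, set(), "0", [0], -1]

for value in samples:
    print(f"{value!r:>8} -> {bool(value)}")

falsy = [value for value in samples if not value]
truthy = [value for value in samples if value]
print(len(falsy), len(truthy))  # Output: 9 3
```

The string `"0"`, the list `[0]`, and the number `-1` are all truthy: a non-empty container or a non-zero number counts as true regardless of what it contains.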

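Modules

In Python, a module is the basic unit of code organization. It groups related functions, classes, and variables together, which improves reusability and maintainability.

I. Basic concepts of modules

1. What is a module?

Module: a single .py file containing Python code.

Package: a directory that groups related modules, conventionally marked by an __init__.py file.

2. What modules are for

Code reuse

Namespace isolation

Organizing code structure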

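II. Ways to import a module

1. The `import` statement

Imports the whole module; names must be prefixed with the module name when used.

    import math
    print(math.pi)          # Output: 3.141592653589793
    print(math.sqrt(16))    # Output: 4.0

2. `from ... import ...`

Imports specific objects from a module, which can then be used without a prefix.

    from math import pi, sqrt
    print(pi)               # Output: 3.141592653589793
    print(sqrt(16))         # Output: 4.0

3. `from ... import *`

Imports every public object from a module (not recommended, because it can cause name clashes).

    from math import *
    print(sin(pi/2))        # Output: 1.0

4. `as` aliases

Gives a module or object an alias to shorten its name.

    # numpy and pandas are third-party packages and must be installed first
    import numpy as np
    from pandas import DataFrame as DF

    arr = np.array([1, 2, 3])
    df = DF({'col1': [1, 2], 'col2': [3, 4]})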

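III. Module search path

The Python interpreter looks for a module in the following order:

Built-in modules (such as sys)

The directory of the script being run (or the current directory in an interactive session)

Paths listed in the PYTHONPATH environment variable

The standard-library and third-party (site-packages) directories of the Python installation

To inspect the search path:

    import sys
    print(sys.path)  # prints the list of module search paths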

IV. Custom modules

Creating a module is simple: save the code as a .py file.

Example: create my_module.py

    # my_module.py
    def greet(name):
        return f"Hello, {name}!"

    def add(a, b):
        return a + b

    # Test code inside the module
    if __name__ == "__main__":
        print(greet("Alice"))  # runs only when this file is executed directly

Import and use it:

    import my_module
    print(my_module.greet("Bob"))  # Output: Hello, Bob!
    print(my_module.add(3, 4))     # Output: 7
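The link between the search path and custom modules can be tried end to end in one script. The sketch below writes a small module into a temporary directory, adds that directory to `sys.path`, and imports it. The module name `demo_module` is hypothetical and chosen to avoid clashing with any real module.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny module file into a fresh temporary directory.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "demo_module.py")  # hypothetical module name
with open(module_path, "w", encoding="utf-8") as f:
    f.write('def greet(name):\n    return f"Hello, {name}!"\n')

# The import only succeeds once the directory is on the search path.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
demo_module = importlib.import_module("demo_module")

print(demo_module.greet("Bob"))  # Output: Hello, Bob!
print(demo_module.__name__)      # Output: demo_module
```

In everyday code a plain `import demo_module` is enough once the file sits in a directory on `sys.path`; `importlib` is used here only because the module is created at run time.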

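V. Using packages

A package is a hierarchical way of organizing modules through a directory structure.

Example directory layout:

    my_package/
        __init__.py      # marks the directory as a regular package; may be an empty file
        module1.py
        module2.py
        subpackage/      # subpackage
            __init__.py
            submodule.py

Importing modules from a package (foo, bar, and baz are placeholder function names):

    # Option 1: import by full path
    import my_package.module1
    my_package.module1.foo()

    # Option 2: import a module from the package
    from my_package import module2
    module2.bar()

    # Option 3: import from a subpackage
    from my_package.subpackage import submodule
    submodule.baz()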

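VI. Commonly used modules

The Python standard library ships with a large number of modules, for example:

os: operating system interface (file operations, path handling)

sys: parameters and functions related to the Python interpreter

math: mathematical functions

random: random number generation

datetime: date and time handling

json: JSON data handling

requests: HTTP requests (third-party, not part of the standard library; must be installed separately)

Example: working with files and directories using the os module

    import os

    # Get the current working directory
    print(os.getcwd())

    # List the contents of a directory
    print(os.listdir('.'))

    # Create a directory (raises FileExistsError if it already exists)
    os.mkdir('new_dir')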

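VII. Special module attributes

__name__: the module's name; it is "__main__" when the file is run directly and the module name when it is imported.

__doc__: the module's docstring (the string literal at the top of the file, not a comment).

__file__: the path of the file the module was loaded from (modules compiled into the interpreter may not have it).

Example:

    # Inspect module attributes
    import os
    print(os.__name__)   # Output: os
    print(os.__doc__)    # Output: the module docstring
    print(os.__file__)   # Output: the path to the module file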

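VIII. Installing and managing third-party modules

Use the pip tool to install third-party modules:

    # Install a module
    pip install requests

    # List installed modules
    pip list

    # Upgrade a module
    pip install --upgrade requests

    # Uninstall a module
    pip uninstall requests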

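Loops

In Python, loops execute a block of code repeatedly. Python provides two main loop constructs, the for loop and the while loop, along with helper keywords such as break and continue.

1. The `for` loop

A for loop iterates over the elements of an iterable (such as a list, tuple, string, or dictionary).

Basic syntax:

    for variable in iterable:
        # loop body
        pass  # replace with the code to run

Example: iterate over a list and print each element

    fruits = ["apple", "banana", "cherry"]
    for fruit in fruits:
        print(fruit)

Using range(): to repeat a fixed number of times, use the range() function.

    for i in range(5):  # produces the integers 0 through 4
        print(i)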
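Two variations come up often in for loops: getting an index alongside each element, and stepping through a range with a custom start or step. A brief sketch, reusing the `fruits` list from above:

```python
fruits = ["apple", "banana", "cherry"]

# enumerate() pairs each element with a counter; start=1 makes it one-based.
for position, fruit in enumerate(fruits, start=1):
    print(position, fruit)

evens = list(range(0, 10, 2))      # start, stop (exclusive), step
countdown = list(range(3, 0, -1))  # a negative step counts down
print(evens)      # Output: [0, 2, 4, 6, 8]
print(countdown)  # Output: [3, 2, 1]
```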

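2. The `while` loop

A while loop repeats a block of code as long as its condition is true.

Basic syntax:

    while condition:
        # loop body
        pass  # replace with the code to run

Example: compute the sum of 1 through 10

    total = 0  # named total to avoid shadowing the built-in sum()
    i = 1
    while i <= 10:
        total += i
        i += 1
    print(total)  # Output: 55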

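3. Loop control statements

break: exits the loop entirely; no further iterations run.

    for i in range(5):
        if i == 3:
            break
        print(i)  # prints 0, 1, 2

continue: skips the rest of the current iteration and moves on to the next one.

    for i in range(5):
        if i == 3:
            continue
        print(i)  # prints 0, 1, 2, 4

else clause: runs when the loop finishes normally (without being interrupted by break).

    for i in range(3):
        print(i)
    else:
        print("loop finished")  # this runs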
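The loop `else` clause is most useful together with `break`, typically in a search. The sketch below uses a hypothetical helper, `first_negative`, to show both outcomes.

```python
def first_negative(numbers):
    """Return the first negative number, or None when there is none."""
    for n in numbers:
        if n < 0:
            found = n
            break          # leaving via break skips the else clause
    else:
        found = None       # runs only when the loop finished without break
    return found


print(first_negative([3, -1, 4, -5]))  # Output: -1
print(first_negative([3, 1, 4]))       # Output: None
```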

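4. Nested loops

A loop can contain another loop, which is common when working with multi-dimensional data.

Example: print a multiplication table

    for i in range(1, 10):
        for j in range(1, i + 1):
            print(f"{j}×{i}={i*j}", end="\t")
        print()  # line break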

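Regular Expressions

In Python, regular expressions are a powerful tool for working with strings: they use patterns to match, search, and replace text. This section covers their core features and usage.

1. Basic regular expression metacharacters

Metacharacter	Meaning
`.`	Matches any single character except a newline
`^`	Matches the start of the string
`$`	Matches the end of the string
`*`	Matches the preceding element zero or more times
`+`	Matches the preceding element one or more times
`?`	Matches the preceding element zero or one time
`{n}`	Matches the preceding element exactly `n` times
`{n,}`	Matches the preceding element at least `n` times
`{n,m}`	Matches the preceding element between `n` and `m` times
`[]`	Character set: matches any one of the enclosed characters, e.g. `[a-z]` matches any lowercase letter
`|`	Alternation: matches the expression on either side
`()`	Grouping: groups a subpattern and captures its match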
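Each metacharacter can be tried on a small input. The sketch below uses `re.fullmatch`, which succeeds only when the whole string matches the pattern; the sample patterns and strings are illustrative.

```python
import re


def matches(pattern, text):
    """Return True when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


print(matches(r"a.c", "abc"))            # Output: True  (. matches any one character)
print(matches(r"ab*c", "ac"))            # Output: True  (* allows zero b's)
print(matches(r"ab+c", "ac"))            # Output: False (+ needs at least one b)
print(matches(r"colou?r", "color"))      # Output: True  (? makes u optional)
print(matches(r"[0-9]{3}", "123"))       # Output: True  (exactly three digits)
print(matches(r"[a-z]{2,4}", "abcde"))   # Output: False (five letters is too many)

print(re.findall(r"cat|dog", "cat dog bird"))  # Output: ['cat', 'dog']

# ^ and $ anchor the pattern; () captures the last repetition of the group.
print(re.search(r"^(ab)+$", "ababab").group(1))  # Output: ab
```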

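2. The re module in Python

Python supports regular expressions through the re module. Some commonly used functions follow.

(1) re.match()

This function tries to match the pattern at the beginning of the string. It returns a match object on success and None otherwise.

    import re

    pattern = r'hello'
    text = 'hello world'
    match = re.match(pattern, text)
    if match:
        print('Match succeeded:', match.group())  # Output: Match succeeded: hello
    else:
        print('Match failed')

(2) re.search()

This function scans the whole string for the first location where the pattern matches. It returns a match object if one is found and None otherwise.

    import re

    pattern = r'world'
    text = 'hello world'
    search = re.search(pattern, text)
    if search:
        print('Match found:', search.group())  # Output: Match found: world
    else:
        print('No match found')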

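(3) re.findall()

This function finds every match of the pattern in the string and returns the matched substrings as a list.

    import re

    pattern = r'\d+'  # one or more digits
    text = 'I have 3 apples and 5 oranges'
    numbers = re.findall(pattern, text)
    print('Numbers found:', numbers)  # Output: Numbers found: ['3', '5']

(4) re.sub()

This function replaces the parts of a string that match the pattern. The optional count argument limits the number of replacements.

    import re

    pattern = r'apple'
    text = 'I like apple'
    new_text = re.sub(pattern, 'banana', text)
    print('After replacement:', new_text)  # Output: After replacement: I like banana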
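Two features of `re.sub()` are worth a quick look: the `count` argument that limits replacements, and backreferences such as `\1` in the replacement string. The inputs below are made up for illustration.

```python
import re

text = "apple apple apple"

# count=1 replaces only the first match.
once = re.sub(r"apple", "banana", text, count=1)
print(once)  # Output: banana apple apple

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\d{3})-(\d{4})", r"\2-\1", "123-4567")
print(swapped)  # Output: 4567-123

# re.subn() also reports how many replacements were made.
result, replacements = re.subn(r"apple", "banana", text)
print(replacements)  # Output: 3
```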

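(5) re.split()

This function splits a string wherever the pattern matches and returns the pieces as a list.

    import re

    pattern = r'\s+'  # one or more whitespace characters
    text = 'Hello   World'
    words = re.split(pattern, text)
    print('Words after splitting:', words)  # Output: Words after splitting: ['Hello', 'World']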

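3. Match object methods

Once match(), search(), or a similar function has returned a match object, the following methods provide more information:

Method	Meaning
`group()`	Returns the matched text; `group(n)` returns the n-th captured group
`start()`	Returns the start index of the match
`end()`	Returns the end index of the match (exclusive)
`span()`	Returns the tuple `(start, end)`

    import re

    pattern = r'(\d{3})-(\d{4})'
    text = 'My phone number is 123-4567'
    match = re.search(pattern, text)
    if match:
        print('Full match:', match.group())     # Output: Full match: 123-4567
        print('Group 1:', match.group(1))       # Output: Group 1: 123
        print('Group 2:', match.group(2))       # Output: Group 2: 4567
        print('Start index:', match.start())    # Output: Start index: 19
        print('End index:', match.end())        # Output: End index: 27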

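4. Flags

Flags modify how a regular expression matches. Common flags include:

Flag	Meaning
`re.I` / `re.IGNORECASE`	Case-insensitive matching
`re.M` / `re.MULTILINE`	Multi-line mode: `^` and `$` also match at the start and end of each line
`re.S` / `re.DOTALL`	`.` matches any character, including a newline

    import re

    pattern = r'hello'
    text = 'HELLO world'
    match = re.search(pattern, text, re.I)  # re.I ignores case
    if match:
        print('Match succeeded:', match.group())  # Output: Match succeeded: HELLO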
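The multi-line and dot-all flags are easiest to understand side by side with their default behavior. A small sketch on a two-line string (the text is illustrative):

```python
import re

text = "first line\nsecond line"

# Without re.M, ^ matches only at the very start of the string.
default_starts = re.findall(r"^\w+", text)
multiline_starts = re.findall(r"^\w+", text, re.M)
print(default_starts)    # Output: ['first']
print(multiline_starts)  # Output: ['first', 'second']

# Without re.S, . stops at the newline, so the pattern cannot span two lines.
no_dotall = re.search(r"first.*second", text)
with_dotall = re.search(r"first.*second", text, re.S)
print(no_dotall)                  # Output: None
print(repr(with_dotall.group()))  # Output: 'first line\nsecond'
```

Flags can be combined with the `|` operator, for example `re.I | re.M`.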

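5. Greedy and non-greedy matching

By default, quantifiers such as .* are greedy and match as much text as possible. Appending ? makes a quantifier non-greedy (lazy), so .*? matches as little text as possible.

    import re

    text = '<html><body><h1>Hello</h1></body></html>'

    # Greedy match
    greedy = re.search(r'<.*>', text)
    print('Greedy match:', greedy.group())  # Output: Greedy match: <html><body><h1>Hello</h1></body></html>

    # Non-greedy match
    non_greedy = re.search(r'<.*?>', text)
    print('Non-greedy match:', non_greedy.group())  # Output: Non-greedy match: <html>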

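6. Common regular expression examples

Email address (simplified): r'[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+'

URL: r'https?://\S+|www\.\S+'

IPv4 address (shape only; does not check that each part is in the range 0-255): r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

Mainland China mobile number: r'1[3-9]\d{9}'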
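Patterns like these are typically used for validation with `fullmatch`, so that extra characters before or after the match are rejected. The sketch below assumes the escaped-dot form of the IP pattern (`\.` for a literal dot); the constant names and sample inputs are illustrative.

```python
import re

MOBILE = re.compile(r"1[3-9]\d{9}")
IPV4_SHAPE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def is_valid(pattern, text):
    """Return True when the whole text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


print(is_valid(MOBILE, "13812345678"))      # Output: True
print(is_valid(MOBILE, "12812345678"))      # Output: False (second digit must be 3-9)
print(is_valid(MOBILE, "138123456789"))     # Output: False (too long)
print(is_valid(IPV4_SHAPE, "192.168.1.1"))  # Output: True
print(is_valid(IPV4_SHAPE, "999.1.1.1"))    # Output: True (shape only, no range check)
```

The last line shows the limit of a purely textual check: rejecting out-of-range octets needs either a longer pattern or a numeric comparison after matching.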

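7. Compiling regular expressions

When the same regular expression is used in several places, it can be compiled once with re.compile() and the resulting pattern object reused. This avoids repeated pattern lookups and keeps the pattern defined in one place.

    import re

    pattern = re.compile(r'\d+')  # compile the regular expression
    text = 'I have 3 apples and 5 oranges'
    numbers = pattern.findall(text)
    print('Numbers found:', numbers)  # Output: Numbers found: ['3', '5']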