[Data Visualization] Visual Analysis of a Country-Level Unemployment Dataset, 1991-2021

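🧑 About the author: formerly head of algorithms at a smart-city company, and recognized as a quality AI content creator on platforms such as CSDN and Juejin.

Currently a senior algorithm engineer at a logistics company serving the U.S. market, focused on artificial intelligence and experienced in Python data mining, visualization, and machine learning. Holds AI-related patents and has won awards in several AI competitions.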

1. Introduction

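The unemployment rate is a key indicator of the economic health of a country or region. A high unemployment rate usually points to problems such as an economic downturn or an imbalance between labor supply and demand. This analysis uses a dataset of national unemployment rates covering the 31 years from 1991 to 2021 and aims to show, through visualization, how unemployment has changed over time and which factors may be associated with it.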

2. Dataset Overview

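The dataset contains each country's unemployment rate for the 31 years from 1991 to 2021. It has 33 columns: the country name, the country code, and one unemployment-rate column per year:

- Country Name: name of the country
- Country Code: country code
- 1991-2021: unemployment rate for each year (%)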
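All of the code below assumes exactly this layout: two identifier columns followed by 31 year columns (which is why it slices `df.columns[2:]`). A quick schema check can catch a file that does not match before any plotting. The sketch below uses a hypothetical helper, `check_schema`, and a tiny synthetic frame in place of the real CSV; the expected year range is taken from the description above.

```python
import pandas as pd

EXPECTED_ID_COLUMNS = ['Country Name', 'Country Code']
EXPECTED_YEARS = [str(y) for y in range(1991, 2022)]  # '1991' ... '2021'


def check_schema(df: pd.DataFrame) -> list:
    """Hypothetical helper: return a list of layout problems (empty if none)."""
    problems = []
    columns = [str(c) for c in df.columns]
    if columns[:2] != EXPECTED_ID_COLUMNS:
        problems.append(f"first two columns are {columns[:2]}, expected {EXPECTED_ID_COLUMNS}")
    missing_years = [y for y in EXPECTED_YEARS if y not in columns]
    if missing_years:
        problems.append(f"missing year columns: {missing_years}")
    expected_count = len(EXPECTED_ID_COLUMNS) + len(EXPECTED_YEARS)
    if len(columns) != expected_count:
        problems.append(f"expected {expected_count} columns, found {len(columns)}")
    return problems


# Synthetic frame with the expected layout (values are made up)
sample = pd.DataFrame(
    [['Country A', 'AAA'] + [5.0] * 31,
     ['Country B', 'BBB'] + [7.5] * 31],
    columns=EXPECTED_ID_COLUMNS + EXPECTED_YEARS,
)
print(check_schema(sample))  # → []
```

Running `check_schema` on the loaded `df` right after `pd.read_csv` would flag, for example, a file whose year headers were parsed differently.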

3. Tools

- Python version
- Code editor
- Data processing libraries: pandas, NumPy
- Visualization libraries: Matplotlib, wordcloud

4. Data Import and Preprocessing

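```python
import pandas as pd
import matplotlib.pyplot as plt

# Load the data
df = pd.read_csv('unemployment_data.csv')

# Dataset size (rows, columns)
print("Shape:", df.shape)

# Column types and non-null counts; info() prints directly and returns None
print("\nBasic info:")
df.info()

print("\nDescriptive statistics:")
print(df.describe())

# Missing values per column
print("\nMissing values per column:")
print(df.isnull().sum())

# Number of fully duplicated rows
print("\nDuplicate rows:", df.duplicated().sum())
```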
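The code above only reports missing and duplicate values. One possible follow-up, sketched here under the assumption that a year column might contain non-numeric placeholders, is to drop exact duplicate rows and coerce the year columns to numbers so that bad entries become NaN. `clean_unemployment` is a hypothetical helper and the small frame is synthetic.

```python
import numpy as np
import pandas as pd


def clean_unemployment(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop exact duplicates and coerce year columns to numeric."""
    out = df.drop_duplicates().copy()
    year_cols = out.columns[2:]  # assumes: name, code, then one column per year
    # errors='coerce' turns anything that cannot be parsed as a number into NaN
    out[year_cols] = out[year_cols].apply(pd.to_numeric, errors='coerce')
    return out


raw = pd.DataFrame({
    'Country Name': ['Country A', 'Country A', 'Country B'],
    'Country Code': ['AAA', 'AAA', 'BBB'],
    '2020': [5.1, 5.1, '..'],   # '..' is a hypothetical non-numeric placeholder
    '2021': [4.8, 4.8, np.nan],
})
clean = clean_unemployment(raw)
print(clean.shape)  # → (2, 4)
# Share of missing values per year after cleaning
print(clean[['2020', '2021']].isna().mean().to_dict())
```

Coercing rather than dropping keeps every country in the frame; later steps such as `.dropna()` and `.mean()` already skip the resulting NaN values.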

5. Univariate Analysis

5.1 Distribution of Unemployment Rates

```python
plt.figure(figsize=(12, 6))
# Use the 2021 unemployment rates to examine the distribution
plt.hist(df['2021'].dropna(), bins=30, color='skyblue', edgecolor='black')
plt.title('Distribution of Unemployment Rates in 2021', fontsize=14, fontweight='bold')
plt.xlabel('Unemployment rate (%)', fontsize=12)
plt.ylabel('Number of countries', fontsize=12)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```
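`plt.hist` splits the range from the minimum to the maximum value into `bins` equal-width intervals and draws how many values fall into each. The same counts can be computed without plotting via `numpy.histogram`, which is handy for checking a figure or quoting numbers. The rates below are synthetic stand-ins for `df['2021']`, with 5 bins instead of 30 to keep the output short.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df['2021']; the NaN mimics a country with no data
rates_2021 = pd.Series([3.2, 4.8, 5.1, 7.0, 12.5, 28.0, np.nan])

values = rates_2021.dropna().to_numpy()
counts, edges = np.histogram(values, bins=5)

# Every non-missing value lands in exactly one bin (the last bin includes its right edge)
print(counts.sum(), len(values))  # → 6 6
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.2f} to {right:5.2f}: {n}")
```

A long right tail like the single 28.0 here is one reason the mean and median can tell different stories in the next section.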

5.2 Unemployment Rate Trends over Time

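```python
plt.figure(figsize=(12, 6))
# Year columns start after 'Country Name' and 'Country Code'
years = df.columns[2:]
# Unweighted mean across all rows for each year (NaN values are skipped)
mean_unemployment = df[years].mean()
plt.plot(years, mean_unemployment, marker='o', color='blue')
plt.title('Global Average Unemployment Rate Trend (1991-2021)', fontsize=14, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Unemployment rate (%)', fontsize=12)
plt.xticks(rotation=45)
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```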
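`df[years].mean()` gives every country equal weight regardless of population and silently skips missing values, so the set of countries behind each point can change from year to year, and one extreme country can pull the mean up. A sketch that reports the mean alongside the median and the number of contributing countries, using a synthetic frame with the same layout:

```python
import numpy as np
import pandas as pd

# Synthetic frame: two ID columns followed by year columns
df = pd.DataFrame({
    'Country Name': ['Country A', 'Country B', 'Country C'],
    'Country Code': ['AAA', 'BBB', 'CCC'],
    '1991': [4.0, 6.0, np.nan],   # Country C has no 1991 value
    '1992': [5.0, 7.0, 30.0],     # Country C is an outlier in 1992
})
years = df.columns[2:]

summary = pd.DataFrame({
    'mean': df[years].mean(),        # NaN values are skipped
    'median': df[years].median(),    # less sensitive to a single extreme country
    'countries': df[years].count(),  # how many countries contribute each year
})
print(summary)
```

Here the 1992 mean (14.0) is driven by one country and also covers one more country than 1991, while the median moves only from 5.0 to 7.0.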

6. Multivariate Analysis

6.1 Comparing Unemployment Rates Across Countries

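```python
plt.figure(figsize=(15, 8))
# Select the 10 countries with the highest unemployment rates in 2021
top_10_unemployment = df.sort_values(by='2021', ascending=False).head(10)
plt.bar(top_10_unemployment['Country Name'], top_10_unemployment['2021'], color='lightgreen')
plt.title('Top 10 Countries by Unemployment Rate in 2021', fontsize=14, fontweight='bold')
plt.xlabel('Country', fontsize=12)
plt.ylabel('Unemployment rate (%)', fontsize=12)
# Rotate long country names so the labels do not overlap
plt.xticks(rotation=45, ha='right')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```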
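`sort_values(...).head(10)` works here because `sort_values` places NaN last by default, so countries without a 2021 value never reach the top. `DataFrame.nlargest` expresses the same "top n by one column" selection directly. A sketch on synthetic data, with n = 3 for brevity:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Country Name': ['A', 'B', 'C', 'D', 'E'],
    '2021': [5.0, np.nan, 21.0, 13.5, 8.2],
})

top_sorted = df.sort_values(by='2021', ascending=False).head(3)
top_n = df.nlargest(3, '2021')  # rows with NaN in '2021' are ignored

print(top_n['Country Name'].tolist())  # → ['C', 'D', 'E']
```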

6.2 Unemployment and Economic Growth

```python
import numpy as np

# Assume a growth-rate DataFrame (growth_df) with the same structure as the unemployment data.
# Random placeholder values are used here in place of real data, so the resulting
# correlation is illustrative only and carries no economic meaning.
np.random.seed(42)
growth_df = df.copy()
growth_years = growth_df.columns[2:]
for year in growth_years:
    growth_df[year] = np.random.uniform(-5, 10, len(growth_df))

plt.figure(figsize=(12, 6))
correlation = df['2021'].corr(growth_df['2021'])
plt.scatter(df['2021'], growth_df['2021'], alpha=0.5, color='purple')
plt.title(f'Unemployment Rate vs. Economic Growth Rate in 2021 (correlation: {correlation:.2f})',
          fontsize=14, fontweight='bold')
plt.xlabel('Unemployment rate (%)', fontsize=12)
plt.ylabel('Economic growth rate (%)', fontsize=12)
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```
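The number in the plot title comes from `Series.corr`, which by default computes the Pearson correlation coefficient using only rows where both series have a value. The sketch below reproduces that calculation by hand on synthetic numbers (not taken from the dataset), as the sum of co-deviations divided by the square root of the product of the summed squared deviations.

```python
import numpy as np
import pandas as pd

unemployment = pd.Series([4.0, 6.0, np.nan, 9.0, 12.0])
growth = pd.Series([3.0, 2.5, 1.0, 0.5, -1.0])

# Keep only rows where both values are present (pairwise deletion, as Series.corr does)
mask = unemployment.notna() & growth.notna()
x = unemployment[mask].to_numpy()
y = growth[mask].to_numpy()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = unemployment.corr(growth)
print(round(r_manual, 4), round(r_pandas, 4))
```

Pearson r only measures linear association; with the random placeholder growth data above it should hover near zero.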

6.3 Comparing Unemployment Rates Across Periods

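```python
plt.figure(figsize=(12, 6))
years_to_compare = ['1991', '2001', '2011', '2021']
# Reshape the selected year columns from wide to long format
df_melted = pd.melt(df, id_vars=['Country Name', 'Country Code'], value_vars=years_to_compare,
                    var_name='Year', value_name='Unemployment Rate')
data = [df_melted.loc[df_melted['Year'] == year, 'Unemployment Rate'].dropna()
        for year in years_to_compare]
plt.boxplot(data)
# Set tick labels separately: boxplot's `labels` argument is deprecated in newer Matplotlib
plt.xticks(range(1, len(years_to_compare) + 1), years_to_compare)
plt.title('Unemployment Rates in Selected Years', fontsize=14, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Unemployment rate (%)', fontsize=12)
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```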
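Each box spans the first to third quartile of one year's values, with a line at the median. After melting, those numbers can be printed directly with `groupby(...).describe()`, which helps when quoting values from the figure. A sketch on a synthetic frame with two years:

```python
import pandas as pd

df = pd.DataFrame({
    'Country Name': ['A', 'B', 'C', 'D'],
    'Country Code': ['AAA', 'BBB', 'CCC', 'DDD'],
    '1991': [2.0, 4.0, 6.0, 8.0],
    '2021': [3.0, 5.0, 7.0, 21.0],
})

# Wide -> long: one row per (country, year)
long_df = pd.melt(df, id_vars=['Country Name', 'Country Code'],
                  value_vars=['1991', '2021'],
                  var_name='Year', value_name='Unemployment Rate')

# Quartiles per year: the box edges (25%, 75%) and center line (50%) of a box plot
stats = long_df.groupby('Year')['Unemployment Rate'].describe()[['25%', '50%', '75%']]
print(long_df.shape)  # → (8, 4)
print(stats)
```

Both pandas and Matplotlib interpolate linearly between data points by default when computing these percentiles, so the printed values should match the drawn boxes.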

6.4 Regional Differences in Unemployment

```python
import numpy as np

plt.figure(figsize=(12, 6))
# Assume a region DataFrame (region_df) mapping each country to its region.
# Regions are assigned at random here as placeholder data, so the comparison is illustrative only.
region_df = pd.DataFrame({
    'Country Name': df['Country Name'],
    'Region': np.random.choice(['North America', 'South America', 'Europe',
                                'Asia', 'Africa', 'Oceania'], size=len(df)),
})
df_with_region = pd.merge(df, region_df, on='Country Name')
regions = df_with_region['Region'].unique()
plt.boxplot([df_with_region.loc[df_with_region['Region'] == region, '2021'].dropna()
             for region in regions])
plt.xticks(range(1, len(regions) + 1), regions)
plt.title('Unemployment Rates by Region (2021)', fontsize=14, fontweight='bold')
plt.xlabel('Region', fontsize=12)
plt.ylabel('Unemployment rate (%)', fontsize=12)
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```
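With a real region lookup instead of random labels, some country names in the unemployment file may not match the lookup (spelling variants, for example), and the default inner merge would drop those rows without warning. Passing `how='left'` and `indicator=True` to `pd.merge` keeps them and flags them in a `_merge` column. The lookup and country names below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Country Name': ['Country A', 'Country B', 'Country C', 'Country X'],
    '2021': [7.9, 2.8, 5.7, 4.0],  # made-up values
})
# Hypothetical region lookup that happens to miss one country
region_df = pd.DataFrame({
    'Country Name': ['Country A', 'Country B', 'Country C'],
    'Region': ['Europe', 'Asia', 'Africa'],
})

merged = pd.merge(df, region_df, on='Country Name', how='left', indicator=True)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Country Name'].tolist()
print(unmatched)  # → ['Country X']
```

Checking `unmatched` before plotting makes it obvious which countries would otherwise vanish from the regional box plots.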

6.5 Time-Series Analysis for a Single Country

```python
plt.figure(figsize=(12, 6))
# Pick one country (here, the United States) for time-series analysis
years = df.columns[2:]
usa_data = df.loc[df['Country Name'] == 'United States', years]
plt.plot(years, usa_data.iloc[0], marker='o', color='blue')
plt.title('United States Unemployment Rate Trend (1991-2021)', fontsize=14, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Unemployment rate (%)', fontsize=12)
plt.xticks(rotation=45)
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```
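The lookup above fails with a bare `IndexError` if the country name is not spelled exactly as in the file. A hypothetical helper, `country_series`, could return one country's rates as a Series indexed by integer year and raise a clearer error instead. It assumes the same layout as the dataset (name, code, then year columns); the frame below is synthetic.

```python
import pandas as pd


def country_series(df: pd.DataFrame, name: str) -> pd.Series:
    """Hypothetical helper: one country's yearly rates, indexed by integer year."""
    rows = df[df['Country Name'] == name]
    if rows.empty:
        raise KeyError(f"country not found: {name!r}")
    series = rows.iloc[0, 2:].astype(float)  # skip 'Country Name' and 'Country Code'
    series.index = series.index.astype(int)  # '1991' -> 1991, easier to slice and plot
    return series


df = pd.DataFrame({
    'Country Name': ['Country A'],
    'Country Code': ['AAA'],
    '2019': [3.7], '2020': [8.1], '2021': [5.4],
})
s = country_series(df, 'Country A')
print(s.idxmax(), s.max())  # → 2020 8.1
```

Integer years also let you slice a period directly, for example `s.loc[2008:2012]`.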

6.6 Unemployment and Population Size

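```python
# Assume a population DataFrame (population_df) with the same structure as the unemployment data.
# Random placeholder values are used here in place of real data.
population_df = df.copy()
for year in years:
    population_df[year] = np.random.uniform(1e6, 1e9, len(population_df))

plt.figure(figsize=(12, 6))
plt.scatter(population_df['2021'], df['2021'], alpha=0.5, color='orange')
# Populations span several orders of magnitude, so a log-scaled x-axis is easier to read
plt.xscale('log')
plt.title('Unemployment Rate vs. Population Size (2021)', fontsize=14, fontweight='bold')
plt.xlabel('Population', fontsize=12)
plt.ylabel('Unemployment rate (%)', fontsize=12)
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```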
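Because population is so skewed, a rank-based (Spearman) correlation is often a better single-number summary than Pearson here: it depends only on the ordering of the values, so it is unchanged by a log transform. The sketch computes Spearman as the Pearson correlation of the ranks, which avoids needing SciPy; the values are synthetic.

```python
import numpy as np
import pandas as pd

population = pd.Series([2e6, 3.5e7, 1.4e8, 6.0e8, 1.1e9])
unemployment = pd.Series([9.0, 7.5, 6.0, 5.5, 4.0])

pearson_raw = population.corr(unemployment)
pearson_log = np.log10(population).corr(unemployment)
# Spearman correlation = Pearson correlation of the ranks
spearman = population.rank().corr(unemployment.rank())
spearman_log = np.log10(population).rank().corr(unemployment.rank())

print(round(pearson_raw, 3), round(pearson_log, 3), round(spearman, 3))
```

Here the relationship is perfectly monotonic, so Spearman is -1 on both scales, while Pearson changes with the transform because the points are not on a straight line.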

6.7 Unemployment and Education Level

```python
# Assume an education-level DataFrame (education_df) with the same structure as the unemployment data.
# Random placeholder values are used here in place of real data.
education_df = df.copy()
for year in years:
    education_df[year] = np.random.uniform(0, 100, len(education_df))

plt.figure(figsize=(12, 6))
plt.scatter(education_df['2021'], df['2021'], alpha=0.5, color='gray')
plt.title('Unemployment Rate vs. Education Level (2021)', fontsize=14, fontweight='bold')
plt.xlabel('Education level (%)', fontsize=12)
plt.ylabel('Unemployment rate (%)', fontsize=12)
plt.grid(linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()
```
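A scatter plot of two noisy variables can be hard to read. A common complement is to group the x variable into bands and average the y variable within each band. The sketch below does this with `pd.cut`; the band edges are an arbitrary choice and the data are synthetic.

```python
import pandas as pd

education = pd.Series([12.0, 35.0, 48.0, 66.0, 81.0, 95.0])   # synthetic education levels (%)
unemployment = pd.Series([11.0, 9.0, 8.0, 6.0, 5.0, 4.0])     # synthetic unemployment rates (%)

# Hypothetical band edges; each band includes its upper edge, e.g. (25, 50]
bands = pd.cut(education, bins=[0, 25, 50, 75, 100])
band_means = unemployment.groupby(bands, observed=False).mean()
print(band_means)
```

`observed=False` keeps empty bands in the result (as NaN), so the bands always line up with the chosen edges.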

6.8 Correlation Heatmap of Unemployment Rates Across Years

```python
plt.figure(figsize=(12, 8))
# Pairwise correlation between year columns, computed across countries
correlation_matrix = df[years].corr()
plt.imshow(correlation_matrix, cmap='coolwarm', interpolation='nearest')
plt.colorbar(label='Correlation coefficient')
plt.title('Correlation of Unemployment Rates Between Years', fontsize=14, fontweight='bold')
plt.xlabel('Year', fontsize=12)
plt.ylabel('Year', fontsize=12)
plt.xticks(range(len(years)), years, rotation=90)
plt.yticks(range(len(years)), years)
plt.tight_layout()
plt.show()
```
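Each cell (i, j) of the heatmap is the correlation, across countries, between year i and year j. The cells just above the diagonal compare each year with the following one, which shows how stable country rankings are from one year to the next. They can be pulled out with `np.diag(..., k=1)`; the three-year frame below is synthetic.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    '2019': [3.0, 5.0, 8.0, 12.0],
    '2020': [4.5, 7.0, 9.5, 15.0],
    '2021': [4.0, 6.0, 9.0, 13.0],
})
years = df.columns
corr = df[years].corr()

# First superdiagonal: correlation of each year with the next year
labels = [f"{a}->{b}" for a, b in zip(years[:-1], years[1:])]
lag1 = pd.Series(np.diag(corr.to_numpy(), k=1), index=labels)
print(lag1.round(3))
```

A dip in these year-to-next-year values would point to a year in which countries' relative positions shifted noticeably.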

6.9 Word Cloud of Countries

```python
from wordcloud import WordCloud

plt.figure(figsize=(10, 5))
# Join the unique country names into one text string
unemployment_text = ' '.join(df['Country Name'].unique().tolist())
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(unemployment_text)
plt.imshow(wordcloud, interpolation='bilinear')
plt.title('Word Cloud of Countries in the Unemployment Dataset', fontsize=14, fontweight='bold')
plt.axis('off')
plt.tight_layout()
plt.show()
```
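`WordCloud.generate` tokenizes the text and sizes words by how often they occur. Since each country name appears only once, word size carries little information, and multi-word names may be split up. To size names by a numeric value instead, the wordcloud package accepts a `{word: weight}` dict in `generate_from_frequencies`. The sketch below builds such a dict from one year's rates with a hypothetical helper, `rate_frequencies`, keeping only positive, non-missing values; it uses synthetic data and leaves the WordCloud call as a comment so it runs without the package.

```python
import numpy as np
import pandas as pd


def rate_frequencies(df: pd.DataFrame, year: str) -> dict:
    """Hypothetical helper: map country name -> unemployment rate for one year."""
    valid = df[df[year] > 0]  # NaN > 0 is False, so missing rates are dropped too
    return {name: float(rate) for name, rate in zip(valid['Country Name'], valid[year])}


df = pd.DataFrame({
    'Country Name': ['Country A', 'Country B', 'Country C'],
    '2021': [5.2, np.nan, 18.7],
})
freqs = rate_frequencies(df, '2021')
print(freqs)  # → {'Country A': 5.2, 'Country C': 18.7}

# With the wordcloud package installed, the dict can be passed in directly:
# WordCloud(width=800, height=400, background_color='white').generate_from_frequencies(freqs)
```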

6.10 Combined Multi-Panel View

```python
fig, axes = plt.subplots(2, 2, figsize=(18, 12))

# Distribution of unemployment rates
axes[0, 0].hist(df['2021'].dropna(), bins=30, color='skyblue', edgecolor='black')
axes[0, 0].set_title('Distribution of Unemployment Rates in 2021', fontsize=12, fontweight='bold')
axes[0, 0].set_xlabel('Unemployment rate (%)', fontsize=10)
axes[0, 0].set_ylabel('Number of countries', fontsize=10)
axes[0, 0].grid(linestyle='--', alpha=0.7)

# Trend over time (years and mean_unemployment come from Section 5.2)
axes[0, 1].plot(years, mean_unemployment, marker='o', color='blue')
axes[0, 1].set_title('Global Average Unemployment Rate Trend', fontsize=12, fontweight='bold')
axes[0, 1].set_xlabel('Year', fontsize=10)
axes[0, 1].set_ylabel('Unemployment rate (%)', fontsize=10)
axes[0, 1].tick_params(axis='x', rotation=90)
axes[0, 1].grid(linestyle='--', alpha=0.7)

# Country comparison (top_10_unemployment comes from Section 6.1)
axes[1, 0].bar(top_10_unemployment['Country Name'], top_10_unemployment['2021'], color='lightgreen')
axes[1, 0].set_title('Top 10 Countries by Unemployment Rate in 2021', fontsize=12, fontweight='bold')
axes[1, 0].set_xlabel('Country', fontsize=10)
axes[1, 0].set_ylabel('Unemployment rate (%)', fontsize=10)
axes[1, 0].tick_params(axis='x', rotation=45)
axes[1, 0].grid(linestyle='--', alpha=0.7)

# Unemployment vs. economic growth (growth_df is the placeholder data from Section 6.2)
axes[1, 1].scatter(df['2021'], growth_df['2021'], alpha=0.5, color='purple')
axes[1, 1].set_title('Unemployment Rate vs. Economic Growth Rate', fontsize=12, fontweight='bold')
axes[1, 1].set_xlabel('Unemployment rate (%)', fontsize=10)
axes[1, 1].set_ylabel('Economic growth rate (%)', fontsize=10)
axes[1, 1].grid(linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()
```

7. Summary

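The visual analysis of the unemployment dataset yielded key insights in the following areas:

- Distribution of unemployment rates
- Trends in unemployment over time
- Regional differences
- Single-country analysis
- Correlation analysis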

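These findings give policymakers a useful reference for understanding the complexity of unemployment and for designing targeted employment policies. We hope that in-depth data analysis can help promote fuller and higher-quality employment worldwide.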
