cs.AI updates on arXiv.org 07月25日 12:28
MeAJOR Corpus: A Multi-Source Dataset for Phishing Email Detection
index_new5.html
../../../zaker_core/zaker_tpl_static/wap/tpl_guoji1.html

 

本文提出MeAJOR数据集,包含135894个钓鱼邮件样本,有效提升钓鱼邮件检测的准确率,达到98.34% F1,为网络安全研究提供有力支持。

arXiv:2507.17978v1 Announce Type: cross Abstract: Phishing emails continue to pose a significant threat to cybersecurity by exploiting human vulnerabilities through deceptive content and malicious payloads. While Machine Learning (ML) models are effective at detecting phishing threats, their performance largely relies on the quality and diversity of the training data. This paper presents MeAJOR (Merged email Assets from Joint Open-source Repositories) Corpus, a novel, multi-source phishing email dataset designed to overcome critical limitations in existing resources. It integrates 135894 samples representing a broad number of phishing tactics and legitimate emails, with a wide spectrum of engineered features. We evaluated the dataset's utility for phishing detection research through systematic experiments with four classification models (RF, XGB, MLP, and CNN) across multiple feature configurations. Results highlight the dataset's effectiveness, achieving 98.34% F1 with XGB. By integrating broad features from multiple categories, our dataset provides a reusable and consistent resource, while addressing common challenges like class imbalance, generalisability and reproducibility.

Fish AI Reader

Fish AI Reader

AI辅助创作,多种专业模板,深度分析,高质量内容生成。从观点提取到深度思考,FishAI为您提供全方位的创作支持。新版本引入自定义参数,让您的创作更加个性化和精准。

FishAI

FishAI

鱼阅,AI 时代的下一个智能信息助手,助你摆脱信息焦虑

联系邮箱 441953276@qq.com

相关标签

钓鱼邮件 数据集 网络安全 机器学习 检测效果
相关文章