MarkTechPost@AI · November 19, 2024
LogLLM: Leveraging Large Language Models for Enhanced Log-Based Anomaly Detection

LogLLM is a log-based anomaly detection framework that leverages large language models (LLMs) such as BERT and Llama to improve software system reliability. Traditional methods struggle to capture the semantic details of log data written in natural language; LogLLM addresses this by preprocessing logs with regular expressions, using BERT to extract semantic vectors, Llama to classify log sequences, and a projector to align the two models' vector spaces. The framework combines this preprocessing step, the BERT/projector/Llama architecture, and a three-stage training procedure, and it achieves strong results on four public datasets, outperforming existing methods, particularly on unstable logs with evolving templates.

🤔**The LogLLM framework uses large language models such as BERT and Llama for log anomaly detection**: BERT extracts semantic vectors from log messages, Llama classifies log sequences, and a projector aligns the vector spaces of BERT and Llama to ensure semantic consistency.

🚀**LogLLM preprocesses logs with regular expressions**: Unlike traditional methods that require a log parser, LogLLM uses regular expressions to replace dynamic parameters, which simplifies model training and improves efficiency (see the sketch after these takeaways).

🔄**LogLLM uses a three-stage training procedure**: oversampling the minority class to address data imbalance, fine-tuning Llama on the answer template, training BERT and the projector for log embeddings, and finally fine-tuning the entire model, which gives the model better performance and adaptability.

📊**LogLLM performs strongly on four public datasets**: Compared against baselines such as DeepLog, LogAnomaly, PLELog, and RAPID, LogLLM achieves an average F1-score 6.6% higher than the best alternative, demonstrating its effectiveness and superiority in anomaly detection.

💡**LogLLM highlights the value of training with labeled anomalies**: Experiments show that training with labeled anomalies is essential for improving the precision and recall of anomaly detection.
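The preprocessing step highlighted above can be illustrated with a few masking rules; the patterns and the `<*>` placeholder token below are common conventions assumed for illustration, not the exact expressions used by LogLLM.

```python
import re

# Illustrative patterns for masking dynamic fields in raw log lines; the actual
# expressions used by LogLLM are not given in the article.
PATTERNS = [
    (re.compile(r"blk_-?\d+"), "<*>"),                          # HDFS block ids
    (re.compile(r"\b\d{1,3}(\.\d{1,3}){3}(:\d+)?\b"), "<*>"),   # IP addresses / ports
    (re.compile(r"\b\d+\b"), "<*>"),                            # remaining numbers
]

def preprocess(line: str) -> str:
    """Replace dynamic parameters with a constant token so that semantically
    identical log messages map to the same text."""
    for pattern, token in PATTERNS:
        line = pattern.sub(token, line)
    return line

print(preprocess("Received block blk_3587508140051953248 of size 67108864 from /10.251.42.84"))
# -> "Received block <*> of size <*> from /<*>"
```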

Log-based anomaly detection has become essential for improving software system reliability by identifying issues from log data. However, traditional deep learning methods often struggle to interpret the semantic details of log data, which is typically written in natural language. LLMs such as GPT-4 and Llama 3 have shown promise on such tasks thanks to their advanced language comprehension. Current LLM-based methods for anomaly detection include prompt engineering, which uses LLMs in zero-/few-shot setups, and fine-tuning, which adapts models to specific datasets. Despite their advantages, these methods face challenges in achieving task-specific detection accuracy and in managing memory efficiently.
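As an illustration of the prompt-engineering route, the snippet below assembles a zero-shot prompt that could be sent to an instruction-tuned LLM; the wording is an assumption for illustration, not a template from the paper or from any of the surveyed methods.

```python
def build_zero_shot_prompt(log_sequence):
    """Build a zero-shot anomaly-detection prompt for an instruction-tuned LLM.
    The actual chat/completion call is left out; any chat-style API would do."""
    logs = "\n".join(f"- {line}" for line in log_sequence)
    return (
        "You are a site reliability engineer. Below is a sequence of system log messages:\n"
        f"{logs}\n"
        "Is this sequence normal or anomalous? Answer with a single word."
    )

print(build_zero_shot_prompt([
    "Receiving block blk_123 src: /10.0.0.1 dest: /10.0.0.2",
    "PacketResponder 1 for block blk_123 terminating",
]))
```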

The study reviews approaches to log-based anomaly detection, focusing on deep learning methods, especially those using pretrained LLMs. Traditional techniques include reconstruction-based methods (such as autoencoders and GANs), which rely on training models to reconstruct normal log sequences and detect anomalies based on reconstruction errors. Binary classification methods, typically supervised, detect anomalies by classifying log sequences as normal or abnormal. LLMs, including BERT and GPT-based models, are employed in two primary strategies: prompt engineering, which utilizes the internal knowledge of LLMs, and fine-tuning, which customizes models for specific datasets to improve anomaly detection performance.
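For the reconstruction-based family mentioned above, detection usually reduces to thresholding a reconstruction error; the sketch below shows that scoring step, with the trained model and the threshold left as assumptions (`model.reconstruct` is a hypothetical method, and `sequences` are numeric representations of log sequences, e.g. embedded log keys).

```python
import numpy as np

def detect_by_reconstruction_error(model, sequences, threshold):
    """Flag a log sequence as anomalous when a model trained only on normal
    sequences (e.g. an autoencoder) reconstructs it poorly."""
    flags = []
    for seq in sequences:
        recon = model.reconstruct(seq)                            # hypothetical method
        error = np.mean((np.asarray(seq) - np.asarray(recon)) ** 2)  # mean squared error
        flags.append(error > threshold)                           # large error => likely anomaly
    return flags
```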

Researchers from SJTU, Shanghai, developed LogLLM, a log-based anomaly detection framework utilizing LLMs. Unlike traditional methods that require log parsers, LogLLM preprocesses logs with regular expressions. It leverages BERT to extract semantic vectors and uses Llama, a transformer decoder, for log sequence classification. A projector aligns the vector spaces of BERT and Llama to maintain semantic coherence. LogLLM’s innovative three-stage training process enhances its performance and adaptability. Experiments across four public datasets show that LogLLM outperforms existing methods, accurately detecting anomalies, even in unstable logs with evolving templates.
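A minimal sketch of how such a BERT-projector-Llama pipeline could be wired together with Hugging Face transformers is shown below; the model names, [CLS] pooling, and the linear classification head are assumptions, since the article does not give implementation details (in the paper, Llama itself produces the verdict through an answer template).

```python
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class LogLLMSketch(nn.Module):
    """Rough wiring of the BERT -> projector -> Llama pipeline described in the
    article. Model names, pooling, and the classification head are assumptions."""
    def __init__(self, bert_name="bert-base-uncased",
                 llama_name="meta-llama/Llama-3.1-8B"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(bert_name)
        self.bert = AutoModel.from_pretrained(bert_name)
        self.llama = AutoModel.from_pretrained(llama_name)       # decoder without LM head
        # Projector: align BERT's vector space with Llama's embedding space.
        self.projector = nn.Linear(self.bert.config.hidden_size,
                                   self.llama.config.hidden_size)
        self.classifier = nn.Linear(self.llama.config.hidden_size, 2)  # normal / anomalous

    def forward(self, log_messages):
        # One semantic vector per preprocessed log message ([CLS] pooling assumed).
        toks = self.tokenizer(log_messages, return_tensors="pt",
                              padding=True, truncation=True)
        message_vecs = self.bert(**toks).last_hidden_state[:, 0, :]   # (n_msgs, d_bert)
        # Project each message vector into Llama's space and treat the whole
        # sequence as one run of soft tokens for the decoder.
        soft_tokens = self.projector(message_vecs).unsqueeze(0)       # (1, n_msgs, d_llama)
        hidden = self.llama(inputs_embeds=soft_tokens).last_hidden_state
        return self.classifier(hidden[:, -1, :])                      # sequence-level logits
```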

The LogLLM anomaly detection framework uses a three-step approach: preprocessing, model architecture, and training. Logs are first preprocessed using regular expressions to replace dynamic parameters with a constant token, simplifying model training. The model architecture combines BERT for extracting semantic vectors, a projector for aligning vector spaces, and Llama for classifying log sequences. The training process includes oversampling the minority class to address data imbalance, fine-tuning Llama for answer templates, training BERT and the projector for log embeddings, and finally, fine-tuning the entire model. QLoRA is used for efficient fine-tuning, minimizing memory usage while preserving performance.
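The QLoRA step could look roughly like the configuration below, using the Hugging Face transformers, peft, and bitsandbytes libraries; the base-model name, LoRA rank, target modules, and quantization settings are illustrative assumptions rather than the paper's exact choices.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the decoder in 4-bit NF4 quantization to keep memory usage low.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",   # model name is an assumption
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach low-rank adapters; only these small matrices are trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # a tiny fraction of the full model
```

Only the low-rank adapter weights are updated during fine-tuning, which is what keeps memory usage low while the 4-bit quantized base model stays frozen.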

The study evaluates LogLLM’s performance using four real-world datasets: HDFS, BGL, Liberty, and Thunderbird. LogLLM is compared with several semi-supervised, supervised, and non-deep learning methods, including DeepLog, LogAnomaly, PLELog, and RAPID. The evaluation uses metrics such as Precision, Recall, and F1-score. Results show LogLLM achieves superior performance across all datasets, with an average F1-score 6.6% higher than the best alternative, NeuralLog. The method efficiently balances precision and recall, outperforms others in anomaly detection, and demonstrates the importance of using labeled anomalies for training.
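For reference, the reported Precision, Recall, and F1-score follow the standard definitions; a quick sketch of the computation from raw detection counts (the example counts are illustrative, not from the paper):

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from true-positive, false-positive, and
    false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

print(precision_recall_f1(tp=95, fp=5, fn=10))  # illustrative counts only
```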

In conclusion, the study introduces LogLLM, a log-based anomaly detection framework that utilizes LLMs like BERT and Llama. BERT extracts semantic vectors from log messages, while Llama classifies log sequences. A projector is used to align the vector spaces of BERT and Llama for semantic consistency. Unlike traditional methods, LogLLM preprocesses logs with regular expressions, eliminating the need for log parsers. The framework is trained using a novel three-stage procedure to improve performance and adaptability. Experimental results on four public datasets show LogLLM outperforms existing methods, effectively detecting anomalies even in unstable log data.


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.



