This paper explores the use of Large Language Models (LLMs) to address challenges in Black Box Optimization (BBO), particularly multi-modality and task generalization. The authors propose framing BBO around sequence-based foundation models, leveraging LLMs' capability to retrieve information from various modalities to arrive at superior optimization strategies.

Motivation: Traditional BBO techniques struggle with multi-modality and task generalization

The position paper by [Son24P] advocates using LLM-based foundation models for Black Box Optimization (BBO). The goal of BBO is to optimize an objective function given only evaluations of the function (i.e., no gradients or other derivative information about the function). A common example of a BBO task is neural architecture search, where the objective is to maximize classification accuracy over different architectures. Classical BBO approaches include grid search, random search, and Bayesian optimization.

Figure 1. [Son24P], Figure 1. Foundation models can learn priors from a wide variety of sources, such as world knowledge, domain-specific documents, and actual experimental evaluations. Such models can then perform black-box optimization over various search spaces (e.g. hyperparameters, code, natural language) and types of feedback (numeric values, categorical ratings, and subjective sentiment).

More recent BBO algorithms typically try to incorporate inductive biases or priors into the search problem, e.g., domain knowledge, parameter constraints, the search history, etc. One particular goal of these approaches is meta-learning, i.e., developing algorithms that can automatically provide priors for tasks from different domains without additional task-specific training. However, constructing reliable priors that work across multiple tasks and can take in data from multiple modalities (values, text, images) is challenging.

Position: LLMs can process multi-modal data and be fine-tuned to different tasks

Figure 2. [Son24P], Figure 2. Black-box optimization loop with sequential foundation models. Using metadata $m$ and history $h$, the model proposes candidates $x$ which are checked for feasibility, evaluated, and then appended to the history.

The central point of the paper is that LLMs are a promising candidate for tackling this challenge (Figure 1). The key idea is to interpret BBO as a sequence-learning problem: given a search space $\mathcal{X}$ containing hyperparameter settings $x \in \mathcal{X}$ and a history $h_{1:t-1}$ of previous settings $x_{1:t-1}$ with their corresponding objective function values $y_{1:t-1}$, the goal is to predict the next element in the sequence, i.e., a new hyperparameter setting $x_t$.

Transformer-based LLMs excel at sequence learning and meet several critical requirements for foundational BBO:

- Multi-modality: They can process large amounts of data from various modalities.
- Pre-training: They can be pre-trained to acquire extensive world knowledge.
- Fine-tuning: They can be fine-tuned with task-specific information.

The workflow for using LLM-based foundation models for BBO is visualized in Figure 2.
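To make the loop in Figure 2 concrete, here is a minimal, self-contained sketch of such a sequence-based optimization loop. It is not the authors' implementation: `query_llm` is a placeholder for any text-completion API (replaced here by random proposals so the snippet runs offline), and the prompt format, search space, and objective are illustrative assumptions.

```python
# Sketch of the propose -> check feasibility -> evaluate -> append loop (Figure 2).
import random

def encode_prompt(metadata: str, history: list[tuple[dict, float]]) -> str:
    """Serialize task metadata m and the history h_{1:t-1} as plain text."""
    lines = [f"Task: {metadata}"]
    for i, (x, y) in enumerate(history, start=1):
        lines.append(f"Trial {i}: x={x}, y={y:.4f}")
    lines.append("Propose the next setting as 'lr=<float>, batch_size=<int>'.")
    return "\n".join(lines)

def query_llm(prompt: str) -> str:
    """Placeholder for an LLM call; here it simply proposes a random setting."""
    return f"lr={10 ** random.uniform(-5, -1):.5f}, batch_size={random.choice([16, 32, 64, 128])}"

def parse_candidate(reply: str) -> dict:
    """Parse the model's reply into a candidate x; return {} if malformed."""
    try:
        parts = dict(kv.split("=") for kv in reply.split(", "))
        return {"lr": float(parts["lr"]), "batch_size": int(parts["batch_size"])}
    except (ValueError, KeyError):
        return {}

def is_feasible(x: dict) -> bool:
    """Feasibility check against the search space X."""
    return bool(x) and 1e-6 < x["lr"] < 1.0 and x["batch_size"] in {16, 32, 64, 128}

def objective(x: dict) -> float:
    """Black-box objective y = f(x); a toy stand-in for e.g. validation accuracy."""
    return -abs(x["lr"] - 0.01) - 0.001 * abs(x["batch_size"] - 64)

metadata = "Tune learning rate and batch size of an image classifier."
history: list[tuple[dict, float]] = []
for t in range(10):
    x = parse_candidate(query_llm(encode_prompt(metadata, history)))
    if not is_feasible(x):
        continue                       # infeasible proposals are rejected, not evaluated
    history.append((x, objective(x)))  # evaluate and append to the history
print(max(history, key=lambda pair: pair[1]))
```

In a real system the random stub would be replaced by a pre-trained (and possibly fine-tuned) foundation model, and the encoded history would give it the in-context information needed to propose better candidates over time.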
The authors also give an overview of common techniques for BBO, summarizing the increasing capabilities as one moves from hand-crafted genetic algorithms, model-based BBO, and feature-based meta-learning to sequence-based, attention-based, token-based, and finally LLM-based algorithms (Table 1; paper Section 3.2).

Table 1. [Son24P], Table 2. Classes of methods organized by their capabilities. Note: method names follow the order of development, e.g. "Attention-based" covers techniques developed up to that point, such as meta-learning, but not LLMs.

Finally, the authors collect a set of challenges and open questions for BBO with LLMs (paper Section 4). They argue that there is a need for

- better data representations and multi-modal datasets for training models on multi-modal tasks,
- a common guideline or format for encoding BBO (meta-)data to be processed by LLMs (see the sketch at the end of this post),
- large open-source evaluation datasets,
- better generalization and customization of LLMs for different tasks, and
- new benchmarks for metadata-rich BBO to better test the capabilities of LLMs.

This paper is an interesting read and provides a comprehensive overview of the limitations of classical BBO methods and the possibilities of Large Language Models for Black Box Optimization.
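On the second of these challenges, here is a small, hypothetical sketch of what a shared text encoding for BBO (meta)data and history could look like. The schema and field names are my own illustrative assumptions, not a format defined in [Son24P]; the point is only that search space, metadata, and history are serialized uniformly so a single model can consume studies from different tasks.

```python
# Hypothetical, task-agnostic encoding of a BBO study; field names are assumptions.
import json

study = {
    "metadata": {
        "task": "image classification",
        "objective": "maximize validation accuracy",
        "search_space": {
            "lr": {"type": "float", "range": [1e-5, 1e-1], "scale": "log"},
            "optimizer": {"type": "categorical", "choices": ["sgd", "adam"]},
        },
    },
    "history": [
        {"x": {"lr": 3e-4, "optimizer": "adam"}, "y": 0.91},
        {"x": {"lr": 1e-2, "optimizer": "sgd"}, "y": 0.87},
    ],
}

# One possible LLM-ready serialization: the structured record plus an instruction.
prompt = json.dumps(study, indent=2) + '\nPropose the next trial as a JSON object {"x": ...}.'
print(prompt)
```

A single serialization along these lines could, in principle, serve both for pre-training on existing optimization studies and as the prompt format at optimization time.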