Putting a human in the loop often refers to an artificial intelligence (AI), such as a target-acquisition system, that requires human authorization to proceed (as in, pull the trigger). In learning contexts, however, such as acquiring a new skill, having a human in the loop refers to an optimization process through which the steep learning curve faced by the first new user is progressively flattened for each succeeding generation of new users, as the lessons learned from each generation are incorporated into improved teaching methods. The learning time thus shortens progressively, until a fully optimized procedure is achieved.
The technique is called human-in-the-loop optimization by Per Ola Kristensson, a professor of Interactive Systems Engineering at the U.K.’s University of Cambridge. It was shortened to the acronym HiLO by Christian Holz, deputy head of the Institute of Intelligent Interactive Systems at Switzerland’s ETH Zurich, and colleagues in the paper “Continual Human-in-the-Loop Optimization,” presented at the Conference on Human Factors in Computing Systems (CHI 2025, April 26-May 1, Yokohama, Japan). When done well, HiLO can produce AIs that progressively shorten the time it takes humans to learn new skills.
Human-in-the-loop optimization, as Kristensson explained, was an improvement over “using traditional optimization, when the user sets objectives and constraints, runs an optimizer, and inspects the results. In human-in-the-loop optimization, the user is provided incremental feedback and given various opportunities to steer the optimization process. This has two potential advantages. First, the user can use their domain expertise of the task to more accurately steer the optimization process. Second, the user can learn important qualities about the task by observing the optimizer’s exploration.”
Kristensson added, “Further, in human-in-the-loop Bayesian optimization, the user does not have to prescribe objectives, which non-optimization experts find difficult to do. Our paper concluded that novice users found it easier to explore the optimization space using human-in-the-loop Bayesian optimization, but this benefit is at the cost of reduced agency—namely, that the users feel they are not fully in control of the optimization process.”
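The loop Kristensson describes can be pictured with a standard Bayesian-optimization pattern: a Gaussian-process surrogate models the user's feedback gathered so far, and an acquisition function proposes the next design for the human to try. The sketch below is illustrative only; it assumes a single one-dimensional design parameter and substitutes a simulated score for real human feedback, and none of the function names come from the papers discussed here.

```python
import numpy as np

def rbf(a, b, ls=0.15):
    # squared-exponential kernel over a 1-D design parameter
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-4):
    # Gaussian-process posterior mean/variance on a grid of candidate designs
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_inv = np.linalg.inv(K)
    Ks = rbf(x_obs, x_grid)
    mu = Ks.T @ K_inv @ y_obs
    var = 1.0 - np.einsum('ij,ji->i', Ks.T @ K_inv, Ks)
    return mu, np.maximum(var, 1e-12)

def simulated_user_score(x):
    # stand-in for real human feedback; peak preference at x = 0.62
    return -(x - 0.62) ** 2

def hilo_loop(n_iters=15, seed=0):
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 201)
    xs = [rng.uniform()]                      # first trial is a guess
    ys = [simulated_user_score(xs[0])]
    for _ in range(n_iters):
        mu, var = gp_posterior(np.array(xs), np.array(ys), x_grid)
        ucb = mu + 2.0 * np.sqrt(var)         # acquisition: exploit + explore
        x_next = float(x_grid[np.argmax(ucb)])
        xs.append(x_next)                     # the human tries this design next
        ys.append(simulated_user_score(x_next))
    return xs[int(np.argmax(ys))]             # best design the user rated
```

In a real deployment, `simulated_user_score` would be replaced by an actual human trial (typing speed, error rate, a preference rating), which is precisely where the user can steer the process with domain expertise.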
However, without human-in-the-loop optimization of specific design parameters, each new user must start from scratch when learning a new skill, according to mathematician Tianyi Bai et al. They suggest that starting from scratch greatly lengthens the time required to optimize user preferences (a process often called calibration when done without HiLO); just learning how to optimally select an occluded three-dimensional object in virtual reality, augmented reality, or their mixture (mixed reality—MR) can consequently take 60 minutes or more. According to Holz, with continual human-in-the-loop optimization (CHiLO), the learning curve for even the most tedious MR tasks can be flattened.
“Our optimization approach continually leverages experiences from previous users, which substantially reduces the time required for personalized adaptations. This means that future users can quickly achieve optimal performance without the typically needed calibration steps,” said Holz. “We think that our approach will benefit personalized input systems in Mixed Reality to continuously improve as more people learn them.”
According to Holz et al., CHiLO’s objective is to minimize the number of suboptimal trials while still finding an optimal solution for each user. It does so by modeling population-level user characteristics and continually integrating prior users’ preference data, thereby progressively shortening the learning time for new users (only the very first user has to start from scratch). This formulation substantially reduces the initial exploration time of typical optimization methods while preserving the capability to efficiently tailor interactions to each user’s unique motor abilities and preferences, according to Holz. In more detail, new users of CHiLO interact with the choices made by previous users to quickly ascertain which preferences might work best for them. By combining each new user’s own observations with the knowledge gathered from previous users, the system minimizes the time wasted trying out configurations already known to be unsuitable.
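One way to picture how prior users' data shortens a new user's search: scores that earlier users produced over a shared parameter grid can be averaged into a population-level prior, and a new user's first trial then starts at that prior's optimum instead of at a random guess. This is a toy illustration with assumed quadratic preference curves, not the actual population model used in the paper.

```python
import numpy as np

# preference curves from three hypothetical earlier users, each peaking
# at a slightly different setting of a shared design parameter
x_grid = np.linspace(0.0, 1.0, 101)
prev_users = [-(x_grid - c) ** 2 for c in (0.55, 0.60, 0.65)]

# population-level prior: the average preference over all earlier users
population_mean = np.mean(prev_users, axis=0)

def first_trial_for_new_user():
    # instead of random exploration, start at the population optimum;
    # the new user's own feedback then refines the estimate from there
    return float(x_grid[np.argmax(population_mean)])
```

The new user's personal optimum may differ from the population's, but starting near the population optimum means the optimizer never revisits regions the population has already shown to be poor.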
“In the era of rapid proliferation of human-aware AI systems, it is imperative that these systems can continuously adapt to the changes in human preferences, domain knowledge, and task requirements. As humans evolve, so should these systems, and continual human-in-the-loop learning and refinement are essential,” said Sriraam Natarajan, a professor and director of the Center for Machine Learning at the Erik Jonsson School of Engineering and Computer Science at the University of Texas at Dallas.
Holz et al. concluded that CHiLO’s methodology has four major challenges to overcome before it (potentially) becomes a de facto standard for quicker learning of detailed skills. First is scalability: so far, CHiLO has proven to scale efficiently as datasets increase. To prove itself over the long term, however, it will need to efficiently manage not only larger datasets, but also larger skill sets than those with which it has so far been tested. The computational complexity that comes with larger datasets and skill sets has so far been held in check by increasingly heavy use of approximations; for the long term, new methods are needed that scale without leaning ever harder on approximation.
The second major challenge, according to Holz et al., is that catastrophic forgetting needs to be headed off by knowledge-retention methods that guarantee that adapting to new tasks does not erode mastery of existing ones.
The third major challenge is establishing a mechanism that allows the optimizer to adapt to the more-specific needs of (some) new users—called plasticity—while maintaining the stability of its already perfected optimization methods for established general-purpose tasks.
The fourth major challenge for the CHiLO methodology is the ability to maintain its robust mastery of the skill space it conquers, even when portions of the solution space are underexplored (or, conversely, over-explored).
Finally, the CHiLO methodology needs to be tested and tuned for balance among the four challenges. For instance, scalability needs to be achieved for increasingly larger user bases without sacrificing stability. Likewise, a switch to optimizing for the most tedious user tasks needs to guard against catastrophically forgetting how to handle the original set of general-purpose user tasks already optimized for early users.
The first solution discovered by Holz et al. that met the four challenges, albeit so far tested on only a single optimization task (mid-air typing in virtual reality), was the use of a Continual Bayesian neural network (BNN) surrogate as the optimizer (ConBo). Using data synthesized from existing user models, the BNN showed significant performance improvement while meeting the four challenges (for a limited-size user group mastering a single task). The BNN is called a surrogate because it confronts the computational complexity that grows as new users are added by approximating appropriately. The BNN surrogate, however, needs a more thorough testing period to make sure it can handle an increasing number of users without catastrophic forgetting when confronted with multiple tasks, wider user spaces, and the more specific needs of (some) new users.
ConBo also makes use of a Gaussian Process (GP) adapted to each specific user, while combining transfer learning (pre-training of models to enhance efficiency) and meta-learning (that is, “learning how to learn” new tasks by leveraging accumulated experience from similar tasks).
The ConBo optimizer also minimizes the need for random exploration of the solution space, while preventing catastrophic forgetting by replaying early examples periodically. It also combines generative data synthesis with continual optimization to maximize scalability and robustness.
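The replay idea can be sketched concretely: a bounded buffer keeps a sample of earlier users' observations, and each training batch for the surrogate mixes fresh observations from the current user with replayed ones, so updates on the new user do not overwrite what was learned from earlier ones. This is a minimal sketch with hypothetical names, using reservoir sampling as one common way to bound the buffer; the paper's actual replay mechanism may differ.

```python
import random

class ReplayBuffer:
    """Keeps a bounded, uniform sample of earlier users' (config, score) pairs."""
    def __init__(self, capacity=500):
        self.capacity = capacity
        self.data = []
        self.seen = 0

    def add(self, item):
        # reservoir sampling: every item ever seen has equal probability
        # of being in the buffer, regardless of arrival order
        self.seen += 1
        if len(self.data) < self.capacity:
            self.data.append(item)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.data[j] = item

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def training_batch(new_user_obs, buffer, replay_frac=0.5, batch=32):
    # mix fresh observations with replayed ones so fitting the surrogate
    # to the current user does not catastrophically forget earlier users
    n_replay = int(batch * replay_frac)
    fresh = random.sample(new_user_obs, min(batch - n_replay, len(new_user_obs)))
    return fresh + buffer.sample(n_replay)
```

The `replay_frac` knob is the stability/plasticity trade-off in miniature: more replay protects earlier users' knowledge, more fresh data adapts faster to the current user.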
Future alternatives to the current ConBo solution to the CHiLO methodology will include applying ConBo to multiple tasks, to ascertain how large a task space it can fruitfully cover. Other improvements to Bayesian optimization include combining low-fidelity evaluations, which quickly discard suboptimal designs, with high-fidelity evaluations that refine promising candidates. More sophisticated approaches using completely different optimization methods will also be explored, from gradient-based optimization with multiple restarts to Lipschitzian optimization, which speeds the process with strictly enforced boundaries.
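The low-/high-fidelity combination amounts to a two-stage screen-then-refine pattern: a cheap, noisy proxy prunes the candidate pool, and the expensive, accurate evaluation is spent only on the survivors. The sketch below uses a hypothetical objective with Gaussian noise standing in for the cheap proxy; the specific functions are assumptions for illustration.

```python
import numpy as np

def high_fidelity(x):
    # expensive, accurate evaluation of a design (hypothetical objective)
    return -(x - 0.6) ** 2

def low_fidelity(x, rng):
    # cheap but noisy proxy for the same objective
    return high_fidelity(x) + rng.normal(0.0, 0.05)

def multi_fidelity_search(n_candidates=50, keep=5, seed=1):
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(0.0, 1.0, n_candidates)
    # stage 1: low-fidelity screen quickly discards clearly suboptimal designs
    cheap = np.array([low_fidelity(c, rng) for c in candidates])
    survivors = candidates[np.argsort(cheap)[-keep:]]
    # stage 2: high-fidelity evaluation refines only the promising few
    exact = np.array([high_fidelity(s) for s in survivors])
    return float(survivors[np.argmax(exact)])
```

The budget arithmetic is the point: 50 cheap evaluations plus 5 expensive ones can replace 50 expensive ones, provided the proxy's noise is small relative to the differences it must detect.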
R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.