AIhub, 7 April
#AAAI2025 invited talk round-up 1: labour economics, and reasoning about spatial information

 


Yasmine Boudiaf & LOTI / Data Processing / Licensed under CC-BY 4.0

The 39th Annual AAAI Conference on Artificial Intelligence (AAAI 2025) took place in Philadelphia from Tuesday 25 February to Tuesday 4 March 2025. The programme featured eight invited talks. In this post, we give a flavour of two of those talks, namely:

Predicting Career Transitions and Estimating Wage Disparities Using Foundation Models

Susan Athey

Susan works at the intersection of computer science and economics. In the past she has researched problems relating to mechanism design, auctions, pricing, and causal inference, but recently she has turned her attention to modelling worker career transitions using transformer models. In her talk, Susan described the research in a few of her recent papers covering topics such as the gender wage gap and economic prediction of labour sequence data.

Labour economics is a highly empirical field, using data together with models to answer questions. Questions that researchers have been working on for decades include the wage gap (along a particular axis, such as gender, race, or education level) conditional on career history, and the effects of job training programmes on productivity. Susan noted that the typical method for answering these questions in the past has been linear regression, so such problems were ripe for investigation with a new methodology. One motivating question for her research is whether foundation models can improve empirical economics. Other aspects of the research focus on the impact both of fine-tuning these models and of tailoring them specifically for economics objectives.

Screenshot from Susan’s talk, showing some of the papers that she covered during the plenary.

One of the projects that Susan talked about was predicting a worker’s next occupation. In 2024, Susan and colleagues published work entitled CAREER, A Foundation Model for Labor Sequence Data, in which they introduced a transformer-based predictive model that predicts a worker’s next job as a function of career history. This is a bespoke model, trained on resume data (24 million job sequences) and then fine-tuned on smaller, curated datasets.

The next step in this research was to replace the resume-based model with a large language model. This new model, called LABOR-LLM, was presented in the paper LABOR-LLM: Language-Based Occupational Representations with Large Language Models. LABOR-LLM was trained on three datasets (which you can see in the image below), and used the language model LLAMA. The team tested three methodologies: 1) applying an embedding function derived from an LLM to generate latent vectors, 2) using LLAMA off-the-shelf to predict text that should be an occupation, 3) fine-tuning the LLM to predict text that should be an occupation. While the off-the-shelf version was not particularly successful, Susan revealed that the fine-tuned model was actually more accurate at predicting next jobs than the bespoke resume-based model (CAREER) that the team had invested so much time in. Encouragingly, this means that such approaches, based on fine-tuning publicly available LLMs, could also be useful in other settings.

Screenshot from Susan’s talk giving an overview of the LABOR-LLM model.
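The fine-tuning approach described above amounts to serialising a career history as text and training the model to emit the next occupation title as its completion. The sketch below illustrates that idea; the prompt wording, function names, and record format are illustrative assumptions, not the paper's actual pipeline.

```python
# Hypothetical sketch of framing next-occupation prediction as a
# text-completion task for LLM fine-tuning. The prompt template and
# (prompt, completion) record shape are assumptions for illustration.

def career_to_prompt(jobs: list[str]) -> str:
    """Serialise a career history into a next-occupation prompt."""
    history = "\n".join(f"{i + 1}. {title}" for i, title in enumerate(jobs))
    return (
        "The following is a worker's career history, oldest first:\n"
        f"{history}\n"
        "The worker's next occupation is:"
    )

def to_finetune_record(jobs: list[str]) -> dict:
    """Build a training pair, holding out the final job as the target."""
    return {
        "prompt": career_to_prompt(jobs[:-1]),
        "completion": " " + jobs[-1],  # leading space, a common convention
    }

record = to_finetune_record(
    ["Cashier", "Retail Salesperson", "First-Line Retail Supervisor"]
)
# record["prompt"] lists the first two jobs; record["completion"] is the third
```

At inference time, the same prompt format would be applied to an unseen history and the model's completion read off as the predicted occupation.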

Can Large Language Models Reason about Spatial Information?

Anthony Cohn

Tony has been researching spatial information for much of his career and, with the advent of large language models (LLMs), turned his attention to investigating the extent to which these models can reason about such information. One particular area of focus in Tony’s research has been qualitative spatial reasoning. This is ubiquitous in natural language, and is something we use frequently in everyday speech, for example “they are sitting on the chair”, “the person is in the room”, and “I’m standing on the stage”.
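Relations like "in" and "on" obey simple composition rules that qualitative spatial calculi make explicit. As a toy illustration (an assumption for this post, not Tony's actual calculus), containment is transitive, and that closure can be computed in a few lines:

```python
# Toy sketch: transitive closure of an "inside" relation, e.g. if the
# book is in the box and the box is in the room, the book is in the room.
# Illustrative only; real qualitative spatial calculi are much richer.

def transitive_closure(inside: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return the transitive closure of a set of (inner, outer) pairs."""
    closure = set(inside)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

facts = {("book", "box"), ("box", "room")}
print(("book", "room") in transitive_closure(facts))  # True
```

Rules like this are exactly the kind of inference that humans make without thinking, and that LLMs, as Tony's examples show, do not reliably reproduce.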

During his talk, which was particularly timely given the release of GPT-4.5 just the day before, Tony showed some examples from testing a range of LLMs with “commonsense” scenarios. You can see one example in the screenshot below. In this case, the query asks: “The book couldn’t stand upright in the bookcase because it was too small. What does ‘it’ refer to?” Tony highlighted the parts of the reasoning given by the LLM (in this case GPT-4) that are incorrect. In further examples he showed that there are many instances where the models’ responses are not consistent with commonsense, highlighting that there is still much room for improvement in LLMs on this type of problem.

Screenshot from Tony’s talk showing LLM response to a spatial reasoning question.

Another example that Tony gave pertained to reasoning about cardinal directions. This work was published in 2024 and entitled Evaluating the Ability of Large Language Models to Reason About Cardinal Directions. Tony and colleagues tested various scenarios in which the LLM had to work out the correct cardinal direction. In the simpler tests, with questions such as “You are watching the sun set. Which direction are you facing?”, the accuracy was greater than 80% for all LLMs tested. However, for the more complicated scenarios, such as “You are walking south along the east shore of a lake and then turn around to head back in the direction you came from, in which direction is the lake?”, the performance was much worse, with accuracies for the different LLMs ranging from 25–60%. Tony concluded that LLMs perform much better on scenarios that require factual recall than on those that require spatial reasoning.
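The lake scenario is easy to get right with a tiny symbolic model, which underlines why its difficulty for LLMs is striking. The sketch below (an illustrative assumption, not the paper's evaluation code) represents headings as unit vectors: the lake on the east shore lies to the walker's west regardless of heading, while the egocentric side (left/right) flips when the walker turns around.

```python
# Minimal symbolic model of cardinal-direction reasoning.
# Headings are unit vectors (dx, dy) with north = +y, east = +x.

DIRS = {"north": (0, 1), "south": (0, -1), "east": (1, 0), "west": (-1, 0)}
NAMES = {v: k for k, v in DIRS.items()}

def turn_around(heading: str) -> str:
    """Reverse a heading by negating its vector."""
    dx, dy = DIRS[heading]
    return NAMES[(-dx, -dy)]

def egocentric_side(heading: str, target: str) -> str:
    """Where a target cardinal direction lies relative to the walker."""
    hx, hy = DIRS[heading]
    tx, ty = DIRS[target]
    cross = hx * ty - hy * tx  # positive = target is to the left
    dot = hx * tx + hy * ty
    if dot == 1:
        return "ahead"
    if dot == -1:
        return "behind"
    return "left" if cross > 0 else "right"

# Walking south along the east shore: the lake is to the west, on your right.
# After turning around you head north; the lake is still west, now on your left.
heading = turn_around("south")            # "north"
print(egocentric_side("south", "west"))   # right
print(egocentric_side(heading, "west"))   # left
```

The allocentric answer ("west") never changes; only the egocentric frame does, and conflating the two frames is a plausible source of the errors Tony reported.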

To end his talk, Tony touched on multimodal model testing, whereby you ask a generative model to create images. He explained that, although such models can produce very flashy pictures, they don’t perform well when you ask for outputs such as accurate maps, diagrams of spatial relations, or images of particular spatial configurations. You can see examples of such inaccuracies below in one of the slides from Tony’s talk. The maps include numerous errors, such as labelling France as Spain, and placing the Bay of Biscay in the North Sea.

Screenshot from Tony’s talk showing inaccuracies in multimodal generative models.

Tony concluded by saying that spatial reasoning is core to commonsense understanding of the world and questioned whether this could be achieved without both embodiment and use of symbolic reasoning.


You can read our coverage of AAAI 2025 here.
