Exploring the Potential of Large Language Models in Public Transportation: A San Antonio Case Study (R. Jonnala et al., 2025-01-07)

This work is accepted to the AAAI 2025 Workshop on AI for Urban Planning.

Table 1: GTFS Understanding benchmarking dataset questionnaire and question categories

    Question Type          Number of Questions
    Term Definitions       14
    Common Reasoning       28
    File Structure         17
    Attribute Mapping      3
    Data Structure         3
    Categorical Mapping    74
    Total                  195

The performance of these models is inherently contingent upon the quality and quantity of their training data. Consequently, erroneous LLM outputs can arise from various factors, including insufficient training data on specific topics or architectural limitations in processing user input, such as inadequate embedding methods. To effectively evaluate learning-based LLMs, it is imperative to distinguish between pre-trained models and underlying architectures. This project aims to evaluate LLMs' capacity to comprehend GTFS and other public transportation information through two experimental approaches:

1. Performance of Pre-trained Models: We assess the ability of a pre-trained LLM "as-is" to answer transportation-related questions without additional context. This evaluation determines the model's inherent understanding of GTFS and public transit information. Errors in this phase may indicate limitations in the model's pre-training data or architectural constraints related to processing transportation-specific queries. We refer to this as the "understanding" task.

2. Impact of LLM Architecture: To isolate the impact of LLM architecture, we provide LLMs with explicit GTFS data and public transportation information before posing transportation-related questions. By running question-answer tests that require finding answers in the provided GTFS data, we can determine whether the initial failures were due to information deficits or architectural limitations. We refer to this as the "information retrieval" (IR) task.

The findings from these tasks will offer valuable insights into the causes of errors. For instance, if the LLMs can answer the questions correctly in the second experiment but not the first, it suggests insufficient pre-training data on the specific topic within the models. Conversely, the results might indicate that even with adequate data, the LLM models struggle with the questions, potentially due to architectural limitations.

Approaches and Experiment Design

In this project, we employ OpenAI's ChatGPT as the representative LLM due to its widespread public availability through both a web portal and a programmatic API. To assess LLM capabilities in understanding and retrieving transportation information (Goals 1 and 2, respectively, as outlined in Section ), we conducted five experiments comprising 3275 multiple-choice questions and 80 short-answer questions based on San Antonio's public transportation system.

5.1 Experiments for Transportation Understanding

The transportation information understanding task, referred to as "understanding", evaluates a pre-trained LLM's ability to comprehend and answer questions about San Antonio's public transportation system. Following [31], we designed 195 multiple-choice
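The contrast between the two experimental conditions can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: it builds a multiple-choice prompt either bare (the "understanding" setting) or with GTFS data prepended (the "information retrieval" setting), and scores answers against gold labels. The function names, prompt wording, and sample GTFS rows are all hypothetical; the actual study sent such prompts to ChatGPT via OpenAI's API.

```python
def build_prompt(question, choices, gtfs_context=None):
    """Build a multiple-choice prompt; optionally prepend explicit GTFS data.

    gtfs_context=None  -> "understanding" task (probe pre-trained knowledge).
    gtfs_context=str   -> "information retrieval" (IR) task (answer from data).
    """
    lines = []
    if gtfs_context is not None:
        lines.append("Use only the GTFS data below to answer.")
        lines.append(gtfs_context)
    lines.append(question)
    for label, choice in zip("ABCD", choices):
        lines.append(f"{label}) {choice}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


def score(answers, gold):
    """Fraction of multiple-choice answers matching the gold labels."""
    correct = sum(a.strip().upper() == g.strip().upper()
                  for a, g in zip(answers, gold))
    return correct / len(gold)
```

Comparing `score()` across the two prompt variants on the same question set is what separates information deficits (bare prompt fails, contextual prompt succeeds) from architectural limitations (both fail).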