Communications of the ACM - Artificial Intelligence
Achieving Early Wins in Generative AI

This article examines how the Structured Outputs capability of large language models (LLMs) can become a key driver of enterprise AI transformation. With companies under pressure to see returns from Generative AI, Structured Outputs address the difficulty of integrating LLMs' free-form text with enterprise IT systems, improving data quality and processing efficiency. The article details how Structured Outputs can extract insights from unstructured data, feed AI data pipelines, make retrieval-augmented generation (RAG) more reliable, enable real-time monitoring, and streamline automation. Despite challenges around prompt design and cost, Structured Outputs remain an important route to early AI wins and to upgrading an organization's AI capabilities.

🌟 **Solving the LLM integration bottleneck to accelerate AI adoption**: Enterprise AI transformation is hampered by the difficulty of integrating LLM output into existing IT systems, which typically expect structured data. The Structured Outputs capability lets models generate data that strictly conforms to a predefined schema, markedly improving integration between LLMs and enterprise IT systems and providing key technical support for deploying AI applications.

📊 **Unlocking the value of unstructured data to drive business insight**: More than 80% of enterprise data is unstructured (text, images, and so on). LLM Structured Outputs can efficiently extract key information from this mass of data and turn it into structured data that existing business intelligence tools can analyze, improving KPI visibility, strengthening risk management, and accelerating the delivery of value from data analytics.

⚙️ **Optimizing AI data pipelines to improve model training efficiency**: Structured Outputs let LLMs become an integral part of the data pipelines used to train AI models. For example, they can extract features and aggregate values from unstructured data, generate synthetic data in ready-to-load form, and add structured metadata to documents, improving retrieval accuracy and giving AI models more reliable, easier-to-process training data.

💡 **Strengthening RAG performance and reducing model "hallucinations"**: In retrieval-augmented generation (RAG), LLMs sometimes hallucinate when working across multiple unstructured documents. Converting unstructured data into structured formats such as graphs and tables via Structured Outputs gives RAG clearer, more accurate context, improving knowledge-intensive reasoning and lowering the probability of incorrect output.

🚀 **Enabling automation and improving IT system observability**: Structured Outputs can convert semi-structured and unstructured IT system logs into structured logs that downstream monitoring systems can track and analyze in real time, improving the observability of legacy systems and aiding troubleshooting and threat detection. They can also turn natural-language requests into executable workflow definitions, streamlining automation and raising efficiency.

Agentic AI has become the center of gravity in AI transformation initiatives, despite being complex and challenging to implement. Effective AI transformation requires fundamental changes to organizational structure and processes. That demands long-term strategic planning and collaborative effort, yet companies are under pressure to show results from Generative AI to sustain momentum. A survey by Deloitte found that 55% of senior leaders want returns from Generative AI (GenAI) within two years, and 34% are concerned that not achieving the expected value from GenAI could slow down future adoption.1 What organizations need now is a tactical approach that produces early results while still progressing toward AI transformation. This post explores how Structured Outputs, a functionality of large language models (LLMs), can help.

Data quality and seamless integration between IT systems and GenAI platforms are essential when scaling IT infrastructure for AI transformation. LLMs are efficient at interpreting natural language text and producing natural language output because they are trained on natural language and tuned for conversation. But that free-form output impedes their integration with enterprise IT systems, which expect structured data as input. Structured Outputs enables LLMs to generate output that strictly conforms to a schema specified by the developer, improving the integration between LLMs and existing IT systems.
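
As a concrete illustration, here is a minimal sketch of a Structured Outputs call using the OpenAI Python SDK's JSON-schema response format; the schema, field names, model name, and sample email are illustrative, and other model providers offer similar options.

```python
# Minimal sketch: asking an LLM for output that conforms to a JSON Schema.
# The schema, model name, and sample email are illustrative.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

ticket_schema = {
    "type": "object",
    "properties": {
        "customer_name": {"type": "string"},
        "product": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
        "issue_summary": {"type": "string"},
    },
    "required": ["customer_name", "product", "sentiment", "issue_summary"],
    "additionalProperties": False,
}

email_text = "Hi, this is Maria. My X200 router keeps dropping Wi-Fi every evening..."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[
        {"role": "system", "content": "Extract a support ticket from the email."},
        {"role": "user", "content": email_text},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "support_ticket", "schema": ticket_schema, "strict": True},
    },
)

# The reply is guaranteed to parse and to match the schema.
ticket = json.loads(response.choices[0].message.content)
print(ticket["sentiment"], "-", ticket["issue_summary"])
```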

Use Cases

The primary use of Structured Outputs is enabling GenAI’s integration with other IT systems, but there are different facets to this integration. I have listed five illustrative use cases for Structured Outputs below.

Insights from Unstructured Data

Surveys estimate that more than 80% of enterprise data is unstructured.2 The major sources of unstructured data are text and images from emails, websites, applications, social media feeds, logs, contract documents, and, more recently, conversational agents. Organizations struggle to process unstructured data even as more and more value accumulates within it. Valuable data points on customer sentiment, operational efficiency, and risk and compliance issues lie hidden in plain sight within unstructured data.

Natural Language Processing (NLP) tools are the traditional choice for analyzing unstructured data, but they have limited capabilities. LLMs are more versatile in handling unstructured data, as they are trained on large amounts of unstructured Internet content. Using Structured Outputs, LLMs can extract key information from unstructured data and produce structured data that adheres to schema specifications. Extract, Transform, Load (ETL) tools can then load the structured data into databases for further analysis. Organizations can leverage their existing investments in business intelligence tools to analyze the structured data, improving the visibility of Key Performance Indicators and facilitating efficient risk management. According to a Gartner report, Generative AI will lead to 40% faster delivery of value for Data and Analytics programs by 2027.3
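
The downstream half of that pipeline can be conventional ETL. The sketch below assumes records already extracted by an LLM in the shape of the earlier schema and loads them into SQLite so existing BI tooling can query them; the table layout and sample rows are illustrative.

```python
# Sketch: loading schema-conforming LLM extractions into a relational store
# so existing BI tools can query them. Table and sample rows are illustrative.
import sqlite3

# Records as an LLM with Structured Outputs might return them (see earlier sketch).
records = [
    {"customer_name": "Maria", "product": "X200", "sentiment": "negative",
     "issue_summary": "Wi-Fi drops every evening"},
    {"customer_name": "Ben", "product": "X200", "sentiment": "positive",
     "issue_summary": "Setup was quick and painless"},
]

conn = sqlite3.connect("insights.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS tickets (
           customer_name TEXT, product TEXT, sentiment TEXT, issue_summary TEXT)"""
)
conn.executemany(
    "INSERT INTO tickets VALUES (:customer_name, :product, :sentiment, :issue_summary)",
    records,
)
conn.commit()

# A KPI any BI tool could now surface: sentiment mix per product.
for row in conn.execute(
    "SELECT product, sentiment, COUNT(*) FROM tickets GROUP BY product, sentiment"
):
    print(row)
```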

Feeding the AI Data Pipeline

Data engineering tools have started to integrate LLM Structured Outputs, simplifying the extraction of features and aggregation of values from unstructured data. MotherDuck, a serverless data warehousing platform, has introduced a SQL-based interface to LLMs. This functionality allows the querying, analysis, and transformation of unstructured data from within SQL using Structured Outputs.4 This example shows that Structured Outputs can enable LLMs to become an integral part of the data pipelines used to train AI models.

The presence of personal and sensitive information in AI training data has long been a concern. LLMs can mitigate this by generating synthetic data in ready-to-load form using Structured Outputs. Moreover, Structured Outputs can enrich unstructured documents with structured metadata, improving retrieval accuracy in GenAI applications.5
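
A minimal sketch of the synthetic-data idea follows, again using a JSON-schema response format; the customer fields, model name, and prompt are invented for the example.

```python
# Sketch: generating ready-to-load synthetic records with Structured Outputs,
# so pipelines can be tested without exposing real personal data.
# Schema fields, prompt, and model name are illustrative.
import json
from openai import OpenAI

client = OpenAI()

batch_schema = {
    "type": "object",
    "properties": {
        "customers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "full_name": {"type": "string"},
                    "city": {"type": "string"},
                    "plan": {"type": "string", "enum": ["basic", "plus", "enterprise"]},
                    "monthly_spend": {"type": "number"},
                },
                "required": ["full_name", "city", "plan", "monthly_spend"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["customers"],
    "additionalProperties": False,
}

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[{"role": "user",
               "content": "Generate 20 fictional telecom customers for pipeline testing."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "synthetic_customers", "schema": batch_schema, "strict": True},
    },
)

rows = json.loads(response.choices[0].message.content)["customers"]
print(len(rows), "synthetic rows ready to load")
```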

Improving Reliability and Performance of RAG

Retrieval Augmented Generation (RAG) is a well-known technique that enables LLMs to generate content that is contextually relevant to the user's query. RAG uses vector search to retrieve content related to the user's query and feeds that content to an LLM as context, along with the original query. The LLM responds to the query using the context, even if the query falls outside the LLM's training data distribution. This is supposed to reduce hallucination and keep LLM responses grounded in facts. In practice, however, RAG is vulnerable to hallucinations when LLMs must compare and summarize data from multiple unstructured documents.

Effective techniques to mitigate hallucination in RAG often involve the transformation of unstructured data into structured data at the retrieval stage. For example, StructRAG is a RAG technique that aims to improve knowledge-intensive reasoning by building the context in various structured formats such as graphs and tables.6 Structured Outputs are a natural fit for building such context structures in RAG. Apart from this, Mistral has used Structured Outputs in its ‘LLM as a judge’ framework to evaluate the performance of RAG.7
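
The sketch below shows the general structure-then-answer pattern, not StructRAG itself: retrieved passages are first distilled into a comparison table via Structured Outputs, and the question is then answered against that table. The retriever is omitted, and the schema and model name are placeholders.

```python
# Sketch of a structure-then-answer RAG step, loosely inspired by StructRAG:
# retrieved passages are distilled into a comparison table, which then serves
# as the context for the final answer. Schema and model name are placeholders.
from openai import OpenAI

client = OpenAI()

table_schema = {
    "type": "object",
    "properties": {
        "rows": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "vendor": {"type": "string"},
                    "contract_term_months": {"type": "integer"},
                    "termination_notice_days": {"type": "integer"},
                },
                "required": ["vendor", "contract_term_months", "termination_notice_days"],
                "additionalProperties": False,
            },
        }
    },
    "required": ["rows"],
    "additionalProperties": False,
}

def ask_with_structured_context(question: str, passages: list[str]) -> str:
    # Step 1: distill the unstructured passages into a table (Structured Outputs).
    distilled = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user",
                   "content": "Build a comparison table from these contract excerpts:\n\n"
                              + "\n---\n".join(passages)}],
        response_format={"type": "json_schema",
                         "json_schema": {"name": "comparison", "schema": table_schema,
                                         "strict": True}},
    )
    table = distilled.choices[0].message.content
    # Step 2: answer the question against the structured table, not the raw text.
    answer = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "system", "content": "Answer only from the table provided."},
                  {"role": "user", "content": f"Table:\n{table}\n\nQuestion: {question}"}],
    )
    return answer.choices[0].message.content
```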

Realtime Tracking and Observability

Web 2.0 has created a world where the consumer is also a producer. The new generation of Web users is ready to engage, providing original content, feedback, ideas, and alternatives. Companies already track customer sentiment using NLP tools, but LLMs expand the range of analysis that can be performed on unstructured data. Structured Outputs in LLMs can extract and filter data points of interest and alert stakeholders. It is even possible to graph and chart key indicators in real time.

Structured Outputs can create structured logs from semi-structured and unstructured IT system logs and send them to downstream monitoring systems, which can track key parameters and raise alerts when needed. This improves the observability of legacy systems, which often lack structured logging, and aids troubleshooting and threat detection.
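
A minimal sketch of that log-structuring step, with an invented legacy log line, schema, and model name:

```python
# Sketch: converting a free-form legacy log line into a structured event that a
# monitoring system can ingest. Log format, schema, and model name are illustrative.
import json
from openai import OpenAI

client = OpenAI()

event_schema = {
    "type": "object",
    "properties": {
        "timestamp": {"type": "string"},
        "severity": {"type": "string", "enum": ["INFO", "WARN", "ERROR", "CRITICAL"]},
        "component": {"type": "string"},
        "message": {"type": "string"},
        "possible_security_event": {"type": "boolean"},
    },
    "required": ["timestamp", "severity", "component", "message", "possible_security_event"],
    "additionalProperties": False,
}

raw_line = "Mar 03 02:14:07 payproc: repeated auth failure for user svc_batch from 10.1.4.22"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[{"role": "user", "content": f"Structure this log line:\n{raw_line}"}],
    response_format={"type": "json_schema",
                     "json_schema": {"name": "log_event", "schema": event_schema,
                                     "strict": True}},
)

event = json.loads(response.choices[0].message.content)
if event["possible_security_event"]:
    print("ALERT:", event["component"], event["message"])  # forward to monitoring/SIEM
```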

Streamlining Automation

Current automation frameworks such as Robotic Process Automation (RPA) struggle to process unstructured data and often require human intervention when long text or images are involved. LLMs can fill these gaps: they can accept text and images as input and feed structured data to automation workflows for further processing. This allows organizations to use traditional automation frameworks efficiently for use cases that do not need agentic automation.

Structured Outputs could also help to automate activities that are out of bounds for traditional automation frameworks. For example, ServiceNow uses Structured Outputs to automate the generation of workflow definitions from natural language user requests.8
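
The sketch below illustrates the general idea, not ServiceNow's implementation: a natural-language request is mapped onto an invented workflow schema whose step types a downstream engine could execute.

```python
# Sketch: generating an executable workflow definition from a natural-language
# request. Generic illustration only; the workflow schema and step types are
# invented for the example.
import json
from openai import OpenAI

client = OpenAI()

workflow_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "trigger": {"type": "string"},
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "action": {"type": "string",
                               "enum": ["create_ticket", "notify", "approve", "update_record"]},
                    "target": {"type": "string"},
                },
                "required": ["action", "target"],
                "additionalProperties": False,
            },
        },
    },
    "required": ["name", "trigger", "steps"],
    "additionalProperties": False,
}

request = ("When a new employee is onboarded, create an IT ticket for a laptop, "
           "notify the manager, and update the HR record once approved.")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[{"role": "user", "content": f"Turn this request into a workflow:\n{request}"}],
    response_format={"type": "json_schema",
                     "json_schema": {"name": "workflow", "schema": workflow_schema,
                                     "strict": True}},
)

print(json.dumps(json.loads(response.choices[0].message.content), indent=2))
```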

Challenges

Structured Outputs are a recent innovation in LLMs and have their fair share of challenges as LLMs continue to evolve. The performance of LLMs on Structured Outputs tasks depends on the quality of prompts. With badly designed prompts or schemas, LLMs may generate structurally correct but semantically flawed outputs. Performance may also vary across models.

Model cost is still a consideration in GenAI solutions, even though McKinsey estimates it at only about 15% of the overall solution cost.9 Token count may increase in tasks that involve deeply nested schemas,10 leading to increased latency and degraded performance, apart from spiraling costs. Careful prompt and schema design can offset these risks.
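
To make the nesting point concrete, the two illustrative schemas below describe the same invoice fields; the deeply nested form obliges the model to emit many more structural tokens (keys, braces) than the flat one, which is one reason to flatten schemas where the data model allows it.

```python
# Sketch: two schemas for the same invoice data. The nested form costs more
# output tokens per record than the flat form; both are invented examples.
nested_schema = {
    "type": "object",
    "properties": {
        "invoice": {
            "type": "object",
            "properties": {
                "header": {
                    "type": "object",
                    "properties": {
                        "billing": {
                            "type": "object",
                            "properties": {"customer": {"type": "string"},
                                           "total": {"type": "number"}},
                            "required": ["customer", "total"],
                            "additionalProperties": False,
                        }
                    },
                    "required": ["billing"],
                    "additionalProperties": False,
                }
            },
            "required": ["header"],
            "additionalProperties": False,
        }
    },
    "required": ["invoice"],
    "additionalProperties": False,
}

flat_schema = {
    "type": "object",
    "properties": {"customer": {"type": "string"}, "total": {"type": "number"}},
    "required": ["customer", "total"],
    "additionalProperties": False,
}
```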

Structured Outputs may look like a straightforward functionality, but they require thorough planning and design. For maximum impact, it is important to implement Structured Outputs in line with the broader organizational strategy for AI transformation.

Conclusion

Data is digital gold when curated properly. LLM Structured Outputs can sift through unstructured data and output business critical structured data, which can help in AI transformation efforts. The ability to extract structured data from unstructured content may look plain and mundane, but this could unclog data pipelines, illuminate dashboards with new insights, streamline automation workflows, and improve reliability in IT and AI systems.

Structured Outputs use cases would fit within the augmentation tier of the Agentic AI Value Pyramid,11 leading to early wins in the AI transformation journey. However, successful AI transformation will depend on a clear organizational strategy and roadmap for full-scale adoption of agentic AI.

References

1. Deloitte, Now decides next: Generating a new future, January 2025, https://www.deloitte.com/content/dam/assets-zone3/us/en/docs/campaigns/2025/us-state-of-gen-ai-2024-q4.pdf

2. Harbert, T. Tapping the power of unstructured data, MIT Sloan School of Management, February 2021, https://mitsloan.mit.edu/ideas-made-to-matter/tapping-power-unstructured-data

3. Gartner, Gartner Predicts 80% of D&A Governance Initiatives Will Fail by 2027, February 2024, https://bit.ly/4kxNaW5

4. Krishnan, A. LLM-driven data pipelines with prompt() in MotherDuck and dbt, December 2024, https://motherduck.com/blog/llm-data-pipelines-prompt-motherduck-dbt/

5. Celik, T. Advanced RAG: Automated Structured Metadata Enrichment, April 2025, https://haystack.deepset.ai/cookbook/metadata_enrichment

6. Li, Z., Chen, X., Yu, H., Lin, H., Lu, Y., Tang, Q., Huang, F., Han, X., Sun, L., and Li, Y. StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization, October 2024, https://doi.org/10.48550/arXiv.2410.08815

7. Mistral. Evaluating RAG with LLM as a Judge, April 2025, https://mistral.ai/news/llm-as-rag-judge

8. Béchard, P. and Ayala, O.M. Reducing hallucination in structured outputs via Retrieval-Augmented Generation, April 2024, https://doi.org/10.48550/arXiv.2404.08189

9. McKinsey. Moving past gen AI’s honeymoon phase: Seven hard truths for CIOs to get from pilot to scale, May 2024, https://bit.ly/3IemJHD

10. Snowflake. AI_COMPLETE Structured Outputs, https://docs.snowflake.com/en/user-guide/snowflake-cortex/complete-structured-outputs

11. Sudalaimuthu, S. AI Agents: Automation is Not Enough, Communications of the ACM, January 2025, https://cacm.acm.org/blogcacm/ai-agents-automation-is-not-enough/

Shanmugam Sudalaimuthu is a software architect with more than 20 years of experience building innovative solutions for Fortune 500 companies across diverse industries. He specializes in Generative AI and Cloud technologies, and holds a degree in Physics and a master’s degree in Computer Applications.
