Large Language Models (LLMs) have made an indelible mark on text-based use cases. The same cannot yet be said of the image medium. The models can generate images and, when fine-tuned, solve image-related use cases, but there is still a gap between generative capabilities and practical enterprise solutions.
Consider this: when you prompt an LLM to generate images, say of a car accident, the results can be hit or miss. Some images might look convincing, but others have that distinctive AI-generated surreal quality that makes them unsuitable for professional computer vision applications. The issue is that we cannot use these images as-is to solve computer vision use cases.
However, we did use an LLM for a specialized Optical Character Recognition (OCR) task: extracting dimensional data from engineering diagrams. Below is a handwritten engineering diagram with pipes and valves. (We tested our approach on handwritten diagrams as a proof of concept. Real-world applications involve more complex, software-generated drawings with precise dimension lines, detailed annotations, and densely packed information.)
Our primary focus has been on developing a reliable system for extracting two critical pieces of information: pipe identifiers and their corresponding dimensions. This capability could transform how engineering teams handle document processing and data extraction from technical drawings.

The pipe names and dimensions to extract are:
- Name: PS94 - Dimension: 7’
- Name: PS441-0 - Dimension: 7’ 3”
- Name: PS443 - Dimension: 7’
- Name: P4 - Dimension: 5’
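To compare extracted values against this list, it helps to normalize the feet-and-inches notation to a single unit. The following is a minimal sketch, assuming dimensions are always written as optional feet followed by optional inches; the helper name `to_inches` and the regular expression are illustrative, not part of our pipeline.

```python
import re

# Accept straight or curly marks for feet (' or ’) and inches (" or ”).
_DIM = re.compile(r"""^\s*(?:(\d+)\s*['’])?\s*(?:(\d+)\s*["”])?\s*$""")

def to_inches(dimension: str) -> int:
    """Convert a string such as 7’ 3” to a total number of inches."""
    match = _DIM.match(dimension)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized dimension: {dimension!r}")
    feet, inches = (int(g) if g else 0 for g in match.groups())
    return feet * 12 + inches

# The four expected values listed above.
expected = {"PS94": "7’", "PS441-0": "7’ 3”", "PS443": "7’", "P4": "5’"}
expected_inches = {name: to_inches(dim) for name, dim in expected.items()}
```

Comparing integers rather than raw strings avoids false mismatches caused by spacing or quote style.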
The Reality Check
The Prompt Engineering Journey: Our initial success came through careful prompt engineering, which required multiple rounds of refinement. Through a series of conversational exchanges with the LLM, we gradually refined our prompts, correcting and guiding the model to identify specific elements within the diagrams. This iterative process eventually led to accurate dimension and label extraction.
The Context Dilemma: However, we hit an unexpected roadblock. When we tried to replicate our success in a new chat session using the same diagram and our previously successful prompt, the results were disappointing. Despite identical inputs, the model went back to producing inaccurate values, effectively erasing our previous progress. The likely reason is that the corrections accumulated over the earlier conversation were not part of the new session's context.
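One way to make this kind of refinement repeatable is to capture the final instructions in a single structured prompt and to validate whatever the model returns. The sketch below is an assumption on our part: the prompt wording is hypothetical (the prompts we actually used are not reproduced here), no specific LLM API is called, and `validate_reply` is an illustrative helper name.

```python
import json
import re

# Hypothetical prompt wording, for illustration only.
PROMPT = (
    "You are reading an engineering diagram of pipes and valves.\n"
    "Return only JSON: a list of objects with keys 'name' and 'dimension'.\n"
    "Copy feet and inches exactly as written on the drawing.\n"
    "If a value is unreadable, use null instead of guessing."
)

# Feet, optionally followed by inches, with straight or curly marks.
DIMENSION = re.compile(r"""^\d+['’](\s*\d+["”])?$""")

def validate_reply(reply: str) -> list[dict]:
    """Parse a model reply and keep only well-formed records."""
    try:
        records = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(records, list):
        return []
    valid = []
    for record in records:
        if not isinstance(record, dict):
            continue
        name, dim = record.get("name"), record.get("dimension")
        if isinstance(name, str) and isinstance(dim, str) and DIMENSION.match(dim.strip()):
            valid.append({"name": name.strip(), "dimension": dim.strip()})
    return valid
```

Rejecting malformed records up front means a poorly formatted reply shows up as missing data rather than as a wrong value.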
Understanding the Limitations: Our experience highlighted several challenges in using Generative AI for engineering diagram analysis:
- Limited Training Scope: Current LLMs lack exposure to diverse engineering diagram formats and notation styles.
- Context Interpretation: The models struggle with industry-specific symbols and notations that require specialized domain knowledge.
- Consistency Issues: Results can vary significantly between sessions, even with identical inputs.
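Session-to-session variability can be measured rather than only observed. The sketch below assumes each session's output has already been reduced to a mapping of pipe name to dimension; the function name `consensus`, the 0.8 threshold, and the sample runs are all illustrative assumptions, not measured results.

```python
from collections import Counter

def consensus(runs: list[dict[str, str]], min_agreement: float = 0.8):
    """Split extracted values into agreed and disputed, per pipe name."""
    agreed, disputed = {}, {}
    names = {name for run in runs for name in run}
    for name in sorted(names):
        # A run that missed the pipe entirely votes for None.
        votes = Counter(run.get(name) for run in runs)
        value, count = votes.most_common(1)[0]
        if value is not None and count / len(runs) >= min_agreement:
            agreed[name] = value
        else:
            disputed[name] = dict(votes)
    return agreed, disputed

# Made-up outputs from three sessions, for illustration only.
runs = [
    {"PS94": "7’", "P4": "5’"},
    {"PS94": "7’", "P4": "6’"},
    {"PS94": "7’", "P4": "5’"},
]
agreed, disputed = consensus(runs)
```

Values that fall below the agreement threshold can be routed to a reviewer instead of being accepted silently.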
Alternative Approaches: Given these challenges, we've identified more reliable traditional computer vision approaches:
- Combining OpenCV pattern matching with Optical Character Recognition technology
- Implementing specialized object detection and recognition systems
- Integrating OpenCV capabilities with targeted OCR solutions
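Once an OCR step has produced text tokens with positions, the remaining work is associating each pipe label with its dimension. The following is a minimal sketch of that pairing step only. It assumes tokens arrive with bounding-box centers in pixels; the `Token` class, the label pattern (inferred from the four example names), and the 150-pixel cutoff are hypothetical.

```python
import math
import re
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    x: float  # center of the token's bounding box, in pixels
    y: float

# Assumed label shape, inferred from names such as PS94, PS441-0, and P4.
LABEL = re.compile(r"^P[A-Z]*\d+(-\d+)?$")
DIMENSION = re.compile(r"""^\d+['’](\s*\d+["”])?$""")

def pair_labels(tokens: list[Token], max_distance: float = 150.0) -> dict[str, str]:
    """Pair each pipe label with the nearest dimension token."""
    labels = [t for t in tokens if LABEL.match(t.text)]
    dims = [t for t in tokens if DIMENSION.match(t.text)]
    pairs = {}
    for label in labels:
        def distance(d: Token) -> float:
            return math.hypot(d.x - label.x, d.y - label.y)
        nearest = min(dims, key=distance, default=None)
        if nearest is not None and distance(nearest) <= max_distance:
            pairs[label.text] = nearest.text
    return pairs

# Made-up token positions, for illustration only.
tokens = [
    Token("PS94", 100, 100), Token("7’", 110, 130),
    Token("P4", 400, 100), Token("5’", 410, 125),
    Token("VALVE", 250, 250),
]
pairs = pair_labels(tokens)
```

Nearest-neighbor pairing is deliberately simple: two labels can claim the same dimension on a crowded drawing, so a production system would likely need dimension-line detection as well.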
This experience has reinforced that while LLMs show promise, a hybrid approach combining traditional computer vision techniques with newer AI technologies might be the most practical path forward for engineering diagram analysis.
Moving Forward: A Balanced Approach
As we continue to explore this space, we're focusing on developing hybrid solutions that combine the interpretative power of LLMs with the reliability of traditional computer vision techniques. This balanced strategy not only addresses current limitations but also positions us to adapt as LLM technology rapidly evolves.
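One simple form a hybrid check could take is to accept a value only when the LLM and the computer vision pipeline agree on it. The sketch below assumes both sources have been reduced to a mapping of pipe name to dimension; `reconcile`, `normalize`, and the status labels are illustrative names.

```python
def normalize(dim: str) -> str:
    """Ignore quote style and spacing when comparing dimensions."""
    return dim.replace("’", "'").replace("”", '"').replace(" ", "")

def reconcile(llm: dict[str, str], ocr: dict[str, str]) -> dict[str, dict]:
    """Label each pipe as confirmed, conflict, or single_source."""
    report = {}
    for name in sorted(set(llm) | set(ocr)):
        a, b = llm.get(name), ocr.get(name)
        if a is None or b is None:
            status = "single_source"  # only one pipeline found this pipe
        elif normalize(a) == normalize(b):
            status = "confirmed"
        else:
            status = "conflict"
        report[name] = {"llm": a, "ocr": b, "status": status}
    return report
```

Entries marked as conflict or single_source are natural candidates for review by a domain expert.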
Note: As with any AI implementation, data security comes first. Our standard practice includes thoroughly redacting all Personally Identifiable Information (PII) from source materials, including customer references and sensitive data, before processing any engineering diagrams.
Key Takeaways
- LLMs show promise in extracting dimensional data from engineering diagrams, but practical challenges remain.
- Prompt engineering can improve accuracy, but results lack consistency across sessions.
- Current limitations include limited training scope, context interpretation issues, and inconsistent outputs.
- A hybrid approach combining LLMs with traditional computer vision (OpenCV + OCR) offers a more reliable solution.
- Data security and PII redaction are essential in all AI implementations.
Frequently Asked Questions (FAQ)
- Q1: Why use LLMs for engineering diagram dimension extraction?
LLMs can interpret complex patterns and extract textual data from diagrams, reducing manual effort in engineering workflows.
- Q2: What challenges did you encounter with LLMs?
Inconsistent results, lack of domain-specific training, and difficulty interpreting engineering symbols and notations.
- Q3: How does prompt engineering help?
Iterative prompt refinement improves accuracy, but results may still vary between sessions.
- Q4: What is the recommended approach for reliable extraction?
A hybrid solution combining OpenCV pattern matching, OCR, and selective LLM integration for interpretative tasks.
- Q5: How do you ensure data security?
By redacting all Personally Identifiable Information (PII) before processing diagrams.
Glossary of Terms
- LLM (Large Language Model): An AI model trained on vast amounts of text data to understand and generate language.
- OCR (Optical Character Recognition): Technology that converts images of text into machine-readable text.
- OpenCV: An open-source computer vision library for image processing and pattern recognition.
- Prompt Engineering: Crafting and refining prompts to optimize LLM outputs.
Best Practices & Common Pitfalls
Best Practices
- Use iterative prompt engineering for better extraction accuracy.
- Combine LLMs with traditional computer vision techniques for robust solutions.
- Implement PII redaction before processing sensitive diagrams.
- Validate outputs with domain experts for critical engineering applications.
Common Pitfalls
- Relying solely on LLMs for diagram interpretation.
- Ignoring contextual limitations of LLMs in specialized domains.
- Overlooking session variability, leading to inconsistent results.
- Neglecting security and compliance in AI workflows.
Solution Architect with extensive expertise in Artificial Intelligence, Machine Learning, and Gen AI systems.