System Architecture & Design
Intelligent Document Processing Platform
Overview
Globex Data Transformation is an intelligent document processing system that extracts and structures data from insurance documents using AI-powered transformation, multi-layer validation, and self-learning capabilities.
AI-Powered
LLM-based extraction using Groq
Self-Learning
RAG with FAISS vector database
Validated
2-layer validation system
High Performance
10-100x faster similarity search
Architectural Overview
User Interface
Web-based UI for document management
API Layer
Flask REST API
Request Handling
Processing Engine
Parser • Transformer
Validator • Verifier
AI & Intelligence Layer
LLM Engine
Groq API
RAG System
FAISS Vector DB
Embeddings
Transformers
Database
SQLite
Jobs & Metadata
Vector Store
FAISS Indexes
Embeddings
File Storage
Documents
Schemas & Logs
Data Flow Diagram
User Action
Legend
System Layers
User Interface
Web UI · File Upload · Job Monitoring · Correction Interface
API Layer
Flask REST API · Route Handlers · Request Validation
Business Logic
Core processing and transformation controllers
Data Layer
SQLite Database · FAISS Indexes · File Storage
External Services
Groq LLM API · Sentence Transformers
Processing Pipeline
File Upload
User uploads document (PDF/Excel)
Text Extraction
Parser extracts raw text
Schema Loading
LOB-specific schema loaded
RAG Retrieval
Fetch similar examples via FAISS
LLM Extraction
Groq LLM extracts structured data
2-Layer Validation
Check completeness & consistency
Schema Verification
Pydantic validates structure
Job Completion
Results stored in database
User Review
Optional corrections & training
Technology Stack
Backend
- • Python 3.13
- • Flask
- • pandas
- • PyPDF2, openpyxl
- • Pydantic
AI / ML
- • Groq LLM API
- • FAISS
- • Sentence-Transformers
- • scikit-learn
- • numpy
Frontend & Data
- • Tailwind CSS
- • Jinja2
- • SQLite
- • JSON