System Architecture & Design

Intelligent Document Processing Platform

Overview

Globex Data Transformation is an intelligent document processing system that extracts and structures data from insurance documents using LLM-based extraction, 2-layer validation, and a self-learning loop that reuses user corrections as training examples.

AI-Powered: LLM-based extraction using Groq
Self-Learning: retrieval-augmented generation (RAG) with a FAISS vector database
Validated: 2-layer validation system
High Performance: 10-100x faster similarity search than a linear scan
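The Self-Learning feature pairs user corrections with retrieval: corrected outputs become training examples, and the most similar ones are placed in the prompt for a new document. The sketch below illustrates that loop under clearly stated assumptions: `ExampleStore`, `tokens`, and `build_prompt` are hypothetical names, token-overlap (Jaccard) similarity stands in for the Sentence Transformers embeddings and FAISS search the platform uses, and the prompt wording is invented.

```python
import json
import re
from dataclasses import dataclass, field
from typing import List, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a document."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class ExampleStore:
    """Hypothetical in-memory store of (document text, corrected output) pairs."""
    examples: List[Tuple[str, dict]] = field(default_factory=list)

    def add(self, text: str, corrected: dict) -> None:
        """'Submit & Train': keep a user-corrected extraction as a new example."""
        self.examples.append((text, corrected))

    def similar(self, text: str, k: int = 2) -> List[Tuple[str, dict]]:
        """Return the k stored examples with the highest Jaccard token overlap
        (a stand-in for embedding similarity search in FAISS)."""
        query = tokens(text)

        def score(example: Tuple[str, dict]) -> float:
            other = tokens(example[0])
            union = query | other
            return len(query & other) / len(union) if union else 0.0

        return sorted(self.examples, key=score, reverse=True)[:k]


def build_prompt(store: ExampleStore, text: str, k: int = 2) -> str:
    """Assemble a few-shot prompt: instruction, retrieved examples, new document."""
    parts = ["Extract the fields as JSON."]
    for example_text, output in store.similar(text, k):
        parts.append(f"Document:\n{example_text}\nJSON:\n{json.dumps(output)}")
    parts.append(f"Document:\n{text}\nJSON:")
    return "\n\n".join(parts)
```

Each correction submitted through `add` immediately becomes retrievable for the next document, which is what makes the loop "self-learning" without retraining a model.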

Architectural Overview

User Interface: web-based UI for document management
API Layer: Flask REST API (request handling)
Processing Engine: Parser · Transformer · Validator · Verifier
AI & Intelligence Layer: LLM Engine (Groq API) · RAG System (FAISS vector DB) · Embeddings (Sentence Transformers)
Data Layer: Database (SQLite: jobs & metadata) · Vector Store (FAISS indexes: embeddings) · File Storage (documents, schemas & logs)
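SQLite holds jobs and their metadata. A minimal sketch of the job lifecycle (create a job on upload, update its status as processing advances, store the structured result) might look like the following; the `jobs` table layout, the status strings, and the helper names are assumptions for illustration, not the platform's actual schema.

```python
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical table layout: the platform's real column names are not documented.
JOBS_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id          TEXT PRIMARY KEY,
    filename    TEXT NOT NULL,
    lob         TEXT NOT NULL,
    status      TEXT NOT NULL,   -- assumed states: pending, processing, completed
    result_json TEXT,            -- structured JSON output, once available
    updated_at  TEXT NOT NULL
)
"""


def _now() -> str:
    """UTC timestamp in ISO 8601 format."""
    return datetime.now(timezone.utc).isoformat()


def create_job(conn: sqlite3.Connection, filename: str, lob: str) -> str:
    """Insert a new job in the 'pending' state and return its id."""
    job_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO jobs (id, filename, lob, status, updated_at) VALUES (?, ?, ?, ?, ?)",
        (job_id, filename, lob, "pending", _now()),
    )
    conn.commit()
    return job_id


def update_job_status(conn: sqlite3.Connection, job_id: str, status: str,
                      result_json: Optional[str] = None) -> None:
    """Record a status change; keep any earlier result unless a new one is given."""
    cur = conn.execute(
        "UPDATE jobs SET status = ?, result_json = COALESCE(?, result_json), "
        "updated_at = ? WHERE id = ?",
        (status, result_json, _now(), job_id),
    )
    conn.commit()
    if cur.rowcount == 0:
        raise KeyError(f"unknown job id: {job_id}")


def get_job(conn: sqlite3.Connection, job_id: str) -> Optional[dict]:
    """Return the job as a dict, or None if it does not exist."""
    cur = conn.execute(
        "SELECT id, filename, lob, status, result_json FROM jobs WHERE id = ?", (job_id,)
    )
    row = cur.fetchone()
    if row is None:
        return None
    return {col[0]: value for col, value in zip(cur.description, row)}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(JOBS_DDL)
    job_id = create_job(conn, "policy.pdf", "auto")
    update_job_status(conn, job_id, "processing")
    update_job_status(conn, job_id, "completed", '{"policy_number": "P-123"}')
    print(get_job(conn, job_id)["status"])  # completed
```

Using `COALESCE` means a later status change (for example after a user review) does not erase an already stored result.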

Data Flow Diagram

User Upload (file + LOB selection)
→ Create Job (store in database)
→ Parse Document (extract raw text)
→ Load Schema (split into sections)
→ Query Similar Training Examples (RAG / FAISS)
→ LLM Extraction with Groq (section-by-section processing)
→ 2-Layer Validation (quality check)
→ Schema Verification (Pydantic check)
→ Store Result (update job status)
→ Job Complete (structured JSON output)

Optional user action: User Correction → Submit & Train
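The "Load Schema, split into sections, section-by-section processing" steps can be sketched as follows. Assumptions: the schema is modeled as a mapping from section name to field names, `max_fields` is a hypothetical cap on fields per call, and `regex_extractor` is an offline stand-in for the Groq LLM call so the example runs without network access.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical schema shape: section name -> list of field names.
Schema = Dict[str, List[str]]
# Assumed extractor signature: (document text, section, fields) -> {field: value or None}.
Extractor = Callable[[str, str, List[str]], Dict[str, Optional[str]]]


def split_into_sections(schema: Schema, max_fields: int = 10) -> List[Tuple[str, List[str]]]:
    """Break the schema into (section, fields) chunks of at most max_fields fields,
    so each extraction call asks for a small, focused set of fields."""
    chunks = []
    for section, fields in schema.items():
        for start in range(0, len(fields), max_fields):
            chunks.append((section, fields[start:start + max_fields]))
    return chunks


def extract_by_section(text: str, schema: Schema, extractor: Extractor,
                       max_fields: int = 10) -> Dict[str, Dict[str, Optional[str]]]:
    """Run the extractor once per chunk and merge the answers per section."""
    result: Dict[str, Dict[str, Optional[str]]] = {}
    for section, fields in split_into_sections(schema, max_fields):
        answer = extractor(text, section, fields)
        section_out = result.setdefault(section, {})
        for name in fields:
            # Ignore unrequested keys; record missing ones as None so that
            # validation can flag them instead of silently dropping them.
            section_out[name] = answer.get(name)
    return result


def regex_extractor(text: str, section: str, fields: List[str]) -> Dict[str, Optional[str]]:
    """Offline stand-in for the LLM call: reads 'field: value' lines."""
    found: Dict[str, Optional[str]] = {}
    for name in fields:
        match = re.search(rf"^{re.escape(name)}:[ \t]*(.+)$", text, re.MULTILINE)
        found[name] = match.group(1).strip() if match else None
    return found
```

For example, given the text `policy_number: P-123` and a `policy` section that requests `policy_number` and `effective_date`, the merged result maps `effective_date` to `None`, which the validation step can then report as missing.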

System Layers

1. User Interface: Web UI · File Upload · Job Monitoring · Correction Interface
2. API Layer: Flask REST API · Route Handlers · Request Validation
3. Business Logic: core processing and transformation controllers (Parser · Transformer · Validator · Verifier)
4. Data Layer: SQLite Database · FAISS Indexes · File Storage
5. External Services: Groq LLM API · Sentence Transformers
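The API layer's request validation can reject bad uploads before a job is created. This sketch assumes uploads are `.pdf` files (read with PyPDF2) or `.xlsx` workbooks (read with openpyxl, which does not open legacy `.xls` files), and that the set of valid LOBs is supplied by the caller, for example from the available schemas; `validate_upload` and `UploadError` are hypothetical names.

```python
from pathlib import PurePath
from typing import Set

# Assumption: PDFs go to PyPDF2 and workbooks to openpyxl,
# which handles .xlsx but not legacy .xls files.
ALLOWED_EXTENSIONS = {".pdf", ".xlsx"}


class UploadError(ValueError):
    """Hypothetical error type for rejected upload requests."""


def validate_upload(filename: str, lob: str, available_lobs: Set[str]) -> str:
    """Check an upload request and return the normalized LOB, or raise UploadError.
    available_lobs is assumed to contain lowercase LOB identifiers."""
    if not filename:
        raise UploadError("no file provided")
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        raise UploadError(f"unsupported file type: {suffix or '(none)'}")
    normalized = lob.strip().lower()
    if normalized not in available_lobs:
        raise UploadError(f"unknown line of business: {lob!r}")
    return normalized
```

In a Flask route handler, the filename could come from the uploaded file object in `request.files`, and an `UploadError` could be turned into a 400 response, keeping invalid requests out of the job table entirely.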

Processing Pipeline

1. File Upload: the user uploads a document (PDF or Excel).
2. Text Extraction: the parser extracts the raw text.
3. Schema Loading: the schema for the selected line of business (LOB) is loaded.
4. RAG Retrieval: similar training examples are retrieved via FAISS.
5. LLM Extraction: the Groq LLM extracts structured data.
6. 2-Layer Validation: the output is checked for completeness and consistency.
7. Schema Verification: Pydantic validates the structure.
8. Job Completion: the results are stored in the database.
9. User Review: the user can optionally submit corrections, which are used for training.
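Step 6 describes the completeness and consistency checks only at a high level. One possible sketch, assuming completeness (required fields present and non-blank) and consistency (cross-field rules) are the two layers, and using hypothetical insurance fields `policy_number`, `effective_date`, `expiration_date`, and `premium`; Pydantic-based schema verification (step 7) would run as a separate stage.

```python
from datetime import date
from typing import Any, Dict, List


def check_completeness(record: Dict[str, Any], required: List[str]) -> List[str]:
    """Assumed layer 1: every required field is present and non-blank."""
    issues = []
    for name in required:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            issues.append(f"missing required field: {name}")
    return issues


def check_consistency(record: Dict[str, Any]) -> List[str]:
    """Assumed layer 2: cross-field rules on hypothetical insurance fields."""
    issues = []
    start, end = record.get("effective_date"), record.get("expiration_date")
    if start and end:
        try:
            if date.fromisoformat(start) >= date.fromisoformat(end):
                issues.append("effective_date must be before expiration_date")
        except (TypeError, ValueError):
            issues.append("dates must be ISO 8601 (YYYY-MM-DD)")
    premium = record.get("premium")
    if isinstance(premium, (int, float)) and premium < 0:
        issues.append("premium must be non-negative")
    return issues


def validate_record(record: Dict[str, Any], required: List[str]) -> List[str]:
    """Run both layers; an empty list means the record passed."""
    return check_completeness(record, required) + check_consistency(record)
```

Returning a list of issues rather than raising on the first failure lets the UI show every problem at once, which suits the optional correction step that follows.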

Technology Stack

Backend

  • Python 3.13
  • Flask
  • pandas
  • PyPDF2, openpyxl
  • Pydantic

AI / ML

  • Groq LLM API
  • FAISS
  • Sentence-Transformers
  • scikit-learn
  • numpy

Frontend & Data

  • Tailwind CSS
  • Jinja2
  • SQLite
  • JSON

Performance Metrics

  • 10-100x faster search: FAISS vs. linear scan
  • 1M+ vectors: scalable capacity
  • 2-layer validation: quality assurance
  • < 1 ms search time: HNSW index
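The "FAISS vs. linear" comparison is easiest to read against what the linear baseline does. The sketch below is an exact, brute-force cosine top-k search in pure Python: every stored vector is scored on every query, so cost grows as O(N·d). FAISS's flat indexes perform the same exhaustive scan in optimized native code, whereas an HNSW index (for example `faiss.IndexHNSWFlat`) walks a proximity graph to return approximate neighbors in sub-linear time. The function names here are illustrative only.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def linear_top_k(query: Sequence[float], vectors: Sequence[Sequence[float]],
                 k: int) -> List[Tuple[float, int]]:
    """Exact search: score every stored vector (O(N * d) per query) and
    return the k best as (cosine similarity, index) pairs, best first.
    A real store would normalize vectors once at insert time instead."""
    q = _normalize(query)
    scores = (
        (sum(a * b for a, b in zip(q, _normalize(v))), i)
        for i, v in enumerate(vectors)
    )
    return heapq.nlargest(k, scores)
```

Because this scan touches all N vectors, its latency grows linearly with the collection, which is the cost that an approximate graph index such as HNSW is designed to avoid at the 1M+ vector scale.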