Architecture & FAQ | Mitanshu Bhasin AI Technical Documentation

Local Execution Engine

WebGPU Domination

Until recently, running language models in practice meant heavy, expensive servers. Mitanshu Bhasin AI hooks directly into your browser's WebGPU API. The browser compiles WGSL (WebGPU Shading Language) compute shaders into native GPU commands, so the large matrix multiplications behind inference run on your local graphics card at near-native speed.
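To make the "matrix multiplication on the GPU" idea concrete, here is a minimal CPU-side sketch in TypeScript. It is not the engine's shader code: `outputCell` and `matmul` are hypothetical names, and the layout (row-major flat arrays) is an assumption. `outputCell` mirrors what a single invocation of a naive matmul compute shader would calculate; a GPU runs many such invocations in parallel, while this loop runs them one after another.

```typescript
// Row-major matrices stored in flat Float32Arrays, similar to how GPU buffers are laid out.
// A is m x k, B is k x n, and the result is m x n.

// What one invocation of a naive matmul kernel computes: a single output cell.
function outputCell(
  a: Float32Array,
  b: Float32Array,
  k: number,
  n: number,
  row: number,
  col: number
): number {
  let sum = 0;
  for (let i = 0; i < k; i++) {
    sum += a[row * k + i] * b[i * n + col];
  }
  return sum;
}

// CPU reference: visits every output cell sequentially.
function matmul(
  a: Float32Array,
  b: Float32Array,
  m: number,
  k: number,
  n: number
): Float32Array {
  if (a.length !== m * k || b.length !== k * n) {
    throw new Error("shape mismatch");
  }
  const out = new Float32Array(m * n);
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      out[row * n + col] = outputCell(a, b, k, n, row, col);
    }
  }
  return out;
}
```

Because every output cell depends only on one row of A and one column of B, the cells can be computed independently, which is why this workload maps so well onto a GPU.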

Serverless Survival

No backend API processes your prompts. Once the model has fully downloaded (it is cached in IndexedDB), you can disconnect from the internet entirely and keep generating text offline. Inference runs entirely inside your local RAM/VRAM.

Architecture Flow

graph TD
    A[User Input / Docs] -->|Prompt Injection| B(Local Context Parser)
    B -->|Tokenization| C{WebGPU AI Engine}
    C -->|Matrix Math| D[(IndexedDB Weights)]
    D -->|Fetch Tensors| C
    C -->|Stream Tokens| E[UI Renderer & Markdown]
    E -->|Final Output| A
    subgraph "Isolated Sandbox (Your Browser)"
        B
        C
        D
        E
    end
    style A fill:#ffffff,stroke:#000,stroke-width:2px,color:#000
    style C fill:#002244,stroke:#00d2ff,stroke-width:2px,color:#fff
    style D fill:#111,stroke:#555,stroke-width:1px,color:#fff

Execution Log Analysis

Step 1: When you hit enter, your text is tokenized locally within your browser.
Step 2: No external HTTPS requests or API calls are dispatched.
Step 3: Model weights are loaded from IndexedDB into your GPU memory.
Step 4: The UI decodes and streams text token by token. Because nothing crosses the network, there is no network latency.
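The steps above can be sketched as a small generate loop. This is an illustration under stated assumptions, not the engine's actual code: `tokenize` is a toy whitespace tokenizer standing in for the model's real subword tokenizer, and `NextToken`, `streamReply`, and `cannedModel` are hypothetical names.

```typescript
// Toy whitespace tokenizer: a stand-in for the model's real subword tokenizer.
function tokenize(text: string): string[] {
  return text.trim().split(/\s+/).filter((t) => t.length > 0);
}

// Hypothetical model interface: given the tokens so far, return the next
// token, or null when the reply is complete.
type NextToken = (context: readonly string[]) => string | null;

// Runs the generate loop and hands each new token to the UI callback
// as soon as it is produced.
function streamReply(
  prompt: string,
  next: NextToken,
  onToken: (token: string) => void,
  maxNewTokens: number
): string[] {
  const context = tokenize(prompt); // Step 1: tokenize locally
  const reply: string[] = [];
  for (let i = 0; i < maxNewTokens; i++) {
    const token = next(context); // Steps 2-3: local inference, no network call
    if (token === null) break;
    context.push(token);
    reply.push(token);
    onToken(token); // Step 4: stream the token to the UI
  }
  return reply;
}

// Stand-in "model" that replays a canned reply; a real engine would run
// GPU inference here.
function cannedModel(words: string[]): NextToken {
  let i = 0;
  return () => {
    const word = words[i];
    if (word === undefined) return null;
    i++;
    return word;
  };
}
```

For example, `streamReply("hi there", cannedModel(["Hello", "world"]), (t) => console.log(t), 10)` calls the callback once per token, which is the point where a UI would append text to the chat bubble.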

The 3 Adaptive Cores

Every device has its own computational limits. I have quantized these models to the q4f16 format (4-bit quantized weights with 16-bit floating-point compute) so that everything from low-end mobile phones to high-end gaming rigs can run at least one of them.

High Core Llama-3.2 Base

1 Billion Parameters | q4f16 Quantization

Requires VRAM: ~1.5 GB

The flagship and most capable of the three models. If you need to generate complex logic, functional code, or long essays, boot this engine. Ideal for modern gaming laptops (like the Asus FA506 with dedicated graphics) or high-end MacBooks.

Mid Core Qwen2.5 Base

0.5 Billion Parameters | q4f16 Quantization

Requires VRAM: ~800 MB

A balance of output quality and raw speed. Well suited to daily tasks, professional emails, and quick local chats. This runs smoothly on standard modern laptops (i3/i5 processors) and flagship mobile phones.

Nano Core SmolLM2 Base

360 Million Parameters | q4f16 Quantization

Requires RAM: ~400 MB

Engineered specifically for potato hardware. If you have a legacy system such as a 2012 Intel Pentium 2020M or a barebones budget Android device, this Nano Core is designed to reply quickly without freezing your browser.
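As a rough illustration of how the three cores map onto device limits, the sketch below picks the largest core that fits a given memory estimate. The selection helper is hypothetical (the page does not say how a core is chosen), the memory figures are the approximate requirements listed above, and `headroom` is an assumed safety factor. Browsers do not expose exact free VRAM, so `availableMB` is a caller-supplied estimate.

```typescript
interface Core {
  name: string;
  model: string;
  requiredMB: number; // approximate figure from the core descriptions above
}

// Ordered from largest to smallest so the first match is the most capable core.
const CORES: Core[] = [
  { name: "High Core", model: "Llama-3.2 (1B)", requiredMB: 1500 },
  { name: "Mid Core", model: "Qwen2.5 (0.5B)", requiredMB: 800 },
  { name: "Nano Core", model: "SmolLM2 (360M)", requiredMB: 400 },
];

// Returns the largest core that fits, or null if even the Nano Core does not.
// `headroom` is a hypothetical safety factor (1.2 keeps about 20% spare).
function pickCore(availableMB: number, headroom: number = 1.2): Core | null {
  for (const core of CORES) {
    if (availableMB >= core.requiredMB * headroom) {
      return core;
    }
  }
  return null;
}
```

With these assumptions, an estimate of 1000 MB selects the Mid Core, because the High Core would need about 1800 MB once headroom is included.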

Absolute Zero-Knowledge

Many hosted AI services log your prompts and may use them to train future models. Mitanshu Bhasin AI is completely local. Your chat sessions are saved only inside your browser's local storage (capped at about 5 MB to prevent bloat).
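One way such a cap could be enforced is to drop the least recently used sessions until the serialized history fits. The sketch below is an assumption about the approach, not the app's confirmed behavior: `ChatSession`, `fitToBudget`, and the character-count approximation of storage size are all hypothetical.

```typescript
interface ChatSession {
  id: string;
  updatedAt: number; // timestamp of the last message
  messages: string[];
}

// About 5 MB, counted in characters of the JSON form as an approximation.
const BUDGET_CHARS = 5 * 1024 * 1024;

// Drops the least recently updated sessions until the JSON form fits the budget.
// Returns the kept sessions, newest first; the input array is not modified.
function fitToBudget(
  sessions: ChatSession[],
  budget: number = BUDGET_CHARS
): ChatSession[] {
  const kept = sessions.slice().sort((a, b) => b.updatedAt - a.updatedAt);
  while (kept.length > 0 && JSON.stringify(kept).length > budget) {
    kept.pop(); // remove the oldest remaining session
  }
  return kept;
}
```

The result of `fitToBudget` is what would then be written to local storage, for instance with `localStorage.setItem`.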

If you choose to use the optional Cloud Sync feature, Firestore security rules (`request.auth.uid == userId`) ensure that client requests can read or write your synced, encrypted history only when signed in as you. Because the history is stored encrypted, not even I (the developer) can read it.

No Telemetry | No Prompt Harvesting | 100% Client-Side Engine

VRAM & Speed Optimization

IndexedDB Caching

Model weights (roughly 400 MB to 1.5 GB, depending on the core) are downloaded once per browser. They are split into chunks and cached persistently in IndexedDB, so later loads skip the download and start much faster.
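The chunking step can be pictured as planning fixed-size byte ranges and then fetching only the ranges that are not yet cached. This is a minimal sketch under assumptions: `planChunks` and `missingChunks` are hypothetical helpers, and the chunk size is an arbitrary caller-supplied value, not the engine's real one.

```typescript
interface ChunkRange {
  index: number;
  start: number; // first byte, inclusive
  end: number;   // last byte, exclusive
}

// Splits a file of `totalBytes` into consecutive ranges of at most `chunkBytes`.
function planChunks(totalBytes: number, chunkBytes: number): ChunkRange[] {
  if (totalBytes < 0 || chunkBytes <= 0) {
    throw new Error("invalid sizes");
  }
  const chunks: ChunkRange[] = [];
  let index = 0;
  for (let start = 0; start < totalBytes; start += chunkBytes) {
    chunks.push({ index, start, end: Math.min(start + chunkBytes, totalBytes) });
    index++;
  }
  return chunks;
}

// Which chunks still need downloading, given the indices already in the cache.
function missingChunks(plan: ChunkRange[], cached: number[]): ChunkRange[] {
  return plan.filter((c) => cached.indexOf(c.index) === -1);
}
```

On a repeat visit where every index is already cached, `missingChunks` returns an empty list and no download is needed.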

4-bit Quantization

These models are compressed into the q4f16 format. Compared with 16-bit weights, this cuts the memory footprint by roughly 70%, at the cost of a small loss in output quality.
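The size of the saving can be checked with simple arithmetic. The sketch below assumes group-wise quantization with one 16-bit scale per group of 32 weights; that group size is an assumption for illustration, and `quantizedBytes` and `savingsVsF16` are hypothetical helper names.

```typescript
// Bytes needed to store `params` weights at `bitsPerWeight`, plus one scale of
// `scaleBits` per group of `groupSize` weights (the group size is an assumption).
function quantizedBytes(
  params: number,
  bitsPerWeight: number = 4,
  groupSize: number = 32,
  scaleBits: number = 16
): number {
  const weightBits = params * bitsPerWeight;
  const scaleBitsTotal = Math.ceil(params / groupSize) * scaleBits;
  return (weightBits + scaleBitsTotal) / 8;
}

// Fraction of memory saved compared with storing every weight as a 16-bit float.
function savingsVsF16(params: number): number {
  const f16Bytes = params * 2;
  return 1 - quantizedBytes(params) / f16Bytes;
}
```

For a 1 billion parameter model, these assumptions give about 0.56 GB of weights instead of 2 GB, a saving of about 72%. These are weight-storage estimates only; actual memory use at runtime is higher because the engine also needs working buffers.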

Dynamic Context

Context windows scale dynamically. If memory use approaches its limit, the engine automatically compresses older context to reduce the risk of out-of-memory (OOM) crashes in your browser tab.
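A simple way to picture this is a token budget applied to the chat history. The sketch below uses plain truncation (dropping the oldest turns) rather than true compression, which is a deliberately simpler stand-in. `Turn`, `estimateTokens`, and `trimContext` are hypothetical names, and the four-characters-per-token estimate is a rough assumption in place of a real tokenizer count.

```typescript
interface Turn {
  role: "system" | "user" | "assistant";
  text: string;
}

// Rough token estimate (about 4 characters per token): a stand-in for a real
// tokenizer count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps all system turns plus the most recent other turns that fit in maxTokens.
// Older turns are dropped; original order is preserved among kept turns.
function trimContext(turns: Turn[], maxTokens: number): Turn[] {
  const system = turns.filter((t) => t.role === "system");
  let used = 0;
  for (const t of system) {
    used += estimateTokens(t.text);
  }
  const recent: Turn[] = [];
  for (let i = turns.length - 1; i >= 0; i--) {
    const turn = turns[i];
    if (turn.role === "system") continue;
    const cost = estimateTokens(turn.text);
    if (used + cost > maxTokens) break; // budget exhausted, drop everything older
    used += cost;
    recent.unshift(turn);
  }
  return system.concat(recent);
}
```

A more faithful version of "compression" would replace the dropped turns with a short summary instead of discarding them, at the cost of an extra generation pass.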

The Engine Behind The Rebellion.