Local Execution Engine
WebGPU Domination
Previously, running AI models typically required heavy, expensive servers. Mitanshu Bhasin AI hooks directly into your browser's WebGPU API. The browser compiles WGSL (WebGPU Shading Language) shaders into native GPU commands, so large matrix multiplications run on your local graphics card at near-native speed.
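To make the idea of a WGSL matrix multiplication concrete, here is a minimal sketch of a naive compute kernel together with a CPU reference that uses the same indexing. The names `MATMUL_WGSL`, `Dims`, and `matmulReference` are hypothetical, and the kernels the engine actually ships are assumed to be far more optimized (tiling, quantized weights).

```typescript
// WGSL compute kernel: each invocation computes one element of C = A x B.
// A is m x k, B is k x n, C is m x n, all stored row-major as flat f32 arrays.
// Illustrative only; not the engine's actual shader.
const MATMUL_WGSL: string = `
struct Dims { m: u32, n: u32, k: u32 }

@group(0) @binding(0) var<storage, read> a: array<f32>;
@group(0) @binding(1) var<storage, read> b: array<f32>;
@group(0) @binding(2) var<storage, read_write> c: array<f32>;
@group(0) @binding(3) var<uniform> dims: Dims;

@compute @workgroup_size(8, 8)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  let row = id.y;
  let col = id.x;
  if (row >= dims.m || col >= dims.n) { return; }
  var sum = 0.0;
  for (var i = 0u; i < dims.k; i = i + 1u) {
    sum = sum + a[row * dims.k + i] * b[i * dims.n + col];
  }
  c[row * dims.n + col] = sum;
}
`;

// CPU reference with the same row-major indexing as the kernel above.
// Useful for checking GPU output on small inputs.
function matmulReference(a: number[], b: number[], m: number, n: number, k: number): number[] {
  const c: number[] = [];
  for (let row = 0; row < m; row++) {
    for (let col = 0; col < n; col++) {
      let sum = 0;
      for (let i = 0; i < k; i++) {
        sum += a[row * k + i] * b[i * n + col];
      }
      c.push(sum);
    }
  }
  return c;
}
```

In a browser, the shader string would be passed to `device.createShaderModule({ code: MATMUL_WGSL })` and dispatched over a grid that covers the output matrix.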
Serverless Survival
No backend API processes your prompts. Once the model has fully loaded (and is cached in IndexedDB), you can disconnect from the internet entirely and keep generating text offline. Inference runs entirely in your local RAM/VRAM.
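The cache-first behaviour described here can be sketched as follows. This is an illustration under assumptions, not the engine's actual loader: `WeightCache`, `loadWeights`, and `memoryCache` are hypothetical names, and the in-memory cache stands in for an IndexedDB wrapper so the sketch also runs outside a browser.

```typescript
// Hypothetical async key-value cache; in a browser this would wrap IndexedDB.
interface WeightCache {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, value: Uint8Array): Promise<void>;
}

// Returns cached weights when present; otherwise downloads once and caches them.
async function loadWeights(
  cache: WeightCache,
  modelId: string,
  download: (modelId: string) => Promise<Uint8Array>
): Promise<Uint8Array> {
  const cached = await cache.get(modelId);
  if (cached !== undefined) {
    return cached; // offline-capable path: no network is touched
  }
  const fresh = await download(modelId); // only reached on the first load
  await cache.put(modelId, fresh);
  return fresh;
}

// In-memory stand-in for the cache (assumption: used here for illustration only).
function memoryCache(): WeightCache {
  const entries = new Map<string, Uint8Array>();
  return {
    get: (key) => Promise.resolve(entries.get(key)),
    put: (key, value) => {
      entries.set(key, value);
      return Promise.resolve();
    },
  };
}
```

After the first successful call, every later call for the same `modelId` resolves from the cache, which is what allows the tab to keep working with the network disconnected.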
Architecture Flow
Step 1: When you hit Enter, your text is tokenized locally in your browser.
Step 2: No external HTTPS requests or API calls are dispatched.
Step 3: Neural weights (the model) are loaded from IndexedDB directly into GPU memory.
Step 4: The UI decodes and streams text token by token, with effectively zero network latency.
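The four steps can be sketched as a single local loop. The `Tokenizer` and `LocalModel` interfaces, the `generate` function, and the toy stand-ins below are all hypothetical; a real engine would back them with an actual tokenizer and a WebGPU-accelerated model.

```typescript
// Hypothetical interfaces for illustration only.
interface Tokenizer {
  encode(text: string): number[];
  decode(ids: number[]): string;
}
interface LocalModel {
  // Returns the next token id given all ids so far, or null to stop.
  nextToken(ids: number[]): number | null;
}

// Runs the whole loop in-process: no fetch() or other network call appears here.
function generate(
  prompt: string,
  tok: Tokenizer,
  model: LocalModel,
  maxNewTokens: number,
  onToken: (piece: string) => void
): string {
  const ids = tok.encode(prompt); // Step 1: tokenize locally
  let output = "";
  for (let i = 0; i < maxNewTokens; i++) {
    const next = model.nextToken(ids); // Steps 2-3: inference on locally loaded weights
    if (next === null) break;
    ids.push(next);
    const piece = tok.decode([next]); // Step 4: decode and stream one token
    output += piece;
    onToken(piece);
  }
  return output;
}

// Toy stand-ins so the sketch runs anywhere: one token per character,
// and a "model" that echoes the prompt back once, then stops.
const charTokenizer: Tokenizer = {
  encode: (text) => text.split("").map((ch) => ch.charCodeAt(0)),
  decode: (ids) => ids.map((id) => String.fromCharCode(id)).join(""),
};
function echoModel(promptLength: number): LocalModel {
  return {
    nextToken: (ids) => {
      const produced = ids.length - promptLength;
      return produced < promptLength ? ids[produced] : null;
    },
  };
}
```

The `onToken` callback is where a UI would append each piece to the chat window as it arrives.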
The 3 Adaptive Cores
Every device has its own computational limits. I have quantized these models to the q4f16 format (4-bit weights with 16-bit floating-point computation) so that everything from low-end mobile phones to high-end gaming rigs can run them.
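As a rough illustration of what 4-bit weight quantization means, the sketch below maps one group of weights to signed 4-bit integers plus a shared scale. It is a generic symmetric scheme, not necessarily the exact q4f16 layout; `quantizeGroup` and `dequantizeGroup` are hypothetical names.

```typescript
// Quantize one group of weights to signed 4-bit integers in [-8, 7]
// plus a single scale shared by the whole group.
function quantizeGroup(weights: number[]): { q: number[]; scale: number } {
  let maxAbs = 0;
  for (const w of weights) {
    maxAbs = Math.max(maxAbs, Math.abs(w));
  }
  // Map the largest magnitude to 7; fall back to 1 for an all-zero group.
  const scale = maxAbs > 0 ? maxAbs / 7 : 1;
  const q = weights.map((w) => Math.max(-8, Math.min(7, Math.round(w / scale))));
  return { q, scale };
}

// Reconstruct approximate weights; the error per weight is at most scale / 2.
function dequantizeGroup(q: number[], scale: number): number[] {
  return q.map((v) => v * scale);
}
```

Each stored value needs only 4 bits, and the reconstruction error is bounded by half the group's scale, which is why smaller groups give better accuracy at the cost of storing more scales.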
High Core Llama-3.2 Base
1 Billion Parameters | q4f16 Quantization
The flagship and most capable model. Use it when you need complex reasoning, working code, or long essays. Best suited to modern gaming laptops (such as the Asus FA506 with dedicated graphics) or high-end MacBooks.
Mid Core Qwen2.5 Base
0.5 Billion Parameters | q4f16 Quantization
The best balance of output quality and speed. Well suited to daily tasks, professional emails, and quick chats. It runs smoothly on standard modern laptops (i3/i5 processors) and flagship mobile phones.
Nano Core SmolLM2 Base
360 Million Parameters | q4f16 Quantization
Engineered specifically for low-end hardware. If you have a legacy system such as a 2012 Intel Pentium 2020M or a bare-bones budget Android device, the Nano Core provides quick replies without freezing your browser.
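One way to choose among the three cores is a simple capability check. The sketch below is an assumption about how such a selector could look: the `pickCore` name and the memory thresholds are hypothetical and would need tuning against real devices. In Chromium-based browsers an approximate memory figure is exposed as `navigator.deviceMemory`.

```typescript
type Core = "high" | "mid" | "nano";

// Hypothetical thresholds for illustration; not the engine's actual rules.
function pickCore(hasWebGPU: boolean, deviceMemoryGB: number): Core | null {
  if (!hasWebGPU) return null; // no WebGPU means no local GPU inference
  if (deviceMemoryGB >= 8) return "high"; // Llama-3.2, 1 billion parameters
  if (deviceMemoryGB >= 4) return "mid"; // Qwen2.5, 0.5 billion parameters
  return "nano"; // SmolLM2, 360 million parameters
}
```

A selector like this only suggests a default; letting the user override it keeps the choice in their hands.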
Absolute Zero-Knowledge
Corporate AI providers often log your prompts and may use them to train future models. Mitanshu Bhasin AI is fully local. Your chat sessions are saved only in your browser's local storage (limited to ~5MB to prevent bloat).
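A size cap like this can be enforced by dropping the oldest messages before each save. The sketch below is an assumption about the approach, not the app's actual code: `StorageLike`, `ChatMessage`, and `saveHistory` are hypothetical names, and size is approximated by the character count of the serialized JSON. The browser's `localStorage` object satisfies `StorageLike` as written.

```typescript
// Minimal subset of the Web Storage interface used by this sketch.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}
interface ChatMessage {
  role: "user" | "assistant";
  text: string;
}

// Drops the oldest messages until the serialized history fits the budget,
// then writes it. Returns the messages that were actually saved.
function saveHistory(
  storage: StorageLike,
  key: string,
  messages: ChatMessage[],
  maxChars: number
): ChatMessage[] {
  let kept = messages.slice();
  let json = JSON.stringify(kept);
  while (json.length > maxChars && kept.length > 0) {
    kept = kept.slice(1); // discard the oldest message first
    json = JSON.stringify(kept);
  }
  storage.setItem(key, json);
  return kept;
}
```

In a browser this could be called as `saveHistory(localStorage, "chat", messages, 5_000_000)`, where the key name and the exact budget are illustrative.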
If you opt in to the optional Cloud Sync feature, Firestore security rules (`request.auth.uid == userId`) ensure that only your own signed-in account can read or write your synced history. Because that history is encrypted, not even I (the Developer) can read it.
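For context, a rule of this form usually sits inside a ruleset like the one below. This is a sketch, not the project's deployed rules: the `users/{userId}` collection path is an assumption, and only the `request.auth.uid == userId` condition comes from the text above.

```
rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Hypothetical layout: each user's synced history lives under users/{userId}.
    match /users/{userId}/{document=**} {
      // Allow access only to the signed-in user whose UID matches the path.
      allow read, write: if request.auth != null && request.auth.uid == userId;
    }
  }
}
```

With a layout like this, a request from any other account, or from a signed-out client, is rejected before it reaches the data.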
VRAM & Speed Optimization
IndexedDB Caching
Model weights (ranging from 800MB to 1.5GB) are downloaded only once in your browser. They are chunked and cached persistently, so subsequent loads start almost instantly.
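Chunking can be illustrated with two small helpers: one splits a weight buffer into fixed-size pieces (each of which could be stored as its own IndexedDB record), and one reassembles them at load time. The names `splitIntoChunks` and `joinChunks` are hypothetical, and the engine's actual chunk size and record layout are not specified here.

```typescript
// Splits a buffer into pieces of at most chunkSize bytes.
function splitIntoChunks(data: Uint8Array, chunkSize: number): Uint8Array[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    chunks.push(data.slice(offset, offset + chunkSize));
  }
  return chunks;
}

// Concatenates chunks back into one contiguous buffer, in order.
function joinChunks(chunks: Uint8Array[]): Uint8Array {
  let total = 0;
  for (const chunk of chunks) {
    total += chunk.length;
  }
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

Storing smaller records keeps any single write modest in size and lets an interrupted download resume from the last complete chunk, assuming the loader tracks which chunks are already present.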
4-bit Quantization
The models are compressed into the q4f16 format. This reduces the memory footprint by roughly 70%, with minimal loss in output quality.
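The roughly 70% figure can be checked with back-of-the-envelope arithmetic. The sketch below assumes a 16-bit baseline and about 4.5 effective bits per quantized weight (4 bits plus one 16-bit scale shared by a group of 32 weights); both figures are assumptions for illustration, not measurements of the shipped models.

```typescript
// Approximate bytes needed for model weights: parameters x bits per weight / 8.
function weightBytes(parameters: number, bitsPerWeight: number): number {
  return (parameters * bitsPerWeight) / 8;
}

// Fraction of memory saved when moving from one bit width to another.
function savings(fromBits: number, toBits: number): number {
  return 1 - toBits / fromBits;
}

// Example: a 1-billion-parameter model.
const fp16Bytes = weightBytes(1e9, 16); // 2e9 bytes, about 2 GB
const q4Bytes = weightBytes(1e9, 4.5); // 5.625e8 bytes, about 0.56 GB
const saved = savings(16, 4.5); // 0.71875, i.e. about 72%
```

Under these assumptions the saving lands close to the stated figure; the exact number depends on the group size and on which tensors are left unquantized.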
Dynamic Context
Context windows scale dynamically. When RAM approaches its limit, the engine automatically compresses older context so that your browser tab does not crash with an out-of-memory (OOM) error.
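The simplest form of this idea is to keep only the most recent turns that fit a token budget. The sketch below shows that variant under assumptions: `Turn` and `fitContext` are hypothetical names, token counts are taken as given, and the engine itself may summarize older turns rather than drop them.

```typescript
interface Turn {
  text: string;
  tokens: number; // token count for this turn, assumed to be precomputed
}

// Keeps the most recent turns whose combined token count fits the budget.
// Walks backwards from the newest turn and stops at the first one that does not fit.
function fitContext(turns: Turn[], maxTokens: number): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  for (let i = turns.length - 1; i >= 0; i--) {
    if (used + turns[i].tokens > maxTokens) break;
    used += turns[i].tokens;
    kept.unshift(turns[i]); // preserve chronological order
  }
  return kept;
}
```

Lowering `maxTokens` when memory is tight shrinks the context, which is one way to trade conversation history for stability.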