Field note

Nebula: On-Device GraphRAG That Runs in Your Browser Tab

Thien Nguyen · Jun 30, 2026

Most AI note apps ship your notes to a cloud vector database and a hosted model, then ask you to trust the privacy policy. For the work we do (regulated industries, sensitive data) that is a non-starter. So we built the opposite and open-sourced it: Nebula, a private, local-first AI knowledge base that runs entirely inside a browser tab. No backend, no account, no server. Its tagline says it plainly: notes that think, nothing leaves your device.

Repo: github.com/beevr-labs/Nebula (Apache-2.0). Live demo, no signup: beevr-labs.github.io/Nebula. Here is why we went fully on-device, and what it cost.

Privacy by architecture, not by promise

The usual privacy pitch is a policy: "we won't look at your data." Nebula's is structural: there is nowhere for your data to go. Everything runs in the browser. Notes, embeddings, and the search index live in local browser storage. There is no sync service, no account system, and therefore no server to breach or to put under a data-processing agreement. For sensitive notes (client records, health information, anything you would not paste into a cloud chatbot) that is the whole point.

What runs where

It is a SvelteKit single-page app that does real ML in the browser:

  • On-device chat via WebLLM, GPU-accelerated with WebGPU. You pick the model, from tiny-and-fast to large-and-accurate, and Nebula shows the download size before you commit. Qwen and Llama models are supported.
  • Semantic search powered by the bge-m3 embedding model (Apache-2.0). The model is a download of about 570 MB on first use, then cached and fully offline. It is multilingual, including Vietnamese, so it works across mixed-language notes.
  • WebAssembly handles the compute-heavy parts.
  • After the first model download, the whole thing works offline.
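
To make the semantic search step concrete, here is a minimal sketch of ranking notes by cosine similarity between a query embedding and stored note embeddings. It assumes the embeddings already exist (in Nebula they would come from bge-m3). The `IndexedNote` shape and the `topK` function are hypothetical names for illustration, not Nebula's actual code, and the tiny 3-dimensional vectors stand in for real embeddings.

```typescript
// Hypothetical shape: a note id plus its precomputed embedding vector.
interface IndexedNote {
  id: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
// Returns 0 when either vector has zero magnitude.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every note against the query and keep the k best matches.
function topK(
  query: number[],
  notes: IndexedNote[],
  k: number
): { id: string; score: number }[] {
  return notes
    .map((n) => ({ id: n.id, score: cosineSimilarity(query, n.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy vectors for illustration only; real embeddings have many more dimensions.
const notes: IndexedNote[] = [
  { id: "meeting-acme", embedding: [0.9, 0.1, 0.0] },
  { id: "grocery-list", embedding: [0.0, 0.2, 0.9] },
  { id: "acme-contract", embedding: [0.8, 0.3, 0.1] },
];

console.log(topK([1, 0, 0], notes, 2).map((r) => r.id));
// ["meeting-acme", "acme-contract"]
```

A linear scan like this is usually fine for a personal vault of a few thousand notes. Whether Nebula uses a scan or an approximate index is not something this sketch claims.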

Why a graph, not just vectors

Flat vector search finds notes that are similar. It does not understand that "the client from the Tuesday call" and "Acme Corp" are the same entity across ten different notes. Nebula builds an entity knowledge graph automatically (people, projects, clients) and uses GraphRAG to answer questions by walking those relationships, then links every answer back to the source notes. You ask in plain language and get an answer you can trace, instead of a keyword hunt across disconnected files.
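
The idea can be sketched in a few lines: resolve different mentions to one entity, walk the relationships outward for a bounded number of hops, and collect the source notes along the way. Everything below is a hypothetical illustration (the `EntityGraph` class, its method names, and the sample data are assumptions), not Nebula's actual data model or extraction pipeline.

```typescript
// Hypothetical in-memory entity graph; Nebula's real data model may differ.
interface Entity {
  id: string;
  aliases: string[];     // surface forms seen in notes
  sourceNotes: string[]; // ids of notes that mention this entity
}

interface Visit {
  id: string;
  path: string[]; // entity ids from the start entity to this one
}

class EntityGraph {
  private entities = new Map<string, Entity>();
  private edges = new Map<string, string[]>();
  private aliasIndex = new Map<string, string>();

  addEntity(entity: Entity): void {
    this.entities.set(entity.id, entity);
    if (!this.edges.has(entity.id)) this.edges.set(entity.id, []);
    for (const alias of entity.aliases) {
      this.aliasIndex.set(alias.toLowerCase(), entity.id);
    }
  }

  // Undirected relationship; silently ignored if either entity is unknown.
  link(a: string, b: string): void {
    if (!this.entities.has(a) || !this.entities.has(b)) return;
    this.edges.get(a)!.push(b);
    this.edges.get(b)!.push(a);
  }

  // Map a mention such as "the client from the Tuesday call" to an entity id.
  resolve(mention: string): string | undefined {
    return this.aliasIndex.get(mention.toLowerCase());
  }

  // Breadth-first walk up to maxHops, recording the path to each entity.
  walk(startId: string, maxHops: number): Visit[] {
    const visited = new Set<string>([startId]);
    const queue: Visit[] = [{ id: startId, path: [startId] }];
    const reached: Visit[] = [];
    while (queue.length > 0) {
      const current = queue.shift()!;
      reached.push(current);
      if (current.path.length - 1 >= maxHops) continue;
      for (const next of this.edges.get(current.id) ?? []) {
        if (!visited.has(next)) {
          visited.add(next);
          queue.push({ id: next, path: [...current.path, next] });
        }
      }
    }
    return reached;
  }

  // Deduplicated source notes for every entity within maxHops of the mention.
  retrieveNotes(mention: string, maxHops: number): string[] {
    const start = this.resolve(mention);
    if (start === undefined) return [];
    const noteIds = new Set<string>();
    for (const visit of this.walk(start, maxHops)) {
      for (const note of this.entities.get(visit.id)?.sourceNotes ?? []) {
        noteIds.add(note);
      }
    }
    return Array.from(noteIds);
  }
}

const graph = new EntityGraph();
graph.addEntity({
  id: "acme",
  aliases: ["Acme Corp", "the client from the Tuesday call"],
  sourceNotes: ["2026-06-02.md", "acme-kickoff.md"],
});
graph.addEntity({ id: "dana", aliases: ["Dana"], sourceNotes: ["acme-kickoff.md", "contacts.md"] });
graph.addEntity({ id: "migration", aliases: ["Migration project"], sourceNotes: ["roadmap.md"] });
graph.link("acme", "dana");
graph.link("dana", "migration");

console.log(graph.retrieveNotes("the client from the Tuesday call", 1));
// ["2026-06-02.md", "acme-kickoff.md", "contacts.md"]
```

The `path` recorded for each visited entity is what makes an answer traceable: it can be shown next to the cited notes so a reader sees why a note was pulled in. In a full pipeline the retrieved notes would then be passed to the local model as context.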

It is also just a good notes app

The AI is useless if the notes app underneath is not real, so it is: Markdown, wikilinks and backlinks, tabs, a quick switcher, daily notes, templates, tags, and folders. You can bring your own files (PDF, CSV, text) and export the whole vault as plain .md files whenever you want. No lock-in: your notes go in and out as portable Markdown. The codebase ships with 430+ automated tests, because local-first does not mean fragile.
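
As an illustration of how wikilinks and backlinks relate, here is a minimal sketch that extracts link targets from Markdown and inverts them into a backlink index. It assumes the common `[[Target]]`, `[[Target|label]]`, and `[[Target#heading]]` wikilink forms; the exact syntax Nebula accepts, and the function names used here, are assumptions.

```typescript
// Extract wikilink targets such as [[Acme Corp]], [[Acme Corp|the client]],
// or [[Daily Notes#Monday]]. Only the target before any "#" or "|" is kept.
function extractWikilinks(markdown: string): string[] {
  const pattern = /\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    targets.push(match[1].trim());
  }
  return targets;
}

// Invert outgoing links: for each target, list the notes that link to it.
// `vault` maps a note title to its Markdown body (a hypothetical shape).
function buildBacklinks(vault: Record<string, string>): Map<string, string[]> {
  const backlinks = new Map<string, string[]>();
  for (const title of Object.keys(vault)) {
    // Deduplicate so a note that links twice is listed once.
    const targets = Array.from(new Set(extractWikilinks(vault[title])));
    for (const target of targets) {
      const sources = backlinks.get(target) ?? [];
      sources.push(title);
      backlinks.set(target, sources);
    }
  }
  return backlinks;
}

const vault: Record<string, string> = {
  "Acme Corp": "Client since 2025. Owner: [[Dana]].",
  "Dana": "Account lead for [[Acme Corp]].",
  "2026-06-02": "Call with [[Acme Corp|the client]] and [[Dana]].",
};

const backlinks = buildBacklinks(vault);
console.log(backlinks.get("Acme Corp"));
// ["Dana", "2026-06-02"]
```

Because the index is derived entirely from the Markdown text, it can be rebuilt from an exported vault at any time, which fits the portable, no-lock-in goal.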

The hard parts (what we learned)

  • On-device models are smaller, so structure has to carry more weight. The knowledge graph recovers context that a small local model alone would miss, which is a big part of why we went graph-first instead of leaning on raw model size.
  • Explainable retrieval matters as much as accuracy. Showing the path through the graph back to source notes is what makes the answer trustworthy, and for regulated buyers that traceability is not a nice-to-have.
  • The browser is a surprisingly capable runtime in 2026. WebGPU plus WebAssembly means "install nothing, runs offline, GPU-accelerated" is actually achievable, not a science project.
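
For readers who want to probe their own browser, here is a minimal sketch of WebGPU feature detection. `navigator.gpu.requestAdapter()` is the standard WebGPU entry point and resolves to `null` when no suitable adapter exists. The `pickBackend` function, the `NavigatorLike` type, and the idea of falling back to a WebAssembly path are assumptions for illustration; this post does not describe how Nebula itself handles browsers without WebGPU.

```typescript
// Minimal structural type for the part of `navigator` this sketch touches,
// so the function can also be exercised outside a browser.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

type Backend = "webgpu" | "wasm";

// Hypothetical helper: prefer WebGPU when an adapter is available,
// otherwise report a WebAssembly-only fallback.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm"; // browser does not expose WebGPU at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no usable GPU adapter
  } catch {
    return "wasm"; // treat adapter errors as "no WebGPU"
  }
}

// Simulated environments for illustration.
const withGpu: NavigatorLike = { gpu: { requestAdapter: async () => ({}) } };
const withoutGpu: NavigatorLike = {};

pickBackend(withGpu).then((b) => console.log(b));    // "webgpu"
pickBackend(withoutGpu).then((b) => console.log(b)); // "wasm"
```

In a real page you would pass the global `navigator` instead of the simulated objects.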

Why open source

For the same reason we open-source the rest of our hardest work: in AI, "verifiable" beats "trust me." A buyer evaluating us for work on sensitive data can read exactly how retrieval works and confirm for themselves that nothing leaves the device, instead of taking our word for it.

Nebula is Apache-2.0 at github.com/beevr-labs/Nebula, with a live demo at beevr-labs.github.io/Nebula. If you need AI built on sensitive or regulated data, on-device or otherwise, made to survive an audit rather than just a demo, here is how we work.

Anyone else running RAG fully in the browser? What model and hardware combo is actually working for you?