Local RAG for your files. Claude reads it via MCP.

Drop your PDFs, slide decks, spreadsheets, emails, screenshots, and code onto Chunky. Get a searchable knowledge graph — organised into projects and collections, indexed with hybrid FTS + semantic retrieval, and wired straight to Claude Desktop and Claude Code through an embedded MCP server.

Everything runs on your machine. No cloud database. No telemetry. No lock-in.

[Image: Chunky character on a chalkboard covered in interconnected nodes]

What Chunky does

Universal ingest

Drag any supported file onto a project. Chunky extracts text from PDFs, DOCX / DOC, PPTX / PPT, XLSX / XLS, CSV, Outlook MSG, Markdown, TXT, images (PNG, JPG, GIF, WEBP, BMP, SVG), and dozens of source-code formats.

OCR for images

Screenshots, photos of whiteboards, diagram captures — Chunky pipes raster images through an LLM vision pass, so the extracted text becomes searchable alongside the source pixels.

Projects and collections

Assets auto-bucket by type (Documents, Slides, PDFs, Spreadsheets, Emails, Images, Code, Links). Create custom named collections and drag assets between them — the drag targets accept both files from disk and existing assets.
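The auto-bucketing described above can be pictured as a lookup from file extension to bucket name. The sketch below is illustrative only: the extension table, the short list of code extensions, and the `bucket_for` helper are assumptions, not Chunky's actual rules.

```python
from pathlib import Path
from typing import Optional

# Hypothetical extension-to-bucket table, inferred from the supported formats
# listed above. Chunky's real mapping may differ.
BUCKETS = {
    "Documents": {".docx", ".doc", ".md", ".txt"},
    "Slides": {".pptx", ".ppt"},
    "PDFs": {".pdf"},
    "Spreadsheets": {".xlsx", ".xls", ".csv"},
    "Emails": {".msg"},
    "Images": {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".svg"},
    # A tiny stand-in for the "dozens of source-code formats".
    "Code": {".py", ".ts", ".rs", ".go", ".java", ".c"},
}


def bucket_for(name: str) -> Optional[str]:
    """Return the bucket for a file name or URL, or None if unsupported."""
    # Treat anything that looks like a web address as a link.
    if name.startswith(("http://", "https://")):
        return "Links"
    ext = Path(name).suffix.lower()
    for bucket, extensions in BUCKETS.items():
        if ext in extensions:
            return bucket
    return None


print(bucket_for("Q3 Roadmap.PPTX"))
print(bucket_for("https://example.com/page"))
```

Custom collections would sit on top of a mapping like this, since an asset keeps its type bucket while also belonging to any named collection you drag it into.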

Hybrid retrieval

Every search fuses FTS5 lexical BM25 with sqlite-vec cosine similarity over BGE-small-en-v1.5 embeddings, weighted into a single ranked list. Cached summaries and keyPoints answer most questions without a follow-up fetch.
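As a rough illustration of weighted score fusion, the sketch below min-max normalises a lexical score list and a semantic score list, then blends them with a single weight. The normalisation, the default weight of 0.5, and the function names are assumptions; the text above does not say how Chunky weights the two signals. Inputs are assumed to be higher-is-better, so raw FTS5 `bm25()` values (where smaller means a better match) would need their sign flipped first.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to the range 0..1 so the two signals are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {node_id: 1.0 for node_id in scores}
    return {node_id: (s - lo) / (hi - lo) for node_id, s in scores.items()}


def fuse(
    lexical: dict[str, float],
    semantic: dict[str, float],
    lexical_weight: float = 0.5,  # assumed default, not Chunky's setting
) -> list[tuple[str, float]]:
    """Blend two higher-is-better score maps into one ranked list."""
    lex, sem = min_max(lexical), min_max(semantic)
    combined = {
        node_id: lexical_weight * lex.get(node_id, 0.0)
        + (1.0 - lexical_weight) * sem.get(node_id, 0.0)
        for node_id in set(lex) | set(sem)
    }
    # Sort by score descending, then by id so ties are deterministic.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


lexical_hits = {"a": 3.0, "b": 1.0}    # e.g. negated bm25() values
semantic_hits = {"b": 0.9, "c": 0.5}   # e.g. cosine similarities
print(fuse(lexical_hits, semantic_hits, lexical_weight=0.2))
```

A node found by only one of the two retrievers still appears in the result; it simply contributes zero from the signal that missed it.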

MCP server, auto-configured

On every launch Chunky registers itself as an MCP server with Claude Desktop, Claude Desktop MSIX, and Claude Code CLI — writing to the right config path on each OS. Claude gets nine read-only tools for exploring your graph.

Inline document rendering

Open a PowerPoint or PDF and Chunky renders the extracted text and images together in the order they appeared. Every image keeps its OCR text attached so agents can reason about screenshots as first-class content.

Per-project chat

Each project has its own chat session pre-scoped to that project's assets. Ask questions and the model calls Chunky's MCP tools with the right projectId already in context.

Local by default

Data lives in your OS app-data directory. No sync, no telemetry, no analytics beacon. The only outbound network traffic is LLM API calls that you initiate (chat, OCR) and a one-time embedding-model download the first time you build from source.

Built for agent access

Chunky exposes its knowledge graph via the Model Context Protocol (MCP), an emerging standard that lets AI clients discover and call external tools. Every tool is read-only, so agents can explore freely without risk to your data.

Nine read-only tools

  • search_nodes — hybrid FTS + semantic search across the graph
  • get_node — read one node with byte paging for long bodies
  • get_nodes — bulk read up to 50 nodes in a single call
  • get_neighbors — walk the edge graph 1–2 hops out
  • list_assets_in_project — enumerate a project's assets
  • list_nodes_by_type — filter by node type, optionally by project
  • list_node_images — list images inside a node with OCR text
  • get_image — fetch image bytes plus OCR, inline for vision clients
  • summarise_artifacts — LLM-summarise a set of nodes
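To show why byte paging on get_node needs a little care on the client side, here is a minimal sketch that collects pages and decodes only once at the end. The `fetch_page(offset, limit)` callable and its parameter names are hypothetical stand-ins; the tool list above does not give get_node's actual argument names.

```python
from typing import Callable


def read_full_body(fetch_page: Callable[[int, int], bytes], page_size: int = 4096) -> str:
    """Collect a long body page by page, then decode the joined bytes."""
    chunks = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            break
        chunks.append(page)
        offset += len(page)
        if len(page) < page_size:
            break  # a short page means the body is exhausted
    # Decode after joining: a page boundary can fall inside a multi-byte
    # UTF-8 character, so decoding each page separately could fail.
    return b"".join(chunks).decode("utf-8")


# Stand-in for a real get_node call (hypothetical): slices an in-memory body.
BODY = ("Résumé of the Q3 plan. " * 3).encode("utf-8")


def fake_get_node_page(offset: int, limit: int) -> bytes:
    return BODY[offset:offset + limit]


# With a 7-byte page, the second "é" is split across the first two pages.
print(read_full_body(fake_get_node_page, page_size=7))
```

For short nodes the loop ends after a single call, so the same helper would cover both cases.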

Zero-config integration

Chunky writes the MCP entry into whichever of these it finds on launch:

  • Claude Desktop (Windows standard install)
  • Claude Desktop MSIX sandbox (Windows Store install)
  • Claude Desktop (macOS)
  • Claude Desktop (Linux)
  • Claude Code CLI (any OS with ~/.claude.json)

Tool names get pre-authorised in Claude Code's permissions.allow so the agent doesn't prompt for approval on every call.
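The registration step can be sketched as two small, idempotent merges into JSON config data. Treat the details as assumptions: the `mcpServers` key follows the Claude Desktop config layout, the `mcp__<server>__<tool>` rule format follows the naming Claude Code uses for MCP tools as far as I know, and the `chunky-mcp` command name is purely hypothetical. Chunky's actual entry is not shown in the text above.

```python
import copy


def add_mcp_entry(config: dict, name: str, command: str, args: list[str]) -> dict:
    """Return a copy of the config with one MCP server entry added or replaced."""
    updated = copy.deepcopy(config)
    updated.setdefault("mcpServers", {})[name] = {"command": command, "args": args}
    return updated


def allow_tools(settings: dict, server: str, tools: list[str]) -> dict:
    """Return a copy of the settings with each tool pre-authorised once."""
    updated = copy.deepcopy(settings)
    allow = updated.setdefault("permissions", {}).setdefault("allow", [])
    for tool in tools:
        rule = f"mcp__{server}__{tool}"  # assumed rule format
        if rule not in allow:
            allow.append(rule)
    return updated


existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
# "chunky-mcp" is a hypothetical command name used only for illustration.
config = add_mcp_entry(existing, "chunky", "chunky-mcp", [])
settings = allow_tools({}, "chunky", ["search_nodes", "get_node"])
print(sorted(config["mcpServers"]))
print(settings["permissions"]["allow"])
```

Working on copies and skipping rules that already exist keeps the merge safe to repeat on every launch, which matters if registration really does run each time the app starts.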

Example agent invocation

Ask Claude:

"What slides in my product-strategy project mention Azure? Give me the exact wording and which deck each came from."

Claude calls search_nodes({query: "Azure", types: ["slides"]}), gets back ranked slide hits with title + snippet + summary, and answers with citations — all without Chunky ever sending your files anywhere.
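On the wire, a call like this travels as a JSON-RPC 2.0 request using MCP's `tools/call` method. The sketch below only builds that request; the `tools_call` helper and its id counter are illustrative, and the argument names are copied from the example above rather than from a published schema.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC ids


def tools_call(name: str, arguments: dict) -> str:
    """Serialise an MCP tools/call request for the named tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


print(tools_call("search_nodes", {"query": "Azure", "types": ["slides"]}))
```

In practice the MCP client inside Claude builds and sends this message for you, so the sketch is mainly useful for debugging or for writing your own client.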

Example use cases

Product / research knowledge base

Drop a year of customer-research transcripts, competitor slide decks, spec docs, and Miro exports into one Chunky project. Ask Claude "what did customers say about pricing?" — hybrid search surfaces the exact quotes with source attribution.

Screenshot-heavy documentation

A shared drive full of Confluence exports and screenshots of legacy admin panels? OCR turns every image into indexed text. Search "click the export button in the admin UI" and Chunky finds the screenshots, not just prose that mentions them.

Sales enablement collateral

Import every case study, pitch deck, one-pager, and email template. Group them into collections by industry. Give Claude Code the MCP tool and let it draft account-specific proposals grounded in your actual collateral.

Legal / compliance discovery

Ingest thousands of PDFs and Outlook MSG files under NDA on an air-gapped laptop. Nothing ever leaves the machine unless you explicitly send a snippet to an LLM. Perfect for regulated environments where cloud RAG is a non-starter.

Personal second brain

Web bookmarks, PDFs from arXiv, meeting notes, photos of book margins — all under one roof, all searchable, all yours. Chunky replaces the Notion / Obsidian / bespoke-script stack most knowledge workers cobble together.

Codebase companion

Ingest a project's docs, RFCs, ADRs, and source. Then let Claude Code query it through MCP while pair-programming: "before I write this middleware, what did we decide about error envelopes in the auth service?"

Download Chunky

Free, MIT-licensed. Pick the installer for your OS. The current version is v0.1.0-preview2, a prerelease build that is not yet code-signed or notarized.

Windows

Windows 10 or 11, x64

Windows installer (.exe)

NSIS installer, per-user install (no admin needed). WebView2 runtime auto-installed if missing. ~74 MB.

macOS

macOS 10.15 Catalina or newer

Universal binary (.dmg)

On first launch, right-click the app and choose Open, because the build isn't notarized yet. ~95 MB (arch-specific) / ~108 MB (universal).

Linux

Ubuntu 22.04+, Fedora 40+, or equivalent

The AppImage needs no install (~169 MB). The .deb and .rpm packages pull in WebKit and GTK dependencies automatically (~91 MB).

Or build from source: detailed per-OS instructions on GitHub.