A paper-aware chat client where every cited sentence traces back to its source.
Multi-agent tool routing · in-repo RAG knowledge base · agentic per-paper retrieval · a Citation Canvas that links every [chunk] back to the exact passage in the paper · a conference-grade Beamer slide pipeline with decoupled, editable speaker notes.
PaperHub is built UX-first. Every retrieved chunk has a clickable provenance trail, every generation step writes an audit row, and every chat turn is reconstructible from SQLite alone. A single chat interface routes each turn to the right specialist agent: paper search, paper Q&A, NL-to-SQL library stats, memory curation, or slide generation.
- Agentic paper retrieval. A per-paper subagent navigates each paper by its section table of contents (not blind top-k); a flagship model synthesises across papers over the raw cited chunks.
- Citation Canvas. Inline `[chunk:N]` markers link back to the exact passage; click one to highlight it in both the rendered HTML and the source PDF. No ungrounded claims.
- Answers in your language. Ask in Chinese, get Chinese, with citations preserved. A remembered "always reply in X" preference overrides per-turn detection.
- Library stats in plain language. Ask "How many papers do I have?" and a `library_stats` agent runs read-only SQL over a table allowlist, self-repairs, and answers with the numbers and the SQL it ran.
- Session + global memory. Remembered facts and preferences persist per chat or everywhere, with a safety gate (refuses secrets), LLM conflict-supersede, and a Memory Manager panel to view, edit, and (de)activate entries.
- Visible routing + tracing. A badge shows which agent + model handled each turn; an expandable trace panel replays every model/MCP/pipeline step from SQLite.
- Discovery via web + Semantic Scholar. `paper_search` resolves even vague references ("that diffusion paper everyone cites") to a citable hit.
- Bring your own papers. Attach by arXiv ID, URL, or PDF upload; papers are deduplicated and cached, and a background Marker worker upgrades PDFs to real figures + captions + equations converted to LaTeX.
- Conference-grade slides. Generate a grounded Beamer deck that never cites a figure that doesn't exist. Speaker notes are an opt-in, any-language follow-up; diff-edit a single slide by chat ("make slide 3 more concise") instead of regenerating the whole deck.
- Math renders. LaTeX in answers (`$…$`, `$$…$$`) renders as real equations via KaTeX.
- Pick up on any device. Sessions and their full chat record live in the backend, not the browser, so you can open the app anywhere. Deleting a chat removes it everywhere (with Undo).
- MCP-native. The agent's own tools are served over MCP (`/mcp`); external clients (Claude Desktop, Cursor) reach the same surface.
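The Citation Canvas bullet above relies on inline `[chunk:N]` markers. As a minimal sketch of how such markers could be pulled out of an answer and checked against the retrieved set, assuming the marker grammar is exactly `[chunk:<integer>]` (the function names here are hypothetical, not PaperHub's API):

```python
import re

# Assumed marker grammar: "[chunk:" + decimal id + "]".
CHUNK_MARKER = re.compile(r"\[chunk:(\d+)\]")


def cited_chunk_ids(answer: str) -> list[int]:
    """Return cited chunk ids in first-appearance order, without duplicates."""
    seen: dict[int, None] = {}
    for match in CHUNK_MARKER.finditer(answer):
        seen.setdefault(int(match.group(1)), None)
    return list(seen)


def ungrounded_ids(answer: str, retrieved_ids: set[int]) -> list[int]:
    """Return cited ids that do not correspond to any retrieved chunk."""
    return [cid for cid in cited_chunk_ids(answer) if cid not in retrieved_ids]
```

A check like `ungrounded_ids(answer, retrieved)` returning an empty list is one way to express the "no ungrounded claims" property at the marker level; it says nothing about whether the cited passage actually supports the sentence.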
Grounded answers: every claim traces back to the source.
Conference-grade slides: decoupled, opt-in notes.
Library intelligence + memory.
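The `library_stats` agent is described as running read-only SQL over a table allowlist. The README does not say how that is enforced; one possible approach, sketched here under that assumption, is SQLite's authorizer callback. The table name `papers` and the helper names are hypothetical.

```python
import sqlite3

# Hypothetical allowlist; the real table names are not given in this README.
ALLOWED_TABLES = {"papers"}


def allowlist_authorizer(action, arg1, arg2, db_name, source):
    """Permit SELECTs, SQL function calls, and reads of allowlisted tables only."""
    if action in (sqlite3.SQLITE_SELECT, sqlite3.SQLITE_FUNCTION):
        return sqlite3.SQLITE_OK
    # For SQLITE_READ, arg1 is the table name and arg2 the column name.
    if action == sqlite3.SQLITE_READ and arg1 in ALLOWED_TABLES:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def install_guard(conn: sqlite3.Connection) -> None:
    """Attach the authorizer; later denied statements raise sqlite3.DatabaseError."""
    conn.set_authorizer(allowlist_authorizer)
```

Install the guard only after any schema setup, because it also denies `CREATE`, `INSERT`, `UPDATE`, and `DELETE` on that connection.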
Routing + observability.
| Routing badge | Trace panel (replayable DAG) |
|---|---|
| Every turn shows which agent + model handled it. | Each model/MCP/pipeline step is an audit row; the full DAG replays from SQLite. |
Discovery + bringing your own papers.
| Paper search cards | Reference Sources drawer |
|---|---|
| Discovery via web + Semantic Scholar; the agent auto-adds its best picks. | Session-scoped reference set with per-paper enable/remove. |
More: app overview and answering in your language
| The shell | Answers in your language |
|---|---|
| One chat shell; every turn routed to a specialist agent. | Ask in any language and the answer follows, citations preserved. |
| Area | Choice |
|---|---|
| Backend | Python 3.11 · FastAPI · LangGraph · LiteLLM · SQLite (aiosqlite) · Pydantic v2 |
| Frontend | TypeScript · React 19 · Vite · Tailwind · Zustand · react-markdown + KaTeX |
| RAG | Chroma · BAAI/bge-small-en-v1.5 embedder · ms-marco-MiniLM cross-encoder (hosted in a sibling model-server process) |
| Slides | Beamer + pdflatex (metropolis theme) · datalab-to/marker PDF ingestion as a docker-compose service (optional, GPU-aware) |
| LLM | Gemini by default (any LiteLLM provider; small-tier subagents, flagship finalizer) |
| Tooling | uv · pytest · ruff · mypy --strict · Vitest · ESLint · Conventional Commits |
Local-only, single-user. There is no auth surface: point it at your own LLM key and run it on your machine.
If you want to run PaperHub rather than develop it, the whole stack runs in containers, with no Python, Node, or LaTeX to install. You only need Docker and an LLM key. One docker compose up brings up all five services (backend, model-server, Marker PDF ingestion, web-search, and the web UI), so slides (incl. Chinese/CJK), RAG, and web discovery all work out of the box.
git clone https://github.com/whats2000/PaperHub.git
cd PaperHub
cp backend/.env.example backend/.env # then fill in GEMINI_API_KEY (or your provider's key)
docker compose up -d --build # CPU; first build downloads TeX Live + torch (a few GB, once)

Open http://localhost:8080.
GPU (optional, NVIDIA + Container Toolkit): faster Marker ingestion + local embedding/rerank. Layer the GPU override:
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d --build
Data persists in named volumes (paperhub-workspace = DB + caches, model weights, Marker weights). docker compose down stops it; add -v to wipe the data too.
Prerequisites: Python 3.11 + uv, Node 18+, and an LLM API key (Gemini by default). Slide generation additionally needs a LaTeX distribution on PATH (pdflatex; e.g. winget install MiKTeX.MiKTeX); without it, only the slides intent is affected (it returns an "install a LaTeX distribution" message). PDF figure/equation extraction can optionally use the Dockerized marker service (docker compose up -d marker).
git clone https://github.com/whats2000/PaperHub.git
cd PaperHub
# Install both halves
cd backend && uv sync # Python deps from uv.lock
cd ../frontend && npm install # JS deps from package-lock.json

Configure your LLM key:
cd backend
cp .env.example .env # then fill in GEMINI_API_KEY (or your provider's key)

Recommended (Windows, one command): scripts/start.ps1 orchestrates all the sibling processes. It brings up the external MCP daemons (open-websearch) via paperhub-mcp-up, then the model-server, then the backend with hot-reload:
# Terminal 1: backend stack (MCP daemons + model-server + FastAPI on :8000)
cd backend
.\scripts\start.ps1

# Terminal 2: frontend (Vite + React, hot-reload, :5173)
cd frontend
npm run dev

Open http://localhost:5173 and start chatting.
Lower-level: run uvicorn directly
The model-server auto-spawns on first backend boot, so the minimum is:
cd backend
uv run uvicorn paperhub.app:app --reload --reload-dir src --port 8000

Note: this path does not start the web-search daemon for you. On Windows, uvicorn --reload runs on a SelectorEventLoop, so the in-worker autostart falls back gracefully (papers-only). Bring web search up yourself with uv run paperhub-mcp-up (or use scripts/start.ps1, which does it). See the web-search note under Configuration.
No API key handy? Exercise the chat plumbing with mocked LLMs (PowerShell):
$env:PAPERHUB_ROUTER_MOCK = '{"intent":"chitchat","model_tier":"small","confidence":0.9,"reasoning":"dev"}'
$env:PAPERHUB_CHITCHAT_MOCK = "Hello from PaperHub!"
uv run uvicorn paperhub.app:app --reload --reload-dir src --port 8000
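The mocked router payload above has four fields. As a rough sketch of how such a payload could be parsed and sanity-checked before use, assuming the valid intents are the five shown in the routing diagram and that confidence lies in [0, 1] (both assumptions; `RouterDecision` and `parse_router_mock` are hypothetical names):

```python
import json
from dataclasses import dataclass

# Assumed intent set, taken from the routing diagram in this README.
KNOWN_INTENTS = {"chitchat", "paper_qa", "paper_search", "slides", "library_stats"}


@dataclass(frozen=True)
class RouterDecision:
    intent: str
    model_tier: str
    confidence: float
    reasoning: str


def parse_router_mock(raw: str) -> RouterDecision:
    """Parse a PAPERHUB_ROUTER_MOCK-style JSON string into a checked decision."""
    data = json.loads(raw)
    decision = RouterDecision(
        intent=str(data["intent"]),
        model_tier=str(data["model_tier"]),
        confidence=float(data["confidence"]),
        reasoning=str(data.get("reasoning", "")),
    )
    if decision.intent not in KNOWN_INTENTS:
        raise ValueError(f"unknown intent: {decision.intent!r}")
    # The [0, 1] range is an assumption, not something the README states.
    if not 0.0 <= decision.confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    return decision
```

Validating the mock up front makes a typo in the environment variable fail loudly at startup rather than surfacing later as an odd routing result.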
All settings live in backend/.env (grouped by function in .env.example). The ones you'll likely touch:
| Variable | Purpose | Default |
|---|---|---|
| `GEMINI_API_KEY` | LLM provider credential (or `OPENAI_API_KEY` / `ANTHROPIC_API_KEY`) | (none) |
| `PAPERHUB_PAPER_QA_MODEL` | Flagship finalizer (cross-paper synthesis) | `gemini/gemini-2.5-pro` |
| `PAPERHUB_PAPER_QA_SUBAGENT_MODEL` | Per-paper section navigator (lightweight) | `gemini/gemini-3.1-flash-lite` |
| `PAPERHUB_DEVICE` | Embedder/reranker device (`auto`/`cpu`/`cuda`/`mps`) | `auto` |
| `PAPERHUB_SEMANTIC_SCHOLAR_API_KEY` | Higher Semantic Scholar rate limit (optional) | (none) |
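To illustrate how the variables and defaults in the table fit together, here is a small sketch that reads them from an environment mapping. It is not PaperHub's actual settings loader; the `Settings` class and `load_settings` function are hypothetical, and only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DEVICES = {"auto", "cpu", "cuda", "mps"}


@dataclass(frozen=True)
class Settings:
    paper_qa_model: str
    subagent_model: str
    device: str
    semantic_scholar_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    device = env.get("PAPERHUB_DEVICE", "auto")
    if device not in VALID_DEVICES:
        raise ValueError(f"PAPERHUB_DEVICE must be one of {sorted(VALID_DEVICES)}")
    return Settings(
        paper_qa_model=env.get("PAPERHUB_PAPER_QA_MODEL", "gemini/gemini-2.5-pro"),
        subagent_model=env.get(
            "PAPERHUB_PAPER_QA_SUBAGENT_MODEL", "gemini/gemini-3.1-flash-lite"
        ),
        device=device,
        semantic_scholar_api_key=env.get("PAPERHUB_SEMANTIC_SCHOLAR_API_KEY"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise in tests.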
GPU (optional). torch installs CPU-only by default. For CUDA boxes: uv sync --extra cu124 / --extra cu126 / --extra cu130.
Web-search discovery (optional). paper_search / paper_suggest gain a no-key multi-engine discovery step when an open-websearch daemon is reachable on :3000. You don't install it by hand: scripts/start.ps1 (or uv run paperhub-mcp-up) reads mcp_servers.toml and launches every launch-declaring MCP server for you via npx -y, which fetches the package on first run (~25s, one-time):
cd backend
uv run paperhub-mcp-up # launches open-websearch on :3000 (skips if already up)

When it's up, the backend's MCP registry auto-exposes web.search / web.fetch. When it's down, the agent falls back to a papers-only flow; no config needed. Spawned daemons are detached so they survive backend --reload; explicit teardown is start.ps1's job (otherwise they clear at reboot). Requires Node 18+ on PATH. (The paperhub-papers MCP surface ships in-process at /mcp; no install required.)
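The reachable-or-fallback behaviour described above can be illustrated with a plain TCP probe. This is only a sketch: how the MCP registry actually detects the daemon is not stated here, and `daemon_reachable` and `discovery_mode` are hypothetical names.

```python
import socket


def daemon_reachable(host: str = "127.0.0.1", port: int = 3000, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def discovery_mode(host: str = "127.0.0.1", port: int = 3000) -> str:
    """Pick the flow: web discovery plus papers when the daemon is up, else papers-only."""
    return "web+papers" if daemon_reachable(host, port) else "papers-only"
```

A successful connect only shows that a listener exists on the port, not that it is the open-websearch daemon, so a real check would likely also speak the protocol.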
┌─────────────────┐     SSE      ┌───────────────────────────────────────────┐
│  React shell    │ ◄─────────── │  FastAPI · POST /chat                     │
│  - Composer     │              │  ┌─────────────────────────────────────┐  │
│  - Routing badge│              │  │  LangGraph turn                     │  │
│  - Trace panel  │              │  │  Router ─► chitchat | paper_qa |    │  │
│  - Citation     │              │  │            paper_search | slides |  │  │
│    Canvas       │              │  │            library_stats            │  │
└─────────────────┘              │  └─────────────────────────────────────┘  │
                                 │  │                                        │
                                 │  ▼  paper_qa: fan out one subagent        │
                                 │     per paper → section nav →             │
                                 │     flagship finalizer over raw chunks    │
                                 │ ┌─────────┐  ┌──────────┐  ┌────────────┐ │
                                 │ │ LiteLLM │  │  Chroma  │  │ SQLite     │ │
                                 │ │ adapter │  │  (RAG)   │  │ (audit +   │ │
                                 │ │         │  │          │  │  schema)   │ │
                                 │ └─────────┘  └──────────┘  └────────────┘ │
                                 │      ▲ embedder + reranker in a sibling   │
                                 │        model-server process (:8001)       │
                                 └───────────────────────────────────────────┘
Every model call, MCP call, and pipeline step writes a tool_calls row before returning, which is enough state to reconstruct the full agent context from SELECT * FROM tool_calls WHERE run_id = ? alone. Paper content is deduplicated: one paper_content row + one cache dir + one set of chunks/vectors per unique paper, regardless of how many sessions reference it.
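As a rough illustration of replaying a run from the audit table, the sketch below queries `tool_calls` by `run_id`. Only the table name and the `run_id` column come from the text above; the `id`, `kind`, and `name` columns and the `replay_run` helper are assumptions made for the example.

```python
import sqlite3


def replay_run(conn: sqlite3.Connection, run_id: int) -> list[str]:
    """Return one readable line per audit row of a run, in insertion order."""
    rows = conn.execute(
        "SELECT id, kind, name FROM tool_calls WHERE run_id = ? ORDER BY id",
        (run_id,),
    ).fetchall()
    return [f"#{row_id} {kind}: {name}" for row_id, kind, name in rows]


# Demo on a throwaway in-memory database with the assumed columns.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE tool_calls (id INTEGER PRIMARY KEY, run_id INTEGER, kind TEXT, name TEXT)"
)
demo.executemany(
    "INSERT INTO tool_calls (run_id, kind, name) VALUES (?, ?, ?)",
    [(1, "model", "router"), (1, "mcp", "web.search"), (2, "model", "router")],
)
for line in replay_run(demo, 1):
    print(line)
```

Ordering by the row id stands in for chronological order here; a real schema may carry explicit timestamps or parent links for the DAG.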
Full architecture lives in the SRS.
| Plan | Scope | State |
|---|---|---|
| A | Backend foundation + Router-only chat | complete |
| B | Frontend foundation (React shell, SSE, routing badge, trace panel) | complete |
| C | Paper Pipeline + Research Agent (ingest, RAG, paper_search, agentic paper_qa, MCP layer, model-server, PDF upload) | complete, merged (SRS v2.10) |
| D | Search results + Reference Sources + Citation Canvas (HTML + PDF passage highlighting) | complete, merged (SRS v2.13) |
| E | SQL Agent + library_stats (sqlite MCP) + session/global memory governance (gate, conflict-supersede, Memory Manager UI) | complete, merged (SRS v2.17) |
| F | Slide Pipeline + Report Agent: Marker ingestion (F2/F2.1), PhD-grade slide agent (F3), decoupled opt-in notes + diff-editing + length budget (F4), conference-grade metadata title page + title/style customization (F4.2) | complete, merged (SRS v2.22) |
| F5 | Slide presentation mode (fullscreen window + BroadcastChannel sync + presenter controls) + Q&A-during-talk + version-history UI | planned |
| G | Compare view + filesystem / paperhub.* MCP | planned |
Each plan ships working, testable software on its own. Plans live under docs/superpowers/plans/.
PaperHub is built spec → plan → TDD, with subagent-driven implementation and per-task spec-compliance + code-quality review.
Backend gates (from backend/):
uv run pytest # 831 tests, hermetic
uv run ruff check src tests
uv run mypy src # --strict

Frontend gates (from frontend/):
npm test # Vitest + RTL + MSW (309 tests)
npm run typecheck # tsc --strict
npm run lint # ESLint flat config
npm run build # Vite production build

Replay any past chat turn from SQLite (debugging the agent flow):
cd backend
uv run paperhub-replay --run-id 1

End-to-end benchmark: pytest proves the wiring; the backend/benchmark/ harness proves the behaviour. It drives the live backend as a simulated user (attach cached papers, then route prompts through /chat), collects grounding evidence (cited chunk text + agent trace), and scores each case 0/1 on correctness + grounding, either by hand or via an LLM-as-Judge (fixed temperature, strict grounding). Cases are config-driven (TOML), so you can write your own:
# with the backend running (scripts/start.ps1), from backend/:
scripts/run-benchmark.ps1 -Judge # 20-case eval (16 paper_qa + 4 slides) + LLM judge
scripts/run-benchmark.ps1 -Resume <prior.json> # retry only failed cases after a drop

Contributing AI agents: read CLAUDE.md first. It carries the conventions, the fix-now policy, and the agent-flow observability rules.
.
├── backend/
│   ├── src/paperhub/        # FastAPI app · agents · pipelines · rag · mcp · modelserver · tracer
│   ├── tests/               # pytest suite (831 tests, hermetic)
│   ├── benchmark/           # config-driven real-API e2e benchmark + LLM-as-Judge
│   └── pyproject.toml       # uv project · mypy --strict · ruff
├── frontend/                # React 19 + Vite + Tailwind + Zustand
├── docs/superpowers/
│   ├── specs/               # SRS: authoritative architecture document
│   └── plans/               # implementation plans, one per sub-project
├── reference/               # copied source from paper2slides-plus + Intro2GenAI-hw1
├── CLAUDE.md                # AI-agent orientation for this repo
└── README.md
workspace/ (gitignored) holds runtime state: the SQLite database, the papers cache, and the Chroma index.
- System Requirements Specification: authoritative architecture, schema, scope, and acceptance criteria (currently v2.23.2).
- Implementation plans: one per sub-project, each executed via TDD.
- Backend developer docs: backend-specific notes.
Apache License 2.0. © PaperHub contributors. You may use, modify, and distribute this software under the terms of the license, which includes an express grant of patent rights from contributors.











