
Data Sources Dashboard & Admin Controls

Operational visibility into ingestion status, per-source stages, reprocess controls, and live logs so content teams can keep the knowledge base accurate without engineering help.

What Is This Feature?

The Data Sources Dashboard is the control room for your AI assistant's knowledge. It's where administrators can see the status of every connected content source, understand what's been successfully ingested, diagnose problems, and trigger updates — all without needing to involve an engineer. This deep dive explains what the dashboard provides and why it's essential for keeping your assistant accurate and reliable at scale.


Why It Matters to Your Business

An AI assistant is only as trustworthy as the content it's working from. If a document ingestion silently fails — say, a PDF didn't process correctly — your assistant might give outdated answers or miss important information entirely. Without visibility into what's happening, you won't know until a customer complains.

  • Proactive problem detection. Instead of finding out about ingestion failures through customer complaints, your team sees them immediately in the dashboard.
  • Faster recovery. When something does go wrong, you can identify exactly which step failed and re-run only that step — not start the whole process from scratch. This is the difference between a 5-minute fix and an hour-long reprocess.
  • Operational independence. Content teams and operators can manage the knowledge base without filing engineering tickets for every update or fix.
  • Auditability. Every reprocess action is logged — who triggered it, when, and what the outcome was. This matters for compliance and for understanding patterns in failures.

How It Works (No Technical Jargon)

Every content source (a document, a web page, a connected knowledge base) goes through several processing stages before the assistant can use it. The dashboard gives you full visibility into each stage.

What You Can See Per Source
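For each connected source, the dashboard surfaces the status of every processing stage — fetch, extract, chunk, and embed — so you can see at a glance where a source is healthy and where it stalled. Conceptually, that per-source view looks something like the sketch below (the class, enum, and field names are illustrative, not the product's actual schema):

```python
from dataclasses import dataclass, field
from enum import Enum

class Stage(str, Enum):
    """The processing stages a source passes through, in order."""
    FETCH = "fetch"
    EXTRACT = "extract"
    CHUNK = "chunk"
    EMBED = "embed"

class StageStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"

@dataclass
class SourceStatus:
    """One dashboard row: a source and the state of each of its stages."""
    source_id: str
    stages: dict = field(
        default_factory=lambda: {s: StageStatus.PENDING for s in Stage}
    )

    def first_failure(self):
        """Return the earliest failed stage, or None if nothing failed."""
        for s in Stage:
            if self.stages[s] is StageStatus.FAILED:
                return s
        return None

doc = SourceStatus("doc-42")
doc.stages[Stage.FETCH] = StageStatus.SUCCEEDED
doc.stages[Stage.EXTRACT] = StageStatus.FAILED
print(doc.first_failure())  # -> Stage.EXTRACT
```

Knowing the earliest failed stage is what makes targeted recovery possible: it tells you exactly where to restart.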

Triggering a Reprocess

When a source needs an update or a fix, administrators don't have to start from scratch — they can choose exactly how much of the pipeline to re-run:

  • Re-fetch only — Downloads the latest version of the source (when content changed at the source)
  • Re-extract — Re-parses the downloaded content (when the extraction logic was updated)
  • Re-chunk — Re-splits the content into smaller pieces (when chunking strategy changed)
  • Re-embed — Regenerates the AI representations (when embedding model was upgraded)
  • Full reprocess — Starts from scratch (when something fundamental changed)

This granularity saves significant time and cost — you only redo the work that actually needs to be redone.
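One reasonable way to model this granularity — an assumption about how such a pipeline typically behaves, not a statement of this product's internals — is that re-running an early stage invalidates everything downstream of it, since later stages consume its output:

```python
# Pipeline stages in processing order. Re-running one stage means every
# later stage must also run, because its inputs have changed.
PIPELINE = ["fetch", "extract", "chunk", "embed"]

def stages_to_run(start_stage: str) -> list[str]:
    """Given the stage an operator re-triggers, return every stage that
    must execute, in pipeline order."""
    if start_stage not in PIPELINE:
        raise ValueError(f"unknown stage: {start_stage}")
    return PIPELINE[PIPELINE.index(start_stage):]

print(stages_to_run("chunk"))  # ['chunk', 'embed']
print(stages_to_run("fetch"))  # full reprocess: all four stages
```

Under this model, "Re-embed" is the cheapest option (one stage) and "Re-fetch" is equivalent to a full reprocess — which matches the intuition that you only pay for the work downstream of what actually changed.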

Live Progress Tracking

While a reprocess job runs, the dashboard tracks its progress stage by stage and surfaces live logs, so operators can watch a job complete (or fail) in real time instead of guessing.

Safeguards and Limits

To prevent accidental overload and keep a clear audit trail:

  • Only organization administrators can trigger reprocess jobs
  • There's a limit on how many concurrent reprocess jobs can run per organization
  • All actions are logged with the operator's identity and timestamp
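These three safeguards could be enforced with a small guard at the entry point of every reprocess request. The sketch below is illustrative only — the role name, the concurrency limit, and the audit-record shape are all assumptions, not the product's actual values:

```python
import datetime

MAX_CONCURRENT_JOBS = 3  # assumed per-organization limit
audit_log = []           # in a real system this would be a durable store

def trigger_reprocess(org, operator_role, operator_id, running_jobs, stage):
    """Apply the safeguards, then record the action and start the job."""
    # Safeguard 1: only organization administrators may reprocess.
    if operator_role != "admin":
        raise PermissionError("only organization administrators may reprocess")
    # Safeguard 2: cap concurrent reprocess jobs per organization.
    if running_jobs >= MAX_CONCURRENT_JOBS:
        raise RuntimeError("concurrent reprocess limit reached for this org")
    # Safeguard 3: log who did what, and when.
    audit_log.append({
        "org": org,
        "operator": operator_id,
        "stage": stage,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return "job-started"
```

Centralizing the checks like this means a denied request never reaches the pipeline, and every allowed request leaves an audit record before any work begins.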


What to Expect on the Roadmap

The team is building:

1. Reprocess API with per-stage control, plus live log streaming in the dashboard (estimated 4 weeks)
2. Job status tracking with email or webhook notifications on completion or failure
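Once completion and failure notifications ship, a webhook payload for a finished job might look something like the following sketch. Every field name here is purely illustrative — the actual notification format has not been published:

```python
import json

def completion_payload(job_id, source_id, stage, outcome, error=None):
    """Build a hypothetical notification payload for a finished job."""
    payload = {
        "job_id": job_id,
        "source_id": source_id,
        "stage": stage,
        "outcome": outcome,  # e.g. "succeeded" or "failed"
    }
    if error is not None:
        payload["error"] = error  # only present on failure
    return json.dumps(payload)

print(completion_payload("job-7", "doc-42", "re-embed", "succeeded"))
```

A receiver would typically key on `outcome` to decide whether to alert an operator or simply mark the source as up to date.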

Once live, your operations team will have everything they need to keep the knowledge base healthy and up to date, without relying on engineering support for routine maintenance.