Kartik Patel

Building DocOps Copilot: Notes from the First Slice

I am building a small side project called DocOps Copilot to think through what document AI looks like when it stops being chat-with-PDF and starts being workflow.

Most document AI demos look like this: upload a PDF, ask a question, get an answer. That is useful, but it does not match how operations teams actually work. A vendor packet review is not one question against one document. It is a packet of mixed files - invoice, PO, packing slip, contract, payment ledger - where "is this ready to approve?" depends on cross-document consistency, missing items, and exceptions that need a human to look at.

So the question I want to answer with this project is: can RAG, plus structured extraction, plus spreadsheet querying, with citations everywhere, turn a messy packet into a reviewable work item?

What I have learned in the first few weeks

Embeddings are the wrong tool for spreadsheets. If someone asks "which invoice has the largest unpaid balance," that is a SQL query, not a similarity search. I am using DuckDB to import CSV/XLSX files locally, having the LLM generate read-only SQL, and validating the query before running it. The user sees the SQL, the result, and the explanation, which is more reviewable than a generated answer.
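To make the "validate before running" step concrete, here is a minimal sketch of a read-only SQL guard. This is an illustrative stand-in, not the project's actual validator: the function name, the keyword blocklist, and the single-statement rule are all assumptions about what such a check might look like before the query is handed to DuckDB.

```python
import re

# Hypothetical blocklist of statement types that should never reach the
# engine from LLM-generated SQL. Illustrative, not exhaustive.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|attach|copy|pragma)\b",
    re.IGNORECASE,
)

def validate_readonly_sql(sql: str) -> bool:
    """Return True if `sql` looks like a single read-only SELECT/WITH query."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        return False
    return not FORBIDDEN.search(stripped)

print(validate_readonly_sql(
    "SELECT vendor, SUM(amount - paid) AS balance "
    "FROM invoices GROUP BY vendor ORDER BY balance DESC LIMIT 1"
))  # True
print(validate_readonly_sql("SELECT 1; DROP TABLE invoices"))  # False
```

A keyword check like this is a coarse first gate; a production guard would more likely parse the statement or open the DuckDB connection in read-only mode so the engine itself enforces the constraint.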

Citations are the product. The difference between a useful document assistant and an unsafe one is whether every claim points back to a document, page, and snippet. Without that, nobody operationally responsible will trust the output. With it, the workflow changes from "do I trust the AI?" to "let me check the evidence," which is a workflow people already do.
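The document-page-snippet triple can be made explicit as a data shape. The field names below are my assumptions for illustration, not the project's actual schema; the point is that a claim without citations is detectable, so ungrounded output can be flagged rather than shown as fact.

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    # Hypothetical shape: every cited claim points at a document,
    # a page, and the exact snippet the claim rests on.
    doc_id: str
    page: int
    snippet: str

@dataclass
class Claim:
    text: str
    citations: list[Citation] = field(default_factory=list)

    def is_grounded(self) -> bool:
        # A claim with no evidence should be surfaced for review,
        # not presented as an answer.
        return len(self.citations) > 0
```

With a shape like this, "let me check the evidence" becomes a click from the claim to the cited snippet rather than a manual search through the packet.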

The interesting work is downstream of retrieval. Retrieval is solved enough. What turns retrieval into a product is classification, extraction, reconciliation, and the review UI around it. RAG is plumbing. The product is the workflow.
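As one example of that downstream work, reconciliation across a packet can be sketched as a three-way match: invoice against PO against the payment ledger, with any discrepancy becoming an exception for a human reviewer. The function and field names are hypothetical, a sketch of the idea rather than the project's implementation.

```python
def reconcile(invoice_total: float, po_total: float, paid: float,
              tolerance: float = 0.01) -> list[str]:
    """Hypothetical three-way match: return exceptions for human review.

    An empty list means the packet passed this check; anything else
    becomes a line item in the review UI.
    """
    exceptions = []
    if abs(invoice_total - po_total) > tolerance:
        exceptions.append("invoice/PO mismatch")
    if paid < invoice_total - tolerance:
        exceptions.append("unpaid balance")
    return exceptions

print(reconcile(4200.00, 4200.00, 4200.00))  # []
print(reconcile(4200.00, 4000.00, 1500.00))
# ['invoice/PO mismatch', 'unpaid balance']
```

The rules themselves are trivial; the product work is wiring each exception back to the cited evidence that produced the numbers.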

That is the thesis. I will keep posting build notes as the slices ship.