Proof point → Undocumented data is unusable data. I built the reflex into a tool: every CSV gets a README, automatically, identically.
Problem
Folders of CSVs accumulate everywhere - exports, datasets, reports - and nobody remembers what’s in them. The cost is paid later, by whoever has to open each file to find out.
Solution
csvhero - scan a folder, and every *.csv gets a sibling <name>.readme.md describing its shape.
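The naming rule can be sketched in a few lines of Python. This is a minimal illustration, not csvhero's actual code: the helper names readme_path_for and find_csvs are hypothetical, and the sketch assumes the README sits next to its CSV with the .csv suffix swapped for .readme.md.

```python
from pathlib import Path


def readme_path_for(csv_path: Path) -> Path:
    """Map a CSV path to its sibling README, e.g. data/sales.csv -> data/sales.readme.md."""
    # Hypothetical helper: keeps the parent folder, swaps the .csv suffix.
    return csv_path.with_name(f"{csv_path.stem}.readme.md")


def find_csvs(folder: Path, recursive: bool = False) -> list[Path]:
    """List the CSV files in a folder, sorted so repeated scans visit them in the same order."""
    # Assumption: recursion is opt-in, mirroring a --recursive style flag.
    pattern = "**/*.csv" if recursive else "*.csv"
    return sorted(p for p in folder.glob(pattern) if p.is_file())
```

Sorting the matches is a small design choice that keeps repeated scans of the same folder in a stable order, which fits the "identically" promise above.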
Architecture Overview
- Typer + Rich CLI - a scan command with progress bars, --recursive, --overwrite, and --encoding flags.
- Careful semantics - row counts exclude headers; the CSV dialect is sniffed with an Excel fallback; the output format is strictly uniform so generated READMEs are diffable - a machine-readable guarantee, documented.
- Real packaging at micro scale - clean src/ layout (cli, core, templates), pyproject with a console-script entry point, ruff linting config.
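The "careful semantics" bullet can be illustrated with the standard library alone. The sketch below is an assumption-laden example, not csvhero's implementation: the function names (sniff_dialect, describe_csv, render_readme), the 4096-character sample size, and the README layout are all hypothetical. It shows dialect sniffing with an Excel fallback, a row count that excludes the header, and a fixed output template.

```python
import csv
from pathlib import Path


def sniff_dialect(sample: str):
    """Guess the CSV dialect from a text sample; fall back to Excel conventions."""
    try:
        return csv.Sniffer().sniff(sample)
    except csv.Error:
        # The sniffer could not decide, so use the comma-separated Excel dialect.
        return csv.excel


def describe_csv(path: Path, encoding: str = "utf-8") -> dict:
    """Collect the header and the number of data rows (header excluded)."""
    with path.open(newline="", encoding=encoding) as handle:
        sample = handle.read(4096)  # assumed sample size for sniffing
        handle.seek(0)
        reader = csv.reader(handle, sniff_dialect(sample))
        header = next(reader, [])  # an empty file yields no header
        row_count = sum(1 for _ in reader)  # counts only rows after the header
    return {"file": path.name, "columns": header, "rows": row_count}


def render_readme(info: dict) -> str:
    """Render a fixed-layout README (hypothetical template, always the same shape)."""
    lines = [
        f"# {info['file']}",
        "",
        f"- Rows: {info['rows']}",
        f"- Columns: {len(info['columns'])}",
        "",
        "## Columns",
        "",
    ]
    lines += [f"- {name}" for name in info["columns"]]
    return "\n".join(lines) + "\n"
```

Because render_readme depends only on the collected facts and never on timestamps or run order, scanning an unchanged file twice yields byte-identical text, which is what makes the generated READMEs diffable.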
Philosophy
Small scope, high finish. A 6.5 KB tool still deserves deterministic output guarantees, good ergonomics, and proper packaging - because tools are judged by their seams.
Key learnings
- The modern Python CLI stack and pyproject-era packaging.
- Designing for machine consumers: uniform, diffable output as a feature.
Output & impact
- A working data-documentation reflex, packaged - part of a broader pattern of published tooling (rms-scan on PyPI).
Why this matters
Original Tools & IP · Platforms & Registries. Every organization drowns in spreadsheets - applications, reports, rosters. The habit of making data self-describing is the difference between a dataset that decays and one that audits cleanly years later.
Hire me
Buried in undocumented data? Let’s talk → · See also: Kuda Statement Analyzer · The thesis