Proof point → Undocumented data is unusable data. I built the reflex into a tool: every CSV gets a README, automatically, identically.


🩻 Problem

Folders of CSVs accumulate everywhere - exports, datasets, reports - and nobody remembers what’s in them. The cost is paid later, by whoever has to open each file to find out.

🔨 Solution

csvhero - scan a folder, and every *.csv gets a sibling <name>.readme.md describing its shape.
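The folder scan can be sketched with pathlib alone. This is a minimal illustration, not csvhero's actual code. `find_targets` and `readme_path_for` are hypothetical names. `<name>` is assumed to mean the file stem, so `sales.csv` maps to `sales.readme.md`. The `recursive` and `overwrite` parameters stand in for the CLI's `--recursive` and `--overwrite` flags, on the assumption that an existing README is skipped unless overwriting is requested.

```python
from pathlib import Path


def readme_path_for(csv_path: Path) -> Path:
    """Sibling README path, e.g. data/sales.csv -> data/sales.readme.md."""
    # Assumption: <name> is the file stem (filename without the .csv suffix).
    return csv_path.with_name(f"{csv_path.stem}.readme.md")


def find_targets(folder: Path, recursive: bool = False,
                 overwrite: bool = False) -> list[tuple[Path, Path]]:
    """Pair every *.csv in `folder` with the README it should receive.

    Hypothetical helper: `recursive` and `overwrite` mirror the CLI flags.
    """
    candidates = folder.rglob("*.csv") if recursive else folder.glob("*.csv")
    targets = []
    for csv_path in sorted(candidates):  # sorted so runs process files in a stable order
        if not csv_path.is_file():
            continue  # skip directories that happen to match *.csv
        readme = readme_path_for(csv_path)
        if readme.exists() and not overwrite:
            continue  # assumption: existing READMEs are left alone without --overwrite
        targets.append((csv_path, readme))
    return targets
```

Sorting the matches keeps the processing order (and any progress output) identical from run to run. This fits the tool's emphasis on deterministic behavior.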

Architecture Overview

  1. Typer + Rich CLI - a scan command with progress bars, --recursive, --overwrite, and --encoding flags.
  2. Careful semantics - row counts exclude the header row; the CSV dialect is sniffed, with Excel's dialect as the fallback; and the output format is strictly uniform, so generated READMEs are diffable. That uniformity is a documented, machine-readable guarantee.
  3. Real packaging at micro scale - a clean src/ layout (cli, core, templates), a pyproject.toml with a console-script entry point, and a ruff linting config.
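The row-count and dialect rules in item 2 can be sketched with the standard library's `csv` module. This is an assumed implementation, not csvhero's source. `sniff_dialect` and `profile` are hypothetical names, the 64 KB sniffing sample is an arbitrary choice, and treating blank lines as non-rows is an assumption. `csv.Sniffer.sniff` raises `csv.Error` when it cannot settle on a consistent delimiter, as with a single-column file, and that is where the Excel dialect (`csv.excel`) takes over.

```python
import csv
from typing import TextIO


def sniff_dialect(sample: str) -> type[csv.Dialect]:
    """Guess the CSV dialect from a text sample, falling back to Excel's dialect."""
    try:
        return csv.Sniffer().sniff(sample)
    except csv.Error:
        # sniff() raises csv.Error when no consistent delimiter is found,
        # e.g. for single-column files or empty input.
        return csv.excel


def profile(stream: TextIO, sample_size: int = 64 * 1024) -> tuple[list[str], int]:
    """Return (header, data_row_count) for an open text stream.

    Hypothetical helper; sample_size is an arbitrary choice for this sketch.
    """
    sample = stream.read(sample_size)
    stream.seek(0)  # rewind so the reader sees the whole file, not just the rest
    reader = csv.reader(stream, sniff_dialect(sample))
    header = next(reader, [])  # consume the header so it is never counted
    # Assumption: blank lines (parsed as empty lists) are not data rows.
    data_rows = sum(1 for row in reader if row)
    return header, data_rows
```

For a file on disk, the stream would come from `open(path, newline="", encoding=...)`, as the `csv` documentation recommends, with the encoding taken from whatever `--encoding` selects.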

📜 Philosophy

Small scope, high finish. A 6.5 KB tool still deserves deterministic output guarantees, good ergonomics, and proper packaging - because tools are judged by their seams.

🎓 Key learnings

  • The modern Python CLI stack and pyproject-era packaging.
  • Designing for machine consumers: uniform, diffable output as a feature.
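Uniform, diffable output mostly comes down to keeping nondeterminism out of the renderer. The sketch below uses a README layout of its own invention. `render_readme`, the section names, and the field labels are hypothetical, not csvhero's documented format. The layout has a fixed section order, no timestamps, and exactly one trailing newline, so the same CSV always renders to identical text.

```python
def render_readme(filename: str, delimiter: str, columns: list[str], row_count: int) -> str:
    """Render a README for one CSV. Hypothetical template, not csvhero's format."""
    lines = [
        f"# {filename}",
        "",
        f"- Rows (excluding header): {row_count}",
        f"- Columns: {len(columns)}",
        f"- Delimiter: {delimiter!r}",
        "",
        "## Columns",
        "",
    ]
    # Columns keep their file order; nothing is sorted or deduplicated.
    lines += [f"{i}. {name}" for i, name in enumerate(columns, start=1)]
    # No timestamps or absolute paths, "\n" separators, and exactly one
    # trailing newline: identical input always yields identical output.
    return "\n".join(lines).rstrip("\n") + "\n"
```

Anything that changes between runs, such as a generation date or an absolute path, would turn every regeneration into a spurious diff. That is why the template leaves such values out.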

📈 Output & impact

  • A working data-documentation reflex, packaged - part of a broader pattern of published tooling (rms-scan on PyPI).

🌍 Why this matters

Original Tools & IP · Platforms & Registries. Every organization drowns in spreadsheets - applications, reports, rosters. The habit of making data self-describing is the difference between a dataset that decays and one that audits cleanly years later.


🚀 Hire me

Buried in undocumented data? Let’s talk → · See also: Kuda Statement Analyzer · The thesis