Follow Bank Statement Parser development. Subscribe via RSS or watch the GitHub repository for release notifications.
v0.0.8 — 2026-04-11 (Latest) — "Full Platform"
Closes competitive gaps with a complete platform for bank statement processing.
- Added multi-currency balance verification: `verify_balance_multi_currency()` groups by currency and runs the Golden Rule per group.
- Added hledger + beancount export: `to_hledger()` and `to_beancount()` in `bankstatementparser.export`.
- Added bulk directory scanner: `scan_and_ingest()` scans folder trees and deduplicates across the batch.
- Added account mapping rules: `AccountMapper` with ordered regex rules from JSON config.
- Added REST API: FastAPI wrapper with `/ingest` and `/health` endpoints (`[api]` extra).
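
The multi-currency check in this release groups transactions by currency and applies the Golden Rule (`opening + credits − debits == closing`) to each group. A minimal sketch of that idea follows. It is not the library's implementation: the function name `verify_by_currency`, the argument shapes, and the use of signed amounts (credits positive, debits negative) are assumptions made for illustration.

```python
from collections import defaultdict
from decimal import Decimal


def verify_by_currency(opening, closing, transactions):
    """Check opening + net movement == closing separately for each currency.

    opening, closing: dict mapping currency code -> Decimal balance (assumed shape).
    transactions: iterable of (currency, signed Decimal amount) pairs,
                  where credits are positive and debits are negative.
    Returns a dict mapping currency code -> bool.
    """
    net = defaultdict(Decimal)  # Decimal() is zero, so missing currencies start at 0
    for currency, amount in transactions:
        net[currency] += amount

    # Cover every currency that appears in any of the three inputs.
    currencies = set(opening) | set(closing) | set(net)
    zero = Decimal("0")
    return {
        cur: opening.get(cur, zero) + net[cur] == closing.get(cur, zero)
        for cur in currencies
    }
```

Using `Decimal` rather than `float` keeps the equality test exact, which matters when a one-cent mismatch is the thing being detected.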
v0.0.7 — 2026-04-08 — "Universal Vision"
- Added direct Ollama bridge (`ollama_direct_completion`), which bypasses the LiteLLM long-prompt hang.
- Added strip mode (`VisionExtractor.strip_rows=True`), which splits dense pages into overlapping bands for small local models.
- Changed recommended vision model from `llava` to `minicpm-v` (better for OCR/document tasks).
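
Strip mode splits a dense page into overlapping horizontal bands. The sketch below shows one way such band boundaries could be computed. It is an illustration only: `band_ranges`, its parameters, and the pixel values are hypothetical and do not describe how `VisionExtractor` actually slices pages.

```python
def band_ranges(page_height, band_height, overlap):
    """Return (top, bottom) pixel ranges that cover the page with overlapping bands.

    Each band starts `band_height - overlap` pixels below the previous one,
    so a row cut in half by one band boundary appears whole in a neighbour.
    """
    if page_height <= 0 or band_height <= 0:
        raise ValueError("page_height and band_height must be positive")
    if not 0 <= overlap < band_height:
        raise ValueError("overlap must be >= 0 and smaller than band_height")

    step = band_height - overlap
    ranges = []
    top = 0
    while True:
        bottom = min(top + band_height, page_height)  # clamp the last band to the page
        ranges.append((top, bottom))
        if bottom >= page_height:
            break
        top += step
    return ranges
```

For a 1000-pixel page with 400-pixel bands and a 100-pixel overlap, this yields three bands starting at 0, 300, and 600.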
v0.0.6 — 2026-04-08 — "Intelligence Layer"
- Breaking: Dropped Python 3.9 support (now 3.10–3.14). Python 3.9 reached EOL 2025-10-31.
- Added enrichment module (`Categorizer`, `EnrichedTransaction`, `DEFAULT_CATEGORY_SCHEMA`) for LLM-powered transaction categorisation.
- Added interactive review mode with `--type review` CLI command.
- Added per-row bounding box extraction (`Transaction.source_bbox`) for downstream review UIs.
- Expanded test suite to 718 tests with 100% branch coverage.
v0.0.5 — 2026-04-08 — "Universal Extraction"
- Added hybrid PDF pipeline (`smart_ingest()`) with deterministic/text-LLM/vision-LLM routing.
- Added `LLMExtractor` for digital PDFs via LiteLLM.
- Added `VisionExtractor` for scanned PDFs via multimodal vision models.
- Added Golden Rule balance verification (`opening + credits − debits == closing`).
- Added idempotent deduplication via `transaction_hash` (MD5 fingerprint).
- Added install extras: `[hybrid]`, `[hybrid-plus]`, `[hybrid-vision]`.
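
Idempotent deduplication with an MD5 fingerprint can be sketched as follows. This is an assumption-laden illustration, not the library's `transaction_hash`: the fields hashed (date, amount, description), the normalisation, and the helper names `transaction_fingerprint` and `ingest` are all hypothetical.

```python
import hashlib


def transaction_fingerprint(date, amount, description):
    """Build a stable MD5 fingerprint from fields assumed to identify a transaction."""
    # Collapse whitespace and lowercase so cosmetic differences do not change the hash.
    normalised = " ".join(description.split()).lower()
    key = "|".join([date, amount, normalised])
    # MD5 is used here as a fingerprint, not for security.
    return hashlib.md5(key.encode("utf-8"), usedforsecurity=False).hexdigest()


def ingest(rows, seen):
    """Return only rows whose fingerprint is not already in `seen`, updating `seen`."""
    new_rows = []
    for row in rows:
        fingerprint = transaction_fingerprint(*row)
        if fingerprint not in seen:
            seen.add(fingerprint)
            new_rows.append(row)
    return new_rows
```

Re-ingesting the same batch against the same `seen` set adds nothing, which is what makes the operation idempotent.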
v0.0.4 — 2026-03-15
- Added parallel file parsing with `parse_files_parallel()` using ProcessPoolExecutor.
- Added true streaming for large PAIN.001 files (50 MB+) with bounded memory.
- Performance optimisations: CAMT throughput now exceeds 27,000 tx/s, PAIN.001 exceeds 52,000 tx/s.
- Added `Deduplicator` class for detecting exact duplicates and suspected matches with confidence scores.
- Added `from_string()` and `from_bytes()` methods for in-memory parsing without disk I/O.
- Added `iter_secure_xml_entries()` for secure ZIP archive processing.
- Extended CI with performance threshold enforcement.
v0.0.3 — 2025-11-20
- Added CSV, OFX, QFX, and MT940 parser support.
- Added format auto-detection with `detect_statement_format()` and `create_parser()`.
- Added PII redaction (on by default in CLI and streaming mode).
- Added export helpers for CSV, JSON, and Excel.
- Added optional Polars DataFrame support.
- Expanded test suite to 467 tests with 100% branch coverage.
v0.0.2 — 2025-06-10
- Added PAIN.001 parser (`Pain001Parser`) for ISO 20022 credit transfer initiation files.
- Added CLI interface (`python -m bankstatementparser.cli`).
- Added streaming mode with `parse_streaming()`.
- Added input validation and file size limits.
v0.0.1 — 2025-01-15
- Initial release.
- CAMT.053 parser (`CamtParser`) for ISO 20022 bank-to-customer statements.
- pandas DataFrame output.
- Basic XML security hardening (XXE protection, no_network).
View the full commit history on GitHub.