Bank Statement Parser is an open-source Python library that parses bank statements from seven formats (CAMT.053, PAIN.001, CSV, OFX, QFX, MT940, and PDF) into structured pandas DataFrames. All processing runs locally, with deterministic output, automatic PII redaction, and an optional hybrid PDF pipeline that routes through local LLMs when needed.
Get Started in Seconds
```shell
pip install bankstatementparser
```

```python
from bankstatementparser import create_parser, detect_statement_format

fmt = detect_statement_format("statement.xml")
parser = create_parser("statement.xml", fmt)
df = parser.parse()  # pandas DataFrame, ready to use

# Parse PDFs with the hybrid pipeline (v0.0.5+)
from bankstatementparser.hybrid import smart_ingest

result = smart_ingest("statement.pdf")
print(result.source_method)  # "deterministic" | "llm" | "vision"
print(result.verification.status)  # VERIFIED | DISCREPANCY | FAILED
```
One Library, Seven Formats
Parse CAMT.053, PAIN.001, CSV, OFX, QFX, MT940, and PDF into structured pandas DataFrames with a single, unified API. No need to install separate packages for each format.
| Feature | Bank Statement Parser | Single-format OSS (mt940, ofxparse) | SaaS (Ocrolus, Parseur) |
|---|---|---|---|
| Formats supported | 7, unified API | 1 each | Many (via OCR) |
| PDF support | Hybrid pipeline (deterministic + LLM + vision) | No | Yes (cloud OCR) |
| Data privacy | 100% local (LLMs run locally via Ollama) | 100% local | Data sent externally |
| Cost | Free, Apache 2.0 | Free | $49-$1,000+/mo |
| Balance verification | Golden Rule (opening + credits − debits = closing) | No | Varies |
| PII redaction | Built-in, on by default | No | Varies |
| Streaming | Bounded memory | No | N/A |
| REST API | Built-in FastAPI microservice | No | Yes |
| Deduplication | Idempotent transaction hashes | No | Some |
| Ledger export | hledger + beancount | No | No |
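The "idempotent transaction hashes" row can be illustrated with a short sketch. This is not the library's actual hashing scheme, which is not described here. The field names (`date`, `amount`, `currency`, `description`) and the helpers `transaction_hash` and `deduplicate` are assumptions made for illustration only.

```python
import hashlib
from decimal import Decimal


def transaction_hash(txn: dict) -> str:
    """Return a stable SHA-256 digest built from a transaction's identifying fields."""
    # Normalise each field so cosmetic differences (case, extra whitespace,
    # "10.5" vs "10.50") do not change the digest. Field names are assumptions.
    parts = [
        str(txn["date"]),
        f"{Decimal(str(txn['amount'])):.2f}",
        str(txn["currency"]).upper(),
        " ".join(str(txn["description"]).split()).lower(),
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def deduplicate(transactions: list[dict]) -> list[dict]:
    """Keep the first occurrence of each transaction, judged by its digest."""
    seen: set[str] = set()
    unique = []
    for txn in transactions:
        digest = transaction_hash(txn)
        if digest not in seen:
            seen.add(digest)
            unique.append(txn)
    return unique


rows = [
    {"date": "2024-01-15", "amount": "10.50", "currency": "EUR", "description": "Coffee  Shop"},
    {"date": "2024-01-15", "amount": "10.5", "currency": "eur", "description": "coffee shop"},
    {"date": "2024-01-16", "amount": "99.00", "currency": "EUR", "description": "Rent"},
]
print(len(deduplicate(rows)))  # the first two rows collapse into one
```

Because the digest depends only on normalised content, re-ingesting the same file yields the same hashes. A content-only hash would also merge two genuinely separate but identical payments on the same day, so a real scheme would likely include a bank reference or sequence number as well.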
Hybrid PDF Pipeline
Bank Statement Parser v0.0.5+ includes a three-path hybrid pipeline for PDF bank statements:
- Path A (Deterministic): Structured PDF tables parsed directly — free, fastest, no LLM needed.
- Path B (Text-LLM): Digital PDFs with complex layouts extracted via local LLM (LiteLLM/Ollama).
- Path C (Vision-LLM): Scanned or photocopied statements processed with multimodal vision models.
Every extraction is verified with the Golden Rule: opening balance + credits − debits == closing balance.
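The Golden Rule is simple arithmetic, and a minimal sketch of the check might look like the following. The function name `verify_golden_rule`, its parameters, and the tolerance argument are hypothetical and are not the library's API. `Decimal` is used instead of `float` so that currency amounts compare exactly.

```python
from decimal import Decimal


def verify_golden_rule(opening, credits, debits, closing, tolerance=Decimal("0.00")):
    """Check opening + sum(credits) - sum(debits) == closing.

    Returns (ok, difference), where difference = expected closing - stated closing.
    """
    total_credits = sum(map(Decimal, credits), Decimal("0"))
    total_debits = sum(map(Decimal, debits), Decimal("0"))
    expected = Decimal(opening) + total_credits - total_debits
    difference = expected - Decimal(closing)
    return abs(difference) <= tolerance, difference


# 1000.00 + (250.00 + 19.99) - 120.49 = 1149.50
ok, diff = verify_golden_rule("1000.00", ["250.00", "19.99"], ["120.49"], "1149.50")
print(ok, diff)

# A stated closing balance of 1150.00 leaves a 0.50 discrepancy.
bad, bad_diff = verify_golden_rule("1000.00", ["250.00", "19.99"], ["120.49"], "1150.00")
print(bad, bad_diff)
```

Returning the difference alongside the boolean makes it easier to report how far off an extraction is, which matters when deciding whether a statement needs manual review.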
Built for the ISO 20022 Migration
SWIFT has set firm deadlines: all financial institutions must receive CAMT.053 by November 2027, and MT940/MT942/MT950 will be fully retired by November 2028. Bank Statement Parser handles both legacy MT940 and modern ISO 20022 formats (CAMT.053, PAIN.001) in a single API, so your parsing pipeline works during the transition and beyond.
Performance
- 27,000+ transactions/second for CAMT.053 parsing
- 52,000+ transactions/second for PAIN.001 parsing
- < 2 ms time to first result
- Constant memory from 1K to 50K+ transactions via streaming
- 718 tests with 100% branch coverage across Python 3.10 to 3.14
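To show how streaming can keep memory bounded while parsing large XML statements, here is a sketch using the standard library's `xml.etree.ElementTree.iterparse`. It is not the library's implementation. The generator name `iter_entries`, the output keys, and the choice of the `camt.053.001.02` namespace are assumptions, and the sample statement is heavily trimmed.

```python
import io
import xml.etree.ElementTree as ET

# Assumed namespace; real files may use a different camt.053 version.
NS = "{urn:iso:std:iso:20022:tech:xsd:camt.053.001.02}"


def iter_entries(source):
    """Yield one dict per <Ntry> element as soon as it has been fully read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == NS + "Ntry":
            amt = elem.find(NS + "Amt")
            yield {
                "amount": amt.text,
                "currency": amt.get("Ccy"),
                "direction": elem.findtext(NS + "CdtDbtInd"),
                "date": elem.findtext(f"{NS}BookgDt/{NS}Dt"),
            }
            # Release the entry's children; only an emptied element
            # stays attached to its parent.
            elem.clear()


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
  <BkToCstmrStmt><Stmt>
    <Ntry><Amt Ccy="EUR">100.00</Amt><CdtDbtInd>CRDT</CdtDbtInd><BookgDt><Dt>2024-01-15</Dt></BookgDt></Ntry>
    <Ntry><Amt Ccy="EUR">40.25</Amt><CdtDbtInd>DBIT</CdtDbtInd><BookgDt><Dt>2024-01-16</Dt></BookgDt></Ntry>
  </Stmt></BkToCstmrStmt>
</Document>"""

for entry in iter_entries(io.BytesIO(SAMPLE)):
    print(entry)
```

Since entries are yielded one at a time, a caller can aggregate or write them out without ever holding the full statement in memory.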
Why Bank Statement Parser?
- Hybrid PDF Extraction: smart_ingest() handles digital and scanned PDFs with automatic routing and balance verification.
- Format Auto-Detection: detect_statement_format() identifies files automatically and create_parser() returns the right parser.
- Privacy First: PII redaction is on by default. LLMs run locally via Ollama, so no data leaves your machine.
- REST API: Deploy as a FastAPI microservice with /ingest and /health endpoints.
- Enrichment: LLM-powered transaction categorisation with pluggable schemas (Plaid 13-category default).
- Ledger Export: Export to hledger and beancount journal formats for plaintext-accounting workflows.
- Bulk Scanning: scan_and_ingest() processes folder trees with automatic cross-file deduplication.
- Multi-Currency: verify_balance_multi_currency() runs Golden Rule verification per currency group.
- Production Ready: Secure ZIP ingestion, input validation, path traversal prevention, and interactive review mode.
- Flexible Output: Export to CSV, JSON, Excel, Polars, hledger, or beancount.
- Parallel Processing: Parse multiple files concurrently with parse_files_parallel().
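The per-currency verification mentioned in the list can be pictured as grouping transactions by currency and applying the balance check to each group. The signature of `verify_balance_multi_currency()` is not shown here, so the sketch below uses a differently named stand-in, `verify_per_currency`, with assumed input shapes: a mapping of currency to (opening, closing) balances and a list of transaction dicts.

```python
from collections import defaultdict
from decimal import Decimal


def verify_per_currency(balances, transactions):
    """Apply opening + credits - debits == closing separately for each currency."""
    net = defaultdict(Decimal)  # Decimal() is zero, so each currency starts at 0
    for txn in transactions:
        amount = Decimal(txn["amount"])
        # Credits add to the running net movement, debits subtract from it.
        net[txn["currency"]] += amount if txn["direction"] == "CRDT" else -amount
    return {
        ccy: Decimal(opening) + net[ccy] == Decimal(closing)
        for ccy, (opening, closing) in balances.items()
    }


balances = {"EUR": ("500.00", "575.00"), "USD": ("200.00", "150.00")}
transactions = [
    {"currency": "EUR", "direction": "CRDT", "amount": "100.00"},
    {"currency": "EUR", "direction": "DBIT", "amount": "25.00"},
    {"currency": "USD", "direction": "DBIT", "amount": "40.00"},
]
result = verify_per_currency(balances, transactions)
print(result)  # EUR balances; USD does not (200.00 - 40.00 is 160.00, not 150.00)
```

Keeping the groups separate avoids summing amounts across currencies, which would make a single combined check meaningless.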
Built for Production
Bank Statement Parser is designed for treasury teams, fintech developers, and compliance officers processing sensitive financial data. The library is used in MT940-to-CAMT migration pipelines, automated reconciliation systems, PDF statement ingestion, and regulatory audit workflows across financial institutions.
- 718 tests with 100% branch coverage across Python 3.10 to 3.14
- SHA-256 hash-locked dependencies with CycloneDX SBOM for every release
- Deterministic output — identical input produces byte-identical results, every run
- Apache 2.0 licensed — use freely in commercial and internal systems