About the Bank Statement Parser

One Library. Seven Formats. Hybrid PDF Pipeline. Zero Network Calls.

TL;DR: Bank Statement Parser is an open-source Python library that parses seven banking file formats (CAMT.053, PAIN.001, CSV, OFX, QFX, MT940, and PDF) into pandas DataFrames. It adds a hybrid PDF pipeline with balance verification, a REST API, enrichment, ledger export, and 27K+ tx/s throughput on CAMT.053.

Bank Statement Parser is an open-source Python library that parses bank statements, plus PAIN.001 payment initiation files, in seven formats into structured pandas DataFrames. The deterministic core processes structured formats locally with zero network calls. The optional hybrid PDF pipeline handles digital and scanned statements by routing them through local LLMs served by Ollama.
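The TL;DR lists balance verification as part of the hybrid PDF pipeline. One common form of that check, sketched here under assumptions, confirms that the opening balance plus every signed transaction amount equals the closing balance printed on the statement; a mismatch points to a dropped, duplicated, or misread row. The function name `balances_reconcile`, the credits-positive/debits-negative sign convention, and the `tolerance` parameter are hypothetical, not the library's API.

```python
from decimal import Decimal
from typing import Iterable


def balances_reconcile(
    opening: Decimal,
    closing: Decimal,
    amounts: Iterable[Decimal],
    tolerance: Decimal = Decimal("0.00"),
) -> bool:
    """Check opening + sum(amounts) against the stated closing balance.

    Hypothetical helper: assumes credits are positive and debits negative.
    Decimal keeps cent values exact, so no float rounding creeps in.
    """
    computed = opening + sum(amounts, Decimal("0"))
    return abs(computed - closing) <= tolerance


# A statement that opens at 1,000.00, receives 200.00 and pays out 49.75
# should close at 1,150.25.
ok = balances_reconcile(
    Decimal("1000.00"), Decimal("1150.25"), [Decimal("200.00"), Decimal("-49.75")]
)
```

Using `Decimal` rather than `float` matters here: in Python, `0.1 + 0.2 == 0.3` is `False`, which would flag otherwise correct statements as mismatched.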

Who Is This For?

Supported Formats

Format   | Standard                              | File Types   | Parser/Method
---------|---------------------------------------|--------------|-------------------
CAMT.053 | ISO 20022 Bank-to-Customer Statement  | .xml         | CamtParser
PAIN.001 | ISO 20022 Credit Transfer Initiation  | .xml         | Pain001Parser
CSV      | Generic bank exports                  | .csv         | CsvStatementParser
OFX      | Open Financial Exchange               | .ofx         | OfxParser
QFX      | Quicken Financial Exchange            | .qfx         | QfxParser
MT940    | SWIFT standard                        | .mt940, .sta | Mt940Parser
PDF      | Digital and scanned statements        | .pdf         | smart_ingest()
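Because CAMT.053 and PAIN.001 share the `.xml` extension, the file type alone cannot pick a parser for them. A minimal routing sketch, assuming dispatch by extension with a fallback that inspects the ISO 20022 namespace on the XML root element, could look like this. The parser names come from the Supported Formats table; `choose_parser` and the routing logic are hypothetical, and the function returns names as strings rather than importing the library.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Optional

# Extension -> parser/method, copied from the Supported Formats table.
PARSERS_BY_EXTENSION = {
    ".csv": "CsvStatementParser",
    ".ofx": "OfxParser",
    ".qfx": "QfxParser",
    ".mt940": "Mt940Parser",
    ".sta": "Mt940Parser",
    ".pdf": "smart_ingest()",
}


def choose_parser(path: str, content: Optional[bytes] = None) -> str:
    """Hypothetical router: pick a parser name for a statement file."""
    suffix = Path(path).suffix.lower()
    if suffix != ".xml":
        if suffix not in PARSERS_BY_EXTENSION:
            raise ValueError(f"unsupported file type: {suffix!r}")
        return PARSERS_BY_EXTENSION[suffix]

    # Both ISO 20022 messages are .xml, so read the root element's namespace,
    # e.g. "{urn:iso:std:iso:20022:tech:xsd:camt.053.001.02}Document".
    if content is None:
        content = Path(path).read_bytes()
    root_tag = ET.fromstring(content).tag
    if "camt.053" in root_tag:
        return "CamtParser"
    if "pain.001" in root_tag:
        return "Pain001Parser"
    raise ValueError(f"unrecognised ISO 20022 message: {root_tag}")
```

Parsing the whole document just to read its root tag keeps the sketch short; for very large files, reading only the first few kilobytes would be cheaper.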

All formats produce normalised pandas DataFrames with consistent column names, making downstream processing format-agnostic.
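To illustrate what consistent column names buy, here is a sketch that maps a CSV row and an OFX `<STMTTRN>` record onto the same three fields. The target names (`date`, `amount`, `description`), the CSV header names, and both helper functions are assumptions, since the library's actual schema is not listed here; `DTPOSTED`, `TRNAMT`, and `NAME` are standard OFX transaction fields.

```python
from datetime import date, datetime
from decimal import Decimal


# Hypothetical common schema for this sketch: date, amount, description.
def normalise_csv_row(row: dict) -> dict:
    # Assumes a bank CSV export with "Date" (YYYY-MM-DD), "Amount", "Description".
    return {
        "date": date.fromisoformat(row["Date"]),
        "amount": Decimal(row["Amount"]),
        "description": row["Description"].strip(),
    }


def normalise_ofx_txn(txn: dict) -> dict:
    # OFX DTPOSTED begins with YYYYMMDD (time and zone may follow); TRNAMT is signed.
    return {
        "date": datetime.strptime(txn["DTPOSTED"][:8], "%Y%m%d").date(),
        "amount": Decimal(txn["TRNAMT"]),
        "description": txn.get("NAME", "").strip(),
    }


csv_row = normalise_csv_row({"Date": "2024-01-31", "Amount": "-12.50", "Description": " Coffee "})
ofx_row = normalise_ofx_txn({"DTPOSTED": "20240131120000", "TRNAMT": "-12.50", "NAME": "Coffee"})
```

Once every parser emits the same keys, `pandas.DataFrame([csv_row, ofx_row])` yields a single frame whose columns do not depend on the source format.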

Key Capabilities

Security and Privacy

Performance

Metric                             | Value
-----------------------------------|--------------------------
CAMT.053 throughput                | 27,000+ tx/s
PAIN.001 throughput                | 52,000+ tx/s
Per-transaction latency (CAMT.053) | 37 µs
Per-transaction latency (PAIN.001) | 19 µs
Time to first result               | < 2 ms
Memory scaling (1K-50K tx)         | Constant (streaming)
Branch coverage                    | 100%
Tests                              | 718 across 29 test files
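The table attributes constant memory from 1K to 50K transactions to streaming. Below is a minimal sketch of streaming a CAMT.053 file with the standard library's `xml.etree.ElementTree.iterparse`: each `<Ntry>` becomes a row as soon as its closing tag is read and is then cleared, so entry contents do not accumulate in memory. This is not necessarily how `CamtParser` works internally. The helper names and output keys are hypothetical, while `Ntry`, `Amt`, `CdtDbtInd`, and `BookgDt` are standard CAMT.053 elements.

```python
import io
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import BinaryIO, Iterator


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_camt_entries(source: BinaryIO) -> Iterator[dict]:
    """Yield one dict per CAMT.053 <Ntry> without building the full tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "Ntry":
            continue
        # Direct children only: nested <NtryDtls> blocks can contain further <Amt> elements.
        children = {_local(child.tag): child for child in elem}
        amount = Decimal(children["Amt"].text)
        if children["CdtDbtInd"].text == "DBIT":
            amount = -amount
        booking = children.get("BookgDt")
        yield {
            "booking_date": booking[0].text if booking is not None and len(booking) else None,
            "amount": amount,
            "currency": children["Amt"].get("Ccy"),
        }
        elem.clear()  # release this entry's children and text once consumed


SAMPLE = b"""<?xml version="1.0"?>
<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
  <BkToCstmrStmt><Stmt>
    <Ntry><Amt Ccy="EUR">100.00</Amt><CdtDbtInd>CRDT</CdtDbtInd>
      <BookgDt><Dt>2024-01-02</Dt></BookgDt></Ntry>
    <Ntry><Amt Ccy="EUR">40.50</Amt><CdtDbtInd>DBIT</CdtDbtInd>
      <BookgDt><Dt>2024-01-03</Dt></BookgDt></Ntry>
  </Stmt></BkToCstmrStmt>
</Document>"""

rows = list(iter_camt_entries(io.BytesIO(SAMPLE)))
```

`elem.clear()` empties each entry but leaves the empty `<Ntry>` shell attached to its parent `<Stmt>`; also detaching it from the parent would flatten memory use further.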

Start Building

Get started with installation and examples.