TL;DR: Bank Statement Parser is an open-source Python library that parses six bank statement formats (CAMT.053, PAIN.001, CSV, OFX, QFX, MT940) into pandas DataFrames. Processing is 100% local, PII is redacted by default in CLI output, and CAMT.053 throughput exceeds 27,000 tx/s.
Bank Statement Parser is an open-source Python library that parses bank statements from six formats into structured pandas DataFrames. All processing happens locally, with zero network calls, deterministic output, and automatic PII redaction.
Who Is This For?
- Treasury teams migrating from MT940 to CAMT.053 who need a parser that handles both old and new formats during the transition.
- Fintech developers building reconciliation, reporting, or accounting pipelines who want a single dependency instead of stitching together mt940 + ofxparse + custom CSV logic.
- Compliance teams who need PII redaction by default and audit-ready, deterministic output that never sends data to external services.
- Anyone who refuses to send sensitive financial data to a third-party SaaS when a local, open-source tool can do the job.
Supported Formats
| Format | Standard | File Types | Parser Class |
|---|---|---|---|
| CAMT.053 | ISO 20022 Bank-to-Customer Statement | .xml | `CamtParser` |
| PAIN.001 | ISO 20022 Credit Transfer Initiation | .xml | `Pain001Parser` |
| CSV | Generic bank exports | .csv | `CsvStatementParser` |
| OFX | Open Financial Exchange | .ofx | `OfxParser` |
| QFX | Quicken Financial Exchange | .qfx | `QfxParser` |
| MT940 | SWIFT standard | .mt940, .sta | `Mt940Parser` |
All formats produce normalised pandas DataFrames with consistent column names, making downstream processing format-agnostic.
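To illustrate why a shared schema makes downstream code format-agnostic, here is a minimal pandas sketch. The two frames only stand in for parser output, and the column names (`date`, `amount`, `currency`, `source_format`) are hypothetical; they are not taken from the library's documentation.

```python
import pandas as pd

# Stand-ins for frames produced from two different statement formats.
# Column names are hypothetical and chosen only for illustration.
camt_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-02", "2024-01-03"]),
        "amount": [120.50, -40.00],
        "currency": ["EUR", "EUR"],
        "source_format": ["CAMT.053", "CAMT.053"],
    }
)
mt940_df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-03"]),
        "amount": [-15.25],
        "currency": ["EUR"],
        "source_format": ["MT940"],
    }
)

# Because the schemas match, the frames concatenate without any renaming.
combined = pd.concat([camt_df, mt940_df], ignore_index=True)

# Downstream logic never needs to know which format a row came from.
daily_net = combined.groupby("date")["amount"].sum()
```

The same `groupby` works regardless of which parser produced each row, which is the practical benefit of normalised output.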
Key Capabilities
- Format Auto-Detection: `detect_statement_format()` identifies the format; `create_parser()` instantiates the right parser.
- Streaming Parsing: Process large files (50 MB+, 50K+ transactions) with bounded memory using `parse_streaming()`.
- Parallel Processing: Parse multiple files concurrently with `parse_files_parallel()` using ProcessPoolExecutor.
- Deduplication: Detect exact duplicates and suspected matches with explainable confidence scores.
- In-Memory Parsing: `from_string()` and `from_bytes()` for SFTP and API workflows with no disk I/O.
- Secure ZIP Processing: `iter_secure_xml_entries()` with compression ratio limits, entry size caps, and encrypted entry rejection.
- Export: CSV, JSON, Excel (`.xlsx`), and optional Polars DataFrames.
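As a rough illustration of how format auto-detection can work in principle, the sketch below sniffs the leading content of a file for well-known markers. It is not the library's `detect_statement_format()` implementation; the function name `sniff_format`, the markers it checks, and the returned labels are all assumptions made for this example.

```python
def sniff_format(text: str) -> str:
    """Guess a statement format from its leading content (illustrative only)."""
    head = text[:4096]
    upper = head.upper()

    # ISO 20022 messages declare the message type in the XML namespace.
    if "camt.053" in head:
        return "CAMT.053"
    if "pain.001" in head:
        return "PAIN.001"

    # OFX and QFX share a header; QFX files usually carry Intuit-specific tags.
    if "OFXHEADER" in upper or "<OFX>" in upper:
        return "QFX" if "<INTU." in upper else "OFX"

    # MT940 statements are built from colon-delimited tags such as :20: and :60F:.
    if ":20:" in head and (":60F:" in head or ":60M:" in head):
        return "MT940"

    # Fall back to CSV when the first non-blank line contains a common delimiter.
    lines = head.lstrip().splitlines()
    first_line = lines[0] if lines else ""
    if "," in first_line or ";" in first_line:
        return "CSV"

    return "UNKNOWN"
```

Checking the most specific markers first matters: MT940 amounts contain commas, so the CSV fallback has to come last.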
Security And Privacy
- PII Redaction: Names, IBANs, and addresses are masked by default in CLI output. Opt in to unmasked output with `--show-pii`.
- XXE Protection: XML parsing uses `resolve_entities=False`, `no_network=True`, `load_dtd=False`.
- ZIP Bomb Protection: Compression ratio limits (100:1 default), entry size caps (10 MB), encrypted entry rejection.
- Path Traversal Prevention: Dangerous pattern blocklist and symlink resolution.
- Supply Chain Security: SHA-256 hash-locked dependencies, CycloneDX SBOM, build provenance attestation.
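The ZIP safeguards listed above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the library's `iter_secure_xml_entries()`: the function name `iter_checked_xml_entries` and the constants are hypothetical, with the limits set to the 100:1 ratio and 10 MB cap mentioned in the list.

```python
import io
import zipfile
from typing import Iterator, Tuple

MAX_RATIO = 100                      # uncompressed:compressed ratio limit
MAX_ENTRY_BYTES = 10 * 1024 * 1024   # 10 MB cap per entry


def iter_checked_xml_entries(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, content) for .xml entries that pass basic safety checks."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            # Bit 0 of the general-purpose flags marks an encrypted entry.
            if info.flag_bits & 0x1:
                raise ValueError(f"encrypted entry rejected: {info.filename}")
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"entry too large: {info.filename}")
            # Skip the ratio check for empty entries to avoid dividing by zero.
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ValueError(f"suspicious compression ratio: {info.filename}")
            # Cap the read as a second line of defence, independent of header values.
            with archive.open(info) as handle:
                content = handle.read(MAX_ENTRY_BYTES + 1)
            if len(content) > MAX_ENTRY_BYTES:
                raise ValueError(f"entry exceeds size cap: {info.filename}")
            yield info.filename, content
```

All checks run against the archive's metadata before any entry is decompressed, so a hostile archive is rejected cheaply.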
Performance
| Metric | Value |
|---|---|
| CAMT.053 throughput | 27,000+ tx/s |
| PAIN.001 throughput | 52,000+ tx/s |
| Per-transaction latency (CAMT) | 37 microseconds |
| Per-transaction latency (PAIN.001) | 19 microseconds |
| Time to first result | < 2 ms |
| Memory scaling (1K-50K tx) | Constant (streaming) |
| Test coverage | 100% branch coverage |
| Tests | 467 across 29 test files |
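The latency rows are consistent with the throughput rows if transactions are processed one after another on a single worker. That is an assumption made here, not something the table states. Under it, mean per-transaction latency is the reciprocal of throughput:

```python
def latency_us(tx_per_second: float) -> float:
    """Mean per-transaction latency in microseconds for a sequential pipeline."""
    return 1_000_000 / tx_per_second


camt_latency = latency_us(27_000)   # roughly 37 microseconds
pain_latency = latency_us(52_000)   # roughly 19 microseconds
```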
Start Building
Get started with installation and examples
GitHub Repository