Getting Started

Start Building Secure Applications with Bank Statement Parser

Requirements

Install

# Core install (deterministic parsers only)
pip install bankstatementparser

Optional extras for additional capabilities:

# Text-LLM path for digital PDFs (litellm + pypdf)
pip install 'bankstatementparser[hybrid]'

# Higher-fidelity table extraction (adds pdfplumber)
pip install 'bankstatementparser[hybrid-plus]'

# Vision-LLM path for scanned PDFs (adds pypdfium2)
pip install 'bankstatementparser[hybrid-vision]'

# LLM-powered transaction categorisation
pip install 'bankstatementparser[enrichment]'

# REST API microservice (FastAPI + uvicorn)
pip install 'bankstatementparser[api]'

# Optional Polars DataFrame support
pip install 'bankstatementparser[polars]'

Quick Start

Auto-Detect and Parse Any Structured Format

from bankstatementparser import create_parser, detect_statement_format

fmt = detect_statement_format("transactions.ofx")
parser = create_parser("transactions.ofx", fmt)
df = parser.parse()  # pandas DataFrame
print(df.head())

This works with .xml (CAMT/PAIN.001), .csv, .ofx, .qfx, .mt940, and .sta files.
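
To give a feel for what format detection involves, here is a minimal sketch that combines the file extension with a peek at the XML namespace. It is not the library's actual logic: `guess_format`, the returned labels, and the mapping of `.sta` to MT940 are assumptions made for illustration.

```python
from pathlib import Path

# Hypothetical extension-to-format table; the labels are illustrative only.
_EXTENSION_FORMATS = {
    ".csv": "csv",
    ".ofx": "ofx",
    ".qfx": "qfx",
    ".mt940": "mt940",
    ".sta": "mt940",  # assumption: .sta files carry MT940 content
}


def guess_format(filename: str, head: bytes = b"") -> str:
    """Guess a statement format from the extension and the first bytes."""
    suffix = Path(filename).suffix.lower()
    if suffix in _EXTENSION_FORMATS:
        return _EXTENSION_FORMATS[suffix]
    if suffix == ".xml":
        # ISO 20022 messages declare their type in the XML namespace,
        # e.g. urn:iso:std:iso:20022:tech:xsd:camt.053.001.02
        text = head.decode("utf-8", errors="ignore")
        if "camt.053" in text:
            return "camt"
        if "pain.001" in text:
            return "pain001"
    return "unknown"
```

The extension alone cannot separate CAMT from PAIN.001, which is why the sketch inspects the content for `.xml` files.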

Parse CAMT.053

from bankstatementparser import CamtParser

parser = CamtParser("statement.xml")
transactions = parser.parse()

Parse PAIN.001

from bankstatementparser import Pain001Parser

parser = Pain001Parser("payment.xml")
payments = parser.parse()

Parse PDF Bank Statements (Hybrid Pipeline)

The hybrid pipeline routes each PDF through one of three extraction paths (deterministic, text-LLM, or vision-LLM):

from bankstatementparser.hybrid import smart_ingest

result = smart_ingest("statement.pdf")
print(result.source_method)         # "deterministic" | "llm" | "vision"
print(result.verification.status)   # VERIFIED | DISCREPANCY | FAILED
print(result.transactions)          # List of extracted transactions

Every extraction is verified with the Golden Rule: opening + credits − debits == closing.
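
The Golden Rule itself is plain arithmetic. The sketch below checks it with `Decimal` to avoid float rounding; `golden_rule_holds` is a hypothetical helper written for this example, not part of the library.

```python
from decimal import Decimal


def golden_rule_holds(opening, credits, debits, closing):
    """Return True when opening + sum(credits) - sum(debits) == closing."""
    total_credits = sum((Decimal(c) for c in credits), Decimal("0"))
    total_debits = sum((Decimal(d) for d in debits), Decimal("0"))
    return Decimal(opening) + total_credits - total_debits == Decimal(closing)


# Amounts are passed as strings so Decimal parses them exactly.
# 100.00 + 60.25 - 20.25 == 140.00
print(golden_rule_holds("100.00", ["50.25", "10.00"], ["20.25"], "140.00"))  # True
```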

Streaming Large Files

For files with thousands of transactions, use streaming to keep memory bounded:

from bankstatementparser import CamtParser

parser = CamtParser("large_statement.xml")
for transaction in parser.parse_streaming(redact_pii=True):
    process(transaction)  # process() is your own handler; memory stays constant
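
Bounded-memory parsing of this kind is usually built on incremental XML parsing. The sketch below shows the general technique with the standard library's `iterparse`; it is an illustration under assumptions, not how `parse_streaming` is implemented, and `iter_entries` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET


def iter_entries(source, tag="Ntry"):
    """Yield the direct child fields of each <Ntry> element, one at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Strip any XML namespace: "{urn:...}Ntry" -> "Ntry"
        if elem.tag.rsplit("}", 1)[-1] == tag:
            yield {
                child.tag.rsplit("}", 1)[-1]: (child.text or "").strip()
                for child in elem
            }
            elem.clear()  # drop the children of the entry just processed


sample = io.BytesIO(
    b"<Stmt><Ntry><Amt>10.00</Amt></Ntry><Ntry><Amt>5.50</Amt></Ntry></Stmt>"
)
for entry in iter_entries(sample):
    print(entry)
```

Because each entry is cleared after it is consumed, the parser never holds every entry's full subtree in memory at once.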

In-Memory Parsing

Parse from bytes without disk I/O, which is useful for SFTP or API workflows:

from bankstatementparser import CamtParser

xml_bytes = download_from_sftp()  # placeholder for your own download code
parser = CamtParser.from_bytes(xml_bytes, source_name="daily.xml")
transactions = parser.parse()

Parallel File Processing

Parse multiple files concurrently:

from bankstatementparser import parse_files_parallel

results = parse_files_parallel([
    "statements/jan.xml",
    "statements/feb.xml",
    "statements/mar.xml",
])
for r in results:
    print(r.path, r.status, len(r.transactions), "rows")
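
The per-file `status` in the results suggests that one failing file does not abort the batch. A minimal sketch of that pattern with `concurrent.futures` follows. It assumes a thread pool and uses hypothetical names (`SketchResult`, `parse_many`); the library may use a different executor or result shape.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class SketchResult:  # hypothetical stand-in, not the library's FileResult
    path: str
    status: str
    rows: list = field(default_factory=list)
    error: str = ""


def parse_many(paths, parse_one, max_workers=4):
    """Run parse_one(path) for every path, capturing failures per file."""

    def safe_parse(path):
        try:
            return SketchResult(path, "success", list(parse_one(path)))
        except Exception as exc:  # one bad file should not abort the batch
            return SketchResult(path, "error", error=str(exc))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the input paths
        return list(pool.map(safe_parse, paths))
```

Here `parse_one` is any callable that takes a path and returns rows, so the same wrapper works for every parser class.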

Bulk Directory Scanning

Process entire folder trees with automatic deduplication:

from bankstatementparser.hybrid import scan_and_ingest

batch = scan_and_ingest("statements/2026/", pattern="**/*.pdf")
print(f"Processed: {len(batch.results)} files")
print(f"Unique transactions: {batch.unique_count}")

Deduplication

The Deduplicator assigns each transaction an idempotent hash, which makes incremental ingestion safe to repeat:

from bankstatementparser import CamtParser, Deduplicator

parser = CamtParser("statement.xml")
dedup = Deduplicator()
result = dedup.deduplicate(dedup.from_dataframe(parser.parse()))

print(f"Unique: {len(result.unique_transactions)}")
print(f"Exact duplicates: {len(result.exact_duplicates)}")
print(f"Suspected matches: {len(result.suspected_matches)}")
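
An idempotent hash means the same transaction always maps to the same key, however many times it is ingested. The sketch below shows one way to derive such a key. The choice of fields, the normalisation, and the names `transaction_hash` and `split_duplicates` are assumptions, not the Deduplicator's actual scheme.

```python
import hashlib
from decimal import Decimal


def transaction_hash(date, amount, currency, description):
    """Derive a stable SHA-256 fingerprint from normalised fields."""
    parts = [
        date.strip(),
        str(Decimal(amount).quantize(Decimal("0.01"))),  # "12.5" -> "12.50"
        currency.strip().upper(),
        " ".join(description.split()).lower(),  # collapse whitespace and case
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def split_duplicates(rows):
    """Partition rows into first occurrences and exact repeats."""
    seen, unique, duplicates = set(), [], []
    for row in rows:
        key = transaction_hash(**row)
        (duplicates if key in seen else unique).append(row)
        seen.add(key)
    return unique, duplicates
```

Normalising before hashing matters: without it, cosmetic differences such as `12.5` versus `12.50` would produce different keys for the same transaction.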

Transaction Categorisation (Enrichment)

Automatically categorise transactions using LLM-powered classification:

from bankstatementparser.enrichment import Categorizer

categorizer = Categorizer()
enriched = categorizer.categorize_batch(transactions)
for txn in enriched:
    print(f"{txn.description}: {txn.category}")

Ledger Export (hledger / beancount)

Export transactions to plaintext-accounting journal formats:

from bankstatementparser.export import to_hledger, to_beancount

journal = to_hledger(transactions, account="Assets:Bank:Checking")
beancount_journal = to_beancount(transactions, account="Assets:Bank:Checking")
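
For readers unfamiliar with the target format, this sketch formats a single hledger journal entry by hand. `hledger_entry` and the `Expenses:Unknown` counter-account are hypothetical; the exact output of `to_hledger` may differ.

```python
def hledger_entry(date, description, amount, currency, account,
                  counter_account="Expenses:Unknown"):
    """Format one transaction as an hledger journal entry."""
    return (
        f"{date} {description}\n"
        # hledger needs at least two spaces between account and amount
        f"    {account}  {amount} {currency}\n"
        # a posting without an amount is inferred so the entry balances
        f"    {counter_account}\n"
    )


print(hledger_entry("2026-01-05", "Coffee Shop", "-4.50", "EUR",
                    "Assets:Bank:Checking"))
```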

Multi-Currency Balance Verification

Verify balances independently per currency group:

from bankstatementparser.hybrid import verify_balance_multi_currency

results = verify_balance_multi_currency(transactions)
for currency, verification in results.items():
    print(f"{currency}: {verification.status}")
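
Verifying per currency amounts to grouping transactions by currency before applying the balance check to each group. The following sketch assumes transactions are `(currency, signed_amount)` pairs and that opening and closing balances are supplied per currency; these shapes and the function names are illustrative only.

```python
from collections import defaultdict
from decimal import Decimal


def net_by_currency(transactions):
    """Sum signed amounts per currency: credits positive, debits negative."""
    totals = defaultdict(lambda: Decimal("0"))
    for currency, amount in transactions:
        totals[currency] += Decimal(amount)
    return dict(totals)


def verify_per_currency(transactions, opening, closing):
    """Check opening + net movement == closing for each currency."""
    net = net_by_currency(transactions)
    return {
        cur: Decimal(opening[cur]) + net.get(cur, Decimal("0")) == Decimal(closing[cur])
        for cur in opening
    }
```

Keeping the groups separate means a discrepancy in one currency cannot be masked by an offsetting amount in another.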

REST API

Deploy as a FastAPI microservice:

# Start the API server
bankstatementparser-api --port 8000

# For container deployments
bankstatementparser-api --host 0.0.0.0 --port 9000

Endpoints:

Secure ZIP Processing

Process zipped XML files with built-in security checks (bomb protection, encrypted entry rejection):

from bankstatementparser import iter_secure_xml_entries, CamtParser

for entry in iter_secure_xml_entries("statements.zip"):
    parser = CamtParser.from_bytes(entry.xml_bytes, source_name=entry.source_name)
    print(f"{entry.source_name}: {len(parser.parse())} transactions")
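
The kinds of checks mentioned above can be sketched with the standard `zipfile` module. This is an illustration rather than the library's implementation: the limits and the name `iter_checked_xml` are assumptions, and a hardened version would also cap the bytes actually read instead of trusting the archive headers.

```python
import io
import zipfile

MAX_ENTRY_BYTES = 50 * 1024 * 1024  # assumed per-file size cap
MAX_RATIO = 100                     # assumed compression-ratio cap


def iter_checked_xml(zip_source):
    """Yield (name, data) for XML members that pass basic safety checks."""
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            if info.flag_bits & 0x1:  # bit 0 marks an encrypted entry
                raise ValueError(f"encrypted entry rejected: {info.filename}")
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"entry too large: {info.filename}")
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ValueError(f"suspicious compression ratio: {info.filename}")
            yield info.filename, archive.read(info)
```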

Export

from bankstatementparser import CamtParser

parser = CamtParser("statement.xml")
parser.export_csv("output.csv")
parser.export_json("output.json")

# Polars (requires bankstatementparser[polars])
polars_df = parser.to_polars()

# Excel
parser.camt_to_excel("output.xlsx")

CLI Usage

# Parse structured formats
bankstatementparser --type camt --input statement.xml
bankstatementparser --type pain001 --input payment.xml

# Hybrid PDF pipeline
bankstatementparser --type ingest --input statement.pdf
bankstatementparser --type ingest --input statement.pdf --output ledger.csv

# Interactive review mode
bankstatementparser --type review --input result.json
bankstatementparser --type review --input result.json --output reviewed.json

# Export to CSV
bankstatementparser --type camt --input statement.xml --output transactions.csv

# Streaming mode
bankstatementparser --type camt --input statement.xml --streaming --show-pii

CLI options:

Local Development Setup

git clone https://github.com/sebastienrousseau/bankstatementparser.git
cd bankstatementparser
python3 -m venv .venv && source .venv/bin/activate
pip install poetry && poetry install --with dev
make install-hooks   # pre-commit hook runs `make verify` before every commit

Run the test suite:

pytest

API Reference

Parser Classes

| Class | Format | Import |
| --- | --- | --- |
| `CamtParser` | CAMT.053 (ISO 20022) | `from bankstatementparser import CamtParser` |
| `Pain001Parser` | PAIN.001 (ISO 20022) | `from bankstatementparser import Pain001Parser` |
| `CsvStatementParser` | CSV | `from bankstatementparser import CsvStatementParser` |
| `OfxParser` | OFX | `from bankstatementparser import OfxParser` |
| `QfxParser` | QFX | `from bankstatementparser import QfxParser` |
| `Mt940Parser` | MT940 | `from bankstatementparser import Mt940Parser` |
| `smart_ingest()` | PDF (hybrid pipeline) | `from bankstatementparser.hybrid import smart_ingest` |

Utility Functions

| Function | Purpose |
| --- | --- |
| `detect_statement_format(path)` | Auto-detect file format |
| `create_parser(path, fmt)` | Create the appropriate parser |
| `parse_files_parallel(paths)` | Parse multiple files concurrently |
| `iter_secure_xml_entries(zip_path)` | Iterate ZIP entries securely |
| `smart_ingest(path)` | Hybrid PDF extraction with verification |
| `scan_and_ingest(dir, pattern)` | Bulk directory scanning |
| `verify_balance_multi_currency(txns)` | Per-currency balance verification |
| `to_hledger(txns, account)` | Export to hledger journal format |
| `to_beancount(txns, account)` | Export to beancount journal format |

Data Classes

| Class | Purpose |
| --- | --- |
| `Deduplicator` | Detect duplicate transactions |
| `DeduplicationResult` | Result with unique transactions, exact duplicates, and suspected matches |
| `InputValidator` | Validate file paths and formats |
| `Transaction` | Normalised transaction record |
| `FileResult` | Result from parallel parsing |
| `ZipXMLSource` | ZIP member wrapper |
| `IngestResult` | Hybrid pipeline result with verification |
| `VerificationResult` | Balance verification outcome |
| `Categorizer` | LLM-powered transaction categorisation |
| `AccountMapper` | Regex-based account mapping rules |

Exceptions

| Exception | When Raised |
| --- | --- |
| `ParserError` | Parsing failures |
| `ExportError` | Export failures (CSV/JSON/Excel) |
| `ValidationError` | Input validation failures |
| `ZipSecurityError` | ZIP security check failures |