Security

How We Protect Your Financial Data

TL;DR: Bank Statement Parser processes all data locally, redacts PII by default, hardens XML parsing against XXE attacks, runs LLMs locally via Ollama, and ships with SHA-256 hash-locked dependencies and a CycloneDX SBOM.

Security by Design

Bank Statement Parser is built for processing sensitive financial data. Every design decision prioritises security, privacy, and auditability.

Zero Cloud Dependency

All processing happens locally within your runtime. The deterministic parsers make zero network calls. The hybrid PDF pipeline uses Ollama for local LLM inference, so no data is sent to cloud APIs. XML parsers are explicitly configured with no_network=True, resolve_entities=False, and load_dtd=False, which blocks network fetches, entity expansion, and DTD loading during parsing.

PII Redaction

Personally identifiable information (names, IBANs, postal addresses) is redacted by default in CLI output and in streaming mode.

XML Security (XXE Protection)

All XML parsing uses lxml with hardened settings:
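
A minimal sketch of such a parser, built from the three flags named under Zero Cloud Dependency (no_network, resolve_entities, load_dtd). The helper names hardened_parser and parse_untrusted are illustrative assumptions, not the project's API.

```python
from lxml import etree


def hardened_parser() -> etree.XMLParser:
    """Build an lxml parser with the three hardening flags set explicitly."""
    return etree.XMLParser(
        no_network=True,         # never fetch DTDs or entities over the network
        resolve_entities=False,  # leave entity references unexpanded
        load_dtd=False,          # do not load external DTDs
    )


def parse_untrusted(data: bytes):
    """Parse untrusted XML bytes and return the root element."""
    return etree.fromstring(data, parser=hardened_parser())
```

With resolve_entities=False, an entity reference such as &xxe; stays in the tree as an unexpanded entity node, so its replacement text never reaches the element's text.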

ZIP Archive Security

iter_secure_xml_entries() validates every ZIP member before extraction:
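
A sketch of the kind of checks such a validator might run, using only the standard library. The function names, the size cap, and the compression-ratio cap below are assumptions for illustration and do not describe what iter_secure_xml_entries() itself enforces.

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Iterator, Tuple

MAX_MEMBER_BYTES = 50 * 1024 * 1024  # hypothetical cap on declared uncompressed size
MAX_RATIO = 100                      # hypothetical cap on compression ratio


def check_member(info: zipfile.ZipInfo) -> None:
    """Raise ValueError if a ZIP member looks unsafe to read."""
    path = PurePosixPath(info.filename.replace("\\", "/"))
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"unsafe path in archive: {info.filename!r}")
    if info.file_size > MAX_MEMBER_BYTES:
        raise ValueError(f"member too large: {info.filename!r}")
    if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
        raise ValueError(f"suspicious compression ratio: {info.filename!r}")


def iter_checked_xml(archive) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, content) for each .xml member that passes the checks."""
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue  # skip directories and non-XML members
            check_member(info)
            yield info.filename, zf.read(info)
```

The checks run on the archive's metadata before any member is read. Members are read into memory rather than extracted to disk, which avoids writing attacker-chosen paths to the filesystem.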

Path Traversal Prevention

Input validation blocks dangerous file paths:
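
One common way to implement this kind of check is to resolve the requested path and confirm it still sits under an allowed base directory. The sketch below assumes that approach; resolve_inside is a hypothetical helper, not a function from the project.

```python
from pathlib import Path


def resolve_inside(base_dir, user_path: str) -> Path:
    """Return the resolved path, or raise ValueError if it escapes base_dir."""
    if "\x00" in user_path:
        raise ValueError("null byte in path")
    base = Path(base_dir).resolve()
    # Resolving collapses ".." segments and follows symlinks before comparing.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes {base}: {user_path!r}") from None
    return candidate
```

Under this sketch, resolve_inside("statements", "2024/jan.xml") returns a path below the statements directory, while "../secrets.txt" or an absolute path such as "/etc/passwd" raises ValueError.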

Balance Verification (Golden Rule)

Every PDF extraction is verified with the equation: opening balance + credits − debits == closing balance. Results are tagged as VERIFIED, DISCREPANCY, or FAILED. Discrepancies can be reviewed interactively with --type review.
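
The equation can be sketched with exact decimal arithmetic as follows. The function name and signature are illustrative, and treating a missing opening or closing balance as FAILED is an assumption, since the conditions for each tag are not spelled out above.

```python
from decimal import Decimal
from typing import Iterable, Optional


def verify_balance(
    opening: Optional[Decimal],
    closing: Optional[Decimal],
    credits: Iterable[Decimal],
    debits: Iterable[Decimal],
) -> str:
    """Check opening + credits - debits == closing and return a status tag."""
    if opening is None or closing is None:
        return "FAILED"  # assumption: the check cannot run without both balances
    expected = opening + sum(credits, Decimal("0")) - sum(debits, Decimal("0"))
    return "VERIFIED" if expected == closing else "DISCREPANCY"
```

Decimal is used instead of float so that amounts such as 0.10 and 0.20 add up exactly and the equality test does not fail on rounding noise.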

Deterministic Output

For structured formats (CAMT, PAIN.001, CSV, OFX, QFX, MT940), given the same input file, the parser produces byte-identical output every run. No randomness, no model inference, no heuristic sampling. This is critical for:

Supply Chain Security

Verify Locally

python -m pytest                          # 718 tests, 100% branch coverage
python scripts/verify_locked_hashes.py    # SHA-256 hash verification
git log --show-signature -1               # Verify commit signature
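
Hash pinning in general means recomputing a file's SHA-256 digest and comparing it with a recorded value. The sketch below shows only that general idea; it makes no claim about what scripts/verify_locked_hashes.py does internally, and both function names are hypothetical.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_hex(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_pin(stream: BinaryIO, pinned_hex: str) -> bool:
    """True if the stream's digest equals the pinned value (case-insensitive)."""
    return sha256_hex(stream) == pinned_hex.strip().lower()
```

In practice the stream would come from open(path, "rb") on a downloaded artifact, and the pinned value from the lock file.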