Security

How We Protect Your Financial Data

TL;DR: Bank Statement Parser makes zero network calls, redacts PII by default, hardens XML parsing against XXE attacks, and ships with SHA-256 hash-locked dependencies and a CycloneDX SBOM.

Security by Design

Bank Statement Parser is built for processing sensitive financial data. Every design decision prioritises security, privacy, and auditability.

Zero Network Access

All processing happens locally within your runtime. The library makes no API calls, opens no cloud connections, and collects no telemetry. XML parsers are explicitly configured with no_network=True, resolve_entities=False, and load_dtd=False to prevent any outbound access.
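The three options named here are keyword arguments of lxml's XMLParser. The sketch below shows one way they could be applied and what they do to a typical XXE payload. The helper name make_hardened_parser and the sample document are illustrative assumptions, not the library's own code.

```python
from lxml import etree


def make_hardened_parser() -> etree.XMLParser:
    """Build an lxml parser using the three settings described in the text."""
    return etree.XMLParser(
        no_network=True,         # never fetch DTDs or entities over the network
        resolve_entities=False,  # leave entity references unexpanded
        load_dtd=False,          # do not load an external DTD
    )


# A classic XXE payload: an external entity that points at a local file.
XXE_PAYLOAD = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE Document [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    b"<Document><Nm>&xxe;</Nm></Document>"
)

root = etree.fromstring(XXE_PAYLOAD, parser=make_hardened_parser())
# The entity is not expanded, so the file contents never enter the tree.
serialized = etree.tostring(root)
```

Passing the parser explicitly to every parse call, rather than relying on lxml's defaults, keeps the behaviour independent of the installed lxml version.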

PII Redaction

Personally identifiable information (names, IBANs, postal addresses) is redacted by default in CLI output and in streaming mode.
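As a rough illustration of what redacting one PII field can look like, the sketch below masks IBAN-like tokens in a line of text. The function redact_iban, the regular expression, and the keep_last parameter are hypothetical and are not the library's implementation. The pattern only matches unspaced IBANs and does not validate checksums.

```python
import re

# Rough IBAN shape: 2-letter country code, 2 check digits, 11-30 alphanumerics.
# Illustrative only: no checksum or country-specific length validation.
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b")


def redact_iban(text: str, keep_last: int = 4) -> str:
    """Mask every IBAN-like token, keeping only the last keep_last characters."""

    def _mask(match: re.Match) -> str:
        iban = match.group(0)
        visible_from = len(iban) - keep_last
        return "*" * visible_from + iban[visible_from:]

    return IBAN_RE.sub(_mask, text)


line = "Debtor account GB29NWBK60161331926819 charged 120.00 EUR"
redacted = redact_iban(line)
```

Keeping the last few characters lets an operator match a log line to an account without exposing the full identifier.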

XML Security (XXE Protection)

All XML parsing uses lxml with hardened settings:

ZIP Archive Security

iter_secure_xml_entries() validates every ZIP member before extraction:
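As an illustration of what per-member validation can involve, the sketch below checks member paths, declared sizes, and compression ratios before reading anything. The helper iter_checked_xml_members and both limits are assumptions made for this example. They are not the checks or thresholds used by iter_secure_xml_entries().

```python
import io
import zipfile
from pathlib import PurePosixPath
from typing import Iterator, Tuple

MAX_MEMBER_BYTES = 50 * 1024 * 1024  # assumed per-file size cap
MAX_COMPRESSION_RATIO = 100          # assumed zip-bomb threshold


def iter_checked_xml_members(zf: zipfile.ZipFile) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, data) for XML members that pass basic safety checks."""
    for info in zf.infolist():
        if info.is_dir() or not info.filename.lower().endswith(".xml"):
            continue
        path = PurePosixPath(info.filename)
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"unsafe member path: {info.filename!r}")
        if info.file_size > MAX_MEMBER_BYTES:
            raise ValueError(f"member too large: {info.filename!r}")
        if (
            info.compress_size
            and info.file_size / info.compress_size > MAX_COMPRESSION_RATIO
        ):
            raise ValueError(f"suspicious compression ratio: {info.filename!r}")
        # Read into memory; nothing is written to disk.
        yield info.filename, zf.read(info)


# Usage with a small in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("statement.xml", "<Document/>")
with zipfile.ZipFile(buf) as zf:
    members = list(iter_checked_xml_members(zf))
```

Reading members into memory instead of extracting them means a hostile file name never reaches the filesystem.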

Path Traversal Prevention

Input validation blocks dangerous file paths:
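A common way to block path traversal is to resolve the requested path and confirm it still sits under an allowed base directory. The sketch below shows that technique. The function resolve_inside and the base directory name are hypothetical, and the library's actual rules may differ.

```python
from pathlib import Path


def resolve_inside(base_dir: Path, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    if "\x00" in user_path:
        raise ValueError("null byte in path")
    base = base_dir.resolve()
    # Joining with an absolute user_path discards base, which the check below catches.
    candidate = (base / user_path).resolve()
    if not candidate.is_relative_to(base):  # requires Python 3.9+
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return candidate


base = Path("statements")
safe = resolve_inside(base, "2024/january.xml")
```

Comparing resolved paths, rather than searching the raw string for "..", also covers absolute paths and symlinks that point outside the base directory.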

Deterministic Output

Given the same input file, the parser produces byte-identical output every run. No randomness, no model inference, no heuristic sampling. This is critical for:
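A byte-identical guarantee can be checked mechanically by hashing the output of two runs and comparing the digests. The sketch below shows the idea with a toy stand-in parser. Both toy_parse and output_digest are hypothetical helpers, not part of the library's API.

```python
import hashlib
import json


def output_digest(records: list[dict]) -> str:
    """SHA-256 of a canonical JSON serialisation of the parsed records."""
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def toy_parse(raw: str) -> list[dict]:
    """Hypothetical stand-in for a parser: 'date;amount' lines to dicts."""
    rows = []
    for line in raw.strip().splitlines():
        date, amount = line.split(";")
        rows.append({"date": date, "amount": amount})
    return rows


raw = "2024-01-02;-12.50\n2024-01-03;100.00\n"
first = output_digest(toy_parse(raw))
second = output_digest(toy_parse(raw))
```

Sorting keys and fixing the separators removes serialisation choices that could otherwise change the bytes without changing the data.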

Supply Chain Security

Verify Locally

python -m pytest                          # 467 tests, 100% branch coverage
python scripts/verify_locked_hashes.py    # SHA-256 hash verification
git log --show-signature -1               # Verify commit signature
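For readers unfamiliar with hash locking, the sketch below shows the underlying check: compute a file's SHA-256 digest and compare it with a pinned value. It is a generic illustration under assumed names (sha256_of, matches_pin, and the example file name) and does not reproduce scripts/verify_locked_hashes.py.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: Path, pinned_hex: str) -> bool:
    """Return True only if the file's digest equals the pinned digest."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.lower())


# Usage with a throwaway file standing in for a downloaded artifact.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "example-1.0.tar.gz"
    artifact.write_bytes(b"example artifact bytes")
    pin = hashlib.sha256(b"example artifact bytes").hexdigest()
    ok = matches_pin(artifact, pin)
    tampered = matches_pin(artifact, "0" * 64)
```

Any change to the artifact, even a single byte, produces a different digest, so a mismatch against the pinned value signals a modified or substituted dependency.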