Getting Started

Start Building Secure Applications with Bank Statement Parser

Requirements

Installation

# Base install (deterministic parsers only)
pip install bankstatementparser

Optional extras for additional capabilities:

# LLM text path for digital PDFs (litellm + pypdf)
pip install 'bankstatementparser[hybrid]'

# Best-quality table extraction (adds pdfplumber)
pip install 'bankstatementparser[hybrid-plus]'

# LLM vision path for scanned PDFs (adds pypdfium2)
pip install 'bankstatementparser[hybrid-vision]'

# LLM-powered transaction categorization
pip install 'bankstatementparser[enrichment]'

# REST API microservice (FastAPI + uvicorn)
pip install 'bankstatementparser[api]'

# Optional Polars DataFrame support
pip install 'bankstatementparser[polars]'

Quick Start

Auto-Detect and Parse Any Structured Format

from bankstatementparser import create_parser, detect_statement_format

fmt = detect_statement_format("transactions.ofx")
parser = create_parser("transactions.ofx", fmt)
df = parser.parse()  # pandas DataFrame
print(df.head())

This works with .xml (CAMT/PAIN.001), .csv, .ofx, .qfx, .mt940, and .sta files.
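To illustrate what format detection has to solve, here is a minimal sketch that routes by file extension and, for .xml files, peeks at the ISO 20022 namespace to tell CAMT.053 apart from PAIN.001. The names guess_format and EXTENSION_FORMATS and the returned labels are hypothetical; the library's detect_statement_format may use different rules and return different values.

```python
from pathlib import Path

# Hypothetical extension-to-format table; the labels are illustrative only.
# .sta is a common extension for MT940 files, so both map to "mt940".
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".ofx": "ofx",
    ".qfx": "qfx",
    ".mt940": "mt940",
    ".sta": "mt940",
}


def guess_format(path: str, head: bytes = b"") -> str:
    """Guess a statement format from the file extension.

    For .xml files, `head` should hold the first few KB of the file so the
    ISO 20022 namespace (e.g. "...:camt.053.001.02") can be inspected.
    """
    suffix = Path(path).suffix.lower()
    if suffix == ".xml":
        # CAMT.053 and PAIN.001 share an extension; the namespace decides.
        if b"camt.053" in head:
            return "camt"
        if b"pain.001" in head:
            return "pain001"
        raise ValueError(f"unrecognized XML statement: {path}")
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported extension: {suffix!r}") from None
```

Checking content only for .xml keeps the common case cheap, since the other extensions in the list are unambiguous on their own.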

Parse CAMT.053

from bankstatementparser import CamtParser

parser = CamtParser("statement.xml")
transactions = parser.parse()

Parse PAIN.001

from bankstatementparser import Pain001Parser

parser = Pain001Parser("payment.xml")
payments = parser.parse()

Parse PDF Bank Statements (Hybrid Pipeline)

The hybrid pipeline intelligently routes each PDF through one of three extraction paths (deterministic, LLM text, or LLM vision):

from bankstatementparser.hybrid import smart_ingest

result = smart_ingest("statement.pdf")
print(result.source_method)         # "deterministic" | "llm" | "vision"
print(result.verification.status)   # VERIFIED | DISCREPANCY | FAILED
print(result.transactions)          # List of extracted transactions

Every extraction is verified against the Golden Rule: opening + credits − debits == closing.
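The Golden Rule itself is simple arithmetic. The sketch below shows one way to check it, assuming amounts arrive as strings or numbers and debits are given as positive values; golden_rule_holds and its tolerance parameter are hypothetical and not part of the library's API.

```python
from decimal import Decimal


def golden_rule_holds(opening, credits, debits, closing, tolerance="0.00"):
    """Return True if opening + sum(credits) - sum(debits) matches closing.

    Values are converted with Decimal(str(x)) so binary float rounding cannot
    create a false discrepancy; `tolerance` is the allowed absolute gap.
    """
    expected = (
        Decimal(str(opening))
        + sum(Decimal(str(c)) for c in credits)
        - sum(Decimal(str(d)) for d in debits)
    )
    return abs(expected - Decimal(str(closing))) <= Decimal(tolerance)


# 1000.00 + 250.50 - 100.25 - 50.00 == 1100.25
print(golden_rule_holds("1000.00", ["250.50"], ["100.25", "50.00"], "1100.25"))
```

Using Decimal rather than float matters here: with floats, 0.1 + 0.2 does not equal 0.3, which would flag a perfectly consistent statement as a discrepancy.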

Streaming Large Files

For files with thousands of transactions, use streaming to keep memory usage bounded:

from bankstatementparser import CamtParser

parser = CamtParser("large_statement.xml")
for transaction in parser.parse_streaming(redact_pii=True):
    process(transaction)  # your own handler; memory stays constant
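The idea behind constant-memory streaming can be shown with the standard library alone. This is a sketch, not the library's parse_streaming implementation: it assumes CAMT.053 entries are <Ntry> elements with a direct <Amt Ccy="..."> child, and iter_entry_amounts is a hypothetical helper.

```python
import io
import xml.etree.ElementTree as ET


def iter_entry_amounts(source):
    """Yield (amount, currency) for each CAMT <Ntry>, detaching it afterwards.

    `source` is a path or binary file object. Finished entries are removed
    from their parent, so the tree does not grow with the number of entries.
    """
    open_elements = []  # ancestors of the element currently being parsed
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()
        # CAMT tags are namespaced ("{urn:...}Ntry"); compare local names.
        if elem.tag.rsplit("}", 1)[-1] == "Ntry":
            amt = elem.find("{*}Amt")  # {*} matches any namespace (Python 3.8+)
            yield (amt.text, amt.get("Ccy")) if amt is not None else (None, None)
            if open_elements:
                open_elements[-1].remove(elem)  # free the finished entry


sample = b"""<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
<BkToCstmrStmt><Stmt>
<Ntry><Amt Ccy="EUR">10.00</Amt></Ntry>
<Ntry><Amt Ccy="EUR">5.50</Amt></Ntry>
</Stmt></BkToCstmrStmt></Document>"""

for amount, currency in iter_entry_amounts(io.BytesIO(sample)):
    print(amount, currency)
```

Tracking open elements on a stack gives each entry's parent without a parent pointer, which ElementTree does not provide.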

In-Memory Parsing

Parse directly from bytes without disk I/O, which is useful for SFTP or API workflows:

from bankstatementparser import CamtParser

xml_bytes = download_from_sftp()  # placeholder: any source of raw XML bytes
parser = CamtParser.from_bytes(xml_bytes, source_name="daily.xml")
transactions = parser.parse()

Parallel File Processing

Parse multiple files concurrently:

from bankstatementparser import parse_files_parallel

results = parse_files_parallel([
    "statements/jan.xml",
    "statements/feb.xml",
    "statements/mar.xml",
])
for r in results:
    print(r.path, r.status, len(r.transactions), "rows")
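The fan-out pattern behind per-file results like these can be sketched with concurrent.futures. Outcome and parse_many are hypothetical stand-ins (only loosely modelled on FileResult), and parse_one is whatever function parses a single file; this is not how parse_files_parallel is necessarily implemented.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Outcome:
    """Hypothetical per-file result, loosely modelled on FileResult."""
    path: str
    status: str  # "ok" or "error"
    rows: list = field(default_factory=list)
    error: str = ""


def parse_many(paths, parse_one, max_workers=4):
    """Call parse_one(path) concurrently; return outcomes in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(parse_one, p) for p in paths]
    # Leaving the with-block waits for every future to finish.
    outcomes = []
    for path, future in zip(paths, futures):
        try:
            outcomes.append(Outcome(path, "ok", list(future.result())))
        except Exception as exc:  # one bad file should not abort the batch
            outcomes.append(Outcome(path, "error", error=str(exc)))
    return outcomes
```

Threads suit I/O-bound work; for CPU-heavy parsing, ProcessPoolExecutor with a top-level (picklable) parse function usually scales better because of the GIL.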

Multi-Folder Scanning

Process entire directory trees with automatic deduplication:

from bankstatementparser.hybrid import scan_and_ingest

batch = scan_and_ingest("statements/2026/", pattern="**/*.pdf")
print(f"Processed: {len(batch.results)} files")
print(f"Unique transactions: {batch.unique_count}")

Deduplication

Deterministic transaction hashes make incremental ingestion safe to repeat:

from bankstatementparser import CamtParser, Deduplicator

parser = CamtParser("statement.xml")
dedup = Deduplicator()
result = dedup.deduplicate(dedup.from_dataframe(parser.parse()))

print(f"Unique: {len(result.unique_transactions)}")
print(f"Exact duplicates: {len(result.exact_duplicates)}")
print(f"Suspected matches: {len(result.suspected_matches)}")
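A deterministic hash is what makes re-ingesting the same statement harmless. The sketch below hashes a normalized set of fields with SHA-256 and splits exact duplicates from unique rows; the choice of fields (date, amount, currency, description), the dict-shaped transactions, and the helper names are assumptions, and it does not attempt the fuzzy "suspected match" detection that Deduplicator also reports.

```python
import hashlib
from decimal import Decimal


def txn_hash(date, amount, currency, description):
    """Stable SHA-256 key for a transaction after light normalization."""
    normalized = "|".join([
        date.strip(),
        str(Decimal(str(amount)).quantize(Decimal("0.01"))),  # "10" -> "10.00"
        currency.strip().upper(),
        " ".join(description.split()).lower(),  # collapse whitespace, ignore case
    ])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def split_exact_duplicates(transactions):
    """Return (unique, duplicates), keeping the first occurrence of each key."""
    seen, unique, duplicates = set(), [], []
    for txn in transactions:
        key = txn_hash(txn["date"], txn["amount"], txn["currency"], txn["description"])
        (duplicates if key in seen else unique).append(txn)
        seen.add(key)
    return unique, duplicates
```

Because the key depends only on the transaction's content, the same row produces the same hash on every run and on every machine, so a stored set of keys can filter later imports.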

Transaction Categorization (Enrichment)

Automatically categorize transactions using LLM-powered classification:

from bankstatementparser.enrichment import Categorizer

categorizer = Categorizer()
enriched = categorizer.categorize_batch(transactions)
for txn in enriched:
    print(f"{txn.description}: {txn.category}")

Ledger Export (hledger / beancount)

Export transactions to plain-text accounting journal formats:

from bankstatementparser.export import to_hledger, to_beancount

journal = to_hledger(transactions, account="Assets:Bank:Checking")
beancount_journal = to_beancount(transactions, account="Assets:Bank:Checking")
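For orientation, this is roughly what a single entry looks like in each journal format. The sketch assumes amounts are signed from the bank account's point of view (negative means money out) and uses a hypothetical balancing account, Expenses:Unknown; it is not the output of to_hledger or to_beancount, whose exact layout may differ.

```python
from decimal import Decimal


def hledger_entry(date, description, amount, currency, account,
                  counter="Expenses:Unknown"):
    """One hledger transaction; the last posting's amount is left implicit."""
    amt = Decimal(str(amount)).quantize(Decimal("0.01"))
    return (f"{date} {description}\n"
            f"    {account}  {amt} {currency}\n"  # two spaces before the amount
            f"    {counter}\n")


def beancount_entry(date, description, amount, currency, account,
                    counter="Expenses:Unknown"):
    """One beancount transaction with the "*" (cleared) flag."""
    amt = Decimal(str(amount)).quantize(Decimal("0.01"))
    safe = description.replace('"', "'")  # avoid breaking the quoted narration
    return (f'{date} * "{safe}"\n'
            f"  {account}  {amt} {currency}\n"
            f"  {counter}\n")


print(hledger_entry("2026-01-15", "Coffee Shop", "-4.5", "EUR", "Assets:Bank:Checking"))
print(beancount_entry("2026-01-15", "Coffee Shop", "-4.5", "EUR", "Assets:Bank:Checking"))
```

Both tools infer the missing amount on the final posting so the transaction balances; beancount additionally expects an open directive for every account before bean-check accepts the file.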

Multi-Currency Balance Verification

Verify balances independently for each currency group:

from bankstatementparser.hybrid import verify_balance_multi_currency

results = verify_balance_multi_currency(transactions)
for currency, verification in results.items():
    print(f"{currency}: {verification.status}")
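Verifying per currency means grouping before applying the balance equation, so a EUR surplus can never mask a USD shortfall. A minimal sketch, assuming transactions are (currency, signed_amount) pairs with credits positive and debits negative, and balances are dicts keyed by currency; per_currency_check is hypothetical and its status strings merely mirror the labels used above.

```python
from collections import defaultdict
from decimal import Decimal


def per_currency_check(transactions, opening, closing):
    """Check opening + net movement == closing separately for each currency."""
    net = defaultdict(Decimal)  # Decimal() starts each currency at 0
    for currency, amount in transactions:
        net[currency] += Decimal(str(amount))
    results = {}
    for currency in set(opening) | set(closing) | set(net):
        expected = Decimal(str(opening.get(currency, 0))) + net[currency]
        actual = Decimal(str(closing.get(currency, 0)))
        results[currency] = "VERIFIED" if expected == actual else "DISCREPANCY"
    return results


print(per_currency_check(
    [("EUR", "100.00"), ("EUR", "-40.00"), ("USD", "-5.00")],
    opening={"EUR": "10.00", "USD": "50.00"},
    closing={"EUR": "70.00", "USD": "44.00"},
))
```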

REST API

Deploy as a FastAPI microservice:

# Start the API server
bankstatementparser-api --port 8000

# For container deployment
bankstatementparser-api --host 0.0.0.0 --port 9000

Endpoints:

Secure ZIP Processing

Process XML files packed in ZIP archives with built-in security checks (zip-bomb protection and rejection of encrypted entries):

from bankstatementparser import iter_secure_xml_entries, CamtParser

for entry in iter_secure_xml_entries("statements.zip"):
    parser = CamtParser.from_bytes(entry.xml_bytes, source_name=entry.source_name)
    print(f"{entry.source_name}: {len(parser.parse())} transactions")
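The kinds of checks mentioned above can be expressed with the standard zipfile module. This sketch is not iter_secure_xml_entries: the size cap, the compression-ratio cap, and the helper name safe_xml_members are hypothetical values chosen for illustration.

```python
import io
import zipfile

MAX_ENTRY_BYTES = 50 * 1024 * 1024  # hypothetical cap on one uncompressed entry
MAX_RATIO = 100                     # hypothetical cap on uncompressed/compressed size


def safe_xml_members(zip_source):
    """Yield (name, data) for each .xml member that passes basic safety checks."""
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            if info.flag_bits & 0x1:  # general-purpose bit 0 marks encryption
                raise ValueError(f"encrypted entry rejected: {info.filename}")
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"entry too large: {info.filename}")
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ValueError(f"suspicious compression ratio: {info.filename}")
            yield info.filename, zf.read(info)


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("jan.xml", "<Document/>")
    zf.writestr("readme.txt", "not xml")  # skipped: not an .xml member
buf.seek(0)
print(list(safe_xml_members(buf)))
```

The ratio check is what catches a classic zip bomb: a highly repetitive payload compresses to a tiny fraction of its expanded size, so it is rejected from the header metadata before any decompression happens.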

Export

from bankstatementparser import CamtParser

parser = CamtParser("statement.xml")
parser.export_csv("output.csv")
parser.export_json("output.json")

# Polars (requires bankstatementparser[polars])
polars_df = parser.to_polars()

# Excel
parser.camt_to_excel("output.xlsx")

CLI Usage

# Parse structured formats
bankstatementparser --type camt --input statement.xml
bankstatementparser --type pain001 --input payment.xml

# Hybrid PDF pipeline
bankstatementparser --type ingest --input statement.pdf
bankstatementparser --type ingest --input statement.pdf --output ledger.csv

# Interactive review mode
bankstatementparser --type review --input result.json
bankstatementparser --type review --input result.json --output reviewed.json

# Export to CSV, or stream the output
bankstatementparser --type camt --input statement.xml --output transactions.csv
bankstatementparser --type camt --input statement.xml --streaming --show-pii

CLI options:

Local Development Setup

git clone https://github.com/sebastienrousseau/bankstatementparser.git
cd bankstatementparser
python3 -m venv .venv && source .venv/bin/activate
pip install poetry && poetry install --with dev
make install-hooks   # pre-commit hook runs `make verify` before every commit

Running the test suite:

pytest

API Reference

Parser Classes

| Class | Format | Import |
|---|---|---|
| CamtParser | CAMT.053 (ISO 20022) | from bankstatementparser import CamtParser |
| Pain001Parser | PAIN.001 (ISO 20022) | from bankstatementparser import Pain001Parser |
| CsvStatementParser | CSV | from bankstatementparser import CsvStatementParser |
| OfxParser | OFX | from bankstatementparser import OfxParser |
| QfxParser | QFX | from bankstatementparser import QfxParser |
| Mt940Parser | MT940 | from bankstatementparser import Mt940Parser |
| smart_ingest() | PDF (hybrid pipeline) | from bankstatementparser.hybrid import smart_ingest |

Utility Functions

| Function | Purpose |
|---|---|
| detect_statement_format(path) | Auto-detect the file format |
| create_parser(path, fmt) | Create the appropriate parser |
| parse_files_parallel(paths) | Parse multiple files concurrently |
| iter_secure_xml_entries(zip_path) | Safely iterate over ZIP entries |
| smart_ingest(path) | Hybrid PDF extraction with verification |
| scan_and_ingest(dir, pattern) | Multi-folder scanning |
| verify_balance_multi_currency(txns) | Per-currency balance verification |
| to_hledger(txns, account) | Export to hledger journal format |
| to_beancount(txns, account) | Export to beancount journal format |

Data Classes

| Class | Purpose |
|---|---|
| Deduplicator | Detects duplicate transactions |
| DeduplicationResult | Result holding unique transactions, exact duplicates, and suspected matches |
| InputValidator | Validates file paths and formats |
| Transaction | Normalized transaction record |
| FileResult | Result from parallel parsing |
| ZipXMLSource | ZIP member container |
| IngestResult | Hybrid pipeline result with verification |
| VerificationResult | Balance verification result |
| Categorizer | LLM-powered transaction categorization |
| AccountMapper | Regex-based account mapping rules |
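Regex-based account mapping typically means "first matching rule wins, otherwise fall back to a default". The sketch below shows that idea in isolation; the rules, account names, and map_account helper are made-up examples and do not reflect AccountMapper's actual configuration format or method names.

```python
import re

# Hypothetical rules, checked in order; the first matching pattern wins.
RULES = [
    (re.compile(r"\b(starbucks|coffee)\b", re.IGNORECASE), "Expenses:Food:Coffee"),
    (re.compile(r"\bsalary\b", re.IGNORECASE), "Income:Salary"),
]


def map_account(description, rules=RULES, default="Expenses:Unknown"):
    """Return the account of the first rule whose pattern matches."""
    for pattern, account in rules:
        if pattern.search(description):
            return account
    return default


print(map_account("STARBUCKS #123"))  # Expenses:Food:Coffee
print(map_account("Monthly SALARY"))  # Income:Salary
print(map_account("Rent"))            # Expenses:Unknown
```

Precompiling the patterns once keeps per-transaction matching cheap, and ordering the list lets specific rules take precedence over broad ones.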

Exceptions

| Exception | When Raised |
|---|---|
| ParserError | Parsing failures |
| ExportError | Export failures (CSV/JSON/Excel) |
| ValidationError | Input validation failures |
| ZipSecurityError | ZIP security check failures |