TL;DR: Bank Statement Parser is an open-source Python library that parses seven bank statement formats (CAMT.053, PAIN.001, CSV, OFX, QFX, MT940, and PDF) into pandas DataFrames. It includes a hybrid PDF pipeline with balance verification, a REST API, enrichment, ledger export, and throughput of 27K+ tx/s.
Bank Statement Parser is an open-source Python library that parses bank statements from seven formats into structured pandas DataFrames. The deterministic path processes structured formats locally, with no network calls. The optional hybrid PDF pipeline uses local LLMs (via Ollama) for digital and scanned statements.
Who Is This For?
- Treasury teams migrating from MT940 to CAMT.053 that need a parser which handles both legacy and new formats during the transition, as well as PDF statements from banks that do not offer structured data exports.
- Fintech developers building reconciliation, reporting, or accounting pipelines who want a single dependency with balance verification, transaction categorisation, and ledger export.
- Compliance teams that need PII redaction by default, deterministic output, and Golden Rule verification that catches discrepancies before they reach the ledger.
- Plaintext-accounting users who want to ingest PDF bank statements directly into hledger or beancount journals.
- Anyone who refuses to send sensitive financial data to a third-party SaaS when local, open-source tooling can do the job.
Supported Formats
| Format | Standard | File Types | Parser/Method |
|---|---|---|---|
| CAMT.053 | ISO 20022 Bank-to-Customer Statement | .xml | `CamtParser` |
| PAIN.001 | ISO 20022 Credit Transfer Initiation | .xml | `Pain001Parser` |
| CSV | Generic bank export | .csv | `CsvStatementParser` |
| OFX | Open Financial Exchange | .ofx | `OfxParser` |
| QFX | Quicken Financial Exchange | .qfx | `QfxParser` |
| MT940 | SWIFT standard | .mt940, .sta | `Mt940Parser` |
| PDF | Digital and scanned statements | .pdf | `smart_ingest()` |
All formats produce normalised pandas DataFrames with consistent column names, which makes downstream processing format-agnostic.
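The table shows that a file extension alone does not always identify a format: CAMT.053 and PAIN.001 both arrive as `.xml`. The sketch below illustrates one way to resolve this, by checking the ISO 20022 namespace in the file header. `guess_format` and `EXTENSION_TO_FORMAT` are hypothetical names for illustration only; how the library's own `detect_statement_format()` works is not described here and may differ.

```python
from pathlib import Path
from typing import Optional

# Extension-to-format pairs taken from the Supported Formats table.
# Hypothetical helper, not the library's detect_statement_format().
EXTENSION_TO_FORMAT = {
    ".csv": "CSV",
    ".ofx": "OFX",
    ".qfx": "QFX",
    ".mt940": "MT940",
    ".sta": "MT940",
    ".pdf": "PDF",
}


def guess_format(path: str, head: str = "") -> Optional[str]:
    """Guess a statement format from the extension and, for XML, the header text."""
    suffix = Path(path).suffix.lower()
    if suffix == ".xml":
        # ISO 20022 namespaces embed the message id, e.g.
        # urn:iso:std:iso:20022:tech:xsd:camt.053.001.02
        if "camt.053" in head:
            return "CAMT.053"
        if "pain.001" in head:
            return "PAIN.001"
        return None  # XML, but not a message type listed in the table
    return EXTENSION_TO_FORMAT.get(suffix)


camt_head = '<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">'
detected = guess_format("statement.xml", camt_head)
```

Reading only the first few hundred bytes of the file is normally enough to supply `head`, since the namespace is declared on the root element.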
Key Capabilities
- Hybrid PDF Pipeline: `smart_ingest()` routes PDFs through one of three paths (deterministic table extraction, text-LLM, or vision-LLM) with automatic Golden Rule balance verification.
- Automatic Format Detection: `detect_statement_format()` identifies the format; `create_parser()` creates the matching parser.
- Balance Verification: Golden Rule check (`opening + credits − debits == closing`) with VERIFIED/DISCREPANCY/FAILED statuses.
- Multi-Currency Verification: `verify_balance_multi_currency()` groups transactions by currency so each currency is verified independently.
- REST API: FastAPI microservice with `/ingest` and `/health` endpoints for production use.
- Enrichment: LLM-based transaction categorisation with customisable schemas (Plaid 13-category by default).
- Interactive Review: Review discrepancies with accept/edit/skip/delete actions via `--type review`.
- Ledger Export: `to_hledger()` and `to_beancount()` for plaintext-accounting formats.
- Batch Scanning: `scan_and_ingest()` processes folders with automatic deduplication.
- Account Mapping: Regex-based account mapping rules, loaded from a JSON config, for ledger export.
- Streaming Parsing: Process large files (50 MB+, 50K+ transactions) with bounded memory using `parse_streaming()`.
- Parallel Processing: Parse multiple files concurrently with `parse_files_parallel()`, which uses ProcessPoolExecutor.
- Deduplication: Idempotent `transaction_hash` (MD5 fingerprint) for safe incremental ingestion.
- In-Memory Parsing: `from_string()` and `from_bytes()` for SFTP and API workflows without disk I/O.
- Secure ZIP Handling: `iter_secure_xml_entries()` with compression ratio limits, entry size limits, and rejection of encrypted entries.
- Export: CSV, JSON, Excel (`.xlsx`), Polars DataFrames, hledger, and beancount journals.
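To make the Golden Rule concrete, here is a minimal sketch of the check `opening + credits − debits == closing`, applied separately per currency. It is not the library's implementation: the function names and the `currency`/`type`/`amount` field names are assumptions chosen for illustration, and the FAILED status is omitted because its exact meaning is not defined in this overview.

```python
from collections import defaultdict
from decimal import Decimal


def verify_golden_rule(opening, closing, transactions):
    """Return 'VERIFIED' if opening + credits - debits == closing, else 'DISCREPANCY'."""
    credits = sum((t["amount"] for t in transactions if t["type"] == "credit"), Decimal("0"))
    debits = sum((t["amount"] for t in transactions if t["type"] == "debit"), Decimal("0"))
    return "VERIFIED" if opening + credits - debits == closing else "DISCREPANCY"


def verify_per_currency(balances, transactions):
    """Group transactions by currency and check each group on its own.

    balances maps a currency code to an (opening, closing) pair.
    """
    grouped = defaultdict(list)
    for t in transactions:
        grouped[t["currency"]].append(t)
    return {
        currency: verify_golden_rule(opening, closing, grouped[currency])
        for currency, (opening, closing) in balances.items()
    }


# Illustrative data only. Decimal avoids binary floating-point rounding errors.
transactions = [
    {"currency": "EUR", "type": "credit", "amount": Decimal("250.00")},
    {"currency": "EUR", "type": "debit", "amount": Decimal("75.50")},
    {"currency": "USD", "type": "debit", "amount": Decimal("20.00")},
]
balances = {
    "EUR": (Decimal("1000.00"), Decimal("1174.50")),  # 1000 + 250 - 75.50 = 1174.50
    "USD": (Decimal("500.00"), Decimal("470.00")),    # 500 - 20 = 480, not 470
}
results = verify_per_currency(balances, transactions)
# results == {"EUR": "VERIFIED", "USD": "DISCREPANCY"}
```

Checking each currency on its own matters because summing amounts across currencies would let an error in one currency be masked by an offsetting error in another.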
Security and Privacy
- PII Redaction: Names, IBANs, and addresses are masked by default in CLI output. Pass `--show-pii` to display them.
- XXE Protection: The XML parser is configured with `resolve_entities=False`, `no_network=True`, `load_dtd=False`.
- ZIP Bomb Protection: Compression ratio limits (default 100:1), entry size limits (10 MB), and rejection of encrypted entries.
- Path Traversal Prevention: Dangerous patterns are blocked and symlinks are resolved.
- Supply Chain Security: SHA-256 hash-locked dependencies, CycloneDX SBOM, and build provenance attestations.
- Local LLMs Only: The hybrid PDF pipeline uses Ollama for local processing; no data is sent to cloud APIs.
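The ZIP bomb limits listed above can be sketched with the standard library's `zipfile` module. This is an illustration under stated assumptions, not the code behind `iter_secure_xml_entries()`: the function name `check_zip_entries` and the constant names are hypothetical, and only the limit values (100:1 ratio, 10 MB per entry) come from the list above.

```python
import io
import zipfile

MAX_RATIO = 100                      # uncompressed : compressed, default 100:1
MAX_ENTRY_BYTES = 10 * 1024 * 1024   # 10 MB per entry


def check_zip_entries(zf: zipfile.ZipFile) -> list:
    """Return the entry names that pass all checks; raise ValueError otherwise.

    Note: these sizes come from the archive's own headers, which can be
    forged. A robust reader should also cap the bytes actually read
    while decompressing each entry.
    """
    safe = []
    for info in zf.infolist():
        if info.flag_bits & 0x1:  # bit 0 of the general-purpose flags marks encryption
            raise ValueError(f"encrypted entry rejected: {info.filename}")
        if info.file_size > MAX_ENTRY_BYTES:
            raise ValueError(f"entry too large: {info.filename}")
        if info.compress_size > 0 and info.file_size / info.compress_size > MAX_RATIO:
            raise ValueError(f"compression ratio too high: {info.filename}")
        safe.append(info.filename)
    return safe


# Build a small, harmless archive in memory and check it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("statement.xml", "<Document/>")
with zipfile.ZipFile(buffer) as archive:
    accepted = check_zip_entries(archive)
```

An entry made of one megabyte of repeated bytes compresses far beyond 100:1 under DEFLATE, so it would be rejected by the ratio check even though it is under the 10 MB size limit.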
Performance
| Metric | Value |
|---|---|
| CAMT.053 throughput | 27,000+ tx/s |
| PAIN.001 throughput | 52,000+ tx/s |
| Per-transaction latency (CAMT) | 37 microseconds |
| Per-transaction latency (PAIN.001) | 19 microseconds |
| Time to first result | < 2 ms |
| Memory footprint (1K-50K tx) | Constant (streaming) |
| Test coverage | 100% branch coverage |
| Tests | 718 across 29 test files |
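The latency rows are consistent with the throughput rows if per-transaction latency is taken as the reciprocal of throughput. That relationship is an assumption here, since the measurement method is not stated; the short calculation below simply checks the arithmetic.

```python
def per_tx_latency_us(tx_per_second: float) -> float:
    """Average time per transaction in microseconds, assuming latency = 1 / throughput."""
    return 1_000_000 / tx_per_second


camt_latency = per_tx_latency_us(27_000)   # about 37.04 microseconds
pain_latency = per_tx_latency_us(52_000)   # about 19.23 microseconds

# At the CAMT.053 rate, a 50,000-transaction file takes roughly 1.85 seconds.
camt_50k_seconds = 50_000 / 27_000
```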
Start Building
"GitHub repository"