TL;DR: Bank Statement Parser is an open-source Python library that parses seven bank statement formats (CAMT.053, PAIN.001, CSV, OFX, QFX, MT940, and PDF) into pandas DataFrames. It includes a hybrid PDF pipeline with balance verification, a REST API, enrichment, ledger export, and throughput of 27K+ tx/s.
Bank Statement Parser is an open-source Python library that parses bank statements from seven formats into structured pandas DataFrames. The deterministic core processes structured formats locally and never makes a network call. An optional hybrid PDF pipeline routes digital and scanned statements through local LLMs (via Ollama).
Who Is This For?
- Treasury teams migrating from MT940 to CAMT.053 who need a parser that handles both legacy and new formats during the transition, plus PDF statements from banks that offer no structured export.
- Fintech developers building reconciliation, reporting, or accounting pipelines who want a single dependency with built-in balance verification, categorisation, and ledger export.
- Compliance teams who need PII redaction by default, deterministic output, and Golden Rule verification that flags discrepancies before they reach the ledger.
- Plain-text accounting users who want automated import from PDF bank statements straight into hledger or beancount journals.
- Anyone who refuses to send sensitive financial data to a third-party SaaS when an open-source local tool can do the job.
Supported Formats
| Format | Standard | File Types | Parser/Method |
|---|---|---|---|
| CAMT.053 | ISO 20022 Bank-to-Customer Statement | .xml | `CamtParser` |
| PAIN.001 | ISO 20022 Credit Transfer Initiation | .xml | `Pain001Parser` |
| CSV | Generic bank export | .csv | `CsvStatementParser` |
| OFX | Open Financial Exchange | .ofx | `OfxParser` |
| QFX | Quicken Financial Exchange | .qfx | `QfxParser` |
| MT940 | SWIFT standard | .mt940, .sta | `Mt940Parser` |
| PDF | Digital and scanned statements | .pdf | `smart_ingest()` |
All formats produce consistent pandas DataFrames with matching column names, so downstream processing is format-agnostic.
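
Two of the supported formats share the `.xml` extension, so a file extension alone cannot select a parser. The following is a minimal sketch of content-based detection, not the library's own `detect_statement_format()`. The function name `sniff_format`, the returned labels, and the extension map are hypothetical; the sketch only assumes that CAMT.053 and PAIN.001 documents declare ISO 20022 namespaces containing `camt.053` and `pain.001` respectively.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath

# Hypothetical map from unambiguous extensions to format labels.
EXTENSION_FORMATS = {
    ".csv": "CSV",
    ".ofx": "OFX",
    ".qfx": "QFX",
    ".mt940": "MT940",
    ".sta": "MT940",
    ".pdf": "PDF",
}


def sniff_format(filename: str, content: bytes = b"") -> str:
    """Guess a statement format from the extension and, for XML, the root namespace."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in EXTENSION_FORMATS:
        return EXTENSION_FORMATS[suffix]
    if suffix == ".xml":
        try:
            # Illustration only: untrusted XML should go through a hardened parser.
            root = ET.fromstring(content)
        except ET.ParseError:
            return "UNKNOWN"
        # ElementTree renders namespaced tags as "{namespace}LocalName".
        namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
        if "camt.053" in namespace:
            return "CAMT.053"
        if "pain.001" in namespace:
            return "PAIN.001"
    return "UNKNOWN"
```

Inspecting the root namespace keeps the check cheap, since only the document element is examined rather than the full message body.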
Core Capabilities
- Hybrid PDF Pipeline: `smart_ingest()` routes PDFs through one of three paths (deterministic table extraction, LLM text, or LLM vision) with automatic Golden Rule balance verification.
- Automatic Format Detection: `detect_statement_format()` identifies the format; `create_parser()` instantiates the right parser.
- Balance Verification: Golden Rule check (`opening + credits − debits == closing`) with VERIFIED/DISCREPANCY/FAILED status.
- Multi-Currency Verification: `verify_balance_multi_currency()` groups transactions by currency for independent verification.
- REST API: FastAPI microservice with `/ingest` and `/health` endpoints for production deployment.
- Enrichment: LLM-driven transaction categorisation with customisable schemes (default: 13 Plaid categories).
- Interactive Review: Walk through discrepancies with accept/edit/skip/quit actions via `--type review`.
- Ledger Export: `to_hledger()` and `to_beancount()` for plain-text accounting workflows.
- Batch Scanning: `scan_and_ingest()` processes folder trees with automatic deduplication across files.
- Account Mapping: Regex-based account mapping rules, loaded from a JSON configuration, for ledger export.
- Streaming Parsing: Process large files (50 MB+, 50K+ transactions) with bounded memory using `parse_streaming()`.
- Parallel Processing: Parse multiple files concurrently with `parse_files_parallel()` using ProcessPoolExecutor.
- Deduplication: Stable `transaction_hash` (MD5 digest) for safe incremental ingestion.
- In-Memory Parsing: `from_string()` and `from_bytes()` for SFTP and API streams without disk I/O.
- Secure ZIP Processing: `iter_secure_xml_entries()` with compression-ratio limits, entry-size caps, and rejection of encrypted entries.
- Export: CSV, JSON, Excel (`.xlsx`), Polars DataFrames, hledger, and beancount journals.
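
The Golden Rule check and its per-currency variant can be sketched in a few lines. This is an illustration under stated assumptions, not the library's `verify_balance_multi_currency()`: the function names, the input shapes, and the choice to report `FAILED` when a balance is missing are all hypothetical. Amounts use `Decimal` so the equality test is exact.

```python
from collections import defaultdict
from decimal import Decimal


def verify_balance(opening, closing, transactions):
    """Apply opening + credits - debits == closing to one currency.

    transactions is a list of (amount, kind) pairs, kind being 'credit' or 'debit'.
    """
    if opening is None or closing is None:
        # Assumption: a missing balance means the check cannot run at all.
        return "FAILED"
    credits = sum((amt for amt, kind in transactions if kind == "credit"), Decimal("0"))
    debits = sum((amt for amt, kind in transactions if kind == "debit"), Decimal("0"))
    return "VERIFIED" if opening + credits - debits == closing else "DISCREPANCY"


def verify_by_currency(balances, transactions):
    """Group transactions by currency and verify each group independently.

    balances maps currency -> (opening, closing); transactions is a list of
    dicts with 'currency', 'amount', and 'kind' keys (hypothetical shape).
    """
    grouped = defaultdict(list)
    for tx in transactions:
        grouped[tx["currency"]].append((tx["amount"], tx["kind"]))
    return {
        currency: verify_balance(opening, closing, grouped[currency])
        for currency, (opening, closing) in balances.items()
    }
```

Verifying each currency on its own keeps a mismatch in one currency from masking, or being masked by, movements in another.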
Security and Privacy
- PII Redaction: Names, IBANs, and addresses are masked by default in CLI output. Opt in to unmasked output with `--show-pii`.
- XXE Protection: XML parsing uses `resolve_entities=False`, `no_network=True`, `load_dtd=False`.
- ZIP Bomb Protection: Compression-ratio limits (default 100:1), entry-size caps (10 MB), and rejection of encrypted entries.
- Path Traversal Prevention: Dangerous-pattern blocklist and symlink resolution.
- Supply Chain Security: SHA-256 hash-pinned dependencies, CycloneDX SBOM, and build provenance attestation.
- Local-Only LLMs: The hybrid PDF pipeline uses Ollama for local inference, so no data is sent to cloud APIs.
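
The ZIP bomb protections listed above can be illustrated with the standard library alone. This is a sketch under stated assumptions rather than the library's `iter_secure_xml_entries()`: the function name `safe_xml_entries` and the constant names are hypothetical, and the limits simply mirror the 100:1 and 10 MB defaults quoted above.

```python
import io
import zipfile

MAX_RATIO = 100                      # compression-ratio cap (100:1)
MAX_ENTRY_BYTES = 10 * 1024 * 1024   # entry-size cap (10 MB)


def safe_xml_entries(zip_bytes: bytes):
    """Yield (name, data) for .xml entries that pass the safety checks."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            # Bit 0 of the general-purpose flags marks an encrypted entry.
            if info.flag_bits & 0x1:
                raise ValueError(f"encrypted entry rejected: {info.filename}")
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"entry too large: {info.filename}")
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ValueError(f"compression ratio too high: {info.filename}")
            # Bounded read, so a misleading header cannot force a huge allocation.
            with archive.open(info) as handle:
                data = handle.read(MAX_ENTRY_BYTES + 1)
            if len(data) > MAX_ENTRY_BYTES:
                raise ValueError(f"entry exceeds size cap: {info.filename}")
            yield info.filename, data
```

Checking the declared sizes before decompressing rejects obvious bombs cheaply, while the bounded read guards against archives whose headers understate the real payload.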
Performance
| Metric | Value |
|---|---|
| CAMT.053 throughput | 27,000+ tx/s |
| PAIN.001 throughput | 52,000+ tx/s |
| Per-transaction latency (CAMT) | 37 microseconds |
| Per-transaction latency (PAIN.001) | 19 microseconds |
| Time to first result | < 2 ms |
| Memory footprint (1K-50K tx) | Constant (streaming) |
| Test coverage | 100% branch coverage |
| Tests | 718 across 29 test files |
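
The per-transaction latency figures appear consistent with the throughput figures read as reciprocals, assuming transactions are processed one after another in a single stream. A quick arithmetic check (the helper name `per_tx_latency_us` is just for illustration):

```python
def per_tx_latency_us(tx_per_second: float) -> float:
    """Average time per transaction, in microseconds, for a given throughput."""
    return 1_000_000 / tx_per_second


for name, throughput in {"CAMT.053": 27_000, "PAIN.001": 52_000}.items():
    print(f"{name}: {per_tx_latency_us(throughput):.0f} microseconds per transaction")
# CAMT.053: 37 microseconds per transaction
# PAIN.001: 19 microseconds per transaction
```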
Start Building
Start with installation and examples
GitHub repository