Bank Statement Parser is an open-source Python library that parses bank statements in seven formats (CAMT.053, PAIN.001, CSV, OFX, QFX, MT940, and PDF) into structured pandas DataFrames. Everything runs locally, with deterministic output, automatic PII redaction, and an optional hybrid PDF pipeline that calls local LLMs only when needed.
Get Started in Seconds
```shell
pip install bankstatementparser
```

```python
from bankstatementparser import create_parser, detect_statement_format

fmt = detect_statement_format("statement.xml")
parser = create_parser("statement.xml", fmt)
df = parser.parse()  # pandas DataFrame, ready to use

# Parse PDFs with the hybrid pipeline (v0.0.5+)
from bankstatementparser.hybrid import smart_ingest

result = smart_ingest("statement.pdf")
print(result.source_method)  # "deterministic" | "llm" | "vision"
print(result.verification.status)  # VERIFIED | DISCREPANCY | FAILED
```
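`detect_statement_format()` has to decide what a file is before a parser can be chosen. One common approach is to sniff the first few kilobytes for format signatures. The sketch below is a hypothetical `sniff_format()` helper written purely for illustration; its signature checks and the labels it returns are assumptions, not the library's actual detection logic.

```python
def sniff_format(head: bytes) -> str:
    """Guess a statement format from the first few kilobytes of a file.

    Hypothetical helper for illustration; the returned labels are not
    necessarily the ones detect_statement_format() uses.
    """
    if head.startswith(b"%PDF-"):
        return "pdf"
    # ISO 20022 XML carries the message type in its namespace URI.
    if b"camt.053" in head:
        return "camt053"
    if b"pain.001" in head:
        return "pain001"
    upper = head.upper()
    if b"OFXHEADER" in upper or b"<OFX>" in upper:
        return "ofx"  # QFX is OFX-based and needs extra checks to tell apart
    # MT940 statements carry a :20: reference and an opening balance in :60F: or :60M:.
    if b":20:" in head and (b":60F:" in head or b":60M:" in head):
        return "mt940"
    return "csv"  # fallback: treat anything else as delimited text
```

You would pass it the head of a file, for example `sniff_format(Path("statement.xml").read_bytes()[:4096])`. Content sniffing like this is cheap, but ambiguous cases such as OFX versus QFX usually need the file extension or deeper inspection.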
One Library, Seven Formats
Parse CAMT.053, PAIN.001, CSV, OFX, QFX, MT940, and PDF into structured pandas DataFrames through a single, unified API. There is no need to install a separate package for each format.
| Feature | Bank Statement Parser | Single-format OSS (mt940, ofxparse) | SaaS (Ocrolus, Parseur) |
|---|---|---|---|
| Supported formats | 7, unified API | 1 each | Many (via OCR) |
| PDF support | Hybrid pipeline (deterministic + LLM + vision) | No | Yes (cloud OCR) |
| Data privacy | 100% local (LLMs run via Ollama) | 100% local | Data sent off-premises |
| Cost | Free, Apache 2.0 | Free | $49-$1,000+/mo |
| Balance verification | Golden Rule (opening + credits − debits = closing) | No | Varies |
| PII redaction | Built-in, enabled by default | No | Varies |
| Streaming | Bounded memory | No | N/A |
| REST API | Built-in FastAPI microservice | No | Yes |
| Deduplication | Idempotent transaction hashes | No | Some |
| Ledger export | hledger + beancount | No | No |
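Idempotent transaction hashes mean that re-importing the same statement yields the same IDs, so duplicates can be dropped safely. A minimal sketch of that idea, assuming a hypothetical `transaction_hash()` over a handful of normalized fields; the field choice and normalization rules are illustrative, not the library's scheme.

```python
import hashlib
from decimal import Decimal


def transaction_hash(account: str, booking_date: str, amount: Decimal,
                     currency: str, description: str) -> str:
    """Return a stable SHA-256 ID for a transaction's identifying fields."""
    # Normalize fields so cosmetic differences (spacing, case, trailing
    # zeros) produce the same hash on every import.
    parts = [
        account.replace(" ", "").upper(),
        booking_date,
        format(amount.normalize(), "f"),
        currency.upper(),
        " ".join(description.split()).lower(),
    ]
    # A unit-separator character keeps field boundaries unambiguous.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def deduplicate(rows: list[dict]) -> list[dict]:
    """Keep the first occurrence of each transaction hash, preserving order."""
    seen: set[str] = set()
    unique = []
    for row in rows:
        key = transaction_hash(**row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Two genuine transactions with identical fields (for example, two equal card payments on the same day) would collide under this sketch, so a production scheme would typically also fold in a bank reference or sequence number when one is available.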
Hybrid PDF Pipeline
Bank Statement Parser v0.0.5+ includes a three-path hybrid pipeline for PDF bank statements:
- Path A (Deterministic): Well-structured PDF tables are parsed directly. Free, fastest, and no LLM required.
- Path B (Text-LLM): Digital PDFs with complex layouts are extracted by a local LLM (LiteLLM/Ollama).
- Path C (Vision-LLM): Scanned or photocopied statements are processed with a vision model.
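The three paths suggest a cheapest-first cascade. The sketch below illustrates one plausible routing rule; the `PdfProbe` fields and the decision order are assumptions, and only the route names are taken from the `result.source_method` values in the quick start.

```python
from dataclasses import dataclass
from enum import Enum


class Route(str, Enum):
    DETERMINISTIC = "deterministic"
    LLM = "llm"
    VISION = "vision"


@dataclass
class PdfProbe:
    """Hypothetical summary of what a quick inspection of the PDF found."""
    has_text_layer: bool    # False for scans/photocopies with no embedded text
    tables_parsed: bool     # True if a rule-based table extractor found rows
    balance_verified: bool  # True if those rows satisfied the Golden Rule


def choose_route(probe: PdfProbe) -> Route:
    """Pick the cheapest path that can plausibly handle the PDF."""
    if not probe.has_text_layer:
        return Route.VISION  # nothing to read as text, so fall back to a vision model
    if probe.tables_parsed and probe.balance_verified:
        return Route.DETERMINISTIC  # rules alone produced a balanced result
    return Route.LLM  # text exists but the layout defeated the rule-based parser
```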
Every extraction is verified against the Golden Rule: opening balance + credits − debits == closing balance.
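Expressed in code, the Golden Rule is a single comparison. This sketch uses `Decimal` to avoid binary floating-point rounding; the `golden_rule_holds()` name and the `tolerance` parameter are hypothetical, and amounts are assumed to be signed (credits positive, debits negative).

```python
from decimal import Decimal
from typing import Iterable


def golden_rule_holds(opening: Decimal, closing: Decimal,
                      amounts: Iterable[Decimal],
                      tolerance: Decimal = Decimal("0")) -> bool:
    """Return True if opening + credits - debits == closing (within tolerance).

    Amounts are signed: credits positive, debits negative.
    """
    amounts = list(amounts)
    credits = sum((a for a in amounts if a > 0), Decimal("0"))
    debits = sum((-a for a in amounts if a < 0), Decimal("0"))
    return abs(opening + credits - debits - closing) <= tolerance
```

For example, an opening balance of 100.00, a 50.00 credit, and a 24.50 debit must close at exactly 125.50; any other closing figure flags a discrepancy.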
Built for the ISO 20022 Migration
SWIFT has set firm deadlines: all financial institutions must accept CAMT.053 by November 2027, and MT940/MT942/MT950 will be fully retired in November 2028. Bank Statement Parser handles both legacy MT940 and modern ISO 20022 formats (CAMT.053, PAIN.001) through a single API, so your pipeline keeps working during the transition and after it.
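Handling both generations in one API means mapping their different encodings onto one record shape. The sketch below normalizes an MT940 `:61:` statement line and a CAMT.053 `<Ntry>` element into the same dictionary. The helper names, the output keys, and the `camt.053.001.02` namespace version are assumptions for illustration; real files carry more fields (references, remittance information) than shown here.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date, datetime
from decimal import Decimal

# Assumed CAMT.053 schema version; real files may use newer versions.
NS = {"c": "urn:iso:std:iso:20022:tech:xsd:camt.053.001.02"}
# :61: value date (YYMMDD), optional entry date (MMDD), C/D/RC/RD mark,
# optional funds code letter, then the amount with a comma decimal separator.
MT940_61 = re.compile(r"^:61:(\d{6})(\d{4})?(RC|RD|C|D)([A-Z])?(\d+,\d*)")


def from_mt940_61(line: str) -> dict:
    """Normalize an MT940 :61: line into {value_date, signed amount}."""
    m = MT940_61.match(line)
    if m is None:
        raise ValueError(f"not an MT940 :61: line: {line!r}")
    value_date = datetime.strptime(m.group(1), "%y%m%d").date()
    amount = Decimal(m.group(5).replace(",", "."))
    # C and RD (reversal of debit) increase the balance; D and RC decrease it.
    signed = amount if m.group(3) in ("C", "RD") else -amount
    return {"value_date": value_date, "amount": signed}


def from_camt053_entry(ntry: ET.Element) -> dict:
    """Normalize a CAMT.053 <Ntry> element into the same record shape."""
    amount = Decimal(ntry.findtext("c:Amt", namespaces=NS))
    indicator = ntry.findtext("c:CdtDbtInd", namespaces=NS)  # "CRDT" or "DBIT"
    value_date = date.fromisoformat(ntry.findtext("c:ValDt/c:Dt", namespaces=NS))
    return {"value_date": value_date,
            "amount": amount if indicator == "CRDT" else -amount}
```

Once both formats produce the same records, everything downstream (balance checks, deduplication, export) can stay format-agnostic.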
Performance
- 27,000+ transactions/second for CAMT.053 parsing
- 52,000+ transactions/second for PAIN.001 parsing
- < 2 ms time to first result
- Flat memory usage from 1K to 50K+ transactions via streaming
- 718 tests with 100% branch coverage across Python 3.10 to 3.14
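Flat memory usage while streaming typically comes from handling each entry as soon as it is parsed and then discarding it. A stdlib-only sketch of that pattern for CAMT.053 using `xml.etree.ElementTree.iterparse`; the function name and namespace version are assumptions, and this is not the library's streaming implementation.

```python
import io
import xml.etree.ElementTree as ET
from decimal import Decimal
from typing import BinaryIO, Iterator

# Assumed CAMT.053 namespace version; adjust for the schema your bank uses.
NS = "{urn:iso:std:iso:20022:tech:xsd:camt.053.001.02}"


def stream_entry_amounts(source: BinaryIO) -> Iterator[Decimal]:
    """Yield each <Ntry> amount, detaching entries once they have been read."""
    stack = []  # currently open elements, innermost last
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()
        if elem.tag == NS + "Ntry":
            yield Decimal(elem.findtext(NS + "Amt"))
            if stack:
                # Remove the finished entry from its parent so processed
                # entries do not accumulate in the in-memory tree.
                stack[-1].remove(elem)


if __name__ == "__main__":
    sample = io.BytesIO(
        b'<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">'
        b"<BkToCstmrStmt><Stmt>"
        b'<Ntry><Amt Ccy="EUR">10.50</Amt></Ntry>'
        b'<Ntry><Amt Ccy="EUR">4.25</Amt></Ntry>'
        b"</Stmt></BkToCstmrStmt></Document>"
    )
    print(list(stream_entry_amounts(sample)))
```

Because the generator yields one entry at a time and prunes the tree behind it, memory stays roughly constant regardless of how many entries the statement contains.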
Why Bank Statement Parser?
- Hybrid PDF Extraction: `smart_ingest()` handles both digital and scanned PDFs with automatic routing and balance verification.
- Automatic Format Detection: `detect_statement_format()` identifies the file type and `create_parser()` returns the matching parser.
- Privacy First: PII redaction is enabled by default. LLMs run locally via Ollama, so no data leaves your machine.
- REST API: Deploy as a FastAPI microservice with `/ingest` and `/health` endpoints.
- Enrichment: LLM-based transaction categorization with customizable schemas (the Plaid 13-category taxonomy by default).
- Ledger Export: Export to hledger and beancount for plaintext-accounting workflows.
- Batch Scanning: `scan_and_ingest()` processes entire folders with automatic deduplication.
- Multi-Currency: `verify_balance_multi_currency()` runs the Golden Rule separately for each currency group.
- Production-Ready: Safe ZIP ingestion, input validation, path-traversal prevention, and an interactive review mode.
- Flexible Export: Export to CSV, JSON, Excel, Polars, hledger, or beancount.
- Parallel Processing: Parse many files concurrently with `parse_files_parallel()`.
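`verify_balance_multi_currency()` applies the Golden Rule per currency group. The idea can be sketched as follows; `verify_per_currency()` and its argument shapes are hypothetical and do not reflect the library's actual signature.

```python
from collections import defaultdict
from decimal import Decimal


def verify_per_currency(opening: dict[str, Decimal],
                        closing: dict[str, Decimal],
                        transactions: list[tuple[str, Decimal]]) -> dict[str, bool]:
    """Apply opening + credits - debits == closing separately for each currency."""
    net: defaultdict[str, Decimal] = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for currency, amount in transactions:
        net[currency] += amount  # credits positive, debits negative
    currencies = set(opening) | set(closing) | set(net)
    return {
        ccy: opening.get(ccy, Decimal("0")) + net[ccy] == closing.get(ccy, Decimal("0"))
        for ccy in sorted(currencies)
    }
```

Grouping before comparing matters because summing EUR and USD amounts together could hide an error in one currency behind an offsetting error in another.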
Built for Production
Bank Statement Parser is designed for treasury teams, fintech developers, and compliance officers who handle sensitive financial data. The library is used in MT940-to-CAMT migration pipelines, automated reconciliation systems, PDF statement ingestion, and regulatory audit workflows at financial institutions.
- SHA-256 hash-locked dependencies with a CycloneDX SBOM for every release
- Deterministic output: identical input produces byte-identical results on every run
- Apache 2.0 licensed: free to use in commercial and internal systems