TL;DR: Bank Statement Parser processes all data locally, redacts PII by default, hardens XML parsing against XXE attacks, runs LLMs locally via Ollama, and ships with SHA-256 hash-locked dependencies and a CycloneDX SBOM.
Security by Design
Bank Statement Parser is built for processing sensitive financial data. Every design decision prioritises security, privacy, and auditability.
Zero Cloud Dependency
All processing happens locally within your runtime. The deterministic parsers never make network calls. The hybrid PDF pipeline uses Ollama for local LLM inference; no data is sent to cloud APIs. The XML parsers are explicitly configured with no_network=True, resolve_entities=False, and load_dtd=False to prevent any outbound access.
PII Redaction
Personally identifiable information (names, IBANs, postal addresses) is redacted automatically in CLI output and in streaming mode. Redaction is on by default.
- CLI: Sensitive fields are shown as ***REDACTED***
- Streaming: parse_streaming(redact_pii=True) (default)
- Exports: CSV/JSON/Excel retain the full data for downstream processing
- Opt-in: Use --show-pii or redact_pii=False when you need unredacted output
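The default-on masking described above can be sketched as follows. This is a minimal illustration, not the parser's own implementation: the function redact_record and the field names in SENSITIVE_FIELDS are hypothetical, and only the ***REDACTED*** marker and the redact_pii flag name come from the text.

```python
from typing import Any

REDACTED = "***REDACTED***"
# Hypothetical field names; the real parser's schema may differ.
SENSITIVE_FIELDS = frozenset({"name", "iban", "address"})


def redact_record(record: dict[str, Any], redact_pii: bool = True) -> dict[str, Any]:
    """Return a copy of the record with sensitive fields masked (on by default)."""
    if not redact_pii:
        return dict(record)
    return {
        key: REDACTED if key in SENSITIVE_FIELDS and value is not None else value
        for key, value in record.items()
    }


transaction = {"name": "Jane Doe", "iban": "DE89370400440532013000", "amount": "12.50"}
masked = redact_record(transaction)                      # default: PII masked
unmasked = redact_record(transaction, redact_pii=False)  # explicit opt-in to raw data
```

Returning a copy keeps the original record intact, which matches the idea that exports can still carry the full data while display paths show masked values.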
XML Security (XXE Protection)
All XML parsing uses lxml with hardened settings:
- resolve_entities=False -- prevents XML entity expansion attacks
- no_network=True -- blocks all outbound network access from the parser
- load_dtd=False -- prevents DTD-based attacks
- Namespace stripping before processing -- handles any CAMT.053 variant safely
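As a rough sketch of these settings, the snippet below builds an lxml parser with the three options named above and feeds it a document that declares an internal entity. The helper name make_hardened_parser and the payload are illustrative assumptions; the project's actual parser setup may differ in detail.

```python
from lxml import etree


def make_hardened_parser() -> etree.XMLParser:
    """Build an lxml parser with the three hardening options listed above."""
    return etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        no_network=True,         # no network lookups while parsing
        load_dtd=False,          # do not load external DTDs
    )


# An internal entity stands in for attacker-controlled expansion content.
PAYLOAD = b"""<?xml version="1.0"?>
<!DOCTYPE doc [<!ENTITY leak "SECRET">]>
<doc>&leak;</doc>"""

root = etree.fromstring(PAYLOAD, parser=make_hardened_parser())
serialized = etree.tostring(root)  # the entity's replacement text is not inlined
```

Because the entity reference is never resolved, its replacement text does not appear in the parsed element content.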
ZIP Archive Security
iter_secure_xml_entries() validates every ZIP member before extraction:
- Entry size cap: 10 MB per entry (configurable)
- Total size cap: 50 MB total uncompressed (configurable)
- Compression ratio limit: 100:1 by default -- detects ZIP bombs
- Encrypted entry rejection: Encrypted entries are skipped with a warning
- No disk writes: XML bytes pass directly to the parser via from_bytes()
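The checks above could look roughly like the following standard-library sketch. It is not the source of iter_secure_xml_entries(); the helper iter_checked_xml is hypothetical, and raising ValueError on a limit violation is an assumption, since the text does not say whether the real function raises or skips.

```python
import io
import zipfile
from typing import Iterator

MAX_ENTRY_BYTES = 10 * 1024 * 1024  # 10 MB per entry
MAX_TOTAL_BYTES = 50 * 1024 * 1024  # 50 MB uncompressed in total
MAX_RATIO = 100                     # 100:1 compression ratio
ENCRYPTED_FLAG = 0x1                # bit 0 of the ZIP general-purpose flags


def iter_checked_xml(zip_bytes: bytes) -> Iterator[tuple[str, bytes]]:
    """Yield (name, xml_bytes) for XML entries that pass every check."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            if info.flag_bits & ENCRYPTED_FLAG:
                continue  # encrypted entry: skipped (the real tool also warns)
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"{info.filename}: entry exceeds size cap")
            if info.file_size > MAX_RATIO * max(info.compress_size, 1):
                raise ValueError(f"{info.filename}: suspicious compression ratio")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("archive exceeds total uncompressed size cap")
            # Read in memory only; never more than the declared size plus one byte.
            with archive.open(info) as member:
                data = member.read(info.file_size + 1)
            if len(data) > info.file_size:
                raise ValueError(f"{info.filename}: size mismatch")
            yield info.filename, data


def build_zip(entries: dict[str, bytes]) -> bytes:
    """Test helper: build a deflated ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in entries.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


good = build_zip({"stmt.xml": b"<Document/>", "notes.txt": b"ignored"})
entries = list(iter_checked_xml(good))

try:
    list(iter_checked_xml(build_zip({"bomb.xml": b"0" * 1_000_000})))
    bomb_rejected = False
except ValueError:
    bomb_rejected = True
```

The limits are checked against the archive's declared sizes before any decompression, and the bounded read guards against headers that understate the real size.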
Path Traversal Prevention
Input validation blocks dangerous file paths:
- Null bytes, directory traversal sequences (../), and symlinks are rejected
- File extension validation against expected formats
- File size limits (100 MB by default, configurable)
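A minimal sketch of this kind of input validation is shown below. The function validate_input_path, the helper is_rejected, and the exact extension set are assumptions made for illustration; only the categories of check and the 100 MB default come from the list above.

```python
import tempfile
from pathlib import Path

# Assumed extension set; the project's accepted formats may differ.
ALLOWED_EXTENSIONS = frozenset({".xml", ".csv", ".ofx", ".qfx", ".pdf", ".zip"})
MAX_FILE_BYTES = 100 * 1024 * 1024  # 100 MB default


def validate_input_path(raw: str) -> Path:
    """Return a Path if the input passes every check, else raise ValueError."""
    if "\x00" in raw:
        raise ValueError("null byte in path")
    path = Path(raw)
    if ".." in path.parts:
        raise ValueError("directory traversal sequence in path")
    if path.is_symlink():
        raise ValueError("symlinks are not accepted")
    if path.suffix.lower() not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unexpected file extension: {path.suffix!r}")
    if not path.is_file():
        raise ValueError("not a regular file")
    if path.stat().st_size > MAX_FILE_BYTES:
        raise ValueError("file exceeds size limit")
    return path


def is_rejected(raw: str) -> bool:
    """Convenience wrapper: True when validation raises."""
    try:
        validate_input_path(raw)
    except ValueError:
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "statement.xml"
    sample.write_bytes(b"<Document/>")
    accepted = validate_input_path(str(sample)) == sample
```

Checking the raw string for null bytes before constructing a Path keeps the cheapest rejection first and avoids passing a malformed string to filesystem calls.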
Balance Verification (Golden Rule)
Every PDF extraction is verified against the equation: opening balance + credits − debits == closing balance. Results are marked VERIFIED, DISCREPANCY, or FAILED. Discrepancies can be reviewed interactively with --type review.
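The equation above can be checked with exact decimal arithmetic, as in this sketch. The function verify_balance is hypothetical, and treating a missing opening or closing balance as FAILED is an assumption; the text names the three statuses but does not define when FAILED applies.

```python
from decimal import Decimal
from typing import Optional, Sequence


def verify_balance(
    opening: Optional[Decimal],
    closing: Optional[Decimal],
    credits: Sequence[Decimal],
    debits: Sequence[Decimal],
) -> str:
    """Apply opening + credits - debits == closing and return a status label."""
    if opening is None or closing is None:
        return "FAILED"  # assumption: the check cannot run without both balances
    expected = opening + sum(credits, Decimal("0")) - sum(debits, Decimal("0"))
    return "VERIFIED" if expected == closing else "DISCREPANCY"


status_ok = verify_balance(
    Decimal("100.00"), Decimal("130.50"), [Decimal("50.50")], [Decimal("20.00")]
)
status_bad = verify_balance(
    Decimal("100.00"), Decimal("130.00"), [Decimal("50.50")], [Decimal("20.00")]
)
status_missing = verify_balance(None, Decimal("130.50"), [], [])
```

Decimal is used instead of float so that cent-level amounts compare exactly, which matters when equality is the pass criterion.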
Deterministic Output
For structured formats (CAMT, PAIN.001, CSV, OFX, QFX, MT940), the parser produces byte-identical output on every run given the same input file. No randomness, no model inference, no heuristic sampling. This matters for:
- Audit reproducibility: Run the same file twice and compare the output
- Regulatory compliance: Demonstrate consistent processing
- CI verification: 718 tests enforce determinism with 100% branch coverage
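One way to run the "parse twice and compare" check is to hash the serialized output of each run. In the sketch below, parse_statement is a hypothetical stand-in for a deterministic parser (it is not the project's API), and the canonical JSON serialization is an assumption chosen so that equal results always produce equal bytes.

```python
import hashlib
import json


def parse_statement(raw: bytes) -> list[dict[str, str]]:
    """Hypothetical stand-in parser: one record per 'date,amount' line."""
    rows = []
    for line in raw.decode("utf-8").splitlines():
        date, amount = line.split(",")
        rows.append({"date": date, "amount": amount})
    return rows


def output_digest(raw: bytes) -> str:
    """Serialize the parsed output canonically and return its SHA-256 digest."""
    serialized = json.dumps(
        parse_statement(raw), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


SAMPLE = b"2024-01-02,12.50\n2024-01-03,-4.00\n"
first = output_digest(SAMPLE)
second = output_digest(SAMPLE)
changed = output_digest(b"2024-01-02,12.51\n2024-01-03,-4.00\n")
```

Matching digests across runs give a compact audit record, while any change in the input or the parsing logic shows up as a different digest.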
Supply Chain Security
- SHA-256 hash-locked dependencies: Every package in poetry.lock has verified file hashes
- CycloneDX SBOM: Every release includes a Software Bill of Materials
- GitHub build provenance: Attestation links each artifact to its source commit
- Signed commits: All commits are SSH-signed and verified in CI
- Dependency verification: scripts/verify_locked_hashes.py verifies all hashes locally
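The core of a hash-locked check is comparing a file's SHA-256 digest with the value recorded in the lock file. The sketch below illustrates that step only; it is not the content of scripts/verify_locked_hashes.py, and the helper names sha256_of and matches_locked_hash are hypothetical. It assumes the "sha256:<hex>" notation that lock files commonly use for artifact hashes.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_locked_hash(path: Path, locked: str) -> bool:
    """Compare a file against a 'sha256:<hex>' entry."""
    algorithm, _, expected = locked.partition(":")
    if algorithm != "sha256":
        return False
    return hmac.compare_digest(sha256_of(path), expected.lower())


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "example-1.0.0.whl"  # placeholder file, not a real wheel
    artifact.write_bytes(b"example artifact")
    locked = "sha256:" + hashlib.sha256(b"example artifact").hexdigest()
    verified = matches_locked_hash(artifact, locked)
    tampered = matches_locked_hash(artifact, "sha256:" + "0" * 64)
```

Reading in chunks keeps memory use flat for large artifacts, and an unrecognised algorithm prefix is treated as a failed check rather than ignored.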
Verify Locally
python -m pytest # 718 tests, 100% branch coverage
python scripts/verify_locked_hashes.py # SHA-256 hash verification
git log --show-signature -1 # Verify commit signature