TL;DR: Bank Statement Parser processes all data locally, redacts PII by default, hardens XML parsing against XXE attacks, runs LLMs locally via Ollama, and ships with SHA-256 hash-locked dependencies and a CycloneDX SBOM.
Security by Design
Bank Statement Parser is built to process sensitive financial data. Every design decision prioritises security, privacy, and auditability.
No Cloud Dependencies
All processing happens locally, inside your own runtime. The deterministic parsers make no network calls at all. The hybrid PDF pipeline uses Ollama for local LLM processing, so no data is sent to cloud APIs. The XML parsers are explicitly configured with no_network=True, resolve_entities=False, and load_dtd=False to close off any outbound path.
PII Redaction
Personally identifiable information (names, IBANs, postal addresses) is redacted automatically in CLI output and in streaming mode. This is enabled by default.
- CLI: Sensitive fields are displayed as ***REDACTED***
- Streaming: parse_streaming(redact_pii=True) (default)
- Exports: CSV/JSON/Excel retain the full data for downstream processing
- Opt-in: Use --show-pii or redact_pii=False when you need unredacted output
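The default-on behaviour described above can be sketched as follows. This is a minimal illustration that assumes a transaction is a plain dict; the function name `redact_record` and the field names in `PII_FIELDS` are hypothetical and not taken from the parser. Only the `redact_pii` flag and the `***REDACTED***` placeholder come from the text.

```python
from typing import Any, Dict

# Hypothetical field names for illustration; the real schema may differ.
PII_FIELDS = frozenset({"name", "iban", "address"})
PLACEHOLDER = "***REDACTED***"


def redact_record(record: Dict[str, Any], redact_pii: bool = True) -> Dict[str, Any]:
    """Return a copy of `record` with PII fields masked unless redaction is off."""
    if not redact_pii:
        return dict(record)
    return {
        key: PLACEHOLDER if key in PII_FIELDS and value is not None else value
        for key, value in record.items()
    }


row = {"name": "Jane Doe", "iban": "DE89370400440532013000", "amount": "12.50"}
print(redact_record(row))                    # PII fields masked (the default)
print(redact_record(row, redact_pii=False))  # explicit opt-in to full data
```

Making redaction the default argument means a caller has to ask for unredacted data on purpose, which mirrors the opt-in flag described in the list.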
XML Security (XXE Protection)
All XML parsing uses lxml with hardened settings:
- resolve_entities=False -- prevents XML entity expansion attacks
- no_network=True -- blocks all outbound network access from the parser
- load_dtd=False -- prevents DTD-based attacks
- Namespace stripping before processing -- handles every CAMT.053 variant safely
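As a rough sketch, the three parser settings listed above map directly onto keyword arguments of `lxml.etree.XMLParser`. The helper names `make_hardened_parser` and `parse_untrusted_xml` are hypothetical and only illustrate the configuration; they are not the project's own functions, and namespace stripping is not shown.

```python
from lxml import etree


def make_hardened_parser() -> etree.XMLParser:
    """Build an lxml parser with the three settings listed above."""
    return etree.XMLParser(
        resolve_entities=False,  # leave entity references unexpanded
        no_network=True,         # never fetch remote DTDs or entities
        load_dtd=False,          # do not load external DTDs
    )


def parse_untrusted_xml(data: bytes) -> etree._Element:
    """Parse untrusted bytes with the hardened parser and return the root."""
    return etree.fromstring(data, parser=make_hardened_parser())


# A document that declares an internal entity; with resolve_entities=False
# the reference is kept as an entity node instead of being expanded.
payload = b'<!DOCTYPE r [<!ENTITY x "expanded">]><r>&x;</r>'
root = parse_untrusted_xml(payload)
print(root.tag, repr(root.text))
```

Building a fresh parser per call keeps the settings in one place and avoids sharing parser state between untrusted inputs.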
ZIP Archive Security
iter_secure_xml_entries() validates every ZIP member before extraction:
- Entry size limit: 10 MB per entry (configurable)
- Total size limit: 50 MB total uncompressed (configurable)
- Compression ratio limit: 100:1 by default -- detects ZIP bombs
- Encrypted entry rejection: Encrypted entries are skipped with a warning
- No disk writes: XML bytes pass directly to the parser via from_bytes()
Path Traversal Prevention
Input validation blocks dangerous file paths:
- Invalid bytes, directory traversal patterns (../), and symlinks are rejected
- File extensions are validated against the expected formats
- File size limit (default 100 MB, configurable)
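One way to picture these checks is a single validation function built on `pathlib`. The name `validate_input_path` and the `ALLOWED_SUFFIXES` tuple are hypothetical, and treating a null byte as the invalid-byte case is an assumption; only the 100 MB default comes from the list above.

```python
import tempfile
from pathlib import Path
from typing import Iterable

ALLOWED_SUFFIXES = (".xml", ".csv", ".pdf")  # hypothetical set of formats
MAX_FILE_BYTES = 100 * 1024 * 1024           # 100 MB default


def validate_input_path(
    raw: str,
    allowed_suffixes: Iterable[str] = ALLOWED_SUFFIXES,
    max_bytes: int = MAX_FILE_BYTES,
) -> Path:
    """Return a Path for `raw`, or raise ValueError if any check fails."""
    if "\x00" in raw:  # assumption: null bytes are the invalid bytes in question
        raise ValueError("path contains a null byte")
    path = Path(raw)
    if ".." in path.parts:
        raise ValueError("path contains a directory traversal component")
    if path.suffix.lower() not in tuple(allowed_suffixes):
        raise ValueError(f"unexpected file extension: {path.suffix!r}")
    if path.is_symlink():  # checks the final component only
        raise ValueError("symlinks are not accepted")
    if path.stat().st_size > max_bytes:
        raise ValueError("file exceeds the size limit")
    return path


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "statement.xml"
    sample.write_bytes(b"<Document/>")
    checked = validate_input_path(str(sample))
    print(checked.name)
```

The string-level checks run before anything touches the filesystem, so a hostile path is rejected without being opened or stat-ed.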
Balance Verification (Golden Rule)
Every PDF extraction is verified against the equation: opening balance + credits − debits == closing balance. Results are marked VERIFIED, DISCREPANCY, or FAILED. Discrepancies can be reviewed interactively with --type review.
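The golden rule above is simple arithmetic, sketched here with `Decimal` to avoid floating-point rounding. The function name `verify_balance` is hypothetical, and the text does not say when FAILED is used, so this sketch assumes it means a balance could not be extracted at all.

```python
from decimal import Decimal
from typing import Iterable, Optional


def verify_balance(
    opening: Optional[Decimal],
    credits: Iterable[Decimal],
    debits: Iterable[Decimal],
    closing: Optional[Decimal],
) -> str:
    """Check opening + credits - debits == closing and return a status label."""
    if opening is None or closing is None:
        return "FAILED"  # assumption: a missing balance cannot be verified
    expected = opening + sum(credits, Decimal("0")) - sum(debits, Decimal("0"))
    return "VERIFIED" if expected == closing else "DISCREPANCY"


status = verify_balance(
    opening=Decimal("100.00"),
    credits=[Decimal("50.00")],
    debits=[Decimal("30.00")],
    closing=Decimal("120.00"),
)
print(status)
```

For example, an opening balance of 100.00 with 50.00 in credits and 30.00 in debits must close at 120.00; any other closing figure is flagged for review.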
Deterministic Output
For structured formats (CAMT, PAIN.001, CSV, OFX, QFX, MT940), the same input file produces byte-identical output on every run. There is no randomness, no model inference, and no heuristic sampling. This matters for:
- Audit reproducibility: Run the same file twice and compare the output
- Regulatory compliance: Demonstrate processing consistency
- CI verification: 718 tests enforce determinism with 100% branch coverage
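The "run twice and compare" check can be sketched by hashing each run's output. Here `render` stands in for whatever turns an input file into exported bytes; it and the helper names `output_digest` and `is_reproducible` are hypothetical, not part of the parser's API.

```python
import hashlib
from typing import Callable


def output_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of one run's output."""
    return hashlib.sha256(data).hexdigest()


def is_reproducible(render: Callable[[bytes], bytes], source: bytes, runs: int = 2) -> bool:
    """Run `render` several times and report whether every output is byte-identical."""
    digests = {output_digest(render(source)) for _ in range(runs)}
    return len(digests) == 1


def fake_render(source: bytes) -> bytes:
    """Stand-in for a deterministic export step (illustration only)."""
    return source.upper()


print(is_reproducible(fake_render, b"<Document/>"))
```

Storing the digest alongside an audit record lets a later run prove that the same input still yields the same bytes.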
Supply Chain Security
- SHA-256 hash-locked dependencies: Every package in poetry.lock has verified file hashes
- CycloneDX SBOM: Every release includes a Software Bill of Materials
- GitHub build provenance: Attestations link every artifact to its source commit
- Signed commits: All commits are SSH-signed and verified in CI
- Dependency verification: scripts/verify_locked_hashes.py verifies all hashes locally
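Hash locking boils down to comparing a file's SHA-256 digest with a pinned value. The sketch below is not the project's scripts/verify_locked_hashes.py, whose internals are not described here; the function `matches_pin` is hypothetical and assumes pins written in the `sha256:<hex>` form that poetry.lock uses for file hashes.

```python
import hashlib
import hmac


def matches_pin(payload: bytes, pin: str) -> bool:
    """Return True if `payload` hashes to the value pinned as 'sha256:<hex>'."""
    algorithm, _, expected = pin.partition(":")
    if algorithm != "sha256" or not expected:
        return False  # only SHA-256 pins are accepted in this sketch
    actual = hashlib.sha256(payload).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(actual, expected.lower())


artifact = b"example wheel bytes"
pin = "sha256:" + hashlib.sha256(artifact).hexdigest()
print(matches_pin(artifact, pin))
print(matches_pin(b"tampered bytes", pin))
```

Rejecting any pin that is not SHA-256 keeps a weaker algorithm from being accepted by accident.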
Local Verification
python -m pytest # 718 tests, 100% branch coverage
python scripts/verify_locked_hashes.py # SHA-256 hash verification
git log --show-signature -1 # Verify commit signature