Data Privacy and Compliance
Does any data leave my infrastructure?
No. Bank Statement Parser runs as a stateless library. All processing (parsing, PII redaction, archive extraction) happens in local process memory. There are no API calls, no cloud services, and no telemetry. The XML parsers are hardened with no_network=True, which blocks all external access at the parser level. Your financial data never leaves your environment.
How does PII redaction work?
Sensitive fields are masked before they reach your application logic. The parser detects debtor names, creditor names, IBANs, and postal addresses, and replaces them with ***REDACTED*** in console output and in streaming mode.
- Redaction is on by default in CLI output and streaming mode.
- File exports (CSV, JSON, Excel) retain unredacted data for downstream processing.
- Opt in to viewing the full data with --show-pii on the CLI or redact_pii=False in the API.
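The masking rule described above can be illustrated with a small standalone sketch. It does not use the library; the field names in PII_FIELDS and the redact() helper are assumptions made for illustration, and only the ***REDACTED*** marker and the redact_pii flag name come from the text.

```python
from typing import Any, Dict

# Hypothetical field names; the library's real column names may differ.
PII_FIELDS = {"Debtor", "Creditor", "IBAN", "PostalAddress"}
MASK = "***REDACTED***"


def redact(record: Dict[str, Any], redact_pii: bool = True) -> Dict[str, Any]:
    """Return a copy of the record with PII fields masked, unless disabled."""
    if not redact_pii:
        return dict(record)
    return {
        key: (MASK if key in PII_FIELDS and value is not None else value)
        for key, value in record.items()
    }


row = {
    "Amount": "125.00",
    "Currency": "EUR",
    "Debtor": "Alice Example",
    "IBAN": "DE00 0000 0000 0000 0000 00",
}
masked = redact(row)
print(masked)
```

The helper returns a new dictionary rather than mutating its input, so the unredacted record stays available for a file export.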
Is the output deterministic?
Yes: output is byte-for-byte identical across runs. Given the same input file, the parser produces the same result every time. There is no randomness, no predictive model, and no heuristic sampling. CI enforces determinism with 467 tests at 100% branch coverage, including property-based tests written with Hypothesis.
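One way to check determinism in your own pipeline is to hash a canonical serialization of the parsed rows and compare the digests across runs. The sketch below assumes a stand-in parser (parse_stub) rather than the library itself; fingerprint() is a hypothetical helper.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(rows: List[Dict[str, str]]) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, fixed separators)."""
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def parse_stub(raw: str) -> List[Dict[str, str]]:
    """Stand-in deterministic parser: 'amount;currency' lines to dicts."""
    pairs = (line.split(";") for line in raw.splitlines() if line)
    return [{"Amount": amount, "Currency": currency} for amount, currency in pairs]


raw = "10.00;EUR\n-4.50;USD\n"
first = fingerprint(parse_stub(raw))
second = fingerprint(parse_stub(raw))
print(first == second)
```

Sorting keys before hashing keeps the digest independent of dictionary insertion order, so only real content changes alter it.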
Which compliance standards does the project follow?
The project maintains ISO 13485-aligned documentation with full traceability:
- A quantified Risk Register with severity/probability scoring and residual-risk assessment.
- A Verification and Validation Plan with 19 gated steps across 5 phases.
- A Change Control Procedure with impact assessment and rollback procedures.
- A SOUP Register covering all dependencies, with risk levels and end-of-life monitoring.
- A Traceability Matrix linking design inputs to implementation and verification.
Every release includes a CycloneDX SBOM, SHA-256 checksums, and a GitHub build provenance attestation.
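Checking a downloaded artifact against a published SHA-256 checksum needs only the standard library. This is a generic sketch; the artifact bytes and helper names are placeholders, not files or functions shipped by the project.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def matches_checksum(data: bytes, published_hex: str) -> bool:
    """Compare the computed digest with the published one in constant time."""
    return hmac.compare_digest(sha256_hex(data), published_hex.strip().lower())


artifact = b"example release artifact"   # placeholder for the downloaded bytes
published = sha256_hex(artifact)         # stands in for the published checksum
print(matches_checksum(artifact, published))
```

For large artifacts, feeding the hash object in chunks with update() avoids reading the whole file into memory.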
Performance and Scalability
How fast is Bank Statement Parser?
Performance figures are verified in CI on every commit:
| Metric | Value |
|---|---|
| CAMT.053 throughput | 27,000+ transactions/second |
| PAIN.001 throughput | 52,000+ transactions/second |
| Per-transaction latency (CAMT) | 37 microseconds |
| Per-transaction latency (PAIN.001) | 19 microseconds |
| Time to first result | < 2 ms |
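The latency rows are consistent with the throughput rows: per-transaction latency is the reciprocal of throughput. The short calculation below reproduces that arithmetic, assuming sequential single-process parsing; the 50,000-transaction estimate is derived from the table, not a separately measured figure.

```python
def latency_us(transactions_per_second: float) -> float:
    """Average per-transaction latency in microseconds."""
    return 1_000_000 / transactions_per_second


camt_latency = latency_us(27_000)   # about 37 microseconds
pain_latency = latency_us(52_000)   # about 19 microseconds

# Rough time to parse a 50,000-transaction CAMT.053 file at that rate.
seconds_for_50k = 50_000 / 27_000

print(f"CAMT.053: {camt_latency:.1f} us, PAIN.001: {pain_latency:.1f} us")
print(f"50,000 CAMT transactions: {seconds_for_50k:.2f} s")
```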
How are large files handled?
By streaming with bounded memory, tested at 50,000 transactions per file. Use parse_streaming() to process XML files incrementally. Each transaction is yielded as a dictionary, and elements are cleared after processing to prevent memory growth. Memory does not scale with file size: the 50K-transaction test (25+ MB) uses less than 2x the memory of the 10K-transaction test.
For files over 50 MB (for example, host-to-host PAIN.001 batches with 100K+ payments), the parser streams through a temporary file and strips namespaces chunk by chunk, so the full document is never loaded into memory.
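The yield-then-clear pattern can be sketched with the standard library's iterparse. This is not the library's implementation: stream_entries() and local() are hypothetical helpers, and only the Amount, Currency, and direction fields are extracted.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator


def local(tag: str) -> str:
    """Drop the '{namespace}' prefix so any CAMT version matches."""
    return tag.rsplit("}", 1)[-1]


def stream_entries(source: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per <Ntry>, clearing each element once it is consumed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "Ntry":
            continue
        record: Dict[str, str] = {}
        for child in elem.iter():
            name = local(child.tag)
            if name == "Amt":
                record["Amount"] = (child.text or "").strip()
                record["Currency"] = child.get("Ccy", "")
            elif name == "CdtDbtInd":
                record["DrCr"] = (child.text or "").strip()
        yield record
        elem.clear()  # release the entry's children to keep memory bounded


SAMPLE = b"""<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
  <BkToCstmrStmt><Stmt>
    <Ntry><Amt Ccy="EUR">100.00</Amt><CdtDbtInd>CRDT</CdtDbtInd></Ntry>
    <Ntry><Amt Ccy="USD">25.50</Amt><CdtDbtInd>DBIT</CdtDbtInd></Ntry>
  </Stmt></BkToCstmrStmt>
</Document>"""

rows = list(stream_entries(io.BytesIO(SAMPLE)))
print(rows)
```

In this sketch the emptied entry elements stay attached to their parent; a production parser would also detach them so that very long statements do not accumulate empty nodes.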
How are ZIP archives handled securely?
iter_secure_xml_entries() validates every member before extracting it:
- Per-entry size limit (default 10 MB per entry)
- Total uncompressed size limit (default 50 MB)
- Compression ratio limit (default 100:1) to prevent ZIP bombs
- Rejection of encrypted entries
No file is written to disk. The XML bytes pass directly to the parser via from_bytes().
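The checks listed above can be approximated with the standard zipfile module. The sketch below is an illustration under assumptions, not the code behind iter_secure_xml_entries(): iter_checked_xml() is a hypothetical name, and the limits mirror the defaults quoted in the list.

```python
import io
import zipfile
from typing import Iterator, Tuple

MAX_ENTRY_BYTES = 10 * 1024 * 1024   # per-entry limit
MAX_TOTAL_BYTES = 50 * 1024 * 1024   # total uncompressed limit
MAX_RATIO = 100                      # uncompressed : compressed


def iter_checked_xml(archive: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, xml_bytes) for each .xml member that passes the checks."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            if info.flag_bits & 0x1:  # bit 0 marks an encrypted entry
                raise ValueError(f"encrypted entry rejected: {info.filename}")
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"entry too large: {info.filename}")
            total += info.file_size
            if total > MAX_TOTAL_BYTES:
                raise ValueError("total uncompressed size limit exceeded")
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ValueError(f"compression ratio too high: {info.filename}")
            with zf.open(info) as handle:
                # Bounded read, in case the declared size is wrong.
                data = handle.read(MAX_ENTRY_BYTES + 1)
            if len(data) > MAX_ENTRY_BYTES:
                raise ValueError(f"entry too large: {info.filename}")
            yield info.filename, data  # bytes stay in memory, nothing hits disk
```

The declared sizes come from the archive headers, which is why the sketch also bounds the actual read instead of trusting them.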
Can I parse multiple files in parallel?
Yes. Use parse_files_parallel(), which distributes the work across a ProcessPoolExecutor:
```python
from bankstatementparser import parse_files_parallel

results = parse_files_parallel([
    "statements/jan.xml",
    "statements/feb.xml",
    "statements/mar.xml",
])

# Each result reports its source path, status, and parsed transactions.
for r in results:
    print(r.path, r.status, len(r.transactions), "rows")
```
Supported Formats
Which bank statement formats are supported?
| Format | Standard | File Extension | Parser Class |
|---|---|---|---|
| CAMT.053 | ISO 20022 Bank-to-Customer Statement | .xml | CamtParser |
| PAIN.001 | ISO 20022 Customer Credit Transfer Initiation | .xml | Pain001Parser |
| CSV | Generic bank exports | .csv | CsvStatementParser |
| OFX | Open Financial Exchange | .ofx | OfxParser |
| QFX | Quicken Financial Exchange | .qfx | QfxParser |
| MT940 | SWIFT standard | .mt940, .sta | Mt940Parser |
Does the parser handle bank-specific CAMT.053 dialects?
Yes: it is namespace-agnostic by design. The parser strips XML namespaces before processing, so it handles any CAMT.053 variant (camt.053.001.02, camt.053.001.04, or a bank-specific wrapper) without custom namespace configuration. XPath queries target the element structure, not namespace URIs.
For banks that wrap CAMT in a proprietary envelope, use from_string() or from_bytes() to feed in the inner document directly.
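If the envelope is itself well-formed XML, the inner CAMT document can be located by its local element name and re-serialized before it is handed to the parser. This sketch makes several assumptions: the envelope layout is invented, extract_inner_document() is a hypothetical helper, and the whole envelope is parsed in memory, which suits small files only.

```python
import xml.etree.ElementTree as ET


def extract_inner_document(envelope: bytes) -> bytes:
    """Return the first <Document> element, whatever its namespace, as bytes."""
    root = ET.fromstring(envelope)
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "Document":
            return ET.tostring(elem, encoding="utf-8")
    raise ValueError("no Document element found in envelope")


# Invented envelope layout, for illustration only.
ENVELOPE = b"""<BankEnvelope xmlns="urn:example:bank-envelope">
  <Header><Sender>EXAMPLEBANK</Sender></Header>
  <Payload>
    <Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.04">
      <BkToCstmrStmt><Stmt><Id>STMT-1</Id></Stmt></BkToCstmrStmt>
    </Document>
  </Payload>
</BankEnvelope>"""

inner = extract_inner_document(ENVELOPE)
print(inner.decode("utf-8"))
```

The resulting bytes are what you would pass to from_bytes(); the exact class that call is made on is not shown here because the text does not specify it.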
Can I map custom CSV column headers to the standard schema?
Yes: matching is automatic and needs no configuration. CsvStatementParser recognizes common header variants: "Date", "Transaction Date", and "Booking Date" all map to the date field, and "Amount", "Value", and "Sum" map to amount. Separate debit and credit columns (for example, "Credit" and "Debit") are detected and merged into a single signed amount automatically.
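The header matching and debit/credit merge can be pictured with a small standard-library sketch. The alias table is partly assumed (the "description" aliases are invented for this example), and canonical() and normalize() are hypothetical names rather than CsvStatementParser internals.

```python
import csv
import io
from decimal import Decimal
from typing import Dict, List

ALIASES = {
    "date": {"date", "transaction date", "booking date"},
    "amount": {"amount", "value", "sum"},
    "description": {"description", "details", "narrative"},  # assumed aliases
}


def canonical(header: str) -> str:
    """Map a raw header to its canonical field name, if one is known."""
    key = header.strip().lower()
    for field, names in ALIASES.items():
        if key in names:
            return field
    return key


def normalize(csv_text: str) -> List[Dict[str, object]]:
    """Rename headers and merge Credit/Debit columns into one signed amount."""
    rows: List[Dict[str, object]] = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row: Dict[str, object] = {canonical(k): (v or "").strip() for k, v in raw.items()}
        if "amount" in row:
            row["amount"] = Decimal(str(row["amount"]))
        else:
            credit = Decimal(str(row.pop("credit", "")) or "0")
            debit = Decimal(str(row.pop("debit", "")) or "0")
            row["amount"] = credit - abs(debit)  # debits become negative
        rows.append(row)
    return rows


SAMPLE_CSV = (
    "Booking Date,Details,Credit,Debit\n"
    "2024-01-05,Salary,1500.00,\n"
    "2024-01-06,Coffee,,3.20\n"
)
normalized = normalize(SAMPLE_CSV)
print(normalized)
```

Decimal is used instead of float so that amounts such as 3.20 keep their exact value.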
What is the output format?
All parsers produce pandas DataFrames with consistent column types:
| Format | Key Columns |
|---|---|
| CAMT | Amount, Currency, DrCr, Debtor, Creditor, Reference, ValDt, BookgDt, AccountId |
| PAIN.001 | PmtInfId, PmtMtd, InstdAmt, Currency, CdtrNm, EndToEndId, MsgId, CreDtTm, NbOfTxs |
| CSV/OFX/QFX/MT940 | date, description, amount (normalized) |
You can also export to CSV, JSON, or Excel, or convert the result to a Polars DataFrame.
Treasury Workflows
How does the parser handle multi-currency statements?
Every transaction keeps its original currency; there are no hidden conversions. The Currency field is extracted from the Ccy XML attribute on each transaction, and multi-currency statements are preserved as they are. The get_account_balances() method returns opening and closing balances per account with their original currency codes. Multi-currency reconciliation is left to downstream logic, where you control the exchange-rate source.
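Because amounts are never converted, downstream code typically aggregates per currency before applying its own exchange rates. The sketch below assumes plain dictionaries with the Amount, Currency, and DrCr columns and the ISO 20022 CRDT/DBIT indicator values; totals_by_currency() is a hypothetical helper.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable


def totals_by_currency(transactions: Iterable[Dict[str, str]]) -> Dict[str, Decimal]:
    """Sum signed amounts per currency without converting between currencies."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for tx in transactions:
        sign = Decimal(1) if tx["DrCr"] == "CRDT" else Decimal(-1)
        totals[tx["Currency"]] += sign * Decimal(tx["Amount"])
    return dict(totals)


sample_transactions = [
    {"Amount": "100.00", "Currency": "EUR", "DrCr": "CRDT"},
    {"Amount": "40.00", "Currency": "EUR", "DrCr": "DBIT"},
    {"Amount": "25.50", "Currency": "USD", "DrCr": "CRDT"},
]
net = totals_by_currency(sample_transactions)
print(net)
```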
Does the parser support both outbound and inbound formats?
Yes. Pain001Parser handles ISO 20022 PAIN.001 credit transfer initiation files (outbound payments). CamtParser handles CAMT.053 bank-to-customer statement files (inbound reporting). Both support streaming, PII redaction, and export to CSV, JSON, and Excel. Use detect_statement_format() to detect the format automatically.
What happens when a transaction entry is malformed?
The behavior depends on the parsing mode:
- parse() (batch mode): Entries missing a required field (Amount, Currency, or CdtDbtInd) are skipped with a logged warning. The rest of the statement parses normally.
- parse_streaming() (streaming mode): Parse errors propagate immediately as exceptions, so there is no silent data loss. This fail-fast behavior is intentional for financial pipelines where every transaction must be accounted for.
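The difference between the two modes can be sketched as follows. This illustrates the skip-and-warn versus fail-fast policies only; parse_batch() and parse_stream() are hypothetical stand-ins, and the exception type the library actually raises is not stated in the text (ValueError is an assumption).

```python
import logging
from typing import Dict, Iterable, Iterator, List

logger = logging.getLogger("statement_sketch")
REQUIRED = ("Amount", "Currency", "CdtDbtInd")


def missing_fields(entry: Dict[str, str]) -> List[str]:
    """Names of required fields that are absent or empty."""
    return [name for name in REQUIRED if not entry.get(name)]


def parse_batch(entries: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Batch policy: skip malformed entries and log a warning."""
    kept = []
    for index, entry in enumerate(entries):
        missing = missing_fields(entry)
        if missing:
            logger.warning("skipping entry %d: missing %s", index, ", ".join(missing))
            continue
        kept.append(entry)
    return kept


def parse_stream(entries: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Streaming policy: raise on the first malformed entry."""
    for index, entry in enumerate(entries):
        missing = missing_fields(entry)
        if missing:
            raise ValueError(f"entry {index} is missing {', '.join(missing)}")
        yield entry


entries = [
    {"Amount": "10.00", "Currency": "EUR", "CdtDbtInd": "CRDT"},
    {"Amount": "5.00", "Currency": ""},  # malformed: two fields missing
    {"Amount": "7.25", "Currency": "EUR", "CdtDbtInd": "DBIT"},
]
batch_rows = parse_batch(entries)
print(len(batch_rows))
```

With the same input, iterating parse_stream(entries) yields the first entry and then raises, so a consumer cannot finish a run with a transaction silently dropped.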
How does deduplication work?
The Deduplicator class identifies exact duplicates and suspected matches, with explainable confidence scores:
```python
from bankstatementparser import CamtParser, Deduplicator

parser = CamtParser("statement.xml")
dedup = Deduplicator()

# Convert the parsed DataFrame to the deduplicator's input, then deduplicate.
result = dedup.deduplicate(dedup.from_dataframe(parser.parse()))

print(f"Unique: {len(result.unique_transactions)}")
print(f"Exact duplicates: {len(result.exact_duplicates)}")
print(f"Suspected matches: {len(result.suspected_matches)}")
```
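To make the distinction between exact duplicates and suspected matches concrete, here is a standalone sketch of one possible approach. It is not the Deduplicator algorithm, which the text does not describe: the matching key, the difflib similarity measure, and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Transaction = Dict[str, str]


def classify(
    transactions: List[Transaction], threshold: float = 0.8
) -> Tuple[List[Transaction], List[Transaction], List[Tuple[Transaction, float]]]:
    """Split transactions into unique, exact duplicates, and suspected matches."""
    unique: List[Transaction] = []
    exact: List[Transaction] = []
    suspected: List[Tuple[Transaction, float]] = []
    seen = set()
    for tx in transactions:
        key = (tx["date"], tx["amount"], tx["description"])
        if key in seen:
            exact.append(tx)
            continue
        seen.add(key)
        # Compare against unique rows sharing the same date and amount.
        score = max(
            (
                SequenceMatcher(None, tx["description"].lower(), u["description"].lower()).ratio()
                for u in unique
                if u["date"] == tx["date"] and u["amount"] == tx["amount"]
            ),
            default=0.0,
        )
        if score >= threshold:
            suspected.append((tx, round(score, 2)))  # score doubles as confidence
        else:
            unique.append(tx)
    return unique, exact, suspected


sample = [
    {"date": "2024-01-05", "amount": "-42.00", "description": "ACME Ltd invoice 42"},
    {"date": "2024-01-05", "amount": "-42.00", "description": "ACME Ltd invoice 42"},
    {"date": "2024-01-05", "amount": "-42.00", "description": "ACME Ltd. invoice 42"},
    {"date": "2024-01-06", "amount": "-3.20", "description": "Coffee shop"},
]
unique_rows, exact_rows, suspected_rows = classify(sample)
print(len(unique_rows), len(exact_rows), len(suspected_rows))
```

Reporting the similarity score next to each suspected match is what makes the decision explainable: a reviewer can see why a row was flagged and adjust the threshold.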
Installation and Compatibility
How do I install Bank Statement Parser?
```bash
pip install bankstatementparser
```
For optional Polars DataFrame support:
```bash
pip install bankstatementparser[polars]
```
Which Python versions are supported?
Python 3.9 through 3.14. All versions are tested in CI with 467 tests at 100% branch coverage.
What are the dependencies?
The library has 5 direct dependencies:
- lxml: XML parsing with security hardening
- pandas: DataFrames and data manipulation
- openpyxl: Excel export
- pydantic: Validation and data models
- defusedxml: XXE protection
All dependencies are pinned to exact versions and locked with SHA-256 hashes. The CycloneDX SBOM maps every runtime component.
Does it work on macOS, Linux, and Windows?
Yes. The library runs on macOS, Linux, and Windows (via WSL). It has no platform-specific dependencies.
Reproducibility and Security
How can I verify reproducibility?
```bash
python -m pytest                          # 467 tests, 100% branch coverage
python scripts/verify_locked_hashes.py    # verify the SHA-256 hashes
git log --show-signature -1               # verify the commit signature
```
Which security protections are built in?
- XXE Protection: resolve_entities=False, no_network=True, load_dtd=False
- ZIP Bomb Protection: Compression ratio limits, entry size caps, rejection of encrypted entries
- Path Traversal Prevention: Blocklist of dangerous patterns and symlink resolution
- Input Validation: File size limits (100 MB by default), format/extension validation
- Supply Chain: SHA-256 hash-locked dependencies, CycloneDX SBOM, build provenance attestation
- Signed Commits: Enforced in CI
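Path traversal checks of the kind listed above usually resolve symlinks and ".." segments first and then confirm that the result is still inside an allowed directory. The sketch below shows that generic technique with pathlib; safe_resolve() is a hypothetical helper, not the library's validator, and it does not include the pattern blocklist.

```python
import tempfile
from pathlib import Path


def safe_resolve(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()  # follows symlinks and ".."
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes base directory: {user_path}") from None
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    allowed = safe_resolve(tmp, "statements/jan.xml")
    try:
        safe_resolve(tmp, "../outside.xml")
        traversal_blocked = False
    except ValueError:
        traversal_blocked = True

print(allowed.name, traversal_blocked)
```

An absolute user_path replaces the base when joined, so it fails the relative_to() check unless it already points inside the base directory.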
How does Bank Statement Parser compare to pyiso20022?
pyiso20022 is a broad ISO 20022 toolkit that generates Python dataclasses from the ISO XML schemas. It covers many ISO 20022 message types (PACS, PAIN, CAMT, ADMI) with schema validation. Bank Statement Parser is purpose-built for parsing bank statements, with streaming support, PII redaction, deduplication, and a unified API across six formats, including non-ISO formats (CSV, OFX, QFX, MT940). If you need to parse bank statements into DataFrames with production-grade security, use Bank Statement Parser. If you need to work with the full ISO 20022 message catalogue, use pyiso20022.
What is SWIFT's ISO 20022 migration deadline?
SWIFT has published a phased migration timeline:
- November 2026: Structured and hybrid addresses become mandatory. Multi-instruction MT101 messages will be rejected. Phase 1 of Case Management begins.
- November 2027: All financial institutions must be able to receive native CAMT.053 statements. SWIFT will stop translating from MT to ISO.
- November 2028: Full retirement of MT940, MT942, MT950, MT900, and MT910. They will be replaced by CAMT.052, CAMT.053, and CAMT.054.
Bank Statement Parser supports both the legacy MT940 format and the newer CAMT.053 and PAIN.001 formats, which makes it well suited to the transition period.