Frequently Asked Questions

Questions about Bank Statement Parser

Data Privacy and Compliance

Does any data leave my infrastructure?

No. Bank Statement Parser runs as a stateless library. All processing (parsing, PII redaction, archive extraction) happens in local process memory. There are no API calls, no cloud services, and no telemetry. The XML parsers are hardened with no_network=True, which blocks all external access at the parser level. Your financial data never leaves your environment.

How does PII redaction work?

Sensitive fields are masked before they reach your application logic. The parser detects debtor names, creditor names, IBANs, and postal addresses, and replaces them with ***REDACTED*** in terminal output and in streaming mode.

  • Redaction is on by default in CLI output and in streaming mode.
  • File exports (CSV, JSON, Excel) retain unredacted data for downstream processing.
  • Opt in to full data with --show-pii on the CLI or redact_pii=False in the API.
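
The masking step described above can be pictured with a short sketch. It is not the library's implementation: redact_record() and the field names in SENSITIVE_FIELDS are hypothetical, and only the ***REDACTED*** marker and the redact_pii flag name come from the text.

```python
from typing import Any, Dict

# Hypothetical field names; the library's real column names may differ.
SENSITIVE_FIELDS = {"Debtor", "Creditor", "IBAN", "Address"}
MASK = "***REDACTED***"


def redact_record(record: Dict[str, Any], redact_pii: bool = True) -> Dict[str, Any]:
    """Return a copy of the record with non-empty sensitive fields masked."""
    if not redact_pii:
        return dict(record)
    return {
        key: MASK if key in SENSITIVE_FIELDS and value else value
        for key, value in record.items()
    }


row = {
    "Amount": "125.00",
    "Currency": "EUR",
    "Debtor": "Jane Doe",
    "IBAN": "DE89370400440532013000",
}
print(redact_record(row))                    # names and IBAN masked
print(redact_record(row, redact_pii=False))  # full data, opt-in
```

Returning a copy rather than mutating the input keeps the unredacted record available for file exports.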

Is the output deterministic?

Yes: output is byte-for-byte identical on every run. Given the same input file, the parser produces the same result every time. There is no randomness, no predictive model, and no heuristic sampling. CI enforces determinism with 467 tests at 100% branch coverage, including property-based tests written with Hypothesis.
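
One way to check this property in your own pipeline is to hash the serialized output of two runs and compare the digests. The sketch below uses a hypothetical parse_stub() in place of a real parser, so it shows the checking technique rather than the project's own test suite.

```python
import hashlib
import json
from typing import Dict, List


def parse_stub(raw: bytes) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a parser: one 'amount,currency' pair per line."""
    rows = []
    for line in raw.decode("utf-8").splitlines():
        amount, currency = line.split(",")
        rows.append({"Amount": amount, "Currency": currency})
    return rows


def output_digest(rows: List[Dict[str, str]]) -> str:
    """Serialize rows canonically (sorted keys, fixed separators) and hash the bytes."""
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


raw = b"10.00,EUR\n-4.50,USD\n"
first = output_digest(parse_stub(raw))
second = output_digest(parse_stub(raw))
assert first == second, "output differs between runs"
print(first)
```

Canonical serialization matters here: without sorted keys and fixed separators, two equal results could still hash differently.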

Which compliance standards does the project follow?

The project maintains ISO 13485-aligned documentation with full traceability:

  • Quantitative Risk Register with severity/probability scores and residual risk assessment.
  • Verification and Validation Plan with 19 gated steps across 5 phases.
  • Change Control Procedure with impact assessment and rollback procedures.
  • SOUP Register covering all dependencies with risk levels and end-of-life monitoring.
  • Traceability Matrix linking design inputs to implementation and verification.

Every release includes a CycloneDX SBOM, SHA-256 checksums, and GitHub build provenance attestation.

Performance and Scalability

How fast is Bank Statement Parser?

Performance benchmarks are verified in CI on every commit:

Metric                               Value
CAMT.053 throughput                  27,000+ transactions/second
PAIN.001 throughput                  52,000+ transactions/second
Per-transaction latency (CAMT)       37 microseconds
Per-transaction latency (PAIN.001)   19 microseconds
Time to first result                 < 2 ms
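
The latency rows follow from the throughput rows, since average per-transaction latency is the reciprocal of throughput. A small worked check, using only the figures from the table:

```python
def latency_us(transactions_per_second: float) -> float:
    """Average per-transaction latency in microseconds for a given throughput."""
    return 1_000_000 / transactions_per_second


for name, throughput in [("CAMT.053", 27_000), ("PAIN.001", 52_000)]:
    print(f"{name}: {latency_us(throughput):.1f} microseconds per transaction")
```

This gives roughly 37 and 19 microseconds, matching the table.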

How are large files handled?

Streaming with bounded memory, tested at 50,000 transactions per file. Use parse_streaming() to process XML files incrementally. Each transaction is yielded as a dictionary, and elements are cleared after processing to prevent memory growth. Memory does not grow in proportion to file size: the 50K-transaction test (25+ MB) uses less than 2x the memory of the 10K-transaction test.

For files over 50 MB (for example, host-to-host PAIN.001 batches with 100K+ payments), the parser streams through a temporary file with chunk-wise namespace stripping, so the full document is never loaded into memory.
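
The yield-then-clear pattern can be sketched with the standard library's iterparse. This is an illustration of the technique, not the code behind parse_streaming(): stream_entries() is a hypothetical name, and the record keys are assumptions modeled on the CAMT columns listed elsewhere in this FAQ.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def stream_entries(source: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per <Ntry> element, clearing each element once consumed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "Ntry":
            continue
        record: Dict[str, str] = {}
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "Amt":
                record["Amount"] = (child.text or "").strip()
                record["Currency"] = child.get("Ccy", "")
            elif name == "CdtDbtInd":
                record["DrCr"] = (child.text or "").strip()
        yield record
        # Drop the entry's children and text; only a small empty element
        # stays attached to its parent.
        elem.clear()


sample = b"""<Document xmlns="urn:iso:std:iso:20022:tech:xsd:camt.053.001.02">
  <BkToCstmrStmt><Stmt>
    <Ntry><Amt Ccy="EUR">10.00</Amt><CdtDbtInd>CRDT</CdtDbtInd></Ntry>
    <Ntry><Amt Ccy="USD">4.50</Amt><CdtDbtInd>DBIT</CdtDbtInd></Ntry>
  </Stmt></BkToCstmrStmt>
</Document>"""

for entry in stream_entries(io.BytesIO(sample)):
    print(entry)
```

Because the generator holds only one entry at a time, peak memory depends on the largest entry rather than on the number of entries.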

How are ZIP archives handled securely?

iter_secure_xml_entries() validates every member before extracting it:

  • Per-entry size limit (default 10 MB per entry)
  • Total uncompressed size limit (default 50 MB)
  • Compression ratio limit (default 100:1) to prevent ZIP bombs
  • Rejection of encrypted entries

No files are written to disk. XML bytes pass directly to the parser via from_bytes().
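
A minimal sketch of these checks with the standard zipfile module follows. The function name iter_checked_xml() and its parameters are hypothetical; only the default limits are taken from the list above, and the real iter_secure_xml_entries() may order or implement the checks differently.

```python
import io
import zipfile
from typing import Iterator, Tuple

MB = 1024 * 1024


def iter_checked_xml(
    archive: bytes,
    max_entry: int = 10 * MB,   # per-entry size limit
    max_total: int = 50 * MB,   # total uncompressed size limit
    max_ratio: int = 100,       # uncompressed:compressed ratio limit
) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, xml_bytes) for each .xml member that passes every check."""
    total = 0
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            if info.flag_bits & 0x1:  # bit 0 of the flags marks an encrypted entry
                raise ValueError(f"encrypted entry rejected: {info.filename}")
            if info.file_size > max_entry:
                raise ValueError(f"entry too large: {info.filename}")
            total += info.file_size
            if total > max_total:
                raise ValueError("total uncompressed size limit exceeded")
            if info.file_size > max_ratio * max(info.compress_size, 1):
                raise ValueError(f"compression ratio too high: {info.filename}")
            # Bytes stay in memory; nothing is extracted to disk.
            yield info.filename, zf.read(info)


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("jan.xml", b"<Document><Stmt/></Document>")

for name, data in iter_checked_xml(buffer.getvalue()):
    print(name, len(data), "bytes")
```

All checks run against the archive's metadata before any member is decompressed, which is what keeps a ZIP bomb from consuming memory.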

Can I parse multiple files in parallel?

Yes. Use parse_files_parallel(), which distributes the work across a ProcessPoolExecutor:

from bankstatementparser import parse_files_parallel

results = parse_files_parallel([
    "statements/jan.xml",
    "statements/feb.xml",
    "statements/mar.xml",
])
for r in results:
    print(r.path, r.status, len(r.transactions), "rows")

Supported Formats

Which bank statement formats are supported?

Format    Standard                                File Type     Parser Class
CAMT.053  ISO 20022 Bank-to-Customer Statement    .xml          CamtParser
PAIN.001  ISO 20022 Credit Transfer Initiation    .xml          Pain001Parser
CSV       Generic bank exports                    .csv          CsvStatementParser
OFX       Open Financial Exchange                 .ofx          OfxParser
QFX       Quicken Financial Exchange              .qfx          QfxParser
MT940     SWIFT standard                          .mt940, .sta  Mt940Parser

Does the parser handle bank-specific CAMT.053 dialects?

Yes: it is namespace-agnostic by design. The parser strips XML namespaces before processing, so it handles any CAMT.053 version (camt.053.001.02, camt.053.001.04, or bank-specific variants) without custom namespace configuration. XPath queries target the element structure, not namespace URIs.

For banks that wrap CAMT in a proprietary envelope, use from_string() or from_bytes() to feed the inner document directly.
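
To see why stripping namespaces makes one query work across schema versions, consider this sketch built on the standard library. strip_namespaces() and entry_amounts() are hypothetical helpers written for illustration; the library's own stripping code is not shown in this FAQ.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def strip_namespaces(root: ET.Element) -> ET.Element:
    """Rewrite every tag in place so '{uri}Name' becomes 'Name'."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and "}" in elem.tag:
            elem.tag = elem.tag.split("}", 1)[1]
    return root


def entry_amounts(xml_bytes: bytes) -> List[Optional[str]]:
    root = strip_namespaces(ET.fromstring(xml_bytes))
    # The path names elements only, so it matches any schema version.
    return [amt.text for amt in root.findall("./BkToCstmrStmt/Stmt/Ntry/Amt")]


TEMPLATE = (
    '<Document xmlns="urn:iso:std:iso:20022:tech:xsd:{version}">'
    '<BkToCstmrStmt><Stmt><Ntry><Amt Ccy="EUR">10.00</Amt></Ntry></Stmt></BkToCstmrStmt>'
    "</Document>"
)

for version in ("camt.053.001.02", "camt.053.001.04"):
    document = TEMPLATE.format(version=version).encode("utf-8")
    print(version, entry_amounts(document))
```

Both versions return the same amounts from the same path, even though their namespace URIs differ.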

Can I map custom CSV column headers to the standard schema?

Yes: mapping is automatic, with no configuration. CsvStatementParser recognizes common header variants: "Date", "Transaction Date", and "Booking Date" all map to the date field, while "Amount", "Value", and "Sum" map to amount. Separate debit/credit columns (for example, "Credit" and "Debit") are detected and merged into a single signed amount automatically.
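
The header matching and the debit/credit merge can be sketched as follows. The alias table is an assumption: the date and amount aliases come from the text, the description aliases are invented for illustration, and normalize_rows() is a hypothetical helper rather than part of CsvStatementParser.

```python
import csv
import io
from decimal import Decimal
from typing import Dict, Iterator

# Assumed alias table; the library's real list is likely longer.
ALIASES = {
    "date": {"date", "transaction date", "booking date"},
    "amount": {"amount", "value", "sum"},
    "description": {"description", "details", "memo"},
    "credit": {"credit"},
    "debit": {"debit"},
}


def canonical(header: str) -> str:
    """Map a raw header to its canonical field name, case-insensitively."""
    key = header.strip().lower()
    for field, names in ALIASES.items():
        if key in names:
            return field
    return key


def normalize_rows(text: str) -> Iterator[Dict[str, object]]:
    reader = csv.DictReader(io.StringIO(text))
    for raw in reader:
        row = {canonical(k): (v or "").strip() for k, v in raw.items() if k is not None}
        if "amount" in row:
            amount = Decimal(row["amount"])
        else:
            # Separate columns: credits count as positive, debits as negative.
            amount = Decimal(row.get("credit") or "0") - Decimal(row.get("debit") or "0")
        yield {
            "date": row["date"],
            "description": row.get("description", ""),
            "amount": amount,
        }


text = (
    "Booking Date,Description,Credit,Debit\n"
    "2024-01-05,Salary,2500.00,\n"
    "2024-01-06,Rent,,900.00\n"
)
for record in normalize_rows(text):
    print(record)
```

Using Decimal rather than float avoids rounding drift when signed amounts are later summed.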

What is the output format?

All parsers produce consistent pandas DataFrames with consistent column types:

Format             Key Columns
CAMT               Amount, Currency, DrCr, Debtor, Creditor, Reference, ValDt, BookgDt, AccountId
PAIN.001           PmtInfId, PmtMtd, InstdAmt, Currency, CdtrNm, EndToEndId, MsgId, CreDtTm, NbOfTxs
CSV/OFX/QFX/MT940  date, description, amount (normalized)

You can also export to CSV, JSON, or Excel, or convert to Polars DataFrames.

Treasury Workflows

How does the parser handle multi-currency statements?

Every transaction keeps its original currency, and there are no hidden conversions. The Currency field is extracted from the XML Ccy attribute on each transaction. Multi-currency statements are preserved as-is. The get_account_balances() method returns opening and closing balances per account with their original currency codes. Multi-currency reconciliation is left to downstream logic, where you control the exchange-rate source.
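
As an illustration of what that downstream logic might look like, the sketch below totals signed amounts per currency and converts them only with rates the caller supplies. The helper names are hypothetical, the rates are made-up sample values, and it assumes the DrCr column carries the ISO 20022 codes CRDT and DBIT.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable, Mapping


def totals_by_currency(transactions: Iterable[Mapping[str, str]]) -> Dict[str, Decimal]:
    """Sum signed amounts per currency without converting anything."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for tx in transactions:
        sign = Decimal(-1) if tx["DrCr"] == "DBIT" else Decimal(1)
        totals[tx["Currency"]] += sign * Decimal(tx["Amount"])
    return dict(totals)


def to_base(totals: Mapping[str, Decimal], rates: Mapping[str, Decimal]) -> Decimal:
    """Convert per-currency totals using caller-supplied rates (base units per 1 unit)."""
    return sum((amount * rates[ccy] for ccy, amount in totals.items()), Decimal(0))


transactions = [
    {"Amount": "100.00", "Currency": "EUR", "DrCr": "CRDT"},
    {"Amount": "40.00", "Currency": "USD", "DrCr": "DBIT"},
    {"Amount": "10.00", "Currency": "EUR", "DrCr": "DBIT"},
]
totals = totals_by_currency(transactions)
rates = {"EUR": Decimal("1"), "USD": Decimal("0.90")}  # illustrative rates only
print(totals)
print(to_base(totals, rates))
```

Keeping the per-currency totals separate from the conversion step means the rate source can change without touching the parsed data.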

Does the parser support both outbound and inbound formats?

Yes. Pain001Parser handles ISO 20022 PAIN.001 credit transfer initiation files (outbound payments). CamtParser handles CAMT.053 bank-to-customer statement files (inbound reporting). Both support streaming, PII redaction, and export to CSV, JSON, and Excel. Use detect_statement_format() to detect the format automatically.

What happens when a transaction entry is malformed?

The behavior depends on the parsing mode:

  • parse() (batch mode) -- Entries missing required fields (Amount, Currency, or CdtDbtInd) are skipped with a logged warning. The rest of the statement parses normally.
  • parse_streaming() (streaming mode) -- Parse errors propagate immediately as exceptions. There is no silent data loss. This fail-fast behavior is deliberate for financial pipelines where every transaction must be accounted for.
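
The contrast between the two modes can be sketched as below. Both functions are hypothetical stand-ins, and ValueError is an assumption; the exception type the library actually raises is not stated here.

```python
import logging
from typing import Dict, Iterable, Iterator, List

logger = logging.getLogger("statement_sketch")
REQUIRED = ("Amount", "Currency", "CdtDbtInd")


def missing_fields(entry: Dict[str, str]) -> List[str]:
    return [name for name in REQUIRED if not entry.get(name)]


def parse_batch(entries: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Batch mode: skip incomplete entries and log a warning for each."""
    rows = []
    for index, entry in enumerate(entries):
        missing = missing_fields(entry)
        if missing:
            logger.warning("skipping entry %d: missing %s", index, ", ".join(missing))
            continue
        rows.append(entry)
    return rows


def parse_stream(entries: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Streaming mode: raise on the first incomplete entry (fail fast)."""
    for index, entry in enumerate(entries):
        missing = missing_fields(entry)
        if missing:
            raise ValueError(f"entry {index}: missing {', '.join(missing)}")
        yield entry


entries = [
    {"Amount": "10.00", "Currency": "EUR", "CdtDbtInd": "CRDT"},
    {"Amount": "4.50", "CdtDbtInd": "DBIT"},  # Currency is missing
]
print(len(parse_batch(entries)), "row(s) kept in batch mode")
try:
    list(parse_stream(entries))
except ValueError as exc:
    print("streaming mode stopped:", exc)
```

In a payment pipeline the streaming behavior is usually the safer default, because a skipped entry would otherwise go unnoticed until reconciliation.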

How does deduplication work?

The Deduplicator class identifies exact duplicates and suspected matches with explainable confidence scores:

from bankstatementparser import CamtParser, Deduplicator

parser = CamtParser("statement.xml")
dedup = Deduplicator()
result = dedup.deduplicate(dedup.from_dataframe(parser.parse()))

print(f"Unique: {len(result.unique_transactions)}")
print(f"Exact duplicates: {len(result.exact_duplicates)}")
print(f"Suspected matches: {len(result.suspected_matches)}")

Installation and Compatibility

How do I install Bank Statement Parser?

pip install bankstatementparser

For optional Polars DataFrame support (the quotes keep shells such as zsh from expanding the brackets):

pip install "bankstatementparser[polars]"

Which Python versions are supported?

Python 3.9 through 3.14. All versions are tested in CI with 467 tests at 100% branch coverage.

What are the dependencies?

The library has 5 direct dependencies:

  • lxml -- XML parsing with security hardening
  • pandas -- DataFrames and data processing
  • openpyxl -- Excel export
  • pydantic -- Validation and data models
  • defusedxml -- XXE protection

All dependencies are pinned to exact versions with SHA-256 hashes. A CycloneDX SBOM maps every runtime component.

Does it work on macOS, Linux, and Windows?

Yes. The library runs on macOS, Linux, and Windows (via WSL). It has no platform-specific dependencies.

Reproducibility and Security

How can I verify reproducibility?

python -m pytest                              # 467 tests, 100% branch coverage
python scripts/verify_locked_hashes.py        # verify SHA-256 hashes
git log --show-signature -1                   # verify the commit signature

Which security protections are built in?

  • XXE Protection: resolve_entities=False, no_network=True, load_dtd=False
  • ZIP Bomb Protection: Compression ratio limits, per-entry size caps, rejection of encrypted entries
  • Path Traversal Prevention: Dangerous-pattern blocklist and symlink resolution
  • Input Validation: File size limits (100 MB by default), format/extension validation
  • Supply Chain: SHA-256 hash-pinned dependencies, CycloneDX SBOM, build provenance attestation
  • Signed Commits: Enforced in CI
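
To make the path traversal item concrete, here is a sketch of the symlink-resolution half of that check using pathlib. resolve_inside() is a hypothetical helper, and it does not reproduce the library's pattern blocklist.

```python
from pathlib import Path


def resolve_inside(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting anything that escapes it."""
    base = Path(base_dir).resolve()
    # resolve() follows symlinks and collapses '..' segments before the check.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes {base}: {user_path}")
    return candidate


base = "statements"
print(resolve_inside(base, "2024/jan.xml"))
try:
    resolve_inside(base, "../secrets.txt")
except ValueError as exc:
    print("rejected:", exc)
```

Checking the resolved path, rather than scanning the raw string for "..", also catches symlinks that point outside the base directory.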

How does Bank Statement Parser compare to pyiso20022?

pyiso20022 is a broad ISO 20022 toolkit that generates Python dataclasses from the ISO XML schemas. It covers many ISO 20022 message types (PACS, PAIN, CAMT, ADMI) with schema validation. Bank Statement Parser is built specifically for parsing bank statements, with streaming support, PII redaction, deduplication, and a unified API across six formats, including non-ISO formats (CSV, OFX, QFX, MT940). If you need to parse bank statements into DataFrames with production-grade security, use Bank Statement Parser. If you need to work with the full ISO 20022 message catalogue, use pyiso20022.

What are the deadlines for SWIFT's ISO 20022 migration?

SWIFT has published a phased migration timeline:

  • November 2026: Structured and hybrid addresses become mandatory. Multi-instruction MT101 messages will be rejected. Case Management Phase 1 begins.
  • November 2027: All financial institutions must be able to receive native CAMT.053 statements. SWIFT will stop translating from MT to ISO.
  • November 2028: Full retirement of MT940, MT942, MT950, MT900, and MT910. They will be replaced by CAMT.052, CAMT.053, and CAMT.054.

Bank Statement Parser supports both the legacy MT940 format and the newer CAMT.053/PAIN.001 formats, which makes it well suited to the transition period.