Deep analysis of the Financial Fraud Link Analysis idea, its fatal flaws, and a full-stack intelligence platform proposal that will dominate Track 3.
The problem identification is perfect: cybercrime siloing across districts, with no link analysis capability, is the most documented and most acute gap in Nepal's Cyber Bureau. The judges explicitly asked for this. But the current solution barely scratches the surface.
Real investigators receive WhatsApp screenshots, Khalti app screenshots, blurry photos of receipts. No one hands them a clean CSV. The entire tool breaks before it starts.
This is a visualizer, not an analyzer. Gephi and NodeXL already do this for free. Without graph algorithms (centrality, community detection), it's just a prettier spreadsheet.
CSV upload → NetworkX → Cytoscape render. A solo developer can build this exact thing in an afternoon. Judges will recognize this immediately and penalize it hard.
Any graph produced cannot be used in court without provable chain of custody. This limits operational value to internal exploration only, not prosecution support.
Judges explicitly said "OSINT & criminal network intelligence." This direction is perfectly aligned. Don't abandon it — deepen it massively.
Offline-first, no cloud APIs, runs on a local machine. This respects Nepal's data privacy laws and the judge's Criterion 02 on low-connectivity operations.
Based on Inspector Khadka's public statements and the Cyber Bureau's operational reports, the #1 bottleneck is not visualizing data — it's getting the data into a usable form in the first place.
A victim files a report on the Cyber Bureau portal, attaching a blurry screenshot of a WhatsApp conversation, a photo of an eSewa transaction receipt taken on a cheap phone, and a hand-typed description with the account number. The investigator must read all of this manually, type the phone number, eSewa ID, and bank account into a spreadsheet, then cross-reference them against 47 other complaints from other districts. This takes 3–5 hours per case. There are 52 cases per day and 6 investigators. That is 156–260 investigator-hours of new work every day, while six investigators working around the clock could supply at most 144. The math is impossible.
The winning tool doesn't just display connections — it creates them automatically from raw, messy evidence that already exists in the Cyber Bureau's complaint queue.
सर्प (Sarpa) = "serpent" in Nepali — referencing how criminal networks coil invisibly around victims. Also: Surveillance, Analysis, Reporting, Profiling, and Attribution Network.
{ phone: "9841XXXXXX", esewa_id: "...", amount: 12500, date: "...", sender: "...", receiver: "..." }. This is then ingested into the graph database automatically.

The same phone number appears as 9841XXXXXX, +977-9841XXXXXX, and 984-1XXXXXX across different complaints. A naive system creates three separate nodes. SarpaNet's entity resolver normalizes all phone numbers to E.164 format, strips eSewa ID prefix variations, canonicalizes bank account numbers, and uses fuzzy string matching (Levenshtein distance ≤ 2) to catch typos in manually transcribed data. When the resolver detects that two records are the same entity, it merges them into one node and records both source entries as evidence. This is the difference between a disconnected graph and a connected one, and it works on dirty, real-world data.

| Layer | Technology | Why this choice |
|---|---|---|
| Evidence ingestion | PaddleOCR + pdf2image + PIL | PaddleOCR outperforms Tesseract on low-quality images and non-Latin scripts. Handles Devanagari numerals in eSewa/bank screenshots. Runs fully offline. |
| Entity extraction LLM | Ollama + Qwen2.5:7b | 7B model fits in 8GB VRAM, runs on a standard police laptop. Takes OCR text → structured JSON. No API key, no internet. Outperforms regex for ambiguous text. |
| Graph engine | NetworkX + SQLite | NetworkX has the needed graph algorithms built in (Louvain, centrality, shortest path). SQLite stores the raw data with zero setup. No Neo4j server to install during the 36 hours. |
| Entity resolution | phonenumbers + rapidfuzz | The phonenumbers library (a Python port of Google's libphonenumber) handles all NTC/Ncell number formats. rapidfuzz does Levenshtein in microseconds. |
| API backend | FastAPI (Python 3.11) | Async endpoints handle file upload, OCR, LLM extraction, and graph queries. Background tasks for enrichment. WebSocket for live graph updates. |
| Frontend | React + Cytoscape.js + D3.js | Cytoscape.js handles graph rendering with physics layout. D3.js for the temporal timeline chart. Drag-and-drop file upload. Dark-mode police intelligence aesthetic. |
| OSINT enrichment | Playwright (async) | Headless browser automation for public profile lookups. Runs in a separate async worker so it doesn't block the UI. |
| Report generation | WeasyPrint + Jinja2 | HTML template → PDF in one command. Embeds graph PNG export. No LibreOffice dependency. |
| Evidence ledger | SQLite WAL + hashlib | Append-only table. Each row hashes the previous row's hash + its own content (blockchain-style without blockchain overhead). Fully verifiable offline. |
| Deployment | Docker Compose | A single docker-compose up starts the entire stack. Nothing to install beyond Docker itself. Works on Windows, Linux, Mac. Critical for judge demo reliability. |
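
The entity resolution step can be sketched in plain Python. This is a minimal illustration, not SarpaNet's actual resolver: it swaps in a regex normalizer and a hand-written Levenshtein function for the phonenumbers and rapidfuzz libraries named in the table, so it runs with the standard library only. The function names, the record layout, the complaint IDs, and the sample values are all hypothetical. The sketch assumes Nepali mobile numbers have 10 digits and start with 9.

```python
import re
from typing import Optional


def normalize_np_phone(raw: str) -> Optional[str]:
    """Return an E.164 string for a Nepali mobile number, or None.

    Assumption for this sketch: mobile numbers have 10 digits and start with 9.
    """
    digits = re.sub(r"\D", "", raw)          # drop spaces, dashes, "+"
    if len(digits) == 13 and digits.startswith("977"):
        digits = digits[3:]                  # strip country code
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]                  # strip trunk prefix
    if len(digits) == 10 and digits.startswith("9"):
        return "+977" + digits
    return None


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (stand-in for rapidfuzz)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def resolve(records, max_dist=2):
    """Merge (complaint_id, raw_value) pairs into entity nodes.

    Phones match only when their normalized forms are identical; free-text
    identifiers also match when their edit distance is <= max_dist.
    """
    entities = []
    for complaint_id, raw in records:
        phone = normalize_np_phone(raw)
        is_phone = phone is not None
        key = phone if is_phone else " ".join(raw.lower().split())
        for ent in entities:
            same = ent["key"] == key
            fuzzy = (not is_phone and not ent["is_phone"]
                     and levenshtein(ent["key"], key) <= max_dist)
            if same or fuzzy:
                ent["sources"].append(complaint_id)   # keep both as evidence
                break
        else:
            entities.append({"key": key, "is_phone": is_phone,
                             "sources": [complaint_id]})
    return entities


# Hypothetical sample data: three spellings of one number, one typo'd name.
sample = [
    ("C-101", "9841234567"),
    ("C-207", "+977-9841234567"),
    ("C-311", "984-1234567"),
    ("C-101", "Ram Bahadur"),
    ("C-207", "ram bahadhur"),
    ("C-311", "9851234567"),
]
for entity in resolve(sample):
    print(entity["key"], entity["sources"])
```

One design choice in this sketch: fuzzy matching is applied only to free-text identifiers. Two valid phone numbers that differ by one or two digits are different subscribers, so merging them on edit distance would create false links.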
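
The evidence ledger row describes a hash chain: each row stores a hash of the previous row's hash plus its own content. A minimal sketch of that idea using only sqlite3 and hashlib follows. The table name, column names, event fields, and helper functions are assumptions made for illustration, and SHA-256 is an assumed choice since the source names only hashlib.

```python
import hashlib
import json
import sqlite3

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first row


def open_ledger(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # takes effect for on-disk files
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ledger (
               seq       INTEGER PRIMARY KEY AUTOINCREMENT,
               content   TEXT NOT NULL,
               prev_hash TEXT NOT NULL,
               row_hash  TEXT NOT NULL)"""
    )
    return conn


def _digest(prev_hash: str, content: str) -> str:
    return hashlib.sha256((prev_hash + content).encode("utf-8")).hexdigest()


def append_event(conn, event: dict) -> str:
    """Append one event, chaining it to the hash of the latest row."""
    content = json.dumps(event, sort_keys=True, ensure_ascii=False)
    last = conn.execute(
        "SELECT row_hash FROM ledger ORDER BY seq DESC LIMIT 1"
    ).fetchone()
    prev_hash = last[0] if last else GENESIS
    row_hash = _digest(prev_hash, content)
    conn.execute(
        "INSERT INTO ledger (content, prev_hash, row_hash) VALUES (?, ?, ?)",
        (content, prev_hash, row_hash),
    )
    conn.commit()
    return row_hash


def verify(conn) -> bool:
    """Recompute every hash in order; any edited or removed row breaks the chain."""
    expected_prev = GENESIS
    rows = conn.execute(
        "SELECT content, prev_hash, row_hash FROM ledger ORDER BY seq"
    )
    for content, prev_hash, row_hash in rows:
        if prev_hash != expected_prev or _digest(prev_hash, content) != row_hash:
            return False
        expected_prev = row_hash
    return True


# Hypothetical usage: log two evidence events, then check the chain.
ledger = open_ledger()
append_event(ledger, {"action": "upload", "file": "receipt.jpg"})
append_event(ledger, {"action": "ocr", "file": "receipt.jpg"})
print(verify(ledger))
```

In this sketch the table is append-only by convention only. A real deployment would likely also need to block UPDATE and DELETE (for example with SQLite triggers) and to record the latest hash somewhere outside the database, since anyone who can rewrite the whole table can also recompute the whole chain.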
Based on all three tracks and the judge's explicit guidance, here are the realistic alternatives and an honest verdict on each.
The 80% rise in deepfake incidents is real and terrifying. But building an accurate deepfake audio classifier in 36 hours that handles Nepali audio is extremely risky. Pre-trained Wav2Vec models are English-centric. A bad demo (misclassifying real audio as fake) destroys credibility. Could be a future v2 feature in SarpaNet — "flag suspected AI-generated evidence." Keep it as a pipeline addition, not the core.
Verdict: Consider as bonus feature only

Genuinely powerful and buildable. The Africa's Talking API simulation works well in demos. But Track 1 is more crowded and less technically differentiating. The judge's signal was clearest about Track 3. If your team has no one who can implement OCR pipelines, pivot here: it's a safer build with high operational impact.
Verdict: Strong fallback if OCR expertise absent

MapLibre + PostGIS + WebSockets is a solid stack, but the live demo depends entirely on the venue's local WiFi. If WebSocket sync fails during the presentation, you show nothing. The 2024 flood statistics are extremely compelling for judges, but SarpaNet's demo is more reliable and more technically deep.
Verdict: Skip (demo risk too high)

URLScan.io already does this. Even customized for Nepal (.com.np domains, NRB-mimicking sites), judges will know it's essentially a wrapper around VirusTotal + WHOIS. Very buildable but scores zero on innovation. Only consider this if your team has 2 days to spare before the hackathon ends and wants a quick supplementary demo.
Verdict: Skip (not innovative enough)

The "Mule Mapper" concept is correct: it directly addresses the judges' stated priorities. The problem is the depth. Transform it from a CSV visualizer into a full intelligence platform with OCR ingestion, graph analytics, report generation, and chain of custody. That's the difference between a 46/50 proposal and a winner. The SarpaNet architecture described above is technically feasible in 36 hours for a strong 4-person team and produces a demo that will make every judge in the room understand within 60 seconds why Nepal's Cyber Bureau needs exactly this tool.