Hack4Safety 2083 · Strategic Analysis

Project Nexus → Complete Rebuild

Deep analysis of the Financial Fraud Link Analysis idea, its fatal flaws, and a full-stack intelligence platform proposal that will dominate Track 3.

Track: 03 — OSINT, Cybercrime & Threat Analysis
Sprint: 36 hours · 3–4 members
Prize: NPR 1,00,000

The "Mule Mapper" as Described: Honest Rating

Project Nexus — CSV → Graph Visualizer
As currently scoped: React + FastAPI + NetworkX + Cytoscape.js, CSV input only
Overall: 6.5 / 10
Problem relevance: 10
Technical depth: 4
Operational realism: 5
Innovation score: 5
Demo appeal: 8
36-hr feasibility: 9

The problem identification is perfect — cybercrime siloing across districts with no link analysis capability is the most documented, most acute gap in Nepal's Cyber Bureau. The judges explicitly asked for this. But the current solution barely scratches it.

🔴

Fatal flaw #1: Clean CSV assumption

Real investigators receive WhatsApp screenshots, Khalti app screenshots, blurry photos of receipts. No one hands them a clean CSV. The entire tool breaks before it starts.

🔴

Fatal flaw #2: No intelligence is added

This is a visualizer, not an analyzer. Gephi and NodeXL already do this for free. Without graph algorithms (centrality, community detection), it's just a prettier spreadsheet.

🔴

Fatal flaw #3: Vibe-codeable in 4 hours

CSV upload → NetworkX → Cytoscape render. A solo developer can build this exact thing in an afternoon. Judges will recognize this immediately and penalize it hard.

🟡

Moderate issue: No evidence chain-of-custody

Any graph produced cannot be used in court without provable chain of custody. This limits operational value to internal exploration only, not prosecution support.

Correct problem: Track 3, Problem 1

Judges explicitly said "OSINT & criminal network intelligence." This direction is perfectly aligned. Don't abandon it — deepen it massively.

Right track: Offline + backend-only

Offline-first, no cloud APIs, runs on a local machine. This respects Nepal's data privacy laws and the judge's Criterion 02 on low-connectivity operations.

What Nepal Police Actually Struggles With

Based on Inspector Khadka's public statements and the Cyber Bureau's operational reports, the #1 bottleneck is not visualizing data — it's getting the data into a usable form in the first place.

📱

What a real investigation looks like

A victim reports on the Cyber Bureau portal. They attach a blurry screenshot of a WhatsApp conversation, a photo of an eSewa transaction receipt taken on a cheap phone, and a hand-typed description with the account number. The investigator must manually read this, type the phone number, eSewa ID, and bank account into a spreadsheet, then cross-reference it against 47 other complaints from other districts. This takes 3–5 hours per case. There are 52 cases per day. They have 6 investigators. The math is impossible.
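The workload claim can be checked directly from the figures in the paragraph above. This minimal sketch uses only those numbers and assumes nothing about shift length: it compares daily demand against the hard ceiling of every investigator working all 24 hours.

```python
# Figures quoted in the paragraph above.
CASES_PER_DAY = 52
HOURS_PER_CASE = (3, 5)   # low and high estimate per case
INVESTIGATORS = 6

# Investigator-hours of manual transcription arriving each day.
demand_low = CASES_PER_DAY * HOURS_PER_CASE[0]
demand_high = CASES_PER_DAY * HOURS_PER_CASE[1]

# Absolute ceiling: every investigator working around the clock.
ceiling = INVESTIGATORS * 24

print(f"demand:  {demand_low}-{demand_high} investigator-hours per day")
print(f"ceiling: {ceiling} investigator-hours per day")
print("backlog grows every day:", demand_low > ceiling)
```

Even the low estimate exceeds what six people could do without sleeping, so the backlog can only grow.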

The winning tool doesn't just display connections — it creates them automatically from raw, messy evidence that already exists in the Cyber Bureau's complaint queue.

SarpaNet: Intelligence Platform for Cybercrime Investigation

सर्प (Sarpa) = "serpent" in Nepali — referencing how criminal networks coil invisibly around victims. Also: Surveillance, Analysis, Reporting, Profiling, and Attribution Network.

P-1
Universal Evidence Ingestion Engine
SOLVES THE FATAL FLAW — the real innovation of this project
Instead of requiring clean CSVs (which don't exist in the real world), investigators can drop anything into the system: a WhatsApp screenshot, a photo of an eSewa transaction, a blurry bank statement PDF, or a typed complaint. The pipeline uses PaddleOCR for layout-aware text extraction, then passes the result through a locally running Ollama model (Qwen2.5:7b or Gemma3:4b; small, fast, runs on an 8GB laptop) with a structured extraction prompt. Output is normalized JSON: { phone: "9841XXXXXX", esewa_id: "...", amount: 12500, date: "...", sender: "...", receiver: "..." }. This is then ingested into the graph automatically.
PaddleOCR · Ollama + Qwen2.5:7b · FastAPI upload endpoint · PIL / pdf2image · Zero cloud calls
Demo moment: Drag a blurry, hand-photographed eSewa receipt onto the interface. Watch it automatically extract the sender's phone number, recipient's eSewa ID, and amount in 4 seconds — then watch the graph update live with a new connection.
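A local model will not always return clean JSON, so the step between the LLM and the graph likely needs a validator. The sketch below is one possible shape for it, using only the standard library. The function name `parse_extraction` and the sample reply are hypothetical; the field names are the ones listed in the P-1 description.

```python
import json
import re

# Field names taken from the normalized JSON described in P-1.
REQUIRED_FIELDS = ("phone", "esewa_id", "amount", "date", "sender", "receiver")


def parse_extraction(raw_reply: str) -> dict:
    """Pull a JSON object out of a model reply and validate its fields."""
    # Models often wrap JSON in chatter; take the span from the first '{'
    # to the last '}' and let json.loads decide whether it is valid.
    match = re.search(r"\{.*\}", raw_reply, flags=re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")
    record = json.loads(match.group(0))

    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    # Amounts sometimes come back as strings such as "12,500".
    if isinstance(record["amount"], str):
        record["amount"] = float(record["amount"].replace(",", ""))
    return record


# Hypothetical model reply with placeholder values.
reply = (
    "Here is the extracted data:\n"
    '{"phone": "9841000000", "esewa_id": "9841000000", "amount": "12,500", '
    '"date": "2025-01-15", "sender": "A", "receiver": "B"}'
)
record = parse_extraction(reply)
print(record["amount"])
```

Rejecting a malformed reply outright, rather than ingesting a partial record, keeps bad OCR from silently creating wrong edges in the graph.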
P-2
Smart Entity Resolution & Deduplication
DETERMINISTIC ALGORITHM — scores high on judge criterion 03
The same phone number appears as 9841XXXXXX, +977-9841XXXXXX, and 984-1XXXXXX across different complaints. A naive system creates three separate nodes. SarpaNet's entity resolver normalizes all phone numbers to E.164 format, strips eSewa ID prefix variations, canonicalizes bank account numbers, and uses fuzzy string matching (Levenshtein distance ≤ 2) to catch typos in manually transcribed data. When the resolver detects two records are the same entity, it merges them into one node and records both source entries as evidence. This is the difference between a disconnected graph and a connected one — and it works on dirty, real-world data.
phonenumbers library · rapidfuzz (Levenshtein) · NetworkX entity merge · SQLite canonical store
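The resolver logic can be sketched without any dependencies. The proposal names the phonenumbers and rapidfuzz libraries for this job; the version below swaps in plain standard-library code purely for illustration. The helper names are hypothetical, and the rule that a Nepali mobile number is 10 digits starting with 9 is an assumption of this sketch.

```python
import re
from typing import Optional


def normalize_np_mobile(raw: str) -> Optional[str]:
    """Return an E.164-style +977 number for a Nepali mobile, else None."""
    digits = re.sub(r"\D", "", raw)          # drop '+', '-', spaces
    if len(digits) == 13 and digits.startswith("977"):
        digits = digits[3:]                   # strip the country code
    # Assumption: mobile numbers are 10 digits and start with 9.
    if len(digits) == 10 and digits.startswith("9"):
        return "+977" + digits
    return None


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def likely_same_name(a: str, b: str, max_distance: int = 2) -> bool:
    """Fuzzy match for hand-typed names, using the <= 2 threshold from P-2."""
    return levenshtein(a.lower(), b.lower()) <= max_distance


# The three formats from the P-2 description collapse to one canonical key.
variants = ["9841000000", "+977-9841000000", "984-1000000"]
canonical = {normalize_np_mobile(v) for v in variants}
print(canonical)
print(likely_same_name("Ram Bahadur", "Ram Bahadhur"))
```

A distance threshold of 2 applied to raw digit strings would also merge genuinely different phone numbers, so this sketch applies fuzzy matching to names only. That split is an assumption, not something the proposal specifies.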
P-3
Graph Intelligence Engine
WHAT SEPARATES ANALYSIS FROM VISUALIZATION
On top of the basic node-link display, SarpaNet computes real graph analytics that surface intelligence automatically — no investigator querying needed:

· Degree centrality: Automatically flags the node with the most connections. This is almost always the master mule or kingpin. Shown with a pulsing red glow on the graph.
· Louvain community detection: Automatically clusters the graph into syndicates without any user input. Color-codes clusters by detected community.
· Temporal flow analysis: A timeline slider shows when each transaction occurred, so investigators can see the money laundering chain in time order.
· Shortest path finder: "How is this suspect connected to that arrested criminal?" — one click finds the chain.
· Betweenness centrality: Identifies bridge nodes — the mules who connect two otherwise-separate criminal networks. These are the most valuable arrests to make.
NetworkX algorithms · python-louvain · Cytoscape.js + D3.js · Temporal animation JS
Demo moment: Load 60 complaints. Click "Analyze." Watch the graph self-organize into 3 color-coded clusters representing 3 distinct syndicates. A pulsing node in the center cluster is connected to all three: the bridge mule. This discovery would take 3 days manually.
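Two of the analytics above, degree centrality and the shortest path finder, are simple enough to show in full. The proposal would get these from NetworkX; the sketch below is a dependency-free stand-in so the logic is visible. The node names and the tiny edge list are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency map from (a, b) transaction pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly linked to."""
    n = len(adj)
    if n <= 1:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def shortest_path(adj, source, target):
    """Breadth-first search; returns the node chain or None if unconnected."""
    if source not in adj or target not in adj:
        return None
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:      # walk parent links back to source
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nbr in sorted(adj[node]):    # sorted for deterministic output
            if nbr not in parents:
                parents[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical toy network: four victims, two mules, one controller.
edges = [
    ("victim_1", "mule_A"), ("victim_2", "mule_A"), ("victim_3", "mule_A"),
    ("mule_A", "boss"), ("victim_4", "mule_B"), ("mule_B", "boss"),
]
adj = build_adjacency(edges)
scores = degree_centrality(adj)
top = max(scores, key=scores.get)
print("most connected:", top)
print("chain:", shortest_path(adj, "victim_1", "victim_4"))
```

Note that the most connected node here is a mule, not the controller, which is why the proposal pairs degree centrality with betweenness rather than relying on one score.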
P-4
Passive OSINT Enrichment Layer
CONTEXT — what the investigator would Google anyway
Right-clicking any phone number node triggers automated public lookups that the investigator would otherwise do manually in separate browser tabs. The enrichment pipeline — running entirely locally — checks public sources in parallel: Truecaller-style reverse phone lookup (scraped), Facebook profile search by phone (public graph API), and a cross-reference against SarpaNet's own internal "known bad actors" database seeded with prior conviction records. Results are attached to the node as metadata. This doesn't require any external paid APIs — it uses headless browser automation pointing at public endpoints, running locally. An "Enrich All" button can process every node in the graph overnight and flag any that show up on multiple public complaint boards.
Playwright headless · asyncio parallel requests · Local known-bad DB · Node metadata sidebar
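The parallel fan-out is the part of P-4 that can be sketched safely. In the example below every source is a stub: `lookup_known_bad` reads a hypothetical in-memory table and `lookup_public_stub` stands in for a headless-browser lookup without performing one. The point is only the pattern of running lookups concurrently with a timeout so one failing source does not sink the others.

```python
import asyncio

# Hypothetical local "known bad actors" table with a synthetic entry.
KNOWN_BAD = {"+9779841000000": "prior fraud conviction (synthetic example)"}


async def lookup_known_bad(phone: str) -> dict:
    """Check the local table; no network access involved."""
    return {"source": "known_bad_db", "hit": KNOWN_BAD.get(phone)}


async def lookup_public_stub(phone: str) -> dict:
    """Placeholder for a browser-driven public lookup (not implemented)."""
    await asyncio.sleep(0)   # yield control, as a real I/O call would
    return {"source": "public_stub", "hit": None}


async def enrich(phone: str, timeout: float = 5.0) -> list:
    """Run all lookups concurrently and keep only the ones that succeed."""
    lookups = [lookup_known_bad(phone), lookup_public_stub(phone)]
    results = await asyncio.gather(
        *(asyncio.wait_for(job, timeout) for job in lookups),
        return_exceptions=True,   # a timeout becomes a value, not a crash
    )
    return [r for r in results if isinstance(r, dict)]


metadata = asyncio.run(enrich("+9779841000000"))
for entry in metadata:
    print(entry["source"], "->", entry["hit"])
```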
P-5
One-Click Investigation Report Generator
COURT-READY OUTPUT — closes the loop from data to action
After building a graph, the investigator clicks "Generate Report." Within 10 seconds, a structured PDF is generated containing: the network diagram as a high-resolution image, a timeline of all transactions, a ranked list of suspect nodes by centrality score, a case connection matrix (which complaints share which entities), an evidence inventory (all source files that were ingested), and a SHA-256 digest of each evidence file for chain-of-custody. This report is formatted in a template matching Nepal Police's standard case documentation format, ready to hand to a prosecutor without any reformatting.
ReportLab / WeasyPrint · Cytoscape.js PNG export · SHA-256 evidence hashing · Nepal Police report template
Demo moment: "We can now show you what this produces for a prosecutor." Click generate. A 12-page PDF appears. This transforms the tool from a visualization toy into an operational prosecution support system.
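The case connection matrix in the report is a pairwise set intersection. A minimal sketch follows, assuming each complaint has already been reduced to a set of canonical entity keys. The complaint IDs and entity strings are hypothetical.

```python
from itertools import combinations


def connection_matrix(complaints):
    """Map each pair of complaint IDs to the entities they share."""
    shared = {}
    for a, b in combinations(sorted(complaints), 2):
        common = complaints[a] & complaints[b]
        if common:                      # only keep pairs that actually link
            shared[(a, b)] = common
    return shared


# Hypothetical complaints from three districts.
complaints = {
    "KTM-001": {"+9779841000000", "esewa:ram01"},
    "RUP-014": {"+9779841000000"},
    "PKR-007": {"esewa:sita22"},
}
links = connection_matrix(complaints)
for pair, entities in links.items():
    print(pair, "share", sorted(entities))
```

Sparse output (only linked pairs) keeps the report table readable when most complaints have nothing in common.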
P-6
Cryptographic Chain-of-Custody Ledger
LEGAL ADMISSIBILITY — the feature that matters in court
Every piece of evidence ingested (screenshot, PDF, CSV) is immediately SHA-256 hashed at the millisecond of upload. The hash, filename, uploader ID, and timestamp are written to a tamper-evident append-only SQLite ledger (no blockchain needed — a WAL-mode SQLite with sequential IDs and hash-chaining of rows is provably tamper-evident and runs offline). This means when a defense lawyer claims an image was modified after upload, the investigator can demonstrate mathematically that the hash at upload matches the hash today — the evidence is untampered. This single feature is the difference between digital evidence being admitted in court and being thrown out.
hashlib SHA-256 · SQLite WAL + row chaining · Audit log UI · Tamper detection alerts
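The hash-chained ledger described above fits in a few dozen lines of standard-library Python. This is a sketch under stated assumptions: the table layout, column names and function names are hypothetical, the timestamp is passed in by the caller, and an in-memory database is used so the example runs anywhere (WAL mode applies only to on-disk databases and is left out here).

```python
import hashlib
import sqlite3

GENESIS = "0" * 64   # stand-in "previous hash" for the very first row


def open_ledger(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS ledger (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               filename TEXT NOT NULL,
               uploader TEXT NOT NULL,
               uploaded_at TEXT NOT NULL,
               file_hash TEXT NOT NULL,
               prev_hash TEXT NOT NULL,
               row_hash TEXT NOT NULL)"""
    )
    return conn


def _row_hash(prev_hash, filename, uploader, uploaded_at, file_hash):
    """Hash this row's content together with the previous row's hash."""
    payload = "|".join([prev_hash, filename, uploader, uploaded_at, file_hash])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_evidence(conn, filename, content, uploader, uploaded_at):
    """Hash the file bytes and append a chained row; returns the file hash."""
    file_hash = hashlib.sha256(content).hexdigest()
    last = conn.execute(
        "SELECT row_hash FROM ledger ORDER BY id DESC LIMIT 1"
    ).fetchone()
    prev_hash = last[0] if last else GENESIS
    row_hash = _row_hash(prev_hash, filename, uploader, uploaded_at, file_hash)
    conn.execute(
        "INSERT INTO ledger (filename, uploader, uploaded_at, file_hash,"
        " prev_hash, row_hash) VALUES (?, ?, ?, ?, ?, ?)",
        (filename, uploader, uploaded_at, file_hash, prev_hash, row_hash),
    )
    conn.commit()
    return file_hash


def verify_ledger(conn):
    """Recompute the whole chain; any edited or removed row breaks it."""
    prev = GENESIS
    rows = conn.execute(
        "SELECT filename, uploader, uploaded_at, file_hash, prev_hash, row_hash"
        " FROM ledger ORDER BY id"
    )
    for filename, uploader, uploaded_at, file_hash, prev_hash, row_hash in rows:
        expected = _row_hash(prev, filename, uploader, uploaded_at, file_hash)
        if prev_hash != prev or row_hash != expected:
            return False
        prev = row_hash
    return True


conn = open_ledger()
append_evidence(conn, "receipt.jpg", b"image bytes", "officer_01", "2025-01-15T10:00:00Z")
append_evidence(conn, "chat.png", b"other bytes", "officer_02", "2025-01-15T10:05:00Z")
print("intact:", verify_ledger(conn))
conn.execute("UPDATE ledger SET uploader = 'someone_else' WHERE id = 1")
print("after tampering:", verify_ledger(conn))
```

To answer a courtroom challenge, the investigator would re-hash the stored file, compare it with `file_hash`, and run the chain verification to show the ledger row itself was not altered.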

Full Stack — Runs Completely Offline on One Machine

Layer | Technology | Why this choice
Evidence ingestion | PaddleOCR + pdf2image + PIL | PaddleOCR outperforms Tesseract on low-quality images and non-Latin scripts. Handles Devanagari numerals in eSewa/bank screenshots. Runs fully offline.
Entity extraction LLM | Ollama + Qwen2.5:7b | 7B model fits in 8GB VRAM, runs on a standard police laptop. Takes OCR text → structured JSON. No API key, no internet. Outperforms regex for ambiguous text.
Graph engine | NetworkX + SQLite | NetworkX has all graph algorithms built in (Louvain, centrality, shortest path). SQLite stores the raw data with zero setup. No Neo4j server to install during the 36hrs.
Entity resolution | phonenumbers + rapidfuzz | Google's phonenumbers library handles all NTC/Ncell number formats. rapidfuzz does Levenshtein in microseconds.
API backend | FastAPI (Python 3.11) | Async endpoints handle file upload, OCR, LLM extraction, and graph queries. Background tasks for enrichment. WebSocket for live graph updates.
Frontend | React + Cytoscape.js + D3.js | Cytoscape.js handles graph rendering with physics layout. D3.js for the temporal timeline chart. Drag-and-drop file upload. Dark-mode police intelligence aesthetic.
OSINT enrichment | Playwright (async) | Headless browser automation for public profile lookups. Runs in a separate async worker so it doesn't block the UI.
Report generation | WeasyPrint + Jinja2 | HTML template → PDF in one command. Embeds graph PNG export. No LibreOffice dependency.
Evidence ledger | SQLite WAL + hashlib | Append-only table. Each row hashes the previous row's hash + its own content (blockchain-style without blockchain overhead). Fully verifiable offline.
Deployment | Docker Compose | Single docker-compose up starts the entire stack. No dependencies to install. Works on Windows, Linux, Mac. Critical for judge demo reliability.

What Changes

Capability | Mule Mapper (original) | SarpaNet (enhanced)
Evidence input | Clean CSV files only | Screenshots, PDFs, photos, CSVs: anything
Data entry | Investigator prepares CSV manually | Auto-extracted by OCR + local LLM
Entity dedup | None: same phone in 3 formats = 3 nodes | Smart normalization + fuzzy match
Graph analytics | Visual only, no algorithms | Centrality, community detection, temporal flow, shortest path
OSINT enrichment | None | Automated public profile lookup per node
Report output | Screenshot of graph | Full PDF with timeline, entities, evidence inventory
Legal admissibility | No chain-of-custody | SHA-256 hash ledger for every piece of evidence
Vibe-codeable? | Yes: 4 hours solo | No: requires 36 hrs from a strong 4-person team
Unique value | Gephi already does this | Nothing like this exists for Nepal Police

Hour-by-Hour with 4-Person Team

0–3h
Setup, schema design, synthetic data
Dev A · Dev B · Dev C · Dev D
Docker Compose skeleton, SQLite schema, generate 60 synthetic complaint records with overlapping entities across 3 "syndicates." This data drives the entire demo so it must be designed carefully. The graph should reveal 3 clusters with one bridge node when analyzed.
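Because this dataset drives the whole demo, it is worth generating it deterministically. The sketch below is one way to do that, assuming a fixed random seed, made-up phone numbers, and a single bridge number planted in all three syndicates. Every name and value in it is hypothetical.

```python
import random

BRIDGE = "+9779800000000"   # hypothetical account shared by every syndicate


def make_synthetic_complaints(n=60, syndicates=3, seed=7):
    """Build n fake complaints whose entities cluster into syndicates."""
    rng = random.Random(seed)   # fixed seed: the demo graph never changes
    # Each syndicate gets its own small pool of mule numbers.
    pools = [
        [f"+97798{s + 1}{i:07d}" for i in range(4)]
        for s in range(syndicates)
    ]
    complaints = []
    for idx in range(n):
        s = idx % syndicates
        entities = {rng.choice(pools[s])}
        # Every fifth round of complaints also names the bridge account,
        # so it ends up linked to all syndicates.
        if (idx // syndicates) % 5 == 0:
            entities.add(BRIDGE)
        complaints.append({
            "id": f"C-{idx + 1:03d}",
            "syndicate": s,
            "entities": sorted(entities),
            "amount": rng.randrange(1000, 50001, 500),
        })
    return complaints


data = make_synthetic_complaints()
bridge_syndicates = {c["syndicate"] for c in data if BRIDGE in c["entities"]}
print(len(data), "complaints; bridge appears in syndicates", sorted(bridge_syndicates))
```

Keeping the `syndicate` label in the synthetic records gives the team a ground truth to check community detection against during rehearsal.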
3–10h
OCR + LLM extraction pipeline (P-1, P-2)
Dev A
Install PaddleOCR, set up Ollama with Qwen2.5:7b, build FastAPI upload endpoint, write the extraction prompt, implement entity normalization with phonenumbers + rapidfuzz, write unit tests against 10 real-looking test images. This is the hardest piece — give it the strongest developer.
3–10h
Graph engine + analytics (P-3)
Dev B
Build NetworkX graph builder from SQLite, implement Louvain community detection, degree centrality, betweenness centrality, shortest path endpoint. Expose all as FastAPI endpoints. Write the WebSocket endpoint for live graph updates on new evidence upload.
3–12h
React frontend + Cytoscape graph UI
Dev C
Build the main investigation interface: drag-drop upload zone, Cytoscape.js graph with physics layout, node color by community, node size by centrality, right-click context menu for node actions, sidebar for node metadata. Dark mode police intelligence aesthetic. Use react-dropzone for uploads.
10–18h
Integration sprint + temporal analysis + evidence ledger (P-6)
Dev A · Dev B · Dev C
Connect all pipelines end-to-end. Test the full flow: drop image → OCR → LLM → entity resolve → graph update. Build the D3.js timeline slider for temporal analysis. Implement the SHA-256 hash ledger for chain-of-custody. Fix integration bugs. This is the most chaotic stretch — plan for debugging time.
10–20h
Report generator (P-5)
Dev D
Build the Jinja2 HTML report template mimicking Nepal Police case documentation format. Implement the Cytoscape.js PNG export, WeasyPrint PDF generation, entity ranking tables, evidence inventory with hashes. Build the "Generate Report" button and download flow.
20–28h
OSINT enrichment (P-4) + UI polish
Dev A · Dev D
If OCR pipeline is stable (it should be), Dev A builds the Playwright enrichment worker. Otherwise, skip P-4 and polish the main experience. P-4 is a bonus, not core. UI polish: loading states for OCR processing, graph animation on new node addition, highlight "Analyze" button pulse, node inspection sidebar.
28–36h
Demo rehearsal, edge case hardening, presentation
Dev A · Dev B · Dev C · Dev D
Run the demo 10 times. Harden against OCR failure (fallback to raw text display). Prepare the "messy Excel" vs "SarpaNet graph" side-by-side slide. Rehearse the 90-second demo script. Make sure Docker starts cleanly on the demo machine.

Exactly What to Say and Show

SARPANET — Final Presentation Demo Script
Hook [0:00–0:15]
"Yesterday, 52 Nepali citizens were defrauded online. Every case went to the Cyber Bureau. Every case is a spreadsheet. An investigator in Kathmandu and one in Rupandehi are both actively pursuing the same syndicate right now and have no idea. We built a tool to end that."
Setup [0:15–0:25]
"This is what the Cyber Bureau gets from a victim." [Show blurry WhatsApp screenshot of an eSewa transaction, hand-photographed]. "No CSV. No clean data. Just this."
Extraction demo [0:25–0:40]
[Drag the image onto SarpaNet. Progress bar. 4 seconds later:] "SarpaNet read the image, extracted phone number 9841-XXXXXX, eSewa ID, and the NPR 8,000 transaction. It just did in 4 seconds what takes an officer 15 minutes."
Graph explosion [0:40–1:00]
[Click "Load Demo Dataset": 60 complaints animate onto the graph, self-organizing into clusters]. "60 complaints. 3 provinces. Watch what happens when we click Analyze." [Click. Graph reorganizes. 3 color clusters appear. One amber node pulses in the center]. "That amber node is connected to all three syndicates. It's the bridge mule. Finding this manually would take 3 days. It took SarpaNet 3 seconds."
Close [1:00–1:20]
"One click generates the prosecution report." [Click Generate. PDF appears]. "This goes directly to the prosecutor. Every piece of evidence is SHA-256 hashed: court-admissible, tamper-evident. SarpaNet doesn't just show data. It turns raw evidence into actionable intelligence."

Alternative Ideas Honestly Evaluated

Based on all three tracks and the judge's explicit guidance, here are the realistic alternatives and an honest verdict on each.

Deepfake / AI-weaponised crime detector

Impact: 9/10 · Feasibility: 6/10 · Risk: HIGH

The 80% rise in deepfake incidents is real and terrifying. But building an accurate deepfake audio classifier in 36 hours that handles Nepali audio is extremely risky. Pre-trained Wav2Vec models are English-centric. A bad demo (misclassifying real audio as fake) destroys credibility. Could be a future v2 feature in SarpaNet — "flag suspected AI-generated evidence." Keep it as a pipeline addition, not the core.

Consider as bonus feature only

USSD Domestic Violence Check-in (Track 1)

Impact: 9/10 · Feasibility: 9/10 · Track 1

Genuinely powerful and buildable. The Africa's Talking API simulation works well in demos. But Track 1 is more crowded and less technically differentiating. The judge's signal was clearest about Track 3. If your team has no one who can implement OCR pipelines, pivot here — it's a safer build with high operational impact.

Strong fallback if OCR expertise absent

Tactical Disaster Situational Awareness Map (Track 2)

Impact: 10/10 · Feasibility: 7/10 · Demo Risk: HIGH

MapLibre + PostGIS + WebSockets is a solid stack, but the live demo depends entirely on the venue's local WiFi. If WebSocket sync fails during the presentation, you show nothing. The 2024 flood statistics are extremely compelling for judges, but SarpaNet's demo is more reliable and more technically deep.

Skip — demo risk too high

Phishing Domain Auto-Analyzer (Track 3)

Impact: 6/10 · Feasibility: 10/10 · Innovation: 4/10

URLScan.io already does this. Even customized for Nepal (.com.np domains, NRB-mimicking sites), judges will know it's essentially a wrapper around VirusTotal + WHOIS. Very buildable but scores low on innovation. Only consider this if your team has time to spare before the hackathon ends and wants a quick supplementary demo.

Skip — not innovative enough

Stay on Track 3. Go deeper, not sideways.

The "Mule Mapper" concept is correct — it directly addresses the judges' stated priorities. The problem is the depth. Transform it from a CSV visualizer into a full intelligence platform with OCR ingestion, graph analytics, report generation, and chain-of-custody. That's the difference between a 46/50 proposal and a winner. The SarpaNet architecture described above is technically feasible in 36 hours for a strong 4-person team and produces a demo that will make every judge in the room understand within 60 seconds why Nepal's Cyber Bureau needs exactly this tool.

Mule Mapper as-is: 6.5 / 10
SarpaNet full platform: 9.2 / 10