KI: Practicum 13 - AI Security Final Project

From OnnoWiki

Objective

In this practicum you are no longer just trying out tools. You will build a mini security product that:

- has clearly defined input data
- has a detection process
- produces a decision as output (risk score + reasons)
- comes with a report and a demo that you can defend

The key: explain the security logic. The AI only assists.

Project Options (Choose 1)

- AI Phishing Detector (the most "real-world" option, easy to test)
- AI PDP Audit (personal data protection / privacy compliance, suited to logs, CSV files, and datasets)
- Simple AI IDS (network logs and anomaly detection, challenging but fun)

All projects share the same end-to-end framework.

Mandatory Project Structure (Same for All)

Stage 0 - Setup Environment (Ubuntu 24.04)

Install the basic dependencies:

sudo apt update
sudo apt install -y python3 python3-venv python3-pip git gpg
python3 --version
gpg --version

Create the project folder:

mkdir -p ~/ai-security-final/{data,src,reports,models}
cd ~/ai-security-final
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip

Python Packages (open-source)

For all project options:

pip install pandas scikit-learn numpy joblib rich

Optional (for tidier log parsing, such as flexible timestamp handling):

pip install python-dateutil

Minimal folder structure:

 ai-security-final/

 data/
 src/
 models/
 reports/
 README.md

Stage 1 - Project Data Security (Mandatory): GnuPG for Datasets & Output

Why? Datasets and reports often contain sensitive data. You must demonstrate that you can secure it.

1. Create a GPG key (for the project)

gpg --full-generate-key

Choose:

- (1) RSA and RSA
- key size 3072 or 4096
- name: AI Security Student
- email: student@lab.local

Check the key:

gpg --list-keys

2. Encrypt the dataset (example)

Suppose your dataset is data/phishing_samples.csv:

gpg --output data/phishing_samples.csv.gpg --symmetric --cipher-algo AES256 data/phishing_samples.csv
shred -u data/phishing_samples.csv

Note that --symmetric protects the file with a passphrase; it does not use the key pair created in step 1.

Decrypt when needed:

gpg --output data/phishing_samples.csv --decrypt data/phishing_samples.csv.gpg
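The encrypt and decrypt steps above can also be scripted, so that a tool decrypts a dataset only for as long as it needs it. The following is a minimal sketch that only wraps the same gpg commands with Python's subprocess module; the helper names gpg_encrypt_cmd, gpg_decrypt_cmd, and run_gpg are hypothetical and not part of any library.

```python
import subprocess


def gpg_encrypt_cmd(plain_path: str) -> list[str]:
    """Build the symmetric AES256 encryption command used in this section."""
    return [
        "gpg", "--output", f"{plain_path}.gpg",
        "--symmetric", "--cipher-algo", "AES256", plain_path,
    ]


def gpg_decrypt_cmd(encrypted_path: str) -> list[str]:
    """Build the matching decryption command; the output drops the .gpg suffix."""
    out_path = encrypted_path.removesuffix(".gpg")
    return ["gpg", "--output", out_path, "--decrypt", encrypted_path]


def run_gpg(cmd: list[str]) -> None:
    """Run a gpg command; raises CalledProcessError if gpg exits non-zero."""
    subprocess.run(cmd, check=True)


# Printing the command first makes it easy to compare with the manual steps
print(" ".join(gpg_encrypt_cmd("data/phishing_samples.csv")))
```

Calling run_gpg(gpg_encrypt_cmd("data/phishing_samples.csv")) should behave like the manual command, including the interactive passphrase prompt. Removing the plaintext afterwards (shred -u) remains a separate step.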

Project rule: datasets containing personal or risky data must be stored encrypted, or at minimum replaced with dummy data.

Stage 2 - Choose a Project + Run It Step by Step

Below are three complete project tracks. Each one includes:

- realistic sample data
- implementation steps
- training + inference code
- demo output
- report format

Pick one of them.

OPTION A - AI Phishing Detector (Recommended)

Goal: detect phishing messages from email/chat text, then output a label + risk score + reasons.

1. Prepare the dataset (realistic but safe)

Create the file data/phishing_samples.csv (a mini example that you can extend):

text,label
"URGENT: Your account will be suspended. Verify now at http://secure-login.example.com",1
"Hi team, meeting moved to 3pm. Link: https://meet.example.org/abc",0
"Reset password now. Your mailbox is full. Click http://mailbox-reset.example.net",1
"Invoice attached, please review. Thanks",0
"Bank: unusual activity detected. Confirm your OTP at http://bank-verify.example.xyz",1
"Reminder: submit assignment before Friday",0

Labels: 1 = phishing, 0 = benign.

Challenge: later, add 50–200 examples (realistic text that you write yourself is fine).

2. Create the training script (simple + explainable ML)

Create src/train_phishing.py:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report, confusion_matrix
import joblib

DATA_PATH = "data/phishing_samples.csv"
MODEL_PATH = "models/phishing_model.joblib"


def main():
    df = pd.read_csv(DATA_PATH)
    X = df["text"].astype(str)
    y = df["label"].astype(int)

    # Stratified split keeps the phishing/benign ratio in both sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42, stratify=y
    )

    # TF-IDF over unigrams and bigrams, feeding a linear classifier
    model = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
        ("clf", LogisticRegression(max_iter=200)),
    ])
    model.fit(X_train, y_train)

    # Evaluate on the held-out rows
    y_pred = model.predict(X_test)
    print("=== Confusion Matrix ===")
    print(confusion_matrix(y_test, y_pred))
    print("\n=== Classification Report ===")
    print(classification_report(y_test, y_pred))

    joblib.dump(model, MODEL_PATH)
    print(f"\nSaved model to: {MODEL_PATH}")


if __name__ == "__main__":
    main()

Run the training script:

python src/train_phishing.py

3. Build the detector + reasons (top keywords)

Create src/detect_phishing.py:

import joblib
from rich import print
from rich.console import Console

MODEL_PATH = "models/phishing_model.joblib"

# Keyword hints are used only to explain a result; the score comes from the model
SUSPICIOUS_HINTS = [
    "urgent", "verify", "reset", "suspended", "otp", "password",
    "click", "confirm", "limited", "account", "bank",
]


def explain_text(text: str):
    # Simple substring match on the lowercased message
    low = text.lower()
    hits = [h for h in SUSPICIOUS_HINTS if h in low]
    return hits[:10]


def main():
    model = joblib.load(MODEL_PATH)
    console = Console()
    console.print("[bold]AI Phishing Detector Demo[/bold]")
    console.print("Type a message/email. Press Enter on an empty line to quit.\n")
    while True:
        text = input("Message> ").strip()
        if not text:
            break
        proba = model.predict_proba([text])[0][1]  # probability of phishing
        label = "PHISHING" if proba >= 0.5 else "BENIGN"
        hints = explain_text(text)
        print("\n[bold]Result[/bold]")
        print(f"Label     : [bold]{label}[/bold]")
        print(f"Risk score: [bold]{proba:.2f}[/bold] (0..1)")
        print(f"Reasons   : {hints if hints else 'No obvious keyword hints'}")
        print("-" * 60)


if __name__ == "__main__":
    main()

Run the demo:

python src/detect_phishing.py
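The keyword hints in the detector look only at words. URL-based signals such as shortened links or unusual domains can be added as extra reasons. The sketch below is one possible approach using only the standard library; the function name url_reasons is hypothetical, and the SHORTENER_DOMAINS and SUSPICIOUS_TLDS sets are small illustrative assumptions, not authoritative lists.

```python
import re
from urllib.parse import urlparse

# Assumption: short illustrative lists; extend them for a real project
SHORTENER_DOMAINS = {"bit.ly", "tinyurl.com", "t.co"}
SUSPICIOUS_TLDS = {"xyz", "top", "click"}

URL_RE = re.compile(r"https?://[^\s<>]+", re.I)
IPV4_RE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")


def url_reasons(text: str) -> list[str]:
    """Return human-readable reasons for every suspicious URL in the text."""
    reasons = []
    for raw in URL_RE.findall(text):
        url = raw.rstrip(".,;:!?)\"'")  # drop trailing sentence punctuation
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if parsed.scheme.lower() == "http":
            reasons.append(f"plain HTTP link: {host}")
        if host in SHORTENER_DOMAINS:
            reasons.append(f"shortened URL: {host}")
        if IPV4_RE.fullmatch(host):
            reasons.append(f"raw IP address as host: {host}")
        if host.rsplit(".", 1)[-1] in SUSPICIOUS_TLDS:
            reasons.append(f"unusual top-level domain: {host}")
    return reasons


print(url_reasons("Confirm your OTP at http://bank-verify.example.xyz"))
print(url_reasons("Meeting link: https://meet.example.org/abc"))
```

In the detector, these reasons could be appended to the keyword hits before printing. They are heuristics only: a legitimate site can use plain HTTP, and a phishing site can use HTTPS.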

Realistic sample inputs for the demo:

- "Admin: your account will be deactivated. Click this link to verify …"
- "Please check the invoice, there is a .zip file, the password is 12345"
- "Meeting at 2, Google Meet link …"

You earn a higher grade if you add: detection of shortened URLs, unusual domains, the word "urgent", and patterns commonly used in scams.

OPTION B - AI PDP Audit (Privacy Audit Tool)

Goal: scan a CSV/log file, detect personal data, then produce a risk report + recommendations.

1. Sample dataset

Create data/sample_users.csv:

name,email,phone,nik,address,notes
"Budi","budi@mail.com","08123456789","327xxxxxxxxxxxx","Bekasi","token=abc123"
"Siti","siti@gmail.com","082233445566","320xxxxxxxxxxxx","Jakarta","pwd=123456"
"Andi","andi@corp.co.id","081299988877","","Bandung","no issues"

2. Audit tool (regex + scoring)

Create src/pdp_audit.py:

import re
import pandas as pd
from rich import print
from rich.table import Table

# Regex patterns for common personal data and secrets
PATTERNS = {
    "EMAIL": re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"),
    "PHONE_ID": re.compile(r"\b(08\d{8,12})\b"),
    "NIK_LIKE": re.compile(r"\b\d{16}\b"),
    "TOKEN_LIKE": re.compile(r"\b(token|apikey|secret)\s*=\s*[A-Za-z0-9_-]{6,}\b", re.I),
    "PASSWORD_LIKE": re.compile(r"\b(pass|pwd|password)\s*=\s*\S+\b", re.I),
}

# Severity weight per finding type (higher = riskier)
SEVERITY = {
    "EMAIL": 2,
    "PHONE_ID": 2,
    "NIK_LIKE": 4,
    "TOKEN_LIKE": 4,
    "PASSWORD_LIKE": 5,
}


def scan_value(val: str):
    findings = []
    for k, rx in PATTERNS.items():
        if rx.search(val):
            findings.append(k)
    return findings


def main():
    path = "data/sample_users.csv"
    # Read every column as text; otherwise pandas parses phone numbers as
    # integers and drops the leading zero that PHONE_ID relies on
    df = pd.read_csv(path, dtype=str)
    findings_rows = []
    total_score = 0
    for idx, row in df.iterrows():
        row_findings = []
        row_score = 0
        for col, v in row.items():
            s = "" if pd.isna(v) else str(v)
            hits = scan_value(s)
            for h in hits:
                row_findings.append((col, h))
                row_score += SEVERITY[h]
        total_score += row_score
        findings_rows.append((idx, row_score, row_findings))

    table = Table(title="PDP Audit Report (Quick Scan)")
    table.add_column("Row", justify="right")
    table.add_column("Risk Score", justify="right")
    table.add_column("Findings")
    for idx, score, f in findings_rows:
        pretty = ", ".join([f"{col}:{tag}" for col, tag in f]) if f else "-"
        table.add_row(str(idx), str(score), pretty)
    print(table)

    print(f"\n[bold]Total Risk Score:[/bold] {total_score}")
    if total_score >= 20:
        print("[bold red]High risk:[/bold red] apply masking/encryption and access control immediately.")
    elif total_score >= 10:
        print("[bold yellow]Medium risk:[/bold yellow] audit consent + data minimization.")
    else:
        print("[bold green]Low risk:[/bold green] still make sure retention and access logs are in place.")


if __name__ == "__main__":
    main()
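The audit script only reports findings. Masking is one of the remediations it recommends, and it can be sketched with the same kind of regular expressions. The example below is a minimal sketch under assumptions: the helper names mask_email, mask_phone, and mask_value are hypothetical, and the masking format (keep the first character of an email, keep the first four and last two digits of a phone number) is an arbitrary choice for illustration.

```python
import re

# First character of the local part and the full domain are kept
EMAIL_RE = re.compile(
    r"\b([a-zA-Z0-9._%+-])[a-zA-Z0-9._%+-]*@([a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\b"
)
# Same 10 to 14 digit range as PHONE_ID, split into prefix / middle / suffix
PHONE_RE = re.compile(r"\b(08\d{2})\d{4,8}(\d{2})\b")


def mask_email(text: str) -> str:
    """Replace the local part of each email with its first character + ***."""
    return EMAIL_RE.sub(r"\1***@\2", text)


def mask_phone(text: str) -> str:
    """Keep the first four and last two digits; star out the middle."""
    def repl(m: re.Match) -> str:
        hidden = len(m.group(0)) - 6
        return m.group(1) + "*" * hidden + m.group(2)
    return PHONE_RE.sub(repl, text)


def mask_value(text: str) -> str:
    """Apply every masking rule to one cell value."""
    return mask_phone(mask_email(text))


print(mask_value("Contact budi@mail.com or 08123456789"))
```

Applied to each cell before a file is shared, this lowers the exposure of emails and phone numbers while keeping rows recognizable for debugging. It does not replace encryption for data at rest.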

Run the audit:

python src/pdp_audit.py

Bonus points: save the audit result to reports/pdp_report.txt, then encrypt it with GPG.

OPTION C - Simple AI IDS (Anomaly Detection from Logs)

Goal: read connection logs, detect unusual activity (for example port scanning or brute force), then raise an alert.

1. Create a simple log dataset (realistic example)

Create data/connections.csv:

src_ip,dst_port,count_per_minute
192.168.1.10,80,5
192.168.1.11,22,3
192.168.1.50,22,60
192.168.1.50,23,55
192.168.1.50,445,40
192.168.1.12,443,4
192.168.1.13,80,6

Interpretation: IP 192.168.1.50 is very busy across many ports, which suggests scanning or brute force.

2. Anomaly detection with IsolationForest

Create src/ids_anomaly.py:

import pandas as pd
from sklearn.ensemble import IsolationForest
from rich import print
from rich.table import Table

def main():
    df = pd.read_csv("data/connections.csv")

    # Simple features: select two columns with a list inside the brackets
    X = df[["dst_port", "count_per_minute"]].astype(float)

    # contamination=0.2 assumes roughly 20% of the rows are anomalous
    model = IsolationForest(contamination=0.2, random_state=42)
    df["anomaly"] = model.fit_predict(X)      # -1 anomaly, 1 normal
    df["score"] = model.decision_function(X)  # lower = more unusual

    table = Table(title="Simple AI IDS (Anomaly Detection)")
    table.add_column("src_ip")
    table.add_column("dst_port", justify="right")
    table.add_column("count/min", justify="right")
    table.add_column("anomaly")
    table.add_column("score", justify="right")
    for _, r in df.sort_values("score").iterrows():
        tag = "[bold red]ALERT[/bold red]" if r["anomaly"] == -1 else "OK"
        table.add_row(
            str(r["src_ip"]),
            str(int(r["dst_port"])),
            str(int(r["count_per_minute"])),
            tag,
            f"{r['score']:.3f}",
        )
    print(table)

    alerts = df[df["anomaly"] == -1]
    if len(alerts) > 0:
        print("\n[bold]Suggested Investigation Steps:[/bold]")
        print("- Check whether the IP belongs to a normal user or an unknown device")
        print("- Check the auth log (/var/log/auth.log) if port 22 dominates")
        print("- If many different ports are involved: possible port scanning")
    else:
        print("\n[bold green]No anomaly detected[/bold green]")


if __name__ == "__main__":
    main()
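The script above scores each row on its own, so it cannot directly see that one IP touches many different ports. Aggregating per source IP makes that signal explicit. The sketch below is a plain rule-based check, not IsolationForest; the function names per_ip_features and flag_suspects and the thresholds min_ports=3 and min_total=100 are assumptions chosen to fit the sample data.

```python
import csv
import io
from collections import defaultdict

# Same rows as data/connections.csv, inlined so the sketch runs on its own
SAMPLE = """src_ip,dst_port,count_per_minute
192.168.1.10,80,5
192.168.1.11,22,3
192.168.1.50,22,60
192.168.1.50,23,55
192.168.1.50,445,40
192.168.1.12,443,4
192.168.1.13,80,6
"""


def per_ip_features(rows) -> dict[str, dict[str, int]]:
    """Aggregate distinct destination ports and total rate per source IP."""
    ports = defaultdict(set)
    totals = defaultdict(int)
    for r in rows:
        ports[r["src_ip"]].add(int(r["dst_port"]))
        totals[r["src_ip"]] += int(r["count_per_minute"])
    return {
        ip: {"distinct_ports": len(ports[ip]), "total_per_minute": totals[ip]}
        for ip in ports
    }


def flag_suspects(features, min_ports: int = 3, min_total: int = 100) -> list[str]:
    """Flag IPs that hit many ports AND generate a high total rate."""
    return sorted(
        ip for ip, f in features.items()
        if f["distinct_ports"] >= min_ports and f["total_per_minute"] >= min_total
    )


features = per_ip_features(csv.DictReader(io.StringIO(SAMPLE)))
print(flag_suspects(features))
```

The per-IP features could also be fed to IsolationForest in place of the raw rows, which would let the model judge each host rather than each connection record.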

Run the IDS script:

python src/ids_anomaly.py

Bonus points: integrate it with real logs from auth.log or ufw.log (without personal data).
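For the auth.log integration, one possible starting point is to count failed SSH logins per source IP and discard everything else, including usernames, so that no personal data is kept. The sketch below assumes the common OpenSSH message "Failed password for ... from IP port N"; the exact log format on your system may differ, the sample lines are made up (using documentation IP ranges), and the function name failed_logins_per_ip is hypothetical.

```python
import re
from collections import Counter

# Assumption: typical OpenSSH wording; only IPv4 sources are matched.
# The username is matched but deliberately not captured.
FAILED_RE = re.compile(
    r"Failed password for (?:invalid user )?\S+ "
    r"from (\d{1,3}(?:\.\d{1,3}){3}) port \d+"
)


def failed_logins_per_ip(lines) -> Counter:
    """Count failed SSH password attempts per source IP."""
    counts = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


# Made-up sample lines for illustration only
sample_lines = [
    "sshd[1234]: Failed password for invalid user admin from 203.0.113.5 port 51234 ssh2",
    "sshd[1235]: Failed password for root from 203.0.113.5 port 51240 ssh2",
    "sshd[1236]: Accepted password for student from 192.0.2.10 port 40000 ssh2",
    "sshd[1237]: Failed password for student from 198.51.100.7 port 40100 ssh2",
]
counts = failed_logins_per_ip(sample_lines)
for ip, n in counts.most_common():
    print(ip, n)
```

The resulting counts could be written out in the same src_ip / count format as data/connections.csv (with dst_port fixed to 22) and passed to the anomaly detector.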


Stage 3 - Mandatory Output: Tool + Report + Demo

1. Tool (Mandatory)

At minimum, your tool must:

- accept input (file / text)
- produce output (label / alert / report)
- have a clear way to run it: python src/...

2. Report (a short but solid template)

Create reports/report.md with at least:

- Background of the problem
- Brief threat model (who the attacker is, the target, the impact)
- System design
- Dataset & data protection (GPG? masking?)
- Test results (sample output, confusion matrix / alert list)
- Limitations & potential errors
- Recommendations for improvement

3. Demo (Mandatory)

A 3–5 minute demo:

- explain the problem
- run the tool live
- show the output
- explain why the result came out that way

Grading Rubric

For a high grade, focus on these points:

- The tool works end-to-end (not just code fragments)
- The output includes a risk score + reasons
- There is an evaluation and a limitations section (where the AI can be wrong)
- Sensitive data is handled: mask/encrypt (GPG)
- Documentation is tidy: README + report

Final Checklist (Before Submitting)

- src/ contains the main script
- data/ is safe (dummy or GPG-encrypted)
- models/ contains a model if it is an ML project
- reports/report.md exists
- The demo runs on Ubuntu 24.04 with clear commands
- Everything is open-source, nothing proprietary
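Parts of this checklist can be verified automatically before submitting. The sketch below covers only the file-level items; the function name check_submission is hypothetical, and the rule that any file in data/ without a .gpg suffix needs a manual look is an assumption, since a script cannot tell dummy data from real data.

```python
import tempfile
from pathlib import Path


def check_submission(root: Path) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    src = root / "src"
    if not (src.is_dir() and any(src.glob("*.py"))):
        problems.append("src/ has no Python script")
    if not (root / "reports" / "report.md").is_file():
        problems.append("reports/report.md is missing")
    if not (root / "README.md").is_file():
        problems.append("README.md is missing")
    data = root / "data"
    if data.is_dir():
        for path in sorted(data.iterdir()):
            # Anything not ending in .gpg is plaintext and needs a manual look
            if path.is_file() and path.suffix != ".gpg":
                problems.append(
                    f"data/{path.name} is not encrypted; confirm it is dummy data"
                )
    return problems


# Demo on a throwaway directory so no real project is touched
with tempfile.TemporaryDirectory() as tmp:
    for problem in check_submission(Path(tmp)):
        print("-", problem)
```

In a real project the call would be check_submission(Path.home() / "ai-security-final"). Items such as "the demo runs" and "everything is open-source" still need a manual check.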

