KI: Practicum 13 - AI Security Final Project
Objective

In this practicum you are no longer just trying out tools. You will build a mini security product that:
- has a clearly defined data input
- has a detection process
- produces a decision as output (risk score + reasons)
- comes with a report and a demo that you can defend

The key: explain the security logic. AI only assists.

Project Options (Pick 1)

- AI Phishing Detector (the most "real-world", easy to test)
- AI PDP Audit (personal data protection / privacy compliance, suited to logs, CSV files, and datasets)
- Simple AI IDS (network log anomaly detection, challenging but fun)

All projects share the same end-to-end framework.

Required Project Structure (Same for All Options)

Stage 0: Environment Setup (Ubuntu 24.04)

Install the basic dependencies:

    sudo apt update
    sudo apt install -y python3 python3-venv python3-pip git gpg
    python3 --version
    gpg --version

Create the project folder:

    mkdir -p ~/ai-security-final/{data,src,reports,models}
    cd ~/ai-security-final
    python3 -m venv .venv
    source .venv/bin/activate
    pip install --upgrade pip

Python packages (open source), for all project options:

    pip install pandas scikit-learn numpy joblib rich

Optional (if you need more robust timestamp parsing for logs):

    pip install python-dateutil

Minimal folder structure:

    ai-security-final/
        data/
        src/
        models/
        reports/
        README.md
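After installing the packages, it can be useful to confirm that the virtual environment actually sees them. The following is a minimal sketch, not part of the required project; the function name `missing_modules` is an illustrative choice, and the module list assumes the import names of the packages installed above (scikit-learn is imported as `sklearn`).

```python
"""Check that the Python packages for the project can be imported."""
from importlib.util import find_spec

# Import names for the pip packages listed in Stage 0.
# Assumption: scikit-learn is imported under the name "sklearn".
REQUIRED_MODULES = ["pandas", "sklearn", "numpy", "joblib", "rich"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Run it inside the activated `.venv`; if anything is reported missing, repeat the `pip install` step.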
Stage 1: Project Data Security (Required): GnuPG for Datasets and Outputs

Why? Datasets and reports often contain sensitive data. You must demonstrate that you can secure them.

1. Create a GPG key (for the project)

    gpg --full-generate-key

Choose:
- (1) RSA and RSA
- 3072 or 4096 bits
- name: AI Security Student
- email: student@lab.local

Check the key:

    gpg --list-keys

2. Encrypt the dataset (example)

Suppose your dataset is data/phishing_samples.csv:

    gpg --output data/phishing_samples.csv.gpg --symmetric --cipher-algo AES256 data/phishing_samples.csv
    shred -u data/phishing_samples.csv

Note that --symmetric protects the file with a passphrase that you type in; it does not use the key pair from step 1. To encrypt to that key instead, replace --symmetric with --encrypt --recipient student@lab.local.

Decrypt when needed:

    gpg --output data/phishing_samples.csv --decrypt data/phishing_samples.csv.gpg
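If you want your tool to call the same gpg commands from Python, one option is to build the argument list in a function and hand it to `subprocess`. This is a sketch under assumptions: the helper names `encrypt_cmd`, `decrypt_cmd`, and `run` are hypothetical, and the flags are exactly the ones used in the commands above. gpg itself will still prompt for the passphrase.

```python
"""Build gpg command lines for encrypting and decrypting project files."""
import subprocess


def encrypt_cmd(plain_path):
    """Return the gpg command that symmetrically encrypts X to X.gpg."""
    return [
        "gpg", "--output", plain_path + ".gpg",
        "--symmetric", "--cipher-algo", "AES256", plain_path,
    ]


def decrypt_cmd(encrypted_path):
    """Return the gpg command that decrypts X.gpg back to X."""
    if not encrypted_path.endswith(".gpg"):
        raise ValueError(f"expected a .gpg file, got: {encrypted_path}")
    return ["gpg", "--output", encrypted_path[:-4], "--decrypt", encrypted_path]


def run(cmd):
    """Execute a command built above; raises CalledProcessError on failure."""
    subprocess.run(cmd, check=True)


# Show the commands without executing them.
print(" ".join(encrypt_cmd("data/phishing_samples.csv")))
print(" ".join(decrypt_cmd("data/phishing_samples.csv.gpg")))
```

Passing the command as a list (rather than one shell string) avoids quoting problems when file names contain spaces.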
Project rule: any dataset that contains personal or risky data must be stored encrypted, or at minimum be replaced with dummy data.

Stage 2: Pick a Project and Run It Step by Step

Below are three complete project tracks. Each one has:
- realistic sample data
- implementation steps
- training + inference code
- demo output
- a report format

Just pick one.

OPTION A: AI Phishing Detector (Recommended)

Goal: detect phishing messages from email/chat text, then output a label + risk score + reasons.

1. Prepare the dataset (realistic but safe)

Create the file data/phishing_samples.csv (a mini example; you can add to it):

    text,label
    "URGENT: Your account will be suspended. Verify now at http://secure-login.example.com",1
    "Hi team, meeting moved to 3pm. Link: https://meet.example.org/abc",0
    "Reset password now. Your mailbox is full. Click http://mailbox-reset.example.net",1
    "Invoice attached, please review. Thanks",0
    "Bank: unusual activity detected. Confirm your OTP at http://bank-verify.example.xyz",1
    "Reminder: submit assignment before Friday",0
Labels: 1 = phishing, 0 = benign.

Challenge: later, add 50 to 200 examples (they can be realistic texts that you write yourself). With only the six sample rows, the 30% test split holds just two messages, so the printed metrics mean very little until you add more data.

2. Create the training script (simple ML + explainable)

Create src/train_phishing.py:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.metrics import classification_report, confusion_matrix
    import joblib

    DATA_PATH = "data/phishing_samples.csv"
    MODEL_PATH = "models/phishing_model.joblib"


    def main():
        df = pd.read_csv(DATA_PATH)
        X = df["text"].astype(str)
        y = df["label"].astype(int)

        # Stratified split keeps the phishing/benign ratio in both sets.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=42, stratify=y
        )

        # TF-IDF over unigrams and bigrams, feeding a linear classifier.
        model = Pipeline([
            ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
            ("clf", LogisticRegression(max_iter=200)),
        ])
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)

        print("=== Confusion Matrix ===")
        print(confusion_matrix(y_test, y_pred))
        print("\n=== Classification Report ===")
        print(classification_report(y_test, y_pred))

        # Persist the whole pipeline (vectorizer + classifier) in one file.
        joblib.dump(model, MODEL_PATH)
        print(f"\nSaved model to: {MODEL_PATH}")


    if __name__ == "__main__":
        main()
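Before running the training script, it can help to confirm that the CSV has enough rows per class, because a stratified split needs at least two examples of every label. The check below is a sketch using only the standard library; the function names and the `min_per_class` default are assumptions for illustration, and the sample text is an in-memory stand-in for the real file.

```python
"""Sanity-check a labelled CSV before training."""
import csv
import io
from collections import Counter


def label_counts(csv_file, label_column="label"):
    """Count rows per label in an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column].strip() for row in reader)


def check_dataset(counts, min_per_class=2):
    """Return a list of readable problems; an empty list means OK."""
    problems = []
    for label in ("0", "1"):
        n = counts.get(label, 0)
        if n < min_per_class:
            problems.append(f"label {label} has only {n} row(s)")
    unknown = set(counts) - {"0", "1"}
    if unknown:
        problems.append(f"unexpected labels: {sorted(unknown)}")
    return problems


# Demo on a tiny in-memory sample (deliberately too small).
SAMPLE = (
    "text,label\n"
    '"Verify now at http://secure-login.example.com",1\n'
    '"Meeting moved to 3pm",0\n'
)
counts = label_counts(io.StringIO(SAMPLE))
print(dict(counts))
print(check_dataset(counts))
```

To check the real dataset, open it with `open("data/phishing_samples.csv", newline="", encoding="utf-8")` and pass the file object to `label_counts`.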
Run:

    python src/train_phishing.py

3. Create the detector + reasons (top keywords)

Create src/detect_phishing.py:

    import joblib
    from rich import print
    from rich.console import Console

    MODEL_PATH = "models/phishing_model.joblib"

    SUSPICIOUS_HINTS = [
        "urgent", "verify", "reset", "suspended", "otp", "password",
        "click", "confirm", "limited", "account", "bank",
    ]


    def explain_text(text: str):
        # Simple substring match; returns at most 10 keyword hits.
        low = text.lower()
        hits = [h for h in SUSPICIOUS_HINTS if h in low]
        return hits[:10]


    def main():
        model = joblib.load(MODEL_PATH)
        console = Console()
        console.print("[bold]AI Phishing Detector Demo[/bold]")
        console.print("Type a message/email. Press Enter on an empty line to quit.\n")

        while True:
            text = input("Message> ").strip()
            if not text:
                break

            proba = model.predict_proba([text])[0][1]  # probability of phishing
            label = "PHISHING" if proba >= 0.5 else "BENIGN"
            hints = explain_text(text)

            print("\n[bold]Result[/bold]")
            print(f"Label     : [bold]{label}[/bold]")
            print(f"Risk score: [bold]{proba:.2f}[/bold] (0..1)")
            print(f"Reasons   : {hints if hints else 'No obvious keyword hints'}")
            print("-" * 60)


    if __name__ == "__main__":
        main()
Run the demo:

    python src/detect_phishing.py
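The keyword hints in the detector look only at words. One possible extension is to inspect the URLs in a message as well, for example short links and unusual domains. The sketch below uses only the standard library; the function `url_hints` and the two host/TLD lists are hypothetical examples for illustration, not a vetted blocklist, so tune them to your own data.

```python
"""Extra phishing hints based on URLs found in a message."""
import re
from urllib.parse import urlparse

# Illustrative lists only (assumptions); extend them for your dataset.
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "t.co"}
UNUSUAL_TLDS = {"xyz", "top", "click"}

URL_RX = re.compile(r"https?://[^\s\"'<>]+", re.I)
IPV4_RX = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$")


def url_hints(text):
    """Return a list of human-readable reasons derived from URLs in text."""
    reasons = []
    for raw in URL_RX.findall(text):
        url = raw.rstrip(".,;)")  # drop trailing sentence punctuation
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if not host:
            continue
        if parsed.scheme.lower() == "http":
            reasons.append(f"plain http link: {host}")
        if host in SHORTENER_HOSTS:
            reasons.append(f"short link: {host}")
        if IPV4_RX.match(host):
            reasons.append(f"IP address as host: {host}")
        elif host.rsplit(".", 1)[-1] in UNUSUAL_TLDS:
            reasons.append(f"unusual TLD: {host}")
    return reasons


print(url_hints("Confirm your OTP at http://bank-verify.example.xyz"))
print(url_hints("Link: https://meet.example.org/abc"))
```

In `detect_phishing.py`, the returned reasons could be appended to the list from `explain_text` so that the "Reasons" line covers both words and links.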
Realistic inputs for the demo:
- "Admin: your account will be deactivated. Click this link to verify …"
- "Please check the invoice, there is a .zip file, the password is 12345"
- "Meeting at 2, Google Meet link …"

You score higher if you add: detection of short URLs, odd domains, the word "urgent", and patterns commonly used in scams.

OPTION B: AI PDP Audit (Privacy Audit Tool)

Goal: scan a CSV/log file, detect personal data, then produce a risk report + recommendations.

1. Sample dataset

Create data/sample_users.csv:

    name,email,phone,nik,address,notes
    "Budi","budi@mail.com","08123456789","327xxxxxxxxxxxx","Bekasi","token=abc123"
    "Siti","siti@gmail.com","082233445566","320xxxxxxxxxxxx","Jakarta","pwd=123456"
    "Andi","andi@corp.co.id","081299988877","","Bandung","no issues"

The nik column stands for the NIK (the 16-digit Indonesian national ID number). The values here are masked placeholders, so the NIK_LIKE pattern below will not fire on them.

2. Audit tool (regex + scoring)

Create src/pdp_audit.py:

    import re
    import pandas as pd
    from rich import print
    from rich.table import Table

    PATTERNS = {
        "EMAIL": re.compile(r"\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b"),
        "PHONE_ID": re.compile(r"\b(08\d{8,12})\b"),
        "NIK_LIKE": re.compile(r"\b\d{16}\b"),
        "TOKEN_LIKE": re.compile(r"\b(token|apikey|secret)\s*=\s*[A-Za-z0-9_-]{6,}\b", re.I),
        "PASSWORD_LIKE": re.compile(r"\b(pass|pwd|password)\s*=\s*\S+\b", re.I),
    }

    # Higher number = more sensitive finding.
    SEVERITY = {
        "EMAIL": 2, "PHONE_ID": 2, "NIK_LIKE": 4, "TOKEN_LIKE": 4, "PASSWORD_LIKE": 5,
    }


    def scan_value(val: str):
        findings = []
        for k, rx in PATTERNS.items():
            if rx.search(val):
                findings.append(k)
        return findings


    def main():
        path = "data/sample_users.csv"
        # dtype=str keeps phone numbers as text; otherwise pandas parses them
        # as integers and drops the leading 0, so PHONE_ID never matches.
        df = pd.read_csv(path, dtype=str)

        findings_rows = []
        total_score = 0

        for idx, row in df.iterrows():
            row_findings = []
            row_score = 0
            for col, v in row.items():
                s = "" if pd.isna(v) else str(v)
                hits = scan_value(s)
                for h in hits:
                    row_findings.append((col, h))
                    row_score += SEVERITY[h]
            total_score += row_score
            findings_rows.append((idx, row_score, row_findings))

        table = Table(title="PDP Audit Report (Quick Scan)")
        table.add_column("Row", justify="right")
        table.add_column("Risk Score", justify="right")
        table.add_column("Findings")

        for idx, score, f in findings_rows:
            pretty = ", ".join([f"{col}:{tag}" for col, tag in f]) if f else "-"
            table.add_row(str(idx), str(score), pretty)

        print(table)
        print(f"\n[bold]Total Risk Score:[/bold] {total_score}")

        if total_score >= 20:
            print("[bold red]High risk:[/bold red] apply masking/encryption and access control immediately.")
        elif total_score >= 10:
            print("[bold yellow]Medium risk:[/bold yellow] audit consent and minimize the data.")
        else:
            print("[bold green]Low risk:[/bold green] still verify retention and access logs.")


    if __name__ == "__main__":
        main()
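The high-risk recommendation in the audit script mentions masking. As a rough illustration of what masking could look like, the sketch below redacts emails, phone numbers, and key=value secrets in free text. The function `mask_text`, the regular expressions, and the masking format are assumptions made for this example; they are loosely modelled on the audit patterns rather than taken from them.

```python
"""Mask personal data found in free text before it goes into a report."""
import re

# Keep the first character of the local part and the full domain.
EMAIL_RX = re.compile(
    r"\b([a-zA-Z0-9._%+-])[a-zA-Z0-9._%+-]*(@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,})\b"
)
# Keep the first four and last two digits of an 08... phone number.
PHONE_RX = re.compile(r"\b(08\d{2})\d{4,8}(\d{2})\b")
# Replace the value of token=..., pwd=..., and similar pairs.
SECRET_RX = re.compile(
    r"\b(token|apikey|secret|pass|pwd|password)(\s*=\s*)\S+", re.I
)


def mask_text(text):
    """Return text with emails, phone numbers, and secrets masked."""
    text = EMAIL_RX.sub(r"\1***\2", text)
    text = PHONE_RX.sub(r"\1****\2", text)
    text = SECRET_RX.sub(r"\1\2[REDACTED]", text)
    return text


for sample in ["budi@mail.com", "08123456789", "pwd=123456", "no issues"]:
    print(sample, "->", mask_text(sample))
```

A masked copy of each cell can be written to the report instead of the raw value, so the report itself does not become a new source of leaked data.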
Run:

    python src/pdp_audit.py

Bonus points: save the audit result to reports/pdp_report.txt, then encrypt it with GPG.

OPTION C: Simple AI IDS (Anomaly Detection from Logs)

Goal: read connection logs, detect "unusual" behavior (for example port scanning or brute force), then raise an alert.

1. Create a simple log dataset (realistic example)

Create data/connections.csv:

    src_ip,dst_port,count_per_minute
    192.168.1.10,80,5
    192.168.1.11,22,3
    192.168.1.50,22,60
    192.168.1.50,23,55
    192.168.1.50,445,40
    192.168.1.12,443,4
    192.168.1.13,80,6

Interpretation: IP 192.168.1.50 is very active across many ports, which suggests scanning or brute forcing.

2. Anomaly detection with IsolationForest

Create src/ids_anomaly.py:

    import pandas as pd
    from sklearn.ensemble import IsolationForest
    from rich import print
    from rich.table import Table


    def main():
        df = pd.read_csv("data/connections.csv")

        # Simple features: destination port and connection rate.
        X = df[["dst_port", "count_per_minute"]].astype(float)

        model = IsolationForest(contamination=0.2, random_state=42)
        df["anomaly"] = model.fit_predict(X)      # -1 = anomaly, 1 = normal
        df["score"] = model.decision_function(X)  # lower = more anomalous

        table = Table(title="Simple AI IDS (Anomaly Detection)")
        table.add_column("src_ip")
        table.add_column("dst_port", justify="right")
        table.add_column("count/min", justify="right")
        table.add_column("anomaly")
        table.add_column("score", justify="right")

        # Most anomalous rows first.
        for _, r in df.sort_values("score").iterrows():
            tag = "[bold red]ALERT[/bold red]" if r["anomaly"] == -1 else "OK"
            table.add_row(
                str(r["src_ip"]),
                str(int(r["dst_port"])),
                str(int(r["count_per_minute"])),
                tag,
                f"{r['score']:.3f}",
            )

        print(table)

        alerts = df[df["anomaly"] == -1]
        if len(alerts) > 0:
            print("\n[bold]Suggested Investigation Steps:[/bold]")
            print("- Check whether the IP is a normal user or an unknown device")
            print("- Check the auth log (/var/log/auth.log) if port 22 dominates")
            print("- If many different ports are involved: likely port scanning")
        else:
            print("\n[bold green]No anomaly detected[/bold green]")


    if __name__ == "__main__":
        main()
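The anomaly model scores each row on its own, so it does not directly see that one IP touches several ports. A simple rule-based companion can aggregate rows per source IP. The sketch below uses the sample rows from data/connections.csv as in-memory text; the two thresholds and the function names are hypothetical values chosen for illustration and would need calibrating on real traffic.

```python
"""Rule-based companion to the anomaly model: aggregate per source IP."""
import csv
import io
from collections import defaultdict

# Hypothetical thresholds (assumptions); calibrate on your own logs.
MAX_DISTINCT_PORTS = 2
MAX_TOTAL_PER_MINUTE = 100


def summarize_by_ip(csv_file):
    """Map each src_ip to (number of distinct ports, total count per minute)."""
    ports = defaultdict(set)
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        ip = row["src_ip"]
        ports[ip].add(int(row["dst_port"]))
        totals[ip] += int(row["count_per_minute"])
    return {ip: (len(ports[ip]), totals[ip]) for ip in ports}


def suspicious_ips(summary):
    """Return IPs that exceed either threshold, sorted for stable output."""
    return sorted(
        ip for ip, (n_ports, total) in summary.items()
        if n_ports > MAX_DISTINCT_PORTS or total > MAX_TOTAL_PER_MINUTE
    )


SAMPLE = """src_ip,dst_port,count_per_minute
192.168.1.10,80,5
192.168.1.11,22,3
192.168.1.50,22,60
192.168.1.50,23,55
192.168.1.50,445,40
192.168.1.12,443,4
192.168.1.13,80,6
"""
summary = summarize_by_ip(io.StringIO(SAMPLE))
print(summary["192.168.1.50"])
print(suspicious_ips(summary))
```

Combining both views gives you a clearer "reason" for an alert: the model says the row is unusual, and the rule says which IP and how many ports were involved.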
Run:

    python src/ids_anomaly.py

Bonus points: integrate with real logs from auth.log or ufw.log (without personal data).
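As a starting point for the auth.log bonus, the sketch below counts failed SSH password attempts per source IP. It assumes the common OpenSSH message wording ("Failed password for ... from IP port N") and IPv4 addresses only; the sample lines are invented for illustration, and your log format may differ. The regex matches only the sshd message part, so the timestamp format does not matter, and usernames are not stored.

```python
"""Count failed SSH logins per source IP from auth.log-style lines."""
import re
from collections import Counter

# Assumption: OpenSSH-style "Failed password" messages with an IPv4 source.
FAILED_RX = re.compile(
    r"sshd\[\d+\]: Failed password for (?:invalid user )?\S+ "
    r"from (\d{1,3}(?:\.\d{1,3}){3}) port \d+"
)


def failed_logins_by_ip(lines):
    """Return a Counter mapping source IP to number of failed attempts."""
    counts = Counter()
    for line in lines:
        match = FAILED_RX.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines for illustration only.
SAMPLE_LINES = [
    "Jan 10 12:00:01 lab sshd[1001]: Failed password for invalid user admin from 192.168.1.50 port 51000 ssh2",
    "Jan 10 12:00:03 lab sshd[1002]: Failed password for root from 192.168.1.50 port 51002 ssh2",
    "Jan 10 12:01:10 lab sshd[1003]: Accepted password for student from 192.168.1.11 port 40022 ssh2",
]
print(failed_logins_by_ip(SAMPLE_LINES))
```

The resulting counts could be written out in the same shape as data/connections.csv (with dst_port fixed at 22) and then fed to the anomaly script.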

Stage 3: Required Outputs: Tool + Report + Demo

1. Tool (Required)
At minimum, your tool must:
- accept input (file / text)
- produce output (label / alert / report)
- have a clear way to run it: python src/...
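One way to meet these three requirements in any of the project options is a small entry point that takes input from the command line and prints a structured decision. The skeleton below is a sketch only: the `Decision` class, the `decide` placeholder logic, and the `--text` flag are hypothetical names for illustration, and `decide` should be replaced with your actual model or rules.

```python
"""Skeleton for a tool entry point that returns label + risk score + reasons."""
import argparse
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Decision:
    label: str
    risk_score: float  # expected range 0..1
    reasons: list = field(default_factory=list)


def decide(text):
    """Placeholder logic (assumption): swap in your model or rule set."""
    hints = [w for w in ("urgent", "verify", "password") if w in text.lower()]
    score = min(1.0, 0.3 * len(hints))
    label = "PHISHING" if score >= 0.5 else "BENIGN"
    return Decision(label, round(score, 2), hints)


def run(argv):
    """Parse arguments from a list and return the decision as JSON text."""
    parser = argparse.ArgumentParser(description="AI security tool (sketch)")
    parser.add_argument("--text", required=True, help="message to analyse")
    args = parser.parse_args(argv)
    return json.dumps(asdict(decide(args.text)))


print(run(["--text", "URGENT: verify your account"]))
```

To use it as a real command, call `run(sys.argv[1:])` under an `if __name__ == "__main__":` guard and print the result; JSON output also makes it easy to paste sample results into the report.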

2. Report (Short but Strong Template)
Create reports/report.md (minimum contents):
- Problem background
- Brief threat model (who the attacker is, the target, the impact)
- System design
- Dataset and data protection (GPG? masking?)
- Test results (sample output, confusion matrix / alert list)
- Limitations and potential errors
- Recommended improvements

3. Demo (Required)
A 3 to 5 minute demo:
- explain the problem
- run the tool live
- show the output
- explain why the result came out that way

Grading Rubric
For a high grade, focus on the following:
- Works end to end (not just code fragments)
- Output includes a risk score + reasons
- Includes evaluation and limitations (where the AI can be wrong)
- Sensitive data is handled: masked or encrypted (GPG)
- Tidy documentation: README + report

Final Checklist (Before Submitting)
- src/ contains the main script
- data/ is safe (dummy data or GPG-encrypted)
- models/ contains a model if it is an ML project
- reports/report.md exists
- The demo runs on Ubuntu 24.04 with clear commands
- Everything is open source, nothing proprietary
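
Part of this checklist can be verified automatically. The sketch below checks only the file-layout items; the function `check_project` is a hypothetical helper, and it assumes that any plain .csv left in data/ needs a manual look to confirm it is dummy data. Items such as "the demo runs" still have to be checked by hand.

```python
"""Pre-submission check for the project folder layout."""
from pathlib import Path


def check_project(root):
    """Return a list of problems found under root; empty list means OK."""
    root = Path(root)
    problems = []

    src = root / "src"
    if not src.is_dir() or not any(src.glob("*.py")):
        problems.append("src/ has no Python script")
    if not (root / "reports" / "report.md").is_file():
        problems.append("reports/report.md is missing")
    if not (root / "README.md").is_file():
        problems.append("README.md is missing")

    # Assumption: a plain CSV in data/ must be dummy data, so flag it
    # for manual review rather than treating it as an error outright.
    data = root / "data"
    if data.is_dir():
        for csv_path in sorted(data.glob("*.csv")):
            problems.append(
                f"unencrypted file, confirm it is dummy data: data/{csv_path.name}"
            )
    return problems


for problem in check_project("."):
    print("-", problem)
```

Run it from the project root (~/ai-security-final) before submitting; no output means the layout checks passed.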