### Guide: Fine-Tuning the Gemma 3 1B Model with a JSONL Dataset for Direct Use in Ollama

Hello! Your question about fine-tuning the **Gemma 3 1B** model (from Google, available on Hugging Face as `google/gemma-3-1b-it`, or `google/gemma-3-1b-pt` for the pre-trained variant) with a JSONL file as the input dataset, producing output ready to use in **Ollama**, is very relevant. Gemma 3 1B is a lightweight (1-billion-parameter) text-only model (multimodality, text plus images, starts at the 4B variant), so for fine-tuning we focus on the chat/instruction text format.

This process uses **Hugging Face Transformers + QLoRA** (efficient on limited hardware, such as an 8GB+ GPU) or **Unsloth** (faster, roughly 2x more efficient). The JSONL dataset should be in **Alpaca or ChatML** format (e.g. `{"instruction": "...", "input": "...", "output": "..."}`, or a `messages` array with user/assistant roles).

**Prerequisites:**
- A GPU with at least 8GB VRAM (NVIDIA, CUDA 11+).
- Python 3.10+.
- Hugging Face access (create a token at huggingface.co and accept the Gemma license on the model card).
- Ollama installed (download from ollama.com).

I recommend **QLoRA via HF**, since it follows Google's official guide, but Unsloth is faster for small datasets. The full steps follow. Time estimate: 1-2 hours for 1 epoch on 10k samples (depending on hardware).

#### 1. Prepare the Environment and Dataset
- **Install dependencies** (run in a terminal or Colab; quote the `>=` specifiers so the shell does not treat them as redirects):
  ```
  pip install "torch>=2.4.0" "transformers>=4.51.3" datasets==3.3.2 accelerate==1.4.0 evaluate==0.4.3 bitsandbytes==0.45.3 trl==0.21.0 peft==0.14.0 protobuf sentencepiece
  # Optional, for Unsloth (faster): pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
  ```
- **Log in to Hugging Face**:
  ```python
  from huggingface_hub import login
  login("your_hf_token_here")  # replace with your HF token
  ```
- **Prepare the JSONL dataset** (if your data uses the Alpaca layout, a conversion sketch follows this list):
  - Example format (`data.jsonl`): every line is one JSON object. For instruction tuning:
    ```json
    {"instruction": "Apa itu AI?", "input": "", "output": "AI adalah kecerdasan buatan..."}
    {"instruction": "Terjemahkan ke Indonesia", "input": "Hello world", "output": "Halo dunia"}
    ```
    Or chat format (better for Gemma):
    ```json
    {"messages": [{"role": "user", "content": "Apa itu AI?"}, {"role": "assistant", "content": "AI adalah..."}]}
    ```
  - Load the dataset:
    ```python
    from datasets import load_dataset
    dataset = load_dataset("json", data_files="path/to/your/file.jsonl", split="train")
    dataset = dataset.train_test_split(test_size=0.1)  # 90% train, 10% eval
    ```
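If your JSONL is in the Alpaca format but you want the chat-style `messages` field that the formatting function in step 2 expects, a small conversion pass works; a minimal sketch, assuming the `instruction`/`input`/`output` field names from the example above:

```python
# Sketch: normalize Alpaca-style rows into the chat "messages" format.
# The field names match the Alpaca example above; adjust if yours differ.
def to_messages(example):
    user_text = example["instruction"]
    if example.get("input"):  # fold the optional input into the user turn
        user_text += "\n\n" + example["input"]
    return {"messages": [
        {"role": "user", "content": user_text},
        {"role": "assistant", "content": example["output"]},
    ]}

dataset = dataset.map(to_messages, remove_columns=["instruction", "input", "output"])
```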

#### 2. Load the Model and Set Up Fine-Tuning (Using QLoRA)
- Use `gemma-3-1b-pt` for the pre-trained model (or `-it` for the instruction-tuned one).
- Full code (adapted from Google's official guide):
  ```python
  import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
  from peft import LoraConfig
  from trl import SFTConfig, SFTTrainer
  from datasets import load_dataset  # already imported above

  # Load the tokenizer and model with 4-bit quantization (saves memory)
  model_id = "google/gemma-3-1b-pt"
  quantization_config = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_quant_type="nf4",
      bnb_4bit_compute_dtype=torch.bfloat16,  # or torch.float16 on older GPUs
      bnb_4bit_use_double_quant=True
  )
  model = AutoModelForCausalLM.from_pretrained(
      model_id,
      quantization_config=quantization_config,
      device_map="auto",
      torch_dtype=torch.bfloat16
  )
  tokenizer = AutoTokenizer.from_pretrained(model_id)
  tokenizer.pad_token = tokenizer.eos_token  # important for padding

  # Format the dataset with Gemma's chat turn markers (when using `messages`)
  def formatting_prompts_func(example):
      texts = []
      for msg in example["messages"]:
          if msg["role"] == "user":
              texts.append(f"<start_of_turn>user\n{msg['content']}<end_of_turn>")
          elif msg["role"] == "assistant":
              texts.append(f"<start_of_turn>model\n{msg['content']}<end_of_turn>")
      return {"text": "\n".join(texts)}  # Gemma separates turns with a newline

  dataset = dataset.map(formatting_prompts_func)  # applied to both train and test splits

  # Set up LoRA (efficient: only ~1-5% of the parameters are trained)
  peft_config = LoraConfig(
      r=16,  # LoRA rank
      lora_alpha=32,
      lora_dropout=0.05,
      bias="none",
      task_type="CAUSAL_LM",
      target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]  # Gemma projection layers
  )
  # No get_peft_model() call is needed here: SFTTrainer applies the adapter
  # itself via the peft_config argument below (wrapping manually as well
  # would apply LoRA twice).

  # Training arguments (SFTConfig extends TrainingArguments; in recent TRL
  # releases the dataset options live here rather than on SFTTrainer)
  training_args = SFTConfig(
      output_dir="./gemma-3-1b-finetuned",
      num_train_epochs=1,  # start with 1, increase if needed
      per_device_train_batch_size=4,  # adjust to your VRAM
      gradient_accumulation_steps=4,
      learning_rate=2e-4,
      bf16=True,  # or fp16=True on GPUs without bfloat16 support
      logging_steps=10,
      save_steps=500,
      eval_strategy="steps",  # called evaluation_strategy in older transformers
      eval_steps=500,
      dataset_text_field="text",  # field containing prompt + output
      max_length=512,  # sequence length cap (max_seq_length in older TRL)
      report_to="none"  # or "wandb" for logging
  )

  # Trainer
  trainer = SFTTrainer(
      model=model,
      processing_class=tokenizer,  # the tokenizer= argument in older TRL
      train_dataset=dataset["train"],
      eval_dataset=dataset["test"],
      args=training_args,
      peft_config=peft_config
  )
  trainer.train()
  trainer.save_model("./gemma-3-1b-finetuned")  # saves the LoRA adapter
  ```
- Run `trainer.train()` and watch the loss in the console.
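
If you fine-tune the `-it` checkpoint, its tokenizer already ships Gemma's chat template, so you can let it build the training text instead of writing the turn markers by hand. A sketch under that assumption (the `-pt` tokenizer may not define a chat template):

```python
# Sketch: build the "text" field via the tokenizer's built-in chat template,
# as an alternative to formatting_prompts_func above.
def format_with_template(example):
    return {"text": tokenizer.apply_chat_template(
        example["messages"],
        tokenize=False,               # return a string, not token ids
        add_generation_prompt=False,  # keep the full conversation for SFT
    )}

dataset = dataset.map(format_with_template)
```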

**Unsloth alternative (faster, roughly 2x)**: replace the model loading with:
```python
from unsloth import FastLanguageModel
# Note: there is no plain "google/gemma-3-1b" repo; use the -it or -pt
# checkpoint (Unsloth also mirrors it as "unsloth/gemma-3-1b-it").
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/gemma-3-1b-it",
    max_seq_length=512,
    load_in_4bit=True,
)
# Continue with a similar LoRA + trainer setup; Unsloth applies its optimizations automatically.
```
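
With Unsloth, the LoRA adapter is attached through its own helper rather than via the trainer's `peft_config`; a sketch following Unsloth's documented pattern, with values mirroring the QLoRA setup above:

```python
# Sketch: attach LoRA the Unsloth way (replaces the LoraConfig/peft_config path).
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0,  # Unsloth supports any value, but 0 is the optimized path
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# The SFTTrainer setup from step 2 can then be reused, without peft_config.
```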

#### 3. Merge the Model and Convert to GGUF (for Ollama)
- **Merge the LoRA adapter into the base model** (producing a full model, not just the adapter; a quick generation smoke test is sketched after this list):
  ```python
  from peft import PeftModel
  base_model = AutoModelForCausalLM.from_pretrained("google/gemma-3-1b-pt", torch_dtype=torch.bfloat16, device_map="auto")
  model = PeftModel.from_pretrained(base_model, "./gemma-3-1b-finetuned")
  merged_model = model.merge_and_unload()
  merged_model.save_pretrained("./gemma-3-1b-merged")
  tokenizer.save_pretrained("./gemma-3-1b-merged")
  ```
- **Convert to GGUF** (Ollama's model format):
  - Install llama.cpp: `git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp && cmake -B build && cmake --build build` (recent versions build with CMake; older checkouts used `make`).
  - Convert: `python convert_hf_to_gguf.py ./gemma-3-1b-merged --outfile ./gguf/gemma-3-1b-merged.gguf --outtype f16` (full precision).
  - Quantize (saves space, e.g. Q4_K_M): `./build/bin/llama-quantize ./gguf/gemma-3-1b-merged.gguf ./gguf/gemma-3-1b-merged-q4.gguf Q4_K_M`.
  - Easier alternative: upload the merged model to the Hugging Face Hub, then use the HF Space "ggml-org/gguf-my-repo" to auto-convert it to GGUF.
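
Before (or right after) converting, it is worth a quick check that the merged weights still generate sensibly; a minimal sketch, using the same turn markers the model was trained on (the prompt is illustrative):

```python
# Sketch: smoke-test the merged model before shipping the GGUF to Ollama.
prompt = "<start_of_turn>user\nApa itu AI?<end_of_turn>\n<start_of_turn>model\n"
inputs = tokenizer(prompt, return_tensors="pt").to(merged_model.device)
outputs = merged_model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```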

#### 4. Deploy to Ollama
- Create a **Modelfile** (in an empty folder). Note that Gemma uses `<start_of_turn>`/`<end_of_turn>` turn markers, not ChatML's `<|im_start|>` tokens, and folds the system prompt into the first user turn:
  ```
  # Path to your GGUF file
  FROM ./gemma-3-1b-merged-q4.gguf
  TEMPLATE """<start_of_turn>user
  {{ if .System }}{{ .System }} {{ end }}{{ .Prompt }}<end_of_turn>
  <start_of_turn>model
  {{ .Response }}<end_of_turn>
  """
  PARAMETER stop "<start_of_turn>"
  PARAMETER stop "<end_of_turn>"
  PARAMETER temperature 0.7
  PARAMETER top_p 0.9
  ```
  - This template matches Gemma's chat format (adjust it if your dataset formatting differs).
- Import into Ollama:
  ```
  ollama create my-gemma3-finetuned -f Modelfile
  ollama run my-gemma3-finetuned
  ```
- Test: type a prompt in the Ollama CLI, or integrate the model into an app (a REST API example follows below).
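
Besides the CLI, the model is reachable through Ollama's local REST API (it listens on port 11434 by default); a sketch using the `/api/generate` endpoint, with the model name from `ollama create` above:

```python
import requests

# Sketch: query the locally running Ollama server over its REST API.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "my-gemma3-finetuned",
        "prompt": "Apa itu AI?",  # illustrative prompt
        "stream": False,          # return one JSON object instead of a stream
    },
)
print(resp.json()["response"])
```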

#### Additional Tips
- **Dataset size**: start with 1k-10k JSONL lines for testing. More is generally better, but expect overfitting with fewer than ~100 examples.
- **Evaluation**: compute perplexity from the eval loss (see the sketch after this list), or add `compute_metrics` to the trainer for custom metrics.
- **Hardware**: on Colab (free T4), use Unsloth. Locally, monitor VRAM with `nvidia-smi`.
- **Troubleshooting**: on quantization errors, check your CUDA version. Gemma 3 1B specifically has a 32K-token context window (the larger Gemma 3 variants reach 128K), but keep `max_length` around 512-2048 to start.
- **Cost**: free on Colab, but pushing to the HF Hub requires an account.
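
For the evaluation tip above: a common quick metric is perplexity, which is just the exponential of the trainer's eval cross-entropy loss; a minimal sketch, run after `trainer.train()`:

```python
import math

# Sketch: perplexity from the eval loss reported by trainer.evaluate().
metrics = trainer.evaluate()
print(f"eval loss: {metrics['eval_loss']:.3f}, "
      f"perplexity: {math.exp(metrics['eval_loss']):.2f}")
```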

If your dataset is task-specific (e.g. function calling), adjust the prompt template accordingly. If you need the complete code or help debugging, share the details of your dataset! Sources: Google's official guide, the Unsloth docs, and Hugging Face tutorials.