LLM: Open-Source RAG without an API on Google Colab

From OnnoWiki

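Here’s an example of the retrieval half of a **Retrieval-Augmented Generation (RAG)** pipeline in Python. Although the embedder class is named `GeminiEmbedder`, it does not call Gemini: the embeddings come from the open-source `sentence-transformers/all-MiniLM-L6-v2` model, loaded through Hugging Face `transformers` and run locally. Once the model and dataset have been downloaded, nothing depends on a hosted API, so the script can be run on Google Colab. Retrieval uses a local FAISS vector index for similarity search.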
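
To make "similarity search" concrete, here is a minimal pure-Python sketch of the exhaustive nearest-neighbour lookup that a flat L2 index performs: compute the squared L2 distance from the query to every stored vector and keep the closest ones. The function names and the toy 2-D vectors are made up for illustration; FAISS does the same job far faster on real 384-dimensional embeddings.

```python
from typing import List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_search(index: List[Sequence[float]],
                       query: Sequence[float],
                       top_k: int = 5) -> List[int]:
    """Return positions of the top_k stored vectors closest to the query."""
    distances = [(squared_l2(vec, query), pos) for pos, vec in enumerate(index)]
    distances.sort()  # smallest distance first; ties broken by position
    return [pos for _, pos in distances[:top_k]]


# Toy 2-D "embeddings" (made-up values, for illustration only)
toy_index = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]
nearest = brute_force_search(toy_index, query=(1.0, 1.0), top_k=2)
print(nearest)  # [1, 3]
```

The returned positions play the same role as the indices returned by the FAISS search in the script below: they point back into the list of corpus texts.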

Installation Requirements

Ensure these libraries are installed:

!pip install faiss-cpu transformers torch datasets


RAG Implementation with Local Embeddings and Local Vector Search

Here’s the source code:

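import faiss
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from typing import List, Tuple
from datasets import load_dataset
# 1. Embedding function. Despite the class name, this does not call Gemini:
#    it runs the open-source MiniLM sentence-embedding model locally.
class GeminiEmbedder:
    def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.model.eval()  # inference mode: disables dropout

    def embed(self, texts: List[str], batch_size: int = 32) -> np.ndarray:
        chunks = []
        for start in range(0, len(texts), batch_size):  # batch to keep memory use bounded
            batch = texts[start:start + batch_size]
            inputs = self.tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
            with torch.no_grad():  # no gradients needed for inference
                hidden = self.model(**inputs).last_hidden_state
            # Mean-pool over real tokens only, so padding does not skew the average
            mask = inputs["attention_mask"].unsqueeze(-1).float()
            pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
            chunks.append(pooled.numpy())
        # FAISS expects a contiguous float32 array
        return np.ascontiguousarray(np.vstack(chunks), dtype=np.float32)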

# 2. Build FAISS Index
class RAGRetriever:
    def __init__(self, dimension: int):
        self.index = faiss.IndexFlatL2(dimension) 

    def add_to_index(self, embeddings: np.ndarray):
        self.index.add(embeddings)

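    def search(self, query_embedding: np.ndarray, top_k: int = 5) -> List[int]:
        # query_embedding must be 2-D float32 with shape (n_queries, dimension)
        distances, indices = self.index.search(query_embedding, top_k)
        return indices[0].tolist()  # neighbours of the first query, as plain ints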

# 3. Load Dataset (use a sample dataset like WikiText for local corpus)
def load_corpus() -> List[str]:
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    texts = dataset[:1000]["text"]  # Limit to the first 1000 rows for simplicity
    texts = [text.strip() for text in texts if len(text.strip()) > 10]  # Drop blank and very short lines
    return texts

# 4. Main RAG Workflow
def main():
    # Initialize embedder and retriever
    embedder = GeminiEmbedder()
    retriever = RAGRetriever(dimension=384)  # Based on the embedding dimension of MiniLM 

    # Load corpus and build index
    corpus = load_corpus()
    corpus_embeddings = embedder.embed(corpus)
    retriever.add_to_index(corpus_embeddings)

    # Query Example
    query = "What is the capital of France?"
    query_embedding = embedder.embed([query])
    top_indices = retriever.search(query_embedding, top_k=5)

    # Show the retrieved passages (this example has no generation step)
    print("Query:", query)
    print("Top Results:")
    for idx in top_indices:
        print(f" - {corpus[idx]}")

# Execute the RAG pipeline
if __name__ == "__main__":
    main()

Explanation:

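  • GeminiEmbedder Class: Despite its name, it does not use Gemini. It loads a pre-trained Transformer model (`sentence-transformers/all-MiniLM-L6-v2`) locally and mean-pools its token outputs to generate embeddings for the corpus and queries.
  • FAISS Index: Stores the embeddings in an in-memory `IndexFlatL2` index, which runs an exact nearest-neighbour search by L2 distance without relying on APIs.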
  • Dataset: Uses the `wikitext` dataset as a sample corpus. You can replace this with any custom dataset.
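  • RAG Workflow: Embeds the corpus, builds the index, and retrieves the passages closest to the query embedding. The script stops at retrieval; it does not pass the passages to a language model to generate an answer.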

Running on Google Colab

  1. Save the script in a `.py` file or run it directly in Colab cells.
  2. Ensure required libraries are installed using the `pip install` command.
  3. Modify the dataset or embedding model as needed.
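
When replacing the sample dataset with your own documents, long texts usually need to be cut into shorter passages before embedding, because the tokenizer truncates input beyond the model's maximum length. Below is a minimal sketch of word-based chunking with overlap. The helper `chunk_text` and its default sizes are hypothetical choices for illustration, not part of the original script.

```python
from typing import List


def chunk_text(text: str, max_words: int = 100, overlap: int = 20) -> List[str]:
    """Split text into windows of max_words words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# Tiny demonstration with ten placeholder words
sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_text(sample, max_words=4, overlap=1)
print(pieces)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The list returned by `chunk_text` can be used in place of the list returned by `load_corpus()` in the script.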