LLM: Open-Source RAG Without an API on Google Colab
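Here’s an example of the retrieval half of a **Retrieval-Augmented Generation (RAG)** pipeline in Python. Despite the `GeminiEmbedder` class name, the embeddings do not come from Gemini: the code mocks that step with a local Hugging Face model (`sentence-transformers/all-MiniLM-L6-v2`). Nothing calls a hosted inference API, so the script can run on Google Colab, although the model weights and the sample dataset are still downloaded from the Hugging Face Hub on first run. Similarity search uses a local FAISS vector index.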
Installation Requirements
Ensure these libraries are installed:
!pip install faiss-cpu transformers torch datasets
RAG Implementation with a Mocked Gemini Embedder and Local Vector Search
Here’s the source code:
import faiss
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from typing import List
from datasets import load_dataset


# 1. Embedding function. Gemini is mocked here for simplicity: a local
#    sentence-transformers model stands in for it.
class GeminiEmbedder:
    def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModel.from_pretrained(model_name)
        self.model.eval()  # inference only

    def embed(self, texts: List[str], batch_size: int = 32) -> np.ndarray:
        chunks = []
        # Embed in small batches so the whole corpus is not pushed through at once
        for start in range(0, len(texts), batch_size):
            batch = texts[start:start + batch_size]
            inputs = self.tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
            with torch.no_grad():
                outputs = self.model(**inputs)
            # Mean-pool over real tokens only, so padding does not skew the average
            mask = inputs["attention_mask"].unsqueeze(-1).float()
            summed = (outputs.last_hidden_state * mask).sum(dim=1)
            counts = mask.sum(dim=1).clamp(min=1e-9)
            chunks.append((summed / counts).numpy())
        # FAISS expects float32 arrays
        return np.vstack(chunks).astype("float32")


# 2. Build FAISS index
class RAGRetriever:
    def __init__(self, dimension: int):
        self.index = faiss.IndexFlatL2(dimension)  # exact (brute-force) L2 search

    def add_to_index(self, embeddings: np.ndarray):
        self.index.add(embeddings)

    def search(self, query_embedding: np.ndarray, top_k: int = 5) -> List[int]:
        distances, indices = self.index.search(query_embedding, top_k)
        return indices[0].tolist()


# 3. Load dataset (use a sample dataset like WikiText as the local corpus)
def load_corpus() -> List[str]:
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    texts = dataset[:1000]["text"]  # Limit to the first 1000 rows for simplicity
    return [text for text in texts if len(text.strip()) > 10]  # Filter short texts


# 4. Main RAG workflow
def main():
    embedder = GeminiEmbedder()

    # Load corpus and build index
    corpus = load_corpus()
    corpus_embeddings = embedder.embed(corpus)
    # Take the dimension from the embeddings themselves (384 for MiniLM)
    retriever = RAGRetriever(dimension=corpus_embeddings.shape[1])
    retriever.add_to_index(corpus_embeddings)

    # Query example
    query = "What is the capital of France?"
    query_embedding = embedder.embed([query])
    top_indices = retriever.search(query_embedding, top_k=5)

    # Print the retrieved passages (no generation step is implemented here)
    print("Query:", query)
    print("Top Results:")
    for idx in top_indices:
        print(f" - {corpus[idx]}")


# Execute the RAG pipeline
if __name__ == "__main__":
    main()
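To make the retrieval step concrete, the sketch below reproduces in plain Python what a flat L2 index computes: the squared Euclidean distance from the query to every stored vector, followed by picking the `top_k` smallest. The name `flat_l2_search` is a hypothetical helper for illustration and is not part of FAISS. It is only meant to show the idea on toy vectors, not to replace the index.

```python
from typing import List, Sequence, Tuple


def flat_l2_search(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    top_k: int = 5,
) -> List[Tuple[float, int]]:
    """Exact nearest-neighbour search by brute force (illustrative helper).

    Returns (squared_distance, index) pairs, closest first.
    """
    scored = []
    for idx, vec in enumerate(vectors):
        # Squared L2 distance between the stored vector and the query
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        scored.append((dist, idx))
    scored.sort()  # smallest distance first; ties broken by index
    return scored[:top_k]


if __name__ == "__main__":
    toy_vectors = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
    print(flat_l2_search(toy_vectors, [1.0, 0.0], top_k=2))
```

Because every stored vector is compared with the query, the cost grows linearly with corpus size. That is fine for the roughly one thousand passages used here.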
Explanation:
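- GeminiEmbedder Class: Despite its name, it does not call Gemini. It loads a pre-trained Transformer model (`sentence-transformers/all-MiniLM-L6-v2`) locally and produces one 384-dimensional embedding per text by averaging the token vectors. The same class embeds both the corpus and the queries.
- FAISS Index: `IndexFlatL2` keeps the embeddings in memory and performs an exact, brute-force L2 (Euclidean) search, so no external API is involved.
- Dataset: Uses the first 1000 rows of the `wikitext` dataset (`wikitext-2-raw-v1`) as a sample corpus, dropping rows of 10 characters or fewer. You can replace this with any custom dataset.
- RAG Workflow: Embeds the corpus, builds the index, and retrieves the 5 passages closest to the query embedding. The script stops at retrieval: it prints the retrieved passages and does not pass them to a generator model.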
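The workflow above ends once the passages are retrieved. One possible next step toward generation is to fold those passages into a prompt. This is a minimal sketch under stated assumptions: `build_prompt`, its `max_chars` budget, and the prompt wording are all hypothetical and do not come from the original script.

```python
from typing import List


def build_prompt(query: str, passages: List[str], max_chars: int = 2000) -> str:
    """Assemble a grounded prompt from retrieved passages (hypothetical helper)."""
    context_lines = []
    used = 0
    for rank, passage in enumerate(passages, start=1):
        snippet = " ".join(passage.split())  # collapse newlines and repeated spaces
        if used + len(snippet) > max_chars:
            break  # stop once the character budget would be exceeded
        context_lines.append(f"[{rank}] {snippet}")
        used += len(snippet)
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


if __name__ == "__main__":
    retrieved = ["Paris is the capital\n of France.", "Berlin is in Germany."]
    print(build_prompt("What is the capital of France?", retrieved))
```

The resulting string could then be passed to any text-generation model running locally in Colab. Which model to use is left open here.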
Running on Google Colab
- Save the script in a `.py` file or run it directly in Colab cells.
- Ensure required libraries are installed using the `pip install` command.
- Modify the dataset or embedding model as needed.
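When swapping in a custom dataset, long documents usually need to be split into passages before embedding, since the tokenizer truncates anything beyond the model's maximum input length. The following is a rough sketch of word-based chunking with overlap. The helper name `chunk_text` and the default sizes are assumptions chosen for illustration, not values taken from the script above.

```python
from typing import List


def chunk_text(text: str, max_words: int = 100, overlap: int = 20) -> List[str]:
    """Split text into overlapping word windows (hypothetical helper)."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_text(sample, max_words=4, overlap=1):
        print(chunk)
```

The list returned by `chunk_text` can stand in for the corpus returned by `load_corpus()`, so the embedding and indexing steps stay unchanged.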