LLM: Open-source RAG without external APIs on Google Colab
Here’s an example of a **Retrieval-Augmented Generation (RAG)** implementation in Python that runs entirely locally: embeddings come from a pre-trained sentence-transformers model loaded through Hugging Face `transformers`, and similarity search uses a local FAISS index. The setup does not rely on any external API and can be run on Google Colab. (The embedding class below is named `GeminiEmbedder`, but it is a local stand-in rather than a call to the Gemini API.)
Installation Requirements
Ensure these libraries are installed:
!pip install faiss-cpu transformers torch datasets
RAG Implementation with Local Embeddings and FAISS Vector Search
Here’s the source code:
import faiss
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from typing import List
from datasets import load_dataset
# 1. Embedding function (a local stand-in for a Gemini-style embedder; no API calls)
class GeminiEmbedder:
def __init__(self, model_name="sentence-transformers/all-MiniLM-L6-v2"):
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
self.model = AutoModel.from_pretrained(model_name)
    def embed(self, texts: List[str], batch_size: int = 32) -> np.ndarray:
        # Embed in small batches so the full corpus fits in Colab's memory
        all_embeddings = []
        for i in range(0, len(texts), batch_size):
            batch = texts[i:i + batch_size]
            inputs = self.tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
            with torch.no_grad():  # inference only, no gradients needed
                outputs = self.model(**inputs)
            all_embeddings.append(outputs.last_hidden_state.mean(dim=1).numpy())  # mean-pool tokens
        return np.concatenate(all_embeddings, axis=0)
# 2. Build FAISS Index
class RAGRetriever:
def __init__(self, dimension: int):
self.index = faiss.IndexFlatL2(dimension)
def add_to_index(self, embeddings: np.ndarray):
self.index.add(embeddings)
    def search(self, query_embedding: np.ndarray, top_k: int = 5) -> List[int]:
        # Return the corpus indices of the top_k nearest neighbours
        distances, indices = self.index.search(query_embedding, top_k)
        return indices[0].tolist()
# 3. Load Dataset (use a sample dataset like WikiText for local corpus)
def load_corpus() -> List[str]:
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    texts = dataset["text"][:1000]  # Limit to 1000 samples for simplicity
    texts = [text for text in texts if len(text.strip()) > 10]  # Filter out very short lines
    return texts
# 4. Main RAG Workflow
def main():
# Initialize embedder and retriever
embedder = GeminiEmbedder()
retriever = RAGRetriever(dimension=384) # Based on the embedding dimension of MiniLM
# Load corpus and build index
corpus = load_corpus()
corpus_embeddings = embedder.embed(corpus)
retriever.add_to_index(corpus_embeddings)
# Query Example
query = "What is the capital of France?"
query_embedding = embedder.embed([query])
top_indices = retriever.search(query_embedding, top_k=5)
    # Display the retrieved passages (a local generation step is sketched after the script)
print("Query:", query)
print("Top Results:")
for idx in top_indices:
print(f" - {corpus[idx]}")
# Execute the RAG pipeline
if __name__ == "__main__":
main()
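The script above covers only the retrieval half of RAG: it prints the nearest passages but does not generate an answer. A generation step can stay API-free by running a small local model. The sketch below is one way to do it, assuming `google/flan-t5-small` as the generator (this also needs `sentencepiece`, e.g. `!pip install sentencepiece`); the `generate_answer` helper is illustrative and not part of the original script.
# Sketch: a local, API-free generation step (illustrative; not in the original script)
from transformers import pipeline

def generate_answer(query: str, passages: List[str]) -> str:
    # Concatenate the retrieved passages into a single context block,
    # trimmed so the prompt stays short for a small model
    context = "\n".join(p.strip() for p in passages)[:2000]
    prompt = f"Answer the question using the context.\n\nContext:\n{context}\n\nQuestion: {query}"
    # google/flan-t5-small is a small instruction-tuned model that runs locally
    generator = pipeline("text2text-generation", model="google/flan-t5-small")
    result = generator(prompt, max_new_tokens=64)
    return result[0]["generated_text"]
In `main()`, a call such as `generate_answer(query, [corpus[i] for i in top_indices])` could then replace the print loop.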
Explanation:
- GeminiEmbedder Class: Despite its name, it runs entirely locally; it uses the pre-trained `sentence-transformers/all-MiniLM-L6-v2` model to generate embeddings for the corpus and queries.
- FAISS Index: Stores the embeddings locally for efficient similarity search without relying on APIs.
- Dataset: Uses the `wikitext` dataset as a sample corpus. You can replace it with any custom dataset, as shown in the sketch after this list.
- RAG Workflow: Embeds the corpus, builds the index, and retrieves relevant documents based on query embeddings.
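To use a custom corpus instead of WikiText, only `load_corpus()` needs to change: any function that returns a list of non-trivial strings will work. A minimal sketch follows; the file name `my_corpus.txt` is just an illustrative placeholder.
# Sketch: a drop-in replacement for load_corpus() that reads one document per line
# from a plain-text file (my_corpus.txt is a hypothetical example path)
def load_corpus_from_file(path: str = "my_corpus.txt") -> List[str]:
    with open(path, encoding="utf-8") as f:
        texts = [line.strip() for line in f]
    return [text for text in texts if len(text) > 10]  # same length filter as the original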
Running on Google Colab
- Save the script in a `.py` file or run it directly in Colab cells.
- Ensure required libraries are installed using the `pip install` command.
- Modify the dataset or embedding model as needed; a sketch for swapping the embedding model follows below.
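If the embedding model is swapped, the FAISS index dimension must match the new model's hidden size. For example, `sentence-transformers/all-mpnet-base-v2` produces 768-dimensional embeddings, so the setup in `main()` would change roughly as follows:
# Sketch: swapping the embedding model (the index dimension must match the model)
embedder = GeminiEmbedder(model_name="sentence-transformers/all-mpnet-base-v2")
retriever = RAGRetriever(dimension=768)  # all-mpnet-base-v2 outputs 768-d vectors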