LLM: Llama Install Ubuntu 24.04


Source: https://twm.me/posts/how-to-install-llama2-linux-ubuntu/


In this guide, I'll demonstrate how to set up Meta's Llama2 open-source large language model to run on your desktop computer. To show some customization, we'll create a basic game NPC AI that you can interact with.

Prerequisites

Several prerequisites are necessary to run and customize the Llama2 language model.

Hardware

We'll be configuring the 7B parameter model. Despite being the smallest parameter model, it demands significant hardware resources for smooth operation. Note that there are no definitive or official hardware requirements for Llama2. The following are general recommendations for running 7B-size language models, based on feedback from the community and other testers.

RAM: 8GB or 16GB; the more, the better
GPU VRAM: minimum of 8GB, recommended at least 12GB; the more, the better
Storage: SSD strongly recommended; approximately 11GB for the 7B model

Keep in mind that GPU memory (VRAM) is crucial. You might be able to manage with lower-spec hardware; I've successfully run it on my M1 MacBook Air (though performance was extremely slow).
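As an optional sanity check, here's a minimal Python sketch that reports system RAM and GPU VRAM. It assumes a Linux system and that PyTorch is already installed (it will be after the install step below); it is not part of the Llama2 codebase.

 import os

 import torch

 # Total physical RAM, via POSIX sysconf (Linux only)
 ram_bytes = os.sysconf('SC_PAGE_SIZE') * os.sysconf('SC_PHYS_PAGES')
 print(f"RAM: {ram_bytes / 1024**3:.1f} GB")

 # GPU VRAM, as reported by PyTorch (requires a CUDA-capable GPU)
 if torch.cuda.is_available():
     vram_bytes = torch.cuda.get_device_properties(0).total_memory
     print(f"GPU VRAM: {vram_bytes / 1024**3:.1f} GB")
 else:
     print("No CUDA GPU detected")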

Python Language

You'll need some basic knowledge of Python programming to create an interactive program that utilizes the model.

Downloading the model

Request for access

The initial step is requesting access to the model on Meta AI's website and agreeing to their terms and conditions.

[Screenshots: Meta AI main page and download page]

Complete your details and submit. Typically, you'll promptly receive an email with download instructions.

Download the model

Follow the email's instructions. The exact steps might change over time, so always treat the email as your primary guide; the steps below are what worked at the time of writing.

Navigate to the Llama2 repository and download the code:

 # Clone the code
 git clone git@github.com:facebookresearch/llama.git

Access the directory and execute the download script:

 cd llama

 # Make the ./download script executable
 sudo chmod +x ./download.sh

 # Run the ./download script
 ./download.sh

The download script will prompt you to enter the link from the email, resembling https://download.llamameta.net/*?Policy=eyJTdGF0ZW1lbnQiOlt7InUuaXF1ZV9oYXNoIjoidWRuMGljOGhmNGh2eXo0e....

Subsequently, it will prompt you to choose from available model weights:

Llama-2-7b
Llama-2-7b-chat
Llama-2-13b
Llama-2-13b-chat
Llama-2-70b
Llama-2-70b-chat

Use the Llama-2-7b-chat weight to start with the chat application. Select it and download. Once the download completes, the model will be in the ./llama-2-7b-chat directory.

Install

Installing the library dependencies is essential. Optionally (but it's recommended), set up a Python virtual environment to isolate the environment for your project.

 # Using virtualenv...
 virtualenv env

 # Or, using venv
 python3 -m venv env

 # Then, activate the environment
 source env/bin/activate

Then, install the project and the dependencies.

 # Install the project
 pip install -e .

 # Install the project's dependencies as specified in the requirements.txt file.
 pip install -r requirements.txt

Test the model

After installation, it's time to test and run the model. The code includes example application scripts for testing. Within the code, you'll find:

example_chat_completion.py
example_text_completion.py

You should now have Torch installed from the requirements.txt installation in the previous step. The following command runs the example_chat_completion.py script with the downloaded llama-2-7b-chat model:

 torchrun --nproc_per_node 1 example_chat_completion.py \
     --ckpt_dir llama-2-7b-chat/ \
     --tokenizer_path tokenizer.model \
     --max_seq_len 512 --max_batch_size 6

It should output a sample conversation like this:

 > initializing model parallel with size 1
 > initializing ddp with size 1
 > initializing pipeline with size 1
 Loaded in 10.51 seconds

User: what is the recipe of mayonnaise?

> Assistant: Mayonnaise is a thick, creamy condiment made from a mixture of egg yolks, oil, and an acid, such as vinegar or lemon juice. Here is a basic recipe for homemade mayonnaise:

Troubleshoot

Unfortunately, it doesn't always go smoothly and you might not see the above. The most common error you'd see is:

 torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

However, this is a very generic PyTorch error, making it challenging to pinpoint the issue precisely. One solution that worked for me was adjusting the --max_seq_len 512 --max_batch_size 6 parameters. You can experiment with values like 128, 256, etc., for --max_seq_len, and 4, 6, or 8 for --max_batch_size.
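If you'd rather not retry combinations by hand, here's a small, hypothetical Python helper that sweeps the two parameters and reruns the example script until one combination succeeds. The paths and values mirror the torchrun command above; the helper itself is my own sketch, not part of the Llama2 code.

 # Hypothetical helper: retry the example with smaller values until one works.
 import subprocess

 for max_seq_len in (512, 256, 128):
     for max_batch_size in (8, 6, 4):
         cmd = [
             "torchrun", "--nproc_per_node", "1", "example_chat_completion.py",
             "--ckpt_dir", "llama-2-7b-chat/",
             "--tokenizer_path", "tokenizer.model",
             "--max_seq_len", str(max_seq_len),
             "--max_batch_size", str(max_batch_size),
         ]
         print("Trying:", " ".join(cmd))
         if subprocess.run(cmd).returncode == 0:
             print(f"Succeeded with max_seq_len={max_seq_len}, max_batch_size={max_batch_size}")
             raise SystemExit
 print("No combination succeeded")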

As of now, no definitive solution exists for this problem. If the above doesn't work, I suggest looking at the issues section of the PyTorch repo for further troubleshooting.

Write the chat AI

Let's build a straightforward game NPC AI to get some hands-on knowledge of how the code operates and how you can construct your own chat AI. Begin by creating a file ./my_chat.py for your chat script. You can copy the code from ./example_chat_completion.py as a base and clear the example dialogs.

 from typing import Optional

 import fire

 from llama import Llama


 def main(
     ckpt_dir: str,
     tokenizer_path: str,
     temperature: float = 0.6,
     top_p: float = 0.9,
     max_seq_len: int = 512,
     max_batch_size: int = 8,
     max_gen_len: Optional[int] = None,
 ):
     generator = Llama.build(
         ckpt_dir=ckpt_dir,
         tokenizer_path=tokenizer_path,
         max_seq_len=max_seq_len,
         max_batch_size=max_batch_size,
     )
     # Can support multiple dialogs at once.
     # A dialog is essentially a conversation.
     # Just create one empty dialog to keep it simple for now.
     dialogs = [
         []
     ]
     results = generator.chat_completion(
         dialogs,
         max_gen_len=max_gen_len,
         temperature=temperature,
         top_p=top_p,
     )
     # Remove everything else below


 if __name__ == "__main__":
     fire.Fire(main)

To create a continuous prompting loop, insert the following into your chat script. Each dialog consists of the message exchanges within a conversation; the structure of the dialog object can be seen in the example code.

 # ...
 dialogs = [
     []
 ]

 # Note: I've prompted it to keep answers short to prevent resource issues from occurring.
 dialogs[0].append({"role": "system", "content": "Provide short answers like a game NPC named George"})

 while True:
     user_input = input("Say something: ")
     if user_input == 'exit':
         print("Exit the conversation.")
         break
     dialogs[0].append({"role": "user", "content": user_input})
     results = generator.chat_completion(
         dialogs,  # type: ignore
         max_gen_len=max_gen_len,
         temperature=temperature,
         top_p=top_p,
     )
     dialogs[0].append(results[0]['generation'])
     print("George: " + results[0]['generation']['content'])

Ensure that each message is assigned a role of system, user, or assistant (the model's reply in results[0]['generation'] already carries the assistant role). Messages should alternate between the user and assistant roles. Consecutive messages with the same role, such as two consecutive user messages, are invalid (example below):

 [
     { "role": "user", "content": "Hi"},
     { "role": "user", "content": "How are you"},
 ]

In the chat script, begin by adding an instruction for the system. This is where you can give the AI commands, such as instructing it to behave like a game NPC named George. Then prompt for user input and add it to the dialog. The dialog is passed to generator.chat_completion() to generate the AI's response. generator.chat_completion() generates results for all dialogs, but since we're using one dialog for simplicity, dialogs[0] corresponds to results[0].
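After a few iterations of the loop, the accumulated dialog should look something like the following: a valid, strictly alternating structure. The content strings here are made up for illustration.

 # A valid dialog: system message first, then strictly alternating user/assistant.
 [
     { "role": "system", "content": "Provide short answers like a game NPC named George"},
     { "role": "user", "content": "Hi"},
     { "role": "assistant", "content": "Hey there, young adventurer!"},
     { "role": "user", "content": "How are you"},
 ]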

Append the response's generation results to the dialog to keep track of the conversation's progression. Print the response as well.

 dialogs[0].append(results[0]['generation'])
 print("George: " + results[0]['generation']['content'])

In essence, each iteration of the loop feeds the entire conversation history back to the model, so each response is generated from the full dialog history.
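One consequence of this design is that the dialog grows with every exchange and can eventually exceed --max_seq_len. A hypothetical mitigation, not part of the original script, is to keep the system message and drop the oldest user/assistant pair once the dialog gets long; a minimal sketch:

 # Hypothetical trimming step: call after each append to bound the history.
 MAX_MESSAGES = 9  # 1 system message + 4 user/assistant pairs (an arbitrary cap)

 def trim_dialog(dialog):
     # Keep the system message at index 0; drop the oldest user/assistant
     # pair (indices 1 and 2) until the dialog fits, preserving alternation.
     while len(dialog) > MAX_MESSAGES:
         del dialog[1:3]
     return dialog

 dialogs[0] = trim_dialog(dialogs[0])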

The result

You now have a game NPC AI bot for engaging in conversations. Here's a sample output of a conversation:

 Say something: Hi
 George: Hey there, young adventurer! *adjusts spectacles* What brings you to this humble village? Are you here to seek fortune, or perhaps to uncover the secrets of the ancient ruins that lie nearby? *winks*

 Say something: Who are you?
 George: Ah, a curious traveler! *chuckles* My name is George, and I am the village elder here in Greenhaven. *adjusts spectacles* It's a pleasure to make your acquaintance! *smiles* What brings you to our little village? Are you here to rest your weary bones, or perhaps to seek out the wisdom of the ages? *winks*

Conclusion

You've completed a guide on installing Llama2 on your local machine and applying it to a simple application. You can extend it to accommodate more dialogs and content generation. This guide provides a foundation for utilizing the Llama2 model for various applications. I hope you've managed to learn from this experience and create something exciting with it!



