LLM: tips for CPU

According to ChatGPT: when running on a CPU, try:

1. Batch processing, to reduce per-call overhead and speed up embedding (see the first sketch below).
2. Reduce model precision from float32 to float16/int8; this speeds inference up with little loss of accuracy (also covered in the first sketch below).
3. Use a smaller version of the same model (for example a distilled variant).
4. Multi-threading (see the second sketch below).
5. Use Intel MKL / OpenBLAS as the BLAS backend (also covered in the second sketch below).
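
A minimal sketch of tips 1 and 2, batched embedding plus dynamic int8 quantization, using PyTorch and Hugging Face transformers. The library choice, the model name, the batch size, and the mean-pooling step are assumptions for illustration, not part of the original notes.

 import torch
 from transformers import AutoModel, AutoTokenizer
 
 MODEL_NAME = "intfloat/multilingual-e5-large"   # assumed; any encoder model works here
 tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
 model = AutoModel.from_pretrained(MODEL_NAME).eval()
 
 # Tip 2: dynamic int8 quantization of the Linear layers (a CPU-only path in PyTorch).
 model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)
 
 def embed(texts, batch_size=32):
     """Tip 1: embed texts in batches instead of one by one."""
     chunks = []
     with torch.inference_mode():
         for i in range(0, len(texts), batch_size):
             batch = texts[i:i + batch_size]
             enc = tokenizer(batch, padding=True, truncation=True, return_tensors="pt")
             hidden = model(**enc).last_hidden_state              # (batch, tokens, dim)
             mask = enc["attention_mask"].unsqueeze(-1).float()   # mean-pool over real tokens only
             emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
             chunks.append(torch.nn.functional.normalize(emb, dim=-1))
     return torch.cat(chunks)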

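For tips 4 and 5, a sketch of the usual threading knobs: OpenMP/BLAS environment variables plus PyTorch's own thread settings. The thread counts below are placeholder assumptions; tune them to the number of physical cores on the machine.

 import os
 
 # Set BLAS/OpenMP thread counts before importing torch so they take effect.
 os.environ["OMP_NUM_THREADS"] = "8"        # OpenMP threads used by MKL and OpenBLAS
 os.environ["MKL_NUM_THREADS"] = "8"        # honoured when the build links against Intel MKL
 os.environ["OPENBLAS_NUM_THREADS"] = "8"   # honoured when the build links against OpenBLAS
 
 import torch
 
 torch.set_num_threads(8)            # intra-op parallelism (matrix multiplies, etc.)
 torch.set_num_interop_threads(2)    # inter-op parallelism between independent operators
 print(torch.__config__.parallel_info())   # shows which BLAS/OpenMP backend torch was built with
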
I use the intfloat model, sir; it is reasonably fast on CPU: https://huggingface.co/intfloat/multilingual-e5-large
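
A minimal usage sketch for the model linked above, here via the sentence-transformers library; the library choice and batch size are assumptions, since the comment does not say how the model is loaded. Note that the E5 model card asks for "query: " / "passage: " prefixes on the input text.

 from sentence_transformers import SentenceTransformer
 
 model = SentenceTransformer("intfloat/multilingual-e5-large", device="cpu")
 
 # E5-style prefixes: "query: " for search queries, "passage: " for documents.
 queries = ["query: how to speed up embeddings on a CPU"]
 passages = ["passage: Use batch processing and int8 quantization."]
 
 q = model.encode(queries, batch_size=32, normalize_embeddings=True)
 p = model.encode(passages, batch_size=32, normalize_embeddings=True)
 
 print(q @ p.T)   # cosine similarity, since the embeddings are L2-normalized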