LLM: tips for CPU

From OnnoWiki

Latest revision as of 04:26, 17 July 2024

According to ChatGPT, when running on CPU, try:

1. Batch processing, to reduce per-call overhead and speed up embedding.
2. Lower the model precision (float32 -> float16/int8) for a speedup, usually with only a small loss of accuracy.
3. Create (or pick) a smaller version of the same model.
4. Multi-threading.
5. Use Intel MKL / OpenBLAS.
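
A rough sketch of tips 1, 2 and 4, assuming a PyTorch-based embedding model. The helper names (`batched`, `embed_in_batches`, `configure_cpu`) are made up for illustration, and `encode_batch` stands for whatever function turns a list of texts into vectors. Dynamic int8 quantization is only one way to lower precision, and how much it helps depends on the model and the CPU.

```python
import os
from typing import Callable, Iterator, List, Optional, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices holding at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    encode_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Tip 1: call the encoder once per batch instead of once per text."""
    vectors: List[List[float]] = []
    for chunk in batched(texts, batch_size):
        vectors.extend(encode_batch(chunk))
    return vectors


def configure_cpu(model, num_threads: Optional[int] = None):
    """Tips 2 and 4, assuming `model` is a torch.nn.Module."""
    import torch  # imported lazily so the batching helpers work without it

    # Tip 4: let PyTorch use several threads for its CPU operators.
    torch.set_num_threads(num_threads or os.cpu_count() or 1)

    # Tip 2: dynamic quantization keeps Linear weights as int8.
    return torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
```

Measure before and after on your own data: larger batches help only until memory becomes the bottleneck, and a quantized model should be spot-checked for retrieval quality.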

I use the intfloat model; it is reasonably fast on CPU: https://huggingface.co/intfloat/multilingual-e5-large
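
A minimal sketch of calling that model on CPU, assuming the `sentence-transformers` package is installed (the note does not say which library was actually used). The E5 model card asks for a "query: " or "passage: " prefix on every input; `with_prefix` and `embed` are illustrative names only.

```python
from typing import List, Sequence

MODEL_NAME = "intfloat/multilingual-e5-large"


def with_prefix(texts: Sequence[str], kind: str = "passage") -> List[str]:
    """Add the 'query: ' or 'passage: ' prefix that E5 models expect."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {text}" for text in texts]


def embed(texts: Sequence[str], kind: str = "passage",
          batch_size: int = 16) -> List[List[float]]:
    """Encode texts on CPU and return L2-normalized vectors as lists."""
    # Lazy import: the model is downloaded on first use.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME, device="cpu")
    vectors = model.encode(
        with_prefix(texts, kind),
        batch_size=batch_size,
        normalize_embeddings=True,
    )
    return vectors.tolist()
```

In real use, load the model once and reuse it across calls; it is reloaded inside `embed` here only to keep the sketch short.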


For PDFs, I usually parse the text first or run OCR, then store the embeddings in Postgres using pgvector (https://github.com/pgvector/pgvector). It takes a fair bit of effort, though.
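
The PDF-to-pgvector flow could look roughly like this. It is a sketch under assumptions, not the exact setup described above: `pypdf` and `psycopg` (version 3) are my choice of libraries, the `documents` table and its columns are hypothetical, and the fixed-size chunking is a placeholder for whatever splitting suits your documents. The dimension 1024 matches multilingual-e5-large; OCR for scanned PDFs is not covered.

```python
from typing import List, Sequence

EMBEDDING_DIM = 1024  # output size of intfloat/multilingual-e5-large


def pdf_to_text(path: str) -> str:
    """Extract the text layer of a PDF (scanned pages need OCR instead)."""
    from pypdf import PdfReader  # lazy import, assumed dependency

    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_text(text: str, size: int = 1000) -> List[str]:
    """Split text into pieces of at most `size` characters."""
    if size < 1:
        raise ValueError("size must be >= 1")
    return [text[i:i + size] for i in range(0, len(text), size)]


def to_vector_literal(vector: Sequence[float]) -> str:
    """Format a vector in pgvector's text form, e.g. '[0.5,-1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in vector) + "]"


def store_chunks(dsn: str, chunks: Sequence[str],
                 vectors: Sequence[Sequence[float]]) -> None:
    """Create the (hypothetical) documents table and insert the rows."""
    import psycopg  # lazy import, assumed dependency

    with psycopg.connect(dsn) as conn:  # commits on a clean exit
        conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
        conn.execute(
            "CREATE TABLE IF NOT EXISTS documents ("
            "id bigserial PRIMARY KEY, content text, "
            f"embedding vector({EMBEDDING_DIM}))"
        )
        for chunk, vector in zip(chunks, vectors):
            conn.execute(
                "INSERT INTO documents (content, embedding) "
                "VALUES (%s, %s::vector)",
                (chunk, to_vector_literal(vector)),
            )


def search(dsn: str, query_vector: Sequence[float], limit: int = 5) -> List[str]:
    """Return the closest chunks by cosine distance (pgvector's <=> operator)."""
    import psycopg

    with psycopg.connect(dsn) as conn:
        rows = conn.execute(
            "SELECT content FROM documents "
            "ORDER BY embedding <=> %s::vector LIMIT %s",
            (to_vector_literal(query_vector), limit),
        ).fetchall()
    return [row[0] for row in rows]
```

For larger collections, pgvector also supports approximate indexes (HNSW, IVFFlat); a plain sequential scan as above is usually fine for a few thousand chunks.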