LLM: tips for CPU
Revision as of 04:22, 17 July 2024 by Onnowpurbo
According to ChatGPT, when running on a CPU, try:
1. Batch processing, to reduce per-call overhead and speed up embedding.
2. Reduce model precision (float32 -> float16/int8); this speeds up inference, usually with only a small loss of accuracy.
3. Build a smaller version of the same model.
4. Multi-threading.
5. Use Intel MKL / OpenBLAS.
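A rough sketch of tips 1, 2, 4 and 5 in plain Python, using only the standard library. The helper names (`batched`, `quantize_int8`, `dequantize_int8`) and the thread count of 4 are illustrative assumptions, not part of any library. The quantization shown is simple symmetric per-tensor int8 rounding, which is only one of several possible schemes; a real model would normally be quantized by the framework's own tooling.

```python
import os

# Tips 4 and 5: OpenMP, Intel MKL and OpenBLAS read their thread counts from
# these environment variables. They should be set before importing a numeric
# library such as numpy or torch. The value "4" is an arbitrary example.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ.setdefault(var, "4")


def batched(items, batch_size):
    """Tip 1: yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def quantize_int8(values):
    """Tip 2: symmetric per-tensor int8 quantization of a list of floats.

    Returns (list of ints in [-127, 127], scale), where value ~= int * scale.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    texts = ["a", "b", "c", "d", "e"]
    # Each batch would be passed to the embedding model in a single call.
    for batch in batched(texts, 2):
        print(batch)

    weights = [0.5, -1.0, 0.25]
    codes, scale = quantize_int8(weights)
    print(codes, dequantize_int8(codes, scale))
```

The round trip through `quantize_int8` and `dequantize_int8` shows where the accuracy loss in tip 2 comes from: each value is recovered only to within about half a quantization step (`scale / 2`).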