LLM: tips for CPU
Onnowpurbo (talk | contribs)
Latest revision as of 04:26, 17 July 2024
According to ChatGPT, when running on CPU, try:
1. Batch processing, to reduce overhead and speed up embedding.
2. Reduce model precision (float32 -> float16/int8); this speeds up inference, usually at only a small cost in accuracy.
3. Create a smaller version of the same model.
4. Multi-threading.
5. Use Intel MKL / OpenBLAS.
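Tips 1, 2 and 4 can be sketched in plain Python. This is only an illustration under assumptions: `embed_batch` is a hypothetical callable standing in for whatever model is used, and the int8 functions show the idea of precision reduction on a plain list of numbers rather than quantizing a real model (in practice a library would do that step). The three environment variables are the ones commonly read by OpenMP, Intel MKL and OpenBLAS to pick their thread count.

```python
import os

# Tips 4 and 5: set thread counts before importing numeric libraries,
# since those libraries typically read the variables at import time.
N_THREADS = str(os.cpu_count() or 1)
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ.setdefault(var, N_THREADS)


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(texts, embed_batch, batch_size=32):
    """Tip 1: call the model once per batch instead of once per text.

    embed_batch is a hypothetical callable: list[str] -> list of vectors.
    """
    vectors = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def quantize_int8(values):
    """Tip 2 (idea only): symmetric int8 quantization of a list of floats.

    Returns (ints, scale) such that value is approximately int * scale.
    """
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale


def dequantize(ints, scale):
    """Map int8 codes back to approximate float values."""
    return [q * scale for q in ints]
```

The rounding error per value is bounded by about half of `scale`, which is why int8 usually costs a little accuracy rather than none.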
I use the intfloat model, it is reasonably fast on CPU: https://huggingface.co/intfloat/multilingual-e5-large
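A minimal sketch of calling this model on CPU, assuming the `sentence-transformers` package is installed (the note above does not say which library was used). As far as I understand the model card, E5 models expect every input to start with `query: ` or `passage: `; the helper names `with_prefix`, `load_model` and `embed` are made up for this example.

```python
from functools import lru_cache

MODEL_NAME = "intfloat/multilingual-e5-large"


def with_prefix(texts, kind="passage"):
    """Prepend the E5 role prefix ("query: " or "passage: ") to each text."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return [f"{kind}: {t}" for t in texts]


@lru_cache(maxsize=1)
def load_model():
    """Load the model once; imported lazily because it is a heavy dependency."""
    # Assumption: sentence-transformers is installed.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(MODEL_NAME, device="cpu")


def embed(texts, kind="passage", batch_size=16):
    """Return one normalized embedding per input text."""
    model = load_model()
    return model.encode(
        with_prefix(texts, kind),
        batch_size=batch_size,
        normalize_embeddings=True,
    )
```

Documents would go through `embed(chunks, kind="passage")` and search strings through `embed([question], kind="query")`.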
For PDFs, I usually parse the text first or use OCR, then store the embeddings in Postgres using pgvector (https://github.com/pgvector/pgvector). It takes a bit of effort, though.
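The storage step might look roughly like this. It is a sketch under assumptions, not the setup described above: the table name `doc_chunks`, the helper names and the use of the `psycopg` (version 3) driver are all invented for illustration. The column size 1024 matches the embedding width of multilingual-e5-large, and `<=>` is pgvector's cosine-distance operator.

```python
SCHEMA_STATEMENTS = (
    "CREATE EXTENSION IF NOT EXISTS vector",
    # doc_chunks is a hypothetical table name.
    "CREATE TABLE IF NOT EXISTS doc_chunks ("
    " id bigserial PRIMARY KEY,"
    " content text NOT NULL,"
    " embedding vector(1024) NOT NULL)",
)
INSERT_SQL = "INSERT INTO doc_chunks (content, embedding) VALUES (%s, %s::vector)"
SEARCH_SQL = (
    "SELECT content FROM doc_chunks "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)


def to_vector_literal(vec):
    """Format a sequence of numbers as a pgvector text literal, e.g. [0.5,-1.0]."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def store_chunks(dsn, chunks, vectors):
    """Insert parsed text chunks together with their embeddings."""
    import psycopg  # assumption: psycopg 3 is installed

    rows = [(c, to_vector_literal(v)) for c, v in zip(chunks, vectors)]
    with psycopg.connect(dsn) as conn:  # commits when the block exits cleanly
        for stmt in SCHEMA_STATEMENTS:
            conn.execute(stmt)
        with conn.cursor() as cur:
            cur.executemany(INSERT_SQL, rows)


def search(dsn, query_vector, k=5):
    """Return the k stored chunks closest to query_vector by cosine distance."""
    import psycopg  # assumption: psycopg 3 is installed

    with psycopg.connect(dsn) as conn:
        cur = conn.execute(SEARCH_SQL, (to_vector_literal(query_vector), k))
        return [row[0] for row in cur.fetchall()]
```

Passing the vector as a text literal and casting it with `::vector` avoids needing an extra adapter package; the text extraction or OCR step is left out because the note does not say which tool was used.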