
By LLMs I meant decoder-only models, e.g. Gemini, Claude, etc. Can you go into more detail on how you're using the encoder models? I'm curious. Typically I have used them for embedding text or for fine-tuning after attaching a classifier head. What are you pre-training on, and for what task?


> how you're using the encoder models?

In my original comment this is what I was referring to: using the embeddings produced by these models, not using something like GPT to classify text (that's wildly inefficient and in my experience gets subpar results).

To answer your question: you simply use the embedding vector as the features in whatever model you're trying to train. I've found this to get significantly superior results with significantly fewer examples than any traditional NLP approach to vector representation.
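A minimal sketch of that idea, assuming the embeddings have already been computed by some encoder. The three-dimensional vectors below are made-up placeholders (real encoder output has hundreds of dimensions), and the nearest-centroid classifier is just a dependency-free stand-in for "whatever model you're trying to train"; logistic regression or gradient-boosted trees would slot in the same way.

```python
import math
from collections import defaultdict


def fit_centroids(vectors, labels):
    """Average the embedding vectors of each class into one centroid per label."""
    grouped = defaultdict(list)
    for vec, label in zip(vectors, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict(centroids, vector):
    """Return the label whose centroid is most similar to the given embedding."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Placeholder embeddings: in practice each row would come from an encoder model.
train_vectors = [
    [0.9, 0.1, 0.0],  # "i want a refund"
    [0.8, 0.2, 0.1],  # "charged twice this month"
    [0.1, 0.9, 0.2],  # "app crashes on launch"
    [0.0, 0.8, 0.3],  # "login button does nothing"
]
train_labels = ["billing", "billing", "bug", "bug"]

centroids = fit_centroids(train_vectors, train_labels)
prediction = predict(centroids, [0.7, 0.3, 0.1])  # placeholder embedding of a new text
```

The point is only that the downstream model never sees text, just the vectors, so the classifier itself can stay very small.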

> What are you pre-training on, and for what task?

My experience has been that you don't need to pretrain at all. The embeddings are more information rich than anything you could attempt to achieve with other vector representations you might come up with using the set of data you have. This might not be true at extreme scales, but for nearly all traditional NLP classification tasks I've found this to be so much easier to implement and so much better performing that there's really not a good reason to start with a "simpler" approach.


Ah yes this does make sense. We are definitely in agreement on the point of "wildly inefficient and subpar". I'll try out decoder model embeddings soon, e.g. Qwen/Qwen3-Embedding-8B. I'm working with largish amounts of data (200M records), so I tried to pick a good balance between size:perf:cost, using BAAI/bge-base-en-v1.5 to start (384 dim).
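For a rough sense of the size side of that trade-off, here is a back-of-the-envelope sketch. It assumes float32 storage (4 bytes per value) and counts only the raw vectors, not index or metadata overhead; the 4096-dim figure is an illustrative wider embedding, not a claim about any specific model.

```python
def embedding_storage_gb(num_records, dim, bytes_per_value=4):
    """Raw vector storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_records * dim * bytes_per_value / 1e9


NUM_RECORDS = 200_000_000  # record count quoted above

small_gb = embedding_storage_gb(NUM_RECORDS, 384)   # dimension quoted above
wide_gb = embedding_storage_gb(NUM_RECORDS, 4096)   # hypothetical wider embedding

print(f"384-dim:  {small_gb:.1f} GB")
print(f"4096-dim: {wide_gb:.1f} GB")
```

That works out to roughly 307 GB versus about 3.3 TB of raw vectors, so embedding width matters quite a bit at this record count.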



