Ask HN: Where can I find practical comparative data regarding different LLMs?

2 years ago

I've been trying to keep up with the advances in the world of AI and LLMs. NLP was a world that I knew pretty well 7 years ago, when I knew most of the major NLP libraries and their various strengths and weaknesses. Nowadays, however, I'm having trouble finding good discussions of the practical uses of LLMs.

I have gone to Hugging Face, and the amount of data there is overwhelming, but it seems poorly organized:

https://huggingface.co

Does anyone know a secret that makes that site tractable? I've experimented with a few of the libraries posted there, but I can only sample a tiny fraction of what is there. What I'm missing is some method for finding the useful stuff while discarding the junk.

Seven years ago I could tell you the strengths and weaknesses of Google's TensorFlow or the Stanford NLP library. But where do I go now for good comparative information about the strengths and weaknesses of the various libraries that work with the new LLM tools?

I'm looking to answer practical questions whose answers I can use in my own work with AI startups.

Here is an example of a question I cannot find an answer to. I am aware of a startup that has developed a chat client that, according to the startup, can entirely replace a company's customer support team. Among its claims is that when the chat client makes a mistake, it can easily be adjusted so it won't make that mistake again. I am curious: what approaches are the engineers at that startup probably using to fix mistakes? If I search Hugging Face for ways to fix factual errors in LLMs, I see some libraries, but I have no idea which ones are considered good or bad.

So I'm asking the Hacker News community: how are you keeping up with advances in LLMs and their associated tools?

Also, every LLM seems to have an embedded finite state machine that remembers the state of the current conversation, so where can I go to learn about the strengths and weaknesses of those finite state machines? How would I go about adjusting them?

Or, let me offer another example of the kind of information I want:

I've been testing different AI chats by trying to play text adventures with them. For instance:

https://huggingface.co/spaces/HuggingFaceH4/zephyr-7b-gemma-chat

https://chat.openai.com

If I use the same prompt with each of them, I can see how different they are. But how do I know whether my observations generalize (would other people get similar results?), and how do I learn about other AI chats, since I cannot test them all?
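One way to make such side-by-side tests less anecdotal is to send the same prompt many times to each chat and record how often a reply passes a simple check, so that a single lucky or unlucky reply is not mistaken for a general trait. The sketch below is a minimal illustration under stated assumptions: `fake_model_a` and `fake_model_b` are hypothetical stand-ins that return canned replies (a real version would call each chat's actual endpoint), and `opens_door` is a hypothetical pass criterion for one text-adventure turn.

```python
import random
from typing import Callable, Dict

# Hypothetical signature: a "model" takes a prompt and an RNG and returns a reply.
Model = Callable[[str, random.Random], str]

def fake_model_a(prompt: str, rng: random.Random) -> str:
    # Hypothetical stand-in for a chat that sometimes refuses the action.
    return rng.choice(["You open the door.", "You cannot do that."])

def fake_model_b(prompt: str, rng: random.Random) -> str:
    # Hypothetical stand-in for a chat that always performs the action.
    return "You open the door."

def opens_door(reply: str) -> bool:
    # Hypothetical pass criterion: the reply lets the player open the door.
    return "open" in reply.lower()

def pass_rate(ask: Model, prompt: str, check: Callable[[str], bool],
              trials: int, seed: int = 0) -> float:
    """Send the same prompt `trials` times; return the fraction of replies passing `check`."""
    rng = random.Random(seed)  # fixed seed so the stub runs are repeatable
    passed = sum(check(ask(prompt, rng)) for _ in range(trials))
    return passed / trials

def compare(models: Dict[str, Model], prompt: str,
            check: Callable[[str], bool], trials: int = 20) -> Dict[str, float]:
    """Pass rate per model for one prompt."""
    return {name: pass_rate(ask, prompt, check, trials) for name, ask in models.items()}

if __name__ == "__main__":
    prompt = "You are in a dark room with a door to the north. > open door"
    results = compare({"model_a": fake_model_a, "model_b": fake_model_b},
                      prompt, opens_door)
    for name, rate in results.items():
        print(f"{name}: {rate:.0%} of replies passed")
```

Reporting a pass rate over repeated trials, rather than one transcript per chat, gives a rough sense of how much each model's behavior varies between runs; whether other people would see the same numbers still depends on the prompt set and the check being shared.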