This is What Limits Current LLMs

Published 2024-05-04
Recent advances in large language models (LLMs) have centered on more data, larger models, and longer context lengths. The ability of LLMs to learn from information supplied in the prompt (in-context learning) makes longer context lengths extremely valuable. However, there are problems with relying on in-context learning alone to learn at inference time.
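
Below is a minimal sketch of what in-context learning looks like in practice: the examples the model "learns" from are packed into the prompt itself, so everything it can adapt to has to fit inside the context window. The code is illustrative only; call_llm is a hypothetical placeholder for whatever completion API you use.

```python
# A few-shot prompt is the entire "learning" step in in-context learning:
# anything the model can adapt to must fit inside the context window.

FEW_SHOT_EXAMPLES = [
    ("The movie was a delight.", "positive"),
    ("I want my money back.", "negative"),
]

def build_prompt(query: str, max_chars: int = 8_000) -> str:
    """Pack labeled examples plus the new query into a single prompt."""
    parts = ["Classify the sentiment of each review."]
    for text, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Review: {text}\nSentiment: {label}")
    parts.append(f"Review: {query}\nSentiment:")
    prompt = "\n\n".join(parts)
    return prompt[:max_chars]  # crude stand-in for the context-length limit

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: swap in your model or provider call.
    raise NotImplementedError

print(build_prompt("Service was slow but the food was great."))
```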

Social Media
YouTube - youtube.com/c/EdanMeyer
Twitter - twitter.com/ejmejm1

All Comments (21)
  • The points you bring up about the failures of RAG and in-context learning are more broadly problematic. I've frequently wondered why LLMs fail to "synthesize" information from different sources. Try to get an LLM to debate you on a topic: it's completely futile, because the LLM just regurgitates information from its training data (with all its associated biases). It doesn't read between the lines, taking disparate sources of information and having that "Eureka" moment when two at-first-disconnected ideas come together to produce new knowledge. In essence, LLMs are still just context-dependent pattern-recognition machines.
  • @li_tsz_fung
    After you first brought up continual learning versus long context / RAG, I immediately thought of the example of asking for advice from customer service staff with a manual versus asking a technician. RAG is like a CS person who can quickly find the page related to your question. In-context seems a bit better: the person has read the manual and can probably bring you some insight, depending on how smart they are. But continual learning ideally means the person has fully understood the manual. They learned it instead of just remembering it, depending on how good a learner they are.
  • @broyojo
    do you see degradation of the model as you are doing continual learning, maybe some catastrophic forgetting?
  • @natecodesai
    I paused. From my experience working in production with RAG... simply put, it's like using legacy technology to fill in the blanks in training. Attention mechanisms are way more subtle and nuanced than semantic search lookup. If there were a cheap way to viably do continual learning in a production environment, while still allowing context windows, etc., it would solve the whole problem of semantic search not being enough to find the right data you really need. (A toy version of that retrieval step is sketched after the comments.)
  • @woolfel
    Let's be honest, documentation generally sucks. I'm an open source contributor, and the biggest issue users report is "your docs suck". RAG doesn't solve out-of-distribution situations, so it's not a magic bullet. Even when developers try to write good docs, they quickly become out of date and wrong. Until we have better architectures, continuous training will be needed.
  • @yassinesafraoui
    I think privacy of users can be a real problem if continual learning is not done carefully
  • @Crybyte
    2:30 - I'm thinking that the reason you'd continually train models in production instead of using RAG is that deciding what information is relevant is unclear. By continually training, the model is able to make those connections itself. I'm wondering if this also improves inference speed, since the prompt no longer carries so much extra padding.
  • @DanLyndon
    The poetry example has a hidden layer to it. It does not matter whether an AI is trained on high quality poetry. It cannot create high quality poetry, because this requires a completely different kind of intelligence than mere pattern recognition of things that have already been written.
  • @Laszer271
    I am not convinced by your arguments. In my work, I often need to tell business people that we should not do training or fine-tuning until we see that in-context learning is not enough. And in the end, it turns out that in-context learning is enough, even for those complicated examples you described. Let me argue with your points.
    1. `Where does the context come from?` - If we have no context, then any training is even more out of the question than in-context learning, right? I don't see how that is an argument for continual learning.
    1.1 `RAG won't enable our models to solve complicated or niche problems` - It can. If an LLM is capable enough and has great general knowledge, it can often solve problems that no human has ever solved before, using just its general knowledge and some additional context about the problem that needs solving.
    2. `The scope of in-context learning is limited by pretraining data` - Agreed. However, the most capable models are trained on almost every type of data you can imagine, which means you probably won't find a problem for which in-context learning won't work with those models. It can, however, be a problem when you are working with less capable, smaller models.
    For me, the biggest problem with RAG is that models do not work well with very large contexts. Even with a 128k context length in GPT-4, the model won't reason well over 100k tokens of table data, system logs, or even a codebase. Instead, it will often "misread" or "forget" information in such a long context. That's a limitation of current LLMs, I think, not an intrinsic characteristic of RAG systems.
  • @jaysmooveV2
    My guess on why continual learning matters is that, if pulled off in an economically efficient manner, it can let researchers on the bleeding edge of the ML field harness copilot-like assistance when working on completely new tasks. When doing your own research you often have to answer all the questions on your own, and having a second brain there with you that has all the same context on the research as you do would be very beneficial.
  • @JEEVRAJTARALKAR
    I have created an algorithm for a continual-learning model, which can help software developers keep track of problems solved. But the implementation needs more brains, so progress is slow while I'm working on it alone. I wasn't aware someone else was thinking along the same lines. Thanks for the video.
  • Reason 3, which I've worked with since around 2015: my AI needs to be able to learn to get smarter. The weights cannot be frozen, the net cannot be static, nor can it ever go silent.
  • @roomo7time
    The main reason we don't do CL is that there is nothing in old-fashioned CL that actually works. Currently, the only thing that learns from what came before and improves is in-context learning. Yes, we need continual learning, but it should come from making in-context learning lighter. More fundamentally, we need to separate memory-based in-context continual learning from parameter-level continual learning. The former adds event-based knowledge, while the latter adds intrinsic reasoning capabilities. The former should be based on external memory, while the latter lives at the parameter level. Right now we have the former in the form of in-context learning using token memory. The community is already doing research in this direction, although not under the name of CL. (A rough sketch contrasting the two is after the comments.)
  • Beautiful drawings. Erasing the drawing, that is a nice transition. Great presentation.
  • @tk_kushal
    The first thing that comes to mind when comparing continual learning to RAG is that an LLM is quite like our brains: we can't effectively retrieve all the relevant information or solve a problem with comparable accuracy if we are seeing something for the first time, even if we have all the context there is.
  • @therobotocracy
    Intuitively I have always thought this. Really interested in what you are doing. Where can we find more info on your work?
  • @Sancarn
    This is the same problem I have with LLMs currently. They just don't use code I've written, even though it's open source lol. It's a big problem. Another important thing: sure, you may make a coding LLM, but when you get it to code something from the real world, e.g. simulation software, you will likely need an LLM that has knowledge of both physics and coding. General domain knowledge is useful.
  • @bilalbaig8586
    Continual learning is not possible at the moment for most top open-weights models because the creators have not released the data used to train them. A significant amount of training on data that does not include the original data will lead to degradation of the model. (A sketch of the replay mitigation this implies is after the comments.)
  • @ImtihanAhmed
    Great explanation of the limitations of current LLMs. I worked with RAG systems recently, and they work... for very specific, limited knowledge-retrieval applications. We are still so far away from these systems being generally useful. There's probably a lot more you can do with MoE now that much more effort is being put into making models smaller.
  • @chrisbo3493
    This sums up my current evaluation of the LLM hype: these models are limited by their input data, in both quality and field/focus. I do not see the hyped exponential growth, just bigger training runs driven by more data and computing power. And as for creative, smart combined solutions to (new) tasks: without really good prompting leading the LLM, nothing happens in that direction.
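
A minimal sketch of the "semantic search lookup" step that @natecodesai describes: embed the chunks, embed the query, take the top-k by cosine similarity, and paste them into the prompt. The embed function below is a toy letter-frequency assumption purely for illustration; a real system would use a learned embedding model and a vector index.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy embedding: normalized letter frequencies (illustration only)."""
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by cosine similarity to the query and keep the top k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: float(embed(c) @ q), reverse=True)
    return ranked[:k]

docs = [
    "The reset endpoint requires an admin token.",
    "Our logo uses the corporate blue palette.",
    "Rate limits are 100 requests per minute per key.",
]
context = "\n".join(retrieve("How do I reset a user account?", docs))
print(f"Answer using only this context:\n{context}\n\nQuestion: ...")
```

Whatever this lookup misses never reaches the model, which is exactly the gap the comment points at.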
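
A rough sketch of the two flavors of continual learning that @roomo7time separates: (a) memory-based, where new facts go into an external store that is re-read at inference time (what token-based in-context learning effectively is), and (b) parameter-based, where new data changes the weights themselves. The names and the tiny model below are assumptions for illustration, not an established API.

```python
import torch

# (a) Memory-based: append facts to an external store, re-read them at inference.
episodic_memory: list[str] = []

def remember(fact: str) -> None:
    episodic_memory.append(fact)

def build_prompt(question: str) -> str:
    return "\n".join(episodic_memory + [question])

# (b) Parameter-based: a gradient step actually changes the weights,
#     which is where new capabilities (not just facts) would have to live.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

def parameter_update(x: torch.Tensor, y: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

remember("The new API version is v3.")
print(build_prompt("Which API version should I call?"))
print(parameter_update(torch.randn(4, 8), torch.randn(4, 1)))
```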
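
A small sketch of the replay (rehearsal) mitigation implied by @bilalbaig8586's point: mix data drawn from the original training distribution into each fine-tuning batch so the model is not optimized on the new domain alone. The datasets below are placeholders; the point of the comment is that, for most open-weights models, the original data simply is not available to replay.

```python
import random

original_data = [f"pretraining_example_{i}" for i in range(1000)]  # usually unavailable
new_domain_data = [f"new_domain_example_{i}" for i in range(200)]

def make_batch(batch_size: int = 32, replay_fraction: float = 0.5) -> list[str]:
    """Mix new-domain examples with replayed old examples (when they exist)."""
    n_replay = int(batch_size * replay_fraction) if original_data else 0
    batch = random.sample(new_domain_data, batch_size - n_replay)
    batch += random.sample(original_data, n_replay)
    random.shuffle(batch)
    return batch

print(make_batch()[:4])
```

Without the replayed half of each batch, repeated updates on the new domain drift the weights away from whatever the withheld pretraining data originally taught the model.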