
Dolly, No Diddle Daddle

Trying to make sense of the otherworldly claims about generative AI

1,036 words, ~ 5 min read

Tags: ai, perspective

Generative AI seems to be the newest trend gaining momentum and hype. It reminds me a lot of crypto from a few years ago. People are saying it'll change everything, that this is the future.

note: I started drafting this piece when Dolly was first announced. It's highly likely a lot has changed since then.


A Human LLM

Let's say there's a human form of these LLMs. We'll refer to them as "L" for LLM.

You're trying to teach L about everything, in the hopes that you can ask L for something later on and L will give you an answer. But it seems like L isn't learning anything; they're just predicting what you want to hear.

If you teach L about World War 2, they map all of that data into internal representations of words and how those words are arranged. Ask them about World War 2, and they'll be able to go back and give some pretty decent information. Ask them how wars impact humanity, and they'll generate an answer by predicting from whatever the training covered.

Notice that nowhere in that process does L reflect on what a war is, what its implications are, or anything of that sort. The response is a prediction of what is most likely to follow the prompt.

Finding what is likely to follow a given prompt is tricky and far beyond the scope of this post; there are graduate-level courses for that. The key idea is that the model builds underlying representations of words and how they fit together, which improves its predictions. Those representations come from training data, however, and do not signify a true, humanlike understanding.
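To make "predict what follows the prompt" concrete, here's a minimal sketch using the small, openly available GPT-2 model through Hugging Face's transformers library. The prompt is purely illustrative; larger models do the same thing at a much bigger scale:

```python
# Minimal next-token prediction sketch with GPT-2
# (assumes `transformers` and `torch` are installed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "World War 2 ended in"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# Look only at the distribution over the *next* token.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {p:.3f}")
```

The model never "knows" anything about the war; it just ranks which token is most likely to come next, one step at a time.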

A lot of the hype around LLMs comes from the idea that this may finally be the AI that replaces us. That is a bit of a stretch. Prediction is useful when a task is repetitive, but inaccuracies in generated information are an integral part of LLMs as long as they rely on this prediction model without a deeper understanding of the words themselves.

Deeper understanding is a term that's tricky to define.

Moving Past Understanding

AI has been around for a long time.

Initially, AI was seen as the ultimate intelligence. This was what Isaac Asimov wrote about, and the antagonist of many, many movies.

The things that represent higher levels of intelligence for humans are not the same for computers. Consider chess, for example. Computers could take the game, enumerate possible moves using heuristics, and evaluate positions to some depth to decide what move to make. By 1997, when Deep Blue beat Garry Kasparov, it could be said that we had figured out how to beat humans at chess.
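For the curious, the core of that approach is just depth-limited search with a heuristic. Here's a generic minimax sketch; the `evaluate`, `legal_moves`, and `apply_move` callables are placeholders for a real game implementation, and Deep Blue itself used far more sophisticated, specialized search (alpha-beta pruning, custom hardware):

```python
def minimax(state, depth, maximizing, evaluate, legal_moves, apply_move):
    """Depth-limited minimax: explore moves, score leaves with a heuristic."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state), None  # heuristic score at the search horizon

    best_move = None
    if maximizing:
        best_score = float("-inf")
        for move in moves:
            score, _ = minimax(apply_move(state, move), depth - 1, False,
                               evaluate, legal_moves, apply_move)
            if score > best_score:
                best_score, best_move = score, move
    else:
        best_score = float("inf")
        for move in moves:
            score, _ = minimax(apply_move(state, move), depth - 1, True,
                               evaluate, legal_moves, apply_move)
            if score < best_score:
                best_score, best_move = score, move
    return best_score, best_move
```

None of this requires "intelligence" in the human sense; it's exhaustive-ish search plus a scoring function.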


That didn't generalize to simpler tasks, however. Making a computer that can pick something up, walk, and drop it somewhere else is a far more challenging problem. Yet, for a human, it's trivial. This is Moravec's Paradox.

The idea of mimicking intelligence, instead, has been widely embraced. Self-driving systems effectively try to take in a wide enough dataset of driving conditions, predict what to do, and then take that action. These systems, however, are not very generalizable: the self-driving model can't write your essay or hold a conversation with you while it drives, for example.

Artificial General Intelligence is the concept of an agent that can do any intellectual task, perhaps showing true intelligence. It may be possible, it may not be, but for now, much of the hype surrounds the applicability and use of models like ChatGPT.

Predictions and Implications

There exists a point where the models become good enough. Past that point, it doesn't matter whether GPT-6 or GPT-7 is being used; what matters is the prompt and the data being passed into the model for it to contextualize. It's similar to how the neural net architecture matters less than the data being fed in - not just in sheer volume, but in quality. Andrew Ng calls this "data-centric" AI.
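As a toy illustration of that prompt-plus-data idea (the `build_prompt` helper below is hypothetical, not any real API), grounding a model with passed-in context looks roughly like this:

```python
def build_prompt(question, context_docs):
    """Inline retrieved data so the model can contextualize it."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

# The resulting string would be sent to whatever chat-completion API you use.
print(build_prompt(
    "What did Q3 revenue look like?",
    ["Q3 revenue was $4.2M, up 12% quarter over quarter."],
))
```

The heavy lifting shifts from which model you call to what you feed it.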

Sam Altman, CEO of OpenAI, has said that they aren't focusing on training GPT-5, but rather on expanding GPT-4.

Databricks recently released a blog post detailing something they call "Dolly" (what the title of this post references), with the source code on GitHub. ChatGPT is proprietary, and a lot of parameters go into training it. The idea behind Dolly is to make the process open source for everyone while improving the underlying training data of older models.

Even more recently, Dolly 2.0 was released, trained on a high-quality, human-generated dataset written by Databricks employees that has been completely open-sourced for commercial use.

These indicate two key things:

  1. Data Quality > # of Parameters
  2. Open Source

Two things describe a given model: the architecture and the parameters. The parameters are usually determined by some form of training, which is where the data comes into play. Usually, increasing the number of parameters in the underlying architecture improves the ability to "learn", at the risk of overfitting to specific cases in the data - but improving the quality of the data looks to change that.
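Here's a toy demonstration of that parameters-vs-overfitting trade-off, using polynomial curve fitting as a stand-in for model capacity. It's not an LLM, but it's the same dynamic:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 12)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # 12 noisy samples

x_test = np.linspace(0, 1, 100)
y_true = np.sin(2 * np.pi * x_test)

for degree in (3, 11):  # 4 parameters vs. 12 parameters
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x_test)
    rmse = np.sqrt(np.mean((pred - y_true) ** 2))
    print(f"degree {degree:2d}: test RMSE {rmse:.3f}")
```

The 12-parameter fit passes through every noisy training point but does worse on unseen data. Better data, not just more parameters, is what actually helps.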

A key implication: lower compute requirements. Fewer parameters mean companies can train models themselves and run them more easily. The market is better off if every company can train its own models, forcing the larger compute providers to remain competitive as opposed to becoming a few monopolies.

Databricks as a company is huge on open source, so it makes sense that they're the ones pushing out Dolly. More generally, open source has been key to much of AI research. Google's 2017 paper on transformers, "Attention Is All You Need", for example, pushed the whole of NLP forward in major ways.

Anything put into ChatGPT goes through OpenAI's servers; for many companies working on proprietary things, it is not a great idea to send that information to OpenAI.

More importantly, there looks to be an ethical responsibility here to open-source technologies that have the broad impact that LLMs do.

Found this interesting? Subscribe to get email updates for new posts.
