Apparently some geek managed to get hold of GPT-4's internal details and stick them on the internet! Bit of an issue, given the eye-watering cost of training that thing. Some of these numbers, scratch that, ALL of these numbers are mind-boggling:
- The total GPT-4 model has around 1.8 trillion parameters with 120 layers
- It's made of 16 different expert models, each with ~110b parameters and trained for a specific task/field (a technique known as MoE or Mixture of Experts; there's a rough sketch of the idea just below the list)
- ~55b parameters are used solely for 'attention', i.e. the mechanism that lets the model work out which earlier parts of the text matter when predicting the next word
- They believe GPT-4 was trained on ~13 trillion tokens (~10 trillion words)
- Some of those tokens were re-used (text was seen roughly twice, code roughly four times), so the unique dataset is smaller than 13 trillion; its exact size is unknown but probably several trillion tokens
- They fine-tuned the model using more specific data from OpenAI and ScaleAI
- It took ~2e25 FLOPs (total floating-point operations) of compute to train, using ~25k A100 GPUs for around 3 months
- GPT-4 is about 3x as computationally expensive to run as GPT-3.5
- It's thought a faster/cheaper model takes over after the first few words of a response, which could explain the complaints that ChatGPT got worse over time
Copied and pasted from a link via SemiAnalysis or Yam Peleg (for anyone wanting sources).
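For anyone wondering what "Mixture of Experts" actually means in practice, here's a rough sketch in plain Python/NumPy. It's not OpenAI's code or architecture, obviously; it just illustrates the idea that a learned "router" sends each token to only a couple of the 16 experts, so only a fraction of the 1.8 trillion parameters do any work on a given token. The tiny sizes and the top-2 routing are my assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model = 8        # toy hidden size (GPT-4's is obviously far larger)
n_experts = 16     # matches the leaked figure of 16 experts
top_k = 2          # assumption: only a couple of experts are active per token

# Each tiny feed-forward "expert" is just two weight matrices
experts = [
    (rng.standard_normal((d_model, 4 * d_model)) * 0.1,
     rng.standard_normal((4 * d_model, d_model)) * 0.1)
    for _ in range(n_experts)
]

# The learned router is just another matrix: token vector -> one score per expert
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_layer(x):
    """Route one token vector x through its top_k experts and mix the results."""
    scores = x @ router                            # one score per expert
    chosen = np.argsort(scores)[-top_k:]           # indices of the best-scoring experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                       # softmax over the chosen experts only
    out = np.zeros_like(x)
    for w, idx in zip(weights, chosen):
        w1, w2 = experts[idx]
        out += w * (np.maximum(x @ w1, 0) @ w2)    # ReLU feed-forward expert
    return out

token = rng.standard_normal(d_model)
print(moe_layer(token))   # only 2 of the 16 experts did any work for this token
```

The point is the routing: per token, the compute scales with roughly two experts' worth of parameters rather than all sixteen, which is how a 1.8-trillion-parameter model stays (just about) affordable to run.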
That means they used twenty-five thousand high-end Nvidia GPUs, each costing about $25k (!), to do the training. These companies tend to rent compute rather than buy the GPUs outright, but that's still tens of millions of dollars' worth of training run that just got nicked and shared!
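Those numbers do at least hang together. As a sanity check (my own back-of-envelope arithmetic, not from the leak): an A100 peaks at about 312 TFLOP/s for this kind of work, big training runs typically achieve maybe a third of that, and only the attention parameters plus ~2 experts are active per token. The utilisation figure and the "two active experts" bit are my assumptions.

```python
# Back-of-envelope check that the leaked figures are self-consistent.
# Assumptions (mine, not from the leak): ~312 TFLOP/s peak per A100 (BF16, dense),
# ~33% utilisation, ~2 active experts per token, ~90 days of training.

A100_PEAK_FLOPS = 312e12        # floating-point operations per second, peak
UTILISATION = 0.33              # assumed fraction of peak actually achieved
GPUS = 25_000
SECONDS = 90 * 24 * 3600        # ~3 months

hardware_flops = GPUS * SECONDS * A100_PEAK_FLOPS * UTILISATION
print(f"Compute available: {hardware_flops:.1e} FLOPs")   # ~2e25

# Cross-check against the classic ~6 * parameters * tokens rule of thumb for
# training cost, counting only the parameters active per token
# (attention + ~2 of the 16 experts).
active_params = 55e9 + 2 * 110e9    # ~275 billion
tokens = 13e12
training_flops = 6 * active_params * tokens
print(f"Compute needed:    {training_flops:.1e} FLOPs")   # ~2e25
```

Both routes land on roughly the same 2e25, which is at least a sign the leaked figures weren't plucked out of thin air.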
The page for the data source has since been deleted, but once it's out there, it's out there...
...and I remember back in ye olden days when millions of parameters were still a thing. Right back in the distant past, not long after I started this thread. Yeah. That's exponential growth for you. Forget Moore's law, these things are accelerating like a dragster on methanol.
So in the past few months (say!) while GPT or AI generally became a billion times "smarter", how much smarter did humans get?
Looks like by the end of this year we'll be talking about petaflops and yottabytes like anyone really comprehends the scale of such vast numbers. I don't know what any of it really means in practice, but for a rough sense of scale: the human brain has about 100 billion neurons.
Those 1.8 trillion "parameters" are the numbers (weights) sitting inside the "layers", and stacking layers on top of each other is what gives you a "multi-layer perceptron".
...and that diagram effectively represents a single neuron in a software neural net. So the numbers can't be compared like-for-like, and in any meaningful sense a neural net is not a brain. It is loosely structured like one, though...
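To make the neurons-vs-parameters distinction concrete, here's a toy count (my own illustration, nothing to do with GPT-4's actual layout): in a fully connected layer the "neurons" are the outputs, while the "parameters" are the connection weights plus biases, so parameters hugely outnumber neurons.

```python
# Toy multi-layer perceptron: count neurons vs parameters.
# Layer sizes are made up for illustration; nothing here reflects GPT-4's real shape.
layer_sizes = [1000, 4000, 4000, 1000]   # input -> hidden -> hidden -> output

neurons = sum(layer_sizes[1:])           # each layer's outputs are its "neurons"
parameters = sum(
    n_in * n_out + n_out                 # a weight per connection, plus a bias per neuron
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
)

print(f"Neurons:    {neurons:,}")        # 9,000
print(f"Parameters: {parameters:,}")     # 24,009,000
```

So a "parameter" is closer to a synapse than a neuron, and even that analogy is loose; brains don't do clean matrix multiplies.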