The standard training objective for LLMs is to **forecast the next word** in a sequence of words.
The model's prediction is compared against the actual word in the text, and its parameters are adjusted until it makes accurate predictions. The two common variants of this task are **next-token prediction** (predict the word that follows) and **masked-language modelling** (fill in a word that has been hidden).
![[Pasted image 20230219144444.png]]
The model above is simply filling in the blank with the most probable word given the surrounding words (a rough sketch of both objectives follows below). Next-token prediction and masked-language modelling are also common training tasks for [[Recurrent Neural Networks - RNNs]], but this approach has serious limitations.
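As a minimal sketch of how these two objectives are usually set up as losses, here is a toy PyTorch example. The tiny vocabulary, the stand-in model (which, unlike a real RNN or Transformer, ignores surrounding context), and the mask token id are all illustrative assumptions, not the actual training code of any LLM.

```python
# Toy sketch of next-token prediction and masked-language-modelling losses.
# Assumptions: a 10-word vocabulary, a per-token embedding+linear stand-in
# model, and token id 0 used as the [MASK] token.
import torch
import torch.nn as nn

vocab_size, embed_dim = 10, 16
toy_model = nn.Sequential(             # stand-in for an RNN / Transformer
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),  # logits over the vocabulary
)

tokens = torch.tensor([[1, 4, 7, 2, 5]])          # one toy sequence

# Next-token prediction: for each position, the target is the token
# that follows it in the text (labels are the input shifted by one).
logits = toy_model(tokens[:, :-1])                # predict from each prefix token
targets = tokens[:, 1:]                           # shifted-by-one labels
ntp_loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)

# Masked-language modelling: hide a token and score the model's ability
# to recover it. (A real model would condition on the surrounding words;
# this stand-in only sees the mask embedding itself.)
MASK_ID = 0                                       # assumed mask token id
masked = tokens.clone()
masked[0, 2] = MASK_ID                            # mask the third token
mlm_logits = toy_model(masked)
mlm_loss = nn.functional.cross_entropy(
    mlm_logits[0, 2].unsqueeze(0), tokens[0, 2].unsqueeze(0)
)

print(ntp_loss.item(), mlm_loss.item())
```

In both cases the loss is just cross-entropy between the predicted distribution over the vocabulary and the word that actually appears in the text, which is what "comparing the prediction to the actual words" amounts to in practice.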