The standard training task for LLMs is **forecasting the next word** in a sequence. The model's prediction is compared to the actual word in the text, and its parameters are adjusted until it generates accurate continuations. This objective is known as **next-token prediction**. A closely related objective, **masked-language modelling**, hides words anywhere in the sequence and asks the model to recover them from the surrounding context. ![[Pasted image 20230219144444.png]] The model above is simply filling in the blanks with the most probable word given the surrounding words. Next-token prediction and masked-language modelling are both common training tasks for [[Recurrent Neural Networks - RNNs]]. But this approach has serious limitations; a sketch of how the two objectives build training examples follows below.
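A minimal sketch of how training examples are constructed for the two objectives, assuming word-level tokens and an illustrative masking rate (the `[MASK]` symbol and the 30% rate are assumptions for clarity; BERT-style models typically mask around 15% of tokens):

```python
import random

# Toy token sequence (already tokenised into word-level tokens for simplicity).
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# --- Next-token prediction (causal / GPT-style training) ---
# Each training example is: "given the prefix, predict the following token".
next_token_pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for context, target in next_token_pairs:
    print(f"context={context} -> target={target!r}")

# --- Masked-language modelling (BERT-style training) ---
# A fraction of tokens is replaced with a [MASK] symbol; the model must
# recover the originals from the surrounding (bidirectional) context.
MASK_PROB = 0.3  # assumed masking rate for illustration only
random.seed(0)
masked, labels = [], {}
for i, tok in enumerate(tokens):
    if random.random() < MASK_PROB:
        masked.append("[MASK]")
        labels[i] = tok  # the label the model is trained to predict
    else:
        masked.append(tok)

print("masked input:", masked)
print("labels      :", labels)
```

The key design difference: next-token prediction only conditions on the words to the left, while masked-language modelling can use context on both sides of the blank.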