> In a hyperparameter space, points in the same region tend to return similar performance.
Bayesian Optimization exploits this by sampling regions of better-performing points with higher likelihood.
Tree of Parzen Estimators (TPE) is one of the algorithms that takes a Bayesian Optimization approach.
![[Pasted image 20210917131941.png]]
![[Pasted image 20210917132043.png]]
TPE is shown to find better hyperparameter configurations than random search.
Gray dots in the diagram above mark the lowest error among n trials of random search, while green dots mark the lowest error reached by TPE over the corresponding n trials. The paper's experiments thus showed that TPE generally discovers hyperparameter configurations with lower validation error than random search.
Source: [Bergstra et al. (2013)](http://proceedings.mlr.press/v28/bergstra13.pdf)
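The core TPE idea can be sketched in 1-D: split past observations into "good" and "bad" sets at a loss quantile, fit a Parzen (kernel density) estimator to each, and propose the candidate that maximizes the density ratio l(x)/g(x). This is a toy illustration under assumed constants (quantile γ = 0.25, a made-up quadratic objective), not the paper's implementation:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

def objective(x):
    # Toy 1-D objective to minimize (minimum at x = 2).
    return (x - 2.0) ** 2

# Warm-up phase: random samples of the hyperparameter.
xs = list(rng.uniform(-5, 5, size=20))
ys = [objective(x) for x in xs]

for _ in range(30):
    # Split observations at the gamma-quantile of the observed losses.
    gamma = 0.25
    thresh = np.quantile(ys, gamma)
    good = np.array([x for x, y in zip(xs, ys) if y <= thresh])
    bad = np.array([x for x, y in zip(xs, ys) if y > thresh])

    # Parzen (kernel density) estimators for each group.
    l_good = gaussian_kde(good)  # density over well-performing points
    g_bad = gaussian_kde(bad)    # density over poorly-performing points

    # Sample candidates from l(x) and keep the one maximizing l(x)/g(x);
    # under TPE this is equivalent to maximizing Expected Improvement.
    cands = l_good.resample(64, seed=rng).ravel()
    x_next = cands[np.argmax(l_good(cands) / g_bad(cands))]

    xs.append(x_next)
    ys.append(objective(x_next))

best_x = xs[int(np.argmin(ys))]
print(best_x)  # best hyperparameter found; should be near 2.0
```

Note how new candidates are drawn from the "good" density itself, so sampling concentrates in better-performing regions, matching the intuition above.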
----
![[Pasted image 20230521194224.png]]
Bayesian Optimization uses a **surrogate model** to approximate the objective function. A popular surrogate is [[Gaussian Processes]]: the GP posterior is cheap to evaluate and is used to propose points in the search space where sampling is likely to yield improvement. It also uses an **acquisition function** that directs sampling to areas where an improvement over the current best observation is likely.
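A minimal sketch of this loop, assuming scikit-learn's `GaussianProcessRegressor` as the surrogate and Expected Improvement as the acquisition function (the objective and search bounds are made up for illustration):

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Toy objective to minimize on [-2, 2].
    return np.sin(3 * x) + x ** 2 - 0.7 * x

rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(5, 1))  # small initial random design
y = objective(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

def expected_improvement(Xc, gp, y_best, xi=0.01):
    # EI(x) = E[max(y_best - f(x) - xi, 0)] under the GP posterior.
    mu, sigma = gp.predict(Xc, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu - xi) / sigma
    return (y_best - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

for _ in range(20):
    # Fit the surrogate to all observations so far (cheap posterior).
    gp.fit(X, y)
    # Maximize the acquisition function over a candidate grid.
    cands = np.linspace(-2, 2, 400).reshape(-1, 1)
    ei = expected_improvement(cands, gp, y.min())
    x_next = cands[np.argmax(ei)]
    # Evaluate the (expensive) objective only at the chosen point.
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next))

best_x = float(X[np.argmin(y), 0])
print(best_x, float(y.min()))
```

Only the true objective is expensive; the surrogate's posterior mean and standard deviation are evaluated hundreds of times per iteration, which is exactly the division of labor described above.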