In the former articles, we defined what generalization error is in the context of machine learning and saw how it can be bounded through various inequalities. We also defined overfitting and saw how it can be remedied by using a validation set. The dilemma of choosing a size for the validation set could be avoided by using cross-validation, which turned out to be an unbiased estimator of E_out(N − 1). In this article, we will give some practical examples of which inequalities to use in the case of a validation set and in the case of cross-validation.
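To make the cross-validation estimator concrete, here is a minimal sketch of leave-one-out cross-validation in Python. The synthetic dataset and the least-squares linear model are purely illustrative assumptions, not part of the series; the point is only that the average of the N single-point errors, E_cv, estimates E_out(N − 1) without bias.

```python
import numpy as np

# A minimal sketch of leave-one-out cross-validation (LOOCV).
# The dataset and the linear model below are illustrative assumptions.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(20, 1))            # N = 20 sample points
y = 2 * X[:, 0] + 0.3 * rng.normal(size=20)     # noisy linear target

def fit_linear(X_train, y_train):
    # Least-squares fit with a bias term.
    A = np.hstack([np.ones((len(X_train), 1)), X_train])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return w

def predict(w, X_new):
    A = np.hstack([np.ones((len(X_new), 1)), X_new])
    return A @ w

errors = []
for n in range(len(X)):
    # Train on all points except the n-th, then validate on the held-out point.
    mask = np.arange(len(X)) != n
    w = fit_linear(X[mask], y[mask])
    e_n = (predict(w, X[n:n + 1])[0] - y[n]) ** 2    # squared error on point n
    errors.append(e_n)

# The cross-validation error: averaged over datasets, it equals E_out(N - 1),
# i.e. the expected out-of-sample error of a model trained on N - 1 points.
E_cv = np.mean(errors)
print(f"E_cv = {E_cv:.4f}")
```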
Example — Validation Set
Imagine that we have a dataset…
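As a hedged illustration of how such an example typically proceeds, the following Python sketch splits an assumed synthetic dataset into a training part and a validation part of size K, selects among M candidate hypotheses by their validation error, and bounds the out-of-sample error of the chosen hypothesis with Hoeffding's Inequality combined with a union bound over the M candidates. The data, the three fixed hypotheses, and δ = 0.05 are assumptions made only for this sketch.

```python
import numpy as np

# A hedged sketch of model selection with a validation set.
# The dataset, the three candidate hypotheses and delta are illustrative assumptions.
rng = np.random.default_rng(1)
N, K = 1000, 200                         # total points and validation-set size (assumed)
X = rng.uniform(-1, 1, size=(N, 2))
y = np.sign(X[:, 0] + X[:, 1])           # a simple target function for illustration

X_train, y_train = X[:-K], y[:-K]        # D_train: would be used to train each candidate
X_val, y_val = X[-K:], y[-K:]            # D_val:   K points held out for selection

# Suppose we must choose between M = 3 fixed hypotheses (decision stumps on feature 0);
# they are fixed here for simplicity, so no training step is shown.
thresholds = [-0.5, 0.0, 0.5]
hypotheses = [lambda X, t=t: np.sign(X[:, 0] - t) for t in thresholds]

# Pick the hypothesis with the smallest validation error.
val_errors = [np.mean(h(X_val) != y_val) for h in hypotheses]
best = int(np.argmin(val_errors))

# Hoeffding with a union bound over the M candidates: with probability >= 1 - delta,
#     E_out <= E_val + sqrt(ln(2M / delta) / (2K)).
M, delta = len(hypotheses), 0.05
epsilon = np.sqrt(np.log(2 * M / delta) / (2 * K))
print(f"chosen threshold: {thresholds[best]}")
print(f"E_val = {val_errors[best]:.3f}  =>  E_out <= {val_errors[best] + epsilon:.3f} "
      f"with probability at least {1 - delta}")
```

Note how the bound tightens as the validation set grows (larger K) but loosens as more hypotheses compete on it (larger M), which is exactly the size dilemma mentioned in the introduction.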