K-NN algorithm, cross-validation, hold-out validation, model performance, robustness, data rebalancing, folds, neighbor selection, category representation
This document discusses the importance of choosing the right value of k in K-NN algorithms and how cross-validation can be used to improve the model's performance and robustness.
[...] If, on my 3 folds, the percentages of correct predictions are [...], then the performance of my model depends heavily on the training data set, and it would be necessary to investigate further, change the model, or change the value of k. The goal is to reach 100% on each iteration. Suppose now that we had simply opted for hold-out validation, that is, a single iteration. If, by chance, our performance was [...], we would think that we have a very good model, but that is not the case. One can then repeat this cross-validation for different values of k and observe which value gives the best results. [...]
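The procedure described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the toy data set (two Gaussian clusters labelled "A" and "B"), the helper names `knn_predict` and `cross_val_scores`, the Euclidean distance, and the candidate values of k are all hypothetical choices, not taken from the original document.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    # train is a list of (features, label); squared distance is enough for ranking
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_scores(data, k, n_folds=3, seed=0):
    """Return the fraction of correct predictions obtained on each fold."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    # Fold i takes every n_folds-th sample, so each sample is tested exactly once
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


# Hypothetical toy data: two clusters of 30 points each
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "A") for _ in range(30)]
data += [((rng.gauss(5, 1), rng.gauss(5, 1)), "B") for _ in range(30)]

# Run the same 3-fold cross-validation for several candidate values of k
results = {k: cross_val_scores(data, k) for k in (1, 3, 5, 7)}
for k, scores in results.items():
    mean = sum(scores) / len(scores)
    print(k, [round(s, 2) for s in scores], "mean:", round(mean, 2))

# Keep the k with the best mean score over the folds
best_k = max(results, key=lambda k: sum(results[k]) / len(results[k]))
```

Looking at the per-fold scores, and not only at their mean, is what reveals whether performance depends heavily on which samples ended up in the training set.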
[...] How to make a K-NN algorithm more efficient by cross-validation? The choice of k is crucial to obtain relevant results. It has been observed that the determination of a category depends largely on the choice of the number of neighbors. With a k that is too small, the magnifying effect is too strong, leaving too much to the randomness of the distribution of the elements. When k takes its maximum value, the result depends on the number of elements present in each category. [...]
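The two extremes can be seen on a tiny example. The sketch below assumes a hypothetical one-dimensional data set with 7 points of category "A" (one of them isolated in the middle of category "B") and 3 points of category "B"; the data, the query value and the helper `knn_predict` are illustrative assumptions only.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query` (1-D data)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical data: six "A" points on the left, three "B" points on the right,
# and one isolated "A" point sitting inside the "B" region
train = [
    (1.0, "A"), (1.5, "A"), (2.0, "A"), (2.5, "A"), (3.0, "A"), (3.5, "A"),
    (8.0, "B"), (8.5, "B"), (9.0, "B"),
    (8.6, "A"),  # isolated point
]
query = 8.7  # clearly inside the "B" region

predictions = {k: knn_predict(train, query, k) for k in (1, 3, len(train))}
print(predictions)  # {1: 'A', 3: 'B', 10: 'A'}
```

With k = 1 the answer is decided by the single isolated neighbor. With k equal to the size of the data set, every point votes, so the larger category wins whatever the query is. An intermediate k gives the answer one would expect here.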
[...] We check whether the model finds the correct category of the submitted profile. When we proceed in this way, we evaluate the model only once, on a single test data set. However, the distribution of the samples may be unfavorable: for example, a category may not be represented in the test data set. We then obtain an estimate of the model's performance that does not reflect reality. Cross-validation allows us to evaluate the model several times, so that every sample of the data set is used for testing. [...]
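The difference in coverage can be checked on labels alone. In this sketch the 15 labels, the deliberately unfavorable hold-out split (the first 5 samples) and the interleaved 3-fold split are all hypothetical choices made for illustration.

```python
# Hypothetical data set: 15 samples in 3 categories, stored in sorted order
labels = ["A"] * 6 + ["B"] * 6 + ["C"] * 3

# Unfavorable hold-out: the single test set happens to be the first 5 samples
holdout_test = labels[:5]
missing = set(labels) - set(holdout_test)  # categories never evaluated
print("hold-out never tests:", sorted(missing))

# 3-fold cross-validation: fold i holds the indices i, i + 3, i + 6, ...
n_folds = 3
folds = [list(range(i, len(labels), n_folds)) for i in range(n_folds)]

# Over the 3 iterations, every index appears in exactly one test fold
tested = sorted(idx for fold in folds for idx in fold)
categories_tested = {labels[idx] for idx in tested}
print("cross-validation tests:", sorted(categories_tested))
```

Here the hold-out estimate says nothing about categories "B" and "C", while the three folds together cover all 15 samples and all three categories.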