Improving K-NN Algorithm Efficiency with Cross-Validation

Practical guide Format .docx

Themes

K-NN algorithm, cross-validation, hold-out validation, model performance, robustness, data rebalancing, folds, neighbor selection, category representation

Abstract

This document discusses the importance of choosing the right value of k in K-NN algorithms and how cross-validation can be used to improve the model's performance and robustness.

Choice of k
Rebalancing

Extract

[...] If, across my 3 folds, the percentages of correct predictions vary widely from one fold to the next, then the performance of my model depends heavily on the training dataset, and it would be necessary to investigate further, change the model, or change the value of k. The ideal is 100% at every iteration. Suppose now that we had simply opted for hold-out validation, that is, a single iteration, and that by chance our score on it was very high: we would think that we have a very good model, when that is not the case. One can then repeat this cross-validation for different values of k and observe which value gives the best performance. [...]
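
As an illustration of comparing several values of k over 3 folds, here is a minimal sketch in plain Python. The data set, the helper names (knn_predict, cross_val_scores) and the interleaved way of building the folds are assumptions made for this example; they do not come from the guide.

```python
from collections import Counter
import math


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_scores(data, k, n_folds=3):
    """Return the fraction of correct predictions for each fold."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th sample forms the test fold; the rest is used for training.
        test = data[fold::n_folds]
        train = [item for i, item in enumerate(data) if i % n_folds != fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return scores


# Hypothetical toy data: (features, category) pairs, not taken from the guide.
data = [
    ((1.0, 1.2), "A"), ((5.0, 5.1), "B"), ((1.1, 0.9), "A"),
    ((5.2, 4.8), "B"), ((0.8, 1.0), "A"), ((4.9, 5.3), "B"),
    ((1.3, 1.1), "A"), ((5.1, 4.9), "B"), ((3.0, 3.2), "A"),
]

# One list of per-fold scores for each candidate value of k.
results = {k: cross_val_scores(data, k) for k in (1, 3, 5)}
for k, scores in results.items():
    mean = sum(scores) / len(scores)
    print(f"k={k}: per-fold accuracy={scores}, mean={mean:.2f}")
```

Looking at the per-fold scores, and not only at their mean, is what reveals a model whose performance depends too much on the training data.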

[...] How can a K-NN algorithm be made more efficient through cross-validation? The choice of k is crucial to obtaining relevant results. It has been observed that the category assigned depends largely on the number of neighbors chosen. With a k that is too small, the magnifying effect is too strong, leaving too much to the randomness of how the elements are distributed. When k takes its maximum value, the result depends on the number of elements present in each category. [...]
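
The two extremes can be seen on a small example. The sketch below assumes a simple unweighted majority vote and a hypothetical data set in which category "A" has 5 elements and category "B" has 3, one of which is an outlier placed inside the "A" cluster; none of these values come from the guide.

```python
from collections import Counter
import math


def knn_predict(train, query, k):
    """Unweighted majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical training set: 5 elements in "A", 3 in "B".
train = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.3), "A"),
    ((1.4, 1.1), "A"), ((0.7, 0.9), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.9), "B"),
    ((1.1, 1.1), "B"),  # outlier sitting inside the "A" cluster
]

near_outlier = (1.1, 1.05)   # query located in the "A" cluster, next to the outlier
in_b_cluster = (5.1, 5.0)    # query located in the "B" cluster

# k too small: a single neighbor decides, so the outlier wins.
small_k = knn_predict(train, near_outlier, 1)

# Moderate k: the surrounding "A" points outvote the outlier.
medium_k = knn_predict(train, near_outlier, 3)

# k at its maximum: every element votes, so the largest category always wins,
# whatever the position of the query.
max_k = knn_predict(train, in_b_cluster, len(train))

print(small_k, medium_k, max_k)
```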

[...] We check whether the model finds the correct category of the submitted profile. When we proceed in this way, we evaluate the model only once, on a single test data set. However, the distribution of the samples may be unfavorable: for example, a category may not be represented in the test data set. We then obtain an estimate of the model's performance that does not reflect reality. Cross-validation allows us to evaluate the model several times and to use every sample of the data set for testing. [...]
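
The difference between a single hold-out split and cross-validation can be sketched as follows. The labels, the choice of the last third as hold-out test set, and the helper k_fold_splits are assumptions made for this example, with category "C" deliberately rare.

```python
def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    for fold in range(n_folds):
        test = indices[fold::n_folds]
        train = [i for i in indices if i % n_folds != fold]
        yield train, test


# Hypothetical labels: category "C" appears only once.
labels = ["A", "B", "A", "B", "C", "A", "B", "A", "B"]

# Hold-out: a single split, here the last third of the data as test set.
holdout_test = labels[6:]
print("hold-out test categories:", sorted(set(holdout_test)))

# Cross-validation: each sample ends up in exactly one test fold.
tested = []
for train, test in k_fold_splits(len(labels), 3):
    tested.extend(test)
    print("fold test categories:", sorted({labels[i] for i in test}))

every_sample_tested_once = sorted(tested) == list(range(len(labels)))
print("every sample tested exactly once:", every_sample_tested_once)
```

With this particular hold-out split, category "C" never appears in the test set, whereas the 3 folds together test every sample, including the single "C".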
