Instance-based learning

 
house flickr photo by barnyz shared under a Creative Commons (BY-NC-ND) license

In instance-based learning the training examples are stored verbatim, and a distance function is used to determine which member of the training set is closest to an unknown test instance. Once the nearest training instance has been located, its class is predicted for the test instance.

Distance function

Although there are other possible choices, most instance-based learners use Euclidean distance.

5

When comparing distances it is not necessary to perform the square root operation; the sums of squares can be compared directly.

Different attributes are measured on different scales, so if the Euclidean distance formula were used directly, the effects of some attributes might be completely dwarfed by others that had larger scales of measurement. Consequently, it is usual to normalize all attribute values to lie between 0 and 1, by calculating

6

Nearest-neighbor instance-based learning is simple and often works very well. In the method described previously each attribute has exactly the same influence on the decision, just as it does in the Naïve Bayes method. Another problem is that the database can easily become corrupted by noisy exemplars.

 

Bibliography

Ian H. Witten, Eibe Frank. (1999). Data mining practical machine learning tools and techniques. Elsevier

Advertisements

Author: enroblog

Computer science student at ITESM

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s