bricks. flickr photo by eaortmann shared under a Creative Commons (BY-NC-ND) license

When the outcome, or class, is numeric, and all the attributes are numeric, linear regression is a natural technique to consider. This is a staple method in statistics. The idea is to express the class as a linear combination of the attributes, with predetermined weights:

Linear regression is an excellent, simple method for numeric prediction, and it has been widely used in statistical applications for decades. Of course, linear models suffer from the disadvantage of, well, linearity. If the data exhibits a nonlinear dependency, the best-fitting straight line will be found, where “best” is interpreted as the least mean-squared difference.

### Linear classification: Logistic regression

we can use any regression technique, whether linear or nonlinear, for classification. The trick is to perform a regression for each class, setting the output equal to one for training instances that belong to the class and zero for those that do not. The result is a linear expression for the class. Then, given a test example of unknown class, calculate the value of each linear expression and choose the one that is largest. This method is sometimes

called multiresponse linear regression.

One way of looking at **multiresponse linear regression** is to imagine that it approximates a numeric membership function for each class. The membership function is 1 for instances that belong to that class and 0 for other instances. Given a new instance we calculate its membership for each class and select the biggest.

### Drawbacks

- First, the membership values it produces are not proper probabilities because they can fall outside the range 0 to 1.
- Second, leastsquares regression assumes that the errors are not only statistically independent, but are also normally distributed with the same standard deviation, an assumption that is blatantly violated when the method is applied to classification problems because the observations only ever take on the values 0 and 1.

A related statistical technique called logistic regression does not suffer from these problems. Instead of approximating the 0 and 1 values directly, thereby risking illegitimate probability values when the target is overshot, logistic regression builds a linear model based on a transformed target variable.

### Linear classification using the perceptron

to learn a hyperplane that separates the instances pertaining to the different classes let’s assume that there are only two of them. If the data can be separated perfectly into two groups using a hyperplane, it is said to be linearly separable. It turns out that if the data is linearly separable, there is a very simple algorithm for finding a separating hyperplane.

The algorithm is called the perceptron learning rule.

Here, a1, a2, . . ., ak are the attribute values, and w0, w1, . . ., wk are the weights that define the hyperplane. We will assume that each training instance a1, a2, . . . is extended by an additional attribute a0 that always has the value 1 This extension, which is called the bias, just means that we don’t have to include an additional constant element in the sum.

Of course, if the data is not linearly separable, the algorithm will not terminate, so an upper bound needs to be imposed on the number of iterations when this method is applied in practice.

### Linear classification using Winnow

Winnow only updates the weight vector when a misclassified instance is encountered

it is mistake driven.

The two methods differ in how the weights are updated. The perceptron rule employs an additive mechanism that alters the weight vector by adding (or subtracting) the instance’s attribute vector. Winnow employs multiplicative updates and alters weights individually by multiplying them by the user-specified parameter a (or its inverse).

# Bibliography

Ian H. Witten, Eibe Frank. (1999). *Data mining practical machine learning tools and techniques.* Elsevier.