Posts

Showing posts with the label Data Mining & Machine Learning

K-Fold Cross-Validation for Reducing Overfitting in Classifiers

In k-fold cross-validation, the original sample is randomly partitioned into k equal-sized subsamples. Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times (the folds), with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (or otherwise combined) to produce a single estimate. The advantage of this method over repeated random sub-sampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. The disadvantage is that the training algorithm has to be re-run from scratch k times, so an evaluation takes k times as much computation. The error of the classifier is the average test error across the k validation folds.
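The procedure described above can be sketched in a few lines of plain Python. This is only a minimal illustration, not a reference implementation: the helper names (`k_fold_splits`, `cross_val_error`), the toy midpoint-threshold classifier, and the sample data are all assumptions made up for this example.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs, one pair per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random partition of the sample
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


def cross_val_error(X, y, k, fit, predict, seed=0):
    """Average the validation error rate over the k folds."""
    errors = []
    for train_idx, val_idx in k_fold_splits(len(X), k, seed):
        # The model is re-trained from scratch for every fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        wrong = sum(predict(model, X[i]) != y[i] for i in val_idx)
        errors.append(wrong / len(val_idx))
    return sum(errors) / k


# Hypothetical toy classifier: a threshold halfway between the two class means.
# It assumes both classes appear in the training data and class 1 has larger values.
def fit_threshold(X, y):
    mean0 = sum(x for x, label in zip(X, y) if label == 0) / y.count(0)
    mean1 = sum(x for x, label in zip(X, y) if label == 1) / y.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x > threshold)


# Made-up, well-separated one-dimensional data: 10 points per class.
X = [i / 10 for i in range(10)] + [2 + i / 10 for i in range(10)]
y = [0] * 10 + [1] * 10

error = cross_val_error(X, y, k=5, fit=fit_threshold, predict=predict_threshold)
print(error)
```

Note that `fit` is called k times, which is where the extra computation comes from. Averaging per-fold error rates is one way to combine the fold results; pooling all validation predictions before computing the error is another.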

Regression and Classification

Image
Regression and classification are two major areas of data mining. Today I heard a question: what is regression, what is classification, and under which conditions should I use each? The image says more than I could write.

Session 01: Data Mining and Machine Learning - Introduction

Introduction

Data is the most important thing in any field, whether the field is for-profit or non-profit. The amount of data in the world, and in our lives, seems to go on increasing. It has been estimated that the amount of data stored in the world's databases doubles every 20 months, although it is hard to justify this figure in any quantitative sense.