Business Intelligence & Analytics for Digital Transformation

Standardization vs. normalization?



In the overall knowledge discovery process, data preprocessing plays a crucial role before data mining itself. One of the first steps is rescaling the data. This step is very important when dealing with parameters of different units and scales. For example, some data mining techniques use the Euclidean distance, where a parameter with a large numeric range dominates the distance. Therefore, all parameters should have the same scale for a fair comparison between them.


Two methods are well known for rescaling data. The first is normalization, which scales all numeric variables to the range [0,1]. One possible formula is given below:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the variable.
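As a rough illustration, min-max normalization could be sketched in Python as follows. The function name `min_max_normalize` and the sample values are assumptions made up for this example, and the handling of a constant column is an arbitrary choice.

```python
def min_max_normalize(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so the formula would divide by zero.
        # Mapping everything to 0.0 is an arbitrary choice for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data.
heights_cm = [150.0, 160.0, 170.0, 200.0]
print(min_max_normalize(heights_cm))  # [0.0, 0.2, 0.4, 1.0]
```

The smallest value always maps to 0 and the largest to 1, whatever the original units were.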


On the other hand, you can standardize your data set. Standardization transforms each variable to have zero mean and unit variance, for example using the equation below:

$$z = \frac{x - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ the standard deviation of the variable.
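A minimal Python sketch of standardization might look like this. The function name `standardize` and the sample values are invented for illustration, and using the population standard deviation (`statistics.pstdev`) rather than the sample one is an assumption of this sketch.

```python
import statistics


def standardize(values):
    """Shift and scale a list of numbers to zero mean and unit variance."""
    mean = statistics.mean(values)
    # Population standard deviation; an assumption of this sketch.
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column has no spread; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical sample data with mean 5 and population standard deviation 2.
scores = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(scores)
print(z_scores)
```

After the transformation the values are centered on 0, but nothing restricts them to a fixed interval.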


Both of these techniques have their drawbacks. If you have outliers in your data set, normalizing your data will squeeze the “normal” data into a very small interval, because the outliers determine the minimum and maximum. And in practice, many data sets have outliers. When using standardization, your new data are not bounded (unlike with normalization).
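To make both drawbacks concrete, here is a small Python sketch on a made-up sample in which the value 1000.0 plays the role of the outlier. The data and variable names are assumptions for illustration only.

```python
import statistics

# Hypothetical sample; 1000.0 is the outlier.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 1000.0]

# Min-max normalization: the outlier sets the maximum.
lo, hi = min(data), max(data)
normalized = [(v - lo) / (hi - lo) for v in data]

# Standardization with the population standard deviation (an assumption).
mean = statistics.mean(data)
std = statistics.pstdev(data)
standardized = [(v - mean) / std for v in data]

# The five "normal" points end up in a tiny part of [0, 1].
print(max(normalized[:-1]))  # about 0.004

# The standardized outlier lies outside [0, 1]; there is no fixed bound.
print(max(standardized))  # roughly 2.2
```

With normalization, the five ordinary points become almost indistinguishable from each other, while with standardization the result depends on the data and has no guaranteed range.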


So my question is: what do you usually use when mining your data, and why?
 
