# How to find outliers using standard deviation and mean

We use the following formula to calculate a z-score: z = (X − μ) / σ, where X is a single raw data value, μ is the population mean, and σ is the population standard deviation.
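A minimal sketch of the z-score calculation in Python, using only the standard library. The function names, the 2-standard-deviation threshold as a default, and the baby weights are illustrative assumptions, not data from the original discussion.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return z = (X - mu) / sigma for each value, using the population SD."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (divides by N)
    if sigma == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(x - mu) / sigma for x in values]


def flag_outliers(values, threshold=2.0):
    """Return the values whose |z| exceeds the threshold (assumed: 2 SD)."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical baby weights in kg, with one implausible 48 kg entry.
weights = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0, 3.5, 48.0]
print(flag_outliers(weights))  # [48.0]
```

The threshold is a parameter because 2 is only a convention; other cut-offs such as 3 are also in common use.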

I know this depends on the context of the study: a data point of 48 kg will certainly be an outlier in a study of babies' weight but not in a study of adults' weight. You mention 48 kg for baby weight (it might be part of an automatic process?).

A single outlier can raise the standard deviation and, in turn, distort the picture of spread. Hypothesis tests that use the mean with the outlier included are off the mark. Grubbs' test and Dixon's ratio test are formal tests for outliers, as I have mentioned several times before.

Take the Q1 value and subtract the two values from step 1. For our example, the IQR equals 0.222.
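The quartile-based steps mentioned in this discussion can be sketched roughly as follows. This assumes that the "two values" subtracted from Q1 are 1.5 × IQR and 3 × IQR, the conventional Tukey fence multipliers, which the text itself does not state. It also assumes that quartiles are taken as medians of the lower and upper halves of the sorted data. The function names and the weights are hypothetical.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]                # lower half (middle value excluded if N is odd)
    upper = data[len(data) - half:]    # upper half
    return median(lower), median(upper)


def fences(values, inner_k=1.5, outer_k=3.0):
    """Return the inner and outer fences; multipliers are assumed Tukey defaults."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return {
        "iqr": iqr,
        "lower_inner": q1 - inner_k * iqr,
        "lower_outer": q1 - outer_k * iqr,
        "upper_inner": q3 + inner_k * iqr,
        "upper_outer": q3 + outer_k * iqr,
    }


def outside_inner_fences(values):
    """Return the values that fall outside the inner fences."""
    f = fences(values)
    return [x for x in values if x < f["lower_inner"] or x > f["upper_inner"]]


# Hypothetical baby weights in kg, with one implausible 48 kg entry.
weights = [3.1, 3.4, 2.9, 3.6, 3.3, 3.0, 3.5, 48.0]
print(outside_inner_fences(weights))  # [48.0]
```

Because the quartiles barely move when one extreme value is added, the fences stay tight around the bulk of the data, unlike a cut-off based on the mean and standard deviation.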

How do you find the outlier with mean and standard deviation? A common rule of thumb is that any data point more than 2 standard deviations from the mean is an outlier. This method can fail to detect outliers because the outliers increase the standard deviation. If you have N values, the ratio of the distance from the mean divided by the SD can never exceed (N-1)/sqrt(N). (A square root of a number is merely the value that, when multiplied by itself, results in the number.)

You say, "In my case these processes are robust". In this case, you didn't need 2 × SD to detect the 48 kg outlier; you were able to reason it out.

To use quartiles instead, get the lower quartile, or Q1, by finding the median of the lower half of your data. Do the same for the higher half of your data and call it Q3. Then calculate the inner and outer lower fences.
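To illustrate the (N-1)/sqrt(N) limit, here is a small sketch that compares the largest observed z-score in a sample against that bound. It assumes the SD is the sample standard deviation (dividing by N-1), which is the convention under which this bound holds. The function names and the data are made up for illustration.

```python
import math
from statistics import mean, stdev


def max_abs_z(values):
    """Largest |x - mean| / SD in the sample, using the sample SD (N-1)."""
    mu = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return max(abs(x - mu) / s for x in values)


def z_bound(n):
    """Upper limit (N-1)/sqrt(N) on any |z| in a sample of size N."""
    return (n - 1) / math.sqrt(n)


# Hypothetical sample of five weights in kg, one of them wildly off.
sample = [3.1, 3.4, 2.9, 3.6, 48.0]

print(z_bound(len(sample)) < 2)                       # True
print(max_abs_z(sample) <= z_bound(len(sample)))      # True
```

With only five values the bound is below 2, so a "more than 2 standard deviations" rule cannot flag anything in this sample, however extreme the 48 kg entry is.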

