Measures of Central Tendency in Statistics

Hi! In this article, I will tell you the stories of the mean, the median, and the mode: how to calculate them, what difficulties they face, and who comes to their rescue. In the end, I will explain when to use and when not to use each of them. So, without wasting further time, let us start. Scroll down!!

1. Measures of the Central Tendency

In the first place, what are measures of central tendency, and why do we need to study them? In my opinion, the answer is this: when you have data and someone asks what it looks like or wants a short summary of it, you need to come up with some sort of “average” that represents the entire data set. That average comes in several types, such as mean, median, and mode. So, we study “Measures of the Central Tendency” to summarize our data with a single value that represents all of it.

Correct me if my opinion is wrong in the comments!

Now let us get started!!

1.1 The story of Mean

I think you all know me :) I was introduced to you in school. I am just AVERAGE. Most people call me average. But I get angry when people call me average, because there are other types of averages too (median, mode, etc.). People denote me by μ. My formula is the sum of all the values, divided by the total number of values. But,

I have a problem — 😒

I am blown away when there are outliers!! You may ask, what are outliers? Well, they are values that are much larger or much smaller than most of the other values in the data. Haven’t understood? Let me explain with an example. Consider the marks in a class. Let’s assume the students have a mean GPA of 8, but there is one TOPPER who gets a 9.8 GPA. Now, what's the mean of the class?? Is it 8 or something else? We might intuitively think that the mean should still be 8, but it’s not!

What’s the problem?? The problem is the TOPPER. If we include his GPA with the rest of the class, the mean is pulled above 8, even though a typical student still scores around 8. The smaller the class, the bigger the pull. So here, the TOPPER's GPA is an outlier.

Let’s define it more formally — An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.

Therefore, the mean is heavily affected by outliers.
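To see the pull in numbers, here is a small Python sketch. The GPA values are made up for illustration (they do not come from a real class), and `mean` is just a helper name chosen for this example.

```python
def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

# Hypothetical GPAs of a small class (illustrative numbers only).
class_gpas = [7.8, 7.9, 8.0, 8.1, 8.2]
mean_without_topper = mean(class_gpas)

# Add the topper's 9.8 GPA and recompute the mean.
mean_with_topper = mean(class_gpas + [9.8])

print(round(mean_without_topper, 2))  # about 8.0
print(round(mean_with_topper, 2))     # about 8.3
```

With only five other students, a single 9.8 moves the mean from roughly 8.0 to roughly 8.3. In a larger class the shift would be smaller, but it is always in the direction of the outlier.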

I have one more problem.

I am not always present in the data.

The mean you get may not be a value in the data itself! For example, consider 1, 2, 3, 101, 102, 103. The mean is 312/6 = 52. But 52 is not a value in the data.

So what do we do now? Don’t worry, Median comes to the rescue!

1.2 The story of Median

Median: I am introduced to you because there was a problem with my friend Mean (μ). The problem is, he doesn’t like OUTLIERS. So, who am I? How can you find me? Here it is,

1. Sort the values in ascending order.

2. If there is an odd number of values, I am the middle element.

3. If there is an even number of values, I am the mean of the two middle values.

Median: That’s it! I am so easy to compute

Pranay: One good thing about the median is that it solves the problem faced by the mean. The median is not affected by outliers.

For example, consider the data 1, 2, 2, 3, 3, 4, 5, 100. Clearly, 100 is an outlier in our data. The mean is 120/8 = 15, which is larger than every value except the outlier, so it is a poor summary here. We take the help of the median instead. There are 8 values, so the median is the mean of the two middle ones: (3+3)/2, which is 3. So, the median is not affected by the outlier.
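The three steps for finding the median can be sketched in Python as follows. The function name `median` is my own choice for this example; it follows the sort, odd, and even rules described above and is run on the same data.

```python
def median(values):
    """Middle value of the sorted data (mean of the two middle values if the count is even)."""
    ordered = sorted(values)           # Step 1: sort in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # Step 2: odd count, take the middle element
        return ordered[mid]
    # Step 3: even count, take the mean of the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [1, 2, 2, 3, 3, 4, 5, 100]
data_mean = sum(data) / len(data)      # 120 / 8 = 15.0
data_median = median(data)             # (3 + 3) / 2 = 3.0

print(data_mean, data_median)
```

The mean lands at 15, far from where most values sit, while the median stays at 3.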

Median: So, I am better than my friend Mean(μ)! I am proud of myself :)

I have a problem — 😒

Pranay: Wait, what problem again? We fixed the issue of outliers. So what’s the issue again!?

Median: Look at this data set, 1, 1, 1, 2, 2, 3, 41, 42, 100, 101, 101, 101. I am (3+41)/2 =22. But 22 is not present in the data set!

Pranay: I thought you could solve all my problems, but no!! Now, what’s the solution?

Median: Well, I have my best friend mode. He will solve this issue!!

1.3 The story of Mode

Hi, I am Mode!! I am introduced to you because both of my friends (mean and median) are sometimes not present in the data. For example, let us consider the data 1, 1, 1, 2, 2, 3, 41, 42, 100, 101, 101, 101. If we find the median, it is 22, which is not present in the data. So, in this case, we use Mode.

How to calculate Mode?

  1. Find all the unique values or categories in the data
  2. Calculate the frequency of all these unique values/categories
  3. Select the category/value with the highest frequency

So, in the above example, 1, 1, 1, 2, 2, 3, 41, 42, 100, 101, 101, 101, the modes are both 1 and 101, because each of them appears 3 times, more than any other value. So this data has 2 modes. Such data is called bimodal, and here it suggests that our data has 2 clusters.

Note that, the data can have any number of modes too!
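The three steps for the mode can be sketched with `collections.Counter` from the Python standard library. The helper name `modes` is my own; it returns a list, because the data may have more than one mode. It assumes the input is non-empty.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)           # Steps 1 and 2: unique values and their frequencies
    highest = max(counts.values())     # the highest frequency
    # Step 3: keep every value that reaches the highest frequency
    return [value for value, count in counts.items() if count == highest]

data = [1, 1, 1, 2, 2, 3, 41, 42, 100, 101, 101, 101]
print(modes(data))  # [1, 101]
```

Returning a list rather than a single number keeps ties visible, which is exactly what we want for bimodal data like this.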

So, the mode has rescued us!! We can say that the data has two “averages”, 1 and 101, because it has 2 clusters/modes! If we visualize the data, we see that the first cluster has the values 1, 2, 3, 41, 42 and the second cluster has the values 100, 101.

Two good things about the mode are that it is always a value present in the data, unlike the mean and median, and that it is not affected by outliers!

The mode can also be used for categorical variables. The mean cannot be computed for categories at all, and the median only makes sense when the categories have a natural order.
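As a quick illustration of the mode on categorical data, here is a sketch using `Counter.most_common`. The list of colours is hypothetical data invented for this example.

```python
from collections import Counter

# Hypothetical categorical data: favourite colours from a small survey.
colours = ["red", "blue", "red", "green", "red", "blue"]

counts = Counter(colours)
# most_common(1) returns a list with one (value, frequency) pair.
most_common_colour, frequency = counts.most_common(1)[0]

print(most_common_colour, frequency)  # red 3
```

There is no meaningful way to add "red" and "blue" and divide by two, so the mean does not apply here, but counting frequencies works fine.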

1.4 Summary

Mean: When to use: If our data doesn’t have outliers and is symmetric.

Median: When to use: If our data is skewed due to outliers.

Mode: When to use: If our data has clusters or categorical variables.

I hope you have gained some knowledge of Measures of Central Tendency. If you liked the content, please hit the “Clap” button and,

Connect with me -

*LinkedIn :* [*https://linkedin.com/in/bomma-pranay*](https://linkedin.com/in/bomma-pranay)

*GitHub :* [*https://github.com/Bomma-Pranay*](https://github.com/Bomma-Pranay)

*--- By Bomma Pranay A Data Science Enthusiast*