Descriptive statistics help you to accurately describe different sets of data. These often make it easier to present conclusions and make comparisons.
They include some basic statistics which you will have covered in maths, along with some you may need a reminder about:
Types of Data
Mean, Median & Mode
Grouped Data
Range & Inter-quartile range
Standard Deviation
Understanding what type of data you have is important when selecting the right statistical method.
Qualitative data uses words to describe and observe.
Nominal data
Name based data which have no order.
Male, Female
Sedimentary, Igneous, Metamorphic
Aberdeen, Dundee, Stirling
Ordinal data
Data that can be given an ascending or descending order, such as:
First, Second, Third
Agree, Neutral, Disagree
Quantitative data uses numbers to measure and report.
Interval data
Data that has real numbers, with no true zero. This is rare - temperature (in Celsius or Fahrenheit) & dates are the only common examples.
You can't make statements such as "Gairloch is twice as warm as London" - because there's not a true zero.
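The point about a true zero can be checked with a little arithmetic. The sketch below is only an illustration: the two temperatures are hypothetical readings, not real measurements for Gairloch or London. It shows that a reading that looks "twice as warm" in Celsius is only a few percent warmer on the kelvin scale, which does have a true zero.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to kelvin (a scale with a true zero)."""
    return celsius + 273.15

# Hypothetical readings, chosen only for illustration.
gairloch_c = 20.0
london_c = 10.0

# Dividing interval data gives a misleading "twice as warm".
ratio_celsius = gairloch_c / london_c

# Dividing the same readings in kelvin gives a meaningful ratio.
ratio_kelvin = celsius_to_kelvin(gairloch_c) / celsius_to_kelvin(london_c)

print(f"Ratio in Celsius: {ratio_celsius:.2f}")
print(f"Ratio in kelvin:  {ratio_kelvin:.3f}")
```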
Ratio data
Data that has real numbers with a true zero. Most numerical data is ratio data.
Rainfall measurements
Counts of people/cars
% soil moisture
A – Understanding the perceptions of rural vs urban residents towards the installation of a new power line.
B – A study into the success of soil improvements in crofts in the NW Highlands.
C – Whether improvements to a High Street have been successful.
A is a perception study, which would be richer with qualitative data from the two groups of residents.
B would largely use quantitative data. You may give the soil's texture as nominal data (based on using a flow chart) but it could also be described quantitatively (using % silt, sand and clay).
C could use both/either. For example, you might want to combine quantitative data about the number of visitors with qualitative data from a questionnaire asking their opinions.
For each of these examples, state whether it is:
Nominal data
Ordinal data
Interval data
Ratio data
Low order services, middle order services, high order services
12°C, 3°C, 16°C, -3°C
Earth, air, fire, water
2 hectares, 7 hectares, 1 hectare, 10 hectares
Primary, secondary, tertiary, quaternary
30m, 75m, 10m, 3m
Arable, pastoral, plantation, subsistence
32K, 60K, 12K, 100K (temperatures in kelvin)
These quick statistical calculations are all forms of average.
You should be able to choose which one is appropriate for the data you are describing.
The mean is a type of average which is calculated by adding together all the values in the dataset (∑), and dividing by the total count of values (n).
We use the mean when...
Data is similar and doesn’t have any very low or high values. If you were to plot it on a scatter graph there would be few gaps.
e.g. If a student sits four geography exams the best measurement would be the mean as it would give a good picture of overall performance.
The median is the middle value, when all the values are placed in ascending (or descending) order.
If there are two middle numbers, find the mean of these.
We use the median when...
Data has some extreme lows or highs which may skew a mean.
e.g. House prices, where a few homes with a very high or very low price would distort the mean.
The mode is the most frequently occurring value.
This is the only measure of average which can be used with nominal data.
If there are two modes, the data is "bi-modal". If there are no modes, it simply has no mode!
We use the mode when...
The data is recorded as frequencies, i.e. counts of how often each category or value occurs.
e.g. plotting the land use in a town - you can't get a "mean" land use but you can get the modal land use.
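As a rough illustration of how the three averages behave, here is a minimal Python sketch using the standard `statistics` module. All of the data is hypothetical and invented for this example: the house prices include one extreme value to show why the median is preferred for skewed data, and the land-use list shows the mode applied to nominal data.

```python
from statistics import mean, median, mode, multimode

# Hypothetical house prices in thousands of pounds, with one extreme value.
house_prices = [120, 135, 140, 150, 155, 160, 900]

price_mean = mean(house_prices)      # pulled upwards by the 900 value
price_median = median(house_prices)  # middle value of the sorted list

# Hypothetical land-use survey: nominal data, so only the mode makes sense.
land_use = ["residential", "retail", "residential",
            "industrial", "residential", "retail"]
modal_land_use = mode(land_use)

# multimode returns every value tied for most frequent (bi-modal data).
tied_modes = multimode([1, 1, 2, 2, 3])

print(f"Mean price:   {price_mean:.1f}")
print(f"Median price: {price_median}")
print(f"Modal land use: {modal_land_use}")
print(f"Modes of bi-modal data: {tied_modes}")
```

With these made-up prices the single expensive home drags the mean well above the median, which is why the median gives the fairer picture here.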
A - Soil pH testing using a pH meter
B – Soil texture following a flow-chart
C – Soil moisture (%) using the drying method
Depending on the data, both A and C could use median or mean. If the data was spread out (e.g. very low and very high moisture readings) then median would be the most appropriate.
Soil texture following a flow-chart gives nominal data, so this would be best described using the mode.
Complete these questions in your notes.
The weekly pocket money for nine S1 pupils was found to be: £3, £12, £4, £6, £1, £4, £2, £5, £8. What is their mean weekly pocket money?
Find the mean daily rainfall from the following measurements that were taken each day over a one-week period: 6, 2, 7, 6, 9, 0, 5 (mm).
Find the median of the data in questions 1 and 2.
Find the mode of the data in questions 1 and 2.
Which measure of average is the best one to use in the following three examples? In each case give a reason for your answer.
a) A distance runner entered seven marathons. His times were (hh:mm) 3:45, 4:05, 3:55, 4:25, 4:20, 3:48, 3:55. He wants to find his overall performance.
b) The average wage of the employees in an office from managing director to the office junior.
c) In the 2022 Boston marathon, finishers were reported in two categories: male and female. There were 6562 male finishers and 1562 female finishers.
Working out the mean, median and mode can be trickier when data is sorted into groups.
We use the midpoint (or median) of the group and multiply this by the frequency.
We can then calculate the mean...
... which is in the 5-9 group.
We can then calculate the median...
... which, as there are 30 total visitors (n=30), will be the 15th and 16th values when they are placed in order. This will also fall into the 5-9 group. However, because the data is grouped, we don't get any sense of where the median lies between 5 and 9.
We can then calculate the mode...
... well actually, we can't find the mode. We can only find the modal group. This is the one that occurs most frequently (i.e. the 5-9 group). Note that the modal group is sometimes known as the modal class.
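The grouped-data steps above can be sketched in Python. The frequency table below is hypothetical: it was invented so that it has 30 visitors in total and the mean, median and modal group all fall in the 5-9 group, as described above, but the individual frequencies and the other groups are assumptions and will not match the original table.

```python
# Hypothetical grouped data: ((lower, upper), frequency). Not the original table.
groups = [((0, 4), 6), ((5, 9), 12), ((10, 14), 8), ((15, 19), 4)]

# Total number of values (n).
n = sum(freq for _, freq in groups)

# Estimated mean: multiply each group's midpoint by its frequency, then divide by n.
midpoint_total = sum(((low + high) / 2) * freq for (low, high), freq in groups)
grouped_mean = midpoint_total / n

# Modal group: the group with the highest frequency.
modal_group = max(groups, key=lambda group: group[1])[0]

def median_group(groups):
    """Return the group containing the middle position, using running totals.

    Assumes the middle values fall inside a single group.
    """
    total = sum(freq for _, freq in groups)
    middle_position = (total + 1) / 2  # 15.5 when n = 30
    running_total = 0
    for bounds, freq in groups:
        running_total += freq
        if running_total >= middle_position:
            return bounds

print(f"n = {n}")
print(f"Estimated mean = {grouped_mean:.1f}")
print(f"Modal group = {modal_group}")
print(f"Median group = {median_group(groups)}")
```

Because only the midpoints are used, the mean is an estimate rather than an exact value.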
A survey of visitors to a new health clinic in central Chad is undertaken, to find out its sphere of influence. The results are shown in the table.
Plot the data in the form of a bar graph.
Create a table which will allow you to work out the mean. Then find the modal group and median distance travelled by visitors to the health clinic.
Which of these measurements (mean, mode, median) is the most satisfactory and least satisfactory? Justify your answer.
These are both measures of dispersion. They show how much data differs from the average.
If the data were plotted on a scatter graph, these measures would describe how spread out the points are, but not where they actually sit on the axis.
The range is the difference between the highest and lowest values.
Marks in a geography exam:
10 25 45 47 49 51 51 52 52 54 56 57 58 60 62 66 68 70 75 90
The range of this data is 80. However, one disadvantage of the range is that it doesn't tell us that most of the data is grouped around the middle.
The inter-quartile range is the range of the middle half of the values.
In this case, it would be a better measure as it omits the extremes.
Every set of data has three quartiles:
Q1 is the middle value of the bottom half of the data.
Q2 is the median value
Q3 is the middle value of the top half of the data.
Once we have worked out Q1 and Q3, we can calculate the inter-quartile range.
Marks in a geography exam:
10 25 45 47 49 51 51 52 52 54 56 57 58 60 62 66 68 70 75 90
Q1 is half way between the 49 and 51 values. Q1 = 50
Q2 is the median - half way between 54 and 56. Q2 = 55
Q3 is half way between the 62 and 66 values. Q3 = 64
Therefore, the Interquartile range is: Q3-Q1 = 64-50 = 14
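The quartile method used above (the median of each half of the ordered data) can be sketched in Python with the same exam marks. The `quartiles` helper is a name made up for this example; for an odd number of values it leaves the median out of both halves, which is one common convention and an assumption here. Spreadsheets and statistics libraries may use slightly different quartile rules and give slightly different answers.

```python
from statistics import median

# Marks in a geography exam, taken from the worked example above.
marks = [10, 25, 45, 47, 49, 51, 51, 52, 52, 54,
         56, 57, 58, 60, 62, 66, 68, 70, 75, 90]

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    # With an odd count, skip the middle value so it is in neither half.
    upper_half = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(lower_half), median(data), median(upper_half)

q1, q2, q3 = quartiles(marks)
data_range = max(marks) - min(marks)
interquartile_range = q3 - q1

print(f"Range = {data_range}")
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
print(f"Inter-quartile range = {interquartile_range}")
```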
When comparing two sets of data, a box-plot is often used to display the interquartile range.
Answer these questions.
1. The data below shows how long on average it takes a group of cyclists to travel to work along a cycle path (in minutes):
22, 19, 24, 31, 16, 48, 29, 29, 21, 15, 22, 28, 27, 23, 37, 31, 23, 30, 26, 16, 26, 29
Calculate the median and inter-quartile range.
2. The data shows the minimum temperatures of ten weather stations in Britain on a winter’s day. The temperatures are: 5, 9, 3, 2, 7, 9, 8, 2, 2, 3 (°C)
Calculate the median and inter-quartile range.
3. Minimum temperatures in summer at ten weather stations: 10, 12, 11, 14, 9, 8, 15, 12, 12, 11 (°C)
Compare, using two box plots, these with the figures from question 2.
The Standard Deviation measures dispersion of values around the mean.
A large number means there is a wide spread
A small number means that the values are grouped closely together.
The standard deviation is often more useful than the inter-quartile range because it uses every value and tells us how the data is grouped around the mean. It allows us to compare data sets which are very different.
The formula for standard deviation is: σ = √( ∑(x - x̄)² / n ), where x is each value, x̄ is the mean and n is the number of values.
But what does standard deviation actually mean?
When we get a value for standard deviation, we can then use it to describe the data. If the data is roughly normally distributed (bell-shaped), then approximately:
68% of data lies within one standard deviation of the mean
95% of data lies within two standard deviations of the mean
99.7% of data lies within 3 standard deviations of the mean
Find the standard deviation of the minimum temperatures, measured at ten weather stations in Britain on a winter's day.
The temperatures are:
5, 9, 3, 2, 7, 9, 8, 2, 2, 3 (°C)
Constructing a table can help calculate the standard deviation correctly.
Column 1 - list all values. Add them up to find ∑x. Calculate the mean (x̄).
Column 2 - write the mean temperature (x̄) in every row.
Column 3 - subtract each value (temperature) from the mean. It does not matter if you get a negative number.
Column 4 - square all column 3 figures, to remove any negative numbers.
Once you've used the formula to calculate the standard deviation, you should always write the answer out in full.
The standard deviation of the minimum temperatures at the weather stations is 2.8°C. If the temperatures were normally distributed, this would mean that about 68% of the data lies within 2.8°C of the mean, which is 5°C.
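The table method can be followed step by step in Python as a way of checking the working. This is a sketch that assumes the standard deviation is found by dividing the sum of the squared differences by n (the number of values), which is the version that reproduces the 2.8 answer above. The variable names are made up for this example.

```python
from math import sqrt
from statistics import pstdev

# Minimum temperatures (°C) at the ten weather stations.
temperatures = [5, 9, 3, 2, 7, 9, 8, 2, 2, 3]

# Column 1: add up the values and calculate the mean.
n = len(temperatures)
mean_temperature = sum(temperatures) / n

# Column 3: difference between the mean and each value (may be negative).
differences = [mean_temperature - x for x in temperatures]

# Column 4: square each difference, which removes any negative signs.
squared_differences = [d ** 2 for d in differences]

# Divide the total of column 4 by n, then take the square root.
standard_deviation = sqrt(sum(squared_differences) / n)

print(f"Mean = {mean_temperature}")
print(f"Sum of squared differences = {sum(squared_differences)}")
print(f"Standard deviation = {standard_deviation:.1f}")

# Cross-check against the library's population standard deviation.
print(f"statistics.pstdev gives {pstdev(temperatures):.1f}")
```

Note that `statistics.stdev` (without the "p") divides by n - 1 instead of n, so it would give a slightly larger value for the same data.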
A sample of coniferous trees had the following heights: 3, 2, 1, 2, 3, 4, 3, 7, 6, 5 (m)
Calculate the standard deviation.