Merge Sort Time Complexity is O(nlogn) and in this tutorial i am going to show you with analysis using recurrence relation why it is so.
Merge Sort Algorithm
So this is the Merge Sort algorithm as we know it, i am assuming that you already know how merge sort works; i am not going to go into the explanation of how merge sort works in this tutorial but i am only going to stick to the time complexity analysis of merge sort. If you want an in depth explanation of how merge sort works, you should look here https://firstclicklimited.com/tutorials/index.php/2021/04/12/merge-sort-algorithm-with-examples/
Merge Sort Recurrence Relation
So to analyze the time complexity of any recursive function, we need to know the recurrence relation (in other words we will take the individual runtime of the recursive calls in the recursive function and sum them up together to get our time complexity equation or recurrence relation).
So if i say that the time it takes the recursive function MergeSort(l, h) to perform its task is T(n), then i can claim that the time it will take the recursive call left MergeSort(l, mid) and right MergeSort(mid+1, h) is going to be half the time (i.e T(n/2)) it takes MergeSort(l, h) because as we already know, mergesort takes the list and breaks it into two part from the mid each time – it will continue to do this until n becomes 1 at which time we can say that the problem has been broken down from a large problem to a small one – this is the base case. The function which performs the merge after the left and right recursive calls mergesort are done take O(m+n) to perform its task where m is from the start of the list to the middle and n is from the middle+1 of the list to the end – but let’s just simplify it to O(n) since both list are in single array – we will just take it to be n elements as also shown in the image above. Please note that in finding the runtime of each statement in the image above i have ignored that of the if statement because even if i mark it as 1, it will still not affect our result at the end of the day since all constant no matter the value can simply be represented as 1 in the derived equation.
Next we add the runtime of each statement together to get our recurrence relation:
T(n) = 2T(n/2) + n (i have ignored 1 again to simplify things since 1 will not make any difference at the end of the day).
Therefore our recurrence relation is as shown in the image
meaning if n>1 return 2T(n/2) + n (in other words call left MergeSort(l, mid) and afterwards right MergeSort(mid+1, h) then merge merge) otherwise if n=1 return 1. Check the code to confirm: if(l<h) at which point n is still > 1 perform statements inside the if block; once l is equal to h, n becomes 1 (in other words for l to be equal to h means that there is only 1 element now in the list and so n is 1) and the code will just terminate and so lets take some constant value 1 for terminate.
Now we have to solve our recurrence relation to find the time complexity of merge sort. I will use induction or successive substitution method.
T(n) = 2T(n/2) + n ————————————— is our first equation
Since i know what T(n) is, if i know what T(n/2) is, i should be able to solve this equation. T(n/2) will be T(n/2/2) + n which is T(n/4) + n. If i substitute this in our first equation we should get:
T(n) = 2T(n/4) + n + n which becomes T(n) = 2T(n/22) + 2n
T(n) = 2T(n/22) + 2n ————————————— is our second equation
To get our third equation, i need to know what T(n/4) in our second equation is. T(n/4) will be T(n/4/2) + n which is T(n/8) + n. If i substitute this in our first equation we should get:
T(n) = 2T(n/8) + n + 2n which becomes T(n) = 2T(n/23) + 3n
T(n) = 2T(n/23) + 3n ————————————— is our third equation
We know that at some point T(n/2k) will become 1 (i.e our base case) at which point T(n/2k) = 1
Instead of continuing with the substitution we can observe that the equation can be simply written as:
T(n) = 2T(n/2k) + kn (note that i have also written K for 3n because observe that when K value in n/2k was 1, K value after the plus sign was also 1; when K value in n/2k was 2, K after the plus sign was also 2; when K value in n/2k was 3, K after the plus sign was also 3).
When n/2k = 1 means that 2k=n and then k = log2n which can be written as k = logn for short.
So because n/2k = 1, T(n/2k) equals T(1) which becomes 1 because from our recurrence relation when n=0, just return 1.
So our equation becomes: T(n) = 2 * 1 + kn, substitute logn for k and we should get
T(n) = 2 + lognn of course can be re-written as: T(n) = 2 + nlogn
In time complexity analysis, once you’ve gotten the time complexity equation, you then need to work with the highest growing term of the equation and discard any other thing. So for us the terms we have in our equation are 2 and nlogn. 2 cannot be the highest growing term because 2 is only a constant; nlogn is the highest growing term in our equation. The next thing we need to do is with our highest growing term is to strip it of any co-efficient but in our case there is no co-efficient so we simply return nlogn. Therefore the time complexity of Merge Sort has been proven to be O(nlogn).