Median of Two Sorted Arrays
Table of contents
Problem
Given two sorted arrays nums1
and nums2
of size m
and n
respectively, return the median of the two sorted arrays.
Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.
Input: nums1 = [1,2], nums2 = [3,4]
Output: 2.50000
Explanation: merged array = [1,2,3,4] and median is (2 + 3) / 2 = 2.5.
Explanation
Let $A$ and $B$ be the two sorted arrays of size $m$ and $n$ respectively.
Let’s assume that $m \leq n$.
Let $x$ be the median of the two sorted arrays.
This $x$ will divide each array into two parts:
\[\begin{gather*} A_{left} \mid A_{right} \\ B_{left} \mid B_{right} \end{gather*}\]where $A_{left}$ and $B_{left}$ consist of respective elements $\le x$, and $A_{right}$ and $B_{right}$ consist of respective elements $\gt x$.
For $x$ to be the median of the two sorted arrays, the following conditions must be satisfied:
- $\max(A_{left}) \le \min(B_{right})$ and $\max(B_{left}) \le \min(A_{right})$
If $m + n$ is odd, then $|A_{left}| + |B_{left}| = |A_{right}| + |B_{right}| + 1$.
If $m + n$ is even, then $|A_{left}| + |B_{left}| = |A_{right}| + |B_{right}|$
Let $i$ be the number of elements in $A_{left}$, and $j$ be the number of elements in $B_{left}$.
The first condition can be rewritten as:
\[a_{i - 1} \le b_j \wedge b_{j - 1} \le a_i\]The notation follows the coding convention of 0-based indexing.
The second condition can be rewritten as:
\[\begin{gather*} |A_{left}| + |B_{left}| = |A_{right}| + |B_{right}| + 1 \\[1em] i + j = (m - i) + (n - j) + 1 \\[1em] 2j = m + n + 1 - 2i \\[1em] j = \frac{m + n + 1}{2} - i \end{gather*}\]Later in code, the integer division truncating (m + n + 1) / 2
takes care of both odd and even cases.
We see that $j$ is predetermined by $i$, so our problem boils down to finding the right $i$ that satisfies the first condition.
Once we find the right $i$ and $j$, we can compute the median $x$ as follows:
\[x = \begin{cases} \max(a_{i-1}, b_{j-1}) & \text{if } m + n \text{ is odd} \\[1em] \operatorname{avg}(\max(a_{i-1}, b_{j-1}), \min(a_{i}, b_{j})) & \text{if } m + n \text{ is even} \end{cases}\]Why $\max(a_{i-1}, b_{j-1})$ for odd $m + n$?
Because we know the left side has one more element which is the median.
Why $\operatorname{avg}(\max(a_{i-1}, b_{j-1}), \min(a_{i}, b_{j}))$ for even $m + n$?
Because we want the average of the two middle elements.
Now we do a binary search on $i$ in the range $[0, m]$, to find the right $i$ that satisfies the conditions.
Solution
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
public double findMedianSortedArrays(int[] A, int[] B) {
int m = A.length;
int n = B.length;
// Because we want to do a binary search on the smaller array
if (m > n) return findMedianSortedArrays(B, A);
// Setting candidate range for i
int iMin = 0;
int iMax = m;
int lenCondition = (m + n + 1) / 2;
while (iMin <= iMax) {
int i = (iMin + iMax) / 2;
int j = lenCondition - i;
int maxALeft = (i < 1) ? Integer.MIN_VALUE : A[i - 1];
int maxBLeft = (j < 1) ? Integer.MIN_VALUE : B[j - 1];
int minARight = (i >= m) ? Integer.MAX_VALUE : A[i];
int minBRight = (j >= n) ? Integer.MAX_VALUE : B[j];
if (maxALeft <= minBRight && maxBLeft <= minARight) {
if ((m + n) % 2 > 0) { // When m + n is odd,
return Math.max(maxALeft, maxBLeft); // Median is at the left
} else { // When m + n is even,
// Median is the average of the two middle elements
return (Math.max(maxALeft, maxBLeft) + Math.min(minARight, minBRight)) / 2.0;
}
} else if (maxALeft > minBRight) { // We overshot our partition for A
iMax = i - 1;
} else { // We undershot our partition for A
iMin = i + 1;
}
}
return 0.0;
}
Everything is explained in the Explanation section, except for the lines 17-20.
maxALeft
: $a_{i-1}$maxBLeft
: $b_{j-1}$minARight
: $a_i$minBRight
: $b_j$
We must note that a partition of $A$ and $B$ by $x$ may be empty.
Take $A = [1, 2]$ and $B = [3, 4]$, where the median $2.5$ is not in the middle of the two arrays.
\[\begin{gather*} A_{left} = [1, 2] \mid A_{right} = [] \\[1em] B_{left} = [] \mid B_{right} = [3, 4] \end{gather*}\]We want $a_{i-1} \le b_j$ and $b_{j-1} \le a_i$, but we can’t compare $b_{j-1}$ and $a_i$ because $B_{left}$ and $A_{right}$ are empty.
That is why we need to turn them into
\[\begin{gather*} A_{left} = [1, 2] \mid A_{right} = [\infty] \\[1em] B_{left} = [-\infty] \mid B_{right} = [3, 4] \end{gather*}\]By setting $a_{i-1}, b_{j-1} = -\infty$ and $a_i, b_j = \infty$, when indices are out of bound.
The complexity is $O(\log(\min(m, n)))$.