## 18 October 2011

### O(log(n))

I find O(log(n)) to be a very interesting algorithm complexity. It is also highly desirable: if you achieve it you are probably doing well, or you are just using a couple of common, time-tested algorithms and data structures such as Binary Search or [Self-]Balancing Binary Trees like Red-Black or AVL trees. The order of Big O complexities is:

1. O(1): constant
2. O(log log n): double logarithmic
3. O(log n): logarithmic
4. O(n^m), 0 < m < 1: fractional power
5. O(n): linear
6. O(n log n) = O(log n!): loglinear
7. O(n^2): quadratic
8. O(n^3): cubic
9. O(n^m), m > 1: (general) polynomial (m is a fixed, non-negative integer; e.g. n^4, n^5)
10. O(m^n), m >= 2: exponential (e.g. 2^n, 3^n)
11. O(n!): factorial

When I was growing up, handheld calculators were quickly evolving, much like smart phones are now, although perhaps not as rapidly. My father would bring these calculators home and I would play with them; this was my first exposure to logarithms. When I got to high school I learned more about them, including some of the identities. Actually I find the logarithmic identities to be quite interesting, and I have found them important in understanding logarithms, which in turn are useful in understanding other areas of science and math. I even used to use log base 10 to calculate the number of digits of numbers for print formatting, way back before we had better ways to do number formatting: ⌊log10(n)⌋ + 1, where ⌊x⌋ is the floor function, is the number of digits of any positive integer n. Many common measurements, like the Richter scale, pH, and the decibel, are also logarithmic scales, and we have previously encountered logs in relation to information entropy. Some useful logarithmic identities are:
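Here's a quick sketch of the digit-count trick (the class and method names are mine, not from any library). Note the floor form, ⌊log10(n)⌋ + 1, is the one that handles exact powers of ten: ⌈log10(100)⌉ = 2, but 100 has 3 digits.

```java
// Counting the decimal digits of a positive integer via log base 10.
// Math.log10 is specified to return exact results for exact powers of
// ten, so floor(log10(n)) + 1 is reliable even for n = 100, 1000, ...
public class DigitCount {

    static int digits(int n) {
        if (n <= 0) throw new IllegalArgumentException("positive integers only");
        return (int) Math.floor(Math.log10(n)) + 1;
    }

    public static void main(String[] args) {
        System.out.println(digits(7));       // 1
        System.out.println(digits(100));     // 3
        System.out.println(digits(99999));   // 5
    }
}
```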

1. y = log_b(x) if and only if x = b^y
2. log_b(1) = 0
3. log_b(b) = 1
4. -log_b(x) = log_b(1/x)
5. log_b(x*y) = log_b(x) + log_b(y)
6. log_b(x/y) = log_b(x) - log_b(y)
7. log_b(x^n) = n * log_b(x)
8. log_b(x) = log_b(c) * log_c(x)
9. log_b(x) = log_c(x) / log_c(b)
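These identities are easy to spot-check numerically. A small sketch (names are mine) that uses the last identity (#9) to build an arbitrary-base log from the natural log, then verifies a few of the others:

```java
// Spot-checking a few logarithmic identities with doubles.
public class LogIdentities {

    // Identity #9: log_b(x) = log_c(x) / log_c(b), with c = e via Math.log.
    static double logB(double b, double x) {
        return Math.log(x) / Math.log(b);
    }

    public static void main(String[] args) {
        double b = 2, x = 12, y = 5, n = 3, eps = 1e-9;

        // #5: log_b(x*y) = log_b(x) + log_b(y)
        System.out.println(Math.abs(logB(b, x * y) - (logB(b, x) + logB(b, y))) < eps);

        // #7: log_b(x^n) = n * log_b(x)
        System.out.println(Math.abs(logB(b, Math.pow(x, n)) - n * logB(b, x)) < eps);

        // #1: y = log_b(x) if and only if x = b^y
        System.out.println(Math.abs(Math.pow(b, logB(b, x)) - x) < eps);
    }
}
```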

The first one demonstrates the relation between the logarithm and exponentiation functions: the value y, which is log base b of the value x, is the exponent to which you raise b to get the value x, so a logarithm is just a way to get the exponent. Log base 10 of 100 is 2, log_10(100) = 2, since 10 to the second power is 100, 10^2 = 100. The log base b function, y = log_b(x), is the inverse of raising b to a power, x = b^y, so:

b^(log_b(x)) = x

Another interesting observation that one can draw from the logarithmic identities is the answer to the question: what is the difference between O(log_2(x)) and O(log_10(x))? It's a little trick question that I like. The answer is they are the same. Using the second-to-last identity (#8):

O(log_2(x)) = O(log_2(10) * log_10(x))

Since log_2(10) is a constant:

O(log_2(x)) = O(log_10(x))

Pretty cool!
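You can watch that constant fall out numerically. In this sketch (names are mine) the ratio log_2(x) / log_10(x) comes out to log_2(10) ≈ 3.3219 no matter what x is:

```java
// The ratio between logs of different bases is constant:
// log_2(x) = log_2(10) * log_10(x) for every x.
public class BaseRatio {

    public static void main(String[] args) {
        double log2of10 = Math.log(10) / Math.log(2);  // ~3.3219

        for (double x : new double[] {10, 1000, 1e6, 1e12}) {
            double log2x = Math.log(x) / Math.log(2);
            double ratio = log2x / Math.log10(x);
            System.out.printf("x = %.0e  ratio = %.4f%n", x, ratio);
        }
        // Every ratio equals log_2(10), so O(log_2 n) and O(log_10 n)
        // differ only by a constant factor, which Big O ignores.
    }
}
```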

This would also work with the last identity (#9) as well. Remember that Big O notation is about [asymptotic] growth, so equality in Big O notation is not the same as ordinary equality, e.g. log_2(x) ≠ log_10(x). Also, for small quantities like the examples in this article, the base does make a difference. For reference, the corresponding exponent identities are:

1. b^0 = 1
2. b^1 = b
3. b^-n = 1/b^n
4. b^m * b^n = b^(m+n)
5. b^m / b^n = b^(m-n)
6. (b^m)^n = b^(m*n)
7. (a/b)^n = a^n/b^n
8. (b/a)^-n = (a/b)^n
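These can be spot-checked numerically too; a small sketch (names are mine):

```java
// Spot-checking a few exponent identities with doubles.
public class ExpIdentities {

    public static void main(String[] args) {
        double b = 3, a = 2, m = 4, n = 2, eps = 1e-9;

        // #4: b^m * b^n = b^(m+n)
        System.out.println(Math.abs(Math.pow(b, m) * Math.pow(b, n) - Math.pow(b, m + n)) < eps);

        // #6: (b^m)^n = b^(m*n)
        System.out.println(Math.abs(Math.pow(Math.pow(b, m), n) - Math.pow(b, m * n)) < eps);

        // #8: (b/a)^-n = (a/b)^n
        System.out.println(Math.abs(Math.pow(b / a, -n) - Math.pow(a / b, n)) < eps);
    }
}
```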

The binary tree structure (a rooted acyclic graph) visually illustrates O(log(n)) and the inverse-function relationship between logarithms and exponentiation. The diagram above shows a complete, and therefore balanced, binary tree structure. As you can see, the number of items in each row grows exponentially as powers of two, shown in red, and the total number of elements in the tree as each row is added grows in the same exponential fashion, denoted in square braces. So in a balanced tree you will fill all of the n rows up to the 2^n - 1 item, and when that number is exceeded (greater than or equal to 2^n) a new (n + 1)th row will be added. Now the growth is not necessarily exponential, you can add items at any rate, but the structure in which the items are stored is broken down along "exponential row boundaries". So to search 2^4 - 1 (15) items for the value 7 we would traverse 8-4-6-7, shown in green, which is 4 nodes; this is the maximum number of probes for this tree, and a search can also take 1, 2, or 3 probes depending on the depth of the item. Since 4 is the exponent just above the total size of the tree, which is [2^4 - 1], the log, [log_2(2^4 - 1) = 3.9...], is almost 4, our max traversal: O(log(2^n - 1)) = O(log(2^n)). Since we know that these two functions are inverses, this illustrates O(log(n)) in visual terms.
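The row count described above, ⌊log_2(n)⌋ + 1 for n items, can be computed exactly in integer arithmetic, which sidesteps floating point entirely. A small sketch (names are mine; for n ≥ 1, 32 minus the number of leading zero bits of n is exactly ⌊log_2(n)⌋ + 1):

```java
// Rows needed by a complete binary tree holding n items: floor(log2(n)) + 1.
// A new row opens exactly when n reaches a power of two.
public class TreeRows {

    static int rows(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);  // floor(log2(n)) + 1 for n >= 1
    }

    public static void main(String[] args) {
        System.out.println(rows(1));   // 1: just the root
        System.out.println(rows(7));   // 3: rows of 1, 2, 4 items
        System.out.println(rows(8));   // 4: the 8th item opens a new row
        System.out.println(rows(15));  // 4: 1 + 2 + 4 + 8 = 15 fills row 4
        System.out.println(rows(16));  // 5
    }
}
```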

The Binary Search algorithm has similar log base 2 characteristics. If you remember, the algorithm takes a list, which has to be ordered, and it repeatedly halves the range of remaining items, checking the element at each new midpoint: if it is equal you are done; if not, then depending on whether the search value is greater or less than the midpoint value, you keep the top or bottom half of the range respectively and repeat the process. This is better illustrated by the following Java code:

```java
public static int binarySearch(int value, int[] array) {
    int min = 0;
    int max = array.length - 1;
    while (min <= max) {
        int mid = min + (max - min) / 2;
        if (value < array[mid])
            max = mid - 1;
        else if (value > array[mid])
            min = mid + 1;
        else
            return mid;
    }
    return -1;
}
```

The following recursive code example is both more elegant and gives a more intuitive feel for the divide-and-conquer nature of the algorithm, and it is this divide-and-conquer behavior that breaks the list apart logarithmically.

```java
public static int binarySearch(int value, int[] array) {
    return binarySearch(value, array, 0, array.length - 1);
}

public static int binarySearch(int value, int[] array, int min, int max) {
    if (max < min)
        return -1;
    int mid = min + (max - min) / 2;
    if (array[mid] > value)
        return binarySearch(value, array, min, mid - 1);
    else if (value > array[mid])
        return binarySearch(value, array, mid + 1, max);
    return mid;
}
```

Let's view a search visually:

In our search for the value 7 in our 16-item list we first go to 8, which is too large, then to 4, which is too small, then to 6, and then to 7; at each point in our search, the size of what remains to be searched is half the size of the previous search space.

In our visual example we needed to search 4 positions (8, 4, 6, 7), and log_2(16) = 4, which is O(log(n)).
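You can instrument the same search loop to record each value it probes; on the sorted list 1..15, searching for 7 touches exactly 8, 4, 6, 7 in that order, matching the tree traversal. A self-contained sketch (names are mine):

```java
import java.util.ArrayList;
import java.util.List;

// Binary search instrumented to record every probed value.
public class ProbeCount {

    static List<Integer> probes(int value, int[] array) {
        List<Integer> probed = new ArrayList<>();
        int min = 0, max = array.length - 1;
        while (min <= max) {
            int mid = min + (max - min) / 2;
            probed.add(array[mid]);          // record each midpoint we inspect
            if (value < array[mid])
                max = mid - 1;
            else if (value > array[mid])
                min = mid + 1;
            else
                break;                       // found it
        }
        return probed;
    }

    public static void main(String[] args) {
        int[] a = new int[15];
        for (int i = 0; i < 15; i++) a[i] = i + 1;  // the sorted list 1..15
        System.out.println(probes(7, a));           // [8, 4, 6, 7]
    }
}
```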

An interesting side note is that many binary search implementations are broken due to possible integer overflow, as pointed out in a blog post by Joshua Bloch. The code above accounts for the issue; you can read more about it here.
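The bug is in the midpoint calculation: the naive (min + max) / 2 overflows once min + max exceeds Integer.MAX_VALUE, while the min + (max - min) / 2 form used above cannot. A minimal demonstration (index values are contrived to trigger the wrap):

```java
// Demonstrating the midpoint overflow bug in binary search.
public class MidpointOverflow {

    public static void main(String[] args) {
        // Pretend these are search bounds near the top of a huge array.
        int min = Integer.MAX_VALUE - 10;
        int max = Integer.MAX_VALUE - 2;

        int broken = (min + max) / 2;      // min + max wraps to a negative int
        int safe = min + (max - min) / 2;  // max - min is small, so no overflow

        System.out.println(broken);  // -7: a negative "index"
        System.out.println(safe);    // 2147483641, i.e. Integer.MAX_VALUE - 6
    }
}
```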

Due to computing a logarithm, this approach may not be optimally efficient.

Complete in this context means that all positions up to and including the 2^n - 1 positions are filled, i.e. have a node. It should also be noted that self-balancing algorithms may not yield such a well balanced tree, and it might have more rows for 15 items, but it is still O(log(n)).