C++ Notes: Algorithms: Big-Oh Notation
How resource use grows as the problem becomes larger
It's useful to estimate the cpu or memory resources an algorithm
requires. "Complexity analysis" attempts to determine
the relationship between the number of data elements and resource
usage (time or space) with a simple formula. This can be useful if you are testing a program
with a small amount of data, but later production runs will use
large data sets.
Dominant Term
Big-oh (the "O" stands for "order of") notation is concerned with what happens for very
large values of N, therefore only the largest term in the formula is needed.
For example, the number of operations in some sorts is N2 - N.
For large values, N is insignificant compared to N2,
therefore this is described as an O(N2) algorithm, and the N term is ignored.
Why it Matters
Here is a table of typical cases. Logarithms to base 2 (as used here)
are proportional to logarithms in other base, so this doesn't affect the big-oh formula.
| constant | logarithmic | linear | | quadratic | cubic |
n | O(1) |
O(log N) |
O(N) |
O(N log N) |
O(N2)
| O(N3) |
1 | 1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 2 | 2 | 4 | 8 |
4 | 1 | 2 | 4 | 8 | 16 | 64 |
8 | 1 | 3 | 8 | 24 | 64 | 512 |
16 | 1 | 4 | 16 | 64 | 256 | 4,096 |
1024 | 1 | 10 | 1,024 | 10,240 | 1,048,576 | 1,073,741,824 |
Best, worst, and average cases
You should always be clear about which cases big-oh notation describes.
The characteristics for best, worst, and average cases can be
very different.
Why big-oh notation isn't always useful
Complexity analysis can be very useful, but there are problems with it too.
- Too hard to analyze. Many algorithms are simply too hard to analyze
mathematically.
- Average case unknown. There may not be sufficient information to know what the
most important "average" case really is, therefore analysis is impossible.
- Unknown constant. Both walking and traveling at the speed of light
have a time-as-function-of-distance big-oh complexity of O(N).
Altho they have the same big-oh characteristics, one is rather faster than the
other. Big-oh analysis only tells you how it grows with the size of the
problem, not how efficient it is.
- Small data sets. If there are no large
amounts of data, algorithm efficiency may not be important.
Benchmarks are better
Big-oh notation can give some very good ideas about performance for
large amounts of data, but the only real way to know for sure is
to actually try it with large data sets. There may be performance
issues that are not taken into account by big-oh notation, eg,
the effect on paging as virtual memory usage grows.