algs4.jar
to root directoryxxxxxxxxxx
11$ sudo curl -O "https://algs4.cs.princeton.edu/code/algs4.jar"
xxxxxxxxxx
31$ sudo curl -O "https://algs4.cs.princeton.edu/linux/{javac,java}-{algs4,cos226,coursera}"
2$ sudo chmod 755 {javac,java}-{algs4,cos226,coursera}
3$ sudo mv {javac,java}-{algs4,cos226,coursera} /usr/local/bin
As soon as an Analytic Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will arise — By what course of calculation can these results be arrived at by the machine in the shortest time?
— Charles Babbage (1864)
In Java, we can time the running time of an algorithm by Stopwatch
class, which is a part of the stdlib.jar
. We should use Stopwatch()
for creating a new stopwatch and then call elapsedTime()
to get the time in seconds since the Stopwatch
is created.
xxxxxxxxxx
101public class Stopwatch {
2 private final long start;
3 public Stopwatch() {
4 start = System.currentTimeMillis();
5 }
6 public double elapsedTime() {
7 long now = System.currentTimeMillis();
8 return (now - start) / 1000.0;
9 }
10}
In the 1970s, Donald Knuth introduces the scientific method for understanding computing performance. Nowadays, the scientific method is explained in the following steps,
There are two principles for the scientific method,
Break down waveform of N samples into periodic components. The applications are on DVDs, JPEG files, MRI, astrophysics, etc.
Simulate gravitational interactions among N bodies.
Given N distinct integers, how many triples sum to exactly zero?
For example,
xxxxxxxxxx
111 2 -1 0 -2
The answer should be,
xxxxxxxxxx
312
21 + -1 + 0 = 0
32 + -2 + 0 = 0
xxxxxxxxxx
51for (int i = 0; i < n; i++)
2 for (int j = i + 1; j < n; j++)
3 for (int k = j + 1; k < n; k++)
4 if (a[i] + a[j] + a[k] == 0)
5 count++;
The Grid-search program should be,
x1import edu.princeton.cs.algs4.StdIn;
2import edu.princeton.cs.algs4.StdOut;
3import edu.princeton.cs.algs4.In;
4
5public class ThreeSum {
6
7 private ThreeSum() {
8 }
9
10 public static int count(int[] a) {
11 int n = a.length;
12 int count = 0;
13 for (int i = 0; i < n; i++) {
14 for (int j = i + 1; j < n; j++) {
15 for (int k = j + 1; k < n; k++) {
16 if (a[i] + a[j] + a[k] == 0) {
17 count++;
18 }
19 }
20 }
21 }
22 return count;
23 }
24
25 public static void main(String[] args) {
26 In in = new In(args[0]);
27 int[] a = in.readAllInts();
28
29 Stopwatch timer = new Stopwatch();
30 int count = count(a);
31 StdOut.println("elapsed time = " + timer.elapsedTime());
32 StdOut.println(count);
33 }
34}
We can download the testing files of 1Kint.txt
, 2Kint.txt
, 4Kint.txt
, and 8Kint.txt
(nKint
means we have n thousand integers in that file) from these links. And then we are able to test these files by,
xxxxxxxxxx
41$ java-algs4 ThreeSum 1Kints.txt
2$ java-algs4 ThreeSum 2Kints.txt
3$ java-algs4 ThreeSum 4Kints.txt
4$ java-algs4 ThreeSum 8Kints.txt
The results are,
xxxxxxxxxx
51Name Time Result
21Kints.txt 0.296 70
32Kints.txt 2.679 528
44Kints.txt 22.225 4039
58Kints.txt 170.302 32074
If we plot this result with the standard plot and the log-log plot, we will have the following relationships,
This is because we have,
xxxxxxxxxx
11T(N) = a * N^b
So,
xxxxxxxxxx
11log(T(N)) = blog(N) + c
where,
xxxxxxxxxx
11a = exp(c)
Because we have a slope in this plot of about 3, then we can say,
xxxxxxxxxx
11b = 3
And the order of growth of the running time is about,
xxxxxxxxxx
11N^3
binarySearch
later in this series.xxxxxxxxxx
61for (int i = 0; i < N; i++) {
2 for (int j = i+1; j < N; j++) {
3 int k = Arrays.binarySearch(a, -(a[i] + a[j]));
4 if (k > j) cnt++;
5 }
6}
xxxxxxxxxx
301import java.util.Arrays;
2import edu.princeton.cs.algs4.StdIn;
3import edu.princeton.cs.algs4.StdOut;
4import edu.princeton.cs.algs4.In;
5
6public class ThreeSumFast {
7
8 public static int count(int[] a) {
9 int N = a.length;
10 Arrays.sort(a);
11 int cnt = 0;
12 for (int i = 0; i < N; i++) {
13 for (int j = i+1; j < N; j++) {
14 int k = Arrays.binarySearch(a, -(a[i] + a[j]));
15 if (k > j) cnt++;
16 }
17 }
18 return cnt;
19 }
20
21 public static void main(String[] args) {
22 In in = new In(args[0]);
23 int[] a = in.readAllInts();
24
25 Stopwatch timer = new Stopwatch();
26 int count = count(a);
27 StdOut.println("elapsed time = " + timer.elapsedTime());
28 StdOut.println(count);
29 }
30}
Again, we can simulate the time by,
xxxxxxxxxx
41$ java-algs4 ThreeSumFast 1Kints.txt
2$ java-algs4 ThreeSumFast 2Kints.txt
3$ java-algs4 ThreeSumFast 4Kints.txt
4$ java-algs4 ThreeSumFast 8Kints.txt
The results are,
xxxxxxxxxx
51Name Time Result
21Kints.txt 0.019 70
32Kints.txt 0.090 528
44Kints.txt 0.257 4039
58Kints.txt 1.504 32074
Here, we can spot a big improvement in the time complexity.
The algorithm analysis theory is invented in the book The Art Of Computer Programming by Donald Knuth. It explains the mathematical models for calculating the total running time of a computer program, which should physically be the sum of cost times frequency for all operations.
Now, let’s analyze the cost of a TwoSum
program. In this program, we have,
xxxxxxxxxx
51int count = 0;
2for (int i = 0; i < N; i++)
3 for (int j = i+1; j < N; j++)
4 if (a[i] + a[j] == 0)
5 count++;
Then, the frequency for each operation should be,
xxxxxxxxxx
191// 1 time int count; 1 time int i; N times int j;
2variable declaration N + 2
3
4// 1 time count = 0; 1 time i = 0; N times j = i + 1;
5assignment statement N + 2
6
7// N+1 times i < N; N(N+1)/2 times j < N
8less than compare N+1 + N(N+1)/2 = (N+1)(N+2)/2
9
10// (N-1) + ... + 2 + 1 + 0 times x == 0;
11equal to compare N(N-1)/2
12
13// N(N-1)/2 times a[i]; N(N-1)/2 times a[j];
14array access N(N-1)
15
16// at worst: N(N-1)/2 times count++; N(N-1)/2 times a[i] + a[j];
17// at best: no count++; N(N-1)/2 times a[i] + a[j];
18// ignore i++ and j++;
19increment N(N-1)/2 to N(N-1)
The cost model is to use some basic operations as a proxy for running time and here we can simply choose array access with the cost model N(N-1)
.
The tilde notation ~
is used when we are able to ignore lower order terms, so technically,
When,
Then, for the Brute-Force TwoSum problem, the can be cost model with N inputs can be simplified to N².
xxxxxxxxxx
11N(N-1) ~ N²
The most commonly used notation in algorithm analysis is the big-oh notation O. It also has two friends, big-theta Θ and big-omega Ω.
xxxxxxxxxx
11N² + 2N + 1 ~ Θ(N²)
xxxxxxxxxx
21N² + 2N + 1 ~ O(N²)
22N + 1 ~ O(N²)
xxxxxxxxxx
21N² + 2N + 1 ~ Ω(N²)
2N³ + 2N + 1 ~ Ω(N²)
xxxxxxxxxx
91growth-rate Problem Size
21 any
3logN any
4N 1,000,000,000
5NlogN 100,000,000
6N² 10,000
7N²logN 10,000
8N³ 1,000
92^N 30
131boolean 1
2byte 1
3char 2
4int 4
5float 4
6long 8
7double 8
8char[] 2N + 24
9int[] 4N + 24
10double[] 8N + 24
11char[][] ~ 2 M N
12int[][] ~ 4 M N
13double[][] ~ 8 M N