Wednesday, November 28, 2012

Fast Exponentiation

Computing the power of a number is quite common. A simple form appears frequently in coding interviews is to compute the power of integers.
Problem: given two integers n and k, compute the number \( n^k \).
One well-known algorithm is to perform exponentiation by repeated squaring. I saw the following pseudo code in Udi Manber's book:
Input: n and k 
Output: P
P := n;
j := k;
while j > 1 do
  P := P*P;
  if j mod 2 = 1 then
    P := P*n;
  j = j div 2;
The code is wrong, as can be seen by testing k = 5, n = 2. The above program produces 16, while 32 is the expected result. The simplest implementation of the repeated squaring is to use recursion, which is difficult to be wrong:
long power(int n, int k) {
  assert n > 0 && k > 0;
  if (k == 1) return n;

  if (k & 1 == 1) { // odd
    return n * power(n, k-1);
  else {
    long half = power(n, k/2);
    return half*half;
The pseudo code we presented at the beginning is iterative, and hence more efficient. However, as no correct loop invariant is maintained in it, it fails to compute the desired value. A simple intuition to compute the \(n^k\) is to scan the exponent k bit-wise from low to high. Every time we check whether the current bit i is 1, if so, we multiply the corresponding \(2^i\). There are all together \(log_2(k)\) bits.
long power_iterative(int n, int k) {
  assert n > 0 && k > 0;
  int b = k; // high bits to be processed
  long result = 1;
  long base = n;
  while (b != 0) {
    // invariant: result*(base^b) == n^k
    if (b & 1 == 1) {
      result *= base;
    base *= base;
    b >>= 1;
  return result;
If the arguments are not guaranteed to be positive, we have to take extra care.
double power(int n, int k) {
  if (n == 0 && k <= 0) 
    throw new RuntimeException("For 0, only positive exponent allowed!");
  if (k == 0) return 1;
  if (n == 0) return 0; 

  int sign = 1; 
  if (n < 0) {
    sign = (k & 1) == 1 ? -1 : 1;
    n = 0 - n;

  long result = sign;
  result *= power_iterative(n, Math.abs(k));
  return k > 0 ? result : 1.0/result;
One application of fast exponentiation is to compute Fibonacci number. We recall that the Fibonacci number can be expressed in a matrix form: $$ \begin{bmatrix} f(n+1) \\ f(n) \end{bmatrix}= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\times \begin{bmatrix} f(n) \\ f(n-1) \end{bmatrix} $$ If we denote by A the matrix: \( \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \), then we know $$\begin{bmatrix} f(n+1) \\ f(n) \end{bmatrix} = A^{n}\times\begin{bmatrix} f(1) \\ f(0) \end{bmatrix}$$
The algorithm is in O(log(n)), while an iterative accumulation algorithm takes O(n).
class Matrix{
  long a, b , c, d;
  Matrix(long i , long j, long k, long l) {a=i; b=j; c=k; d=l;}
  Matrix(Matrix m) {a = m.a; b = m.b; c = m.c; d = m.d;}
Matrix multiply (Matrix n, Matrix m) {
   return new Matrix(n.a*m.a+n.b*m.c, n.a*m.b+n.b*m.d, n.c*m.a+n.d*m.c, n.c*m.b+n.d*m.d);
Matrix fastExponentiation(Matrix m, int k) {
  assert k > 0;
  Matrix base = new Matrix(m);
  Matrix result = new Matrix(1, 0, 0, 1);
  int b = k;
  while (b != 0) {
    if(b & 1 == 1) result = multiply(result, base);
    base = multiply(base, base);
    b >>= 1;
long fibonacci(int n) {
  assert n >= 0;
  if (n == 0 || n== 1) return 1;
  Matrix m = new Matrix(1, 1, 1, 0);
  Matrix power = fastExponentiation(m, n);
  return power.c;

Friday, November 23, 2012

Triangle in Undirected Graph


Given an undirected graph, design a O(V*E) algorithm to detect whether there is a triangle in the graph or not.


The idea of the algorithm comes from Manber's book. The central observation is that in order to form a triangle, say (i, j, k), then the corresponding rows of the first two vertices i and j should have both 1 on the k-th column. We can simply do a linear intersection to check this. However, in addition, we have to check that vertex i appears in the j-th row and vice versa. Using adjacency matrix, the complexity is then O(V^3), with V being the number of vertices. For sparse matrix, a O(V*E) algorithm is more efficient. Instead of using adjacency matrix, we now use adjacency lists. We can do the intersection using a procedure similar to merge-sort. More specifically, we perform the following:
  1. sort all the adjacency lists
  2. for each two pair of vertices
    1. intersects the edges to find common end points
    2. meanwhile check whether the pair is an edge
To see such an algorithm is O(V*E), we have to notice that each adjacency list has been intersected with all other V-1 adjacency lists. If we denote by \(L_i\) the length of the adjacency list incident on node i, then the total intersection time is: $$\Sigma_i (V-1)L_i = (V-1)\Sigma_i L_i = (V-1)E$$ We also notice that the cost of sorting the adjacency lists are: $$\Sigma_i L_ilog(L_i) \leq \Sigma_i L_ilog(E) = ElogE$$ To sum up the complexity is O(ElogE+V*E) = O(V*E).


class Graph{
  List<Integer > [] adjLists; // adjacency lists
  int n; // size of graph

boolean containsTriangle(Graph g) {
  // sort all adjacency lists in arrays
  Integer[][] arrays = new Integer[g.n][];
  for (int i = 0; i < g.n, i++) {
    arrays[i] = g.adjLists[i].toArray(new Integer[0]);
  // iterate all pairs
  for (int i = 0; i < g.n; i++) {
   for (int j = i + 1; j < g.n; j++) {
     // test intersection and connectivity
     int pi = 0;
     int pj = 0;
     boolean intersected = false;
     while (pi < arrays[i].length && pj < arrays[j].length) {
       if (arrays[i][pi] == arrays[j][pj]) {
         intersected = true;
       else if (arrays[i][pi] < arrays[j][pj]) {
       else { 

     // check connectivity by checking whether i is in list of j
     boolean connected = false;
     for (Integer k : arrays[j]) {
       if (k == i) { 
         connected = true;
       else if (k > i) {

     if (intersected && connected) return true; // early return
  return false;

Minimal Truncation

I saw an interview question on CareerCup.

Problem Description

 Given an array A of positive integers. Convert it to a sorted array with minimum cost. The only valid operation are:
1) Decrement with cost = 1
2) Delete an element completely from the array with cost = value of element

Algorithm Development

An intuitive idea is to use dynamic programming. The analysis is as follows:
  1.  find the min and max of the array int[] a in O(n) time, with n being the size of the array
  2. maintain a n*(max-min+1) table memorizing minimal cost (MC): MC(i, k) is the minimal cost of transforming the subarray a[i:] with a lower bound k
  3.  set up recursion
    1.  base case MC(n-1, k) = 0 if k<=a[n-1] or a[n-1] if k>a[n-1]
    2.  given the minimal costs of subarray a[i+1:], estimate MC for a[i:]
      1.  if k > a[i], a[i] has to be deleted, otherwise the lowerbound is violated, which leads to  MC(i, k) = a[i] + MC(i+1, k) (denoted by r1). The minimal cost of a[i:] with lower bound k is then the cost of deleting a[i] plus the minimal cost of the subsequence a[i+1:] with the same lower bound.
      2.  if k <= a[i], it is safe to always reserve a[i], because any truncated subsequence from a[i+1:] can be improved by prepending a possibly truncated portion of a[i]. However, it is unknown how much a[i] should be truncated to achieve optimality. What we know is the optimal sorted array for a[i:] is a truncated a[i] prepended to a subsequence from a[i+1:]. The key is to enumerate all the possible lower bounds for the subarray a[i+1:], which are the range [k, a[i]]. The lower end of the range is from the lower bound of the current sequence, while the upper end of the range is the largest value that may make a difference for the cost. For lower bound larger than a[i], there is always no cost for truncating a[i]. Moreover, MC(i, K1) < MC(i,K2) if K1 < K2. We hence have:
        r2) MC(i, k) = min_{k<=j<=a[i]}{MC(i+1, j) + (a[i] - j)}
        The minimal cost is then taken as the minimum of all candidate new lower bounds for subarray a[i+1:].
        The cost is then the truncation cost (a[i]-j) plus the cost from subarray a[i+1:].
        To make the recursion simpler, we can rewrite r2) as:
        r3) MC(i, a[i]) = MC(i+1, a[i])
        r4) MC(i, j) = min{MC(i, j-1), a[i]-j + MC(i+1, j)} for j from a[i]-1 to min
  4. Finally, the minimal entry in the first column MC(0,k) is the minimal cost. If we need to find the truncation, just search in MC column wise bottom up for the first switch of the costs. If MC(0, k)->MC(0,k+1) is the first switch, then k is the maximal lower bound in a[0:]. Truncate a[0] to k, if smaller than k, remove a[0]. Now search right up in MC and similar for remaining columns.


// time O(n*max); space O(n*max), reducible to O(n) if only minimal cost is needed
int findMinCost(int[] array){
    int min, max, n;
    min = findMin(array);// feasible in O(n)
    max = findMax(array);// feasible in O(n)
    n = array.length;
    // memory table for estimates
    int[][] mc = new int[n][max+1];// wasted a little bit memory; we only use mc[i][min:max]
    // initialize last column
    for(int k = min; k<=array[n-1]; k++){
    for(int k = a[n-1] + 1; k <= max; k++){
        mc[n-1][k] = a[n-1];

    // fill up table column by column backwards
    for(int i = n-2; i >= 0; i--){
        for(int k = max; k>array[i]; k--){
            mc[i][k] = array[i]+mc[i+1][k];
        mc[i][array[i]] = mc[i+1][array[i]];
        for(j = array[i]-1; j>=min; j--){
            mc[i][j] = Math.min(mc[i][j-1], array[i]-j+mc[i+1][j]);
    return mc[0][min];

* Time: O(n+max)
Integer[] extractTruncatedArray(int[] array, int[][] mc, int min, int max){
    ArrayList list = new ArrayList();
    int lowerBound = min;//initial lower bound
    for(int i = 0; i  < array.length; i++) {// extract truncated elment at i
      // find first switch
      while(lowerBound < max && mc[i][lowerBound] == mc[i][lowerBound+1]){
      if(a[i] >= lowerBound){//truncate a[i]
      }//otherwise a[i] is deleted
    return list.toArray(new Integer[0]);

Tuesday, November 20, 2012

Rang Minimum Query

Range minimum query is the problem to answer algorithmically the minimum of an arbitrary range in the form of [lo, hi], given a large integer array a[0:n). Of course without pre-processing, each query is O(n), which is unacceptable if the queries are frequent. Another straightforward way is to memorize the answer to all ranges and perform a table lookup for the answer. Though this leads to an O(1) query answering operation, the pre-processing takes \(O(n^2)\) time. More problematic, the space complexity is also \(O(n^2)\). For an array to an order of \(10^6\), the space consumption already exceeds the memory size of normal commodity machines nowadays. Our goal is derive an algorithm that has a fast enough query answering time and meanwhile is not too expensive regarding pre-processing.

We first identify the API of the problem:
void preprocess(int[] a);

int min(int lo, int hi);

We denote the complexity explicitly into two parts: pre-processing and query answering. For instance, the direct processing without pre-processing is of complexity \(O(1)-O(n)\), whereas the memorizing-all approach is of complexity \(O(n^2)-O(1)\).

1.Disjoint Segment Algorithm \(O(n)-O(\sqrt{n})\)

The first algorithm comes from an interview. The idea is to divide the whole array into segments and maintain a summary (in our case, the minimum) of the segment. When answering a query, we need to gather all the segments overlapping with the query range and return an answer. Because the covered segments are already aggregated during the pre-processing time, the computation is saved. One question comes into mind: how do we determine the length of a segment. We notice that a range may cover several segments completely, and overlap with at most two segments additionally. Therefore, we can optimize the estimated cost of a query, depending on whether we opt for worst case or average case. As for average case, we can simply choose each segment to be of length \(\sqrt{n}\).

int[] segmentMin; 
int segmentLen;

void preprocess(int[] a) { 
  assert a != null && a.length != 0; 
  int len = a.length;
  segmentLen = Math.sqrt(len);

  segmentMin = new int[Math.ceil(len/segmentLen)];

  for (int i = 0; i*segmentLen < len; i++) {
       // invariant: already processed i segments
       int min = a[i*segmentLen];
       int upper = Math.min((i+1)*segmentLen, a.length);
       for (int j = i*segmentLen + 1; j < upper; j++) {
           min = Math.min(min, a[j]);
       // assert min == range_min[i*segmentLen,
       //               (i+1)*segmentLen);
       segmentMin[i] = min;

int min(int lo, int hi) {
  assert hi > = lo;
  int rangeMin = a[lo];

  // assert segmentLen != 0;
  int startSegment = Math.ceil(lo/(float)segmentLen);
  int endSegment = Math.floor(hi/(float)segmentLen);

  // collect summaries from fully covered segments
  for (int i = startSegment; i <  endSegment; i++) {
      rangeMin = Math.min(rangeMin, segmentMin[i];
  if (lo % segmentLen != 0) { // there is a partial overlap
    for (int i = lo; i < segmentLen*startSegment; i++) {
      rangeMin = Math.min(rangeMin, a[i]);
  if (hi % segmentLen != 0) { // partial overlap
    for (int i = endSegment*segmentLen; i <= hi; i++) {
      rangeMin = Math.min(rangeMin, a[i]);

  return rangeMin;

In the preprocessing phase, we scan the whole array. Therefore, the complexity is O(n). Each query will collect at most \(O(\sqrt{n})\) summaries, and scan two segments, each of which is of length \(\sqrt{n}\). Therefore, the query is performed in \(O(\sqrt{n})\) time.

2. Variable Length Summary Algorithm \(O(nlog(n))-O(1)\)

I saw this algorithm in Rujia Liu's book on OI. Compared to the disjoint segment algorithm, this algorithm stores more summaries on overlapping segments. The rise in the number of summaries gives better query answering performance. In one sentence, we store at each index i the range minimum in a segment of length of the form \(2^j\). As there are n positions in the array, and the number of different length options are log(n), we are able to store all the summary in a table of the size nlog(n). When we need to process a query [lo, hi], we first determine the length of the range, say L. Let k be \(\lfloor log(L) \rfloor\), the range minimum is then minimum of the two summaries at positions (lo, k), (hi + 1- L/2, k).
We now give the complete code in Java.
int[][] summary; // table of the form summary[beginIndex][lengthIndex]
int length; // array length
int level; // maximal exponent of a range length

void preprocess(int[] a)  {
  assert a != null && a.length > 0;
  length = a.length;
  int level = Math.ceil(Math.log(length)/Math.log(2));
  summary = new int[length][level+1];
  for (int i = 0; i < length; i++) {
    summary[i][0] = a[i];

  // build summary bottom-up
  for (int l = 1; l < level + 1; l++) {
    // invariant: already built l levels
    for (int i = 0; i < length; i++) {
      summary[i][l] = Math.min(getSummary(i, l-1), getSummary(i + 1<<(l-1), l-1));
// handle overflow index
int getSummary(int pos, int level) {
  if (pos >= length) {
    pos = length - 1;
  return summary[pos][level];

int min(int lo, int hi) {
  assert hi >= lo;
  if (hi == lo) return summary[lo][0]; // a single element
  int rangeLen = hi+1-lo;
  assert rangeLen > 1;
  int halfRangeLevel = Math.ceil(Math.log(rangeLen)/Math.log(2)) - 1;
  assert halfRangeLevel >= 0;
  return Math.min(getSummary(lo, halfRangeLevel),
                  getSummary(hi + 1 - (1 << halfRangelevel), halfRangelevel)
Since each range query only incurs two table lookup operations, we know it is O(1). The crux of O(nlog(n)) preprocessing time lies in the fact we employ a bottom-up dynamic programming computation for the summaries.

3. Hierarchical Summary Algorithm \(O(n)-O(log(n))\)

This algorithm is inspired by an exercise in Udi Manber's book on algorithms. The basic idea is to organize hierarchical summaries in a tree, with the array elements as the leaves. Building the tree bottom-up requires O(n) time and space, while query requires O(log(n)) time to navigate the tree. Different from the disjoint summary, the summaries are overlapping. Furthermore, also different from the variable length summary, we now build summaries in a hierarchy in which a parent summary covers the range of two children. The summaries now form a tree. In order to make the bottom-up building convenient, we use linked lists to memorize nodes on the same level.
class Node {
  int min; // range min
  int lo, hi; // remembering range [lo, hi)
  Node left, right;
  Node(int l, int h, int m, Node left, Node right) {
    lo = l; hi = h; min = m; this.left = left, this.right = right;
  Node (in l, int h, int m) {
    lo = l; hi = h; min = m; 

Node root;

void preprocess(int [] a) {
  assert a != null && a.length > 0;
  int level = Math.ceil(Math.log(a.length)/Math.log(2));
  LinkedList [] nodesByLevel = new LinkedList[level + 1];
  nodesByLevel[0] = new LinkedList();
  for (int i = 0; i < a.length; i++) {
    nodesByLevel[0].add(new Node(i, i+1, a[i]));

  for (int l = 1; l < level + 1; l++) {
    // invariant: already processed l levels
    // invariant: level l summarizes range of length 2^l (except for boundaries)
    LinkedList pre = nodesByLevel[l-1];
    nodesByLevel[l] = new LinkedList(); // current level
    Node first = null, second = null;
    Iterator it = pre.iterator();
    while(it.hasNext()) {
      first =;
      if (it.hasNext()) {
        second =;
      int lo = first.lo;
      int hi = second == null ? first.hi : second.hi;
      int min = second == null ? first.min : Math.min(first.min, second.min);
      nodesByLevel[l].add(new Node(lo, hi, min, first, second));
  root = nodesByLevel[level];

int min(int lo, int hi) {
  return min(root, lo, hi+1);

// internally we use [lo, hi)
int min(Node node, int lo, int hi) [
  if (node == null) return Integer.MAX_VALUE;
  if (!isOverlapping(node, lo, hi)) {
    return Integer.MAX_VALUE;
  else if ( isCovered(node, lo, hi) ) {
    return node.min;

  // assert node range partially overlaps with query range
  // or query range covered by node range
  int leftMin = min(node.left, lo, hi);
  int rightMin = min(node.right, lo, hi);

  return Math.min(leftMin, rightMin);

// is the node range overlapping with [lo, hi)
boolean isOverlapping(Node node, int lo, int hi) {
  assert node != null;
  return node.hi > lo && node.lo < hi;
// Is the range of the node covered by [lo, hi)
boolean isCovered(Node node, int lo, int hi) {
  assert node != null;
  return lo <= node.lo && hi >= node.hi; 
First, the preprocessing is in O(n), because the tree has O(n) nodes, and to create each node, we only need constant time. To see the query is in O(log(n)) is more involved. If a node range is fully covered by a query or disjoint with a query, then it directly returns and incurs no additional cost. The problem arises when a query partially overlaps with both children of a node, which perform two recursions. The main observation is a query can partially overlaps with two children of a node only once, after which only one out of disjoint or fully covered can happen. More intuitively, the query processing navigates the summary tree to find the two ends without exploring internal nodes. As the tree is of height O(log(n)), the query is also O(log(n)).