## Friday, November 23, 2012

### Minimal Truncation

I saw an interview question on CareerCup.

### Problem Description

Given an array A of positive integers. Convert it to a sorted array with minimum cost. The only valid operation are:
1) Decrement with cost = 1
2) Delete an element completely from the array with cost = value of element

### Algorithm Development

An intuitive idea is to use dynamic programming. The analysis is as follows:
1.  find the min and max of the array int[] a in O(n) time, with n being the size of the array
2. maintain a n*(max-min+1) table memorizing minimal cost (MC): MC(i, k) is the minimal cost of transforming the subarray a[i:] with a lower bound k
3.  set up recursion
1.  base case MC(n-1, k) = 0 if k<=a[n-1] or a[n-1] if k>a[n-1]
2.  given the minimal costs of subarray a[i+1:], estimate MC for a[i:]
1.  if k > a[i], a[i] has to be deleted, otherwise the lowerbound is violated, which leads to  MC(i, k) = a[i] + MC(i+1, k) (denoted by r1). The minimal cost of a[i:] with lower bound k is then the cost of deleting a[i] plus the minimal cost of the subsequence a[i+1:] with the same lower bound.
2.  if k <= a[i], it is safe to always reserve a[i], because any truncated subsequence from a[i+1:] can be improved by prepending a possibly truncated portion of a[i]. However, it is unknown how much a[i] should be truncated to achieve optimality. What we know is the optimal sorted array for a[i:] is a truncated a[i] prepended to a subsequence from a[i+1:]. The key is to enumerate all the possible lower bounds for the subarray a[i+1:], which are the range [k, a[i]]. The lower end of the range is from the lower bound of the current sequence, while the upper end of the range is the largest value that may make a difference for the cost. For lower bound larger than a[i], there is always no cost for truncating a[i]. Moreover, MC(i, K1) < MC(i,K2) if K1 < K2. We hence have:
r2) MC(i, k) = min_{k<=j<=a[i]}{MC(i+1, j) + (a[i] - j)}
The minimal cost is then taken as the minimum of all candidate new lower bounds for subarray a[i+1:].
The cost is then the truncation cost (a[i]-j) plus the cost from subarray a[i+1:].
To make the recursion simpler, we can rewrite r2) as:
r3) MC(i, a[i]) = MC(i+1, a[i])
r4) MC(i, j) = min{MC(i, j-1), a[i]-j + MC(i+1, j)} for j from a[i]-1 to min

4. Finally, the minimal entry in the first column MC(0,k) is the minimal cost. If we need to find the truncation, just search in MC column wise bottom up for the first switch of the costs. If MC(0, k)->MC(0,k+1) is the first switch, then k is the maximal lower bound in a[0:]. Truncate a[0] to k, if smaller than k, remove a[0]. Now search right up in MC and similar for remaining columns.

### Code

// time O(n*max); space O(n*max), reducible to O(n) if only minimal cost is needed
int findMinCost(int[] array){
int min, max, n;
min = findMin(array);// feasible in O(n)
max = findMax(array);// feasible in O(n)
n = array.length;

// memory table for estimates
int[][] mc = new int[n][max+1];// wasted a little bit memory; we only use mc[i][min:max]

// initialize last column
for(int k = min; k<=array[n-1]; k++){
mc[n-1][k]=0;
}
for(int k = a[n-1] + 1; k <= max; k++){
mc[n-1][k] = a[n-1];
}

// fill up table column by column backwards
for(int i = n-2; i >= 0; i--){
for(int k = max; k>array[i]; k--){
mc[i][k] = array[i]+mc[i+1][k];
}

mc[i][array[i]] = mc[i+1][array[i]];

for(j = array[i]-1; j>=min; j--){
mc[i][j] = Math.min(mc[i][j-1], array[i]-j+mc[i+1][j]);
}
}

return mc[0][min];
}

/**
* Time: O(n+max)
*/
Integer[] extractTruncatedArray(int[] array, int[][] mc, int min, int max){
ArrayList list = new ArrayList();
int lowerBound = min;//initial lower bound
for(int i = 0; i  < array.length; i++) {// extract truncated elment at i
// find first switch
while(lowerBound < max && mc[i][lowerBound] == mc[i][lowerBound+1]){
lowerBound++;
}
if(a[i] >= lowerBound){//truncate a[i]