Thursday, 14 March 2013

Intersecting Discs Problem in Ruby

So I was looking at the Codility sample problems and came across the following problem.

Given an array A of N integers we draw N discs in a 2D plane, such that i-th disc has center in (0,i) and a radius A[i]. We say that k-th disc and j-th disc intersect, if
and k-th and j-th discs have at least one common point.
Write a function
int number_of_disc_intersections(int[] A);
which given an array A describing N discs as explained above, returns the number of pairs of intersecting discs. For example, given N=6 and
A[0] = 1
A[1] = 5
A[2] = 2
A[3] = 1
A[4] = 4
A[5] = 0
there are 11 pairs of intersecting discs: 0th and 1st 0th and 2nd 0th and 4th 1st and 2nd 1st and 3rd 1st and 4th 1st and 5th 2nd and 3rd 2nd and 4th 3rd and 4th 4th and 5th
so the function should return 11. The function should return -1 if the number of intersecting pairs exceeds 10,000,000. The function may assume that N does not exceed 10,000,000.
The trick here is to do a solution that works in O(N log(N)) time and O(N) space.
There was a ruby solution posted here by "user691307" as follows
 def number_of_disc_intersections(a)  
   event ={|h,k| h[k] = {:start => 0, :stop => 0}}  
   a.each_index {|i|  
     event[i - a[i]][:start] += 1  
     event[i + a[i]][:stop ] += 1  
   sorted_events = (event.sort_by {|index, value| index}).map! {|n| n[1]}
   past_start = 0  
   intersect = 0  
   sorted_events.each {|e|  
     intersect += e[:start] * (e[:start]-1) / 2 +  
            e[:start] * past_start  
     past_start += e[:start]  
     past_start -= e[:stop]  
   return intersect  

Which is a beautiful solution and runs in O(N log(N)) time and O(N) space, as we wanted. The only problem is it is very difficult to understand what it is doing and why. So in this blog post, I hope to explain the above code.
The algorithm is as follows. Imagine the initial array that we are passed in spans out forever on each side. Instead of the individual numbers we are passed in, we want to know how many discs start and stop at each position. Once we can say, for example, at index 4, 2 discs start and 1 disc ends - and do that for every index - we are able to calculate how many intersections there are.
You can see in the first 5 lines of the function it creates a new hash where the value is defaulted to another hash that set start and stop to zero. This allows us to increment using "+= 1" rather than having to check for nils. It then populates the event hash by iterating over the passed in array (a) and incrementing the start value at the lower bounds of the disc, and the stop at the upper bounds of the disc.
sorted_events = (event.sort_by {|index, value| index}).map! {|n| n[1]}

The 6th line of the function looks complicated but all it is doing is sorting by the key of the event hash (the index) and then discarding the keys and just using the values of the hash. But we can't use ".values" on the result of the sort_by, because it returns an array (hashes can't be sorted)

 past_start = 0   
 intersect = 0   
 sorted_events.each {|e|   
  intersect += e[:start] * (e[:start]-1) / 2 +   
     e[:start] * past_start   
  past_start += e[:start]   
  past_start -= e[:stop]  

Now for the really tricky part. We iterate over our sorted array of hashes which tell us how many discs start and stop at each position. There are two cases we need to worry about:
1) how many discs are already going when the new disc starts (e[:start] * past_start)
2) if multiple discs start at the same position (e[:start] * (e[:start]-1) / 2)
The first condition is fairly simple. If there are 3 discs that we're in the middle of and a new disc starts, it intersects with all 3 of them. If there are 3 discs that we're in the middle of, and 2 new discs start, each of them will intersect with the 3 existing ones, making 6 intersections.
The two lines at the bottom keep track of how many discs are currently open at each index.
The second condition is a little trickier. If multiple discs start at the same location we need to use some maths. It's the same problem you probably did in school: If 5 people all shake hands with each other, how many handshakes have there been?
To do this, because it is groups of 2, you have to pick 2. So on the first pick, there are 5 options, then once you've picked that first one, there are only 4 options. So 5 x 4 = 20. But, because when person A shakes person B's hand, it's the same as when person B shakes person A's hand, we have to divide our result by 2, so as not to double count.
This is the same with the intersections. If there are 5 new discs starting at a position, the number of intersections between them is equal to 5x4/2 = 10.

So that's it. The number of intersections at each index is n*(n-1)/2 - for if multiple discs start at that index - plus n*past_starts for when discs are already in progress when we start a new one. Add them all together and you have your solution.
I hope this has made sense to people. If not, please leave a comment and I'll try to address it asap.

Wednesday, 13 March 2013

Finding the Longest Palindrome in Ruby

Non-optimal Solution - best case O(N), worst case O(N^2)

This is a non optimal (see below for optimal solution), but much easier to understand solution.
It uses the following algorithm. At a position of the string, it compares the character at the left and right to see if they're the same. If they are, it looks one character further out on each side and compares them. And so on until it reaches an end of the string or finds the two characters are different. 
It does this at each position of the string, finding the longest palindrome centered around each position, keeping track of which is longest overall.

 def find_longest_palindrome(s)  
  longest = ""  
  0.upto s.length do |i|  
   odd = palindrome_at_position(s, i, i)  
   longest = odd if odd.length > longest.size  
   # handle even length palindromes  
   even = palindrome_at_position(s, i, i+1)  
   longest = even if even.length > longest.size  
 def palindrome_at_position(s, left, right)  
  palindrome = ""  
  while (right < s.length &&  
      left >= 0 &&  
      s[left] == s[right])  
   palindrome = s[left,(right-left+1)]  
   left -= 1  
   right += 1  

Optimal Solution - Linear time O(n)

Here is a linear time solution. Guaranteed to be 2N or fewer comparisons.
This solution is similar to the first solution but it uses findings from previous palindrome length calculations to reduce how much processing it has to do.
It takes advantage of the fact that if you know there is a large palindrome centered around one character, the characters to the immediate right of that character must be at the center of palindromes at least as long as the palindromes centered around characters to the immediate left.
When you're looking at the 'b' at index 3, you calculate that it's the center of a palindrome 7 characters long (spanning out 3 characters in each direction).
So when you go to calculate the palindrome centered around index 4 (the second 'c'), you know that it must be at least as long as the palindrome you've already calculated at index 2 (the first 'c') which is 3 characters long. 
And seeing that those 3 characters don't take you past the end of the palindrome centered around 'b' at index 3, you know that it can only be 3 characters long.

And similarly when you look at index 5 (the third 'b'), you know that its palindrome must be the same as the palindrome at index 1 (the first 'b'), which is only 1 character long.

 def longest_palindrome(str)  
  palindrome_lengths = {}  
  center = right = 0  
  # This gsub changes a string "abc" into "~a~b~c~" 
  # This is done to avoid the problem of even length palindromes    
  processed_str = str.gsub(/(\w|$)/, '~\1')  
  0.upto (processed_str.length - 1) do |i|  
    i_mirror = center - (i - center)  
    if (right > i)  
      palindrome_lengths[i] = [right-i, palindrome_lengths[i_mirror]].min  
      palindrome_lengths[i] = 0  
    while (processed_str[i + 1 + palindrome_lengths[i]] == processed_str[i - 1 - palindrome_lengths[i]])
      palindrome_lengths[i] += 1
    if (i + palindrome_lengths[i] > right)  
      center = i;  
      right = center + palindrome_lengths[i];  
  max = palindrome_lengths.values.max  
  center_index = palindrome_lengths.key(max)  
  str[(center_index - max)/2, max]