Monday, October 27, 2008

Synchronising two directory trees - tree-sync 2.4

tree-sync.pl has had a little update thanks to a contribution from Dave Stafford.
  • Added option 'diff' to view only changes

  • Added option 'ignore' to exclude certain extensions

  • Added option 'brief' to remove src/dest directories in report view to make cleaner output

In the 2.4 release I've also added the 'force' option to control read-only file handling.

tree-sync.pl is a bit of a geeky tool, suitable for cases where you want detailed control over a file-based copy/mirror/backup process from the command-line. There are many flashier solutions available these days, but if tree-sync scratches your itch - go for it!

tree-sync.pl continues to be available on CPAN, but from the 2.4 release I have moved the source to github. If you have changes to suggest or bugs to fix, please go ahead and join this project.

Tuesday, October 21, 2008

Rolling Project Euler on Ruby

I first heard about Project Euler last week on the stackoverflow podcast. Michael Pryor (fogcreek co-founder) makes a quick side reference in discussion with Joel Spolsky, Jeff Atwood and the rest of the SO team.

Well I checked it out last week, got hooked and spent most of the weekend earning my "level 1" badge;-)

Aside from dusting off some long-forgotten and arcane knowledge from my youth, I found it a fantastic opportunity to stretch my fundamental ruby chops. In fact, I'd recommend a few questions at Project Euler as a right-of-passage whenever you are learning a new programming language.

I've only been using ruby for a year or so, and in that time thought I had picked up a fair bit. But I was still able to amaze myself at just how many of the problems were knocked over in just 1 or 2 lines with a bit of duck punching and liberal use of blocks with Enumerables.

I'm late to the Project Euler craze, so you will already find many people posting hints for specific questions if you just google. I thought I'd share some of the "common code" I've been building up as I go through the questions.

I put a recent copy of the source up on github for anyone who is interested (MyMath.rb), but what follows a sampling of some of the more interesting pieces.

First thing you will note is that I have written all these "common" routines as extensions to some of the fundamental classes in the ruby library: Integer, Array, String.

It doesn't have to be this way, and for less trivial activities you might be right to be concerned about messing with the behaviour of the standard classes. Maybe I'm still enjoying my ruby honeymoon period, but I do get a thrill out of being able to write things like
1551.palindrome?
=> true


Integer Extensions


It's just so easy to code up simple calculation and introspection routines..

class Integer
# @see project euler #15,20,34
def factorial
(2..self).inject(1) { |prod, n| prod * n }
end

# sum of digits in the number, expressed as a decimal
# @see project euler #16, 20
def sum_digits
self.to_s.split('').inject(0) { |memo, c| memo + c.to_i }
end

# num of digits in the number, expressed as a decimal
# @see project euler #25
def num_digits
self.to_s.length
end

# tests if all the base10 digits in the number are odd
# @see project euler #35
def all_digits_odd?
self.to_s.split('').inject(0) { |memo, s| memo + ( s.to_i%2==0 ? 1 : 0 ) } == 0
end

# generates triangle number for this integer
# http://en.wikipedia.org/wiki/Triangle_number
# @see project euler #42
def triangle
self * ( self + 1 ) / 2
end
end


Prime numbers feature heavily on Project Euler, and I think calculating a prime series was my first lesson on why you can't brute-force everything;-) Enter the Sieve of Eratosthenes and related goodness..
class Integer 
# http://en.wikipedia.org/wiki/Prime_factor
# @see project euler #12
def prime_factors
primes = Array.new
d = 2
n = self
while n > 1
if n%d==0
primes << d
n/=d
else
d+=1
end
end
primes
end

# http://en.wikipedia.org/wiki/Divisor_function
# @see project euler #12
def divisor_count
primes = self.prime_factors
primes.uniq.inject(1) { |memo, p| memo * ( ( primes.find_all {|i| i == p} ).length + 1) }
end

#
# @see project euler #12, 21, 23
def divisors
d = Array.new
(1..self-1).each { |n| d << n if self % n == 0 }
d
end

# @see project euler #
def prime?
divisors.length == 1 # this is a brute force check
end

# prime series up to this limit, using Sieve of Eratosthenes method
# http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
# @see project euler #7, 10, 35
def prime_series
t = self
limit = Math.sqrt(t)
a = (2..t).to_a
n = 2
while (n < limit) do
x = n*2
begin
a[x-2]=2
x+=n
end until (x > t )
begin
n+=1
end until ( a[n-2] != 2 )
end
a.uniq!
end

# @see project euler #23
def perfect?
self == divisors.sum
end

# @see project euler #23
def deficient?
self > divisors.sum
end

# @see project euler #23
def abundant?
self < divisors.sum
end
end


Next we visit the Collatz conjecture and an interesting routine to make numbers "speak english"..
class Integer     
# http://en.wikipedia.org/wiki/Collatz_conjecture
# @see project euler #14
def collatz_series
a = Array.new
a << n = self
while n > 1
if n % 2 == 0
n /= 2
else
n = 3*n + 1
end
a << n
end
a
end

# express integer as an english phrase
# @see project euler #17
def speak
case
when self <20
["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten",
"eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen" ][self]
when self > 19 && self < 100
a = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"][self / 10 - 2]
r = self % 10
if r == 0
a
else
a + "-" + r.speak
end
when self > 99 && self < 1000
a = (self / 100).speak + " hundred"
r = self % 100
if r == 0
a
else
a + " and " + r.speak
end
when self > 999 && self < 10000
a = (self / 1000).speak + " thousand"
r = self % 1000
if r == 0
a
else
a + ( r <100 ? " and " : " " ) + r.speak
end
else
self
end
end
end


Calculating integer partitions is one of my favourites ... a nice, super-fast recursive algorithm. For problems like "how many ways to make $2 in change?"
class Integer 

# calculates integer partitions for given number using array of elements
# http://en.wikipedia.org/wiki/Integer_partition
# @see project euler #31
def integer_partitions(pArray, p=0)
if p==pArray.length-1
1
else
self >= 0 ? (self - pArray[p]).integer_partitions(pArray ,p) + self.integer_partitions(pArray,p+1) : 0
end
end
end


Finally, rotations and palindromes (base 2 or 10): methods that rely on some underlying String routines that come later...
class Integer 
# returns an array of all the base10 digit rotations of the number
# @see project euler #35
def rotations
self.to_s.rotations.collect { |s| s.to_i }
end

# @see project euler #4, 36, 91
def palindrome?(base = 10)
case base
when 2
sprintf("%0b",self).palindrome?
else
self.to_s.palindrome?
end
end
end


Array Manipulations


Array handling is particularly important. Start with some simple helpers, then move onto greatest common factor and a couple of least-common multiple implementations. My favourite here - lexicographic permutations.
class Array

# sum elements in the array
def sum
self.inject(0) { |sum, n| sum + n }
end

# sum of squares for elements in the array
# @see project euler #6
def sum_of_squares
self.inject(0) { |sos, n| sos + n**2 }
end

# @see project euler #17
def square_of_sum
( self.inject(0) { |sum, n| sum + n } ) ** 2
end

# index of the smallest item in the array
def index_of_smallest
value, index = self.first, 0
self.each_with_index {| obj, i | value, index = obj, i if obj<value }
index
end

# removes numbers from the array that are factors of other elements in the array
# @see project euler #5
def remove_factors
a=Array.new
self.each do | x |
a << x if 0 == ( self.inject(0) { | memo, y | memo + (x!=y && y%x==0 ? 1 : 0) } )
end
a
end

# http://utilitymill.com/edit/GCF_and_LCM_Calculator
# @see project euler #5
def GCF
t_val = self[0]
for cnt in 0...self.length-1
num1 = t_val
num2 = self[cnt+1]
num1,num2=num2,num1 if num1 < num2
while num1 - num2 > 0
num3 = num1 - num2
num1 = [num2,num3].max
num2 = [num2,num3].min
end
t_val = num1
end
t_val
end

# http://utilitymill.com/edit/GCF_and_LCM_Calculator
# @see project euler #5
def LCM
a=self.remove_factors
t_val = a[0]
for cnt in 0...a.length-1
num1 = t_val
num2 = a[cnt+1]
tmp = [num1,num2].GCF
t_val = tmp * num1/tmp * num2/tmp
end
t_val
end

# brute force method:
# http://www.cut-the-knot.org/Curriculum/Arithmetic/LCM.shtml
# @see project euler #5
def lcm2
a=self.remove_factors
c=a.dup
while c.uniq.length>1
index = c.index_of_smallest
c[index]+=a[index]
end
c.first
end

# returns the kth Lexicographical permutation of the elements in the array
# http://en.wikipedia.org/wiki/Permutation#Lexicographical_order_generation
# @see project euler #24
def lexicographic_permutation(k)
k -= 1
s = self.dup
n = s.length
n_less_1_factorial = (n - 1).factorial # compute (n - 1)!

(1..n-1).each do |j|
tempj = (k / n_less_1_factorial) % (n + 1 - j)

s[j-1..j+tempj-1]=s[j+tempj-1,1]+s[j-1..j+tempj-2] unless tempj==0
n_less_1_factorial = n_less_1_factorial / (n- j)
end
s
end

# returns ordered array of all the lexicographic permutations of the elements in the array
# http://en.wikipedia.org/wiki/Permutation#Lexicographical_order_generation
# @see project euler #24
def lexicographic_permutations
a=Array.new
(1..self.length.factorial).each { |i| a << self.lexicographic_permutation(i) }
a
end

end


String Helpers


Last but not least, some String methods that just make things so much easier...
class String

# sum of digits in the number
# @see project euler #16, 20
def sum_digits
self.split('').inject(0) { |memo, c| memo + c.to_i }
end

# product of digits in the number
# @see project euler #8
def product_digits
self.split('').inject(1) { |memo, c| memo * c.to_i }
end

#
# @see project euler #4, 36, 91
def palindrome?
self==self.reverse
end

# returns an array of all the character rotations of the string
# @see project euler #35
def rotations
s = self
rots = Array[s]
(1..s.length-1).each do |i|
s=s[1..s.length-1]+s[0,1]
rots << s
end
rots
end

end


With all the above in place - and with the aid of a few brain cells - some deceptively complicated questions (like "How many different ways can £2 be made using any number of coins?") are essentially one-liners:
require 'benchmark'
require 'MyMath'

Benchmark.bm do |r|
r.report {
answer = 200.integer_partitions([200,100,50,20,10,5,2,1])
}
end

Love it;-)

Tuesday, October 07, 2008

Best Oracle OpenWorld 2008 quote so far..

WSJ blogger Ben Worthen on the HP Oracle Exadata Storage Server:
Oracle now has a device that is kind of like an iPod except that it is a lot bigger and a lot more expensive and not as cool.


"... priced at US$4,000 per terabyte of storage, plus the database license costs" Oh my.

Sunday, October 05, 2008

On context paths in Java EE

I was recently involved (tangentially) in an exercise to migrate a tomcat-based JSP application to Java EE packaging, which had me taking a fresh look at the concept of context paths and considering best practices for handling them.

When you package and deploy a Java EE web application, it has a context-root which effectively becomes the path on your application server that the application is available under. For example, the following module definition would setup MyApp to be available under http://server.domain/myapp/
<module>
<web>
<web-uri>MyApp.war</web-uri>
<context-root>myapp</context-root>
</web>
</module>

Context paths make it possible to host many applications on the one server as long as you keep the paths unique. As advised in Build to Spec! Part II:
Always specify a unique context root for every Web application you deploy to avoid naming collisions with applications already deployed.

It is possible to install an application with a context root of "/" but there are two considerations to bear in mind:

  1. Applications servers will usually have a default application already installed under "/" which would need to be removed first.

  2. The reason why the default application exists is to host resources that are not packaged as a web application (which may or may not be a concern, depending on what you are serving).


Problems arise when applications are coded with implicit assumption or explicit reference to the context path they will run under.

This is often the case - as I discovered - when migrating simple JSP applications to Jave EE packaging. Either the app assumes it will run from "/", or it has hard-coded paths under the root.

It can also occur in applications designed to be packaged as a .war or .ear file, if the developer assumes that the context-path will remain the same and does some hard-coded shortcuts. This breaks the Java EE separation of duties design, and prevents the system administrator from chosing to deploy the application on another context path (if, for example, there is a conflict with another application).

What the Specs Say


The JSR 53: JavaTM Servlet 2.3 and JavaServer PagesTM 1.2 Specifications define the context path concept, and some relevant API features.

Firstly there is getContextPath() which allows you to obtain the context path.

There's always been some doubt as to how sendRedirect() should behave though, but now that is cleared up in the 2.3 spec. From Servlet 2.3: New features exposed
And finally, after a lengthy debate by a group of experts, Servlet API 2.3 has clarified once and for all exactly what happens on a res.sendRedirect("/index.html") call for a servlet executing within a non-root context. The issue is that Servlet API 2.2 requires an incomplete path like "/index.html" to be translated by the servlet container into a complete path, but doesn't say how context paths are handled. If the servlet making the call is in a context at the path "/contextpath," should the redirect URI translate relative to the container root (http://server:port/index.html) or the context root (http://server:port/contextpath/index.html)? For maximum portability, it's imperative to define the behavior; after lengthy debate, the experts chose to translate relative to the container root. For those who want context relative, you can prepend the output from getContextPath() to your URI.


How to Retrofit Correct Context Path Handling


Migrating an application to Java EE packaging can be a it of a nightmare if url references are not nicely relative, and avoid any assumptions about the full path to the application.

Obviously, in this situation you probably can't avoid going in to clean up the code at some point.

But there are some tricks that can be used to delay that activity.

I've experimented with using servlet filters to do rewrites on the oubound HTML, and that seems to work fine. The filter intercepts all the output of the application, and can be used to fixup text/html or css using regex replacements, and even change the sendRedirect behaviour if desired. But it does introduce some overhead, and I wouldn't see it as a permanent solution. (see EnforceContextRootFilter-1.0-src.zip for all the lowdown on the servlet filter approach, including code you can use and adapt if it meets your needs)

Saturday, October 04, 2008

restful_authentication and OpenID with Rails 2

Note to self: Yes! managed to navigate the various OpenID resources for rails and managed to successfully setup OpenID with restful_authentication on Rails 2.1.

There are a few tricks to be aware of. Prompted by a question on stackoverflow.com, I thought I would post my notes here.

Installing restful_authentication


Usually, you will be plugging open_id_authentication into your app's identity management framework. One of the most popular - and the one I've been using of late - is restful_authentication.

Getting restful_authentication working seems to require a few tweaks however - at least for the latest version found up on github.

Firstly, I've had no success directly installing it using the plugin script like this:
ruby script/plugin install git://github.com/technoweenie/restful-authentication.git

Rather, I've had to manually clone the github repository and manually setup the plugin directory:
$ git clone git://github.com/technoweenie/restful-authentication.git
$ mv restful-authentication/ myapp/vendor/plugins/restful_authentication
$ rm vendor/plugins/restful_authentication/.gitignore
$ rm -fR vendor/plugins/restful_authentication/.git

Note that I'm cleaning up some of the git bits, and also changing the plugin directory name to use an underscore rather than a hyphen (Rails usually complains to me otherwise).

Setting up restful_authentication is then pretty straight-forward:
$ mkdir lib
$ ruby script/generate authenticated user sessions
$ rake db:migrate

Note: if you don't already have the 'lib' directory in your application, it must be created first, otherwise the generator will fail.

Installing open_id_authentication


Ryan Bates' Railscast on Openid is the best thing I've found to follow. Even though it was recorded with Rails 1.2.3, I've been able to successfully follow the tutorial with Rails 2.1.0. The only point to note is that for:
gem install ruby-openid

I installed 2.1.2, rather than the 1.1.4 used in railscast.

Installing open_id_authentication is then a doddle.
$ ruby script/plugin install open_id_authentication

Follow the Railscast from this point to integrate OpenID with restful_authentication.

Using restful_authentication on heroku


The plugin installation problems mentioned above also mean that you need to use a few tricks to get it working in a heroku-hosted application. I've found it best to clone your heroku app and add the restful_authentication plugin locally, and then git push it back to heroku when done.