
Sunday, June 23, 2013

Design thinking is not rocket science

OH on the ABC Radio National By Design podcast (00:56): In the field: Paul Bennet
For us the idea of small ideas that people can actually connect together and actually implement are very big ideas.

And I'm sure you've heard people describe design thinking as sort of a combination of rocket science, string theory and calculus.

It isn't. It's not rocket science at all. It's actually very very straight-forward.

It's looking in the world, being inspired by people, co-creating with them, prototyping and then iterating. And it has to be impactful. It has to work.

Writing simple ruby utilities for Google IMAP + OAuth 2.0

(blogarhythm ~ Unpretty/Fanmail: TLC)

There are some good ruby gems available for dealing with OAuth 2.0 and talking to Google APIs, for example:
  • google-api-client is the official Google API Ruby Client, which makes it trivial to discover and access supported APIs.
  • oauth2-client provides generic OAuth 2.0 support that works not just with Google
  • gmail_xoauth implements XOAUTH2 for use with Ruby Net::IMAP and Net::SMTP
  • gmail provides a rich Ruby-esque interface to GMail but you need to pair it with gmail_xoauth for OAuth 2 support (also seems that it's in need of a new release to merge in various updates and extensions people have been working on)

For the task I had at hand, I just wanted something simple: connect to a mailbox, look for certain messages, download and do something with the attachments and exit. It was going to be a simple utility to put on a cron job.

No big deal. The first version simply used gmail_xoauth to enable OAuth 2.0 support for IMAP, and I added some supporting routines to handle access_token refreshing.

It worked fine as a quick and dirty solution, but had a few code smells. Firstly, too much plumbing code. But most heinously - you might have seen this yourself if you've done any client utilities with OAuth - it used the widely-recommended Python script to orchestrate the initial authorization. For a ruby tool!

Enter the GmailCli gem

So I refactored the plumbing into a new gem called gmail_cli, and it is intended for one thing: to be a super-simple way to whip up utilities that talk to Google IMAP, providing all the OAuth 2.0 support you need. It actually uses google-api-client and gmail_xoauth under the covers for the heavy lifting, but wraps them up in a neat package with the simplest interface possible. Feel free to go use and fork it!

With gmail_cli in your project, there are just 3 things to do:

  1. If you haven't already, create your API project credentials in the Google APIs console (on the "API Access" tab)
  2. Use the built-in rake task or command-line to do the initial authorization. You would normally need to do this only once for each deployment:
    $ rake gmail_cli:authorize client_id='id' client_secret='secret'
    $ gmail_cli authorize --client_id 'id' --client_secret 'secret'
  3. Use the access and refresh tokens generated in step 2 to get an IMAP connection in your code. This interface takes care of refreshing the access token for you as required each time you use it:
    # how you store or set the credentials Hash is up to you, but it should have the following keys:
    credentials = {
      client_id:     'xxxx',
      client_secret: 'yyyy',
      access_token:  'aaaa',
      refresh_token: 'rrrr',
      username:      ''
    }
    imap = GmailCli.imap_connection(credentials)
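
From there the connection can be used like any other IMAP session. Here is a minimal sketch of the "look for certain messages" step. It assumes the object returned by GmailCli.imap_connection behaves like a standard Net::IMAP connection (responding to select, uid_search and uid_fetch). The helper name, mailbox and search criteria are illustrative only and not part of gmail_cli:

```ruby
# Hypothetical helper, not part of gmail_cli: returns the raw RFC 822 source
# of each message in `mailbox` that matches the IMAP search `criteria`.
# `imap` is assumed to respond to select, uid_search and uid_fetch in the
# same way a Net::IMAP connection does.
def fetch_raw_messages(imap, criteria, mailbox = 'INBOX')
  imap.select(mailbox)
  uids = imap.uid_search(criteria)
  return [] if uids.empty?

  # Fetching 'RFC822' returns the whole message and marks it as seen.
  # Array() guards against a nil result when nothing is returned.
  Array(imap.uid_fetch(uids, 'RFC822')).map { |data| data.attr['RFC822'] }
end

# Possible usage (requires real credentials, so not run here):
#   imap = GmailCli.imap_connection(credentials)
#   fetch_raw_messages(imap, ['UNSEEN', 'SUBJECT', 'invoice']).each do |raw|
#     # hand the raw message to a mail parser to pull out attachments
#   end
```

Pulling attachments out of the raw source is left to whichever mail parsing library you prefer.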

A Better Way?

Polling a mailbox is a terrible thing to have to do, but sometimes network restrictions or the architecture of your solution make it the best viable option. Much better is to react to mail that gets pushed to you as it is delivered.

I've written before about Mandrill, which is the transactional email service from the same folks who do MailChimp. I kinda love it;-) It is perfect if you want to get inbound mail pushed to your application instead of polling for it. And if you run Rails, I really would encourage you to check out the mandrill-rails gem - it adds Mandrill inbound mail processing to my Rails apps with just a couple of lines of code.

Tuesday, June 18, 2013

Ruby Tuesday

(blogarhythm ~ Ruby - Kaiser Chiefs)
@a_matsuda convinced us to dive into Ruby 2.0 at RedDotRubyConf, so I guess this must be the perfect day of the week for it!

Ruby 2.0.0 is currently at p195, and we heard at the conference how stable and compatible it is.

One change that may catch us out, if we do much multilingual work that's not already Unicode, is that Ruby 2.0 now assumes UTF-8 encoding for source files. So the special "encoding: utf-8" marker becomes redundant, but a file without any encoding marker can behave differently in 2.0.0 than in earlier versions:
$ cat encoding_binary.rb 
s = "\xE3\x81\x82"
p str: s, size: s.size
$ ruby -v encoding_binary.rb 
ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-darwin11.4.2]
{:str=>"あ", :size=>1}
$ ruby -v encoding_binary.rb 
ruby 1.9.3p429 (2013-05-15 revision 40747) [x86_64-darwin11.4.2]
{:str=>"\xE3\x81\x82", :size=>3}
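
If a script really cares whether it is counting bytes or characters, one option is to say so explicitly instead of relying on the default source encoding. A small sketch (the helper name is made up for illustration):

```ruby
# Hypothetical helper: report how many bytes and how many UTF-8 characters
# a string holds, independent of the source file's default encoding.
def byte_and_char_counts(str)
  {
    # Tagged as binary, Ruby counts individual bytes.
    bytes: str.dup.force_encoding(Encoding::BINARY).size,
    # Tagged as UTF-8, Ruby counts characters.
    chars: str.dup.force_encoding(Encoding::UTF_8).size
  }
end

p byte_and_char_counts("\xE3\x81\x82")
```

For the three-byte string used above, this should report 3 bytes and 1 character whichever of the two Ruby versions runs it.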

Quickstart on MacOSX with RVM

I use rvm to help manage various Ruby installs on my Mac, and trying out new releases is exactly the time you want its assistance to prevent screwing up your machine. There were only two main things I needed to take care of to get Ruby 2 installed and running smoothly:
  1. Update rvm so it knows about the latest Ruby releases
  2. Update my OpenSSL installation (it seems 1.0.1e is required although I haven't found that specifically documented anywhere)
Here's a rundown of the procedure I used in case it helps (note, I am running MacOSX 10.7.5 with Xcode 4.6.2). First I updated rvm and attempted to install 2.0.0:
$ rvm get stable
# => updated ok
$ rvm install ruby-2.0.0
Searching for binary rubies, this might take some time.
No binary rubies available for: osx/10.7/x86_64/ruby-2.0.0-p195.
Continuing with compilation. Please read 'rvm mount' to get more information on binary rubies.
Installing requirements for osx, might require sudo password.
-bash: /usr/local/Cellar/openssl/1.0.1e/bin/openssl: No such file or directory
Updating certificates in ''.
mkdir: : No such file or directory
mkdir: : No such file or directory
Can not create directory '' for certificates.
Not good!!! What's all that about? Turns out to be just a very clumsy way of telling me I don't have OpenSSL 1.0.1e installed.

I already have OpenSSL 1.0.1c installed using brew (so it doesn't mess with the MacOSX system-installed OpenSSL), so updating is simply:
$ brew upgrade openssl
==> Summary
 /usr/local/Cellar/openssl/1.0.1e: 429 files, 15M, built in 5.0 minutes
So then I can try the Ruby 2 install again, starting with the "rvm requirements" command to first make sure all pre-requisites are installed:
$ rvm requirements
Installing requirements for osx, might require sudo password.
Tapped 41 formula
Installing required packages: apple-gcc42.................
Updating certificates in '/usr/local/etc/openssl/cert.pem'.
$ rvm install ruby-2.0.0
Searching for binary rubies, this might take some time.
No binary rubies available for: osx/10.7/x86_64/ruby-2.0.0-p195.
Continuing with compilation. Please read 'rvm mount' to get more information on binary rubies.
Installing requirements for osx, might require sudo password.
Certificates in '/usr/local/etc/openssl/cert.pem' already are up to date.
Installing Ruby from source to: /Users/paulgallagher/.rvm/rubies/ruby-2.0.0-p195, this may take a while depending on your cpu(s)
OK, this time it installed cleanly as I can quickly verify:
$ rvm use ruby-2.0.0
$ ruby -v
ruby 2.0.0p195 (2013-05-14 revision 40734) [x86_64-darwin11.4.2]
$ irb -r openssl
2.0.0p195 :001 > OpenSSL::VERSION
 => "1.1.0"
2.0.0p195 :002 > OpenSSL::OPENSSL_VERSION
 => "OpenSSL 1.0.1e 11 Feb 2013"
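
The same check can be scripted rather than typed into irb, which might be handy in a setup or deploy script. A rough sketch; the helper name and the expected version prefix are assumptions for illustration:

```ruby
require 'openssl'

# Hypothetical helper: true when the OpenSSL library Ruby is linked against
# reports a version string starting with the expected prefix.
def linked_openssl?(expected_prefix)
  OpenSSL::OPENSSL_VERSION.start_with?(expected_prefix)
end

puts "Ruby #{RUBY_VERSION} linked against #{OpenSSL::OPENSSL_VERSION}"
warn 'Unexpected OpenSSL build' unless linked_openssl?('OpenSSL 1.0.1e')
```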

Saturday, June 15, 2013

Optimising presence in Rails with PostgreSQL

(blogarhythm ~ Can't Happen Here - Rainbow)
It is a pretty common pattern to branch depending on whether a query returns any data - for example to render a quite different view. In Rails we might do something like this:
query = User.where(deleted_at: nil).and_maybe_some_other_scopes
if results = query.presence
  results.each {|row| ... }
else
  # do something else
end
When this code executes, we issue at least 2 database requests: one to check presence, and another to retrieve the data. Running this at the Rails console, we can see the queries logged as they execute, for example:
(0.9ms)  SELECT COUNT(*) FROM "users" WHERE "users"."deleted_at" IS NULL
 User Load (15.2ms)  SELECT "users".* FROM "users" WHERE "users"."deleted_at" IS NULL
This is not surprising since under the covers, presence (or present?) ends up calling count, which must do the database query (unless you have already accessed/loaded the results set). And 0.9ms doesn't seem too high a price to pay to determine if you should even try to load the data, does it?

But when we are running on PostgreSQL in particular, we've learned to be leery of COUNT(*) due to its well-known performance problems. In fact I first started digging into this question when I started seeing expensive COUNT(*) queries show up in NewRelic slow transaction traces. How expensive COUNT(*) actually is depends on many factors including the complexity of the query, availability of indexes, size of the table, and size of the results set.

So can we improve things by avoiding the COUNT(*) query? Assuming we are going to use all the results anyway, and we haven't injected any calculated columns in the query, we could simply to_a the query before testing presence i.e.:
query = User.where(deleted_at: nil).and_maybe_some_other_scopes
if results = query.to_a.presence
  results.each {|row| ... }
else
  # do something else
end
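
The same load-once idea can also be written in plain Ruby without leaning on presence. A sketch, assuming only that the query object responds to to_a; rows_or_nil is a hypothetical helper name, and a plain Array stands in for the scoped query:

```ruby
# Hypothetical helper: load the rows once and return them, or nil when there
# are none. The emptiness test runs against the loaded Array, so no separate
# COUNT(*) query is needed to decide which branch to take.
def rows_or_nil(query)
  rows = query.to_a
  rows.empty? ? nil : rows
end

# Usage, with an Array standing in for a scoped query:
if (results = rows_or_nil([{ name: 'alice' }, { name: 'bob' }]))
  results.each { |row| puts row[:name] }
else
  puts 'no rows'
end
```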

I ran some benchmarks comparing the two approaches with different kinds of queries on a pretty well-tuned system and here are some of the results:
Query                                                      | Using present? | Using to_a | Faster By
10k indexed queries returning 1 / 1716 rows                | 17.511s        | 10.938s    | 38%
4k complex un-indexed queries returning 12 / 1716 rows     | 23.603s        | 15.221s    | 36%
4k indexed queries returning 1 / 1763218 rows              | 22.943s        | 20.924s    | 9%
10 complex un-indexed queries returning 15 / 1763218 rows  | 23.196s        | 14.072s    | 40%

Clearly, depending on the type of query we can gain up to 40% performance improvement by restructuring our code a little. While my aggregate results were fairly consistent over many runs, the performance of individual queries did vary quite widely.
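
The benchmark harness itself isn't shown here, but a comparison along these lines could be put together with Ruby's standard Benchmark module. This is only a sketch: compare_runs and faster_by are hypothetical names, and the lambdas you pass in would wrap your own queries:

```ruby
require 'benchmark'

# Percentage by which `candidate` seconds improves on `baseline` seconds,
# rounded to the nearest whole percent.
def faster_by(baseline, candidate)
  return 0 if baseline.zero?
  ((baseline - candidate) / baseline.to_f * 100).round
end

# Hypothetical harness: time `iterations` calls of each callable and return
# both totals (in seconds) plus the relative improvement.
def compare_runs(iterations, baseline, candidate)
  base_time = Benchmark.realtime { iterations.times { baseline.call } }
  cand_time = Benchmark.realtime { iterations.times { candidate.call } }
  { baseline: base_time, candidate: cand_time, faster_by: faster_by(base_time, cand_time) }
end

# In a Rails console the callables might look like this (not run here).
# Build the relation inside each lambda so every iteration starts unloaded:
#   with_presence = -> { r = User.where(deleted_at: nil).presence; r.each { |row| row } if r }
#   with_to_a     = -> { r = User.where(deleted_at: nil).to_a.presence; r.each { |row| row } if r }
#   compare_runs(10_000, with_presence, with_to_a)

puts faster_by(17.511, 10.938)
```

The last line feeds in the timings from the first row of the table and prints 38, matching the "Faster By" figure reported there.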

I should note that the numbers were *not* consistent or proportional across development, staging, test and production environments (mainly due to differences in data volumes, latent activity and hardware) - so you can't benchmark on development and assume the same applies in production.

Things get murky with ActiveRecord add-ons

So far we've talked about the standard ActiveRecord situation. But there are various gems we might also be using to add features like pagination and search magic. MetaSearch is an example: a pretty awesome gem for building complex and flexible search features. But (at least with version 1.1.3) present? has a little surprise in store for you:
irb> User.where(id: '0').class
=> ActiveRecord::Relation
irb> User.where(id: 0).present?
   (0.8ms)  SELECT COUNT(*) FROM "users" WHERE "users"."id" = 0
=> false
irb> 0).class
=> MetaSearch::Searches::User
irb> 0).present?
=> true

Any Guidelines?

So, always to_a my query results? Well, no, it's not that simple. Here are some things to consider:
  • First, don't assume that <my_scoped_query>.present? means what you think it might mean - test or play it safe
  • If you are going to need all result rows anyway, consider calling to_a or similar before testing presence
  • Avoid this kind of optimisation except at the point of use. One of the beauties of ActiveRecord::Relation is the chainability - something we'll kill as soon as we hydrate to a result set Array for example.
  • While I got a nice 40% performance bonus in some cases with a minor code fiddle, mileage varies and much depends on the actual query. You probably want to benchmark in the actual environment that matters and not make any assumptions.

Sunday, June 09, 2013

My Virtual Swag from #rdrc

(blogarhythm ~ Everybody's Everything - Santana)

So the best swag you can get from a technology conference is code, right? Well RedDotRubyConf 2013 did not disappoint! Thanks to some fantastic speakers, my weekends for months to come are spoken for. Here's just some of the goodness:

Will I still be a Rubyist in 5 years? #rdrc

(blogarhythm ~ Ruby - Kaiser Chiefs)
The third RedDotRubyConf is over, and I think it just keeps getting better! Met lots of great people, and saw so many of my Ruby heroes speak on stage. Only thing that could make it even better next year would be to get the video recording thing happening!

I had the humbling opportunity to share the stage and here are my slides. Turned out to be a reflection on whether I'd still be a Rubyist in another 5 years, and what are the external trends that might change that. Short story: Yes! Of course. I'll always think like a Rubyist even though things will probably get more polyglot. The arena of web development is perhaps the most unpredictable though.

A couple of areas I highlight that really need a bit more love include:
  • There's a push on SciRuby. Analytics are no longer the esoteric domain of bioinformaticists. Coupled with Big Data (which Ruby is pretty good at), analytics are driving much of the significant innovation in things we build.
  • Krypt - an effort led by Martin Boßlet to improve the cryptographic support in Ruby. My experience building megar made it painfully obvious why we need to fix this.

Let it never be said, the romance is dead
'Cos there’s so little else occupying my head

I mentioned a few of my projects in passing. Here are the links for convenience:
  • RGovData is a ruby library for really simple access to government data. It aims to make consuming government data sets a "one liner", letting you focus on what you are trying to achieve with the data, and happily ignore all the messy underlying details of transport protocols, authentication and so on.
  • sps_bill_scanner is a ruby gem for converting SP Services PDF invoices into data that can be analysed with R. Only useful if you are an SP Services subscriber in Singapore, but otherwise perhaps an interesting example of extracting positional text from PDF and doing some R.
  • megar ("megaargh!" in pirate-speak) is a Ruby wrapper and command-line (CLI) client for the API. My example of how you *can* do funky crypto in Ruby ... it's just much harder than it should be!