Google API – Sinatra + OAuth2

Here is a quick example of using Google's APIs with OAuth2 and Sinatra; it should take less than 20 minutes to get running!

If you’re here to figure out how to use GMail’s API with OAuth2, you should know it does not work as of 7/23/2012. Use OAuth 1.0/XOAUTH instead.

Requirements

  1. Ruby 1.8.7+
  2. Bundler (gem install bundler)

We will also need a Google Client ID and Secret. To generate these, go to the Google APIs Console and select API Access from the left menu. On this page you’ll want to Create a Client ID. Follow these steps, get your credentials, and then follow along…

Let’s run

Okay by this point we have Ruby, Bundler, and our Google credentials. Create a new directory somewhere…

mkdir oauth2-sinatra && cd oauth2-sinatra

Let’s create a few boilerplate files we’ll be using.

## file: Gemfile
 
source :rubygems
 
gem 'sinatra'
gem 'json'
gem 'oauth2'

Next, run this command to get our dependencies installed:

bundle install

We’re going to run Sinatra as a Rack application. Don’t worry too much about this step if it’s confusing. Rack gives us a lightweight way to run our Sinatra application, so let’s make a super simple config file that invokes it.

## file: config.ru
 
require 'rubygems'
require 'bundler'
Bundler.require
 
require File.expand_path(File.dirname(__FILE__) + '/app')
 
run Sinatra::Application

The require of app in the above code snippet should make you a bit concerned: we have not created an app.rb file yet, so that require is going to throw an error. Don’t believe me? Try it out and run rackup.

But before we add app.rb, I really want to get the rest of the little stuff out of the way. Our Sinatra application is going to have a few views, so execute the next command

mkdir views

and create the following files.

<!-- file: index.erb -->
<a href="/auth">Auth</a>

<!-- file: success.erb -->
<%= @message.inspect %>
<%= @access_token.inspect %>
<%= @email.inspect %>

Great. We should now have the following files in our oauth2-sinatra directory:

- Gemfile
- config.ru
- views/
  - index.erb
  - success.erb

Good? Okay. Now for the fun part. I’ll describe what’s happening and then show you how we implement it.

We want to create a small application that will let a user visit our home page. Our home page will have a link to /auth, which is the action that starts the authentication process with Google. The process is quite simple: we redirect our user to Google’s authentication URL, and Google sends them back to us with an authorization code once the user has approved our request to make API calls on their behalf. We exchange that code for an access token, and we can then save the token (not in this demo) or do whatever we want with it. That’s pretty much it.

So here it is in code:

## file: app.rb
 
require 'sinatra'
require 'oauth2'
require 'json'
require 'uri'   # used below to build the redirect URI
 
enable :sessions
 
# Scopes are space separated strings
SCOPES = [
    'https://www.googleapis.com/auth/userinfo.email'
].join(' ')
 
unless G_API_CLIENT = ENV['G_API_CLIENT']
  raise "You must specify the G_API_CLIENT env variable"
end
 
unless G_API_SECRET = ENV['G_API_SECRET']
  raise "You must specify the G_API_SECRET env veriable"
end
 
def client
  @client ||= OAuth2::Client.new(G_API_CLIENT, G_API_SECRET, {
                :site          => 'https://accounts.google.com',
                :authorize_url => '/o/oauth2/auth',
                :token_url     => '/o/oauth2/token'
              })
end
 
get '/' do
  erb :index
end
 
get "/auth" do
  redirect client.auth_code.authorize_url(:redirect_uri => redirect_uri,:scope => SCOPES,:access_type => "offline")
end
 
get '/oauth2callback' do
  access_token = client.auth_code.get_token(params[:code], :redirect_uri => redirect_uri)
  session[:access_token] = access_token.token
  @message = "Successfully authenticated with the server"
  @access_token = session[:access_token]
 
  # parsed is a handy method on an OAuth2::Response object that will 
  # intelligently try and parse the response.body
  @email = access_token.get('https://www.googleapis.com/userinfo/email?alt=json').parsed
  erb :success
end
 
def redirect_uri
  uri = URI.parse(request.url)
  uri.path = '/oauth2callback'
  uri.query = nil
  uri.to_s
end

And that’s it.
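
The token we stash in the session can be reused on later requests. Here is a minimal sketch of that idea, using a hypothetical /email route that is not part of the original app; it rebuilds an OAuth2::AccessToken from the session value and calls the same userinfo endpoint:

## file: app.rb (hypothetical addition)

# Hypothetical route: reuse the token stored in the session on a later request
get '/email' do
  redirect '/' unless session[:access_token]

  # Wrap the raw token string back up in an OAuth2::AccessToken
  access_token = OAuth2::AccessToken.new(client, session[:access_token])

  @message      = "Fetched with the token stored in the session"
  @access_token = session[:access_token]
  @email        = access_token.get('https://www.googleapis.com/userinfo/email?alt=json').parsed
  erb :success
end

To run the app itself, set the two environment variables and start it with rackup; with the default rackup port, Google’s console should have http://localhost:9292/oauth2callback registered as a redirect URI.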

For the full code, please check out gmail-oauth2-sinatra on GitHub. The demo on GitHub uses GMail even though it’s not currently supported; I’m getting ready for the future…

Saturday Scrape — Surf Reports

It’s Saturday. The surf is bad today, which is why I decided to write a surfline.com scraper. It’s a freestyle post written while I code, so don’t sweat the small stuff. This is entirely a quick, dirty solution for getting some data to play with.

Tools used during this session:

  1. Ruby 1.9.2
  2. Nokogiri (HTML Parsing)
  3. Typhoeus (HTTP Library for Fetching HTML)

First… Why write a scraper? No API exists for SurfLine.com and I want data.

SurfLine.com offers two ways of accessing data:

  1. Consume the web page HTML
  2. Consume the widget HTML

Each presents its own problems. As I go through this short tutorial I’ll show you when things change and why knowing how to pivot is important.

The data that I want (right now) is the height of the waves for a surfing spot I frequent. The URL is http://www.surfline.com/surf-report/38th-ave-central-california_4191/. Hey, cool: it turns out the height is wrapped in a nice, identifiable DOM element:

<p id="text-surfheight">1-2 ft</p>

So a quick XPath selection on the Nokogiri::HTML document gets what we want…

elem = Nokogiri::HTML(page).search("//p[@id = 'text-surfheight']")

elem now contains the set of elements matched by our search. Let’s pull the first one out and grab its inner_text:

elem.first.inner_text
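
To make the fetch-and-parse step concrete, here is a small, self-contained sketch that uses Typhoeus (listed in the tools above) to pull the page down. It assumes a reasonably recent Typhoeus where Typhoeus.get returns a response object with a body:

require 'typhoeus'
require 'nokogiri'

url  = 'http://www.surfline.com/surf-report/38th-ave-central-california_4191/'
page = Typhoeus.get(url).body   # fetch the raw report HTML

# Same XPath selection as above
elem = Nokogiri::HTML(page).search("//p[@id = 'text-surfheight']")
puts(elem.empty? ? 'no text-surfheight element on this page' : elem.first.inner_text)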

We’re done, right? Unfortunately, surf reports are user-reported and not always in the format we’d expect. I quickly discovered some pages don’t contain a text-surfheight id at all, but instead a short sentence describing the height:

<p class="text-data bottom-space-10">Inconsistent occ. 2 ft. </p>

That’s frustrating, since now our code can’t simply look for the same element every time. So we improvise. Instead of spending time figuring out how to triangulate what I want out of this big page, I look to see if there is a widget or API that could give me the surf report; there is a widget service. It makes a JavaScript call to load up an HTML iframe. Great. So I jump right in and check out the new HTML page I’m looking at. The good thing about this widget is that it’s only the surf report, not a bunch of web site features, videos, and links that I don’t need to look at. And, most importantly, the widget _always_ displays something about the wave height. Unfortunately, the widget HTML is disgustingly ugly and has no apparent patterns. Sometimes the surf report height is contained in a span element and other times it’s thrown into a div; neither has an id. Iterating over a few different surf report pages, I find that the widget does have one pattern: CSS styling. (I know. Yucky.)

But the nice thing about extensive hardcoded styling in HTML is that it can actually serve as a uniquely identifiable key when looking at a small amount of HTML (like a widget!). So we can write an XPath search:

# Helper method to take a Nokogiri search result and return the
# inner text of its first element, or nil if nothing matched
def inner_text nokogiri_search
  nokogiri_search.first.inner_text rescue nil
end
 
# spot_id comes out of a hash. Check out the full code linked @ the
# bottom of this page to see more
n = Nokogiri::HTML(
  grab_page(
    "http://www.surfline.com/widgets2/widget_camera_mods.cfm?id=#{spot_id}&amp;mdl=0111&amp;ftr=&amp;units=e&amp;lan=en"
  )
)
 
height = inner_text(n.xpath("//span[@style='font-size:21px;font-weight:bold']")) ||
  inner_text(n.xpath("//div[@style='font-size:12px;padding-left:10px;margin-bottom:7px;']")) ||
  "Report Not Available"

If the first clause passes, we have a wave height. If the second passes, we have a short sentence describing the surf conditions. If we can’t find anything, we just default to “Report Not Available”.

Okay, so it is not pretty, but we’ve now got a decent way to identify wave heights from Surfline.com’s widgets. I’ve tested it across 10 surf spots and it seems to work OK for this initial prototype.

What’s next? Adding in the tides table from the widget. It’s also a fun trickster, since you have to look for the text “TIDES,” take the first search result, and grab the parent element:

tides = n.xpath("//div//small[contains(text(),'TIDES')]").first.parent

Which gives us:

"\nTIDES:\n\n \n \n \n \n 02/24\u00A0\u00A0\u00A005:48AM\u00A0\u00A0\u00A01.23ft.\u00A0\u00A0\u00A0LOW\n \n \n \n \n 02/24\u00A0\u00A0\u00A011:46AM\u00A0\u00A0\u00A04.45ft.\u00A0\u00A0\u00A0HIGH\n \n \n \n \n 02/24\u00A0\u00A0\u00A005:49PM\u00A0\u00A0\u00A01.07ft.\u00A0\u00A0\u00A0LOW\n \n \n \n \n 02/25\u00A0\u00A0\u00A012:07AM\u00A0\u00A0\u00A04.97ft.\u00A0\u00A0\u00A0HIGH\n"

That looks ugly. Why is there Unicode in there? Let’s pull out just what we want…

prettier_tides = tides.text.gsub("\u00A0\u00A0\u00A0", " ").scan(/\d[^\n]*/)
# => ["02/24 05:48AM 1.23ft. LOW", "02/24 11:46AM 4.45ft. HIGH", "02/24 05:49PM 1.07ft. LOW", "02/25 12:07AM 4.97ft. HIGH"]

What you do with this data is now up to you. I store it in a SQLite database and run the script every hour or so to get updates from 8 am to 2 pm PST.
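
For reference, here is a minimal sketch of that storage step using the sqlite3 gem; the database file, table, and column names are my own invention and not taken from the project:

require 'sqlite3'

db = SQLite3::Database.new('surf.db')   # file name is illustrative

# Create a simple reports table on first run
db.execute <<-SQL
  CREATE TABLE IF NOT EXISTS reports (
    spot       TEXT,
    height     TEXT,
    fetched_at TEXT
  )
SQL

# `height` is the value produced by the widget scrape above
db.execute('INSERT INTO reports (spot, height, fetched_at) VALUES (?, ?, ?)',
           ['38th-ave', height, Time.now.to_s])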

The full project is on GitHub as ShouldISurf; for the scraping code you should look at lib/grab_reports.rb.


As of 4/12/2012, this code has been running daily for almost 2 months and serving up surf tides on shouldisurf.com. Let me go knock on wood. Okay, back. The code base is small and effective. I’m glad now that I didn’t invest time in making a more robust solution!