Local vs Remote MySQL Inserts

I wrote this really long blog post and then said to myself TL;DR; Here are some take aways

  • Ruby does not have great support for reading large Excel (100MB+) files
    • Roo, SpreadSheet, ParseExcel all try to load whole report into memory -> 5GB+
  • Python has excellent Excel parsing support
    • OpenPyXL supports row-by-row iteration
  • Insert to a local MySQL database then dump & load into your remote MySQL database
    • ((24000000 rows/250 remote inserts a second)/60 seconds)/60 minutes = 26.6 hours
    • vs ((24000000 rows/3000 local inserts a second)/60 seconds)/60 minutes = 2.2 hours

Ultimately, dividing and conquering a problem is a great strategy and can significantly reduce your time to process large data sets. However, be careful that the co-routine you’re using in your divisions of labor are as fast as possible. It is way faster to fill up a local MySQL DB and then transfer it to a Remote host.

Google API – Sinatra + OAuth2

Here is a quick example of using the Google with OAuth2 and Sinatra and it should take less than 20 minutes to get running!

If you’re here to figure out how to use GMail’s API with OAuth2 then you should know it does not work as of 7/23/2012. Use OAuth 1.0/XOAUTH.

Requirements

  1. Ruby 1.8.7+
  2. Bundler (gem install bundler)

We will also need a Google Client ID and Secret. To generate these go to the Google APIs Console and on the left menu select API Access. On this page you’ll want to Create a Client ID. Follow these steps, get your access, and then follow along…

Let’s run

Okay by this point we have Ruby, Bundler, and our Google credentials. Create a new directory somewhere…

mkdir oauth2-sinatra && cd oauth2-sinatra

Let’s create a few of our boilerplate files we’ll be using.

## file: Gemfile
 
source :rubygems
 
gem 'sinatra'
gem 'json'
gem 'oauth2'

Next, run this command to get our dependencies installed:

bundle install

We’re going to run Sinatra as a rack application. Don’t worry too much about this step if it’s confusing. Rack gives us a lightweight way to run our Sinatra application so let’s make our super simple config file invoke our Sinatra application (line 9).

1
2
3
4
5
6
7
8
9
## file: config.ru
 
require 'rubygems'
require 'bundler'
Bundler.require
 
require File.expand_path(File.dirname(__FILE__) + '/app')
 
run Sinatra::Application

Line 7 of the above code snippet should make you a bit concerned since we have not created an app.rb file yet so that require is going to throw an error. Don’t believe me? Try it out and run rackup.

But before we add app.rb I really want to get the rest of the little stuff out of the way. Our Sinatra application is going to have a few views so execute the next command

mkdir views

and create the following files.

<!-- file: index.erb -->
<a href="/auth">Auth</a>
<!-- file: success.erb -->
<%=@message.inspect%>
<%=@access_token.inspect%>
<%=@email.inspect%>

Great. We should now have the following files in our oauth2-sinatra directory:

- Gemfile
- config.ru
+ views
-- index.erb
-- success.erb

Good? Okay. Now for the fun part. I’ll describe what’s happening and then show you how we implement it.

We want to create a small application that will let a user visit our home page. Our home page will have a link to /auth which is our action that starts the authentication process with Google. The process is quite simple: we redirect our user to Google’s authentication URL and then Google sends them back to us once user has approved our request to make API calls on his or her behalf. We can then save that token (not in this demo) or do whatever we want with it. That’s pretty much it.

So here it is in code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
## file: app.rb
 
require 'sinatra'
require 'oauth2'
require 'json'
 
enable :sessions
 
# Scopes are space separated strings
SCOPES = [
    'https://www.googleapis.com/auth/userinfo.email'
].join(' ')
 
unless G_API_CLIENT = ENV['G_API_CLIENT']
  raise "You must specify the G_API_CLIENT env variable"
end
 
unless G_API_SECRET = ENV['G_API_SECRET']
  raise "You must specify the G_API_SECRET env veriable"
end
 
def client
  client ||= OAuth2::Client.new(G_API_CLIENT, G_API_SECRET, {
                :site => 'https://accounts.google.com', 
                :authorize_url => "/o/oauth2/auth", 
                :token_url => "/o/oauth2/token"
              })
end
 
get '/' do
  erb :index
end
 
get "/auth" do
  redirect client.auth_code.authorize_url(:redirect_uri => redirect_uri,:scope => SCOPES,:access_type => "offline")
end
 
get '/oauth2callback' do
  access_token = client.auth_code.get_token(params[:code], :redirect_uri => redirect_uri)
  session[:access_token] = access_token.token
  @message = "Successfully authenticated with the server"
  @access_token = session[:access_token]
 
  # parsed is a handy method on an OAuth2::Response object that will 
  # intelligently try and parse the response.body
  @email = access_token.get('https://www.googleapis.com/userinfo/email?alt=json').parsed
  erb :success
end
 
def redirect_uri
  uri = URI.parse(request.url)
  uri.path = '/oauth2callback'
  uri.query = nil
  uri.to_s
end

And that’s it.

For the full code please check out GitHub gmail-oauth2-sinatra The demo on GitHub uses GMail even though it’s not currently supported. I’m getting ready for the future…