Local vs Remote MySQL Inserts
I wrote this really long blog post and then said to myself TL;DR; Here are some take aways
- Ruby does not have great support for reading large Excel (100MB+) files
- Roo, SpreadSheet, ParseExcel all try to load whole report into memory -> 5GB+
- Python has excellent Excel parsing support
- OpenPyXL supports row-by-row iteration
- Insert to a local MySQL database then dump & load into your remote MySQL database
((24000000 rows/250 remote inserts a second)/60 seconds)/60 minutes = 26.6 hours- vs
((24000000 rows/3000 local inserts a second)/60 seconds)/60 minutes = 2.2 hours
Ultimately, dividing and conquering a problem is a great strategy and can significantly reduce your time to process large data sets. However, be careful that the co-routine you're using in your divisions of labor are as fast as possible. It is way faster to fill up a local MySQL DB and then transfer it to a Remote host.
Google API – Sinatra + OAuth2
Here is a quick example of using the Google with OAuth2 and Sinatra and it should take less than 20 minutes to get running!
If you're here to figure out how to use GMail's API with OAuth2 then you should know it does not work as of 7/23/2012. Use OAuth 1.0/XOAUTH.
Requirements
- Ruby 1.8.7+
- Bundler (
gem install bundler)
We will also need a Google Client ID and Secret. To generate these go to the Google APIs Console and on the left menu select API Access. On this page you'll want to Create a Client ID. Follow these steps, get your access, and then follow along...
Let's run
Okay by this point we have Ruby, Bundler, and our Google credentials. Create a new directory somewhere...
mkdir oauth2-sinatra && cd oauth2-sinatra
Let's create a few of our boilerplate files we'll be using.
## file: Gemfile source :rubygems gem 'sinatra' gem 'json' gem 'oauth2'
Next, run this command to get our dependencies installed:
bundle install
We're going to run Sinatra as a rack application. Don't worry too much about this step if it's confusing. Rack gives us a lightweight way to run our Sinatra application so let's make our super simple config file invoke our Sinatra application (line 9).
1 2 3 4 5 6 7 8 9 | ## file: config.ru require 'rubygems' require 'bundler' Bundler.require require File.expand_path(File.dirname(__FILE__) + '/app') run Sinatra::Application |
Line 7 of the above code snippet should make you a bit concerned since we have not created an app.rb file yet so that require is going to throw an error. Don't believe me? Try it out and run rackup.
But before we add app.rb I really want to get the rest of the little stuff out of the way. Our Sinatra application is going to have a few views so execute the next command
mkdir views
and create the following files.
<!-- file: index.erb --> <a href="/auth">Auth</a>
<!-- file: success.erb --> <%=@message.inspect%> <%=@access_token.inspect%> <%=@email.inspect%>
Great. We should now have the following files in our oauth2-sinatra directory:
- Gemfile
- config.ru
+ views
-- index.erb
-- success.erb
Good? Okay. Now for the fun part. I'll describe what's happening and then show you how we implement it.
We want to create a small application that will let a user visit our home page. Our home page will have a link to /auth which is our action that starts the authentication process with Google. The process is quite simple: we redirect our user to Google's authentication URL and then Google sends them back to us once user has approved our request to make API calls on his or her behalf. We can then save that token (not in this demo) or do whatever we want with it. That's pretty much it.
So here it is in code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | ## file: app.rb require 'sinatra' require 'oauth2' require 'json' enable :sessions # Scopes are space separated strings SCOPES = [ 'https://www.googleapis.com/auth/userinfo.email' ].join(' ') unless G_API_CLIENT = ENV['G_API_CLIENT'] raise "You must specify the G_API_CLIENT env variable" end unless G_API_SECRET = ENV['G_API_SECRET'] raise "You must specify the G_API_SECRET env veriable" end def client client ||= OAuth2::Client.new(G_API_CLIENT, G_API_SECRET, { :site => 'https://accounts.google.com', :authorize_url => "/o/oauth2/auth", :token_url => "/o/oauth2/token" }) end get '/' do erb :index end get "/auth" do redirect client.auth_code.authorize_url(:redirect_uri => redirect_uri,:scope => SCOPES,:access_type => "offline") end get '/oauth2callback' do access_token = client.auth_code.get_token(params[:code], :redirect_uri => redirect_uri) session[:access_token] = access_token.token @message = "Successfully authenticated with the server" @access_token = session[:access_token] # parsed is a handy method on an OAuth2::Response object that will # intelligently try and parse the response.body @email = access_token.get('https://www.googleapis.com/userinfo/email?alt=json').parsed erb :success end def redirect_uri uri = URI.parse(request.url) uri.path = '/oauth2callback' uri.query = nil uri.to_s end |
And that's it.
For the full code please check out GitHub gmail-oauth2-sinatra The demo on GitHub uses GMail even though it's not currently supported. I'm getting ready for the future...



