Posts in Category: Dev

Want a lightweight in-memory database – try Boon!

So you have a million objects in memory, you’ve got them in a Map – so you can access them by key very quickly – but now you want to find all the objects that match on another field. Well, they are in memory, so you can just go through them all. However, that is going to be slow:

static Collection<Map> database = new ArrayList<>();
...
Collection<Map> results = new LinkedList<>();
for (Map entry : database) {
    if (entry.get("colour").equals("red")) {
        results.add(entry);
    }
}

You could stick them in a DB (like MySQL or MongoDB), however it will still be pretty slow to query and to convert the data to/from – or you could write your own query engine from scratch. Admittedly a database is optimised for querying, but it’s still a separate process and the data has to be converted into your objects.

If only you could index the collection by that other field…

Well you can – just put the collection into a Boon Repo and add an index on that field and it will take care of ensuring the index is kept up to date.

Repo<Integer, Map> dbRepo = Repos.builder()
    .primaryKey("name")
    .lookupIndex("colour")
    .build(int.class, Map.class);
...
dbRepo.addAll(database);
...
List<Map> result = dbRepo.query(eq("colour", colour));

For these simple queries the search changes from O(n) to O(1) – bucket loads quicker!

It provides a rich set of criteria facilities that allows you to query in a variety of ways – equality, in, between, etc – and these can be composed using and/or…

List<Map> result = dbRepo.query(and(eq("sport", sport), eq("job", job)));

It effectively provides an in-memory object database – searchable, with you defining the indexes you want to use (they don’t have to be unique). It’s pure Java (1.7) – although I have a 1.6 backport.

Ideal for the case where you have low frequency updates but lots of varied queries.

It’s a little quirky when it comes to updating items – to update an item, you need to either delete and re-add it, or get the existing object, clone it, amend it and re-add it:

List<Map> managers = dbRepo.query(eq("job", "manager"));
Map updatedMgr = Maps.copy(managers.get(0)); // Maps.copy is a Boon cloning util
updatedMgr.put("colour", "blue");
dbRepo.modify(updatedMgr);

I have found performance to vary somewhat on more complex queries – but I am sure that is something that will improve greatly.

As with all libraries, you need to use it and see if it helps with what you need.

Links to similar Data Repo articles:

  • Boon’s Data Repo by Example
  • Boon Data Repo Indexed Collections and more
  • Background on Data Repo
  • What if Java Collections and Java Hierarchies were Easily Searchable?
  • Unrelated, but… Boon beer

Thanks to Rick Hightower for giving us this boon!

@skillsmatter presents @unclebobmartin’s TDD and Refactoring course

A few weeks ago I attended Uncle Bob’s TDD and Refactoring course – here are my highlights…

Number one for me was this quote “TDD is double entry bookkeeping for programmers!”

Skills You Need When Looking at Code:

– being able to identify problems
– knowing what’s better
– transforming in small steps towards the better version

When the code makes you do something ugly – the design is wrong!

An interesting presentation-style point was the intermission discussions – completely unrelated to the material.  A few of us were trying to work out how they related to the course, until someone asked and found out they didn’t :)

Thanks again, Mr Martin.

WoW Activity Feed – the tech behind it…

As mentioned previously – it’s live!

The app is built on nodejs, using coffeescript with a MongoDB database.

I chose CoffeeScript, as I prefer the Ruby-like syntax and really don’t like JavaScript’s curly braces. I found Node.js to be very fast and lightweight. That said, the callback model in Node takes some getting used to.

Libraries like async help make the callbacks more manageable and avoid the pyramid of death.

I used the rapidly growing expressjs framework – which was great at keeping out of your way and letting you just do what you need.

HTML was put together via Jade templates and the Stylus stylesheet helper.

The core of the app accesses the Armory via a great little library – node-armory.

The data from the Armory is saved directly as JSON to MongoDB and, as further updates come in, the jsondiffpatch library is used to determine what’s changed.

There are some tests written with mocha and sinon for stubs/mocks.  My testing style is to use them when there is a problem – so they are probably broken at the moment.

I found a few features missing from the node-armory library and so forked the project to address them – such as support for using a proxy (for debugging purposes) and using Mike Reinstein’s version of the request http lib that supports compressed requests/replies.

See package.json for details of all the libraries used.

A couple of MongoDB’s features came in handy:

  • Time to live (TTL) collections – records automatically removed when a date field is older than specified.  Useful to ensure only recent data is kept.
  • Capped collections – limit how big a collection gets; it automatically throws away old records when it gets full. Good for logs.
  • Aggregate queries – a map/reduce like facility for querying the DB.

Definitely the worst bit of the code is the RSS feed entry formatting – need to think of some ways to refactor it sensibly.

To make the search feature nice and responsive, backbone was used in a very basic way.

The main thing I missed from Rails was the asset pipeline, which lets you combine all the client side assets (javascript and css files) into just a few minified files.

One of the most interesting parts of the app (at least to me), is the scheduled job that checks the armory for updates.  The core of which uses an async queue to kick off many calls to the WoW API and collect the results.

Rails 3.2.11 upgrade (lorra lorra security issues…)

There have been quite a few Rails security issues over the last few weeks… so I had better upgrade mine too :( Note: I use rvm with gemsets to separate gem versions – I could probably stop using gemsets given the latest bundler, but I haven’t got on that bandwagon yet.

Bingo Caller!

Not currently live on a public site, but a good basic example to test the upgrade. This is currently rails 3.2.8 (ruby 1.9.3/sqlite), which is not too far behind the latest, 3.2.11 – so it should be easy…

First step, rvm implode and re-install – I have lots of rvm/gemset cruft and now seems like a good time to tidy that up.  Then run bundle to get the gems that should currently work. Then I realised that rvm wasn’t loaded properly and so the gem installs did not go into my gemset – so I re-started the terminal session and ran bundle again (I probably could have source’d rvm, but it’s probably better this way). Make sure the db is up to date (rake db:migrate). Then try running the app (rails server) [yup – no tests…].  Seems to be working fine – thought for a moment I was done, but then realised I had not upgraded rails yet – doh! bundle update. Now I am on rails 3.2.11.  Let’s try again – rake db:migrate; rails server. And it seems to be working ok. Checked the log files – one error about binary data in a string field (encrypted_password) – but we had that previously. Also used the rails_admin gem for a quick built-in db-viewer – that’s working ok too.

Time to see if it works on Heroku too…

  1. Install the Heroku Toolbelt
  2. (re-) create the app – heroku create --app bingo-caller
  3. push the code – git push heroku master
  4. Which failed :( – something about sqlite3 – but that shouldn’t be in prod mode…
  5. Gemfile did not specify ‘pg’ for Postgres – so amended it to say sqlite3 for dev/test and pg for prod (group :production do … end – see the sketch after this list)
  6. This deployed, but then got an error when trying the app ( http://bingo-caller.herokuapp.com )
  7. Maybe it needs the db setup (heroku rake db:migrate) – that also failed “undefined method `database_authenticatable’ for #<ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::TableDefinition:0x00000004a16f68>”.  Looks like some Devise related issue.
  8. Mmm – let’s try re-creating the sqlite db – does that work? Nope – same error.
  9. This was a Devise 2.1.2 to 2.2.1 upgrade… a quick google highlights that the schema layout has changed. Re-organised as per that link and now it’s deploying/working ok.
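
For reference, the Gemfile change from step 5 looks something like this (a sketch – the group layout is standard Bundler usage, the rest of the Gemfile is assumed):

# Gemfile - sqlite locally, Postgres on Heroku
group :development, :test do
  gem 'sqlite3'
end

group :production do
  gem 'pg'
end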

Humming Now

Largely a javascript-based site – but it does use Devise, so that upgrade might make it problematic. Rails 3.2.1 (ruby 1.9.2/sqlite/postgres)

  1. Decided to make sure gcc was good (surprised I didn’t get issues above…)
  2. Installed brew.
  3. Install brew gcc etc utils, see here and here.
  4. Install Ruby 1.9.2 (it seemed to be more reliable for this app – ssl-related issues, although that could be an issue with my mac config)
  5. Now we can get to the app, run bundle to install the current gem versions.
  6. but that failed to find one gem version – “Could not find jquery-rails-2.0.0 in any of the sources” – strange, has it been removed? (shows as ‘yanked’ here – http://rubygems.org/gems/jquery-rails/versions )
  7. So, trying to get the latest version, via ‘bundle update jquery-rails’
  8. But then found another missing one – “Could not find twitter-2.0.2 in any of the sources”
  9. So maybe I will just update them all :) ‘bundle update’ – phew, that seems to have worked.
  10. Moved aside the sqlite db and tried ‘rake db:migrate’, which got the devise/db auth error as above.
  11. So followed the above fix, re-did the schema and that worked.
  12. Tried rails console and that seems ok
  13. Then tried rails server, got a warning from Devise – “Devise.use_salt_as_remember_token is deprecated and has no effect. Please remove it”
  14. The home page seems to load, but there is a javascript error behind the scenes – it looks like the json stuff is being escaped: ” … Router({&#x27;url_root&#x …”
  15. Looks like ruby/erb is HTML-encoding the model JSON – noticing that I have <%== (double equals) in the other js parameter, which is working – so I tried that in the first one and it worked.  Can’t seem to find it documented anywhere, but presumably it outputs the value without escaping (see the sketch after this list).
  16. Tried running rails console – but got [BUG] Segmentation fault.  Then ran rake db:migrate and tried rails c again and it worked (also changed directory elsewhere and back…)
  17. The Twitter gem has moved on from 2.0.2, which is what’s currently used – the latest is 4.4.4. In particular, Twitter has removed the public timeline, which this site uses.
  18. Tried to remove public timeline, but getting issues with current_user not being defined.
  19. The Tweetstream gem allows access to the public timeline via the streaming API – that’s worked :) , but entities/link urls are coming through strangely…  :(
  20. Getting some issues around the user/session.  It seems to let me login (via Twitter) and the user home page comes up.  But any subsequent requests are failing – it finds no current user. It’s like the session is losing the user or being re-created.
  21. I then reverted those changes and tried to just get the current app running – and I am still getting the same issue.  I had to update some gems which had been yanked, but then that led to other gems getting updated… But I have now fixed the Rails/devise/rack/omniauth gems at their original versions.
  22. I would think that devise/rails looks after the user in the session – so that would be the place to look – but now that I have pinned those to the current live versions, that doesn’t seem likely…
  23. Time to try a different approach – just patching the current Rails version – see if that can be done on Heroku…
  24. As a couple of the gems have been ‘yanked’, I amended the Gemfile to use github versions, like so:
  25. gem 'twitter', :git => 'https://github.com/sferik/twitter.git', :tag => 'v1.6.0'

  26. And it worked – well, now I have a running app, time to patch it!
  27. Best link I found is this Engine Yard one.
  28. Before and after, I tried this exploit tester – it seemed like the app was ok both times.
  29. From the earlier big-bang update attempt, I re-applied the changes to get the public timeline working via Tweetstream.  Oops – just noticed search does not work… something for later.
  30. And then pushed it to production, next!
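
Following up step 15 – what I believe is going on: Rails 3’s Erubis escapes HTML for <%= but not for <%==, so the raw JSON survives the double-equals form. A sketch (the @opts variable is invented):

var r1 = new Router(<%= @opts.to_json %>);   // rendered escaped: Router({&quot;url_root&quot;…}) - broken JS
var r2 = new Router(<%== @opts.to_json %>);  // rendered raw: Router({"url_root": "…"}) - what we want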

Quoter

Largely a backend site, but it has a few public urls with stats on.  It uses Mechanize and Savon – these will probably need updating too.

  1. Following the steps above, via rvm/bundler – get all the required gems in place.
  2. First, let’s try the exploit tester… it says the app is not vulnerable (at least without logging in) – but better safe than sorry…
  3. Ok, let’s try a full update of Rails – currently on 3.2.3… change the Gemfile and run ‘bundle update rails’
  4. But that didn’t work – it seems some old gem is causing issues, giving the error “NoMethodError: undefined method `field_changed?'”
  5. So let’s try the ‘big bang’ – ‘bundle update’ …
  6. It seemed to work, but needs more testing ….
  7. So, went for the patch option from the Engine Yard link above.  Did some testing, seems ok.
  8. And then pushed to prod :)

Image Site

Not deployed, so perhaps leave… last version used  – 3.0.9 …

Other useful links:

  • http://railsapps.github.com/updating-rails.html
  • https://groups.google.com/forum/?fromgroups=#!msg/rubyonrails-security/61bkgvnSGTQ/nehwjA8tQ8EJ

World of Warcraft Activity Feeds – now live!

Hi,

A few years ago, the WoW Armory had a way (unofficially) to access a feed of updates for your in-game character, eg gaining a level etc.  (Or so I seem to remember – maybe my mind is playing tricks…)

Then along came the new Armory site and that feed wasn’t available anymore :(

Last year, Blizzard came out with an API to access WoW character progress data and it was thought that an official feed would be produced. However 12 months later, there is no sign of a feed.

Recently I wanted such a feed, had a look around and, finding none, decided to write one.  And it’s now live: http://wowactivity.kimptoc.net/ :)

As users of the WoW API have to be open source, the code is here on github.

It provides both character and guild RSS feeds. These are largely based on the news/feed items that come with character and guild lookups, but it also tracks changes, eg when a character’s level changes.  The RSS feed can be used in many ways, from piping it into a feed reader to sending it to twitter, facebook or guild websites.

It’s currently tracking over 150 characters and guilds.

It’s built using CoffeeScript (which compiles to JavaScript) and MongoDB – I will probably do a separate post on that.

Enjoy!

Chris

 

Ruby Timeout problem – the cheaters way out ….

There is (still) a general issue with Timeouts in Ruby, relating to the use of Thread.kill etc – see this link for details.

Unfortunately it’s an issue in JRuby too.  The solutions suggested in the link are quite low-level (and no higher-level option seems to be exposed).

I have some code that uses Net::HTTP on which we need to set timeouts (we need to know if the other end has a problem), and under load it hits the above issue – we start running out of resources (too many open files).  Being a lazy/pragmatic programmer, and having the advantage of working with JRuby, I decided to cheat and switch to a Java library that solves the timeout problem properly.  Meet my friend, HttpComponents (formerly HttpClient).

I tried to use it directly in Ruby, but that gets messy, so I wrapped it up a bit:


package com.x;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.HttpResponse;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.AuthCache;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.protocol.ClientContext;
import org.apache.http.conn.params.ConnRoutePNames;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.auth.BasicScheme;
import org.apache.http.impl.client.BasicAuthCache;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.CoreConnectionPNames;
import org.apache.http.params.HttpParams;
import org.apache.http.params.SyncBasicHttpParams;
import org.apache.http.protocol.BasicHttpContext;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/**
* User: kimptonc
* Date: 09/01/13
* Time: 12:05
* Thin wrapper around HttpClient lib - used by the ruby code
*/
public class HttpClient {

	private String host;
	private int port;
	private String user;
	private String pass;
	private DefaultHttpClient httpclient;
	private HttpHost targetHost;
	private BasicAuthCache authCache;
	private int conn_timeout;
	private int so_timeout;
	private static Object lock = new Object();

	public HttpClient(String host, int port, String user, String pass, int conn_timeout, int so_timeout) {
		this.host = host;
		this.port = port;
		this.user = user;
		this.pass = pass;
		this.conn_timeout = conn_timeout;
		this.so_timeout = so_timeout;

		while (httpclient == null)
		{
			synchronized (lock) {
				connect();
			}
		}

	}

	public Map post(String url, String requestBody) throws Exception
	{
		try {
			while (httpclient == null)
			{
				synchronized (lock) {
					connect();
				}
			}
			HttpPost post = new HttpPost(url);

			StringEntity requestEntity = new StringEntity(requestBody);
			requestEntity.setContentType("application/xml");
			post.setEntity(requestEntity);

			// Add AuthCache to the execution context

			BasicHttpContext localcontext = new BasicHttpContext();
			localcontext.setAttribute(ClientContext.AUTH_CACHE, authCache);

			HttpResponse response = httpclient.execute(targetHost, post, localcontext);
			HttpEntity entity = response.getEntity();

			Map responseMap = new HashMap();
			String responseBody = EntityUtils.toString(entity);
			responseMap.put("body",responseBody);
			responseMap.put("status",response.getStatusLine());

			EntityUtils.consume(entity); // needed?

			return responseMap;
		} catch (Exception e) {
			httpclient.getConnectionManager().shutdown();
			httpclient = null;
			throw e;
		}
	}

	private void connect() {
		if (httpclient != null)
			return; // looks like we are connected, so ignore

		targetHost = new HttpHost(host, port, "http");

		HttpParams params = new SyncBasicHttpParams();
		params
			.setIntParameter(CoreConnectionPNames.SO_TIMEOUT, so_timeout)
			.setIntParameter(CoreConnectionPNames.CONNECTION_TIMEOUT, conn_timeout);

		httpclient = new DefaultHttpClient(params);

		httpclient.getCredentialsProvider().setCredentials(
		new AuthScope(targetHost.getHostName(), targetHost.getPort()),
		new UsernamePasswordCredentials(user, pass));

		// Create AuthCache instance
		authCache = new BasicAuthCache();
		// Generate BASIC scheme object and add it to the local
		// auth cache
		BasicScheme basicAuth = new BasicScheme();
		authCache.put(targetHost, basicAuth);

		// for testing against a proxy in dev (eg Charles)
		// HttpHost proxy = new HttpHost("localhost", 8888);
		// httpclient.getParams().setParameter(ConnRoutePNames.DEFAULT_PROXY, proxy);


	}

	public static void main(String[] args) throws Exception {
		HttpClient hc = new HttpClient("localhost", 8080,"one", "two",1,1);
		Map resp = hc.post("/v1/bond","<bond/>");
		System.out.println("Resp:"+resp);
	}
}

Then to use it from the Ruby code, it’s just this:


@http_client = com.x.HttpClient.new(@svc_host, @svc_port.to_i,@svc_user,@svc_pass, open_timeout.to_i, read_timeout.to_i)

resp = @http_client.post @url_path, xml

And it works – when we put this under load, we don’t run out of resources :)

Scaling Rails…

For the last few weeks I have been trying to tune a rails app… getting to the point where we are wondering which way to scale – bigger box or more boxes…

I have tried to summarise the problem on Stackoverflow here.

If you have any ideas/suggestions, please add them there.

Thanks in advance :)

I created a (Ruby) DSL …

A DSL is a Domain Specific Language – that is, a small language that should help frame solutions to particular problems.

In this case, we had some config for a report-writer program that was parsed using custom code – something like this:

<report name>
{
  FORMAT = CSV
  RECORD = A.PATH.TO.RECORD
  MODE = streaming
  KEYFIELD = TradeId
  FIELDS {
    Id
    TradeId
    RecordType
    Date
    Time
  }
  WHERE Date = "20120924"
}

The parser was getting hairier and we needed to add more features – more complex WHERE clauses etc.  So it seemed an opportunity to throw in a DSL.  This is the new DSL format for the above:

report "<report name>" do
  format "CSV"
  record "A.PATH.TO.RECORD"
  mode "streaming"
  keyfield "TradeId"
  fields do
    column "Id"
    column "TradeId"
    column "RecordType"
    column "Date"
    column "Time"
  end
  where do
    Date == "20120924"
  end
end

When I first saw things like this, it seemed like magic – Ruby must be doing really complex stuff to handle it – but it isn’t :)

It’s all down to defining methods with the above names and handling the blocks passed to them.

For example there is a “report” method, like so:

def report(name, &block)
   # save report name and &block of code for later use
end

The “&block” bit is a little funky – it’s a way of capturing the code passed between the “do … end” of the block above.
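
For instance, a captured block is just a Proc that can be stashed and called later – a tiny standalone illustration (the names here are made up, not the report writer’s actual code):

def report(name, &block)
  @saved_block = block        # the do...end body, captured as a Proc
end

report("daily") { puts "running the daily report" }
@saved_block.call             # prints "running the daily report"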

To handle the next level down – the code within the report block – I defined a class with those methods, and the block is evaluated (instance_eval’d) in the context of an instance of that class.


class WriterDefInRuby < Java::com.WriterDefBase # to make things fun, the class needs to implement a Java interface :)

  def where(&block)
    @where_clause = block
  end

  def setup(&block)
    self.instance_eval &block
  end

  ...
end

So, the above class is used like this:

def report(name, &block)
  writer_def = WriterDefInRuby.new
  writer_def.name = name
  writer_def.setup(&block)
end

And that’s it. We had a third level down for the fields stuff – but it’s just more of the same: another class, eval the block in the context of that class.

PS Thanks to several sites explaining DSLs in much better detail, and to several example gems, like Rose and Docile.

Things I love and hate about Node.js

I thought I’d try one of those link bait articles…

LOVE

  1. Threads/no threads – since everything runs in a single thread, but is written as if there are many threads via callbacks, it makes for easy concurrent apps.  Just make sure you keep things simple, using tools like asyncjs to control multiple callbacks – you don’t want a callback pyramid of doom!
  2. Lightweight/fast – it’s a VM-based language like Java and C#, but it is very fast to start.
  3. Same language client/server – fewer brainfarts when switching between code for the front end and the back end.
  4. Dynamic – no need to tell the compiler what type things are; it works out what you mean.
  5. It’s new and cool :)

HATE

  1. It’s JavaScript – there seem to be lots of “bad parts” to the language.  Thus I write my JavaScript via CoffeeScript :)
  2. OTT braces and semi-colons –
    function() { var a=1; }

    Probably this is a hang-up from seeing too much Java/C#.

  3. Function-based programming – and not in a Lisp-like way – every other line seems to be a new function definition.
  4. Variable declarations – var declarations get hoisted to the top of the enclosing function (not the block), so things might not work as you expect.
  5. == versus === – see the bad parts link above :(
  6. No types – especially in function parameters, so you can get problems when you miss a parameter and the function does the wrong thing with the wrong parameter…

I am still in the “honeymoon” phase – the benefits seem to outweigh the disadvantages.  Let’s see what happens over the next 6-12 months…

Close scrapes with Ruby / Mechanize

A few years ago I agreed to help with a side project that needed to scrape some websites and also talk to some webservices.

Various front ends would generate the requests and my process would go through the db and process them.

The core of the engine was not a webapp – but Rails/ActiveRecord (AR) provided a good way of interacting with MySQL.

I tried a few threading strategies, but hit (seeming) issues with accessing the db and MRI threading – although in retrospect, I think the issues were more of my own making: overlapping threads/poor design.

Moving to JRuby seemed to address some of the threading issues.

I initially used the Parallel gem, which seemed to do largely what I wanted. However, I was still getting AR issues, so I switched to JRuby.  It then seemed more appropriate to use a Java-based parallelisation gem – so I went for ActsAsExecutor, which is a thin wrapper around the Java concurrency features.  This was used to manage a pool of threads to handle the scraping/webservice calling:

ActsAsExecutor::Executor::Factory.create 15, false # 15 max threads, schedulable-false?

To do the polling for any work to be done, the Rufus scheduler gem was used.  It was set up to check for pending records every few seconds, like so:


scheduler = Rufus::Scheduler.start_new
scheduler.every "2s", :allow_overlapping => false do
# do some work
end

This was kicked off in a Rails initializer.

One mistake I made was to not set the allow_overlapping flag – which meant that if any job was slow, the next one would be started and would try to do the same work again.

Another trick which I think helped was to wrap db-accessing sections like so:


ActiveRecord::Base.connection_pool.with_connection do
# db work...
end

The scheduled task analyses each request and generates work items to be done in the thread pool.  Another mistake I made was to set the work items’ status to NEW and then, separately, re-query the db for NEW items to queue up for the thread pool.  Only when they were picked off the queue did their status advance.  This left a window of opportunity for a subsequent scheduled analysis run to re-queue the same items.  The fix was to not re-query the db – I am generating the items, so I know them without going back to the db; subsequent runs will then only work on items that they generate themselves (see the sketch below).
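
A sketch of that fix, with invented names (not the app’s real code) – queue the exact objects just created, rather than re-querying for NEW ones:

def analyse(request)
  items = build_work_items(request)        # hypothetical helper - creates the db records, status NEW
  items.each { |item| work_queue << item } # queue what we just built - no re-query, no race window
end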

Each work item in the thread pool did the above trick to (re-)connect to the DB and then loaded its details.

To separate out the various webscraping/webservice calls, the code for each is held in a text field in the DB.  This is loaded dynamically as each call is required and “instance_eval”‘d into the work item object – so it has access to the work item’s details.
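
Roughly like so (a sketch – the model and column names are illustrative, not the app’s real ones):

class WorkItem < ActiveRecord::Base
  # 'script' is a text column holding the Ruby for this scrape/webservice call
  def execute
    instance_eval(script)  # run the stored code as if it were written inside this object
  end
end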

There are largely two kinds of work items – web scrapers and webservice calls.  The scraping is done via Mechanize and the webservices via Savon.

For Mechanize, the various pages/frames/forms are navigated to achieve the desired results.
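
Roughly like this (a sketch only – the site, form and field names are invented):

require 'mechanize'

agent = Mechanize.new
page = agent.get('https://example.com/login')
form = page.form_with(:name => 'login')          # find the login form on the page
form.username = 'user'                           # fill in the fields
form.password = 'secret'
results = agent.submit(form)                     # submit and land on the results page
balance = results.search('.balance').first.text  # scrape the value we are after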

For Savon, the message body is constructed and the service called.
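
And the Savon side is along these lines (again a sketch, using Savon 2’s API – the service and operation are invented):

require 'savon'

client = Savon.client(:wsdl => 'https://example.com/quotes?wsdl')
response = client.call(:get_quote, :message => { :symbol => 'ACME' })
quote = response.body[:get_quote_response][:result]  # dig the result out of the SOAP response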

The results are then saved back to the db.