Transform’s ElasticSearch project: Eddie Abou-Jaoude offers a developer’s view

As one of Transform’s associate developers I was interested to see the role of development featuring more prominently in this year’s Digital Maturity Index (DMI 2014). The phrase ‘it’s the combination of code and coders that makes the connection between customer, device and organisation’ certainly struck a chord, and caused me to think about how much people really understand about the world I live and breathe every day.

Rory Woodward recently published a post on Open Government and how Transform is giving something back using Github. For the uninitiated, here’s a little more background on the core elements of this project and how they work.

What is Open Source?

The Open Source concept allows you to share your work with the community so that others can use it, review it and contribute to it. Popular Open Source projects include Linux, jQuery, PHP, Ruby and Git. The advantage of this process is that a code base benefits from many thousands of people developing it, rather than just a few, and this continual iteration improves the code, tests and documentation, and reduces bugs.

To maintain the integrity of your project and keep a degree of control over how your work is used and how contributions are made, you can release it under a ‘license’. There are a variety of licenses available, many explained very well here.

Which licence to choose?

| I want it simple and permissive
| The MIT License is a permissive license that is short and to the point. It lets people do anything they want with your code as long as they provide attribution back to you and don’t hold you liable
| jQuery and Rails use the MIT License

| I’m concerned about patents
| The Apache License is a permissive license similar to the MIT License, but also provides an express grant of patent rights from contributors to users
| Apache, SVN, and NuGet use the Apache License

| I care about sharing improvements
| The GPL (V2 or V3) is a copyleft license that requires anyone who distributes your code, or a derivative work, to make the source available under the same terms. V3 is similar to V2, but further restricts use in hardware that forbids software alterations
| Linux, Git, and WordPress use the GPL

Personally I prefer to use the MIT license because of its simplicity and the fact that it provides a high degree of freedom to those who wish to use it, albeit at their own risk.

You may ask why anyone would give work and time away for free. If you’re a developer you probably use a lot of Open Source projects and libraries already. These projects and tools only become great because people contribute freely; even giving fifteen minutes will help the community as well as your CV. It’s similar to crowdsourcing: work done by one individual is not going to be as good as work reviewed and contributed to by hundreds, or even thousands in some cases. Contributions can range from logging bugs to fixing them, and from writing documentation to adding new features and functionality. All of this helps to improve a project and move it forward.

What is Github?

Github is essentially Facebook for Techies! But instead of showing off about your weekend, it allows developers and designers to show off their code, fonts, blog posts and 3D diagrams. And, as on Facebook, people can comment on each other’s contributions.

Github is built on top of Git, the most popular version control system, and it complements Git with a very useful and impressive toolset that includes visualisation and workflow. A few of the most frequently used features are diffs (before and after), forks, code review via pull requests and inline commenting, web hooks for integrating with external tools, an issue tracker and a wiki. You can read more about it here.

| Powerful collaboration, code review, and code management for open source and private projects

The main workflow of Github is to fork a project into your own area (personal or organisation), make changes and then submit a pull request. An owner (or authorised contributor) of the original project can then review the changes, make suggestions or ask for changes/updates. Once satisfied, these can be merged in and shared with the rest of the network (forks / clones etc).
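As a rough illustration of the local half of that workflow, here is a minimal Python sketch that lists the Git commands a contributor would typically run after forking a project. The fork URL, working directory, branch name and commit message are all hypothetical placeholders; forking and opening the pull request happen on the Github website and are not shown.

```python
import shlex


def fork_workflow_commands(fork_url, workdir, branch, message):
    """Return, in order, the git commands for contributing through a fork.

    All four arguments are illustrative placeholders supplied by the caller.
    """
    return [
        # 1. Clone your fork (created beforehand on Github) to your machine.
        ["git", "clone", fork_url, workdir],
        # 2. Create a branch to hold your changes.
        ["git", "-C", workdir, "checkout", "-b", branch],
        # ... edit files in workdir here ...
        # 3. Stage and commit the changes.
        ["git", "-C", workdir, "add", "--all"],
        ["git", "-C", workdir, "commit", "-m", message],
        # 4. Push the branch to your fork, ready for a pull request.
        ["git", "-C", workdir, "push", "origin", branch],
    ]


if __name__ == "__main__":
    commands = fork_workflow_commands(
        "https://github.com/your-user/example-project.git",  # hypothetical fork
        "example-project",
        "fix-typo",
        "Fix typo in README",
    )
    for command in commands:
        # Print each command as it would be typed in a shell.
        print(" ".join(shlex.quote(part) for part in command))
```

The sketch only prints the commands rather than running them, so it is safe to try without a real fork. Once the branch has been pushed, the pull request is opened from the fork's page on Github.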

Here is a visual representation by Github (use the left/right arrows to move through the stages).

Transform’s Organisation on Github

With Transform having multiple projects with various clients on the go, it was no surprise that it would start releasing some Open Source projects under its own organisation on Github for its teams and the public to use; a place to contribute back. At the time of writing there’s one public project underway with more in the pipeline.

Private (i.e. non Open Source) projects can be hosted there as well, with the ability to give users different levels of access.

Working for a company that believes in Open Source is refreshing. It means not only being able to use Open Source technologies but also having the potential to release work as Open Source, which in my opinion encourages better work from developers, as it could be on display to the world!

At Transform, we’ve created an Open Source repository for ElasticSearch with full examples including test data, requests and response content, with nothing left out. It’s early days, but with an example data set and a complex conditional query already in place, we’re on the right track. ElasticSearch is a fantastic tool. However, we’ve found that its documentation comes in bite-size pieces, and it becomes difficult to follow when you piece them together or combine conditions.
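To give a flavour of what combining conditions looks like, here is a minimal Python sketch that builds a conditional search body using ElasticSearch's bool query. It is not the query from our repository: the field names (title, price, status) and the values are made up for illustration.

```python
import json


def build_conditional_query(text, max_price, excluded_status):
    """Build a search body that combines several conditions in one bool query.

    The field names "title", "price" and "status" are hypothetical.
    """
    return {
        "query": {
            "bool": {
                # Every clause in "must" has to match.
                "must": [
                    {"match": {"title": text}},
                    {"range": {"price": {"lte": max_price}}},
                ],
                # Documents matching any "must_not" clause are excluded.
                "must_not": [
                    {"term": {"status": excluded_status}},
                ],
            }
        }
    }


if __name__ == "__main__":
    body = build_conditional_query("laptop", 500, "discontinued")
    # Print the JSON that would be sent as the body of a search request.
    print(json.dumps(body, indent=2))
```

Each condition is a small, self-contained clause, and the bool query is what ties them together. Nesting another bool query inside a clause is how more complex conditions are usually expressed.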

What is ElasticSearch?

| Elasticsearch is a flexible and powerful open source, distributed, real-time search and analytics engine.

Features include:

  • real-time data
  • real-time analytics
  • distributed
  • high availability
  • full text search
  • document oriented
  • conflict management
  • RESTful API
  • built on top of Apache Lucene

ElasticSearch was designed for fast full text search and fast big data analytics.

In addition to the core ElasticSearch tool, the ElasticSearch team has other useful products that work well together. Currently these include:

  • Marvel to monitor your cluster
  • Logstash to help take logs and other time based event data from your application
  • Kibana to visualise your ElasticSearch data

It’s impressive how fast ElasticSearch is with large data sets and minimal hardware. Give it a go! It’s easy to install, and you communicate with it over a RESTful API. Unfortunately, the ElasticSearch documentation is a little lacking in complete examples, which is why we have created the repository with some real life examples.
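As a small taste of that RESTful API, here is a minimal Python sketch that sends a search to ElasticSearch using only the standard library. It assumes a node running locally on the default port 9200. The index name "products" and the field "title" are hypothetical, and this is not taken from our repository.

```python
import json
import urllib.request


def build_search_request(base_url, index, query):
    """Prepare (but do not send) an HTTP request to an index's _search endpoint."""
    url = "{}/{}/_search".format(base_url.rstrip("/"), index)
    body = json.dumps(query).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(base_url, index, query):
    """Send the search and return the decoded JSON response."""
    request = build_search_request(base_url, index, query)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical index and field; adjust to match your own data.
    query = {"query": {"match": {"title": "laptop"}}}
    request = build_search_request("http://localhost:9200", "products", query)
    print(request.get_method(), request.full_url)
    # Uncomment when an ElasticSearch node is actually running:
    # print(search("http://localhost:9200", "products", query))
```

Building the request separately from sending it makes it easy to inspect exactly what will go over the wire before pointing the code at a real cluster.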

Feedback and contributions on the project are obviously welcome.

So for me the concept of Open Source is no longer about the lone developer sharing personal projects. Increasingly it’s being used by large organisations, and even government bodies, for the improvement and transparency of their projects. With the global community working together, as if they were local, this can only have the benefit of increasing momentum in advancing technology.

