Sunday, 26 April 2015

Example joining 2 datasets using Apache Spark

A quick example showing how to do a 'reduce-side join' using Apache Spark's Java API.

It gets verbose with Java very quickly. The same job could be really concise with the Scala API, which uses implicit type conversions and is sleeker overall.

In my next post I will rewrite this in Java 8 using lambdas, and the one after that will use the Scala API.
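
For anyone who just wants the idea behind a reduce-side join without setting up Spark, here is a minimal sketch in plain Java collections. This is not Spark code and not the code from this post: the class name, the `join` helper and the sample users/orders data are all made up for illustration. It only mimics the three steps (tag each record with its source, group by key, pair things up per key), which is roughly what an inner join on two `JavaPairRDD`s would do for you in Spark.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a reduce-side (inner) join using plain collections, no Spark.
public class ReduceSideJoinSketch {

    static List<String> join(List<Map.Entry<Integer, String>> left,
                             List<Map.Entry<Integer, String>> right) {
        // "Map" + "shuffle": tag every record with its source and group by join key.
        Map<Integer, List<String[]>> grouped = new TreeMap<>();
        for (Map.Entry<Integer, String> e : left) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(new String[] {"L", e.getValue()});
        }
        for (Map.Entry<Integer, String> e : right) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(new String[] {"R", e.getValue()});
        }

        // "Reduce": for each key, split the values by tag and emit every left/right pair.
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String[]>> group : grouped.entrySet()) {
            List<String> lefts = new ArrayList<>();
            List<String> rights = new ArrayList<>();
            for (String[] tagged : group.getValue()) {
                (tagged[0].equals("L") ? lefts : rights).add(tagged[1]);
            }
            for (String l : lefts) {
                for (String r : rights) {
                    out.add(group.getKey() + "," + l + "," + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample data: (userId, name) and (userId, item).
        List<Map.Entry<Integer, String>> users = new ArrayList<>();
        users.add(new SimpleEntry<>(1, "alice"));
        users.add(new SimpleEntry<>(2, "bob"));
        users.add(new SimpleEntry<>(3, "carol"));

        List<Map.Entry<Integer, String>> orders = new ArrayList<>();
        orders.add(new SimpleEntry<>(1, "book"));
        orders.add(new SimpleEntry<>(1, "pen"));
        orders.add(new SimpleEntry<>(3, "lamp"));
        orders.add(new SimpleEntry<>(4, "desk"));

        // Keys 2 and 4 appear on only one side, so an inner join drops them.
        for (String row : join(users, orders)) {
            System.out.println(row);
        }
    }
}
```

Running `main` prints `1,alice,book`, `1,alice,pen` and `3,carol,lamp`. The tagging step is the bit that makes it a reduce-side join: both datasets travel through the same grouping step, and the "reducer" tells them apart by the tag.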


Sunday, 19 April 2015

Character Count Job using Apache Spark in Java

I was playing around with the Java API for Apache Spark. This is a character count job written using the Java API. WordCount is everywhere :) so I did a simple character count job instead.

https://gist.github.com/amithn/344a648b7471988d2472

For the Scala version, see https://gist.github.com/amithn/8148311c5522f3866f4b
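
If you just want to see the counting logic without Spark on the classpath, here is a small plain-Java sketch. It is not the code from the gists above, and the class and method names are made up. It does locally what I would expect the Spark job to do across a cluster: turn each character into a (character, 1) pair and sum the ones per character, much like a `mapToPair` followed by a `reduceByKey`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: character count with plain collections, no Spark.
public class CharCountSketch {

    static Map<Character, Integer> count(String text) {
        // TreeMap keeps the characters sorted so the output is stable.
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : text.toCharArray()) {
            // Equivalent to emitting (c, 1) and summing the values per key.
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hello")); // prints {e=1, h=1, l=2, o=1}
    }
}
```

The `merge` call with `Integer::sum` plays the role of the reduce function: it has to be associative for the distributed version to give the same answer regardless of how the data is partitioned.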