Amith Nambiar's blog: 2015

Sunday, 13 December 2015

Converting a Java Map to Scala Map using JavaConverters

import java.util.HashMap
// On Scala 2.13+, scala.jdk.CollectionConverters._ replaces this import
import scala.collection.JavaConverters._

object JavaMapToScalaMap extends App {
  val javaMap = new HashMap[Integer, String]()
  javaMap.put(1, "ABC")
  javaMap.put(2, "DEF")
  javaMap.put(3, "GHI")

  println(javaMap.get(1))

  // asScala wraps the Java map as a scala.collection.mutable.Map (a view, not a copy)
  val scalaMap = javaMap.asScala
  // foreach and the other Scala collection functions are now available
  scalaMap.foreach(x => println("Key = " + x._1 + " Value = " + x._2))

  // Key = 1 Value = ABC
  // Key = 2 Value = DEF
  // Key = 3 Value = GHI
}

Sunday, 15 November 2015

Not getting PairRDD Functions while using Apache Spark with Scala?

Quick note (to myself and you): if you do not import

import org.apache.spark.SparkContext._

you will not get the functions that work on PairRDDs in Apache Spark while using Scala. According to the Spark programming guide, this explicit import is needed on versions before Spark 1.3.0.

val domainsUserIds = records1.map(x => (x.getUserId.toString, x.getDomain.toString))

domainsUserIds.reduceByKey - Will not be available unless you explicitly do:

import org.apache.spark.SparkContext._

domainsUserIds.reduceByKey (now available)

Wednesday, 14 October 2015

Do not ignore your .gitignore

A quick way to find out what is ignored?

$ git check-ignore -v *

The story:

I was working with a source (src) directory called src/main/orchestrator/weekly/build.
I pushed my changes from work and pulled them at home, but I could not see any of the work I had done in the src/main/orchestrator/weekly/build directory because of an entry in my .gitignore:

$ vim .gitignore

**/.gradle
**/build/
.idea/
**/.iml
**/*.pyc

The second entry was there to ignore the ./build directory, which holds all the jars, classes and the packaged app. This directory is normally ignored in Java projects. The catch is that **/build/ matches a directory named build at any depth, which can be very confusing.

All the tests had passed locally, because the files still existed on disk, but they were never pushed. I ignored the build failure on the CI (Continuous Integration) box because some configuration changes were needed, I did not have access to the build plan, and the admin was away sick.

You see this wasn't entirely my mistake ;)

Moral of the story: run the following to check what git actually ignores.
$ git check-ignore -v *

.gitignore:2:build/ build ---> Result of running the command

Note that the shell expands * to the entries of the current directory only, so a nested path has to be passed explicitly. If I had run it against src/main/orchestrator/weekly/build, I would have seen that it was ignored because of the rule **/build/ in my .gitignore.
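
The check can be tried in a scratch repository. This is only a sketch: it assumes git is installed, and the temporary repository and its directory layout are made up to mirror the story.

```shell
# Scratch repository mirroring the layout from the story (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p build src/main/orchestrator/weekly/build
printf '**/build/\n' > .gitignore

# The shell expands * to the top-level names only (here: build and src),
# so this call can only report on those two entries.
top_level=$(git check-ignore -v *)
echo "$top_level"

# Passing the nested directory explicitly reveals the rule that hides it.
nested=$(git check-ignore -v src/main/orchestrator/weekly/build)
echo "$nested"
```

Only the second call mentions the nested directory; the first one reports the top-level build directory and nothing under src.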

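Solution:
Anchor the rule to the repository root with a leading slash, so that only the top-level build directory is ignored:

/build/

A bare build/ entry (like **/build/) still matches a directory named build at any depth.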
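
The difference between an unanchored build/ rule and an anchored /build/ rule can be sketched as follows. It assumes git is installed; the scratch repository and the variable names are hypothetical.

```shell
# Scratch repository with a root-level and a nested build directory (illustrative only).
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p build src/main/orchestrator/weekly/build

# Unanchored rule: matches a directory named build at any depth.
printf 'build/\n' > .gitignore
unanchored=$(git check-ignore -v src/main/orchestrator/weekly/build)
echo "unanchored rule, nested dir: $unanchored"

# Anchored rule: the leading slash ties it to the directory holding .gitignore.
printf '/build/\n' > .gitignore
anchored_root=$(git check-ignore -v build)
# check-ignore exits with status 1 when nothing is ignored, hence the || true.
anchored_nested=$(git check-ignore -v src/main/orchestrator/weekly/build || true)
echo "anchored rule, root dir:   $anchored_root"
echo "anchored rule, nested dir: ${anchored_nested:-not ignored}"
```

With the anchored rule the root-level build directory is still ignored, while the nested source directory is no longer matched.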

Hope that helps someone somewhere.

Wednesday, 15 July 2015

Character Count Job using Apache Spark in Scala

As mentioned in my previous post on Apache Spark (the Java version of this job), I wanted to show the same example using Scala. I am impressed with how concise and expressive Scala is compared to Java.

The gist is at https://gist.github.com/amithn/8148311c5522f3866f4b

Thursday, 21 May 2015

IntelliJ Creating a desktop Icon on Ubuntu

IntelliJ does not create a desktop entry by default on Ubuntu; in fact, there is no install process at all. So I used to run idea.sh from the command line, which becomes a problem as soon as you close the terminal.

However, you can add a desktop entry by clicking Tools -> Create Desktop Entry, which is a really nice feature.

Once that is done, you can launch IntelliJ from the desktop and also add it to the Launcher on Ubuntu.

Sunday, 26 April 2015

Example joining 2 datasets using Apache Spark

A quick example showing how to do a reduce-side join using Apache Spark's Java API.

It gets verbose with Java very quickly. The same job could be really concise with the Scala API, which uses implicit type conversions and is sleek as well.

In my next post I will rewrite this in Java 8 using lambdas, and the one after that will use the Scala API.

Sunday, 19 April 2015

Character Count Job using Apache Spark in Java

I was playing around with the Java API for Apache Spark. This is a Character Count job written using the Java API - WordCount is everywhere :) so I did a simple Character Count job instead.

https://gist.github.com/amithn/344a648b7471988d2472

For the Scala version, see https://gist.github.com/amithn/8148311c5522f3866f4b

Thursday, 1 January 2015

Leader election in a Cluster using Apache Zookeeper + Curator

Distributed systems often need to elect a leader among a cluster of nodes. Apache ZooKeeper provides an elegant solution to this difficult problem.

We have a use case with a similar problem, so I tried ZooKeeper's leader election. I used Apache Curator, a framework that eases interaction with ZooKeeper.

I did a quick prototype and pushed it to GitHub: https://github.com/amithn/zookeeper-leader-election

Let me know if you have any questions.