So I have a lot of little data fix-up scripts to write to maintain our large and growing Elasticsearch database. Typically, I use the shell stack (bash/jq/curl) for these tasks, and when things get a little bigger I switch to Python. I’m a big fan of jq/curl, but for anything that isn’t small, it gets nasty. I’m a big fan of Python too, but I’m still a newbie with it, so when things get bigger or more complicated, I’m not very efficient.
Anyway, a lot of these tasks end up doing a scroll/scan or search in ES, then feeding the data back in, or to message queues, or to various endpoints that do the work. They are often long-lived: code them up, throw them in Rundeck, and let them run for hours or days.
One frustration is that, with these scripting options, simply queueing the work doesn’t go fast enough: my Storm jobs that do the processing go way faster than the scripts can feed them. I know I could learn how to do async in Python, or try to shard things up in the shell stack and use ‘parallel’, or find some other solution. But since I already have a lot of business logic in my runtime code in Scala, it would be nice to just use that, without the headache of builds and deployments: something faster and lighter weight that I can still just dump into Rundeck. I know how to control async, threads, and concurrency in this environment, and I know I’m not limited by the toolset.
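The fix for that bottleneck is to keep several enqueue calls in flight at once instead of making them one at a time. A minimal sketch in plain Scala, assuming a hypothetical enqueue function standing in for whatever publishes to the queue or endpoint, and a fixed thread pool that caps how many calls run concurrently:

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical stand-in for whatever actually queues the work
// (publishing to a message queue, POSTing to a worker endpoint, ...).
def enqueue(doc: String): Int = {
  Thread.sleep(10) // pretend this is network latency
  doc.length
}

// Queue one batch (e.g. one scroll page) with as many calls in flight as the
// pool has threads, then block until the whole batch is done.
def feedBatch(docs: Seq[String])(implicit ec: ExecutionContext): Seq[Int] =
  Await.result(Future.traverse(docs)(doc => Future(enqueue(doc))), 10.minutes)

// The fixed pool size is what bounds concurrency.
val parallelism = 8
val pool = Executors.newFixedThreadPool(parallelism)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

val results = try feedBatch((1 to 100).map(i => s"doc-$i")) finally pool.shutdown()
println(s"queued ${results.size} docs")
```

Future.traverse keeps the results in input order. Each batch still waits for its slowest call, which is usually acceptable when a batch is one scroll page.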
I looked into this once before but gave up at the dependency part of it. Then I discovered this blog.
Basically, using a hacked-up version of sbt, you can write a single-file script in Scala that will download dependencies, compile, and launch, with no fuss. I’ll show you how I got it hooked up; I mostly followed that blog, plus a few other things I found helpful. Here are the steps I followed to get it set up on my Rundeck Linux box. First, install conscript:
curl https://raw.githubusercontent.com/foundweekends/conscript/master/setup.sh | sh
Install the sbt runner:
cs sbt/sbt --branch 0.13
Put scalas in your path:
export CONSCRIPT_HOME="$HOME/.conscript"
export CONSCRIPT_OPTS="-Dfile.encoding=UTF-8"
export PATH=$CONSCRIPT_HOME/bin:$PATH
Create a script:
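A minimal sketch of what script.scala might contain, assuming Scala 2.11.7 as in the dependency header later in this post (greeting is just an illustrative helper). The scalas shebang goes on the first line, the sbt settings go inside the /*** comment, and everything after that is the script body, with the command-line arguments available as args:

```scala
#!/usr/bin/env scalas
/***
scalaVersion := "2.11.7"
*/

// Everything below the header is compiled and run as the script body.
// Hypothetical helper: builds a greeting from the command-line arguments.
def greeting(names: Seq[String]): String = "hello " + names.mkString(" ")

println(greeting(args.toSeq))
```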
$ chmod +x script.scala
$ ./script.scala hello
OK, so now you can add the locations of your artifact repositories and your dependencies inside the /*** comment, like this:
/***
scalaVersion := "2.11.7"

resolvers += Resolver.url("typesafe-ivy-repo", url("http://repo.typesafe.com/typesafe/releases"))(Resolver.ivyStylePatterns)
resolvers += "Your Repo" at "http://artifactory.yourplace.com/artifactory/repo"
resolvers += Resolver.mavenLocal

libraryDependencies += "org.scala-sbt" % "io" % "0.13.11"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.1.7"
libraryDependencies += "ch.qos.logback" % "logback-core" % "1.1.7"
libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "com.trax.platform" % "trax-elasticsearch-loader" % "1.3.43"
libraryDependencies += "com.trax.platform" % "trax-platform-utils" % "1.3.7"
*/
When you run the script, it will automatically download the dependencies, compile, and run it. The downloads land in $CONSCRIPT_HOME/boot.
You can also bring in a logger and control it programmatically. It starts to look a lot like the Python script, at least in how simple logging is to configure, but with access to all your fancy Scala tools. When was the last time you could do tail recursion in a script and not worry about a stack overflow? @tailrec to the rescue!
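A minimal sketch of the kind of loop that benefits, assuming a hypothetical fetchPage function that stands in for an Elasticsearch scroll request (here it just generates 2,500 fake doc IDs in pages of 100, so the sketch runs without a cluster). The recursion walks page by page until there is no next cursor:

```scala
import scala.annotation.tailrec

// Hypothetical stand-in for one Elasticsearch scroll request: given a cursor,
// return a page of hits and the next cursor (None once everything is read).
final case class Page(hits: Seq[String], next: Option[Int])

def fetchPage(cursor: Int, pageSize: Int = 100, totalDocs: Int = 2500): Page = {
  val end = math.min(cursor + pageSize, totalDocs)
  val hits = (cursor until end).map(i => s"doc-$i")
  Page(hits, if (end < totalDocs) Some(end) else None)
}

// Walk every page. @tailrec makes the compiler reject `loop` unless the
// recursive call is in tail position, which guarantees it compiles to a loop.
def processAll(): Long = {
  @tailrec
  def loop(cursor: Option[Int], processed: Long): Long = cursor match {
    case None => processed
    case Some(c) =>
      val page = fetchPage(c)
      // ... real work on page.hits would go here ...
      loop(page.next, processed + page.hits.size)
  }
  loop(Some(0), 0L)
}

val processedCount = processAll()
println(s"processed $processedCount docs")
```

If the recursive call were not in tail position (say, page.hits.size + loop(...)), the annotation would turn that into a compile error instead of a stack overflow hours into the run.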
Scripting with Scala is not good for everything, or everyone. The compile time stinks when you have a small task, but if the script is going to run for hours or days, and run considerably faster because it can do the work concurrently, the few extra seconds of compiling are worth it.
Also, if you already have business logic wrapped up in the Java ecosystem, you may find this an easy way to unlock some of it quickly, without having to put things behind a REST interface or message queue, or what have you.
It might also be an easy way to explore Scala for some real tasks. So if you are curious and want to dabble with it a bit, without having to bet the farm on some new tech that nobody knows or is willing to invest in heavily, give it a go.
Here’s the body of the little log util I use for controlling logback without any config files. It’s something I found on Stack Overflow, and it’s good enough for command-line and simple scripts. Don’t get me started on how logging sucks in the Java ecosystem; many a day wasted attempting to do things that should be easy…
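A minimal sketch of such a util, one way to do it and not necessarily the Stack Overflow version, assuming only the logback-classic and slf4j-api dependencies from the header earlier in this post. The package name com.example.myjob is hypothetical. It casts the SLF4J logger to logback’s own Logger class, whose setLevel method is what lets you change levels at runtime without a logback.xml:

```scala
#!/usr/bin/env scalas
/***
scalaVersion := "2.11.7"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.1.7"
libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
*/

import ch.qos.logback.classic.{Level, Logger => LogbackLogger}
import org.slf4j.{Logger, LoggerFactory}

// Set a logger's level at runtime. This only works when logback is the
// SLF4J binding on the classpath, since we cast to logback's Logger.
def setLevel(name: String, level: Level): Unit =
  LoggerFactory.getLogger(name).asInstanceOf[LogbackLogger].setLevel(level)

setLevel(Logger.ROOT_LOGGER_NAME, Level.INFO) // quiet everything...
setLevel("com.example.myjob", Level.DEBUG)    // ...except a (hypothetical) package of our own

val log = LoggerFactory.getLogger("script")
log.info("starting")
log.debug("not printed: 'script' inherits INFO from the root logger")
```

With no logback.xml on the classpath, logback falls back to a console appender on the root logger, so the levels set in code are the only configuration.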