Back pressure isn’t a silver bullet

Before there was automatic back pressure in reactive/streaming systems, we spent a lot of time tuning our reads and our writes so that we wouldn't overrun our sinks (databases). At first it's a weird, unfamiliar place. A lot of us, me included, came from the web, where we lived in a constant request/response cycle and the goal was always to go faster: support more load, get quicker response times, and so on. Then you get to streaming patterns and realize the goal often ends up being to go slower. In many use cases you can read way faster than you can write, and you end up killing your stores or spending more money. I think a few times I came pretty close to melting CPUs. Now yeah, certain stores handle fast writes better than others, but let's face it: if you have a file with a bazillion records and you are just reading it and sending it to a store with a whole bunch of threads, there is absolutely no doubt that the read is going to be faster than the write.

So what happens here is you have to tune your inputs to slow down to whatever rate your particular storage solution can handle. This was a painful, slow process of trial and error: turning knobs here and there and staring at dashboards and metrics. You would go from slow, careful, scientific procedure to "F it, turn that shit up and just go fast." Eventually you would end up somewhere stable and fast enough, but it took a long time. There must be a better way.

At some point this idea of back pressure came around. From Akka to Flink, I think it's a natural idea in a lot of frameworks: if we can detect that writes (or any downstream activity) are slow, we can slow down the reads. In effect that should eliminate all this time spent tuning, and cursing, and staying up late. Well, I think I learned something today. Yes, it's true that frameworks like Flink have back pressure, and they will slow down inputs, but…

What if your source is keeping a connection open? As back pressure increases, the likelihood of that connection timing out increases. This is what happened to us. Our database had become the bottleneck because of deadlocks and a less-than-optimal write configuration, and that led to increased back pressure. One of our sources is S3, and the AWS S3 client has a timeout: if you don't request any new data for a while, it will close the connection, and that exception can kill your pipeline. Yeah, there are retries and timeout settings, and yeah, you can tune those and struggle with how to get your framework to set them (sometimes it's not as easy as you think). But aren't you right back to tuning? Did that back pressure actually solve your problem, or did it just move it somewhere else?
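
For reference, if you do get to construct the S3 client yourself, the knobs look roughly like this (a sketch against the v1 AWS Java SDK: the builder and timeout setters are standard SDK calls, the numbers are made up, and getting a framework-managed connector to actually use a client configured this way is exactly the struggle I'm talking about):

import com.amazonaws.ClientConfiguration
import com.amazonaws.services.s3.AmazonS3ClientBuilder

// Sketch: stretch the timeouts so a reader throttled by back pressure doesn't
// get its idle connection killed. Values are illustrative, not recommendations.
val s3Config = new ClientConfiguration()
  .withConnectionTimeout(60 * 1000)     // ms to establish a connection
  .withSocketTimeout(5 * 60 * 1000)     // ms of silence tolerated on an open connection

val s3 = AmazonS3ClientBuilder.standard()
  .withClientConfiguration(s3Config)
  .build()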

Now sure, we can find a way to engineer things to not keep connections open, or otherwise really solve these problems, but the fact remains you still need to tune the whole thing end to end. Every little setting or optimization matters. These things lead to real costs, especially in the cloud. It's far too easy, and the cloud providers are super happy, to just give you more RAM and more CPU, and engineers, if the people with the purse strings let them, are often happy to just spend more. In the end this back pressure thing, while I think a great idea, isn't a silver bullet; you still have to tune your ins and outs. It's not just about making it stable or making it fast, it's about cost. That's the bottom line: as a developer you are there to create value. You get paid to turn X resources (the amount they pay you) into X*Y, and you want your Y to be as high as possible. Don't just roll over and believe back pressure is gonna self-tune your stuff; there is still hard work to do. We haven't arrived at the place where streaming systems tune themselves, we are still far away from that. But it's a fun journey, enjoy the ride!

Embedded data meets big data – SQLite in Spark, are you crazy?

Every one of these new runtimes or distributed systems I go deep on takes me to the edge of sanity until I punch through. My recent experience with Spark was no different, but I finally punched through.

We have an existing ETL system that pulls views from our databases. Out of the top 10 list here, we use 8 of them. Most of these views (95%) are 500,000 rows or less, but the top 5%, which get used the most, may have as many as 10,000,000 rows. A hundred or more columns is normal as well. These are wide, flattened tables. People use them to download and do offline analysis, and for simple introspection in a web view, but they're also starting to seed our data lake.

When we explored building the first generation of this system, we looked at all the popular open source databases for speed at arbitrarily filtering, grouping, and sorting wide tables of up to 10 million records and 200 columns. We looked at PostgreSQL, MySQL, and SQLite. We use Elasticsearch quite a bit, but ruled it out for this use case due to cost (lots of RAM) and time to write; those are maybe things we could have overcome, but we decided not to. We wanted to avoid creating indices because we didn't know how users were going to look at the data. Our API is similar to Elastic's in that you can basically filter, sort, and group on any dimension. SQLite won the war: if you turn off transactions and journaling, you can get amazing performance for what we needed, and for nearly everything, even on commodity boxes, you can see sub-second response times across the board.
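
For the curious, "turn off transactions and journaling" mostly boils down to a couple of pragmas before you start loading (a sketch over plain JDBC with a SQLite driver on the classpath; the file path is a placeholder, and the trade-off is that a crash mid-load can corrupt the file, which is fine when you can just regenerate it):

import java.sql.DriverManager

// Sketch: trade durability for raw write speed on a throwaway file.
val conn = DriverManager.getConnection("jdbc:sqlite:/tmp/some_view.db")
val stmt = conn.createStatement()
stmt.execute("PRAGMA journal_mode = OFF")  // no rollback journal
stmt.execute("PRAGMA synchronous = OFF")   // don't fsync every write
// ...create the table, then batch the inserts...
conn.close()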

The first version used Storm to extract the data into Cassandra. The next version was going to use Spark to extract into parquet and SQLite as well. We'd have Spark simply prepare our SQLite files for us, so that we could drop them in S3 and have the API layer pick them up and do its thing. This saved us the cost of running and managing Cassandra, and started to prepare our data lake with parquet files, which, from Spark to Presto, is where you want to be.

Spark cut the lines of code in half because it has drivers for all our data sources, and we can avoid all the giant pattern-matching blocks dealing with typing issues; the data frame just does this for you. In addition, because of clever partitioning of our extraction we were able to achieve more parallelism out of the source. We saw extraction rates go from 1,500-3,000 rows per second to 8,000-30,000 rows per second depending on the width of the table. That's a huge improvement: we can get tens of millions of rows out in minutes. Less code, fast, strategic for the future, good stuff!
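
"Clever partitioning" here just means telling Spark's JDBC reader how to split the source query so several executors pull at once (a sketch using Spark's standard JDBC partitioning options; the url, table, column, and bounds are placeholders, and spark is the usual SparkSession entry point, sqlContext on the 1.x line works the same way):

// Sketch: split the extract on a numeric key so numPartitions source queries run in parallel.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost/somedb")   // placeholder source
  .option("dbtable", "some_wide_view")
  .option("partitionColumn", "row_id")                // numeric column to split on
  .option("lowerBound", "1")
  .option("upperBound", "10000000")
  .option("numPartitions", "20")
  .load()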

Now the bad…

Everything was going swimmingly, until we started testing the larger datasets and we’d see the sqlite creation part stalling in Spark.  No errors, the job would just hang forever and have to be killed.  Sometimes they would work, but usually not.  We got stuck.  Enter the tuning and runtime learning process…

When you submit a job to YARN, you start learning about executors and cores. I found the vernacular here confusing, because an executor is a process and a core isn't a CPU, it's a task slot, basically a thread. So when you tell spark-submit you want --executor-cores 5 and --num-executors 4, you are requesting 4 JVM processes that will each run roughly 5 concurrent tasks (20 in total). From my familiar Storm world this translates to workers (executors) and tasks (cores). There is also a driver process that is separate from the executors; more on that later…

I read everything I could about that and tried experimenting with the number of each and couldn’t move the dial on my problem.  The size of the cluster in terms of nodes or CPUs or executors or cores didn’t make any difference, the problem still occurred.

It felt like a memory problem, maybe the system was tied up spinning in GC hell. The Spark UI reports some of this kind of thing, but nothing glaring was showing up there or in various metrics. Regardless, I started experimenting with the --executor-memory setting. But again, that didn't move the ball.

So let’s talk about dataframes and storage for a minute…

When you pull data into Spark into a dataframe or RDD, you are basically pulling it into memory across your cluster with pieces of the dataframe spread out among your executors.  So your data is not all on one machine.  When you ask Spark to filter or group or otherwise transform your dataframe, Spark will spawn this work out to the various executors so they can all work on the problem on the piece of the dataset they own.

When you run Spark in cluster mode you are relying on HDFS storage, if you use EMR in AWS like we do, you can make this end up in S3.  But regardless, in a cluster environment where you are writing files, you have to have some type of shared network storage solution.  Now come back to SQLite.  SQLite drivers don’t understand HDFS, you can’t directly talk to it, you need to write to local storage.

So the situation here is that you need to bring all your data back to one node in order to write this local file. No problem: Spark will let you do a collect() or a toLocalIterator() that brings all the data back to the driver. This is the piece that would stall, the work that happened after the collect(). Well, here is where my stupidity came into play. It turns out all my fudging with the executor memory was doing nothing about the driver memory, and the default driver memory is only about 500MB. That's when I realized, omg, I'm an idiot. There is another spark-submit option for the driver memory, which for whatever reason doesn't seem to end up in the examples or Stack Overflow answers on this subject (hence this blog…). Once I set this to something large enough to handle the SQLite data, my problem went away; see the full example that follows.


spark-submit \
--class com.mycompany.Job \
--deploy-mode cluster \
--master yarn \
--conf spark.yarn.submit.waitAppCompletion=false \
--driver-memory 4g \
--num-executors 4 \
--executor-memory 2g \
--executor-cores 5 \
s3://mycompany/artifact.jar
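
And to make it concrete why it's the driver that needs the memory, the stalling step looked roughly like this (a sketch only: df is the extracted dataframe, and the table, schema, and paths are placeholders; the point is that everything after the collect() runs in the single driver JVM, so the whole result has to fit in spark.driver.memory):

// Sketch: collect() pulls every partition back to the driver process.
val rows = df.collect()

val conn = java.sql.DriverManager.getConnection("jdbc:sqlite:/tmp/view.db")
val insert = conn.prepareStatement("insert into view_data values (?, ?)")  // placeholder schema
rows.foreach { row =>
  insert.setString(1, row.getString(0))
  insert.setString(2, row.getString(1))
  insert.addBatch()
}
insert.executeBatch()
conn.close()
// ...then ship /tmp/view.db up to S3 for the API layer to pick up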


So now I'm in the happy zone: my hundreds of views are happily extracting, and I can get onto the next step, which is incremental pulls from the source. Then I will go from 10 million to 100 million rows and beyond. I'll let you know how that part works out, it's only in my head right now…

Scala scripting?!

So I have a lot of little data fixing up scripts I have to write to maintain our large and growing elasticsearch database.  Typically, I use shell stack (bash/jq/curl) to do such tasks.  When things get a little bigger I switch to python.  I’m a big fan of jq/curl but for anything that isn’t small, it gets nasty.  I’m a big fan of python too, but I’m still a baby using it, so when things get bigger or more complicated, I’m not very efficient.

Anyway, a lot of these tasks end up doing scroll/scan or search in ES, and then feed data back in, or to message queues, or to various endpoints to do work.  They are often long lived.  Code them up, throw them in rundeck, and let them run for hours or days.

One frustration is that these scripting options don't queue up the work fast enough; my Storm jobs, which do the actual processing, go way faster than the input to them. I know I could learn how to do async in Python, or try to shard things up in the shell stack and use 'parallel', or find some other solution. But since I already have a lot of business logic in my runtime code in Scala, it would be nice to just use that, without the headache of builds and deployments: something faster and lighter weight that I can still just dump into Rundeck. I know how to control async, threads, and concurrency in this environment, and I know I'm not limited by the toolset.

I looked into this once before but gave up at the dependency part of it.  Then I discovered this blog.

Basically, using a hacked-up version of sbt, you can write a single-file script in Scala that will download dependencies, compile, and launch, with no fuss. I'll show you how I got it hooked up; I mostly followed that blog, plus a few other things I found helpful. Here are the steps I followed to get it set up on my Rundeck Linux box:

Install conscript

curl https://raw.githubusercontent.com/foundweekends/conscript/master/setup.sh | sh

Install the sbt runner:

cs sbt/sbt --branch 0.13

Put scalas in your path:

export CONSCRIPT_HOME="$HOME/.conscript"
export CONSCRIPT_OPTS="-Dfile.encoding=UTF-8"
export PATH=$CONSCRIPT_HOME/bin:$PATH

Create a script:


#!/usr/bin/env scalas
/***
scalaVersion := "2.11.7"
*/
println("hello")

$ chmod +x script.scala 
$ ./script.scala
hello

Ok, so now you can add in the location of your artifacts and dependencies inside the /*** comment, like this:

/***
scalaVersion := "2.11.7"
 
resolvers += Resolver.url("typesafe-ivy-repo", url("http://repo.typesafe.com/typesafe/releases"))(Resolver.ivyStylePatterns)

resolvers += "Your Repo" at "http://artifactory.yourplace.com/artifactory/repo"

resolvers += Resolver.mavenLocal

libraryDependencies += "org.scala-sbt" % "io" % "0.13.11"

libraryDependencies += "ch.qos.logback" %  "logback-classic" % "1.1.7"

libraryDependencies += "ch.qos.logback" %  "logback-core" % "1.1.7"

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"

libraryDependencies += "com.trax.platform" % "trax-elasticsearch-loader"  % "1.3.43"

libraryDependencies += "com.trax.platform" % "trax-platform-utils" % "1.3.7"
*/

When you run it, it will automatically download dependencies, compile, and run your script.  The internet downloads to here:  $CONSCRIPT_HOME/boot

You can also bring in a logger and control it programmatically. It starts to look a lot like the Python script, at least in regard to how simple it is to configure logging. But with access to all your fancy Scala tools. When was the last time you could do tail recursion in a script and not worry about a stack overflow? @tailrec to the rescue!


#!/usr/bin/env scalas
/***
scalaVersion := "2.11.7"
resolvers += Resolver.url("typesafe-ivy-repo", url("http://repo.typesafe.com/typesafe/releases"))(Resolver.ivyStylePatterns)
resolvers += "Your Artifactory" at "http://yourstuff.com/artifactory/repo"
resolvers += "mandubian maven bintray" at "http://dl.bintray.com/mandubian/maven"
resolvers += Resolver.mavenLocal
libraryDependencies += "org.scala-sbt" % "io" % "0.13.11"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.1.7"
libraryDependencies += "ch.qos.logback" % "logback-core" % "1.1.7"
libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "com.trax.platform" % "trax-elasticsearch-loader" % "1.3.43"
libraryDependencies += "com.trax.platform" % "trax-platform-utils" % "1.3.7"
*/
import com.sksamuel.elastic4s.ElasticDsl._
import com.sksamuel.elastic4s.{ElasticClient, SearchType}
import com.trax.common.log.LogbackUtils
import com.trax.elasticsearch.EsClient
import com.trax.platform.util.scala.RestUtils
import com.typesafe.scalalogging.{Logger, LazyLogging}
import org.json4s.JsonDSL._
import org.json4s.jackson.JsonMethods._
import org.slf4j.LoggerFactory
import scala.annotation.tailrec
import scala.collection.JavaConversions._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
LogbackUtils.setLogLevel(null, "INFO")
val logger = Logger(LoggerFactory.getLogger("log"))
@tailrec
def next(client: ElasticClient, scrollId: String, total: Int): Int = {
  val resp = client.execute { search scroll scrollId keepAlive "1m" }.await
  if (resp.getHits.getHits.isEmpty) total
  else {
    val futures = resp.getHits.getHits.map { hit =>
      val msg = compact(render(("env_id" -> hit.field("Environment.EnvId").getValues.head.toString) ~ ("action" -> "index") ~ ("type" -> "fb") ~ ("id" -> hit.id)))
      logger.debug(msg)
      Future {
        RestUtils.postRestContent("http://someserver/sendmessage/invoice_search", msg, "application/json")
      }
    }.toSeq
    Await.result(Future.sequence(futures), 1.hour)
    next(client, resp.getScrollId, total + futures.size)
  }
}

val client = EsClient.getInstance("somecluster",9300,"cluster_name").client
val resp = client.execute {
  search in "fps-fbs*" / "fb" size(1000) fields("FbNorm.Businessflow","Environment.EnvId") query wildcardQuery("FbNorm.Businessflow","*unknown") searchType SearchType.Scan scroll "1m"
}.await
logger.info(s"${resp.getHits.getTotalHits} hits")
val docs = next(client, resp.getScrollId, 0)
logger.info(s"fixed ${docs} docs")

Final Thoughts

Scripting with Scala is not good for everything, or everyone. The compile time stinks when you have a small task, but if it's going to run for hours or days, and run considerably faster because you can do it concurrently, the few extra seconds to compile are worth it.

Also if you have business logic wrapped up in the java ecosystem already, you may find it an easy way to unlock some stuff quickly without having to put things behind a Rest interface or message queue, or what have you.

It also might be an easy way to explore Scala for some real tasks.  So if you are curious and want to dabble around with it a bit, without having to bet the farm on some new tech that nobody knows or is willing to invest in heavily, give it a go.

LogbackUtils

Here's the body of the little log util I use for controlling Logback without any config files. It's something I found on Stack Overflow and it's good enough for the command line and simple scripts. Don't get me started on how logging sucks in the Java ecosystem; many a day wasted attempting to do things that should be easy…


import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
/**
* Contains methods to access and manipulate logback framework dynamically at run-time. Here 'dynamically' means without
* referencing the logback JAR, but using it if found in the classpath.
*
* @author Muhammed Demirbas
* @since 19 Mar 2013
*/
public final class LogbackUtils
{
public static final String LOGBACK_CLASSIC = "ch.qos.logback.classic";
public static final String LOGBACK_CLASSIC_LOGGER = "ch.qos.logback.classic.Logger";
public static final String LOGBACK_CLASSIC_LEVEL = "ch.qos.logback.classic.Level";
private static final Logger logger = LoggerFactory.getLogger(LogbackUtils.class);
private LogbackUtils()
{
// Prevent instance creation
}
/**
* Dynamically sets the logback log level for the given class to the specified level.
*
* @param loggerName Name of the logger to set its log level. If blank, root logger will be used.
* @param logLevel One of the supported log levels: TRACE, DEBUG, INFO, WARN, ERROR, FATAL,
* OFF. {@code null} value is considered as 'OFF'.
*/
public static boolean setLogLevel(String loggerName, String logLevel)
{
String logLevelUpper = (logLevel == null) ? "OFF" : logLevel.toUpperCase();
try
{
Package logbackPackage = Package.getPackage(LOGBACK_CLASSIC);
if (logbackPackage == null)
{
logger.info("Logback is not in the classpath!");
return false;
}
// Use ROOT logger if given logger name is blank.
if ((loggerName == null) || loggerName.trim().isEmpty())
{
loggerName = (String) getFieldVaulue(LOGBACK_CLASSIC_LOGGER, "ROOT_LOGGER_NAME");
}
// Obtain logger by the name
Logger loggerObtained = LoggerFactory.getLogger(loggerName);
if (loggerObtained == null)
{
// I don't know if this case occurs
logger.warn("No logger for the name: {}", loggerName);
return false;
}
Object logLevelObj = getFieldVaulue(LOGBACK_CLASSIC_LEVEL, logLevelUpper);
if (logLevelObj == null)
{
logger.warn("No such log level: {}", logLevelUpper);
return false;
}
Class<?>[] paramTypes = { logLevelObj.getClass() };
Object[] params = { logLevelObj };
Class<?> clz = Class.forName(LOGBACK_CLASSIC_LOGGER);
Method method = clz.getMethod("setLevel", paramTypes);
method.invoke(loggerObtained, params);
logger.debug("Log level set to {} for the logger '{}'", logLevelUpper, loggerName);
return true;
}
catch (Exception e)
{
logger.warn("Couldn't set log level to {} for the logger '{}'", logLevelUpper, loggerName, e);
return false;
}
}
private static Object getFieldVaulue(String fullClassName, String fieldName)
{
try
{
Class<?> clazz = Class.forName(fullClassName);
Field field = clazz.getField(fieldName);
return field.get(null);
}
catch (ClassNotFoundException | IllegalAccessException | IllegalArgumentException | NoSuchFieldException |
SecurityException ignored)
{
return null;
}
}
}

Show me the error or shut up

Options, Try, Futures, oh my! You can go read all the various articles on how the semantics of these are similar. Here's one; I highly recommend the cheat sheet as a reference, it comes in handy. But here are some real-world examples to help as well. I find myself very often needing to handle an error just to print out a log message. So I started out with the normal try/catch as in Java. It's a bunch of brackets and kinda verbose.

Then once you switch to using futures, at some point you’ll end up forgetting to handle a failure, and your root cause will just be gone, you’ll waste a bunch of time, and then start putting onComplete, onFailure, or recover everywhere.  At least that’s what I did, maybe I’m an idiot, and you are smarter.

Well,  that’s just as verbose as the try/catch, even more so when you don’t want to handle one side of the equation.  What you can do, and this is the same for Try or a Future, is use “.failed.map”.  Like this:

Try(someStuffThatMightFail).failed.map(e => println(e.getMessage))

This will invert the Try so that failed returns you a Try[Throwable] (or a Future[Throwable] for a Future), which you can then map over to get your error. getOrElse() and things like that don't work because you want the error, and they won't give it to you. If you have no error, then the map() won't do anything, and you just go on about your business.

So much cleaner. Here are some examples:


import scala.concurrent.Future
import scala.util.{Failure, Success, Try}
import scala.concurrent.ExecutionContext.Implicits.global
def fail = throw new RuntimeException("error now")
def failInFuture() = Future { throw new RuntimeException("error in future") }
try {
  fail
} catch {
  case e: Exception => println(e.getMessage)
}

failInFuture().onComplete {
  case Success(result) =>
  case Failure(t) => println("error reported in onComplete")
}

failInFuture().onFailure {
  case e => println("error reported in onFailure")
}

failInFuture().recover {
  case e => println("error reported in recover")
}
Try (fail).failed.map(e => println(e.getMessage))
failInFuture().failed.map(e => println(e.getMessage))



scala> import scala.concurrent.Future
import scala.concurrent.Future
scala> import scala.util.{Failure, Success, Try}
import scala.util.{Failure, Success, Try}
scala> import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.ExecutionContext.Implicits.global
scala>
scala> def fail = throw new RuntimeException("error now")
fail: Nothing
scala>
scala> def failInFuture() = Future { throw new RuntimeException("error in future") }
failInFuture: ()scala.concurrent.Future[Nothing]
scala>
scala> try {
| fail
| } catch {
| case e: Exception => println(e.getMessage)
| }
error now
scala>
scala> failInFuture().onComplete {
| case Success(result) =>
| case Failure(t) => println("error reported in onComplete")
| }
error reported in onComplete
scala>
scala> failInFuture().onFailure {
| case e => println("error reported in onFailure")
| }
error reported in onFailure
scala>
scala> failInFuture().recover {
| case e => println("error reported in recover")
| }
res24: scala.concurrent.Future[Unit] = scala.concurrent.impl.Promise$DefaultPromise@bd1111a
error reported in recover
scala>
scala>
scala> Try (fail).failed.map(e => println(e.getMessage))
error now
res25: scala.util.Try[Unit] = Success(())
scala> failInFuture().failed.map(e => println(e.getMessage))
error in future
res26: scala.concurrent.Future[Unit] = scala.concurrent.impl.Promise$DefaultPromise@65753040


 

Async stormy weather

 

 

We’ve been using Apache Storm for 6 months or so.  We are at the point where we’ve proved it out, built a bunch of operational and release management practices around it.  It is starting to metastasize around our organization and become something of a gravity well in terms of heavy processing pipelines.

Then I went on vacation and all hell started breaking loose.

Most of our topologies are Kafka initiated. In one major job, I started seeing the queue "get stuck": it would do some work, and then nothing. In this particular topology we are basically copying data from one data source into another, and we have a fairly large message timeout, as some of these copies can take quite a while. What we were seeing was that after we restarted the worker, processing would start up again and we might pop a few messages off the queue, but then after a bit we'd be "stuck" again. No processing, but you could see in the Kafka offsets that there were messages waiting that we weren't picking up.

There was another problem: for one particular piece of data we were extracting, the counts were coming out wrong, we were missing rows when we copied. How could this happen? It should have triggered a failure in our topology! But it didn't.

Well, I spent a bunch of time chasing problems I had already solved, like configurations in the spout or various params in the topology. My spidey sense told me it had something to do with an async callback. Because we are using Scala as our language of choice in Storm, and Storm sorta predates Futures, something about this made me uncomfortable. So I kept digging…

Let me explain the callback in a little more detail…

We are essentially writing rows into Cassandra using the async approach. I have a wrapper that turns the Cassandra Java driver's futures into Scala Futures, which I borrowed from here: https://github.com/eigengo/activator-akka-cassandra/blob/master/src/main/scala/core/cassandra.scala. If you are using Cassandra from Scala and relying on the Java driver natively, I highly recommend this approach: you simply import that, and your Cassandra Java futures turn into Scala Futures automagically, which makes things much more idiomatically Scala and happy.
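
The guts of that wrapper are just a Promise bridging the driver's Guava ListenableFuture into a Scala Future, something like this (a sketch of the idea, assuming the older Datastax 2.x driver and the two-argument Guava Futures.addCallback it shipped with; the real wrapper linked above is the one we actually use):

import com.datastax.driver.core.{ResultSet, ResultSetFuture}
import com.google.common.util.concurrent.{FutureCallback, Futures}
import scala.concurrent.{Future, Promise}
import scala.language.implicitConversions

object CassandraFutures {
  // Sketch: complete a Promise from the Java callback, hand back the Scala Future.
  implicit def resultSetFutureToScala(rsf: ResultSetFuture): Future[ResultSet] = {
    val p = Promise[ResultSet]()
    Futures.addCallback(rsf, new FutureCallback[ResultSet] {
      def onSuccess(rs: ResultSet): Unit = p.success(rs)
      def onFailure(t: Throwable): Unit = p.failure(t)
    })
    p.future
  }
}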

So from that, when all our inserts complete we end up in an onComplete block like this:


insertRows(data).onComplete {
  case Failure(e) =>
    collector.fail(tuple)
  case Success(rs) =>
    collector.ack(tuple)
}


Here lies the problem. It turns out that the Storm OutputCollector is not thread safe as of 0.8. You can read about that in their forums; I found it from this article from Spotify.

This was the cause of both of my problems. Acks were not always getting back because of this, so the topology never finished, hence getting "stuck". I could see in the UI that this was the case. Also, this is why the failures that were causing me to lose data never materialized.

So what's the solution? Don't use async? Nah, that would suck…

I followed Spotify's approach of storing the completions in a LinkedBlockingQueue. Then in my execute method I flush them. This way, more data (or ticks) coming into your topology will trigger the flushes once you are back on the right thread, and everything works. I just wrote a quick case class to store the data, and then a flush, like this:


case class Completion(tuple: Tuple, exOpt: Option[Throwable] = None)

class SomeBolt extends BaseRichBolt {
  var completions = new LinkedBlockingQueue[Completion]()

  def flushCompletions = {
    if (completions.size() > 0) {
      val completionArr = new util.ArrayList[Completion]()
      completions.drainTo(completionArr)
      logger.debug("completing " + completionArr.size() + " tuples")
      completionArr.foreach { completion =>
        if (completion.exOpt.isEmpty) {
          collector.ack(completion.tuple)
        } else {
          collector.reportError(completion.exOpt.get)
          collector.fail(completion.tuple)
        }
      }
    }
  }

  override def execute(tuple: Tuple): Unit = {
    flushCompletions
  }
}


Now everything is awesome. Hope this helps someone else. I was about ready to throw in the towel on Storm, or update my resume and move on… which is about the time I typically break through 🙂

 

SQL slow query log

Here’s how I made my own slow query log with a Kibana dashboard.  See ya Appdynamics and Solarwinds.

First of all, if you are using SQL in Scala, take a look at Scalikejdbc. I've been around the block with various ORM solutions and I always come back to wanting something closer to SQL: it's easier to tune, and easier to communicate about with DBAs and other parts of your organization. SQL is the interface to relational databases, IMHO; arbitrary DSLs invented simply for syntax sugar are usually easy to start and hard to finish. Scalikejdbc is a great blend of SQL with efficiency tools that make binding and result set manipulation easy and idiomatic.
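
If you haven't seen the interpolation style, it's about as close to raw SQL as it gets while still giving you parameter binding and easy result set mapping (a tiny sketch; the table and columns are made up):

import scalikejdbc._

case class ViewMeta(name: String, rowCount: Long)

// Sketch: $minRows is bound as a prepared-statement parameter, not spliced into the SQL text.
def bigViews(minRows: Long)(implicit session: DBSession): List[ViewMeta] =
  sql"select name, row_count from view_catalog where row_count > $minRows"
    .map(rs => ViewMeta(rs.string("name"), rs.long("row_count")))
    .list
    .apply()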

So the first part of the puzzle is leveraging the built in mechanism that lets you turn on sql logging.  See the docs for more help.


import scalikejdbc._

GlobalSettings.loggingSQLAndTime = LoggingSQLAndTimeSettings(
  enabled = true,
  singleLineMode = false,
  printUnprocessedStackTrace = false,
  stackTraceDepth = 15,
  logLevel = 'debug,
  warningEnabled = false,
  warningThresholdMillis = 3000L,
  warningLogLevel = 'warn
)


 

Now you can create a function to be called when your sql query completes using GlobalSettings.taggedQueryCompletionListener.  You’ll notice I’m using the mapped diagnostic context, you can read more about that here.  I’ll explain how I get things into that in a bit, but it’s basically other contextual info that I want to log, like which database, driver, etc.


import org.slf4j.MDC
import scalikejdbc._

GlobalSettings.taggedQueryCompletionListener = (sql: String, params: Seq[Any], millis: Long, tags: Seq[String]) => {
  if (millis > warningThresholdMillis) {
    MDC.put("sql", sql)
    MDC.put("params", params.mkString("[", ",", "]"))
    MDC.put("millis", millis.toString)
    MDC.put("error", null)
    setExtraProps(tags)
    logger.warn(s"completion")
  }
}

def setExtraProps(tags: Seq[String]) {
  MDC.put("app", appName)
  if (tags.size > 0)
    MDC.put("server", tags(0))
  else
    MDC.put("server", null)
  if (tags.size > 1)
    MDC.put("database", tags(1))
  else
    MDC.put("database", null)
  if (tags.size > 2)
    MDC.put("url", tags(2))
  else
    MDC.put("url", null)
  if (tags.size > 3)
    MDC.put("user", tags(3))
  else
    MDC.put("user", null)
  if (tags.size > 4)
    MDC.put("driver", tags(4))
  else
    MDC.put("driver", null)
}


 

Ok next up.  This is how I shoved the contextual info into the tags so that I could pull it out in the log.  Most of my queries look something like this:


NamedDB(dbName) readOnly {
  implicit session => {
    sql"""select column1 from table1""".map(rs => rs.string(1)).list.apply()
  }
}


I have a number of different databases I connect to, and I use NamedDB to look them up and get a session. I noticed that the DBSession object in Scalikejdbc has tags. You can also put them on the queries, but I didn't want to have to do that to all my existing code, and I'm pretty sure I'd forget to do it on the exact query that was causing all my pain; I needed a better way. So I created a new NamedDB-style case class that would do it for me:


case class TaggedDB(name: Any)(implicit context: ConnectionPoolContext = NoConnectionPoolContext) extends DBConnection with LazyLogging {

  private[this] def connectionPool(): ConnectionPool = Option(context match {
    case NoConnectionPoolContext => ConnectionPool(name)
    case _: MultipleConnectionPoolContext => context.get(name)
    case _ => throw new IllegalStateException(ErrorMessage.UNKNOWN_CONNECTION_POOL_CONTEXT)
  }) getOrElse {
    throw new IllegalStateException(ErrorMessage.CONNECTION_POOL_IS_NOT_YET_INITIALIZED)
  }

  override def connectionAttributes: DBConnectionAttributes = {
    connectionPool().connectionAttributes
  }

  private lazy val db: DB = {
    val cp = connectionPool()
    DB(cp.borrow(), connectionAttributes)
  }

  def toDB(): DB = db

  def conn: Connection = db.conn

  def tags(name: Any, conn: Connection): Seq[String] = {
    if (!name.isInstanceOf[String]) return Seq.empty[String]
    val nameStr = name.asInstanceOf[String]
    val pieces = nameStr.split("/")
    val (server, dbName) = if (pieces.size > 1) (pieces(0), pieces(1)) else (nameStr, "???")
    Seq(
      server,
      dbName,
      conn.getMetaData.getURL,
      conn.getMetaData.getUserName,
      conn.getMetaData.getDriverName)
  }

  override def readOnlySession(): DBSession = {
    val session = DBSession(conn, isReadOnly = true, connectionAttributes = connectionAttributes)
    session.tags(tags(name, session.connection):_*)
  }

  override def autoCommitSession(): DBSession = {
    val session = DBSession(conn, isReadOnly = true, connectionAttributes = connectionAttributes)
    session.tags(tags(name, session.connection):_*)
  }
}


 

 

I still had to update my existing code, but simple search and replace of NamedDB to TaggedDB, done!

I had to do some hacking since Scala doesn't like you extending case classes, and it's not fully implemented, but you get the idea: you can put tags into the session. My dbName looks like "server/databaseName", so I split that out and then grab some other stuff from the connection that I'm interested in logging. You could do more here if you need it.
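
Usage stays exactly the same as the NamedDB example above, which was the whole point (a sketch; it assumes a pool named "server1/reporting" has already been registered with ConnectionPool):

// Sketch: drop-in replacement for NamedDB, the tags ride along on the session.
TaggedDB("server1/reporting") readOnly { implicit session =>
  sql"select column1 from table1".map(rs => rs.string(1)).list.apply()
}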

Ok, so now that's done. Let's talk about how I got my logs to go into a special index in Elasticsearch. I decided to bypass Logstash once I found this logback Elasticsearch appender. Basically you just wire it into your logback config and you are done. It seems performant, it makes its calls in a future, etc, etc.

The trick to the configuration is the logger at the top of this post; that's the one you want to connect up. This way all your existing logs go where they already go, but this slow query log stuff gets sent to Elasticsearch.


<configuration>
<appender name="ELASTIC" class="com.internetitem.logback.elasticsearch.ElasticsearchAppender">
<url>http://elasticsearch.yoursite.com/_bulk</url>
<index>platform-logs-%date{yyyy-MM-dd}</index>
<type>logback</type>
<loggerName>es-logger</loggerName> <!-- optional -->
<errorLoggerName>es-error-logger</errorLoggerName> <!-- optional -->
<connectTimeout>30000</connectTimeout> <!-- optional (in ms, default 30000) -->
<errorsToStdErr>true</errorsToStdErr> <!-- optional (default false) -->
<includeCallerData>false</includeCallerData> <!-- optional (default false) -->
<logsToStdErr>false</logsToStdErr> <!-- optional (default false) -->
<maxQueueSize>104857600</maxQueueSize> <!-- optional (default 104857600) -->
<maxRetries>3</maxRetries> <!-- optional (default 3) -->
<readTimeout>30000</readTimeout> <!-- optional (in ms, default 30000) -->
<sleepTime>250</sleepTime> <!-- optional (in ms, default 250) -->
<properties>
<property>
<name>driver</name>
<value>%X{driver}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>user</name>
<value>%X{user}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>url</name>
<value>%X{url}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>app</name>
<value>%X{app}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>sql</name>
<value>%X{sql}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>params</name>
<value>%X{params}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>error</name>
<value>%X{error}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>server</name>
<value>%X{server}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>database</name>
<value>%X{database}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>millis</name>
<value>%X{millis}</value>
<allowEmpty>true</allowEmpty>
</property>
<property>
<name>host</name>
<value>${HOSTNAME}</value>
<allowEmpty>false</allowEmpty>
</property>
<property>
<name>severity</name>
<value>%level</value>
</property>
<property>
<name>thread</name>
<value>%thread</value>
</property>
<property>
<name>stacktrace</name>
<value>%ex</value>
</property>
<property>
<name>logger</name>
<value>%logger</value>
</property>
</properties>
</appender>
<appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
<!-- encoders are assigned the type
ch.qos.logback.classic.encoder.PatternLayoutEncoder by default -->
<encoder>
<pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger - %msg%n</pattern>
</encoder>
</appender>
<root level="info">
<appender-ref ref="STDOUT"/>
</root>
<logger name="es-logger" level="error"/>
<logger name="com.platform.sql" level="error">
<appender-ref ref="STDOUT" />
</logger>
<logger name="com.platform.sql" level="warn">
<appender-ref ref="ELASTIC" />
</logger>
</configuration>


 

Every property that you put in as a "tag" you need to wire up in this logback config to get it to output into the ES JSON. This seems a bit verbose, and maybe now I'm seeing why the JVM has 40k loggers: people just simply get these APIs wrong. But in any event, there is a way.

That's it. Now you'll end up with a new index in ES containing your slow query data, including the SQL, the parameters, how long the query took, and all the contextual info you need to root cause your issues. Wire it up in a Kibana dashboard and make pretty pictures, hook it up to alerts, etc, etc. And it didn't cost you a million dollars for some root cause detection tool.

I’d show you some data but then I’d have to kill you, so you are going to just use your imagination.  If you’ve read the blog this far, I know you are a smart person and have a bit of an imagination.

Latest Update 1-14-2016

Update: I found the reliance on Logback problematic when using this in other places that rely on different logging implementations; Apache Storm, for example, needs Log4j2. So rather than get into the business of supporting every single Java logging impl (why are there so many!), I decided to just take the writing to ES into my own hands.

The ES Java API is good for some things, but in this case I didn't want to bring in a bunch of dependencies. So I borrowed this guy's thin HTTP client to Elasticsearch for Scala, https://github.com/gphat/wabisabi/blob/master/src/main/scala/wabisabi/Client.scala, and then queued stuff up and flushed it in batches (in a Future, of course). This gets hammered in some pretty intense workflows and works great (millions of log entries per day). That assumes you've done the work to make your ES cluster perform well, which can be a real pain depending on what you are doing; a story for another time…


val slowQueryESclient = new Client("http://localhost:80")
var queue = new LinkedBlockingQueue[Map[String, Any]]()

def getExtraProps(tags: Seq[String]): Map[String, Any] = {
  var extraProps = Map[String, Any]()
  extraProps = extraProps + ("app" -> appName)
  if (tags.size > 0)
    extraProps = extraProps + ("server" -> tags(0))
  if (tags.size > 1)
    extraProps = extraProps + ("database" -> tags(1))
  if (tags.size > 2)
    extraProps = extraProps + ("url" -> tags(2))
  if (tags.size > 3)
    extraProps = extraProps + ("user" -> tags(3))
  if (tags.size > 4)
    extraProps = extraProps + ("driver" -> tags(4))
  extraProps
}

GlobalSettings.taggedQueryCompletionListener = (sql: String, params: Seq[Any], millis: Long, tags: Seq[String]) => {
  if (millis > warningThresholdMillis) {
    val data = Map(
      "stmt" -> sql,
      "message" -> "completion",
      "timestamp" -> DateTimeUtils.currentIsoString,
      "params" -> params.mkString("[", ",", "]"),
      "millis" -> millis)
    slowQueryLog(data ++ getExtraProps(tags))
  }
}

GlobalSettings.taggedQueryFailureListener = (sql: String, params: Seq[Any], e: Throwable, tags: Seq[String]) => {
  val data = Map(
    "stmt" -> sql,
    "message" -> "failure",
    "timestamp" -> DateTimeUtils.currentIsoString,
    "params" -> params.mkString("[", ",", "]"),
    "error" -> e.getMessage)
  slowQueryLog(data ++ getExtraProps(tags))
}

def flushSlowQueryLog = {
  Future {
    implicit val formats = org.json4s.DefaultFormats
    if (queue.size > 0) {
      val dtf = DateTimeFormat.forPattern("yyyy-MM")
      val indexName = "platform-logs-" + dtf.print(DateTimeUtils.nowUtc)
      val reqs = new util.ArrayList[Map[String, Any]]()
      queue.drainTo(reqs)
      val payload = new StringBuffer()
      reqs.foreach { data =>
        payload.append( s"""{ "index" : { "_index":"$indexName", "_type":"$SLOW_QUERY_DOCTYPE"}}""" + "\n")
        payload.append(Serialization.write(data) + "\n")
      }
      payload.append("\n")
      logger.debug("flushing slow query log to elasticsearch:\n" + payload.toString)
      slowQueryESclient.bulk(Option(indexName), Option(SLOW_QUERY_DOCTYPE), payload.toString).onComplete {
        case Failure(e) => logger.error(e.getMessage)
        case Success(r) => logger.debug("ES response code:" + r.getStatusCode.toString)
      }
    }
  }
}

def slowQueryLog(data: Map[String, Any]) = {
  queue.put(data)
  if (queue.size() >= SLOW_QUERY_BATCH_SIZE) flushSlowQueryLog
}


 

Typing Utopia in Elasticsearch

So if you are new to Elasticsearch, you may naively think it's schemaless. You'll start jamming stuff in all willy-nilly, and then eventually, either at index time or during query time, you'll start running into some nasties: NumberFormatException, MapperParsingException, or SearchParseException.

You'll trace these to mapping issues, which sometimes you can fix, if you reindex all your gobs of data. Yuck! If you don't know the types of problems I'm talking about, read this; that guy goes into some good examples that are representative of the types of things we've run into as well. The skinny is that Elasticsearch isn't schemaless at all, quite the opposite. It tries to infer your types for you implicitly, which is convenient, but when it gets it wrong, you'll find that the only way to fix it is to be explicit, which doesn't sound like schemaless at all, does it?

A great example of when this can go wrong is with a float or double. If the first value you send in a doc without a mapping is a zero, Elasticsearch will type that field as an integer. Well, then a value of 1.2 comes along, and guess what: the type for that field was already selected as an integer, so you just lost precision as it's rounded down to a 1 in this next doc.

We tried something similar to the Orchestrate solution, but what we found was that by moving everything into arrays you had to compromise on the aggregations you are able to do.  There are certain types of things that you can’t do with arrays in Elasticsearch.  For our use case this was a bit of a show stopper.

So here is what we did. It's a combination of multifields, dynamic mappings, templates, and ignore_malformed.

You can use multifields to control the different ways you might index a single field. Commonly you'll see this used to store the raw non_analyzed data alongside the analyzed data.


{
  "tweet": {
    "type": "string",
    "analyzer": "english",
    "fields": {
      "raw": {
        "type": "string",
        "index": "not_analyzed"
      }
    }
  }
}


So with a mapping like that you can refer to the field as “tweet” when you want to search using the analyzer, or use “tweet.raw” for aggregations, or sorting, or other times when you don’t want to analyze the field.

Dynamic mappings are ways you can apply mapping rules to fields that haven’t been mapped, based on name or type or other rules.  In a sense it allows you to configure how the implicit type determination happens.

Templates are just a convenient place to store your mapping rules so that they might auto-apply to new indices.  If you are creating a lot of indices this is a huge help, it will help simplify your client code bases as they won’t have to create or alter the mappings for indices anymore.

The ignore_malformed setting is a way to tell Elasticsearch to swallow any errors if the mapping doesn't work, instead of blowing up. This one was the real key to our solution, because it meant you can try to cast everything into every type in a multifield at index time; if a cast doesn't work, that sub-field just won't be there, but nothing blows up.

So putting that all together you get something like this:


{
  "template1": {
    "order": 0,
    "template": "my_indices_*",
    "settings": {
    },
    "mappings": {
      "_default_": {
        "dynamic_templates": [
          {
            "all_strings": {
              "mapping": {
                "index": "analyzed",
                "type": "string",
                "fields": {
                  "raw": {
                    "index": "not_analyzed",
                    "type": "string"
                  },
                  "date": {
                    "index": "not_analyzed",
                    "ignore_malformed": true,
                    "format": "yyyy-MM-dd HH:mm:ss||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd||yyyy-MM-dd||yyyy-MM-dd'T'HH:mm:ssZ",
                    "type": "date"
                  },
                  "double": {
                    "index": "not_analyzed",
                    "ignore_malformed": true,
                    "type": "double"
                  },
                  "long": {
                    "index": "not_analyzed",
                    "ignore_malformed": true,
                    "type": "long"
                  }
                }
              },
              "match_mapping_type": "string",
              "match": "*"
            }
          }
        ]
      }
    },
    "aliases": {
    }
  }
}


So what this means is that you can essentially autocast any field to its raw value, as a long, double, or date (add other types if you need them) in any query, filter, or aggregation.

tweet
tweet.long
tweet.double
tweet.date
tweet.raw

Why is this useful? Well, aside from dealing with all the various types of mapping errors you will run into on gobs of data, it's really helpful in analysis.

One of our use cases for Elasticsearch is doing early adhoc data analysis, to figure out what we have before we know what we have. We don't know what types we might be getting, and in fact we are often looking for patterns of reliability or data quality in the data, so being able to quickly determine when we can and can't cast into specific types is very informative. Using aggregations on these typed multifields allows us to do that. This one mapping will do this for any deeply nested docs and still allows you to run complicated aggs and filters on your nested data.

Reindexing a field in Elasticsearch with curl and jq

So now that I've been converted over to functional programming thanks to Scala, I've had the profound realization (profound to me, probably obvious to most) that Unix has been doing functional stuff with piping since long before it was cool.

I've been working a lot with Elasticsearch over the past few years, and curl is a fantastic tool for interacting with REST endpoints. But the problem is most endpoints these days use JSON. So in the past, if I had to manipulate the JSON in any way, I'd run to a real programming language and use a client API. It turns out there's this great Unix tool called jq that has virtually everything you need to manipulate JSON and make it easy to pipe it forward to other endpoints or whatever you are doing. It's functional in nature and really concise. It has map, reduce, and a ton of other built-in functions, plus a compact syntax for filtering JSON and selecting elements. It's easy to work with arrays and objects and transform them, just about as easy as it is in Javascript, maybe even easier in some cases.

The first thing I used it for was adding a timestamp to the JSON returned from one Rest call, so I could feed it into Elasticsearch:


curl -s http://localhost:9999/admin/offsets/fps_event/search-centralconfig_prod | \
jq --arg timestamp `date -u --iso-8601=seconds` '.timestamp = $timestamp' | \
curl -s -X PUT -H "Content-Type: application/json" -d @- http://localhost:9200/kafka-offset-log/offset/`date +"%s"`


What’s going on here?

I wrote an endpoint that returns back some information from Kafka, but I wanted this data in Elasticsearch so I could make some cool graphs in Kibana. But there wasn't a timestamp in my original design, and I really need that in Elasticsearch to do my time-based reports.

You can see that you can feed it name/value pairs via --arg, so I used the Unix date command to format the date I needed. Then it's a matter of just '.timestamp = $timestamp' to add a new field. The dot is essentially a reference to the root of the JSON object. Did that just reinvent half of Logstash in one line? Don't tell them, they might get mad.

The next day, feeling pretty good about jq's coolness, I ran into another problem where I needed to reindex some data in certain documents in Elasticsearch. Once you have gobs of data in Elasticsearch you need to be smart about updates and reindexing, as things can take a long time. In this case we needed to modify an existing field and add a new multifield to it in order to improve some searches we were doing. So first I used curl to close the index, adjusted the settings on the index to add my new analyzer, then reopened the index and added the new mapping. That's basic stuff, all easily done with curl or whatever client API you like.

But now I want to update the indices that had the problem, but only update this one field.  Our docs are pretty big, we don’t want to waste time reading and writing the whole thing.

Here’s the whole thing before I break it down:


updates="1"
while [ $updates -gt 0 ]
do
updates=`curl -s http://localhost:9200/fps-fbs_c94c36b8-e425-4bb8-b291-4b34f96d74ca*/_search -d '{
"size" : 500,
"_source" : ["FbKey"],
"query" : {
"filtered" :{
"query" : {
"multi_match" : {
"fields" : ["FbKey"],
"query" : "0",
"type" : "phrase_prefix"
}
},
"filter" : {
"missing" : {
"field" : "FbKey.no_pad"
}
}
}
}
}' | \
jq -c '.hits.hits[] | { "update" : {"_index" : ._index, "_id" :._id, "_type" : ._type}}, { "doc" : {"FbKey": ._source.FbKey}}' | \
curl -s -XPOST –data-binary @- http://localhost:9200/_bulk | \
jq '.items | length'`
echo "updated $updates records"
done


So first we have the query to find all the affected docs by looking for docs that were missing the new field I added and have the unique data issue (a bunch of padded zeros) that we want to index better.  If you don’t know what Elasticsearch results look like this will give you an idea:


{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 354173,
    "max_score": 1,
    "hits": [
      {
        "_index": "kafka-log",
        "_type": "kafka-event",
        "_id": "d7d19ddc418415d5b59efeb857c4339c3980b171870435abec443772c85921ac",
        "_score": 1,
        "_source": {
        }
      }
    ]
  }
}


Then jq comes in.

jq -c '.hits.hits[] | { "update" : {"_index" : ._index, "_id" :._id, "_type" : ._type}}, { "doc" : {"FbKey": ._source.FbKey}}'

That one line builds up the bulk update request, in this case 500 updates at a time. So let's look at what is going on here. We grab the results from the search request as an array:

.hits.hits[]

Then we pipe that (all within jq) to an expression that builds up the bulk request. Expressions like "._index" select bits of data out of each hits.hits[] object as we pipe through it.

{ "doc" : {"FbKey": ._source.FbKey}}

That part is where we update the affected field, so that Elasticsearch will reindex with our new analyzer and mappings configured.

We then feed all this jq business to the next curl cmd to send it into Elasticsearch as a bulk request.  At the end “.items | length” looks at the result from the bulk cmd and tells us how many records we modified.

So we just keep running this until we run out of docs to update, meaning we’ve fixed everything.

Pretty sweet way to manage your data in elasticsearch with just curl and jq!  Now go get all Unix’y with your next Rest/JSON administrative task.

Maven Release Plugin Horrors with git

If you have spent any time trying to use the Maven release plugin, you probably hate it. It wants to run your build twice, and typically you have to wait 20 minutes to hours, depending on the size of your project, to figure out that it barfed, then try something else, and barf again. At some point you either dig in really hard or start looking for other solutions.

As is just about typical with every major block I have, just when I’m about to kill myself, my dog, or anyone that comes near me, I figure things out.

So my story is this: we had the Maven release plugin working great locally, but doing releases from laptops is not ideal, and of course on the weekend I needed to do a release and the VPN was painful to run the test suite over. I decided it was about time to get this thing working in Jenkins. How hard could it be? Throw it up there, wire in the Maven goals, bada bing, right?

Not so much.

Ran the release goals, everything is awesome, then it goes to push the thing back into git:


[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.5.2:prepare (default-cli)
on project trax-platform: An error is occurred in the checkin process: Exception while executing SCM command.
Detecting the current branch failed: fatal: ref HEAD is not a symbolic ref -> [Help 1]


I did some googling and found out that Jenkins checks out git code into a detached HEAD, so you need to deal with that. After a few failed attempts doing things that seemed much more complicated than they needed to be, I landed on this:

http://stackoverflow.com/questions/11511390/jenkins-git-plugin-detached-head

Basically you need to just pick the "check out to specific local branch" option and set your branch. Grab a beer, wait another 25 minutes, and then bam!:


[ERROR] Failed to execute goal org.apache.maven.plugins:maven-release-plugin:2.5.2:prepare (default-cli) on project trax-platform: Unable to commit files
[ERROR] Provider message:
[ERROR] The git-push command failed.
[ERROR] Command output:
[ERROR] fatal: could not read Username for 'https://github.com': No such device or address
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.


Well, this one got me stuck forever. Convinced it was some credential problem in Jenkins, I tried all sorts of things, messing around with the git plugin, credentials, you name it, and nothing helped. Started pondering "this is why people hate Java"…

Then finally I was sitting there staring at that url,

could not read Username for 'https://github.com'

Thinking about where I had seen that before somewhere else. Oh yeah, it's in the Maven pom.xml:


<scm>
<developerConnection>scm:git:https://github.com/TraxTechnologies/PlatformComponents/</developerConnection>
<tag>HEAD</tag>
</scm>


I changed the git url to this:


<scm>
<developerConnection>scm:git:git@github.com:TraxTechnologies/PlatformComponents.git</developerConnection>
<tag>HEAD</tag>
</scm>


And hell yeah, it finally worked. It makes sense if you think about how git is handling security: the https protocol isn't going to work here since we are detached from the original git pull of the code. Anyway, it was much pain and suffering, and I couldn't believe Stack Overflow and Google let me down; it took way too much time to figure this out. That is why I'm blogging about it, so some other poor sorry soul won't have to kill their dog over it.

As a side note, I ran into this while looking for a solution:

http://axelfontaine.com/blog/final-nail.html

This guy's approach seems pretty interesting. It eliminates the two builds and seems really simple and straightforward to implement. I was just about to try it out when I got past this. But for sure, when I find some time, or the next time this poops out, I might give that a go.

For the record, I have other builds using the SBT release plugin, and I went really deep on that too. I have battle scars from that one as well; a tale for another day…

What is old is new?

Having dabbled deeply in the Javascript world for a few weeks, and looking at some trends, it seems like either a whole lot of Java devs made their way into the JS world and are trying some of the same patterns again, or certain patterns have a way of emerging on their own. For example, React as a component framework is reminiscent of JSF, only client side, and we hope it sucks less. But the basic idea and promise sound pretty familiar. Or, for example, things like immutable-js look like some ideas were based on the Java Collections API. Then we have FOAM and TypeScript bringing strong typing back. FOAM is interesting, getting all meta on stuff. We've seen that pattern before, except we used to code-gen from the tables, or use some other home-grown declarative approach.

This post isn't a dig on the JavaScript landscape, not at all; the quality and velocity of innovation is pretty remarkable, especially for a core language that really doesn't do that much by itself and was invented in 10 days. I'll write another post on my reflections on this subject; now back to the old and the new…

I remember when I was a young lad pursuing my second undergraduate degree (don't presume I finished the first, because I didn't), I had to write a paper to get accepted into the computer science program. I remember reflecting on how math doesn't change, and that even though you are teaching me antiquated methods, I recognize that the patterns are transferable. Apparently me telling them that their tech was not up to date didn't piss them off enough to exclude me, or maybe I was spot on with that analysis.

Now, after about 18 years in the industry, I think that reflection was spot on. I've watched client/server become request/response, and now single-page apps look a lot more like client/server. It was "do all the rendering on the server" and now it's "do all the rendering on the client". On the backend we had "don't do any processing in the data store (kill those sprocs!)", but then map/reduce comes around and says: wait, do the processing close to the data. Sorry man, we were wrong, that was a good idea after all, we just need to tweak it a bit. Doesn't Docker remind you of the JVM conceptually? It's something like a virtual machine that sits on another machine but doesn't require the whole OS; in other words, I can run several Docker containers on one box, just like I can run several JVM processes, albeit the JVM will run out of memory much sooner! Tech trends come and go, and a lot of them sound the same as before; sometimes they improve on the past, sometimes they make the same mistakes as the past. In general, it always comes back to square one:

  • Pick the right tool for the job.
  • Don't add layers of abstraction where they aren't necessary, or aren't bringing real value.
  • Don't trust whatever framework X is promising without some tangible demonstration. The academic promise, if it's nothing more than that, cannot be trusted, and even when it works, it has limits.
  • Whatever fail-safe pure layer you create, some asshole will find a way to leak it; trust me, I'm often that asshole.
  • Beware of easy to start, hard to finish. The industry is all about time to market, time to market, bring the value. But remember that 80% of the time and effort is spent maintaining and supporting the software. Maintenance and sustainability are crucial. Regression tests are great, but the compiler gives you stuff for free.

So next time you want to poke fun at COBOL, or ok, now it's the legacy Java, because it's 10 or 20 years old, think to yourself: will that transpiler that was just invented yesterday even be around in 2 years? Software that keeps adding value over time is hard. If you work somewhere that has some 5, 10, or 20 year old code, instead of cursing at that shit, maybe stop and realize that it's amazing that stinky crap still works and is still providing value (and maybe your paycheck!). Do you think your shitty-ass code will be around that long?

I think the microservice trend is fine and I can see the value. I like that it forces the division between layers and decomposes problems; it's largely about risk management and quick deployments. But on the other hand, it's also a cop-out to believe that software can't have any longevity. Maybe it's a dying art, maybe I'm just an old fogey, maybe the crap created today really won't be around in 10 years, none of it, and I should just accept that. But seeing as how the patterns keep coming back and fading away just like fashion, I'm thinking a sturdy pair of jeans will do just fine. They might take a while to put on, but once they are on, I can do anything. And if I can have quick time to market and maintainability, I'm picking that one… Scalalala