ctheu |

Why it’s important to log using slf4j ?

September 7th, 2016 | code quality, hadoop, java, logging, scala | ctheu

You are a Java or Scala programmer. You are logging stuff with different levels of severity. And you probably already used slf4j even without noticing.

This post is global overview of its ecosystem, why it exists and how does it work. It’s not because you’re using something everyday that you know the details right?

Why does slf4j even exist?

Why do we need something complicated like a logging framework to do something simple as put a message on stdout? Because not everybody wants to use only stdout, and because of dependencies that have their own logging logic too.

slf4j needs love

slf4j is an API that exposes logging methods (logger.info, logger.error and so on). It’s just a facade, an abstraction, an interface. By itself, it can’t log anything. It needs an implementation, a binding, something that truly logs the message somewhere. slf4j is just the entry point, it needs an exit.

slf4j breathes in logs

But it can also serve as a exit for other logging systems. This is thanks to the logging adapters/bridges, that redirect others logging frameworks to slf4j. Hence, you can make all your application logs to go through the same pipe even if the origin logging system is different.

slf4j is magic

The magic in that ? You can do all that and update the implementation without altering the existing code.

We are going to see several logging implementations slf4j can be bound to.

I’m going to use Scala code because it’s more concise, but that’s exactly the same in Java.

Simple logging using JUL

JUL stands for java.util.logging. This is a package that exists since the JDK1.4 (JSR 47). It’s quite simple to use and does the job:

val l = java.util.logging.Logger.getLogger("My JUL")
l.info("coucou")

Output:

Aug 18, 2016 11:41:00 PM App$ delayedEndpoint$App$1
INFO: coucou

App is my class, delayedEndpoint is the method.

It’s configurable through its API:

// we create a logger that accept ALL levels
val l = java.util.logging.Logger.getLogger("My JUL")
l.setLevel(Level.ALL)
// we output ALL the logs to the console
val h = new ConsoleHandler
h.setLevel(Level.ALL)

// and to a file but only greather or equal to WARNING
val f = new FileHandler("warn.log", true)
f.setLevel(Level.WARNING)
f.setFormatter(new SimpleFormatter)

l.addHandler(h)
l.addHandler(f)

// log stuff
l.entering(classOf[App].toString, "myMethod")
l.info("hello there")
l.severe("badaboom")
l.exiting(classOf[App].toString, "myMethod")

That can output something like :

sept. 07, 2016 11:16:53 PM interface scala.App myMethod
FINER: ENTRY
sept. 07, 2016 11:16:53 PM com.App$ myMethod
INFO: hello there
sept. 07, 2016 11:16:53 PM com.App$ myMethod
INFO: hello there
sept. 07, 2016 11:16:53 PM com.App$ myMethod
SEVERE: badaboom
sept. 07, 2016 11:16:53 PM com.App$ myMethod
SEVERE: badaboom
sept. 07, 2016 11:16:53 PM interface scala.App myMethod
FINER: RETURN

The default format is horrible but we can see our logs. You’ll notice we have the INFO and SEVERE twice but not the FINER. It’s because, by default, there is already a console handler logging all INFO minimum.

It’s also configurable through a properties file often named “logging.properties”.

For instance, on OSX, you can find the JVM global JUL configuration here (that contains the default console handler we just talked about):

/Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home/jre/lib/logging.properties

You can use a file of yours by specifying its path in the system properties:

-Djava.util.logging.config.file=src/main/resources/logging.properties

Some values inside must be references (FQCN) that will be load dynamically, otherwise it’s simple properties (think beans).

.level = INFO
handlers=java.util.logging.ConsoleHandler
java.util.logging.ConsoleHandler.level = INFO
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.SimpleFormatter.format=%4$s: %5$s [%1$tc]%n

We can get a reference to the global logger to change it’s minimum level:

java.util.logging.Logger.getGlobal.setLevel(Level.ALL)

The output is better:

FINER: ENTRY [Wed Sep 07 23:32:48 CEST 2016]
INFO: hello there [Wed Sep 07 23:32:48 CEST 2016]
SEVERE: badaboom [Wed Sep 07 23:32:48 CEST 2016]
FINER: RETURN [Wed Sep 07 23:32:48 CEST 2016]

Be careful, specifying a configuration file is not used as an override of the default! If you forget something (especially handlers=), you might not see any logging.

Note that we used the handler java.util.logging.ConsoleHandler but there is also available a FileHandler (if unconfigured, it logs into $HOME/java0.log).

LogManagers

All the Loggers created in the application are managed by a LogManager.

By default, there is a default instance created on startup. It’s possible to give another one, by specifying the property java.util.logging.manager.

It’s often used along with log4j that implements a custom LogManager (available in the package org.apache.logging.log4j:log4j-jul):

-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager

This way, any manager can have a hand on any Logger created in the application.

It can change their behavior and where do they read their configuration for instance. This is what we call a Logging Adapter or a bridge: you can log using JUL in the code and use log4j features to manipulate and save the logs. We’ll go into more details later in this post.

A smarter logging with slf4j-api

Let’s go into the main subject: slf4j.

The API

First, we need to add a dependency to its API:

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"

val sl: Logger = LoggerFactory.getLogger("My App")
sl.info("hello")

We are getting some logs, but not what we expect:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.

slf4j is using a org.slf4j.helpers.NOPLogger to log, but unfortunately, as the name says, all methods are empty shells:

// org.slf4j.helpers.NOPLogger.java
final public void info(String msg, Throwable t) {
    // NOP
}

The application still works, but without logs. slf4j tries to find a class “org.slf4j.impl.StaticLoggerBinder” available in the classpath. If it does found one, it fallbacks to the NOPLogger.

A simple slf4j binding

Fortunately, there is a simple implementation of slf4j :

libraryDependencies += "org.slf4j" % "slf4j-simple" % "1.7.21"

Now it can find a org.slf4j.impl.StaticLoggerBinder to create a Logger (a SimpleLogger in this case).

By default, this logger publishes messages to System.err, but it can actually write to System.out or any file.

val sl: Logger = LoggerFactory.getLogger("My App")
sl.info("message from {}", "slf4j!")

Output:

[main] INFO My App — message from slf4j!

The style and destination can be configured using System variables or via a properties file.

-Dorg.slf4j.simpleLogger.showDateTime=true
-Dorg.slf4j.simpleLogger.dateTimeFormat="yyyy-MM-dd HH:mm:ss"
-Dorg.slf4j.simpleLogger.levelInBrackets=true
-Dorg.slf4j.simpleLogger.logFile=simpleLogger.log

Here, we say we want to log into a file “simpleLogger.log”.

For the sake of clarity and organization, it’s preferable to put those props in a dedicated file such as src/main/resources/simplelogger.properties:

org.slf4j.simpleLogger.showDateTime=true
org.slf4j.simpleLogger.dateTimeFormat="yyyy-MM-dd HH:mm:ss"
org.slf4j.simpleLogger.levelInBrackets=true
org.slf4j.simpleLogger.logFile=simpleLogger.log

This was our first sl4j logging implementation. But we already saw another one: JUL !

slf4j to JUL

slf4j can redirect its logs to JUL that provides the “writing” piece as we already saw.

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "org.slf4j" % "slf4j-jdk14" % "1.7.21"

The name “slf4j-jdk14″ is because JUL package appeared in the JDK1.4 as we said. Strange name to pick but well.

Output:

INFO: message from slf4j! [Thu Aug 18 23:45:15 CEST 2016]

The code is the same as previously, we just changed the implementation. Notice the output is different than the SimpleLogger’s.

This logger is actually an instance of JDK14LoggerAdapter. It’s using the style we defined at the beginning, in logging.properties, used by JUL, remember ?.

Note that you don’t have the full control on the Logger via the API as we had when using directly java.util.logging.Logger which exposes more methods. We just have access to the slf4j’s ones. This is why the configuration files comes in handy.

Multiple implementations

If we have multiple implementations available, slf4j will have to pick between them, and it will leave you a small warning about that.

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "org.slf4j" % "slf4j-jdk14" % "1.7.21"
libraryDependencies += "org.slf4j" % "slf4j-simple" % "1.7.21"

Output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [.../slf4j-simple/jars/slf4j-simple-1.7.21.jar!...]
SLF4J: Found binding in [.../org.slf4j/slf4j-jdk14/jars/slf4j-jdk14–1.7.21.jar!...]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.SimpleLoggerFactory]
"2016-08-18 23:53:54" [main] [INFO] My App - message from slf4j!

As we said org.slf4j.impl.StaticLoggerBinder is the class slf4j-api is looking for in the classpath to get an implementation. This is the class that must exist in a slf4j implementation jar.

This message is just a warning, the logging will work. But slf4j will simply pick one available logging implementation and deal with it. But it’s a bad smell that should be fixed, because maybe it won’t pick the one you want.

It often happens when pom.xml or build.sbt imports dependencies that themselves depends on one of the slf4j implementation.

They have to be excluded and your own program should import a slf4j implementation itself. If you don’t, you could run in a no-logging issue.

A real case causing logs loss

For a real case, let’s import the hadoop client lib:

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0"

If we restart our program, it’s getting more verbose and we’re getting a surprise:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [.../org.slf4j/slf4j-log4j12/jars/slf4j-log4j12–1.7.5.jar!...]
SLF4J: Found binding in [.../org.slf4j/slf4j-jdk14/jars/slf4j-jdk14–1.7.21.jar!...]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
log4j:WARN No appenders could be found for logger (My App).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

We can see some log4j warnings that we never imported, and we don’t even see our own message! Where did it go?

It went into log4j that is not configured, meaning into a blackhole.

One way is to exclude the log4j impl from the dependencies:

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "org.slf4j" % "slf4j-jdk14" % "1.7.21"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0" exclude("org.slf4j", "slf4j-log4j12")

If we restart our program, we can see that our JUL console logs are back.

Note that the hadoop logging will still be voided, because it still rely on log4j configuration we didn’t configured.

One way to fix this and get the hadoop logs, would be to redirect log4j api to slf4j api. It’s possible, we simply need to add a dependency to org.slf4j:log4j-over-slf4j.

Again, we’ll see that in details later in this article, but the point is: you shouldn’t have multiple logging implementations available in one program.

slf4j implementations should be declared as optional

A best practice when writing a library or any module that can be imported somewhere, is to set slf4j implementation dependency as “optional”:

libraryDependencies += "org.slf4j" % "slf4j-jdk14" % "1.7.21" % "optional"


  org.slf4j
  slf4j-jdk14
  true

With optional, the dependency won’t be imported by transitivity.

The program which depends on it can use anything, no need to exclude it. More details here https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html.

JCL/ACL

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "org.slf4j" % "slf4j-jcl" % "1.7.21"

JCL stands for Jakarta Commons Logging.

Jakarta is an old retired Apache project, basically, it’s known as ACL now, Apache Commons Logging. It’s not maintained anymore (since 2014), but we can find it in old projects.

It serves the same purpose as slf4j, meaning it’s an abstraction over different logging frameworks such as log4j or JUL.

slf4j’s getLogger() will return a JCLLoggerAdapter that will look for a specific “Log” implementation set by the System variable “org.apache.commons.logging.Log”.

If not set, it will try to fallback on any implementations it can find in the classpath (log4j, JUL..).

New projects should forget about it. Only, if they depends on an old project that depends on JCL, then it should be considered to add a bridge to redirect JCL logs to the implementation of the project.

log4j

log4j is a widely-used logging framework. v1.x has been refactored and improved a lot to create the v2.x called log4j2.

Again, it can be used as an abstraction over a logging implementation, but it can be used as an implementation as well.

log4j1.2

log4j1.2 has reached end of life in 2015.

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "org.slf4j" % "slf4j-log4j12" % "1.7.21"

Note that this will pull log4j1.2 library too. Here is the dependency tree:


[info] +-org.slf4j:slf4j-log4j12:1.7.21
[info]   +-log4j:log4j:1.2.17
[info]   +-org.slf4j:slf4j-api:1.7.21

When calling slf4j’s getLogger(“My App”), it will use log4j API to create the logger:

org.apache.log4j.LogManager.getLogger(name);

Note that this LogManager has nothing to do with the JUL’s one.

When you don’t have slf4j but just log4j, this is the method you call to get a Logger. slf4j-log4j12 just does the same.

Anyway, that’s not enough:

log4j:WARN No appenders could be found for logger (My App).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

log4j needs a configuration file. We can create a simple properties file “src/main/resources/log4j.properties”:

log4j.rootLogger=DEBUG, STDOUT
log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
log4j.appender.STDOUT.layout=org.apache.log4j.PatternLayout
log4j.appender.STDOUT.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

If we restart our program, we can see our message:

0 [main] INFO My App — message from slf4j!

Or if we like xml (nobody?), we can create a file “log4j.xml” (notice the lowercase tags):

Output:

2016–08–22 01:06:38,194 INFO [main] App$ (App.scala:11) — message from slf4j!

But you shouldn’t useWhen you don’t have slf4j but just log4j, this is the method you call to get a Logger. slf4j-log4j12 just does the same.

log4j2

Now, let’s say we want to use the latest version of log4j. It may be the most popular slf4j’s binding used nowadays.

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2"

Notice the organization of the binding is “org.apache.logging.log4j”, and not “org.slf4j” like log4j12’s.

Only adding this dependency is not enough :

Failed to instantiate SLF4J LoggerFactory
Reported exception:
java.lang.NoClassDefFoundError: org/apache/logging/log4j/spi/AbstractLoggerAdapter
...

We need to add log4j-api dependency ourselves:

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2"
libraryDependencies += "org.apache.logging.log4j" % "log4j-api" % "2.6.2"

Not enough yet!

ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console…

We need to add log4j-core dependency too

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "org.apache.logging.log4j" % "log4j-slf4j-impl" % "2.6.2"
libraryDependencies += "org.apache.logging.log4j" % "log4j-api" % "2.6.2"
libraryDependencies += "org.apache.logging.log4j" % "log4j-core" % "2.6.2"

We get another error message (!) :

ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.

That’s better, we just need a configuration file, that’s the last step.

Let’s create a sample log4j2.xml (notice the caps):

Our message is finally back and a file A1.log is created too:

2016–08–22 01:51:49,912 INFO [run-main-a] App$ (App.scala:8) — message from slf4j!

log4j2 is excellent because it has a vast collections of Appenders where to write the logs : https://logging.apache.org/log4j/log4j-2.4/manual/appenders.html

Console, File, RollingFile, MemoryMappedFile
Flume, Kafka, JDBC, JMS, Socket
SMTP (emails on errors, woo!)
Any Appender can be treated as Async too (doing the logging in another thread, to not block the main thread cause of i/o)

logback

logback has the same father as log4j, it was meant to be the successor of log4j.

The syntax of the configuration is therefore quite similar.

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "ch.qos.logback" % "logback-classic" % "1.1.7"

“logback-classic” will pull-down “logback-core” as dependency, no need to add it.

It will run without configuration (finally!):

02:17:43.032 [run-main-1f] INFO My App — message from slf4j!

But of course, you can create a logback.xml to customize its behavior:



    
        
            %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n

debug : to display some info about the logging system creation on startup
scan : modification are taken into account in live. This is particularly useful in production when you just want to get debug message for a short amount of time.
notice the xml style is log4j1.2’s

It’s also possible to use a custom config file

-Dlogback.configurationFile=src/main/resources/logback.xml

logback has the same collection of appenders as log4j. Some are not part of the official package, such as:

kafka: https://github.com/danielwegener/logback-kafka-appender
Slack (mostly for errors): https://github.com/maricn/logback-slack-appender

TLDR

Add a dependency to slf4j which is a logging interface: “org.slf4j” % “slf4j-api” and add an logging implementation:

Implementation	Dependency(ies)	Configuration / Note
to the console	“org.slf4j” % “slf4j-simple”	simplelogger.properties
to java.util.logging (JUL)	“org.slf4j” % “slf4j-jdk14″	logging.properties
to JCL/ACL	“org.slf4j” % “slf4j-jcl”	(deprecated)
to log4j1.2	“org.slf4j” % “slf4j-log4j12″	(deprecated) log4.[properties\|xml]
to log4j2	“org.apache.logging.log4j” % “log4j-[slf4j-impl\|api\|core]”	log4j2.xml
to logback	“ch.qos.logback” % “logback-classic”	logback.xml

A very nice picture to resume what we just saw (we didn’t talked about sl4j-nop, it’s just a black hole):

So we learned about multiple implementations/bindings of slf4j’s api.

But if your project depends on other projects that are not using slf4j but directly JUL or log4j, it’s possible to redirect them to your own slf4j’s implementation, thanks to the bridges.

Bridges

Previously, we imported hadoop-client and our logs disappeared because it was using a log4j logger we never configured.

We excluded its implementation from the program and could see our logs again, but the logs of the hadoop-client library was still using log4j, and therefore its logs went into the void.

To avoid that, it’s possible to create a bridge to send log4j messages to slf4j, that we will dispatch where we want.

The bridge package generally contains both sides in the name, as “org.apache.logging.log4j” % “log4j-to-slf4j” % “2.6.2”.

For instance, with those dependencies :

libraryDependencies += "org.slf4j" % "slf4j-api" % "1.7.21"
libraryDependencies += "org.slf4j" % "slf4j-jdk14" % "1.7.21"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0"
libraryDependencies += "org.apache.logging.log4j" % "log4j-to-slf4j" % "2.6.2"

The path of the logs is:
hadoop’s log(…) → ACL → log4j → bridge → slf4j → JUL → System.err
Phew!

val sl: Logger = LoggerFactory.getLogger("My App")
sl.info("message from {}", "slf4j!")

// generate some hadoop logs
new DFSClient(new InetSocketAddress(1337), new Configuration)

We are actually “lucky” because 2 implementations were available for slf4j: log4j (provided in hadoop-client) and “slf4j-jdk14″.

Fortunately for us, slf4j pock “slf4j-jdk14″. Otherwise we would have get trap into an infinite loop :

hadoop’s log(…) → ACL → log4j → bridge → slf4j → log4j → log4j → bridge → slf4j → log4j→ log4j → bridge → slf4j → log4j…

Output:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [...slf4j-jdk14–1.7.21.jar!...]
SLF4J: Found binding in [...slf4j-log4j12–1.7.5.jar!...]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
INFO: message from slf4j! [Fri Aug 19 01:08:46 CEST 2016]
FINE: dfs.client.use.legacy.blockreader.local = false [Fri Aug 19 01:08:46 CEST 2016]
FINE: dfs.client.read.shortcircuit = false [Fri Aug 19 01:08:46 CEST 2016]
FINE: dfs.client.domain.socket.data.traffic = false [Fri Aug 19 01:08:46 CEST 2016]
FINE: dfs.domain.socket.path = [Fri Aug 19 01:08:46 CEST 2016]
…

Another bridge supposedly doing the same exists : “org.slf4j” % “log4j-over-slf4j” % “1.7.21”. Unfortunately, it creates the infinite loop in our case, because slf4j pick “slf4j-log4j12″:

SLF4J: Found binding in [...slf4j-log4j12–1.7.5.jar!...]
SLF4J: Found binding in [...slf4j-jdk14–1.7.21.jar!...]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.StackOverflowError

But we can explicitely exclude the other implementation :

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.3.0" exclude("org.slf4j", "slf4j-log4j12")

If we do, both bridges are working as expected.

As you can see, without altering anything in the hadoop library, we made it generate logs where and with the format we wanted.

Bridges between those common implementations are available (they couldn’t agree on the naming it seems..):

jcl-over-slf4j
log4j-over-slf4j
jul-to-slf4j

That’s the power of slf4j and its implementations. It’s completely decoupled from the source.

TLDR

Here’s a picture resuming the available bridges to slf4j:

Performance

Some applications can generate a tremendous amount of logs. Some precautions should be taken care of:

async logging should always be preferred (another thread doing the logging, not the caller’s). This is often available in the logging configuration itself ()
you should not add level guards (if (logger.isDebugEnabled)…) before logging, that brings us to the next point:
do not concat strings yourself in the message: use the placeholder syntax such as log.info(“the values are {} and {}”, item, item2). The .toString() won’t be computed if it’s not needed (it can be cpu intensive, but basically, it’s just useless to call it if the log level is not enough).
In Scala, you generally use https://github.com/Log4s/log4s to avoid this and just use classic string interpolation. It’s based on macros and will automatically add guard conditions.

Some benchmarks and comparaison: https://logging.apache.org/log4j/2.x/performance.html

Last notes

slf4j is useful combined to a powerful implementation such as log4j2 or logback.

But be careful when the application is managed by another application, like supervisor, because they can handle the logging themselves too like file rolling, or logstash to somewhere. Often, keeping the logging configuration simple (stdout) is enough.

A lot of frameworks have traits or abstract classes or globals to provide the logging directly :

Akka : provides LoggingReceive, ActorLogging, akka.event.Logging, akka-slf4j.jar
Spark : it’s using log4j and had a trait org.apache.spark.Logging (removed in 2.0.0)
Play Framework: it’s using logback and provides a Logger object/class on top of slf4j’s Logger

JNA—Java Native Access: enjoy the native functions

August 3rd, 2016 | java, jna, jni, native, scala | ctheu

Before JNA: JNI

If you’re into the Java world, you’ve probably heard of JNI: Java Native Interface.
It’s used to call the native functions of the system or of any native library.
Some good JNI explanations and examples here: http://www.ibm.com/developerworks/java/tutorials/j-jni/j-jni.html

Most of developers will never use it because it’s not often necessary to access the system resources, the windows, the volumes etc. That really depends of your business.

Sometimes, you want to use a library that’s not written in Java but in C. It’s very performant and battle-tested, you need to create a bridge. This is where JNI and JNA comes into play.

About resources, Java provides already some high-level API for some system aspects (memory, disks), such as:

Runtime.getRuntime().maxMemory()
Runtime.getRuntime().availableProcessors()
File.listRoots()(0).getFreeSpace()

But it’s pretty limited. Behind the scene, they are declared as native and rely on JNI.

You can use some projects that offers more options, such as oshi (Operating System & Hardware Information). It makes all possible information on the OS and hardware of the machine available (all memory and cpu metrics, network, battery, usb, sensors..).

It’s not using JNI: it’s using JNA!
JNA is JNI’s cousin: created to be simpler to use, and to write only Java code. (Scala in our case :) Note that there is a slight call overhead compared to JNI because of the dynamic bindings.

JNA

Basically, it dynamically links the functions of the native library to some functions declared in a Java/Scala interface/trait. Nothing more.

The difficulty comes with the signature of the functions you want to “import”.
You can easily find their native signatures (Google is our friend), but it’s not always obvious to find how to translate them using the Java/Scala types.

Hopefully, the documentation of JNA is pretty good to understand the subtle cases : Using the library, FAQ.

Let’s review how to use it using Scala and SBT (instead of Java).

How to use it

First, SBT:

libraryDependencies ++= Seq(
  "net.java.dev.jna" % "jna" % "4.2.2",
  "net.java.dev.jna" % "jna-platform" % "4.2.2")

The “jna” dependency is the core.

“jna-platform” is optional. It contains a lot of already written interfaces to access some standard libraries on several systems (Windows (kernel32, user32, COM..), Linux (X11), Mac). If you plan to use any system library, check out this package first.

Then, the Scala part.

Use the existing platform bindings

With jna-platform, you can use the existing bindings:

import com.sun.jna.platform.win32.Kernel32
import com.sun.jna.ptr.IntByReference

val cn = new Array[Char](256)
val success: Boolean = Kernel32.INSTANCE.GetComputerName(cn, new IntByReference(256))
println(if (success) Native.toString(cn) else Kernel32.INSTANCE.GetLastError())

You can feel the native way when calling this function (most native functions follows this style):

you provide a buffer and its length
you get a boolean as result to indicate success/failure
in case of a failure, you call to know the code of the error (such as 111 for )
in case of a success, the buffer contains the name

That’s very manual but that’s the way. (nowadays, we would return the String and throw an Exception on failure)

For information, the native signature is :

BOOL WINAPI GetComputerName(
  _Out_   LPTSTR  lpBuffer,
  _Inout_ LPDWORD lpnSize);

A pointer to some buffer to write into and its size (use as input and as output).

Listing the opened windows

Another more complex example to retrieve the list of opened windows :

import com.sun.jna.platform.win32.{User32, WinUser}

User32.INSTANCE.EnumWindows(new WinUser.WNDENUMPROC {
  override def callback(hWnd: HWND, arg: Pointer): Boolean = {
    val buffer = new Array[Char](256)
    User32.INSTANCE.GetWindowText(hWnd, buffer, 256)
    println(s"$hWnd: ${Native.toString(buffer)}")
    true
  }
}, null)

Output:

native@0xb0274: JavaUpdate SysTray Icon 
native@0x10342: GDI+ Window 
native@0x10180: Windows Push Notifications Platform 
(a lot more)...

The native signature of is :

BOOL WINAPI EnumWindows(
  _In_ WNDENUMPROC lpEnumFunc,
  _In_ LPARAM      lParam);

we use User32 because it contains the windows functions of Windows
a WNDENUMPROC is a pointer to a callback. JNA already has an interface of the same name to be able to create this type in the JVM.
we call another function of User32 in get the title of each window

Create a custom binding

It’s time to fly with our own wings.

Let’s call a famous function of the Windows API: MessageBox. You know, the popups? It’s in User32.lib but JNA did not implemented it. Let’s do it ourselves.

First, we create an interface with the Java/Scala signature of the which is :

int WINAPI MessageBox(
  _In_opt_ HWND    hWnd,
  _In_opt_ LPCTSTR lpText,
  _In_opt_ LPCTSTR lpCaption,
  _In_     UINT    uType);

The Scala equivalence could be:

import com.sun.jna.Pointer
import com.sun.jna.win32.StdCallLibrary

trait MyUser32 extends StdCallLibrary {
  def MessageBox(hWnd: Pointer, lpText: String, lpCaption: String, uType: Int)
}

We use simple Strings and not Array[Char] because they are only used as inputs (_In_).
The name of the function must be exactly the native’s one (with caps)

Now, we need to instantiate the interface with JNA and call our function:

val u32 = Native.loadLibrary("user32", classOf[MyUser32], W32APIOptions.UNICODE_OPTIONS).asInstanceOf[MyUser32]
val MB_YESNO = 0x00000004
val MB_ICONEXCLAMATION = 0x00000030
u32.MessageBox(null, "Hello there!", "Hi", MB_YESNO | MB_ICONEXCLAMATION)

Always use W32APIOptions.UNICODE_OPTIONS or you’ll get into troubles when calling functions (that will automatically convert the input/output of the calls)

It was quite simple right? That’s the purpose of JNA. Just need an interface with the native method declaration, you can call it.

The difficulty could be to write the Java signature, but a tool can help: JNAerator. From the native language, it can generate Java signatures, pretty cool!

More examples of JNA usage on their github’s: https://github.com/java-native-access/jna/tree/master/contrib

Demystifying openssl

July 25th, 2016 | https, openssl, security, ssh | ctheu

I always saw openssl as a complicated beast.

I generally use openssl only to create a pair of private/public keys to be used with ssh.

But when I need to use it for some reasons, I’m always wondering what to do. I google some tutorials then copy/paste commands. But I never understood what’s the deal with all the files .key, .pem, .csr, .crt?! Then, when I succeed to do my thing, I move on. I never really tried to understand the flow.

I’m not an expert in openssl. I just want to demystify some of its features. How it is used to generate keys and certificates? What we can do beyond that? (random number generator, manually encrypt/decrypt files)

I’ll use a classic example: generate a self-signed certificate. Several commands will be used, several files will be generated. I’ll add the certificate into nginx to make it work. Finally, I will use some other commands unrelated to certificates.

Why a certificate?

It has two purposes :

ensure that you are on the website it claims to be

The browsers (or the operating systems) have a set of Certificate Authority root certificates installed. They use them to verify the certificates of the websites.

encrypt the data between you and the website server

The certificates contains a public key that the browser will use to encrypt the data to send. The server will be able to decrypt them using the associated private key stored on its disk.

Why a self-signed certificate?

Those kind of certificates were more useful before, when the HTTPS certificates were not free. To work with HTTPS in staging or development environments, it was the way to go.

Now, we have Let’s Encrypt and https://gethttpsforfree.com/ to get free HTTPS certificates truly signed by a Certificate Authority. But it’s still more complex to grab than simply generate a self-signed certificates.

Step 1: create a RSA key

$ openssl genrsa -des3 -passout pass:ThEpWd -out out.key 2048
Generating RSA private key, 2048 bit long modulus
.................................................+++
.......................+++
e is 65537 (0x10001)

We start by generating a RSA key (genrsa) :

We generate a key of 2048 bits. The more the safer. The more, the slower to generate. The process has to generate 2 primes numbers and do some security checks (represented by the “.” and “+”). It will also use some random signals to always get a different result.
We encrypt it with the Triple DES cipher using a password. Here, it is provided directly into the command line. It could also be passed from a file (“file:pwd.txt”). Any program trying to use the key will first need the password. It’s a secondary layer of security in case someone has access to the file.
Note that for automation, you could get into troubles. If a program wants you to type a password — because you are using an encrypted private key — such as nginx or any openssl commands, you could need a human interaction.
The output is by default the PEM format (it’s just the base64 of the DER format which is binary). PEM stands for “Privacy-Enhanced Mail”.https://tools.ietf.org/html/rfc1421. It’s a very old format.

A simpler command without encryption would be :

$ openssl genrsa -out out.key 2048

Encrypted, the key looks like this (“RSA PRIVATE KEY” with headers):

It’s possible to see the “content” of the key using the tool “rsa” — it’s all about mathematics — :

$ openssl rsa -noout -text -inform PEM -in out.key
Private-Key: (2048 bit)
modulus:
 00:b2:e0:a7:95:98:f6:26:8c:32:09:04:a6:ac:9a:
 42:b8:1e:43:de:fe:f6:c1:f1:1a:0e:b2:0c:86:47:
 35:6b:46:e4:46:36:bf:ef:cc:34:c0:09:9f:77:eb:
...
publicExponent: 65537 (0x10001)
...

The full synopsis of genrsa is :

# genrsa - generate an RSA private key
openssl genrsa [-out filename] [-passout arg] [-des] [-des3] [-idea] [-f4] [-3] [-rand file(s)] [-engine id] [numbits]

By default, openssl outputs the PEM format. It’s a plain ASCII file with some specific header and footer, and a big string in-between. It’s useful to send them through email (not the private keys of course!) along some text, or even to send a message encrypted (PGP).

Bonus: create a key from another key

“rsa” can also be used to convert any key to any other key format. For instance, we could generate another private key based on our first:

$ openssl rsa -passin pass:ThEpWd -in out.key -out out-next.key

I’m not sure why . Is that more secured? I already stumbled upon this technic to create a certificate, instead of using a password protected key generated by genrsa. Please, let me know!

The full synopsis of rsa is :

# rsa — RSA key processing tool
openssl rsa [-inform PEM|NET|DER] [-outform PEM|NET|DER] [-in filename] [-passin arg] [-out filename] [-passout arg] [-sgckey] [-des] [-des3] [-idea] [-text] [-noout] [-modulus] [-check] [-pubin] [-pubout] [-engine id]

Step 2: create a certificate request

Now that we have our private key, we are going to create a certificate request. Both the key and the certificate request are needed to create a certificate.

It’s necessary to request one because you are not supposed to be the one who signs the certificate. It is the role of the Certificate Authorities.

We use “openssl req” to generate a .csr (Certificate Signing Request).

$ openssl req -new -key out.key -out out.csr -sha256
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter ‘.’, the field will be left blank.
-----
Country Name (2 letter code) [AU]:FR
State or Province Name (full name) [Some-State]:
Locality Name (eg, city) []:
Organization Name (eg, company) [Internet Widgits Pty Ltd]:
Organizational Unit Name (eg, section) []:
Common Name (e.g. server FQDN or YOUR name) []:
...

We provide the private key to use (out.key)
We fulfill several metadata about the company (if self-signed, we don’t care a lot). They will be identifiable in the certificate.
We use a SHA-256 hash. If unspecified, it’s SHA-1 that will be used. SHA-1 creates a hash of 160 bits, and is deprecated due to some weaknesses. For instance, Chrome clearly displays a warning if the certificate is still using SHA-1. The new “standard” is SHA-256: it creates a hash of 256 bits.

The certificate request looks like this (“CERTIFICATE REQUEST”):

We can verify what’s inside:

$ openssl req -in out.csr -text -verify -noout
verify OK
Certificate Request:
 Data:
 Version: 0 (0x0)
 Subject: C=FR, ST=Some-State, L=Paris, O=Internet Widgits Pty Ltd, CN=John Doe
...

The short synopsis of req is :

# req - PKCS#10 certificate request and certificate generating utility.
 openssl req [-inform PEM|DER] [-outform PEM|DER] [-in filename] [-out filename] [-text] [-pubkey] [-noout] [-verify] [-new] [-key filename] [-keyform PEM|DER] [-keyout filename] [-x509] [-days n]

PKCS stands for “Public Key Cryptographic Standards”. PKCS#10 is the format for request a certificate (https://tools.ietf.org/html/rfc2986).

Faster alternative — TLDR

It’s possible to create a certificate request with a private key in one shot:

$ openssl req -new -out out.csr -sha256

By default, it will generate a RSA 2048 bits key, ask you for a pass-phrase, and the private key will be output to “privkey.pem”.

To get rid of the defaults, one can use :

$ openssl req -new -nodes -out out.csr -keyout out.key -sha256

-nodes to not encrypt the key (no pass-phrase)
-keyout [filename] instead of “privkey.pem”

All the defaults of openssl are configurable: /etc/ssl/openssl.cnf

Step 3: generate the self-signed certificate

Finally, we create a self-signed certificate from our certificate request.

$ openssl x509 -req -in out.csr -signkey out.key -out out.crt -days 365 -sha256
Signature ok
subject=/C=AU/ST=Some-State/O=Internet Widgits Pty Ltd
Getting Private key

x509 is all about certificates standards
We provide the certificate request (-req -in) and the private key (that makes it self-signed)
We set it valid for a year
Do not forget to use SHA-256 to avoid browsers’ deprecation warning

The certificate looks like this:

-----BEGIN CERTIFICATE-----
MIIDTjCCAjYCCQD5/NymfWIDMzANBgkqhkiG9w0BAQsFADBpMQswCQYDVQQGEwJG
UjEOMAwGA1UECAwFUGFyaXMxDTALBgNVBAoMBE5vbmUxDTALBgNVBAsMBE5vbmUx
...
-----END CERTIFICATE-----

Again, we can inspect the content:

$ openssl x509 -in out.crt -text -noout
Certificate:
 Data:
  Version: 1 (0x0)
  Serial Number: 14039925222936737604 (0xc2d7d59a8230f744)
 Signature Algorithm: sha1WithRSAEncryption
  Issuer: C=AU, ST=Some-State, O=Internet Widgits Pty Ltd
  Validity
   Not Before: Jul 22 00:07:52 2016 GMT
   Not After : Jul 22 00:07:52 2017 GMT
 ...

The short synopsis of x509 is:

# x509 — Certificate display and signing utility
openssl x509 [-inform DER|PEM|NET] [-outform DER|PEM|NET] [-keyform DER|PEM] [-in filename] [-out filename] [-startdate] [-enddate] 
[-days arg] [-signkey filename] [-req] ...

Faster alternative — TLDR

We did all that, but it is possible to generate a self-signed certificate with its private key in one shot.

$ openssl req -x509 -nodes -new -keyout out.key -out out.crt -sha256

We add “-x509” to generate a self signed certificate and not a certificate request (if unspecified).

Aparte

Let’s just present some other openssl’s tools before going back to nginx and the certificate.

Random generator

If you have some bash and need nice generated strings (instead of $RANDOM or /dev/urandom), you can use rand :

$ openssl rand -hex 50
c15b5469946564a704777286ad81f6ca0c9c49a36c45a159174cc028c953f3ab1434977927fc9f11b6b45f4194f5ed2a6090

$ openssl rand 100 | od -An
 062351 017220 011763 071125 020723 127621 022632 037276
 066565 114650 073202 106612 120052 111556 075054 033665

Encrypt/Decrypt anything

“enc” can be used to encrypt and decrypt anything using any encryption algorithm existing on Earth.

For instance, encoding/decoding in base64 :

$ openssl enc -base64
use base64 to encode me!
[Ctrl+D]
dXNlIGJhc2U2NCB0byBlbmNvZGUgbWUhCg==

$ openssl enc -base64 -d
dXNlIGJhc2U2NCB0byBlbmNvZGUgbWUhCg==
[Ctrl+D]
use base64 to encode me!

It’s a bit useless because anybody can decrypt it, it’s not password-protected.

aes-256-cbc (NSA approved) does a nicer job:

$ openssl enc -aes-256-cbc -out encrypted.txt
enter aes-256-cbc encryption password:
Verifying — enter aes-256-cbc encryption password:
encrypt me!
[Ctrl+D]

$ xxd encrypted.txt
0000000: 5361 6c74 6564 5f5f 9448 1158 e593 c2b7 Salted__.H.X….
0000010: 67fb 442d af21 75ad 80e3 483a ff37 d8f9 g.D-.!u…H:.7..

$ openssl enc -aes-256-cbc -in encrypted.txt -d
enter aes-256-cbc decryption password:
encrypt me!

Nginx

Back to our certificate, let’s make it run on our website !

A minimal subset of changes to nginx would be to add this to one’s configuration :

server {
  listen 443;
  ssl on;
  ssl_certificate /tmp/out.crt;
  ssl_certificate_key /tmp/out.key;
}

You should not use /tmp of course. Considering using /etc/ssl/{cert,private} or /usr/local/share/ca-certificates to store your files.

Then restart nginx and enter the passphrase if any (beware of automation):

$ sudo service nginx restart
Restarting nginx:
Enter PEM pass phrase:
Enter PEM pass phrase:
nginx.

Then you will get a scary error in your browser. Either you won’t be able to do anything, or you will have a link at the bottom to continue.

If you have no link to continue, try to manually trust the certificate. On Windows, you can double-click on the .crt file to install it, or go manually in the stores :

On Debian and such, you can copy it to the appropriate folder :

$ sudo cp out.crt /usr/local/share/ca-certificates/out.crt
$ sudo update-ca-certificates
Updating certificates in /etc/ssl/certs… 
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d….done.

A better nginx configuration

nginx’s ssl configuration is a very important topic. Except when dealing with self-signed certificates because they should not be exposed to the Internet. But otherwise, you should definitely follow the advises of the following gist.

https://gist.github.com/plentz/6737338

One common thing to add, as it suggests, is to generate and use a dhparam file. It adds more randomness to the process and reduce the risk of disclosures.

Moreover, to test a installed certificate and nginx’s configuration, a good start is to submit the page’s address to https://www.ssllabs.com/ssltest/.

Now, openssl is not scary anymore.

Note: this is a cross-post of my medium post: https://medium.com/@ChtefiD/demystifying-openssl-b7b8dbcdd90a

From Apache Flume to Apache Impala using HDFS

March 31st, 2016 | cloudera, flume, hadoop, hbase, hdfs, impala | ctheu

Let’s use HDFS as a database !

So, we have data coming from one of our service, which is the source of a flume agent. Now, we want to be able to query them in a scalable fashion without using hbase or any other database, to be lean.

One way is to use HDFS as a database (Flume has a HDFS sink that handle partitioning), create a Hive table on top to query its content, and because we want something performant and fast, to actually use Impala to query the data using a Apache Parquet format.

Here’s a little diagram of the stack :

Apache Oozie is used to regularly export the HDFS content to a parquet format.
We are storing our data into HDFS in an Apache Avro format (Snappy compressed) because of all its advantages and because we are already using it everywhere.

Let’s review the stack one piece at a time, starting with the Flume configuration.

Flume

Let’s say we have a avro source where our custom service is sending events :

agent1.sources.events_source.type = avro
agent1.sources.events_source.bind = 0.0.0.0
agent1.sources.events_source.port = 9876

Let’s first configure the Flume HDFS sink where we are going to export our events, the configuration is pretty long but every piece has its importance :

agent1.sinks = ... s

agent1.sinks.s.type = hdfs
agent1.sinks.s.hdfs.path = /user/flume/events/ymd=%Y-%m-%d/h=%H
agent1.sinks.s.hdfs.inUsePrefix = .
agent1.sinks.s.hdfs.fileType = DataStream
agent1.sinks.s.hdfs.filePrefix = events
agent1.sinks.s.hdfs.fileSuffix = .avro
agent1.sinks.s.hdfs.rollInterval = 300
agent1.sinks.s.hdfs.rollSize = 0
agent1.sinks.s.hdfs.rollCount = 0
agent1.sinks.s.serializer = com.company.CustomAvroSerializer$Builder
agent1.sinks.s.channel = events_channel

Let’s review this config.

Partition

Because of our volume of data, we want to partition them per year-month-day then by hour. The “ymd=” and “h=” in the path are important, it represents the “column” name of the time dimension that will be queryable later.

Note that your Flume source must have a “timestamp” in the header for Flume to know what is the time dimension. If you don’t have this info, you can simply add hdfs.useLocalTimeStamp = true to use the ingestion time, but it’s discouraged because that means you don’t have any timestamp column in your data, and you’re going to get stuck later when doing some Impala partitioning.

Roll interval

We decide to roll a new file every 5min, and not based on size nor count (they have to be explicitely set to 0 because they have another default value).

By default, Flume buffers into a .tmp file we can’t rely on, and because we want to access the fresh data quickly, 5min is a good start. This is going to generate a bunch of file (144 per day), but we don’t care because we are going to export them later into a hourly parquet format and clean up the old HDFS content.

Moreover, if Flume suddenly dies, you are going to lose maximum 5min of data, instead of the whole buffer. Stopping properly Flume flushes the buffer hopefully.

File name

The inUsePrefix to “.” is to hide the working files to Hive during a query (it ignores the aka hidden files). If you don’t, some MapReduces can fail because at first, Hive saw a file Flume was buffering into (a .tmp), then the time to execute the MR, it was not there anymore (because of some flush), and kaboom, the MR will fail :

Caused by: java.io.FileNotFoundException: File does not exist: hdfs://hadoop01:8020/user/flume/events/ymd=2016–03–17/h=15/events.1458223202146.avro.tmp

File type

By default, the filetype is SequenceFile. We don’t want that, because that makes Flume convert the output stream to a SequenceFile that Hive will not be able to read because the avro schema won’t be inside. Setting it to DataStream let the data sent unaltered.

FYI, a typical SequenceFile body :

SEQ♠!org.apache.hadoop.io.LongWritable”org.apache.hadoop.io.BytesWritable ▒c(g▒s▒▒►▒▒▒|TF..

A snappy compressed avro file body :

Obj♦avro.schema▒8{"type":"record","name":"Order",...}avro.codecsnappy ▒▒T▒▒▒▒♣

Avro serializer

Our custom serializer is doing some conversion of the original event, and simply emits some avro using a DataFileWriter with the snappyCodec:

DatumWriter

Why does slf4j even exist?

slf4j needs love

slf4j breathes in logs

slf4j is magic

Simple logging using JUL

LogManagers

A smarter logging with slf4j-api

The API

A simple slf4j binding

slf4j to JUL

Multiple implementations

A real case causing logs loss

slf4j implementations should be declared as optional

JCL/ACL

log4j

log4j1.2

log4j2

logback

TLDR

Bridges

TLDR

Performance

Last notes

Before JNA: JNI

JNA

How to use it

Use the existing platform bindings

Listing the opened windows

Create a custom binding

Why a certificate?

Why a self-signed certificate?

Step 1: create a RSA key

Bonus: create a key from another key

Step 2: create a certificate request

Faster alternative — TLDR

Step 3: generate the self-signed certificate

Faster alternative — TLDR

Aparte

Random generator

Encrypt/Decrypt anything

Nginx

A better nginx configuration

Flume

Partition

Roll interval

File name

File type

Avro serializer

Multiple Flume ?

Performance consideration

HDFS

Hive

AvroSerDe

Create the table

ERR: Long schemas

ERR: Partition and field names conflicts

Notify Hive of the new partitions

Drop partitions

Querying

Impala

Query the Hive avro table

Query a Parquet table

Oozie

Improvements

Conclusion

Passwords are obsolete

Can be used with GitHub, BitBucket, anything

Time to do some hacking

Here is my key

It’s dangerous to go alone, take this

One-way only

One computer to rule them all

.ssh/config

Even the username can be optional

Beyond hosts

Beyond ssh

But it’s not enough

Java flags overview

-XX:+PrintCommandLineFlags

-XX:+PrintFlagsFinal

More Babel syntax available thanks to `babel-eslint`