LinkedIn Dagli – ML made easier in Java

LinkedIn recently open-sourced Dagli, a new framework for simplifying the implementation of machine learning models in JVM-based languages. Dagli was designed with three main objectives:
  1. Build an easy-to-use, bug-resistant Java-based ML framework.
  2. Add a rich collection of models, statistical building blocks and feature transformers that can be rapidly incorporated into ML models.
  3. Enable a simple abstraction.

Other ML frameworks exist for Java developers, such as DeepLearning4J and Tribuo, but Dagli excels at implementing end-to-end ML pipelines. Dagli represents machine learning programs as directed acyclic graphs (DAGs) built from four fundamental node types, two kinds of root node and two kinds of child node:

  • Root Node – Placeholder: Placeholders represent the example values that are supplied to the DAG when it is trained and, later, when it is used for inference.

  • Root Node – Generator: A Generator is a root node that produces a value for each example on its own (a constant, for instance) and passes it to its children.

  • Child Node – Transformer: Transformers are nodes that transform the inputs received from their parent nodes in order to produce a result.

  • Child Node – Views: Views are similar to Transformers, but they have a single parent and simply pass its information on to their children.
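
The four node roles above can be illustrated with a small, self-contained Java sketch. This is a toy model written for this article, not the Dagli API: the names ToyDag, Node, placeholder, constant, transform, view and describe are all hypothetical, and the real library's classes and method signatures may look quite different.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

// Toy model of the four node roles. NOT the Dagli API; every name here is hypothetical.
final class ToyDag {
    /** A node yields one value per example. E is the example type, T the node's result type. */
    interface Node<E, T> {
        T apply(E example, long index);
    }

    /** Root node: a placeholder stands in for the example supplied at training or inference time. */
    static <E> Node<E, E> placeholder() {
        return (example, index) -> example;
    }

    /** Root node: a generator produces a value on its own, here the same constant for every example. */
    static <E, T> Node<E, T> constant(T value) {
        return (example, index) -> value;
    }

    /** Child node: a transformer computes a result from one parent's value. */
    static <E, A, R> Node<E, R> transform(Node<E, A> parent, Function<A, R> fn) {
        return (example, index) -> fn.apply(parent.apply(example, index));
    }

    /** Child node: a transformer computes a result from two parents' values. */
    static <E, A, B, R> Node<E, R> transform(Node<E, A> first, Node<E, B> second, BiFunction<A, B, R> fn) {
        return (example, index) -> fn.apply(first.apply(example, index), second.apply(example, index));
    }

    /** Child node: a view has exactly one parent and passes its value on unchanged. */
    static <E, T> Node<E, T> view(Node<E, T> parent) {
        return parent::apply;
    }

    /** Wires a small DAG: placeholder -> length transformer -> view, joined with a constant generator. */
    static String describe(String example) {
        Node<String, String> text = placeholder();
        Node<String, String> prefix = constant("len=");
        Node<String, Integer> length = transform(text, String::length);
        Node<String, Integer> lengthView = view(length);
        Node<String, String> output = transform(prefix, lengthView, (p, n) -> p + n);
        return output.apply(example, 0L);
    }

    public static void main(String[] args) {
        System.out.println(describe("dagli")); // prints "len=5"
    }
}
```

Because every node only refers to its parents, the wiring cannot form a cycle, which is what makes the structure a DAG.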

For instance, a simple Dagli program might produce the following Dagli DAG:

[Figure: DAG produced by a simple Dagli program]

In addition to these flexible DAGs, Dagli also brings other tangible benefits to Java developers:
  • ML Artifacts: Dagli includes rich libraries of ML components that simplify the implementation of ML models. Examples of these artifacts include neural networks, logistic regression, gradient boosted decision trees, FastText, cross-validation, cross-training, feature selection, data readers, evaluation, and feature transformation. 

  • Portability: Dagli runs on the JVM in a range of environments, from Hadoop servers to a local computer.

  • Training-Inference Pipeline: Dagli defines a single DAG for both training and inference, so there is no separate inference pipeline to keep in sync and the model is easier to reason about.

  • Deployment: Dagli programs are simple to deploy because the trained DAG is, essentially, serialized as a single object.
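
The last two points, one pipeline definition for training and inference, and deployment as a single serialized object, can be sketched in plain Java. The example below is a toy under stated assumptions: it uses standard java.io serialization as the mechanism (the text above does not say which mechanism Dagli uses), and ToyPipeline, train, isLong and roundTrip are hypothetical names, not Dagli classes or methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;

// Toy pipeline, NOT the Dagli API: one definition is used for training and for inference,
// and the trained result is a single serializable object.
final class ToyPipeline implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double meanLength; // the only learned parameter

    private ToyPipeline(double meanLength) {
        this.meanLength = meanLength;
    }

    /** Feature extraction shared by training and inference, so the two cannot drift apart. */
    private static double feature(String text) {
        return text.length();
    }

    /** Training: learn the mean text length over the examples. */
    static ToyPipeline train(List<String> examples) {
        double mean = examples.stream().mapToDouble(ToyPipeline::feature).average().orElse(0.0);
        return new ToyPipeline(mean);
    }

    /** Inference: reuses the same feature code that training used. */
    boolean isLong(String text) {
        return feature(text) > meanLength;
    }

    /** Serializes the trained pipeline to bytes and reads it back, as a deployment step might. */
    static ToyPipeline roundTrip(ToyPipeline pipeline) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(pipeline);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ToyPipeline) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ToyPipeline trained = train(List.of("a", "abc", "abcde")); // mean length is 3.0
        ToyPipeline deployed = roundTrip(trained);
        System.out.println(deployed.isLong("abcd")); // prints "true"
        System.out.println(deployed.isLong("ab"));   // prints "false"
    }
}
```

In this sketch the deployed artifact is just the bytes of one object, so shipping a model means shipping one file or blob rather than separate preprocessing code and model weights.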

Perhaps the most important benefit of Dagli is its potential to attract Java developers to the ML world. Dagli has been incubated and tested at scale at LinkedIn.
