LinkedIn Dagli – ML made easier in Java

December 7, 2020

LinkedIn recently open-sourced Dagli, a new framework for simplifying the implementation of machine learning models in JVM-based languages. Dagli was designed with three main objectives:

Build an easy-to-use, bug-resistant Java-based ML framework.
Add a rich collection of models, statistical building blocks and feature transformers that can be rapidly incorporated into ML models.
Provide a simple abstraction for defining ML pipelines.

Other ML frameworks exist for Java developers, such as Deeplearning4j and Tribuo, but Dagli focuses on making end-to-end ML pipelines easy to implement. Dagli represents machine learning programs as directed acyclic graphs (DAGs) built from four fundamental node types, two kinds of root node and two kinds of child node:

Root Node – Placeholder: Placeholders stand in for the example values that are supplied to the DAG from outside, such as the training data.
Root Node – Generator: A Generator is a root node that produces values itself and passes them to its children.
Child Node – Transformer: Transformers are nodes that transform the inputs received from their parent nodes in order to produce a result.
Child Node – View: Views are similar to Transformers, but each has a single parent and simply passes information from that parent on to its children.
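To make the node roles concrete, here is a minimal sketch in plain Java of a three-node graph: a placeholder and a generator feeding one transformer. Every type in it (`Node`, `TextPlaceholder`, `ConstantGenerator`, `Transformer2`, `DagSketch`) is hypothetical and written only for illustration. None of them is part of Dagli's API, and views are left out for brevity.

```java
import java.util.List;
import java.util.function.BiFunction;

// Toy illustration only: these types are hypothetical and are NOT Dagli's API.
interface Node<T> {
    // Computes this node's value for the example at the given index.
    T valueFor(int exampleIndex, List<String> examples);
}

// Root node: stands in for the example data supplied by the caller.
final class TextPlaceholder implements Node<String> {
    @Override
    public String valueFor(int exampleIndex, List<String> examples) {
        return examples.get(exampleIndex);
    }
}

// Root node: produces a value itself (here, the same constant for every example).
final class ConstantGenerator<T> implements Node<T> {
    private final T constant;

    ConstantGenerator(T constant) {
        this.constant = constant;
    }

    @Override
    public T valueFor(int exampleIndex, List<String> examples) {
        return constant;
    }
}

// Child node: combines the values of its two parent nodes into a result.
final class Transformer2<A, B, R> implements Node<R> {
    private final Node<A> first;
    private final Node<B> second;
    private final BiFunction<A, B, R> function;

    Transformer2(Node<A> first, Node<B> second, BiFunction<A, B, R> function) {
        this.first = first;
        this.second = second;
        this.function = function;
    }

    @Override
    public R valueFor(int exampleIndex, List<String> examples) {
        return function.apply(
            first.valueFor(exampleIndex, examples),
            second.valueFor(exampleIndex, examples));
    }
}

final class DagSketch {
    // Wires up the graph: (placeholder, generator) -> transformer.
    static Node<Boolean> buildGraph(int minLength) {
        Node<String> text = new TextPlaceholder();
        Node<Integer> threshold = new ConstantGenerator<>(minLength);
        return new Transformer2<>(text, threshold, (s, min) -> s.length() >= min);
    }

    public static void main(String[] args) {
        List<String> examples = List.of("hi", "hello world");
        Node<Boolean> isLongEnough = buildGraph(5);
        for (int i = 0; i < examples.size(); i++) {
            System.out.println(examples.get(i) + " -> " + isLongEnough.valueFor(i, examples));
        }
    }
}
```

The point of the sketch is that the graph is declared once, as ordinary typed objects, and then evaluated per example; a real framework would add training, batching, and graph validation on top of this shape.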

For instance, a simple Dagli program might produce the following Dagli DAG:

In addition to these flexible DAGs, Dagli also brings other tangible benefits to Java developers:

ML Artifacts: Dagli includes rich libraries of ML components that simplify the implementation of ML models. Examples of these artifacts include neural networks, logistic regression, gradient boosted decision trees, FastText, cross-validation, cross-training, feature selection, data readers, evaluation, and feature transformation.
Portability: Dagli runs wherever a JVM does, from Hadoop servers to a local computer.
Training-Inference Pipeline: Dagli defines a single DAG for both training and inference, so the same pipeline definition serves both phases and the resulting models are easier to interpret.
Deployment: Dagli programs are simple to deploy because the whole pipeline is, essentially, serialized as a single object.
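The deployment idea can be sketched with standard Java serialization. This is an illustration under assumptions, not Dagli code: `TrainedPipeline` is a hypothetical stand-in for a trained pipeline, `DeploySketch` is a hypothetical helper, and the use of `java.io` object streams is an assumption about what "serialized as a single object" could look like in practice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a trained pipeline; NOT a Dagli class.
final class TrainedPipeline implements Serializable {
    private static final long serialVersionUID = 1L;

    private final double weight;
    private final double bias;

    TrainedPipeline(double weight, double bias) {
        this.weight = weight;
        this.bias = bias;
    }

    // Inference uses only the state stored inside this one object.
    double predict(double feature) {
        return weight * feature + bias;
    }
}

final class DeploySketch {
    // Serializes the whole pipeline into one byte array (one deployable artifact).
    static byte[] save(TrainedPipeline pipeline) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(pipeline);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Restores the pipeline from the artifact, e.g. inside a serving process.
    static TrainedPipeline load(byte[] artifact) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(artifact))) {
            return (TrainedPipeline) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Pipeline class missing from classpath", e);
        }
    }

    public static void main(String[] args) {
        TrainedPipeline trained = new TrainedPipeline(2.0, 1.0);
        byte[] artifact = save(trained);
        TrainedPipeline restored = load(artifact);
        System.out.println(restored.predict(3.0)); // prints 7.0
    }
}
```

Because the artifact is a single object, the serving side needs only the classes on its classpath and the bytes themselves; there is no separate inference pipeline to rebuild by hand.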

Perhaps the most important benefit of Dagli is its potential to attract Java developers to the ML world. Dagli has been incubated and tested at scale at LinkedIn.

Additional Resources:

https://github.com/linkedin/dagli
https://engineering.linkedin.com/blog/2020/open-sourcing-dagli
