Scalable Deep Learning Pipelines for Big Data

Dipanjan Sarkar

Data Scientist


About Dipanjan Sarkar

Dipanjan (DJ) Sarkar is a Data Scientist at Intel, where he uses data science, machine learning & deep learning to build large-scale intelligent systems. He holds a Master of Technology degree with specializations in Data Science & Software Engineering. Dipanjan has been an analytics practitioner for several years, specializing in machine learning, NLP, statistical methods & deep learning. He is passionate about education and also serves as a Data Science Mentor at organizations such as Springboard, helping people learn data science.

He is also a key contributor and editor for Towards Data Science, a leading online journal on AI & Data Science, and has authored several books on R, Python, Machine Learning, NLP & Deep Learning.


Deep Learning has made revolutionary progress across diverse domains and is slowly seeing mainstream industry adoption, thanks to greater computing power, more data, and increasingly complex problems to solve. The advent of Big Data has proven to be both a blessing and a curse: organizations often have more data than they can process and so cannot leverage the true power of 'Big Data'. In addition, several aspects of Deep Learning are computationally heavy and consume a lot of resources. Apache Spark has proven to be an excellent Big Data processing and analytics framework that distributes computations across a cluster to speed them up. We have had decent success with Scalable Machine Learning on Spark, but what about Deep Learning?

In this talk, we will cover how to build efficient Deep Learning Pipelines on Spark using an open-source framework from Databricks (a company created by the founders of Spark) to enable Scalable Deep Learning on Big Data. We will go through the high-level components, architecture, and APIs in depth, along with a hands-on example (provided there is enough time).
