Beyond Datalake in Hadoop summits

Since 2013 I have attended a handful of Hadoop Summit/Strata conferences, and early on I found a trend where speakers mostly highlighted the importance of Hadoop and its technology stack.

When I last attended, in December 2014, I found a new trend: instead of talking more about the Hadoop core, speakers introduced the datalake and its architectural components, such as the Raw Data Zone, metadata management, search, and integration tools, to the community. I was pretty happy when Cloudera released the Hadoop Application Architecture book, and I recommended that many people study it.

Just as Maven standardized code structure for Java developers, I feel the term “datalake” set a standard in the community by introducing a common vocabulary, such as Landing Zone and Raw Data Zone, among data engineers.

Being in consulting and having implemented datalakes at many enterprises, the one question I am always asked is “What next… beyond datalake?”

Though I did not attend any sessions last year (2015), based on the input I received from my colleagues I feel it is time for Hadoop Summit/Strata speakers to introduce to the community what lies beyond the datalake.

I think “domain specific” data zones are the next step after the datalake data zones; by “domain” I mean Banking, Financial, HealthCare, Telecom, etc. Many enterprises have probably already answered the question “What next to datalake” and implemented solutions, but such solutions might be locked away in source code repositories (like SVN) in the form of conceptual or solution architecture artifacts. One might find many videos or blogs on the kinds of solutions big data teams have developed beyond the “datalake”, but I feel that consolidating such solutions and introducing a reference architecture and vocabulary to the community through these summits would greatly help data engineers.
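To picture what a shared vocabulary for domain zones could look like, here is a minimal Python sketch. It is only an illustration under my own assumptions: the zone names beyond Landing and Raw, the `Zone` class, the `zone_path` helper, and the directory layout are all hypothetical and do not come from any published reference architecture.

```python
from dataclasses import dataclass

# "landing" and "raw" are the common datalake zones mentioned above.
# The domain names below are hypothetical examples of domain-specific zones.
COMMON_ZONES = ("landing", "raw")
DOMAIN_ZONES = ("banking", "healthcare", "telecom")


@dataclass(frozen=True)
class Zone:
    name: str
    kind: str  # either "common" or "domain"


def build_zones():
    """Return a name -> Zone lookup covering common and domain zones."""
    zones = {name: Zone(name, "common") for name in COMMON_ZONES}
    zones.update({name: Zone(name, "domain") for name in DOMAIN_ZONES})
    return zones


def zone_path(root, zone, dataset):
    """Build a storage path for a dataset (layout is an assumed convention)."""
    return f"{root}/{zone.kind}/{zone.name}/{dataset}"


zones = build_zones()
print(zone_path("/datalake", zones["raw"], "transactions"))
print(zone_path("/datalake", zones["banking"], "transactions"))
```

The point is not the code itself but the naming: once teams agree on zone names, a path or a table name tells any data engineer where a dataset sits in its lifecycle and which domain owns it.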

I might be wrong in the views expressed above, but I thought of sharing my opinion on “beyond datalake”.