This is a series of blog posts covering data engineering topics such as HDFS, MapReduce, Sqoop, Hive, YARN, and Spark, with practice examples. The series is still a work in progress, and your feedback is most welcome :)