Fundamentally hadoop is a infrastructure software for storing and processing large data set.

It is an open source under apache. To understand hadoop we have to understand two things

  1. How it store data?
  2. How it process the data?

How it store data?It is a cluster system.Hadoop used “HDFS”(Hadoop Distributed File System) as part of hadoop.Hadoop let you store bigger then what you can store in one node/server.It have multiple nodes.

How it process the data? It process multiple nodes stored data by using “Map Reduce” frame work.

Why you need hadoop?

You have large dataset that need to be process in a cost effective way here is hadoop.

But hadoop is not for Adhoc Query.