This is one of the most common questions I see people get confused about, so I decided to write a simple explanation.
Partitioning in Hive is a way of segregating a Hive table's data into multiple directories and files. But partitioning gives effective results when:
- There is a limited number of partitions
- The partitions are of comparatively equal size
But this may not be possible in all scenarios. For example, when we partition a table by geographic location such as country, a few bigger countries will produce large partitions (e.g. 4-5 countries alone contributing 70-80% of the total data), whereas the small countries will produce small partitions (all the remaining countries in the world may contribute just 20-30% of the total data). So in these cases partitioning will not be ideal.
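
To make the skew concrete, here is a minimal Python sketch of the country example. The country codes, row counts, and the `/warehouse/sales` path are made up for illustration; only the `column=value` directory naming follows Hive's convention for partition directories.

```python
# Hypothetical row counts per country (made-up numbers, not real data).
rows_per_country = {"US": 300_000, "IN": 200_000, "CN": 150_000, "BR": 100_000}
# Assume 50 more small countries with 5,000 rows each.
for i in range(50):
    rows_per_country[f"small_{i:02d}"] = 5_000


def partition_dir(table_path, column, value):
    """Directory for one partition, using Hive's column=value naming."""
    return f"{table_path}/{column}={value}"


def partition_report(rows_per_partition, top_n):
    """Summarize how unevenly rows are spread across partitions."""
    sizes = sorted(rows_per_partition.values(), reverse=True)
    total = sum(sizes)
    return {
        "partitions": len(sizes),
        "top_share": sum(sizes[:top_n]) / total,       # share held by the top_n largest
        "largest_to_smallest": sizes[0] / sizes[-1],   # size ratio, biggest vs smallest
    }


report = partition_report(rows_per_country, top_n=4)
print(partition_dir("/warehouse/sales", "country", "US"))
print(report)
```

With these assumed numbers, 4 of the 54 partitions hold 75% of the rows, and the largest partition is 60 times the size of the smallest. That is exactly the "many partitions, very unequal sizes" situation described above.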