What is the difference between coalesce and repartition in Spark?
Coalesce | Repartition |
---|---|
It can only decrease the number of partitions in a DataFrame; requesting more partitions than currently exist leaves the count unchanged. | It can either decrease or increase the number of partitions in a DataFrame. |
It merges existing partitions, which minimizes the amount of data shuffled. | It creates new partitions by performing a full shuffle of the data. |
The resulting partitions can vary in size. | The resulting partitions are roughly equal in size. |
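
The rows above can be illustrated with a small toy model in plain Python. This is only a sketch and not Spark's implementation: partitions are modeled as lists of records, and the helper names `coalesce` and `repartition` are hypothetical stand-ins that mimic the behavior described in the table. The merge rule used for `coalesce` (grouping neighboring partitions) and the round-robin rule used for `repartition` are simplifying assumptions.

```python
from typing import List

Partitions = List[List[int]]


def coalesce(parts: Partitions, n: int) -> Partitions:
    """Toy model: merge whole existing partitions into n groups.

    Records are never split out of their original partition, so no
    record-level shuffle happens and the result can be unbalanced.
    """
    if n >= len(parts):
        # Assumption mirroring the table: coalesce cannot increase the count.
        return [list(p) for p in parts]
    merged: Partitions = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        # Neighboring source partitions land in the same target partition.
        merged[i * n // len(parts)].extend(part)
    return merged


def repartition(parts: Partitions, n: int) -> Partitions:
    """Toy model: full shuffle, every record is redistributed round-robin."""
    out: Partitions = [[] for _ in range(n)]
    i = 0
    for part in parts:
        for record in part:
            out[i % n].append(record)
            i += 1
    return out


# Four deliberately skewed partitions holding ten records in total.
parts = [[1, 2, 3, 4, 5, 6], [7], [8, 9], [10]]

coalesced = coalesce(parts, 2)
repartitioned = repartition(parts, 2)

print([len(p) for p in coalesced])              # [7, 3]  -> variable sizes
print([len(p) for p in repartitioned])          # [5, 5]  -> roughly equal sizes
print(len(coalesce(parts, 8)))                  # 4       -> count not increased
print([len(p) for p in repartition(parts, 5)])  # [2, 2, 2, 2, 2]
```

In Spark itself, the corresponding DataFrame calls are `df.coalesce(n)` and `df.repartition(n)`. As a rule of thumb, `coalesce` is typically the cheaper choice when only reducing the partition count, while `repartition` is preferable when balanced partitions or a higher count are needed.
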
BY Best Interview Question ON 10 Jun 2020