A Spark SQL table often contains many small files (each much smaller than the HDFS block size). In this case, Spark starts more tasks to process these small files. When the SQL logic includes a shuffle operation, this greatly increases the number of hash buckets, which seriously affects performance.

A . True
B . False

Answer: A
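
The reasoning behind the answer can be illustrated with a rough back-of-the-envelope sketch. It is plain Python arithmetic, not Spark code, and it rests on simplifying assumptions: every file is read by its own task (files are never packed together), the split size equals a 128 MB block, and each map task produces one hash bucket per reduce partition. The helper names `map_tasks` and `shuffle_buckets` are hypothetical, and the 200 reduce partitions simply mirror the default value of `spark.sql.shuffle.partitions`.

```python
import math


def map_tasks(file_sizes_mb, split_mb=128):
    """Estimate the number of map tasks.

    Assumption: each file is split on its own, so a small file
    still costs one full task.
    """
    return sum(max(1, math.ceil(size / split_mb)) for size in file_sizes_mb)


def shuffle_buckets(num_map_tasks, num_reduce_partitions=200):
    """Assumption: every map task writes one bucket per reduce partition."""
    return num_map_tasks * num_reduce_partitions


total_mb = 10 * 1024                    # the same 10 GB of data in both layouts
large_files = [128] * (total_mb // 128)  # 80 files of 128 MB
small_files = [1] * total_mb             # 10240 files of 1 MB

for name, files in [("large", large_files), ("small", small_files)]:
    tasks = map_tasks(files)
    print(name, "tasks:", tasks, "buckets:", shuffle_buckets(tasks))
# large tasks: 80 buckets: 16000
# small tasks: 10240 buckets: 2048000
```

Under these assumptions the same amount of data produces 128 times as many tasks and hash buckets when stored as 1 MB files, which is why the statement is true.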

Author

Pdfprep
