You plan to use a Fabric notebook and PySpark to read sales data and save the data as a Delta table named Sales. The table must be partitioned by Year and Quarter. You load the sales data to a DataFrame named df that contains a Year column and a Quarter column. Which option describes the correct approach to write this as a partitioned Delta table?

Prepare for the DP-600 Fabric Analytics Engineer Exam. Study with flashcards and multiple choice questions, each offering hints and detailed explanations. Enhance your chances of success on the exam!

Multiple Choice

You plan to use a Fabric notebook and PySpark to read sales data and save the data as a Delta table named Sales. The table must be partitioned by Year and Quarter. You load the sales data to a DataFrame named df that contains a Year column and a Quarter column. Which option describes the correct approach to write this as a partitioned Delta table?

Explanation:
When writing a Delta table, you can improve query performance by partitioning the data on the columns you commonly filter on. By partitioning on Year and Quarter, Spark will organize the data into subdirectories for each year-quarter combination, enabling partition pruning and faster scans. To achieve this with PySpark in Fabric, write the DataFrame using Delta format and specify the partitioning columns, then save as a table named Sales. For example: df.write.partitionBy("Year", "Quarter").format("delta").saveAsTable("Sales") This approach guarantees the data is stored as a Delta table and physically partitioned by Year and Quarter, meeting the requirement. Writing as Parquet would not create a Delta table, and partitioning without Delta formatting wouldn’t satisfy the need for a Delta table. Saving without partitioning would miss the required optimization.

When writing a Delta table, you can improve query performance by partitioning the data on the columns you commonly filter on. By partitioning on Year and Quarter, Spark will organize the data into subdirectories for each year-quarter combination, enabling partition pruning and faster scans.

To achieve this with PySpark in Fabric, write the DataFrame using Delta format and specify the partitioning columns, then save as a table named Sales. For example:

df.write.partitionBy("Year", "Quarter").format("delta").saveAsTable("Sales")

This approach guarantees the data is stored as a Delta table and physically partitioned by Year and Quarter, meeting the requirement. Writing as Parquet would not create a Delta table, and partitioning without Delta formatting wouldn’t satisfy the need for a Delta table. Saving without partitioning would miss the required optimization.

Subscribe

Get the latest from Examzify

You can unsubscribe at any time. Read our privacy policy