default partitioner

rachel-mack · rachel-mack · commit 70557c6d2d8f · 2025-05-08T09:24:07.000-04:00
diff --git a/snooty.toml b/snooty.toml
@@ -21,6 +21,7 @@ artifact-id-2-13 = "mongo-spark-connector_2.13"
 artifact-id-2-12 = "mongo-spark-connector_2.12"
 spark-core-version = "3.3.1"
 spark-sql-version = "3.3.1"
+mdb-server = "MongoDB Server"
 
 [substitutions]
 copy = "unicode:: U+000A9"
diff --git a/source/batch-mode/batch-read-config.txt b/source/batch-mode/batch-read-config.txt
@@ -7,7 +7,7 @@ Batch Read Configuration Options
 .. contents:: On this page
    :local:
    :backlinks: none
-   :depth: 1
+   :depth: 2
    :class: singlecol
 
 .. facet::
@@ -178,26 +178,81 @@ dividing the data into partitions, you can run transformations in parallel.
 This section contains configuration information for the following 
 partitioner:
 
+- :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
 - :ref:`SamplePartitioner <conf-samplepartitioner>`
 - :ref:`ShardedPartitioner <conf-shardedpartitioner>`
 - :ref:`PaginateBySizePartitioner <conf-paginatebysizepartitioner>`
 - :ref:`PaginateIntoPartitionsPartitioner <conf-paginateintopartitionspartitioner>`
 - :ref:`SinglePartitionPartitioner <conf-singlepartitionpartitioner>`
-- :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
 
 .. note:: Batch Reads Only
   
    Because the data-stream-processing engine produces a single data stream,
    partitioners do not affect streaming reads.
 
+.. _conf-autobucketpartitioner:
+
+``AutoBucketPartitioner`` Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The ``AutoBucketPartitioner`` is the default partitioner configuration and uses
+the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
+aggregation stage to paginate the data. By using this configuration, 
+you can partition the data across single or multiple fields, including nested
+fields.
+
+.. note:: Compound Keys
+
+  The ``AutoBucketPartitioner`` configuration requires {+mdb-server+} version
+  7.0 or higher to support compound keys.
+
+To use this configuration, set the ``partitioner`` configuration option to
+``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
+
+.. list-table::
+   :header-rows: 1
+   :widths: 35 65
+
+   * - Property name
+     - Description
+     
+   * - ``partitioner.options.partition.fieldList``
+     - The list of fields to use for partitioning. The value can be either a single field
+       name or a list of comma-separated fields.
+      
+       **Default:** ``_id``
+
+   * - ``partitioner.options.partition.chunkSize``
+     - The average size (MB) for each partition. Smaller partition sizes
+       create more partitions containing fewer documents.
+       Because this configuration uses the average document size to determine the number of
+       documents per partition, partitions might not be the same size.
+      
+       **Default:** ``64``
+    
+   * - ``partitioner.options.partition.samplesPerPartition``
+     - The number of samples to take per partition.
+
+       **Default:** ``100``
+    
+   * - ``partitioner.options.partition.partitionKeyProjectionField``
+     - The field name to use for a projected field that contains all the
+       fields used to partition the collection.
+       We recommend changing the value of this property only if each document already
+       contains the ``__idx`` field.
+      
+       **Default:** ``__idx``
+
 .. _conf-mongosamplepartitioner:
 .. _conf-samplepartitioner:
 
 ``SamplePartitioner`` Configuration
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-``SamplePartitioner`` is the default partitioner configuration. This configuration
-lets you specify a partition field, partition size, and number of samples per partition.
+The ``SamplePartitioner`` configuration is similar to the
+:ref:`AutoBucketPartitioner <conf-autobucketpartitioner>` configuration,
+but does not use the ``$bucketAuto`` aggregation stage. This configuration
+lets you specify a partition field, partition size, and number of samples per partition.
 
 To use this configuration, set the ``partitioner`` configuration option to
 ``com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner``.
@@ -328,54 +383,6 @@ The ``SinglePartitionPartitioner`` configuration creates a single partition.
 To use this configuration, set the ``partitioner`` configuration option to
 ``com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner``.
 
-.. _conf-autobucketpartitioner:
-
-``AutoBucketPartitioner`` Configuration
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-The ``AutoBucketPartitioner`` configuration is similar to the
-:ref:`SamplePartitioner <conf-samplepartitioner>`
-configuration, but uses the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
-aggregation stage to paginate the data. By using this configuration, 
-you can partition the data across single or multiple fields, including nested fields.
-
-To use this configuration, set the ``partitioner`` configuration option to
-``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
-
-.. list-table::
-   :header-rows: 1
-   :widths: 35 65
-
-   * - Property name
-     - Description
-     
-   * - ``partitioner.options.partition.fieldList``
-     - The list of fields to use for partitioning. The value can be either a single field
-       name or a list of comma-separated fields.
-      
-       **Default:** ``_id``
-
-   * - ``partitioner.options.partition.chunkSize``
-     - The average size (MB) for each partition. Smaller partition sizes
-       create more partitions containing fewer documents.
-       Because this configuration uses the average document size to determine the number of
-       documents per partition, partitions might not be the same size.
-      
-       **Default:** ``64``
-    
-   * - ``partitioner.options.partition.samplesPerPartition``
-     - The number of samples to take per partition.
-
-       **Default:** ``100``
-    
-   * - ``partitioner.options.partition.partitionKeyProjectionField``
-     - The field name to use for a projected field that contains all the
-       fields used to partition the collection.
-       We recommend changing the value of this property only if each document already
-       contains the ``__idx`` field.
-      
-       **Default:** ``__idx``
-
 Specifying Properties in ``connection.uri``
 -------------------------------------------