@@ -7,7 +7,7 @@ Batch Read Configuration Options
77.. contents:: On this page
88 :local:
99 :backlinks: none
10- :depth: 1
10+ :depth: 2
1111 :class: singlecol
1212
1313.. facet::
@@ -178,26 +178,81 @@ dividing the data into partitions, you can run transformations in parallel.
178178This section contains configuration information for the following
179179partitioners:
180180
181+ - :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
181182- :ref:`SamplePartitioner <conf-samplepartitioner>`
182183- :ref:`ShardedPartitioner <conf-shardedpartitioner>`
183184- :ref:`PaginateBySizePartitioner <conf-paginatebysizepartitioner>`
184185- :ref:`PaginateIntoPartitionsPartitioner <conf-paginateintopartitionspartitioner>`
185186- :ref:`SinglePartitionPartitioner <conf-singlepartitionpartitioner>`
186- - :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
187187
188188.. note:: Batch Reads Only
189189
190190 Because the data-stream-processing engine produces a single data stream,
191191 partitioners do not affect streaming reads.
192192
193+ .. _conf-autobucketpartitioner:
194+
195+ ``AutoBucketPartitioner`` Configuration
196+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
197+
198+ The ``AutoBucketPartitioner`` is the default partitioner configuration and uses
199+ the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
200+ aggregation stage to paginate the data. By using this configuration,
201+ you can partition the data across single or multiple fields, including nested
202+ fields.
203+
204+ .. note:: Compound Keys
205+
206+ The ``AutoBucketPartitioner`` configuration requires {+mdb-server+} version
207+ 7.0 or higher to support compound keys.
208+
209+ To use this configuration, set the ``partitioner`` configuration option to
210+ ``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
211+
212+ .. list-table::
213+ :header-rows: 1
214+ :widths: 35 65
215+
216+ * - Property name
217+ - Description
218+
219+ * - ``partitioner.options.partition.fieldList``
220+ - The list of fields to use for partitioning. The value can be either a single field
221+ name or a list of comma-separated fields.
222+
223+ **Default:** ``_id``
224+
225+ * - ``partitioner.options.partition.chunkSize``
226+ - The average size (MB) for each partition. Smaller partition sizes
227+ create more partitions containing fewer documents.
228+ Because this configuration uses the average document size to determine the number of
229+ documents per partition, partitions might not be the same size.
230+
231+ **Default:** ``64``
232+
233+ * - ``partitioner.options.partition.samplesPerPartition``
234+ - The number of samples to take per partition.
235+
236+ **Default:** ``100``
237+
238+ * - ``partitioner.options.partition.partitionKeyProjectionField``
239+ - The field name to use for a projected field that contains all the
240+ fields used to partition the collection.
241+ We recommend changing the value of this property only if each document already
242+ contains the ``__idx`` field.
243+
244+ **Default:** ``__idx``
245+
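As an illustration of how the properties in the table fit together, the following minimal Python sketch builds a read-options mapping for the ``AutoBucketPartitioner``. The helper name ``auto_bucket_options`` and the field names ``region`` and ``address.zip`` are hypothetical and are not part of the connector. The property keys, the partitioner class name, and the default values are taken from the table above.

```python
# Fully qualified partitioner class name, as given in this section.
AUTO_BUCKET = (
    "com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner"
)


def auto_bucket_options(
    fields=("_id",),
    chunk_size_mb=64,
    samples_per_partition=100,
    projection_field="__idx",
):
    """Hypothetical helper: build read options for AutoBucketPartitioner.

    The defaults mirror the defaults listed in the property table.
    """
    prefix = "partitioner.options.partition."
    return {
        "partitioner": AUTO_BUCKET,
        # One field name, or several joined with commas.
        prefix + "fieldList": ",".join(fields),
        # Average partition size in MB.
        prefix + "chunkSize": str(chunk_size_mb),
        prefix + "samplesPerPartition": str(samples_per_partition),
        prefix + "partitionKeyProjectionField": projection_field,
    }


# Partition on two fields (one nested), with smaller 32 MB partitions.
options = auto_bucket_options(fields=("region", "address.zip"), chunk_size_mb=32)
```

Assuming a PySpark session with the connector on the classpath, a mapping like this could be passed to a batch read with something like ``spark.read.format("mongodb").options(**options).load()``. Treat that call as a sketch and adapt it to your own connection settings.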
193246.. _conf-mongosamplepartitioner:
194247.. _conf-samplepartitioner:
195248
196249``SamplePartitioner`` Configuration
197250~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
198251
199- ``SamplePartitioner`` is the default partitioner configuration. This configuration
200- lets you specify a partition field, partition size, and number of samples per partition.
252+ The ``SamplePartitioner`` configuration is similar to the
253+ :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
254+ configuration, but does not use the ``$bucketAuto`` aggregation stage. This configuration lets you specify a partition field,
255+ partition size, and number of samples per partition.
201256
202257To use this configuration, set the ``partitioner`` configuration option to
203258``com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner``.
@@ -328,54 +383,6 @@ The ``SinglePartitionPartitioner`` configuration creates a single partition.
328383To use this configuration, set the ``partitioner`` configuration option to
329384``com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner``.
330385
331- .. _conf-autobucketpartitioner:
332-
333- ``AutoBucketPartitioner`` Configuration
334- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
335-
336- The ``AutoBucketPartitioner`` configuration is similar to the
337- :ref:`SamplePartitioner <conf-samplepartitioner>`
338- configuration, but uses the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
339- aggregation stage to paginate the data. By using this configuration,
340- you can partition the data across single or multiple fields, including nested fields.
341-
342- To use this configuration, set the ``partitioner`` configuration option to
343- ``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
344-
345- .. list-table::
346- :header-rows: 1
347- :widths: 35 65
348-
349- * - Property name
350- - Description
351-
352- * - ``partitioner.options.partition.fieldList``
353- - The list of fields to use for partitioning. The value can be either a single field
354- name or a list of comma-separated fields.
355-
356- **Default:** ``_id``
357-
358- * - ``partitioner.options.partition.chunkSize``
359- - The average size (MB) for each partition. Smaller partition sizes
360- create more partitions containing fewer documents.
361- Because this configuration uses the average document size to determine the number of
362- documents per partition, partitions might not be the same size.
363-
364- **Default:** ``64``
365-
366- * - ``partitioner.options.partition.samplesPerPartition``
367- - The number of samples to take per partition.
368-
369- **Default:** ``100``
370-
371- * - ``partitioner.options.partition.partitionKeyProjectionField``
372- - The field name to use for a projected field that contains all the
373- fields used to partition the collection.
374- We recommend changing the value of this property only if each document already
375- contains the ``__idx`` field.
376-
377- **Default:** ``__idx``
378-
379386Specifying Properties in ``connection.uri``
380387-------------------------------------------
381388