@@ -7,7 +7,7 @@ Batch Read Configuration Options
77.. contents:: On this page
88 :local:
99 :backlinks: none
10- :depth: 1
10+ :depth: 2
1111 :class: singlecol
1212
1313.. facet::
@@ -178,26 +178,82 @@ dividing the data into partitions, you can run transformations in parallel.
178178This section contains configuration information for the following
179179partitioner:
180180
181+ - :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
181182- :ref:`SamplePartitioner <conf-samplepartitioner>`
182183- :ref:`ShardedPartitioner <conf-shardedpartitioner>`
183184- :ref:`PaginateBySizePartitioner <conf-paginatebysizepartitioner>`
184185- :ref:`PaginateIntoPartitionsPartitioner <conf-paginateintopartitionspartitioner>`
185186- :ref:`SinglePartitionPartitioner <conf-singlepartitionpartitioner>`
186- - :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>`
187187
188188.. note:: Batch Reads Only
189189
190190 Because the data-stream-processing engine produces a single data stream,
191191 partitioners do not affect streaming reads.
192192
193+ .. _conf-autobucketpartitioner:
194+
195+ AutoBucketPartitioner Configuration (default)
196+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
197+
198+ The ``AutoBucketPartitioner`` is the default partitioner configuration. It
199+ samples the data to generate partitions and uses
200+ the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
201+ aggregation stage to paginate. By using this configuration, you can partition
202+ the data across single or multiple fields, including nested fields.
203+
204+ .. note:: Compound Keys
205+
206+ The ``AutoBucketPartitioner`` configuration requires {+mdb-server+} version
207+ 7.0 or higher to support compound keys.
208+
209+ To use this configuration, set the ``partitioner`` configuration option to
210+ ``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
211+
212+ .. list-table::
213+ :header-rows: 1
214+ :widths: 35 65
215+
216+ * - Property name
217+ - Description
218+
219+ * - ``partitioner.options.partition.fieldList``
220+ - The list of fields to use for partitioning. The value can be either a single field
221+ name or a list of comma-separated fields.
222+
223+ **Default:** ``_id``
224+
225+ * - ``partitioner.options.partition.chunkSize``
226+ - The average size (MB) for each partition. Smaller partition sizes
227+ create more partitions containing fewer documents.
228+ Because this configuration uses the average document size to determine the number of
229+ documents per partition, partitions might not be the same size.
230+
231+ **Default:** ``64``
232+
233+ * - ``partitioner.options.partition.samplesPerPartition``
234+ - The number of samples to take per partition.
235+
236+ **Default:** ``100``
237+
238+ * - ``partitioner.options.partition.partitionKeyProjectionField``
239+ - The field name to use for a projected field that contains all the
240+ fields used to partition the collection.
241+ We recommend changing the value of this property only if each document already
242+ contains the ``__idx`` field.
243+
244+ **Default:** ``__idx``
245+
193246.. _conf-mongosamplepartitioner:
194247.. _conf-samplepartitioner:
195248
196- `` SamplePartitioner`` Configuration
197- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
249+ SamplePartitioner Configuration
250+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
198251
199- ``SamplePartitioner`` is the default partitioner configuration. This configuration
200- lets you specify a partition field, partition size, and number of samples per partition.
252+ The ``SamplePartitioner`` configuration is similar to the
253+ :ref:`AutoBucketPartitioner <conf-autobucketpartitioner>` configuration, but
254+ does not use the ``$bucketAuto`` aggregation stage. This
255+ configuration lets you specify a partition field, partition size, and number of
256+ samples per partition.
201257
202258To use this configuration, set the ``partitioner`` configuration option to
203259``com.mongodb.spark.sql.connector.read.partitioner.SamplePartitioner``.
@@ -243,8 +299,8 @@ To use this configuration, set the ``partitioner`` configuration option to
243299.. _conf-mongoshardedpartitioner:
244300.. _conf-shardedpartitioner:
245301
246- `` ShardedPartitioner`` Configuration
247- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
302+ ShardedPartitioner Configuration
303+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
248304
249305The ``ShardedPartitioner`` configuration automatically partitions the data
250306based on your shard configuration.
@@ -262,8 +318,8 @@ To use this configuration, set the ``partitioner`` configuration option to
262318.. _conf-mongopaginatebysizepartitioner:
263319.. _conf-paginatebysizepartitioner:
264320
265- `` PaginateBySizePartitioner`` Configuration
266- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
321+ PaginateBySizePartitioner Configuration
322+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
267323
268324The ``PaginateBySizePartitioner`` configuration paginates the data by using the
269325average document size to split the collection into average-sized chunks.
@@ -292,8 +348,8 @@ To use this configuration, set the ``partitioner`` configuration option to
292348
293349.. _conf-paginateintopartitionspartitioner:
294350
295- `` PaginateIntoPartitionsPartitioner`` Configuration
296- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
351+ PaginateIntoPartitionsPartitioner Configuration
352+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
297353
298354The ``PaginateIntoPartitionsPartitioner`` configuration paginates the data by dividing
299355the count of documents in the collection by the maximum number of allowable partitions.
@@ -320,63 +376,15 @@ To use this configuration, set the ``partitioner`` configuration option to
320376
321377.. _conf-singlepartitionpartitioner:
322378
323- `` SinglePartitionPartitioner`` Configuration
324- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
379+ SinglePartitionPartitioner Configuration
380+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
325381
326382The ``SinglePartitionPartitioner`` configuration creates a single partition.
327383
328384To use this configuration, set the ``partitioner`` configuration option to
329385``com.mongodb.spark.sql.connector.read.partitioner.SinglePartitionPartitioner``.
330386
331- .. _conf-autobucketpartitioner:
332-
333- ``AutoBucketPartitioner`` Configuration
334- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
335-
336- The ``AutoBucketPartitioner`` configuration is similar to the
337- :ref:`SamplePartitioner <conf-samplepartitioner>`
338- configuration, but uses the :manual:`$bucketAuto </reference/operator/aggregation/bucketAuto/>`
339- aggregation stage to paginate the data. By using this configuration,
340- you can partition the data across single or multiple fields, including nested fields.
341-
342- To use this configuration, set the ``partitioner`` configuration option to
343- ``com.mongodb.spark.sql.connector.read.partitioner.AutoBucketPartitioner``.
344-
345- .. list-table::
346- :header-rows: 1
347- :widths: 35 65
348-
349- * - Property name
350- - Description
351-
352- * - ``partitioner.options.partition.fieldList``
353- - The list of fields to use for partitioning. The value can be either a single field
354- name or a list of comma-separated fields.
355-
356- **Default:** ``_id``
357-
358- * - ``partitioner.options.partition.chunkSize``
359- - The average size (MB) for each partition. Smaller partition sizes
360- create more partitions containing fewer documents.
361- Because this configuration uses the average document size to determine the number of
362- documents per partition, partitions might not be the same size.
363-
364- **Default:** ``64``
365-
366- * - ``partitioner.options.partition.samplesPerPartition``
367- - The number of samples to take per partition.
368-
369- **Default:** ``100``
370-
371- * - ``partitioner.options.partition.partitionKeyProjectionField``
372- - The field name to use for a projected field that contains all the
373- fields used to partition the collection.
374- We recommend changing the value of this property only if each document already
375- contains the ``__idx`` field.
376-
377- **Default:** ``__idx``
378-
379- Specifying Properties in ``connection.uri``
380- -------------------------------------------
387+ Specifying Properties in connection.uri
388+ ---------------------------------------
381389
382390.. include:: /includes/connection-read-config.rst
0 commit comments