tree/dataframe/src/RDataFrame.cxx
Lines changed: 22 additions & 4 deletions
@@ -1265,10 +1265,25 @@ In that case, RDataFrame will snapshot the filtered columns in a memory-efficien
1265
1265
default-constructed object in case of classes. If none of the filters pass like in row 6, the entire event is omitted from the snapshot.
1266
1266
1267
1267
To tell apart a genuine `0` (like `x` in row 0) from a variation that didn't pass the selection, RDataFrame writes a bitmask for each event, indicating which variations
1268
-
are valid (see last column). A mapping of column names to this bitmask is placed in the same file as the output dataset, and automatically loaded when
1269
-
RDataFrame opens a file that was snapshot with variations.
1270
-
Attempting to read such missing values with RDataFrame will produce an error, but RDataFrame can either skip these values or fill in defaults as
1271
-
described in the \ref missing-values "section on dealing with missing values".
1268
+
are valid (see last column). The bitmask is implemented as a 64-bit `std::bitset` in memory, written to the output dataset as a `std::uint64_t`.
1269
+
Each bitmask therefore covers at most 64 columns; once those bits are used up, an additional bitmask is written for the next 64 columns.
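
The 64-columns-per-bitmask arithmetic can be sketched as follows. This is a minimal illustration, not RDataFrame's actual implementation: it assumes that columns are assigned to bits sequentially in order of a column index, and the helper names (`MaskAndBit`, `NumMasksNeeded`, `MarkValid`, `ToStoredMask`) are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Number of validity bits that fit into one bitmask.
constexpr std::size_t kBitsPerMask = 64;

// Hypothetical helper. ASSUMPTION: column i is assigned bit (i % 64) of bitmask (i / 64).
// Returns {index of the bitmask, index of the bit inside that bitmask}.
std::pair<std::size_t, std::size_t> MaskAndBit(std::size_t columnIndex)
{
   return {columnIndex / kBitsPerMask, columnIndex % kBitsPerMask};
}

// Number of bitmasks needed to cover nColumns columns (rounds up).
std::size_t NumMasksNeeded(std::size_t nColumns)
{
   return (nColumns + kBitsPerMask - 1) / kBitsPerMask;
}

// Mark one column as valid in the in-memory bitmask.
void MarkValid(std::bitset<64> &mask, std::size_t bit)
{
   assert(bit < kBitsPerMask);
   mask.set(bit);
}

// Convert the in-memory std::bitset into the 64-bit integer that would be written out.
std::uint64_t ToStoredMask(const std::bitset<64> &mask)
{
   return mask.to_ullong();
}
```

Under this assumed layout, the 65th column (index 64) would be the first one to use a second bitmask.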
1270
+
1271
+
Each column stored in the output is connected to exactly one bit in one bitmask. A mapping of column names to the corresponding bitmask is placed in the
1272
+
same file as the output dataset, with a name that follows the pattern `"R_rdf_branchToBitmaskMapping_" + NAME_OF_THE_DATASET`. In this mapping, each
1273
+
column name is connected to one bitmask, and one particular bit in that bitmask. For example, in the same file as the dataset "Events" there would be
1274
+
an object named `R_rdf_branchToBitmaskMapping_Events`. This object would describe, for example, a connection such as:
1275
+
1276
+
~~~
1277
+
muon_pt --> (R_rdf_mask_Events_0, 42)
1278
+
~~~
1279
+
1280
+
which means that the validity of the column `muon_pt` is established by the bit `42` in the bitmask found in the column `R_rdf_mask_Events_0`.
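
A validity check of this kind can be sketched with plain standard-library containers. The types `BitmaskMapping` and `MaskValues` and the function `IsColumnValid` are hypothetical stand-ins for illustration only; the actual type of the stored mapping object is not described here, and treating unmapped columns as always valid is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical in-memory stand-in for the mapping: column name -> (bitmask column name, bit index).
using BitmaskMapping = std::unordered_map<std::string, std::pair<std::string, unsigned>>;
// Hypothetical per-event values of the bitmask columns: bitmask column name -> 64-bit mask.
using MaskValues = std::unordered_map<std::string, std::uint64_t>;

// Returns true if `column` holds a valid value for the current event.
bool IsColumnValid(const BitmaskMapping &mapping, const MaskValues &masks, const std::string &column)
{
   const auto it = mapping.find(column);
   if (it == mapping.end())
      return true; // ASSUMPTION: a column without a mapping entry is always valid

   const std::string &maskName = it->second.first;
   const unsigned bit = it->second.second;
   assert(bit < 64);

   const auto maskIt = masks.find(maskName);
   if (maskIt == masks.end())
      return false; // ASSUMPTION: a missing bitmask column means the value cannot be trusted

   // Test the single bit that belongs to this column.
   return ((maskIt->second >> bit) & std::uint64_t{1}) != 0;
}
```

With the mapping `muon_pt --> (R_rdf_mask_Events_0, 42)`, the function reports `muon_pt` as valid exactly when bit 42 of the current `R_rdf_mask_Events_0` value is set.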
1281
+
1282
+
When RDataFrame opens a file, it checks for the existence of this mapping between columns and bitmasks, and loads it automatically if found. As such,
1283
+
RDataFrame makes the handling of the bitmasks completely transparent to the user.
1284
+
1285
+
If the corresponding bit labels a value as invalid, reading that value yields a missing value. The semantics of such a scenario follow the
1286
+
rules described in the \ref missing-values "section on dealing with missing values" and can be dealt with accordingly.
1272
1287
1273
1288
\note Snapshot with variations is currently restricted to single-threaded TTree snapshots.
1274
1289
@@ -1780,6 +1795,9 @@ more of its entries. For example:
1780
1795
- When joining different datasets horizontally according to some index value
1781
1796
(e.g. the event number), if the index does not find a match in one or more
1782
1797
other datasets for a certain entry.
1798
+
- If, for a certain event, the value of a column is invalid because the dataset
1799
+
was produced by an earlier processing step that involved systematic variations,
1800
+
and a selection removed that value. For more details, see \ref snapshot-with-variations.
1783
1801
1784
1802
For example, suppose that column "y" does not have a value for entry 42: