@@ -17,47 +17,47 @@ Project: Efficient Python routines for analysis on massively multi-threaded plat
17
17
Submitted by- Deepanshu Thakur
18
18
******************************
19
19
20
- I spend my last 3 months working on `GSoC project `_. My GSoC project was
21
- related with writing the bindings of the Hydra C++ library. Hydra is a header
22
- only C++ library designed and used to run on Linux platforms. Hydra is a
20
+ I spent the last three months working on a `GSoC project`_. My GSoC project
21
+ involved writing Python bindings for the Hydra C++ library. Hydra is a header-only
22
+ C++ library designed and used to run on Linux platforms. Hydra is a
23
23
templated C++11 library designed to perform common High Energy Physics data
24
- analyses on massively parallel platforms. The idea of this GSoC project is to
25
- provide the bindings of the Hydra library, so that the python support for
26
- Hydra library can be added and python can be used for the prototyping or
24
+ analysis on massively parallel platforms. The idea of this GSoC project was to
25
+ provide Python bindings for the Hydra library, so that Python support
26
+ can be added to the overall Hydra project and Python can be used for prototyping or
27
27
development.
28
28
29
29
30
30
.. _GSoC project: https://summerofcode.withgoogle.com/projects/#6669304945704960
31
31
32
- My original proposal deliverables and my final output looks a little bit
33
- different and there are some very good reasons for it. The change of
32
+ The deliverables in my original proposal and the final output ended up looking a little
33
+ different, and there are some very good reasons for it. The change of
34
34
deliverables will become evident in the discussion of the design challenges
35
35
and choices later in the report. In the beginning the goal was to write the
36
36
bindings for the ``Data Fitting``, ``Random Number Generation``,
37
37
``Phase-Space Monte Carlo Simulation``, ``Functor Arithmetic`` and
38
38
``Numerical integration``, but we ended up having the bindings for
39
39
``Random Number Generation`` and ``Phase-Space Monte Carlo Simulation`` only.
40
- (Though remaining classes can be binded with some extra efforts but we do
40
+ (The remaining classes can be bound with some extra effort, but we do
41
41
not have time left under the current scope of GSoC, so I have decided to
42
- continue with the project outside the scope of GSoC.)
42
+ continue with the project outside the scope of GSoC, given my interest in it.)
43
43
44
44
45
- Choosing proper tools
46
- *********************
45
+ Choosing the proper tools
46
+ *************************
47
47
48
- Let me take you to my 3 months journey. First step was to find a tool or
49
- package to write the bindings. Several options were in principle available to
50
- write the bindings for example in the beginning we tried to evaluate the
51
- `SWIG `_.
48
+ Let me take you through my three-month journey. The first step was to find a tool or
49
+ package to write the bindings with. Several options were in principle available to
50
+ write the bindings. For example, at the beginning we evaluated the
51
+ `SWIG`_ project.
52
52
But SWIG has two problems: it is very complicated to use, and it
53
53
does not support ``variadic templates``, while Hydra's underlying
54
54
`Thrust library`_ depends heavily on variadic templates. After trying our hand
55
55
at SWIG and realizing it could not fulfill our requirements, we turned our
56
- attention to `Boost.Python `_ which looks quite promising and a very large
57
- project but this large and complex suite project have so many tweaks and
58
- hacks so that it can work on almost any compiler but with added so many
59
- complexities and cost. Finally we turned our attention to use `pybind11 `_.
60
- A quote taken from pybind11 documentation,
56
+ attention to `Boost.Python`_, which looked quite promising. It is a very large
57
+ project, but this large and complex suite relies on many tweaks and
58
+ hacks so that it can work on almost any compiler, which adds considerable
59
+ complexity and cost. Finally, we turned our attention to the newer `pybind11`_ project.
60
+ A quote taken from the pybind11 documentation:
61
61
62
62
Boost is an enormously large and complex suite of utility libraries
63
63
that works with almost every C++ compiler in existence. This compatibility
@@ -80,31 +80,30 @@ to go ahead with pybind11. Next step was to `familiarize myself`_ with pybind11.
80
80
The Basic design problem
81
81
************************
82
82
83
- Now we needed to solve the basic design problem which is the `CRTP idiom `_.
84
- Hydra library relies on the CRTP idiom to avoid runtime overhead. I
83
+ The basic design problem is the `CRTP idiom`_.
84
+ The Hydra library relies on the CRTP idiom to avoid runtime overhead. I
85
85
investigated a lot about CRTP and it took a little while to finally come up
86
- with a solution that can work with any number N. It means our class can accept
87
- any number of particles at final states. (denoted by N) If you know about
88
- CRTP, it is a type of static polymorphism or compile time polymorphism. The
89
- idea that I implemented was to take a parameter from python and based on that
86
+ with a solution that can work with any number of final-state particles (denoted N), as often needed in Hydra applications.
87
+ CRTP is a form of static, or compile-time, polymorphism. The
88
+ idea that I implemented was to take a parameter from Python and, based on that
90
89
parameter, write the bindings to a new file, then compile and generate
91
- them on runtime with system calls. Unfortunately generating bindings at
90
+ them at runtime with system calls. Unfortunately, generating bindings at
92
91
runtime and compiling them takes a lot of time, so it is not
93
- feasible for user to each time wait for few minutes before actually be
94
- able to use the generated package. We decided to go ahead with fixed number
95
- of values. Means we generate bindings for a limited number of particles.
96
- Currently python bindings for classes supports up to 10 (N = 10) number of
97
- particles at final state. We can make that to work with any number we want,
92
+ feasible for a user to wait a few minutes each time before actually being
93
+ able to use the generated package from Python. We decided to go ahead with a fixed set
94
+ of values of N, meaning that we generate the bindings for a limited number of particles.
95
+ Currently the Python bindings for the Hydra classes support up to 10 (N = 10)
96
+ particles in the final state. Note that we can make this work with any number we want,
98
97
as our binding code is written within a macro, so it is just a matter of
99
- writing additional 1 extra call to make it use with extra value of N.
98
+ adding a few trivial extra macro calls to make the bindings work for further values of N.
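
The fixed-N approach described above can be sketched in plain C++ as follows. This is only an illustrative sketch under stated assumptions, not the project's actual binding code: ``FunctorBase``, ``ToyPhaseSpace``, ``TOY_DISPATCH_CASE`` and ``evaluate_for`` are hypothetical names, and pybind11 is left out so that the example stays self-contained. It shows a CRTP base resolving a call at compile time, and a macro that adds one line per supported value of N.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// CRTP base: the derived type is passed as a template parameter, so the
// call to evaluate() is resolved at compile time, without virtual functions.
template <typename Derived>
struct FunctorBase {
    double operator()(double x) const {
        return static_cast<const Derived&>(*this).evaluate(x);
    }
};

// Hypothetical stand-in for a class templated on the number of
// final-state particles N (not a real Hydra class).
template <std::size_t N>
struct ToyPhaseSpace : FunctorBase<ToyPhaseSpace<N>> {
    double evaluate(double x) const { return x * static_cast<double>(N); }
    static constexpr std::size_t particles() { return N; }
};

// One macro call per supported N: each call instantiates the template for
// that N and makes it reachable from a runtime value.
#define TOY_DISPATCH_CASE(N) \
    case N: return ToyPhaseSpace<N>{}(x);

// Runtime entry point: picks one of the precompiled instantiations.
double evaluate_for(std::size_t n, double x) {
    switch (n) {
        TOY_DISPATCH_CASE(1)
        TOY_DISPATCH_CASE(2)
        TOY_DISPATCH_CASE(3)
        TOY_DISPATCH_CASE(4)
        TOY_DISPATCH_CASE(5)
        TOY_DISPATCH_CASE(6)
        TOY_DISPATCH_CASE(7)
        TOY_DISPATCH_CASE(8)
        TOY_DISPATCH_CASE(9)
        TOY_DISPATCH_CASE(10)
        default:
            throw std::out_of_range("unsupported number of particles");
    }
}
```

Calling ``evaluate_for(3, 2.0)`` selects the ``ToyPhaseSpace<3>`` instantiation at runtime, and supporting N = 11 would only need one more ``TOY_DISPATCH_CASE(11)`` line. In the real bindings the macro body would presumably register a class with pybind11 rather than return a value.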
100
99
101
100
.. _CRTP idiom: https://en.wikipedia.org/wiki/Curiously_recurring_template_pattern
102
101
103
102
104
- The Hydra Binding
105
- *****************
103
+ The Hydra bindings
104
+ ******************
106
105
107
- Now that the approach was decided, we jump into the bindings of Hydra.
106
+ Now that the approach was decided, we jumped into the bindings of Hydra.
108
107
(Finally, after so many complications, though unfortunately this was not the
109
108
end of them.) We decided to bind the most important classes first,
110
109
``Random Number Generation`` and ``Phase-Space Monte Carlo Simulation``.
@@ -121,20 +120,20 @@ to generate the phase space monte carlo simulation.
121
120
[F. James, Monte Carlo Phase Space, CERN 68-15 (1968)]
122
121
(https://cds.cern.ch/record/275743).
123
122
124
- The Momentum and Energy units are GeV/C, GeV/C^2 . The PhaseSpace monte
125
- carlo class depends on the ``Vector3R ``, ``Vector4R `` and ``Events `` classes.
123
+ The momentum and energy units are GeV/c and GeV/c^2, respectively. The PhaseSpace Monte
124
+ Carlo class depends on the ``Vector3R``, ``Vector4R`` and ``Events`` classes.
126
125
Thus the PhaseSpace class cannot be bound without first binding the above classes.
127
126
128
127
The ``Vector3R`` and ``Vector4R`` classes were bound first. There were some problems
129
- like generating ``__eq__ `` and ``__nq__ `` methods for python side but I solved
130
- them by creating ``lambda function `` and iterating over values and checking
128
+ like generating the ``__eq__`` and ``__ne__`` methods for the Python side, but I solved
129
+ them by creating ``lambda functions`` that iterate over the values and check
131
130
whether they satisfy the conditions. The ``Vector4R`` or four-vector class
132
- represents a particle. The idea is I first bind the particles class
131
+ represents a particle. The plan was to first bind the particle class
133
132
(the four-vector class), then bind the ``Events`` class that will
134
- hold the Phase Space generated by the ``PhaseSpace `` class, and then bind the
133
+ hold the phase-space events generated by the ``PhaseSpace`` class, and then bind the
135
134
actual ``PhaseSpace`` class. The ``Events`` class was not so easy to bind
136
135
because it depends on ``hydra::multiarray``, and without the latter's
137
- bindings, the ``Events `` class was impossible to bind. Thanks to my mentor
136
+ bindings, the ``Events`` class was impossible to bind. Thankfully, my mentors
138
137
had already written these bindings for the ``Random`` class, with some tweaks to
139
138
pybind11’s bind_container itself. We even faced some design issues with the
140
139
``Events`` class in Hydra itself. But eventually after solving these problems,
@@ -165,7 +164,7 @@ After completing the PhaseSpace code, I quickly converted the code into macro
165
164
to support up to 10 particles.
166
165
167
166
Now the PhaseSpace class was working perfectly! The next step was to create a
168
- series of test cases and documentation and of-course the example of
167
+ series of test cases, documentation, and of course an example of the
169
168
PhaseSpace class in action. The remaining algorithms that I named at the
170
169
start of the article remain to be implemented.
171
170
@@ -178,17 +177,17 @@ things not only related with programming but related with high energy physics.
178
177
I learned about *Monte Carlo simulations*, and how they can be used to solve
179
178
challenging real life problems. I read and studied a research paper
180
179
( https://cds.cern.ch/record/275743/files/CERN-68-15.pdf ), learned about
181
- particle decays, learned the insights of C++ varidiac templates,
180
+ particle decays, gained insight into C++ variadic templates,
182
181
wrote a blog post about `CRTP`_, learned how to compile a
183
- python function and why simple python functions cannot be used in
182
+ Python function and why simple Python functions cannot be used in
184
183
multithreaded environments. Most importantly, I learned how to structure
185
184
a project from scratch, and how important documentation and test cases are.
186
185
187
186
188
187
.. _CRTP: https://medium.com/@deepanshu2017/a-curiously-recurring-python-d3a441a58174
189
188
190
189
191
- Special Thanks
190
+ Special thanks
192
191
**************
193
192
194
193
Shoutout to my amazing mentors. I would like to thank