Merge branch 'gh-pages' of github.com:carpentries-incubator/managing-computational-projects into JC-reorganisation
README.md: 7 additions & 5 deletions
@@ -12,13 +12,15 @@ Materials developed through this project will enable (1) a foundational understa
12
12
13
13
For details about the project and track management related information, please see the [Project Management Repository](https://github.com/alan-turing-institute/data-training-for-bioscience/).
14
14
15
-
## Maintainer(s)
15
+
## Developers and Maintainers
16
16
17
-
Current developers and maintainers of this lesson are
18
-
19
-
* Lydia France
20
17
* Malvika Sharan
21
-
* Federico Nanni
18
+
* Julien Colomb
19
+
20
+
### Previous developers
21
+
22
+
* Lydia France was allocated as a developer on this project for six months in 0.5 FTE capacity.
23
+
* Federico Nanni provided supervision for Lydia and contributed to the project planning.
episodes/02-motivation.md: 49 additions & 27 deletions
@@ -21,21 +21,28 @@ You are also likely to work on your project with other members of the lab, and t
21
21
22
22
<img src="../fig/skill-spectrum.jpg" alt="Researchers represented in a map indicating their journey to understand and apply computational approaches. Some may have just started their journey, some may have come far in the learning and some may have gained proficiency based on their research requirements." width="500"/>
23
23
24
-
_We want to acknowledge the data science knowledge will vary. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807
24
+
*We may all have different research and data science expertise. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807*
25
25
26
-
Contents of training introduces methods and concepts to manage individuals and teams working on any computational project, which in the current era is literally all research projects.
26
+
> ## Why are you here?
27
+
> Discuss why you/learners are taking this course and what the expectations are.
28
+
> Do the expectations align with the relevance of data science and the content of this course?
29
+
{: .discussion}
30
+
31
+
This training material introduces methods and concepts for managing individuals and teams working on any computational project, which in the current era is literally all research projects.
27
32
It is *not* about learning how to write code, but about building a foundational understanding of computational methods that could be applied to your research.
28
33
Furthermore, this training will provide guidance for facilitating collaboration and data analysis using tools like research data management, version control or code review.
29
34
30
-
We believe that the data science skills you will learn in this training will make your research process better. In the following sections, we will detail what we mean by "better".
35
+
We acknowledge that learners' data science knowledge will vary.
36
+
Nonetheless, we believe that the data science skills you will learn in this training will make your research process better. In the following sections, we will detail what we mean by "better".
31
37
32
38
## How will data science improve your research?
33
39
34
40
<img src="../fig/healthy-research-tree.jpg" alt="Researchers pour water on a tree, the water represents data science, the tree is the research." width="500"/>
35
41
36
-
_Data science makes research flourish. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807
37
-
42
+
*Data science makes research flourish. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807*
38
43
44
+
> ## It is mostly about being efficient
45
+
>
39
46
> Data science brings some structure to how data is collected, processed and analysed, making it easier to collaborate on a project, to publish extra research outputs and to leverage the extra potential your data may have.
40
47
In the past, it helped me generate new hypotheses, detect problems with the research design early, and reduce the sample size needed to draw a solid conclusion.
41
48
Eventually, it made my research more robust and trustworthy.
@@ -45,7 +52,7 @@ But in the end, my real motivation is efficiency: very soon, the time I invested
45
52
46
53
There are different ways to organise the foreseen improvements; here we start with improvements in the final result, then improvements in the research process, and finally aspects of community building.
47
54
48
-
### Nicer paper
55
+
### Using code for a nicer paper
49
56
50
57
#### Powerful statistics
51
58
@@ -72,7 +79,11 @@ One can also automate the figure design choice, so that all figures look similar
72
79
Similarly, producing several versions of the same figure is very easy.
73
80
For example, one can use different color palettes: one using the palette commonly used in the field (the one your supervisor wants to see), and one for color-blind readers.
74
81
75
-
**Example of single flights from different bees shown in supplemnentary data:** Menzel, R., Greggers, U., Smith, A., Berger, S., Brandt, R., Brunke, S., ...Watzl, S. (2005). Honey bees navigate according to a map-like spatial memory. Proceedings of the National Academy of Sciences of the United States of America, 102(8), 3040. doi: [10.1073/pnas.0408550102](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC549458/)
82
+
> ## Single flights from different bees.
83
+
>
84
+
> For a good example of data represented in different formats, see the single flights from different bees shown in the supplementary data of: *Menzel, R., Greggers, U., Smith, A., Berger, S., Brandt, R., Brunke, S., ...Watzl, S. (2005). Honey bees navigate according to a map-like spatial memory. Proceedings of the National Academy of Sciences of the United States of America, 102(8), 3040. doi: [10.1073/pnas.0408550102](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC549458/)*
85
+
>
86
+
{: .callout}
76
87
77
88
#### Reproducible analysis
78
89
@@ -81,7 +92,7 @@ As a researcher, assuring computational reproducibility of your results is a rel
81
92
82
93
<img src="../fig/ReproducibleJourney.jpg" alt="Shows a landscape with different checkpoints for data, code, tools and results, each of which requires reproducible practices. There is a woman explaining her reproducibility journey to help new people start their journey" width="500"/>
83
94
84
-
_What to expect in your reproducibility journey. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807
95
+
*What to expect in your reproducibility journey. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807*
85
96
86
97
The reproducibility of an experiment not only requires a detailed description of the methods and reagents used, but also a detailed description of the analysis performed.
87
98
The ultimate description of the analysis is to provide all elements necessary for reproducing the analysis (computational reproducibility).
@@ -103,7 +114,6 @@ While the main recognition currency in academia is still (first) authorship in p
103
114
In particular, dataset and software publications are officially considered in the evaluation of certain grants, for example in the European Marie Curie programme.
104
115
Data science principles will make it easier to publish datasets, software, reagents or hardware you are anyway producing during the research process.
105
116
106
-
107
117
> By publishing datasets and code, you will not only help other researchers, but gain extra recognition for your work.
108
118
However, open data and open code require specific documentation, which we will touch upon in this training.
109
119
>
@@ -134,9 +144,6 @@ It makes also certain that difference in the figures are due to difference in th
134
144
135
145
#### Collaborative working
136
146
137
-
> Facilitating communication and sharing, will make it easier for your colleagues to help you.
138
-
> {: .callout}
139
-
140
147
Within science teams, group work is critical for experimental design and implementation.
141
148
In addition, there are rapid developments in how scientific results and methods are shared, and collaborations have never been more global or rapid.
142
149
This means that several people will likely be working with the same data files.
@@ -145,12 +152,26 @@ Data science allows for the management of
145
152
how one or multiple people work on the same project (as well as the same code).
146
153
It requires different skillsets than those taught in traditional science courses *or* a typical coding class.
147
154
155
+
> ## Who can add to your research?
156
+
>
157
+
> Facilitating communication and sharing will make it easier for your colleagues to help you.
158
+
> Can you think of people who can help you in your research, directly in your lab or at your institution?
159
+
> Would it help for them to have access to your data? How could they participate,
160
+
> and how can you give them credit?
161
+
>
162
+
>> ## Needs from the future you
163
+
>> It is very useful to consider your future self as a collaborator on your project.
164
+
>> Anything you may forget in the next three to five years should be documented,
165
+
>> if you want your future self to be able to (re-)analyse the data you are collecting.
166
+
>> Indeed, the advantages of working collaboratively on a project translate directly to a project you drive mostly alone.
167
+
>{: .solution}
168
+
{: .discussion}
169
+
148
170
149
171
#### Efficiency
150
172
151
173
> The time invested in your data and code will be repaid many times over by the efficiency gains in your workflow, provided that investment is made early in the project.
152
-
Because one can consider your past self as one of your collaborator,
153
-
the advantage of working collaboratively in a project can indeed be translated directly in a project you drive mostly alone.
174
+
Because one can consider your past self as one of your collaborators, the advantages of working collaboratively on a project translate directly to a project you drive mostly alone.
154
175
>
155
176
{: .callout}
156
177
@@ -162,36 +183,39 @@ This applies directly to the example of working on article revisions - will you
162
183
For instance, if a colleague cannot find what data goes with which figures, there are high chances that you will also be unable to find it three years from now.
163
184
In addition, it is not uncommon to modify the design of the figures multiple times (sometimes back and forth), often modifying all figures at once.
164
185
186
+
> ## Redoing all figures in minutes
165
187
> A reviewer once asked me to overlay individual data points onto all 5 of our boxplot figures.
166
188
The project was an old one, and I had not touched the data for years.
167
-
Finding the right data and redo the all 5 figures would usually take ages using SPSS or excel.
168
-
>But since I used code, I had all figures 15 minutes later.
169
-
(Note, after seeing the new figures, the reviewer agreed that the original version was better).
170
-
> {: .testimonial}
189
+
>Finding the right data and redoing all 5 figures would usually take ages using SPSS or Excel.
190
+
>But since I used code, I had all figures 15 minutes later.
191
+
>(Note, after seeing the new figures, the reviewer agreed that the original version was better).
192
+
{: .testimonial}
171
193
172
194
Later on in the project, community advantages come into play.
173
195
Data and code reusability is not only a mark of research transparency and robustness, it also means you can reuse your own code and data.
174
196
It also means you can reuse code and data produced by other researchers.
175
-
The snow ball effect may be huge, and the objective of this lesson is to allow you to do **better science in less time** ( https://www.nature.com/articles/s41559-017-0160:)
176
197
177
-
> As an example it was estimated that research data management takes about 5% of your time, on the other hand, time lost due to poor data management is estimated to be 15%.
198
+
The snowball effect may be huge, and the objective of this course is to allow you to do **better science in less time**.
178
199
200
+
> ## Invest in data science
201
+
>As an example, research data management is estimated to take about 5% of your time, whereas time lost due to poor data management is estimated at 15%.
202
+
> See reference: *Lowndes, J. S. S., Best, B. D., Scarborough, C., Afflerbach, J. C., Frazier, M. R., O’Hara, C. C., Halpern, B. S. (2017). Our path to better science in less time using open data science tools. Nature Ecology & Evolution, 1(0160), 1–7. doi: 10.1038/s41559-017-0160*
203
+
>
204
+
{: .callout}
179
205
180
206
### Team and community building
181
207
182
208
<img src="../fig/research-foundation.jpg" alt="A house representing machine learning and AI is set upon bricks that one person is sliding below the house. On the bricks, we can read data science principles like open science, backups, reproducibility, and FAIR principles." width="500"/>
183
209
184
-
_Data science foundations. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807_
210
+
*Data science foundations. The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807*
185
211
186
212
Data science tools will make it easier not only to collaborate with researchers in your lab, but also with researchers outside of your lab, or even with non-researchers (citizen science or software professionals).
187
213
These may bring valuable expertise to the project.
188
-
Being part of a collaborative community will also create
189
-
impact beyond citations and papers, something which starts to be valued by funding agencies, and which make research more fun, valued and interesting.
214
+
Being part of a collaborative community will also create impact beyond citations and papers, something that funding agencies are starting to value, and that makes research more fun, valued and interesting.
190
215
191
216
We may also add to the pot that creating a network around your research is a critical aspect of building a career in academia.
192
217
Being known as a good and skilled collaborator can open doors to many opportunities.
193
218
194
-
195
219
## A journey starts
196
220
197
221
> You step into the Road, and if you don't keep your feet, there is no knowing where you might be swept off to.
@@ -209,12 +233,10 @@ For instance, *The Turing Way* guide for data science and research provides seve
_ The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807
213
-
236
+
*The Turing Way project illustration by Scriberia for The Turing Way Community Shared under CC-BY 4.0 License. Zenodo. http://doi.org/10.5281/zenodo.3332807*
214
237
215
238
## References
216
239
217
-
218
240
* A Quick Guide to Organizing Computational Biology Projects
219
241
Noble WS (2009) A Quick Guide to Organizing Computational Biology Projects. PLOS Computational Biology 5(7): e1000424. https://doi.org/10.1371/journal.pcbi.1000424
220
242
* Seddighi, M, Allanson, D, Rothwell, G, Takrouri, K. Study on the use of a combination of IPython Notebook and an industry-standard package in educating a CFD course. Comput Appl Eng Educ. 2020; 28: 952– 964. https://doi.org/10.1002/cae.22273
episodes/09-rdm.md: 1 addition & 1 deletion
@@ -142,7 +142,7 @@ You can find a more detailed [overview of the FAIR principles by GO FAIR](https:
142
142
143
143
### Summary of "FAIR - How To"
144
144
145
-
> We have provided an additional lesson to discuss the How-Tos of FAIR principles in the context of data and software. Please see details in [](../../_extra/-4-FAIRHowTo.md).
145
+
> We have provided an additional lesson to discuss the How-Tos of FAIR principles in the context of data and software. See [FAIR How-To for data and software](../../_extra/-4-FAIRHowTo.md) for details.
146
146
> - Reference: E. L.-Gebali, S. (2022). BOSSConf_2022_Research_Data_Management. Zenodo. doi: [10.5281/zenodo.6490583](https://doi.org/10.5281/zenodo.6490583)