Commit 5fca74e

clean up duplicates
1 parent 66f5608 commit 5fca74e

1 file changed

+10
-14
lines changed

episodes/03-overview.md

Lines changed: 10 additions & 14 deletions
@@ -28,15 +28,13 @@ This program will teach you best practices in data science project management an
 This material will help you to manage a research project that comprise some *online collaborative working*,
 has a relatively *big team, where people have complementary skills*,
 use *coding or programming*, as well as the *reuse of code*,
-and last but not least, aim at producing a *reproducible analysis*.
+and last but not least, aim at producing a *reproducible analysis*, as is pictured below.

-Here we give an short overview of the topics that will be covered in this course.
-Note that the course episode split follows a different logic, and you will find training linked to each five of these data science specifics in most episodes.
-As pictured below, the specifics of data science projects can be grouped in five main topics: working online, working with a heterogenous and relative big team, writing code, reusing code, and aiming at a reproducible analysis.
-This latter point being central to data science practices.

 <img src="../fig/datasciencespecifics.jpg" alt="Specicity of data science project. Five blocks (working online, large teams whose members have with specialised skills, writing code and re-using code) are placed around a central block where reproducible analysis is written. Data specifics by Julien Colomb CC-BY 4.0 " width="500"/>

+Here we give an short overview of the topics that will be covered in this course.
+Note that the course episode split follows a different logic, and you will find training linked to each five of these data science specifics in most episodes.


 ## Online work
@@ -47,24 +45,22 @@ Discussions are also more difficult to organise and meetings are more complex to
 In this course, we will look at different elements that make this work easier.

 Af first, there should be one entry point for the project, where every team member can find the main documentation as well as links to other documents and data.
-This starts with setting one `main` URL when setting up the project, as well as using good readme files and templates. The information needs to be updated during the project and shared with the whole team.
+This starts with setting `one main URL` when setting up the project, as well as using good readme files and templates. The information needs to be updated during the project and shared with the whole team.

 The use of online project management tool (like kaban boards for todo list) can also help members of the team to coordinate their work, and follow their achievements.

 ## Team science

-Because teams can be big, and quite heterogenous in terms of skills (especially computer and programming skills), it is important to follow best practice of team bulding.
+Because teams can be big, and quite heterogeneous in terms of skills (especially computer and programming skills), it is important to follow best practice of team building.

 In particular, one should set reasonable goals and milestones for the project, and document them in the main documentation.
 It is also important that every team member knows what his part is, and that the work is well distributed.

-One should make sure every team member is able to use the communication tools set for the team, and take particular care of the organisation of meetings. Data and code should be documented, such that every team member can follow and reuse the work of the other team members.
-In this course, we will present ways to foster this documentation process.
-
+One should make sure every team member is able to use the communication tools set for the team, and take particular care of the organisation of meetings. Data and code should be documented (and this documentation work should be fostered), such that every team member can follow and reuse the work of the other team members.

 ## Involves coding

-When data analysis is done via a programming language, things become mostly easier, but this facility has some drawbacks as well as some effects on data management practice.
+When data analysis is done via a programming language, things become mostly easier, but this facility has some drawbacks, as well as some effects on data management practice.

 First, a data analysis workflow will now start with the computer reading the raw data.
 This means that the choice of the data format for the raw data may change, and that manually gathered data should be (easily) computer readable.
@@ -77,7 +73,7 @@ However, errors are easier to spot (doing code reviews and tests) and when the c

 ## Involves reuse of code

-Very soon in a research project, writing code consist mostly of taking code written by someone else and applying it (with some tweaks sometimes) to one own's data.
+Very soon in a research project, writing code consist mostly of taking code written by someone else and applying it (with some tweaks sometimes) to one own data.
 We will look at ways to find relevant code, make sure it can be trusted, make sure you can legally use it, and ways to cite it (to give recognition the initial software engineer deserves).

 In addition, code written in the project will probably be reused, too.
@@ -86,11 +82,11 @@ We will look into best practices to make this reuse easier, both in how the code

 ## Reproducibility

-At the core of data science, the analysis reproducibiliy is both a goal and a huge advantage (in terms of research transparency, trustworthiness and in term of work efficiency).
+At the core of data science, the analysis reproducibiliy is both a goal and a huge advantage (in terms of research transparency, trustworthiness and work efficiency).
 The use of code is not enough to get a reproducible analysis, one needs to have access to both the code and the data used to produce the research result, a concept called provenance.
 This may not be trivial, especially if several version of the code and of the data exist.

-In this course, we will have a strong emphasis on version control, while we will introduce the concepts (and some tools) of provenance, as well as literate programming (reproducible reports and executable papers).
+In this course, we will have a strong emphasis on version control, while we will introduce the concepts (and some tools) of provenance, as well as literate programming (reproducible reports and executable papers), where the code, the figure and explanatory text are bound in the same file.


 {% include links.md %}
