Commit 5435955

Merge pull request #357 from PTsolvers/pa-bughunting
Towards JustRelax v0.5.0
2 parents e89825d + 035cc51 commit 5435955

57 files changed

+2116
-273
lines changed

.github/workflows/Dependency.yml

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,10 +9,14 @@ jobs:
99
- name: Checkout code
1010
uses: actions/checkout@v5
1111

12-
- name: Check for GLMakie and JustPIC dependencies
12+
- name: Check for GLMakie and GMG dependencies
1313
run: |
1414
if grep -q "GLMakie" ./Project.toml; then
1515
echo "GLMakie dependency found, failing the test."
1616
exit 1
1717
fi
18-
echo "Neither GLMakie dependencies found."
18+
if grep -q "GeophysicalModelGenerator" ./Project.toml; then
19+
echo "GeophysicalModelGenerator dependency found, failing the test."
20+
exit 1
21+
fi
22+
echo "Neither GLMakie nor GMG dependencies found."
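
The `grep` check above matches the package name anywhere in `Project.toml`, including `[weakdeps]` or `[compat]` entries. As a hedged illustration only, the same rule could be restricted to the `[deps]` table with Julia's `TOML` standard library. The helper name `forbidden_deps` and the embedded example file are assumptions made for this sketch; they are not part of the workflow in this commit.

```julia
using TOML  # Julia standard library

# Hypothetical helper (not part of the workflow): return the forbidden
# package names that appear as hard dependencies under [deps].
function forbidden_deps(project_toml::AbstractString, forbidden)
    project = TOML.parse(project_toml)
    deps = get(project, "deps", Dict{String, Any}())
    return [name for name in forbidden if haskey(deps, name)]
end

# Illustrative Project.toml content; only [deps] entries are inspected.
example = """
name = "Example"

[deps]
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"

[weakdeps]
Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
"""

found = forbidden_deps(example, ["GLMakie", "GeophysicalModelGenerator"])
isempty(found) || error("Forbidden dependencies found: ", join(found, ", "))
```

In a CI job, the string would be replaced by `read("Project.toml", String)`, and the `error` call would fail the step in the same way `exit 1` does in the shell version.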

.github/workflows/ci.yml

Lines changed: 5 additions & 5 deletions
@@ -21,9 +21,9 @@ jobs:
2121
fail-fast: false
2222
matrix:
2323
version:
24-
- '1.10'
25-
- '1.11'
26-
- '1.12'
24+
- 'lts'
25+
- '1'
26+
# - 'pre'
2727
# - 'nightly'
2828
os:
2929
- ubuntu-latest
@@ -35,11 +35,11 @@ jobs:
3535
include:
3636
- os: macOS-latest
3737
arch: aarch64
38-
version: '1.10'
38+
version: 'lts'
3939
allow_failure: false
4040
- os: macOS-latest
4141
arch: aarch64
42-
version: '1.11'
42+
version: '1'
4343
allow_failure: false
4444
- os: macOS-latest
4545
arch: aarch64

Project.toml

Lines changed: 4 additions & 0 deletions
@@ -7,6 +7,7 @@ version = "0.4.2"
77
Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
88
CellArrays = "d35fcfd7-7af4-4c67-b1aa-d78070614af4"
99
Crayons = "a8cc5b0e-0ffa-5ad4-8c14-923d3ee1735f"
10+
ExactFieldSolutions = "2a6b1ac7-2de5-4354-bffb-bf009f6a4bef"
1011
GeoParams = "e018b62d-d9de-4a26-8697-af89c310ae38"
1112
HDF5 = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
1213
ImplicitGlobalGrid = "4d7a3746-15be-11ea-1130-334b0c4f5fa0"
@@ -26,17 +27,20 @@ WriteVTK = "64499a7a-5c06-52f2-abe2-ccb03c286192"
2627
[weakdeps]
2728
AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
2829
CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
30+
Makie = "ee78f7c6-11fb-53f2-987a-cfe4a2b5a57a"
2931

3032
[extensions]
3133
JustRelaxAMDGPUExt = "AMDGPU"
3234
JustRelaxCUDAExt = "CUDA"
35+
JustRelaxMakieExt = "Makie"
3336

3437
[compat]
3538
AMDGPU = "1, 2"
3639
Adapt = "4"
3740
CUDA = "5"
3841
CellArrays = "0.3.2"
3942
Crayons = "4.1.1"
43+
ExactFieldSolutions = "0.1.6"
4044
GeoParams = "0.7.8"
4145
HDF5 = "0.17.1"
4246
ImplicitGlobalGrid = "0.16"

_typos.toml

Lines changed: 1 addition & 0 deletions
@@ -4,3 +4,4 @@ oly = "oly"
44
iy = "iy"
55
pn = "pn"
66
nd = "nd"
7+
Heros = "Heros"

docs/make.jl

Lines changed: 4 additions & 0 deletions
@@ -110,6 +110,10 @@ makedocs(;
110110
"Rheology" => "man/plume3D/rheology.md",
111111
"Setting up the model" => "man/plume3D/plume3D.md",
112112
],
113+
"Checkpointing/Restart" => Any[
114+
"Checkpointing" => "man/checkpointing.md",
115+
"Restart" => "man/restart.md",
116+
],
113117
],
114118
"List of functions" => "man/listfunctions.md",
115119
"References" => Any[

docs/src/man/checkpointing.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
1+
# Checkpointing
2+
3+
It is common to save the state of a simulation at regular intervals, especially for long-running simulations. This allows you to restart the simulation from the last saved state in case of interruptions or to continue the simulation at a later time without losing progress. JustRelax provides a simple way to save and load checkpoint files. Two checkpointing functions are available for the most common file extensions (HDF5 and JLD2). By loading the `DataIO` module, you gain access to these checkpointing functions as well as VTK saving functions for later visualization with [ParaView](https://www.paraview.org/).
4+
For more details on the VTK output, see the [2D subduction example](./subduction2D/subduction2D.md).
5+
6+
!!! tip "JustPIC checkpointing" A similar checkpointing function is defined by [JustPIC.jl](https://juliageodynamics.github.io/JustPIC.jl/dev/IO/) to save the state of the particles.
7+
8+
9+
:::code-group
10+
11+
```julia [2D module]
12+
using JustRelax, JustRelax.JustRelax2D
13+
using JustRelax.DataIO
14+
```
15+
16+
```julia [3D module]
17+
using JustRelax, JustRelax.JustRelax3D
18+
using JustRelax.DataIO
19+
```
20+
:::
21+
22+
Unless you have a specific reason to use HDF5 files, we recommend using JLD2 files for checkpointing. JLD2 files are generally faster to read and write, and they retain the original data types of the variables. Both options are provided for maximum flexibility.
23+
24+
### Saving and loading checkpoint with HDF5
25+
The HDF5 checkpointing function saves the most important model variables (pressure, temperature, velocity components, viscosity, time, and timestep) to a `checkpoint.h5` file in your destination folder.
26+
27+
```julia
28+
dst = "Your_checkpointing_directory"
29+
checkpointing_hdf5(dst, stokes, thermal.T, time, timestep)
30+
```
31+
32+
To load the checkpoint, use `load_checkpoint_hdf5`. This function returns the variables in the same order as saved:
33+
34+
```julia
35+
fname = joinpath(dst, "checkpoint.h5")
36+
P, T, Vx, Vy, Vz, η, t, dt = load_checkpoint_hdf5(fname)
37+
```
38+
39+
### Saving and loading checkpoint with JLD2
40+
JLD2 checkpointing is recommended for most users due to its speed and its ability to preserve Julia data types. In contrast to the HDF5 function, the JLD2 checkpointing function saves all Stokes arrays and, optionally, all thermal arrays, while being MPI agnostic. If you run your model on multiple processors, each processor saves its own checkpoint file in the specified directory, with the MPI rank appended to the name (e.g. `checkpoint0000.jld2`, `checkpoint0001.jld2`). The function handles the naming of these files automatically to avoid overwriting. Additionally, you can save any custom fields by passing them as keyword arguments.
41+
42+
!!! warning "Checkpointing" All checkpointing functions will save the arrays as CPU arrays no matter your backend. This means that if you are using a GPU backend, the arrays will be transferred to the CPU before saving, which may take some time depending on the size of your model.
43+
44+
:::code-group
45+
46+
```julia [Normal use]
47+
dst = "Your_checkpointing_directory"
48+
checkpointing_jld2(dst, stokes, thermal, time, dt)
49+
```
50+
51+
```julia [MPI]
52+
dst = "Your_checkpointing_directory"
53+
checkpointing_jld2(dst, stokes, thermal, time, dt, igg)
54+
```
55+
56+
```julia [Additional fields]
57+
dst = "Your_checkpointing_directory"
58+
checkpointing_jld2(dst, stokes, thermal, t, dt, igg; it = it, custom_field_1 = some_data, custom_field_2 = example_vector)
59+
```
60+
:::
61+
62+
To load the checkpoint, you can use the preexisting `load_checkpoint_jld2` function or use the `JLD2` loading function directly. The `load_checkpoint_jld2` function is MPI agnostic and will automatically load the correct file for each processor based on its rank:
63+
64+
:::code-group
65+
66+
```julia [Normal use]
67+
dst = "Your_checkpointing_directory"
68+
stokes, thermal, t, dt = load_checkpoint_jld2(dst)
69+
```
70+
71+
```julia [MPI]
72+
dst = "Your_checkpointing_directory"
73+
stokes, thermal, t, dt = load_checkpoint_jld2(dst, igg)
74+
```
75+
:::
76+
77+
If you save additional fields, it is easiest to load the checkpoint file directly using the `JLD2` package. This way, you can access all saved variables by their names:
78+
79+
```julia
80+
using JLD2
81+
fname = joinpath(dst, "checkpoint" * lpad("$(igg.me)", 4, "0") * ".jld2") # per-rank file name; the MPI rank is zero-padded to 4 digits
82+
data = JLD2.load(fname)
83+
```
84+
This returns a dictionary with all your saved variables. You can access them like this:
85+
86+
```julia
87+
stokes = data["stokes"]
88+
thermal = data["thermal"]
89+
t = data["time"]
90+
dt = data["dt"]
91+
custom_field_1 = data["custom_field_1"]
92+
custom_field_2 = data["custom_field_2"]
93+
# and so on...
94+
```
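
The per-rank file names mentioned above (`checkpoint0000.jld2`, `checkpoint0001.jld2`) follow a zero-padded pattern. A minimal sketch of how such a path can be built is shown here; `checkpoint_path` is a hypothetical helper written for illustration and is not a JustRelax function, and the 4-digit padding is assumed from the example names.

```julia
# Hypothetical helper (not part of JustRelax): build the checkpoint path
# for a given MPI rank, zero-padding the rank to four digits.
checkpoint_path(dst, rank) = joinpath(dst, "checkpoint" * lpad(string(rank), 4, "0") * ".jld2")

# Rank 0 and rank 12 map to distinct, fixed-width file names.
names = [basename(checkpoint_path("Your_checkpointing_directory", r)) for r in (0, 12)]
# names == ["checkpoint0000.jld2", "checkpoint0012.jld2"]
```

Hard-coding a prefix such as `checkpoint000` only works for ranks 0 to 9, which is why padding with `lpad` is the safer choice for larger MPI runs.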

docs/src/man/restart.md

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
# Restarting from a Checkpoint File
2+
3+
To restart a simulation from a previously saved checkpoint file, you can make use of the checkpointing functions described in the [Checkpointing documentation](./checkpointing.md). Depending on the file format you used to save your checkpoint (HDF5 or JLD2), you can load the saved state of your simulation using the corresponding loading function.
4+
5+
In this example, we demonstrate how to set up a script that restarts a simulation from a JLD2 checkpoint file. We use JLD2 because it stores the entire `StokesArrays` and `ThermalArrays` structures, which makes restarting easier. We assume that you have already saved a checkpoint file with the `checkpointing_jld2` function for the 2D subduction model.
6+
Ideally, one does not need to change much in the initial script used to start the simulation from scratch. The main difference is that instead of initializing the `StokesArrays` and `ThermalArrays` from scratch, we will load them from the checkpoint file.
7+
For a detailed description of the 2D subduction model setup, please refer to the [2D subduction documentation](./subduction2D/subduction2D.md). The following example can be found [here](https://github.com/PTsolvers/JustRelax.jl/blob/d63ca8f08860859700913b575c9befc33d5c4f2a/miniapps/subduction/2D/Subduction2D_restart).
8+
9+
Load the necessary JustRelax modules and define the backend.
10+
```julia
11+
using CUDA # comment this out if you are not using CUDA; or load AMDGPU.jl if you are using an AMD GPU
12+
using JustRelax, JustRelax.JustRelax2D, JustRelax.DataIO
13+
const backend_JR = CUDABackend # Options: CPUBackend, CUDABackend, AMDGPUBackend
14+
```
15+
16+
For this benchmark, we use particles to track the advection of the material phases and their information. For this, we use [JustPIC.jl](https://github.com/JuliaGeodynamics/JustPIC.jl).
17+
```julia
18+
using JustPIC, JustPIC._2D
19+
const backend = CUDABackend # Options: JustPIC.CPUBackend, CUDABackend, JustPIC.AMDGPUBackend
20+
```
21+
22+
!!! tip "Script" Leave most of your original script unchanged and only change the parts we highlight in this example, unless you want to explicitly change some model parameters (e.g., rheology, boundary conditions, etc.). Make sure you don't accidentally overwrite your loaded arrays/particles with new initializations.
23+
24+
## Load and initialize particle fields
25+
The JustPIC-specific function `TA()` converts the loaded particles to the correct backend.
26+
```julia
27+
data = load(joinpath("Your_checkpointing_directory", "particles.jld2"))
28+
particles = TA(backend)(Float64, data["particles"])
29+
phases = TA(backend)(Float64, data["phases"])
30+
phase_ratios = TA(backend)(Float64, data["phase_ratios"])
31+
particle_args = TA(backend).(Float64, data["particle_args"])
32+
subgrid_arrays = SubgridDiffusionCellArrays(particles)
33+
# velocity staggered grids
34+
grid_vxi = velocity_grids(xci, xvi, di)
35+
```
36+
37+
## Load Stokes and Thermal arrays from checkpoint file
38+
:::code-group
39+
```julia [Normal use]
40+
dst = "Your_checkpointing_directory"
41+
stokes_cpu, thermal_cpu, t, dt = load_checkpoint_jld2(dst)
42+
```
43+
```julia [MPI]
44+
dst = "Your_checkpointing_directory"
45+
stokes_cpu, thermal_cpu, t, dt = load_checkpoint_jld2(dst, igg)
46+
```
47+
```julia [Additional fields]
48+
dst = "Your_checkpointing_directory"
49+
fname = joinpath(dst, "checkpoint" * lpad("$(igg.me)", 4, "0") * ".jld2")
50+
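stokes_cpu, thermal_cpu, t, dt, it, custom_field_1, custom_field_2 = JLD2.load(fname, "stokes", "thermal", "time", "dt", "it", "custom_field_1", "custom_field_2") # request variables by name; a bare load(fname) returns a Dict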
51+
```
52+
:::
53+
54+
The loaded arrays are CPU arrays, so we need to convert them to the correct backend.
55+
```julia
56+
stokes = PTArray(backend_JR, stokes_cpu)
57+
thermal = PTArray(backend_JR, thermal_cpu)
58+
```
59+
From here on you should be able to continue the simulation as usual. Make sure to adjust the time loop to start from the loaded time `t` and iteration `it` if you loaded them.
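
A minimal sketch of what "starting from the loaded time and iteration" can look like is given below. The function `resume_loop` and the numeric values are hypothetical stand-ins for illustration; in a real script, `t`, `dt`, and `it` would come from the checkpoint, and the loop body would contain the solver, advection, and output calls.

```julia
# Hypothetical helper: continue a time loop from restored values of
# time `t` and iteration `it` instead of resetting both to zero.
function resume_loop(t, it, dt, t_end)
    while t < t_end
        # ... Stokes/thermal solves, particle advection, and output go here ...
        t += dt
        it += 1
    end
    return t, it
end

# Stand-in values for a checkpoint taken at t = 2.5 after 5 iterations.
t_final, it_final = resume_loop(2.5, 5, 0.5, 5.0)
```

Because `it` keeps counting from the restored value, any output or checkpoint interval based on `rem(it, n)` stays aligned with the original run.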

docs/src/man/subduction2D/subduction2D.md

Lines changed: 12 additions & 1 deletion
@@ -223,7 +223,18 @@ inject_particles_phase!(particles, pPhases, (pT, ), (T_buffer, ), xvi)
223223
update_phase_ratios!(phase_ratios, particles, xci, xvi, pPhases)
224224
```
225225

226-
6. **Optional:** Save data as VTK to visualize it later with [ParaView](https://www.paraview.org/)
226+
6. **Optional:** Save checkpoint every 10 time steps
227+
Saving the particles generates a lot of data, so you might want to do this less frequently depending on your model size.
228+
```julia
229+
if rem(it, 10) == 0
230+
checkpoint = joinpath(figdir, "checkpoint")
231+
take(checkpoint)
232+
checkpointing_jld2(checkpoint, stokes, thermal, t, dt, igg; it = it)
233+
checkpointing_particles(checkpoint, particles, igg.me; phases = pPhases, phase_ratios = phase_ratios, particle_args = particle_args, t = t, dt = dt, it = it)
234+
end
235+
```
236+
237+
7. **Optional:** Save data as VTK to visualize it later with [ParaView](https://www.paraview.org/)
227238
```julia
228239
Vx_v = @zeros(ni.+1...)
229240
Vy_v = @zeros(ni.+1...)

ext/JustRelaxMakieExt.jl

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
module JustRelaxMakieExt
2+
3+
using JustRelax
4+
5+
using Makie
6+
7+
include("../src/Plotting/Plotting.jl")
8+
9+
end

miniapps/benchmarks/stokes2D/Blankenbach2D/Benchmark2D_sgd.jl

Lines changed: 30 additions & 9 deletions
@@ -1,16 +1,35 @@
1+
const isCUDA = false
2+
# const isCUDA = true
3+
4+
@static if isCUDA
5+
using CUDA
6+
end
7+
18
using JustRelax, JustRelax.JustRelax2D, JustRelax.DataIO
2-
const backend_JR = CPUBackend
9+
10+
const backend_JR = @static if isCUDA
11+
CUDABackend # Options: CPUBackend, CUDABackend, AMDGPUBackend
12+
else
13+
JustRelax.CPUBackend # Options: CPUBackend, CUDABackend, AMDGPUBackend
14+
end
315

416
using ParallelStencil, ParallelStencil.FiniteDifferences2D
5-
@init_parallel_stencil(Threads, Float64, 2) #or (CUDA, Float64, 2) or (AMDGPU, Float64, 2)
617

7-
using JustPIC
8-
using JustPIC._2D
18+
@static if isCUDA
19+
@init_parallel_stencil(CUDA, Float64, 2)
20+
else
21+
@init_parallel_stencil(Threads, Float64, 2)
22+
end
23+
24+
using JustPIC, JustPIC._2D
925
# Threads is the default backend,
1026
# to run on a CUDA GPU load CUDA.jl (i.e. "using CUDA") at the beginning of the script,
1127
# and to run on an AMD GPU load AMDGPU.jl (i.e. "using AMDGPU") at the beginning of the script.
12-
const backend = JustPIC.CPUBackend # Options: CPUBackend, CUDABackend, AMDGPUBackend
13-
28+
const backend = @static if isCUDA
29+
CUDABackend # Options: CPUBackend, CUDABackend, AMDGPUBackend
30+
else
31+
JustPIC.CPUBackend # Options: CPUBackend, CUDABackend, AMDGPUBackend
32+
end
1433
# Load script dependencies
1534
using Printf, LinearAlgebra, GeoParams, CairoMakie, CellArrays
1635

@@ -54,7 +73,7 @@ end
5473
## END OF HELPER FUNCTION ------------------------------------------------------------
5574

5675
## BEGIN OF MAIN SCRIPT --------------------------------------------------------------
57-
function main2D(igg; ar = 1, nx = 32, ny = 32, nit = 1.0e1, figdir = "figs2D", do_vtk = false)
76+
function main2D(igg; ar = 1, nx = 32, ny = 32, nit = 1.0e1, figdir = "figs2D", do_vtk = false, finalize_MPI = true)
5877

5978
# Physical domain ------------------------------------
6079
ly = 1000.0e3 # domain length in y
@@ -76,7 +95,7 @@ function main2D(igg; ar = 1, nx = 32, ny = 32, nit = 1.0e1, figdir = "figs2D", d
7695
# Initialize particles -------------------------------
7796
nxcell, max_xcell, min_xcell = 24, 36, 12
7897
particles = init_particles(
79-
backend, nxcell, max_xcell, min_xcell, xvi, di, ni
98+
backend, nxcell, max_xcell, min_xcell, xvi...
8099
)
81100
subgrid_arrays = SubgridDiffusionCellArrays(particles)
82101
# velocity grids
@@ -392,7 +411,9 @@ function main2D(igg; ar = 1, nx = 32, ny = 32, nit = 1.0e1, figdir = "figs2D", d
392411

393412
@show Urms[Int64(nit)] Nu_top[Int64(nit)]
394413

395-
return nothing
414+
finalize_global_grid(; finalize_MPI = finalize_MPI)
415+
416+
return Urms, Nu_top, trms, thermal.T, xvi
396417
end
397418
## END OF MAIN SCRIPT ----------------------------------------------------------------
398419
