Description
This is a small development plan for allowing any Dataset::Derived to be a parent, as long as one ancestor remains Dataset::Full. This is the first step of the new dataset pipelines project.
There are three identified steps to be taken in Atlas:
- Every dataset can be a parent, not just full datasets
- Validation of complete parent/grandparent
- Chain fallback locations to grandparents etc to pick up the correct curves
I will describe what needs to happen for each of them.
Prologue
Before we start, let's clean up some deprecated code and methods with confusing names from the Dataset and Dataset::Derived models.
Initializer Inputs
These have been deprecated for as long as I can remember, but the code is still present in the Dataset::Derived model. These inputs were used to initialise Derived sets that did not originate from ETLocal and thus had no graph values.
- Remove the InitializerInput model
- Remove any references to and validations of InitializerInput from the Dataset::Derived model. This includes everything surrounding uses_deprecated_initializer_inputs
- Remove any affected specs
PARENT_VALUE
In Runtime, Atlas defines methods that can be called for the dataset within nodes and edges to build the present graph (so for Refinery). This includes methods like EB and PRIMARY_PRODUCTION. The method PARENT_VALUE has been unused for a long time and references an obsolete csv file, demands/parent_values.csv, within the dataset. The name of this method could become confusing while we work on this project, as well as in the future, so I'd like to get rid of it.
- Remove PARENT_VALUE from Atlas::Runtime
- Remove parent_values method from Dataset
- Check for any of these obsolete csvs still hanging around in ETSource
1. Every dataset can be a parent, not just full datasets
Our first step does not include validation of a Full ancestor yet. We are merely setting up the hierarchy.
In the Dataset::Derived model, the parent should be able to be a Derived dataset:
def parent
  Dataset.find(base_dataset) # Was Dataset::Full.find(base_dataset)
end
The same goes for validate_presence_of_base_dataset.
- Adjust parent method
- Adjust validate_presence_of_base_dataset method
- Adjust affected specs
2. Validation of complete parent/grandparent
Because a Full dataset is the only one that can use EB methods, and those are still in use, we need to ensure that one ancestor is still a Dataset::Full and that the energy_balance method of the Derived dataset is delegated to that set. Otherwise Refinery will not be able to build the graph.
- Add a spec that creates a hierarchy of three datasets Full -> Derived -> Derived and test the energy_balance method on the grandchild. (It could very well be that this will pass directly)
- Add a spec that creates a hierarchy of three datasets Derived -> Derived -> Derived and test the creation of the grandchild. Test if the creation throws a validation error. (This we will build in the next todo)
- Add validation method validate_presence_of_full_ancestor. It will be something like this:
def has_full_parent?
  Dataset::Full.exists?(base_dataset) || parent.has_full_parent?
end

def validate_presence_of_full_ancestor
  return if has_full_parent?

  errors.add(:base_dataset, 'has no Full parent')
end
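The recursive check above relies on every parent in the chain existing. As a sketch of how the walk could behave for the two spec hierarchies, plus two edge cases (a missing parent and a chain that loops back on itself), here is a self-contained version. FullSet, DerivedSet, SKETCH_REGISTRY and full_ancestor? are hypothetical names for illustration only; whether such edge cases can occur in ETSource is an assumption.

```ruby
# Hypothetical stand-ins: illustrative structs, not the Atlas models.
FullSet    = Struct.new(:key)
DerivedSet = Struct.new(:key, :base_dataset)

# Walks up the parent chain and returns true once a Full dataset is found.
# Returns false when a parent is missing or when the chain loops.
def full_ancestor?(dataset, registry)
  seen = []
  current = dataset

  while current.is_a?(DerivedSet)
    return false if seen.include?(current.key) # cyclic chain

    seen << current.key
    current = registry[current.base_dataset] # nil when the parent is missing
  end

  current.is_a?(FullSet)
end

SKETCH_REGISTRY = {
  nl: FullSet.new(:nl),
  province: DerivedSet.new(:province, :nl),
  municipality: DerivedSet.new(:municipality, :province),
  orphan: DerivedSet.new(:orphan, :missing),
  loop_a: DerivedSet.new(:loop_a, :loop_b),
  loop_b: DerivedSet.new(:loop_b, :loop_a)
}.freeze
```

In this sketch the municipality (Full -> Derived -> Derived) passes, while the orphan and the looping pair are reported as having no Full ancestor instead of raising or recursing forever.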
3. Chain fallback locations to grandparents etc to pick up the correct curves
A Dataset::Derived may itself be incomplete, because it picks up the items it is missing, such as curves, from its parent.
For curves and other csvs the PathResolver will look in all supplied locations (dataset folders). For each Derived dataset the PathResolver is initialised with the method resolve_paths, which returns an array of locations that can be inspected in order of importance. We should alter this method to recursively look at the parent sets. It will be something like this:
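def resolve_paths
  # Own directory first, then the parent's locations. Uses + rather than <<
  # so the result stays a flat array; assumes every parent, including
  # Dataset::Full, responds to resolve_paths.
  [dataset_dir] + parent.resolve_paths
end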
- Update the
resolve_paths
method inDataset::Derived
- Update and write new specs to ensure correct behaviour of located curves etc.
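The chained fallback can be sketched without Atlas as follows. PathFull, PathDerived, first_location_with, the directory names and SKETCH_FILES are all hypothetical, and the assumption that the first location containing a file wins is my reading of "in order of importance", not a description of the actual PathResolver. Note that Array#<< appends its argument as a single element, so concatenation (+ or a splat) is needed to keep the list of locations flat.

```ruby
# Hypothetical stand-ins for the dataset classes; directory names are made up.
class PathFull
  attr_reader :dataset_dir

  def initialize(dataset_dir)
    @dataset_dir = dataset_dir
  end

  # A Full dataset ends the chain: it only has its own directory.
  def resolve_paths
    [dataset_dir]
  end
end

class PathDerived
  attr_reader :dataset_dir, :parent

  def initialize(dataset_dir, parent)
    @dataset_dir = dataset_dir
    @parent = parent
  end

  # Own directory first, then every ancestor directory, nearest first.
  def resolve_paths
    [dataset_dir, *parent.resolve_paths]
  end
end

# Returns the first location that holds the file, or nil if none does.
# `files` maps a directory to the file names in it (a stand-in for the disk).
def first_location_with(file, paths, files)
  paths.find { |dir| files.fetch(dir, []).include?(file) }
end

SKETCH_NL           = PathFull.new('datasets/nl')
SKETCH_PROVINCE     = PathDerived.new('datasets/province', SKETCH_NL)
SKETCH_MUNICIPALITY = PathDerived.new('datasets/municipality', SKETCH_PROVINCE)

SKETCH_FILES = {
  'datasets/nl' => ['curves/a.csv', 'curves/b.csv'],
  'datasets/province' => ['curves/b.csv']
}.freeze
```

In this sketch the municipality has no curves of its own, so a curve present in the province is taken from there, and a curve present only in the Full dataset is picked up from the grandparent.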
Epilogue: Atlas::Scaler
We have now made it possible for Derived datasets to have children. However, the Scaled datasets (these are Derived datasets with a scaler attached) look only at their direct parent for scaling. If this direct parent is not Full, this could lead to some fallout. I did not check this thoroughly, as I'd like to discuss our vision on scaled datasets first.
@mabijkerk Let's discuss in our brainstorm how we see the future of scaled datasets and check if and how they are still in use.
NB: if there is time to remove more stuff, I'd like to get rid of the old Preset still present in Atlas. These represent what we now have as featured scenarios in csv format. They have not been used for years.