-
Notifications
You must be signed in to change notification settings - Fork 38
[930][evaluation] implement CSVReader #932
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iluise I have not tried it but I trust you did. Approved, and here are a couple of comments (small)
self.metric = eval_cfg.get("metric") | ||
self.region = eval_cfg.get("region") | ||
|
||
def rename_channels(self) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you return a pd.DataFrame
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, personally, I would write this as a little helper function:
class CSVReader:
...
pd_data = pd.read_csv(self.csv_path, index_col=0)
self.data = _rename_channels(pd_data)
def _rename_channels(data) -> pd.DataFrame:
# No need for self.data here
def rename_channels(self) -> str: | ||
""" | ||
Rename channel names to include underscore between letters and digits. | ||
E.g., 'z500' -> 'z_500', 't850' -> 't_850', '2t' -> '2t', '10ff' -> '10ff' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, why do we need to do that renaming? I trust you it was necessary, just put a line in the docstring
_logger.info(f"RUN {run_id}: Getting data...") | ||
|
||
reader = WeatherGenReader(run, run_id, private_paths) | ||
type = run.get("type", "zarr") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for putting a sensible default!
and region == reader.region | ||
and stream == reader.stream | ||
): | ||
data = reader.data.values[np.newaxis, :, :, np.newaxis].T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style note: you colud have
else:
data = np.full(
(
len(available_data.samples),
len(available_data.fsteps),
len(available_data.channels),
1,
),
np.nan,
)
/ f"{reader.run_id}_{stream}_{region}_{metric}_epoch{reader.epoch:05d}.json" | ||
) | ||
_logger.debug(f"Looking for: {score_path}") | ||
if hasattr(reader, "data") and reader.data is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasattr
is not a good habit because it is really hard for humans and type checkers to figure out if an python object has an attribute. Here is what you can do, which then vscode can rename/check for you:
class Reader:
data: pd.DataFrame | None # Data attributes (if specified)
def __init__(self, eval_cfg: dict, run_id: str, private_paths: dict | None = None):
...
self.data = None
...
# No change to WG reader or CSVReader
# Now you can directly use:
if reader.data is not None:
self.csv_path = eval_cfg.get("csv_path") | ||
assert self.csv_path is not None, "CSV path must be provided in the config." | ||
|
||
self.data = pd.read_csv(self.csv_path, index_col=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing I would do is cast all the values to np.float32 (or float). pandas tries to be very clever and would for example use int32 if the data allows. I am not sure if xarray can deal with that later.
Description
Add reader to retrieve CSV scores from files generated with Quaver. @Jubeku
quaver scores can be plotted in the FastEvaluation package by adding this in the config (you need to have them locally first):
Issue Number
Closes #930
Is this PR a draft? Mark it as draft.
Checklist before asking for review
./scripts/actions.sh lint
./scripts/actions.sh unit-test
./scripts/actions.sh integration-test
launch-slurm.py --time 60