-
Notifications
You must be signed in to change notification settings - Fork 127
child_birth autorefersh configarations #1668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @Rohit231998, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a new automated data import pipeline for US birth and mortality data sourced from the CDC. It establishes the necessary configuration files, including a manifest for scheduled weekly refreshes, metadata for data processing, and mappings for geographical entities and statistical variables. The primary goal is to integrate this new dataset into the system, enabling regular updates and structured representation of child birth and related mortality statistics. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a new data import for child birth statistics from the CDC. It includes the necessary configuration files, mappings, and test data.
There are a few issues that need to be addressed:
- The pull request title 'child_birth autorefersh configarations' contains typos and should be corrected to 'child_birth auto-refresh configurations'.
- The
manifest.jsoncontains critical errors in script paths and an incorrectimport_namethat will cause the import to fail or be miscategorized. - The
README.mdfile has some typos and a broken command example that should be fixed for clarity. - Several configuration files have minor formatting issues like missing newlines or commented-out properties that should be reviewed.
Overall, the structure of the new import is good, but the identified issues, especially in the manifest, are important to fix before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sample data file is very large. We don't need to include the entire data. About 50-100 records is sufficient. Can we change it to only include 50-100 records?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sample data file is very large. We don't need to include the entire data. About 50-100 records is sufficient. Assuming this was generated through statvar processor, can we change the input file to include smaller dataset as I mentioned in another comment and re-generate this output file again?
|
Triggering review from gemini-code-assist bot again... |
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a new data import for child birth statistics from the CDC. It includes all the necessary configuration files and test data. The changes look good overall, but there are a few issues to address, mainly in the README.md and manifest.json files to ensure the import can be run correctly. I've also suggested adding trailing newlines to several files for better compatibility. Please also consider fixing the typos in the pull request title ('autorefersh' -> 'autorefresh', 'configarations' -> 'configurations').
| "provenance_url": "https://www.cdc.gov/nchs/nvss/vsrr/provisional-tables.html", | ||
| "provenance_description": "The data set contains USA birth data", | ||
| "scripts": [ | ||
| "./../util/download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The relative path to download_util_script.py appears to be incorrect. Based on the file structure, the script is in the root util directory. From the manifest's location (statvar_imports/child_birth/), the correct path should be ../../util/download_util_script.py.
| "./../util/download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/", | |
| "../../util/download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/", |
| ###How to run: | ||
|
|
||
| python3 stat_var_processor.py | ||
| --input_data=../../statvar_imports/child_birth/input_files/*.csv - | ||
| -pv_map=../../statvar_imports/child_birth/<filename_of_pvmap> | ||
| --places_resolved_csv=../../statvar_imports/child_birth/place_mapping.csv | ||
| --config_file=../../statvar_imports/child_birth/<filename_of_metadata> | ||
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | ||
| --output_path=../../statvar_imports/child_birth/output_files/child_birth | ||
|
|
||
|
|
||
|
|
||
| ###Example | ||
|
|
||
| To Process the files, Run: | ||
|
|
||
| Execute the script inside the folder "/data/tools/statvar_importer/" | ||
|
|
||
| ``` | ||
| python3 stat_var_processor.py | ||
| --input_data=../../statvar_imports/child_birth/input_files/*.csv | ||
| --pv_map=../../statvar_imports/child_birth/pvmap.csv | ||
| --places_resolved_csv=../../statvar_imports/child_birth/places_resolved.csv | ||
| --config_file=../../statvar_imports/child_birth/metadata.csv | ||
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | ||
| --output_path=../../statvar_imports/child_birth/output_files/child_birth | ||
| ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'How to run' section is confusing, contains placeholders and errors, and the 'Example' section is redundant. It's better to have a single, clear, and correct 'How to run' section. This suggestion replaces the current 'How to run' and 'Example' sections with a single corrected version.
| ###How to run: | |
| python3 stat_var_processor.py | |
| --input_data=../../statvar_imports/child_birth/input_files/*.csv - | |
| -pv_map=../../statvar_imports/child_birth/<filename_of_pvmap> | |
| --places_resolved_csv=../../statvar_imports/child_birth/place_mapping.csv | |
| --config_file=../../statvar_imports/child_birth/<filename_of_metadata> | |
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | |
| --output_path=../../statvar_imports/child_birth/output_files/child_birth | |
| ###Example | |
| To Process the files, Run: | |
| Execute the script inside the folder "/data/tools/statvar_importer/" | |
| ``` | |
| python3 stat_var_processor.py | |
| --input_data=../../statvar_imports/child_birth/input_files/*.csv | |
| --pv_map=../../statvar_imports/child_birth/pvmap.csv | |
| --places_resolved_csv=../../statvar_imports/child_birth/places_resolved.csv | |
| --config_file=../../statvar_imports/child_birth/metadata.csv | |
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf | |
| --output_path=../../statvar_imports/child_birth/output_files/child_birth | |
| ``` | |
| ###How to run: | |
| Execute the script from within the `/data/tools/statvar_importer/` directory. | |
| ```bash | |
| python3 stat_var_processor.py \ | |
| --input_data=../../statvar_imports/child_birth/input_files/*.csv \ | |
| --pv_map=../../statvar_imports/child_birth/pvmap.csv \ | |
| --places_resolved_csv=../../statvar_imports/child_birth/places_resolved.csv \ | |
| --config_file=../../statvar_imports/child_birth/metadata.csv \ | |
| --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf \ | |
| --output_path=../../statvar_imports/child_birth/output_files/child_birth |
|
|
||
| To Download, run: | ||
|
|
||
| `python3 download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better readability and to prevent copy-paste errors, shell commands should be wrapped in a bash code block.
| `python3 download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/` | |
| python3 download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/ |
|
|
||
| `python3 download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/` | ||
|
|
||
| Note : The downloaded file will be saved as "input_files/*.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The note about the downloaded filename is slightly misleading. The download_util_script.py will save the file as input_files/rows.csv based on the URL, not a wildcard *.csv. It's better to be specific to avoid confusion.
| Note : The downloaded file will be saved as "input_files/*.csv" | |
| Note : The downloaded file will be saved as "input_files/rows.csv" |
| "provenance_description": "The data set contains USA birth data", | ||
| "scripts": [ | ||
| "./../util/download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/", | ||
| "../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/*.csv --pv_map=pvmap.csv --config_file=metadata.csv --places_resolved_csv=places_resolved.csv --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf --output_path=output_files/child_birth" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are extra spaces between arguments in this command. While this may not break execution, it's good practice to maintain consistent spacing for better readability.
| "../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/*.csv --pv_map=pvmap.csv --config_file=metadata.csv --places_resolved_csv=places_resolved.csv --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf --output_path=output_files/child_birth" | |
| "../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/*.csv --pv_map=pvmap.csv --config_file=metadata.csv --places_resolved_csv=places_resolved.csv --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf --output_path=output_files/child_birth" |
| #places_within,country/USA | ||
| output_columns,"observationAbout,observationDate,value,variableMeasured,observationPeriod" | ||
| header_rows,1 | ||
| mapped_columns,5 No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| WEST VIRGINIA,geoId/54 | ||
| WISCONSIN,geoId/55 | ||
| WYOMING,geoId/56 | ||
| PUERTO RICO,geoId/72 No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Data Value,value,{Number},observationDate,{Month} {Year},,,, | ||
| Monthly,observationPeriod,P1M,,,,,, | ||
| 12 Month-ending,observationPeriod,P1Y,,,,,, | ||
| State,observationAbout,{Data},,,,,, No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the necessary configuration and documentation for importing child_birth data. The new files, including README.md, manifest.json, metadata.csv, places_resolved.csv, pvmap.csv, Child_birth.tmcf, and Child_birth_stat_vars.mcf, establish a clear process for data download and processing. The README.md is well-structured, providing essential execution steps and an example. However, there are a few areas for improvement, including a critical typo in a command in the README.md, inconsistent use of placeholders, and missing newlines at the end of several files.
| --input_data=../../statvar_imports/child_birth/input_files/*.csv - | ||
| -pv_map=../../statvar_imports/child_birth/<filename_of_pvmap> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a typo here: --pv_map is split across two lines as - and -pv_map. This will cause the command to fail. Please correct it to --pv_map on a single line.
| --input_data=../../statvar_imports/child_birth/input_files/*.csv - | |
| -pv_map=../../statvar_imports/child_birth/<filename_of_pvmap> | |
| --input_data=../../statvar_imports/child_birth/input_files/*.csv \ | |
| --pv_map=../../statvar_imports/child_birth/<filename_of_pvmap> |
|
|
||
| python3 stat_var_processor.py | ||
| --input_data=../../statvar_imports/child_birth/input_files/*.csv - | ||
| -pv_map=../../statvar_imports/child_birth/<filename_of_pvmap> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pv_map argument uses a generic placeholder <filename_of_pvmap>. For consistency and clarity, it would be better to use the actual filename, pvmap.csv, as demonstrated in the example section (line 39).
| -pv_map=../../statvar_imports/child_birth/<filename_of_pvmap> | |
| --pv_map=../../statvar_imports/child_birth/pvmap.csv |
| --input_data=../../statvar_imports/child_birth/input_files/*.csv - | ||
| -pv_map=../../statvar_imports/child_birth/<filename_of_pvmap> | ||
| --places_resolved_csv=../../statvar_imports/child_birth/place_mapping.csv | ||
| --config_file=../../statvar_imports/child_birth/<filename_of_metadata> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the pv_map argument, the config_file argument uses a generic placeholder <filename_of_metadata>. Please update this to metadata.csv for consistency with the example provided (line 41).
| --config_file=../../statvar_imports/child_birth/<filename_of_metadata> | |
| --config_file=../../statvar_imports/child_birth/metadata.csv |
| @@ -0,0 +1,5 @@ | |||
| parameter,value | |||
| #places_within,country/USA | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.