@Rohit231998

No description provided.

@gemini-code-assist

Summary of Changes

Hello @Rohit231998, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new automated data import pipeline for US birth and mortality data sourced from the CDC. It establishes the necessary configuration files, including a manifest for scheduled weekly refreshes, metadata for data processing, and mappings for geographical entities and statistical variables. The primary goal is to integrate this new dataset into the system, enabling regular updates and structured representation of child birth and related mortality statistics.

Highlights

  • New Data Import Pipeline: A new data import pipeline has been established for US birth and mortality data, sourced from the CDC.
  • Automated Weekly Refresh: The import is configured to automatically refresh on a weekly schedule, ensuring the data remains up-to-date.
  • Statistical Variable Definitions: New statistical variables have been defined to capture annual live births, general mortality events, and infant mortality events.
  • Comprehensive Place Resolution: Detailed mappings for all US states and Puerto Rico to their respective Data Commons IDs are included to accurately resolve geographical entities.
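
As a rough illustration of the place resolution described above, the sketch below loads name-to-DCID pairs into a dictionary. The four sample rows are taken from the `places_resolved.csv` excerpt quoted later in this review; the function name `load_place_map` is hypothetical, and the sketch assumes the file has two columns and no header row, which this conversation does not confirm.

```python
import csv
import io

# Sample rows copied from the places_resolved.csv excerpt in this review.
SAMPLE_ROWS = (
    "WEST VIRGINIA,geoId/54\n"
    "WISCONSIN,geoId/55\n"
    "WYOMING,geoId/56\n"
    "PUERTO RICO,geoId/72\n"
)


def load_place_map(text):
    """Build a {place name: DCID} dict from two-column CSV text.

    Assumption: every row is `name,dcid` with no header row.
    """
    place_map = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        name, dcid = row
        place_map[name.strip().upper()] = dcid.strip()
    return place_map


place_map = load_place_map(SAMPLE_ROWS)
print(place_map["PUERTO RICO"])
```

Upper-casing the key mirrors the all-caps names in the excerpt, so lookups do not depend on how the source data capitalizes a state name.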

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a new data import for child birth statistics from the CDC. It includes the necessary configuration files, mappings, and test data.

There are a few issues that need to be addressed:

  • The pull request title 'child_birth autorefersh configarations' contains typos and should be corrected to 'child_birth auto-refresh configurations'.
  • The manifest.json contains critical errors in script paths and an incorrect import_name that will cause the import to fail or be miscategorized.
  • The README.md file has some typos and a broken command example that should be fixed for clarity.
  • Several configuration files have minor formatting issues like missing newlines or commented-out properties that should be reviewed.

Overall, the structure of the new import is good, but the identified issues, especially in the manifest, are important to fix before merging.

@Rohit231998 Rohit231998 requested a review from ajaits October 17, 2025 06:03
@SandeepTuniki SandeepTuniki self-requested a review November 10, 2025 06:11

The sample data file is very large. We don't need to include the entire dataset; about 50-100 records are sufficient. Can we trim it to 50-100 records?

@SandeepTuniki SandeepTuniki Nov 12, 2025

The sample data file is very large. We don't need to include the entire dataset; about 50-100 records are sufficient. Assuming this was generated by the statvar processor, can we shrink the input file as I mentioned in another comment and regenerate this output file?
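
One way to produce the smaller sample requested here is to keep the header plus the first N data rows of the input CSV, then rerun the processor on the trimmed file. The helper below is a minimal sketch; the name `trim_csv` and the default of 100 rows are illustrative choices, not part of the PR.

```python
import csv
import io


def trim_csv(text, max_rows=100):
    """Return CSV text holding the header row plus at most max_rows data rows."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, row in enumerate(reader):
        if index > max_rows:  # index 0 is the header row
            break
        writer.writerow(row)
    return out.getvalue()


# Example with synthetic data: 200 data rows trimmed down to 50.
full = "State,Year\n" + "".join(f"S{i},2024\n" for i in range(200))
sample = trim_csv(full, max_rows=50)
print(len(sample.splitlines()))
```

Taking the first N rows is the simplest option; if the source file is sorted by state, a sample drawn this way may cover only a few states, so a different selection may suit the tests better.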

@SandeepTuniki

Triggering review from gemini-code-assist bot again...

@SandeepTuniki

/gemini review

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds a new data import for child birth statistics from the CDC. It includes all the necessary configuration files and test data. The changes look good overall, but there are a few issues to address, mainly in the README.md and manifest.json files to ensure the import can be run correctly. I've also suggested adding trailing newlines to several files for better compatibility. Please also consider fixing the typos in the pull request title ('autorefersh' -> 'autorefresh', 'configarations' -> 'configurations').

"provenance_url": "https://www.cdc.gov/nchs/nvss/vsrr/provisional-tables.html",
"provenance_description": "The data set contains USA birth data",
"scripts": [
"./../util/download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/",

critical

The relative path to download_util_script.py appears to be incorrect. Based on the file structure, the script is in the root util directory. From the manifest's location (statvar_imports/child_birth/), the correct path should be ../../util/download_util_script.py.

Suggested change
"./../util/download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/",
"../../util/download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/",
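
To see where each relative path lands, the sketch below joins it to the manifest directory named in the comment and normalizes the result. It assumes script paths are resolved relative to the manifest's own directory, as the review states; the helper name `resolve_from_manifest` is hypothetical.

```python
import posixpath

# Manifest location as given in the review comment.
MANIFEST_DIR = "statvar_imports/child_birth"


def resolve_from_manifest(relative_path, manifest_dir=MANIFEST_DIR):
    """Resolve a script path relative to the manifest directory."""
    return posixpath.normpath(posixpath.join(manifest_dir, relative_path))


original = resolve_from_manifest("./../util/download_util_script.py")
suggested = resolve_from_manifest("../../util/download_util_script.py")

print(original)   # statvar_imports/util/download_util_script.py
print(suggested)  # util/download_util_script.py
```

The original path climbs only one level and stays inside `statvar_imports/`, while the suggested path reaches the root-level `util/` directory.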

Comment on lines +18 to +44
###How to run:

python3 stat_var_processor.py
--input_data=../../statvar_imports/child_birth/input_files/*.csv -
-pv_map=../../statvar_imports/child_birth/<filename_of_pvmap>
--places_resolved_csv=../../statvar_imports/child_birth/place_mapping.csv
--config_file=../../statvar_imports/child_birth/<filename_of_metadata>
--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
--output_path=../../statvar_imports/child_birth/output_files/child_birth



###Example

To Process the files, Run:

Execute the script inside the folder "/data/tools/statvar_importer/"

```
python3 stat_var_processor.py
--input_data=../../statvar_imports/child_birth/input_files/*.csv
--pv_map=../../statvar_imports/child_birth/pvmap.csv
--places_resolved_csv=../../statvar_imports/child_birth/places_resolved.csv
--config_file=../../statvar_imports/child_birth/metadata.csv
--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
--output_path=../../statvar_imports/child_birth/output_files/child_birth
```

high

The 'How to run' section is confusing, contains placeholders and errors, and the 'Example' section is redundant. It's better to have a single, clear, and correct 'How to run' section. This suggestion replaces the current 'How to run' and 'Example' sections with a single corrected version.

Suggested change
###How to run:
python3 stat_var_processor.py
--input_data=../../statvar_imports/child_birth/input_files/*.csv -
-pv_map=../../statvar_imports/child_birth/<filename_of_pvmap>
--places_resolved_csv=../../statvar_imports/child_birth/place_mapping.csv
--config_file=../../statvar_imports/child_birth/<filename_of_metadata>
--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
--output_path=../../statvar_imports/child_birth/output_files/child_birth
###Example
To Process the files, Run:
Execute the script inside the folder "/data/tools/statvar_importer/"
```
python3 stat_var_processor.py
--input_data=../../statvar_imports/child_birth/input_files/*.csv
--pv_map=../../statvar_imports/child_birth/pvmap.csv
--places_resolved_csv=../../statvar_imports/child_birth/places_resolved.csv
--config_file=../../statvar_imports/child_birth/metadata.csv
--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf
--output_path=../../statvar_imports/child_birth/output_files/child_birth
```
###How to run:
Execute the script from within the `/data/tools/statvar_importer/` directory.
```bash
python3 stat_var_processor.py \
--input_data=../../statvar_imports/child_birth/input_files/*.csv \
--pv_map=../../statvar_imports/child_birth/pvmap.csv \
--places_resolved_csv=../../statvar_imports/child_birth/places_resolved.csv \
--config_file=../../statvar_imports/child_birth/metadata.csv \
--existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf \
--output_path=../../statvar_imports/child_birth/output_files/child_birth
```


To Download, run:

`python3 download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/`

medium

For better readability and to prevent copy-paste errors, shell commands should be wrapped in a bash code block.

Suggested change
`python3 download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/`
python3 download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/


`python3 download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/`

Note : The downloaded file will be saved as "input_files/*.csv"

medium

The note about the downloaded filename is slightly misleading. The download_util_script.py will save the file as input_files/rows.csv based on the URL, not a wildcard *.csv. It's better to be specific to avoid confusion.

Suggested change
Note : The downloaded file will be saved as "input_files/*.csv"
Note : The downloaded file will be saved as "input_files/rows.csv"
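
The `rows.csv` name follows from the last path segment of the download URL once the query string is dropped. The sketch below shows that derivation; whether `download_util_script.py` actually names its output this way is the reviewer's claim and is not verified here, and `filename_from_url` is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlsplit

DOWNLOAD_URL = (
    "https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD"
)


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring the query string."""
    return posixpath.basename(urlsplit(url).path)


print(filename_from_url(DOWNLOAD_URL))  # rows.csv
```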

"provenance_description": "The data set contains USA birth data",
"scripts": [
"./../util/download_util_script.py --download_url=https://data.cdc.gov/api/views/hmz2-vwda/rows.csv?accessType=DOWNLOAD --output_folder=input_files/",
"../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/*.csv --pv_map=pvmap.csv --config_file=metadata.csv --places_resolved_csv=places_resolved.csv --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf --output_path=output_files/child_birth"

medium

There are extra spaces between arguments in this command. While this may not break execution, it's good practice to maintain consistent spacing for better readability.

Suggested change
"../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/*.csv --pv_map=pvmap.csv --config_file=metadata.csv --places_resolved_csv=places_resolved.csv --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf --output_path=output_files/child_birth"
"../../../tools/statvar_importer/stat_var_processor.py --input_data=input_files/*.csv --pv_map=pvmap.csv --config_file=metadata.csv --places_resolved_csv=places_resolved.csv --existing_statvar_mcf=gs://unresolved_mcf/scripts/statvar/stat_vars.mcf --output_path=output_files/child_birth"
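
Runs of spaces like the ones described here can be collapsed mechanically. The one-line helper below is a sketch; it would also alter any quoted argument that intentionally contains repeated spaces, which is not the case for the command in this manifest. The name `normalize_spacing` is illustrative.

```python
def normalize_spacing(command):
    """Collapse every run of whitespace in a command string to one space."""
    return " ".join(command.split())


messy = "stat_var_processor.py  --pv_map=pvmap.csv   --config_file=metadata.csv"
print(normalize_spacing(messy))
```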

#places_within,country/USA
output_columns,"observationAbout,observationDate,value,variableMeasured,observationPeriod"
header_rows,1
mapped_columns,5
\ No newline at end of file

medium

It's a best practice for text files, including CSVs, to end with a newline character. This ensures compatibility with various command-line tools and editors.

mapped_columns,5
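
The same fix applies to every file flagged for a missing final newline, so it can be scripted. The sketch below appends a newline only when a non-empty file lacks one; `ensure_trailing_newline` is a hypothetical helper, and the file names in the usage note are the ones mentioned in this review.

```python
from pathlib import Path


def ensure_trailing_newline(path):
    """Append a newline to a non-empty file that lacks one.

    Returns True if the file was modified, False otherwise.
    """
    target = Path(path)
    data = target.read_bytes()
    if not data or data.endswith(b"\n"):
        return False
    target.write_bytes(data + b"\n")
    return True
```

A loop such as `for name in ("metadata.csv", "places_resolved.csv", "pvmap.csv"): ensure_trailing_newline(name)`, run from the import directory, would cover the files named in these comments.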

WEST VIRGINIA,geoId/54
WISCONSIN,geoId/55
WYOMING,geoId/56
PUERTO RICO,geoId/72
\ No newline at end of file

medium

This file should end with a newline character. It's a standard convention for text files and helps prevent issues with some processing tools.

PUERTO RICO,geoId/72

Data Value,value,{Number},observationDate,{Month} {Year},,,,
Monthly,observationPeriod,P1M,,,,,,
12 Month-ending,observationPeriod,P1Y,,,,,,
State,observationAbout,{Data},,,,,,
\ No newline at end of file

medium

Please add a newline at the end of the file. It's a standard convention that improves file handling with many tools.

State,observationAbout,{Data},,,,,,

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces the necessary configuration and documentation for importing child_birth data. The new files, including README.md, manifest.json, metadata.csv, places_resolved.csv, pvmap.csv, Child_birth.tmcf, and Child_birth_stat_vars.mcf, establish a clear process for data download and processing. The README.md is well-structured, providing essential execution steps and an example. However, there are a few areas for improvement, including a critical typo in a command in the README.md, inconsistent use of placeholders, and missing newlines at the end of several files.

Comment on lines +21 to +22
--input_data=../../statvar_imports/child_birth/input_files/*.csv -
-pv_map=../../statvar_imports/child_birth/<filename_of_pvmap>

high

There is a typo here: --pv_map is split across two lines as - and -pv_map. This will cause the command to fail. Please correct it to --pv_map on a single line.

Suggested change
--input_data=../../statvar_imports/child_birth/input_files/*.csv -
-pv_map=../../statvar_imports/child_birth/<filename_of_pvmap>
--input_data=../../statvar_imports/child_birth/input_files/*.csv \
--pv_map=../../statvar_imports/child_birth/<filename_of_pvmap>
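
Slips like this one can be caught before a README is committed. The sketch below is a hypothetical lint, not part of the repository: it scans a multi-line command for a stray `-`, for flags written with a single dash, and for lines that lack the trailing backslash needed to continue the command. It only checks text and makes no claim about how `stat_var_processor.py` parses its flags.

```python
def lint_command_lines(lines):
    """Return (line_number, message) pairs for common multi-line command slips."""
    problems = []
    last = len(lines)
    for number, line in enumerate(lines, start=1):
        text = line.rstrip()
        continued = text.endswith("\\")
        if continued:
            text = text[:-1]  # drop the continuation backslash before tokenizing
        for token in text.split():
            if token == "-":
                problems.append((number, "stray '-'"))
            elif token.startswith("-") and not token.startswith("--"):
                problems.append((number, f"single-dash flag: {token}"))
        if number != last and not continued:
            problems.append((number, "missing trailing backslash"))
    return problems


# The broken lines quoted in this comment.
broken = [
    "python3 stat_var_processor.py",
    "--input_data=../../statvar_imports/child_birth/input_files/*.csv -",
    "-pv_map=../../statvar_imports/child_birth/<filename_of_pvmap>",
]

# The same lines with the fixes suggested in this review applied.
fixed = [
    "python3 stat_var_processor.py \\",
    "--input_data=../../statvar_imports/child_birth/input_files/*.csv \\",
    "--pv_map=../../statvar_imports/child_birth/pvmap.csv",
]

for number, message in lint_command_lines(broken):
    print(number, message)
```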


python3 stat_var_processor.py
--input_data=../../statvar_imports/child_birth/input_files/*.csv -
-pv_map=../../statvar_imports/child_birth/<filename_of_pvmap>

medium

The pv_map argument uses a generic placeholder <filename_of_pvmap>. For consistency and clarity, it would be better to use the actual filename, pvmap.csv, as demonstrated in the example section (line 39).

Suggested change
-pv_map=../../statvar_imports/child_birth/<filename_of_pvmap>
--pv_map=../../statvar_imports/child_birth/pvmap.csv

--input_data=../../statvar_imports/child_birth/input_files/*.csv -
-pv_map=../../statvar_imports/child_birth/<filename_of_pvmap>
--places_resolved_csv=../../statvar_imports/child_birth/place_mapping.csv
--config_file=../../statvar_imports/child_birth/<filename_of_metadata>

medium

Similar to the pv_map argument, the config_file argument uses a generic placeholder <filename_of_metadata>. Please update this to metadata.csv for consistency with the example provided (line 41).

Suggested change
--config_file=../../statvar_imports/child_birth/<filename_of_metadata>
--config_file=../../statvar_imports/child_birth/metadata.csv

@@ -0,0 +1,5 @@
parameter,value
#places_within,country/USA

medium

The line #places_within,country/USA is commented out. If this configuration is not currently in use or intended for future use, please either remove it to avoid confusion or add a comment explaining its purpose and why it's commented out.
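
To audit a config file for disabled settings like this one, the sketch below separates active `parameter,value` rows from rows that start with `#`. It follows the reviewer's reading that a leading `#` means the row is commented out; whether the statvar processor treats such rows the same way is an assumption. The sample text is the `metadata.csv` content quoted in this review, and `split_config_rows` is a hypothetical helper.

```python
import csv

SAMPLE_CONFIG = (
    "parameter,value\n"
    "#places_within,country/USA\n"
    'output_columns,"observationAbout,observationDate,value,'
    'variableMeasured,observationPeriod"\n'
    "header_rows,1\n"
    "mapped_columns,5\n"
)


def split_config_rows(text):
    """Return (active settings dict, list of commented-out lines)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    commented = [line for line in lines if line.startswith("#")]
    rows = list(csv.reader(line for line in lines if not line.startswith("#")))
    data_rows = rows[1:]  # the first active row is the parameter,value header
    active = {row[0]: row[1] for row in data_rows if len(row) >= 2}
    return active, commented


active, commented = split_config_rows(SAMPLE_CONFIG)
print(commented)  # ['#places_within,country/USA']
```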
