diff --git a/guide/12-enrich-data-with-thematic-information/part4_what_to_enrich_datacollections_analysisvariables.ipynb b/guide/12-enrich-data-with-thematic-information/part4_what_to_enrich_datacollections_analysisvariables.ipynb index 371a7b6b24..0cd2c3bd9a 100644 --- a/guide/12-enrich-data-with-thematic-information/part4_what_to_enrich_datacollections_analysisvariables.ipynb +++ b/guide/12-enrich-data-with-thematic-information/part4_what_to_enrich_datacollections_analysisvariables.ipynb @@ -1 +1,3040 @@ -{"cells":[{"cell_type":"markdown","metadata":{"slideshow":{"slide_type":"slide"}},"source":["# Part 4 - What to enrich with? (What are Data Collections and Analysis Variables?)"]},{"cell_type":"markdown","metadata":{},"source":["## Data Collections and GeoEnrichment coverage\n","\n","As described earlier, a data collection is a preassembled list of attributes that will be used to enrich the input features. Collection attributes can describe various types of information, such as demographic characteristics and geographic context of the locations or areas submitted as input features. \n","\n","Some data collections (such as default) can be used in all supported countries. Other data collections may only be available in one or a collection of countries. [Data Browser](https://doc.arcgis.com/en/esri-demographics/data/data-browser.htm) can be used to examine the entire global listing of variables, and associated datasets for each country."]},{"cell_type":"markdown","metadata":{},"source":["
"]},{"cell_type":"markdown","metadata":{},"source":["### List Countries with GeoEnrichment Data"]},{"cell_type":"markdown","metadata":{"slideshow":{"slide_type":"subslide"}},"source":["The `get_countries()` method can be used to query the countries for which GeoEnrichment data is available, and it returns a list of `Country` objects with which you can further query for properties. This list can also be viewed [here](https://developers.arcgis.com/rest/geoenrichment/api-reference/geoenrichment-coverage.htm)."]},{"cell_type":"code","execution_count":1,"metadata":{},"outputs":[],"source":["from arcgis.gis import GIS\n","from arcgis.geoenrichment import Country, enrich, get_countries"]},{"cell_type":"code","execution_count":2,"metadata":{},"outputs":[],"source":["# Create a GIS Connection\n","gis = GIS(profile='your_online_profile')"]},{"cell_type":"code","execution_count":3,"metadata":{"slideshow":{"slide_type":"fragment"}},"outputs":[{"name":"stdout","output_type":"stream","text":["Number of countries for which GeoEnrichment data is available: 177\n"]},{"data":{"text/html":["
<div>\n","<table border=\"1\" class=\"dataframe\">\n","<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>iso2</th><th>iso3</th><th>name</th><th>alt_name</th><th>datasets</th><th>default_dataset</th><th>continent</th><th>hierarchy</th></tr>\n",
"</thead>\n","<tbody>\n",
"<tr><th>0</th><td>AL</td><td>ALB</td><td>Albania</td><td>ALBANIA</td><td>[ALB_MBR_2021]</td><td>ALB_MBR_2021</td><td>Europe</td><td>[census]</td></tr>\n",
"<tr><th>1</th><td>DZ</td><td>DZA</td><td>Algeria</td><td>ALGERIA</td><td>[DZA_MBR_2021]</td><td>DZA_MBR_2021</td><td>Africa</td><td>[census]</td></tr>\n",
"<tr><th>2</th><td>AD</td><td>AND</td><td>Andorra</td><td>ANDORRA</td><td>[AND_MBR_2021]</td><td>AND_MBR_2021</td><td>Europe</td><td>[census]</td></tr>\n",
"<tr><th>3</th><td>AO</td><td>AGO</td><td>Angola</td><td>ANGOLA</td><td>[AGO_MBR_2021]</td><td>AGO_MBR_2021</td><td>Africa</td><td>[census]</td></tr>\n",
"<tr><th>4</th><td>AI</td><td>AIA</td><td>Anguilla</td><td>ANGUILLA</td><td>[AIA_MBR_2020]</td><td>AIA_MBR_2020</td><td>North America</td><td>[census]</td></tr>\n",
"<tr><th>5</th><td>AR</td><td>ARG</td><td>Argentina</td><td>ARGENTINA</td><td>[ARG_MBR_2020]</td><td>ARG_MBR_2020</td><td>South America</td><td>[census]</td></tr>\n",
"<tr><th>6</th><td>AM</td><td>ARM</td><td>Armenia</td><td>ARMENIA</td><td>[ARM_MBR_2020]</td><td>ARM_MBR_2020</td><td>Europe</td><td>[census]</td></tr>\n",
"<tr><th>7</th><td>AW</td><td>ABW</td><td>Aruba</td><td>ARUBA</td><td>[ABW_MBR_2020]</td><td>ABW_MBR_2020</td><td>North America</td><td>[census]</td></tr>\n",
"<tr><th>8</th><td>AU</td><td>AUS</td><td>Australia</td><td>AUSTRALIA</td><td>[AUS_ABS_2016, AUS_MBR_2020]</td><td>AUS_ABS_2016</td><td>Oceania</td><td>[AUS_ABS, census]</td></tr>\n",
"<tr><th>9</th><td>AT</td><td>AUT</td><td>Austria</td><td>AUSTRIA</td><td>[AUT_MBR_2021]</td><td>AUT_MBR_2021</td><td>Europe</td><td>[census]</td></tr>\n",
"</tbody>\n","</table>\n","</div>
"],"text/plain":[" iso2 iso3 name alt_name datasets \\\n","0 AL ALB Albania ALBANIA [ALB_MBR_2021] \n","1 DZ DZA Algeria ALGERIA [DZA_MBR_2021] \n","2 AD AND Andorra ANDORRA [AND_MBR_2021] \n","3 AO AGO Angola ANGOLA [AGO_MBR_2021] \n","4 AI AIA Anguilla ANGUILLA [AIA_MBR_2020] \n","5 AR ARG Argentina ARGENTINA [ARG_MBR_2020] \n","6 AM ARM Armenia ARMENIA [ARM_MBR_2020] \n","7 AW ABW Aruba ARUBA [ABW_MBR_2020] \n","8 AU AUS Australia AUSTRALIA [AUS_ABS_2016, AUS_MBR_2020] \n","9 AT AUT Austria AUSTRIA [AUT_MBR_2021] \n","\n"," default_dataset continent hierarchy \n","0 ALB_MBR_2021 Europe [census] \n","1 DZA_MBR_2021 Africa [census] \n","2 AND_MBR_2021 Europe [census] \n","3 AGO_MBR_2021 Africa [census] \n","4 AIA_MBR_2020 North America [census] \n","5 ARG_MBR_2020 South America [census] \n","6 ARM_MBR_2020 Europe [census] \n","7 ABW_MBR_2020 North America [census] \n","8 AUS_ABS_2016 Oceania [AUS_ABS, census] \n","9 AUT_MBR_2021 Europe [census] "]},"execution_count":3,"metadata":{},"output_type":"execute_result"}],"source":["countries = get_countries()\n","print(\"Number of countries for which GeoEnrichment data is available: \" + str(len(countries)))\n","\n","#print a few countries for a sample\n","countries[0:10]"]},{"cell_type":"markdown","metadata":{},"source":["### Data Collections for U.S.\n","\n","The `data_collections` property of a `Country` object lists its available data collections and analysis variables under each data collection as a Pandas dataframe.\n","\n","In order to discover the data collections for a particular country, you may first access the reference variable to it using the `country.get()` method, and then fetch the data collections from `country.data_collections` property. 
Once we know the data collection we would like to use, we can look at `analysisVariable`s available in that data collection."]},{"cell_type":"code","execution_count":4,"metadata":{},"outputs":[{"data":{"text/plain":["arcgis.geoenrichment.enrichment.Country"]},"execution_count":4,"metadata":{},"output_type":"execute_result"}],"source":["# Get US as a country\n","usa = Country.get('US')\n","type(usa)"]},{"cell_type":"code","execution_count":5,"metadata":{},"outputs":[{"data":{"text/html":["
<div>\n","<table border=\"1\" class=\"dataframe\">\n","<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>analysisVariable</th><th>alias</th><th>fieldCategory</th><th>vintage</th></tr>\n",
"<tr><th>dataCollectionID</th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n","<tbody>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.AGE0_CY</td><td>2022 Population Age &lt;1</td><td>2022 Age: 1 Year Increments (Esri)</td><td>2022</td></tr>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.AGE1_CY</td><td>2022 Population Age 1</td><td>2022 Age: 1 Year Increments (Esri)</td><td>2022</td></tr>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.AGE2_CY</td><td>2022 Population Age 2</td><td>2022 Age: 1 Year Increments (Esri)</td><td>2022</td></tr>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.AGE3_CY</td><td>2022 Population Age 3</td><td>2022 Age: 1 Year Increments (Esri)</td><td>2022</td></tr>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.AGE4_CY</td><td>2022 Population Age 4</td><td>2022 Age: 1 Year Increments (Esri)</td><td>2022</td></tr>\n",
"</tbody>\n","</table>\n","</div>
"],"text/plain":[" analysisVariable alias \\\n","dataCollectionID \n","1yearincrements 1yearincrements.AGE0_CY 2022 Population Age <1 \n","1yearincrements 1yearincrements.AGE1_CY 2022 Population Age 1 \n","1yearincrements 1yearincrements.AGE2_CY 2022 Population Age 2 \n","1yearincrements 1yearincrements.AGE3_CY 2022 Population Age 3 \n","1yearincrements 1yearincrements.AGE4_CY 2022 Population Age 4 \n","\n"," fieldCategory vintage \n","dataCollectionID \n","1yearincrements 2022 Age: 1 Year Increments (Esri) 2022 \n","1yearincrements 2022 Age: 1 Year Increments (Esri) 2022 \n","1yearincrements 2022 Age: 1 Year Increments (Esri) 2022 \n","1yearincrements 2022 Age: 1 Year Increments (Esri) 2022 \n","1yearincrements 2022 Age: 1 Year Increments (Esri) 2022 "]},"execution_count":5,"metadata":{},"output_type":"execute_result"}],"source":["usa_df = usa.data_collections\n","\n","# print a few rows of the DataFrame\n","usa_df.head()"]},{"cell_type":"code","execution_count":6,"metadata":{},"outputs":[{"data":{"text/plain":["(18946, 4)"]},"execution_count":6,"metadata":{},"output_type":"execute_result"}],"source":["usa_df.shape"]},{"cell_type":"markdown","metadata":{},"source":["#### Unique Data Collections for U.S."]},{"cell_type":"markdown","metadata":{},"source":["Each data collection and analysis variable has a unique ID. When calling the `enrich()` method (explained earlier in this guide) these analysis variables can be passed in the `data_collections` and `analysis_variables` parameters.\n","\n","As an example, here we see a subset of the data collections for US showing 2 different data collections and multiple analysis variables for each collection."]},{"cell_type":"code","execution_count":7,"metadata":{},"outputs":[{"data":{"text/html":["
<div>\n","<table border=\"1\" class=\"dataframe\">\n","<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>analysisVariable</th><th>alias</th><th>fieldCategory</th><th>vintage</th></tr>\n",
"<tr><th>dataCollectionID</th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n","<tbody>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.FAGE75_FY</td><td>2027 Females Age 75</td><td>2027 Age: 1 Year Increments (Esri)</td><td>2027</td></tr>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.FAGE76_FY</td><td>2027 Females Age 76</td><td>2027 Age: 1 Year Increments (Esri)</td><td>2027</td></tr>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.FAGE77_FY</td><td>2027 Females Age 77</td><td>2027 Age: 1 Year Increments (Esri)</td><td>2027</td></tr>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.FAGE78_FY</td><td>2027 Females Age 78</td><td>2027 Age: 1 Year Increments (Esri)</td><td>2027</td></tr>\n",
"<tr><th>1yearincrements</th><td>1yearincrements.FAGE79_FY</td><td>2027 Females Age 79</td><td>2027 Age: 1 Year Increments (Esri)</td><td>2027</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>5yearincrements</th><td>5yearincrements.MEDAGE_CY</td><td>2022 Median Age</td><td>2022 Age: 5 Year Increments (Esri)</td><td>2022</td></tr>\n",
"<tr><th>5yearincrements</th><td>5yearincrements.MALES_CY</td><td>2022 Male Population</td><td>2022 Age: 5 Year Increments (Esri)</td><td>2022</td></tr>\n",
"<tr><th>5yearincrements</th><td>5yearincrements.MALE0_CY</td><td>2022 Males Age 0-4</td><td>2022 Age: 5 Year Increments (Esri)</td><td>2022</td></tr>\n",
"<tr><th>5yearincrements</th><td>5yearincrements.MALE5_CY</td><td>2022 Males Age 5-9</td><td>2022 Age: 5 Year Increments (Esri)</td><td>2022</td></tr>\n",
"<tr><th>5yearincrements</th><td>5yearincrements.MALE10_CY</td><td>2022 Males Age 10-14</td><td>2022 Age: 5 Year Increments (Esri)</td><td>2022</td></tr>\n",
"</tbody>\n","</table>\n","<p>100 rows × 4 columns</p>\n","</div>
"],"text/plain":[" analysisVariable alias \\\n","dataCollectionID \n","1yearincrements 1yearincrements.FAGE75_FY 2027 Females Age 75 \n","1yearincrements 1yearincrements.FAGE76_FY 2027 Females Age 76 \n","1yearincrements 1yearincrements.FAGE77_FY 2027 Females Age 77 \n","1yearincrements 1yearincrements.FAGE78_FY 2027 Females Age 78 \n","1yearincrements 1yearincrements.FAGE79_FY 2027 Females Age 79 \n","... ... ... \n","5yearincrements 5yearincrements.MEDAGE_CY 2022 Median Age \n","5yearincrements 5yearincrements.MALES_CY 2022 Male Population \n","5yearincrements 5yearincrements.MALE0_CY 2022 Males Age 0-4 \n","5yearincrements 5yearincrements.MALE5_CY 2022 Males Age 5-9 \n","5yearincrements 5yearincrements.MALE10_CY 2022 Males Age 10-14 \n","\n"," fieldCategory vintage \n","dataCollectionID \n","1yearincrements 2027 Age: 1 Year Increments (Esri) 2027 \n","1yearincrements 2027 Age: 1 Year Increments (Esri) 2027 \n","1yearincrements 2027 Age: 1 Year Increments (Esri) 2027 \n","1yearincrements 2027 Age: 1 Year Increments (Esri) 2027 \n","1yearincrements 2027 Age: 1 Year Increments (Esri) 2027 \n","... ... ... \n","5yearincrements 2022 Age: 5 Year Increments (Esri) 2022 \n","5yearincrements 2022 Age: 5 Year Increments (Esri) 2022 \n","5yearincrements 2022 Age: 5 Year Increments (Esri) 2022 \n","5yearincrements 2022 Age: 5 Year Increments (Esri) 2022 \n","5yearincrements 2022 Age: 5 Year Increments (Esri) 2022 \n","\n","[100 rows x 4 columns]"]},"execution_count":7,"metadata":{},"output_type":"execute_result"}],"source":["usa_df.iloc[500:600,:]"]},{"cell_type":"markdown","metadata":{},"source":["The table above shows 2 different data collections (1yearincrements and 5yearincrements). Since these are `Age` data collections, the `analysisVariable`s for these collections are similar. `vintage` shows the year that the demographic data represents. 
For example, a vintage of 2020 means that the data represents the year 2020.\n","\n","Let's get a list of unique data collections that are available for the U.S."]},{"cell_type":"code","execution_count":8,"metadata":{},"outputs":[{"data":{"text/plain":["115"]},"execution_count":8,"metadata":{},"output_type":"execute_result"}],"source":["usa_df.index.nunique()"]},{"cell_type":"markdown","metadata":{},"source":["*The United States has 115 unique data collections.* Here are the first 10 data collections.\n"]},{"cell_type":"code","execution_count":9,"metadata":{},"outputs":[{"data":{"text/plain":["['1yearincrements',\n"," '5yearincrements',\n"," 'Age',\n"," 'agebyracebysex',\n"," 'AgeDependency',\n"," 'AtRisk',\n"," 'AutomobilesAutomotiveProducts',\n"," 'BabyProductsToysGames',\n"," 'basicFactsForMobileApps',\n"," 'businesses']"]},"execution_count":9,"metadata":{},"output_type":"execute_result"}],"source":["list(usa_df.index.unique())[:10]"]},{"cell_type":"markdown","metadata":{},"source":["Looking at `fieldCategory` is a good way to understand what a data collection is about. Each `fieldCategory` value combines the vintage (year), a descriptive name for the category, and the data source, for example `2022 Age: 1 Year Increments (Esri)`. However, to query a data collection, its unique ID (`dataCollectionID`) must be used.\n","\n","Let's look at the `fieldCategory` column for a few data collections in the U.S."]},{"cell_type":"code","execution_count":10,"metadata":{},"outputs":[{"data":{"text/plain":["array(['2022 Age: 1 Year Increments (Esri)',\n"," '2027 Age: 1 Year Increments (Esri)',\n"," '2010 Age: 1 Year Increments (U.S. Census)',\n"," '2022 Age: 5 Year Increments (Esri)',\n"," '2027 Age: 5 Year Increments (Esri)',\n"," '2010 Age: 5 Year Increments (U.S. Census)',\n"," '2016-2020 Age: 5 Year Increments (ACS)',\n"," '2022 Age by Sex by Race (Esri)', '2027 Age by Sex by Race (Esri)',\n"," '2010 Age by Sex by Race (U.S. 
Census)'], dtype=object)"]},"execution_count":10,"metadata":{},"output_type":"execute_result"}],"source":["usa_df.fieldCategory.unique()[:10]"]},{"cell_type":"markdown","metadata":{},"source":["#### Data Collections by Socio-demographic Factors"]},{"cell_type":"markdown","metadata":{},"source":["You can filter the `data_collections` DataFrame with Pandas expressions to get the collections for a specific factor. Let's look at data collections for different socio-demographic factors such as Age, Population, and Income."]},{"cell_type":"markdown","metadata":{},"source":["__Data Collections for Age__"]},{"cell_type":"code","execution_count":11,"metadata":{},"outputs":[{"data":{"text/plain":["array(['2022 Age: 1 Year Increments (Esri)',\n"," '2027 Age: 1 Year Increments (Esri)',\n"," '2010 Age: 1 Year Increments (U.S. Census)',\n"," '2022 Age: 5 Year Increments (Esri)',\n"," '2027 Age: 5 Year Increments (Esri)',\n"," '2010 Age: 5 Year Increments (U.S. Census)',\n"," '2016-2020 Age: 5 Year Increments (ACS)',\n"," '2022 Age by Sex by Race (Esri)', '2027 Age by Sex by Race (Esri)',\n"," '2010 Age by Sex by Race (U.S. Census)',\n"," '2022 Age Dependency (Esri)', '2027 Age Dependency (Esri)',\n"," '2022 Disposable Income by Age (Esri)',\n"," '2010 Households by Age of Householder (U.S. Census)',\n"," '2016-2020 Households by Type and Size and Age (ACS)',\n"," '2010 Housing by Age of Householder (U.S. 
Census)',\n"," '2022 Income by Age (Esri)', '2027 Income by Age (Esri)',\n"," '2016-2020 Income by Age (ACS)', 'Age: 5 Year Increments',\n"," '2022 Net Worth by Age (Esri)',\n"," '2016-2020 Females by Age of Children and Employment Status (ACS)'],\n"," dtype=object)"]},"execution_count":11,"metadata":{},"output_type":"execute_result"}],"source":["Age_Collections = usa_df['fieldCategory'].str.contains('Age', na=False)\n","usa_df[Age_Collections].fieldCategory.unique()"]},{"cell_type":"markdown","metadata":{},"source":["__Data Collections for Population__"]},{"cell_type":"code","execution_count":12,"metadata":{},"outputs":[{"data":{"text/plain":["array(['2010 Population (U.S. Census)',\n"," '2016-2020 Population by Language Spoken at Home (ACS)',\n"," '2022 Daytime Population (Esri)',\n"," '2022 Population by Generation (Esri)',\n"," '2027 Population by Generation (Esri)',\n"," '2020 Group Quarters Population (U.S. Census)',\n"," '2020 Group Quarters Population by Type (U.S. Census)',\n"," '2010 Group Quarters Population (U.S. Census)',\n"," '2020 Hispanic Population by Race (U.S. Census)',\n"," '2020 Hispanic Population of Two or More Races (U.S. Census)',\n"," '2020 Hispanic Population <18 Years by Race (U.S. Census)',\n"," '2020 Hispanic Population 18+ Years by Race (U.S. Census)',\n"," '2020 Hispanic Population 18+ Years of Two or More Races (U.S. Census)',\n"," '2022 Population Time Series (Esri)',\n"," '2010 Population by Relationship and Household Type (U.S. Census)',\n"," '2016-2020 Population by Relationship and Household Type (ACS)',\n"," '2020 Non Hispanic Population by Race (U.S. Census)',\n"," '2020 Non Hispanic Population of Two or More Races (U.S. Census)',\n"," '2020 Non Hispanic Population <18 Years by Race (U.S. Census)',\n"," '2020 Non Hispanic Population 18+ Years by Race (U.S. Census)',\n"," '2020 Non Hispanic Population 18+ Years of Two or More Races (U.S. Census)',\n"," '2022 Population (Esri)', '2020 Population (U.S. 
Census)',\n"," '2020 Population by Race (U.S. Census)',\n"," '2020 Population of Two or More Races (U.S. Census)',\n"," '2020 Population <18 Years by Race (U.S. Census)',\n"," '2020 Population 18+ Years by Race (U.S. Census)',\n"," '2020 Population 18+ Years of Two or More Races (U.S. Census)'],\n"," dtype=object)"]},"execution_count":12,"metadata":{},"output_type":"execute_result"}],"source":["Pop_Collections = usa_df['fieldCategory'].str.contains('Population', na=False)\n","usa_df[Pop_Collections].fieldCategory.unique()"]},{"cell_type":"markdown","metadata":{},"source":["__Data Collections for Income__"]},{"cell_type":"code","execution_count":13,"metadata":{},"outputs":[{"data":{"text/plain":["Index(['1yearincrements', '5yearincrements', 'Age', 'agebyracebysex',\n"," 'AgeDependency', 'AtRisk', 'AutomobilesAutomotiveProducts',\n"," 'BabyProductsToysGames', 'basicFactsForMobileApps', 'businesses',\n"," ...\n"," 'travelMPI', 'unitsinstructure', 'urbanizationgroupsNEW', 'vacant',\n"," 'vehiclesavailable', 'veterans', 'Wealth', 'women', 'yearbuilt',\n"," 'yearmovedin'],\n"," dtype='object', name='dataCollectionID', length=115)"]},"execution_count":13,"metadata":{},"output_type":"execute_result"}],"source":["Income_Collections = usa_df['fieldCategory'].str.contains('Income', na=False)\n","Income_Collections.index.unique()"]},{"cell_type":"markdown","metadata":{},"source":["As mentioned earlier, using a `data_collection`'s unique ID (`dataCollectionID`) is the best way to further query a data collection. 
Let's look at the `dataCollectionID` for various Income data collections."]},{"cell_type":"code","execution_count":14,"metadata":{},"outputs":[{"data":{"text/plain":["Index(['AtRisk', 'basicFactsForMobileApps', 'disposableincome',\n"," 'foodstampsSNAP', 'Health', 'householdincome', 'households',\n"," 'incomebyage', 'KeyUSFacts', 'Policy', 'population', 'Wealth'],\n"," dtype='object', name='dataCollectionID')"]},"execution_count":14,"metadata":{},"output_type":"execute_result"}],"source":["usa_df[Income_Collections].index.unique()"]},{"cell_type":"markdown","metadata":{},"source":["#### Analysis variables for Data Collections\n","Once we know the data collection we would like to use, we can look at all the unique variables available in that data collection using its unique ID. Let's discover `analysisVariable`s for some of the data collections."]},{"cell_type":"markdown","metadata":{},"source":["__Analysis variables for `Age` data collection__"]},{"cell_type":"code","execution_count":15,"metadata":{},"outputs":[{"data":{"text/plain":["array(['Age.MALE0', 'Age.MALE5', 'Age.MALE10', 'Age.MALE15', 'Age.MALE20',\n"," 'Age.MALE25', 'Age.MALE30', 'Age.MALE35', 'Age.MALE40',\n"," 'Age.MALE45', 'Age.MALE50', 'Age.MALE55', 'Age.MALE60',\n"," 'Age.MALE65', 'Age.MALE70', 'Age.MALE75', 'Age.MALE80',\n"," 'Age.MALE85', 'Age.FEM0', 'Age.FEM5', 'Age.FEM10', 'Age.FEM15',\n"," 'Age.FEM20', 'Age.FEM25', 'Age.FEM30', 'Age.FEM35', 'Age.FEM40',\n"," 'Age.FEM45', 'Age.FEM50', 'Age.FEM55', 'Age.FEM60', 'Age.FEM65',\n"," 'Age.FEM70', 'Age.FEM75', 'Age.FEM80', 'Age.FEM85'], dtype=object)"]},"execution_count":15,"metadata":{},"output_type":"execute_result"}],"source":["usa_df.loc['Age']['analysisVariable'].unique()"]},{"cell_type":"markdown","metadata":{},"source":["Analysis variables are typically represented as the `dataCollectionID`, a period, and the variable name (for example, `Age.MALE0`), as seen above."]},{"cell_type":"markdown","metadata":{},"source":["__Analysis variables for `DaytimePopulation` data 
collection__"]},{"cell_type":"code","execution_count":20,"metadata":{},"outputs":[{"data":{"text/plain":["array(['DaytimePopulation.DPOP_CY', 'DaytimePopulation.DPOPWRK_CY',\n"," 'DaytimePopulation.DPOPRES_CY', 'DaytimePopulation.DPOPDENSCY'],\n"," dtype=object)"]},"execution_count":20,"metadata":{},"output_type":"execute_result"}],"source":["usa_df.loc['DaytimePopulation']['analysisVariable'].unique()"]},{"cell_type":"markdown","metadata":{},"source":["### Data Collections for Another Country"]},{"cell_type":"markdown","metadata":{},"source":["Let's look at data collections for New Zealand. [Data Browser](https://doc.arcgis.com/en/esri-demographics/data/data-browser.htm) can be used to examine the entire global listing of variables and associated datasets for New Zealand.\n","\n","To discover the data collections for a particular country, first get a reference to it using the `Country.get()` method, and then fetch the data collections from its `data_collections` property. Once we know the data collection we would like to use, we can look at the `analysisVariable`s available in that data collection."]},{"cell_type":"code","execution_count":21,"metadata":{},"outputs":[{"data":{"text/plain":["arcgis.geoenrichment.enrichment.Country"]},"execution_count":21,"metadata":{},"output_type":"execute_result"}],"source":["# Get New Zealand as a country\n","nz = Country.get('New Zealand')\n","type(nz)"]},{"cell_type":"code","execution_count":22,"metadata":{},"outputs":[{"data":{"text/html":["
<div>\n","<table border=\"1\" class=\"dataframe\">\n","<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>analysisVariable</th><th>alias</th><th>fieldCategory</th><th>vintage</th></tr>\n",
"<tr><th>dataCollectionID</th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n","<tbody>\n",
"<tr><th>15YearIncrements</th><td>15YearIncrements.PAGE01_CY</td><td>2020 Total Population Age 0-14</td><td>2020 Population Totals (MBR)</td><td>2020</td></tr>\n",
"<tr><th>15YearIncrements</th><td>15YearIncrements.PAGE02_CY</td><td>2020 Total Population Age 15-29</td><td>2020 Population Totals (MBR)</td><td>2020</td></tr>\n",
"<tr><th>15YearIncrements</th><td>15YearIncrements.PAGE03_CY</td><td>2020 Total Population Age 30-44</td><td>2020 Population Totals (MBR)</td><td>2020</td></tr>\n",
"<tr><th>15YearIncrements</th><td>15YearIncrements.PAGE04_CY</td><td>2020 Total Population Age 45-59</td><td>2020 Population Totals (MBR)</td><td>2020</td></tr>\n",
"<tr><th>15YearIncrements</th><td>15YearIncrements.PAGE05_CY</td><td>2020 Total Population Age 60+</td><td>2020 Population Totals (MBR)</td><td>2020</td></tr>\n",
"</tbody>\n","</table>\n","</div>
"],"text/plain":[" analysisVariable alias \\\n","dataCollectionID \n","15YearIncrements 15YearIncrements.PAGE01_CY 2020 Total Population Age 0-14 \n","15YearIncrements 15YearIncrements.PAGE02_CY 2020 Total Population Age 15-29 \n","15YearIncrements 15YearIncrements.PAGE03_CY 2020 Total Population Age 30-44 \n","15YearIncrements 15YearIncrements.PAGE04_CY 2020 Total Population Age 45-59 \n","15YearIncrements 15YearIncrements.PAGE05_CY 2020 Total Population Age 60+ \n","\n"," fieldCategory vintage \n","dataCollectionID \n","15YearIncrements 2020 Population Totals (MBR) 2020 \n","15YearIncrements 2020 Population Totals (MBR) 2020 \n","15YearIncrements 2020 Population Totals (MBR) 2020 \n","15YearIncrements 2020 Population Totals (MBR) 2020 \n","15YearIncrements 2020 Population Totals (MBR) 2020 "]},"execution_count":22,"metadata":{},"output_type":"execute_result"}],"source":["nz_df = nz.data_collections\n","\n","# print a few rows of the DataFrame\n","nz_df.head()"]},{"cell_type":"code","execution_count":23,"metadata":{},"outputs":[{"data":{"text/plain":["(193, 4)"]},"execution_count":23,"metadata":{},"output_type":"execute_result"}],"source":["nz_df.shape"]},{"cell_type":"markdown","metadata":{},"source":["#### Unique Data Collections for New Zealand"]},{"cell_type":"markdown","metadata":{},"source":["Let's get a list of unique data collections that are available for New Zealand."]},{"cell_type":"code","execution_count":24,"metadata":{},"outputs":[{"data":{"text/plain":["Index(['15YearIncrements', 'EducationalAttainment', 'Gender',\n"," 'HouseholdsbyIncome', 'HouseholdsbyType', 'HouseholdTotals', 'KeyFacts',\n"," 'KeyGlobalFacts', 'MaritalStatus', 'PopulationTotals',\n"," 'PurchasingPower', 'Spending'],\n"," dtype='object', name='dataCollectionID')"]},"execution_count":24,"metadata":{},"output_type":"execute_result"}],"source":["nz_df.index.unique()"]},{"cell_type":"markdown","metadata":{},"source":["New Zealand has 12 unique data collections.\n","\n","We can look at 
the `fieldCategory` column to understand each category better."]},{"cell_type":"code","execution_count":25,"metadata":{},"outputs":[{"data":{"text/plain":["array(['2020 Population Totals (MBR)',\n"," '2020 Male Population Totals (MBR)',\n"," '2020 Female Population Totals (MBR)',\n"," '2020 Educational Attainment (MBR)',\n"," '2020 Households by Income (MBR)', '2020 Households by Type (MBR)',\n"," '2020 Household Totals (MBR)', '2020 Marital Status (MBR)',\n"," '2020 Purchasing Power (MBR)', 'Key Demographic Indicators',\n"," 'Age: 5 Year Increments',\n"," '2020 Food & Beverage Expenditures (MBR)',\n"," '2020 Alcoholic Beverage Expenditures (MBR)',\n"," '2020 Tobacco Expenditures (MBR)',\n"," '2020 Clothing Expenditures (MBR)',\n"," '2020 Footwear Expenditures (MBR)',\n"," '2020 Furniture & Furnishing Expenditures (MBR)',\n"," '2020 Household Textiles Expenditures (MBR)',\n"," '2020 Household Appliances Expenditures (MBR)',\n"," '2020 Household Utensils Expenditures (MBR)',\n"," '2020 House & Garden Expenditures (MBR)',\n"," '2020 Household Maintenance Expenditures (MBR)',\n"," '2020 Medical Products & Supplies Expenditures (MBR)',\n"," '2020 Consumer Electronics Expenditures (MBR)',\n"," '2020 Recreation & Culture Durable Expenditures (MBR)',\n"," '2020 Entertainment Expenditures (MBR)',\n"," '2020 Recreational & Cultural Service Expenditures (MBR)',\n"," '2020 Books & Stationery Expenditures (MBR)',\n"," '2020 Catering Services Expenditures (MBR)',\n"," '2020 Personal Care Expenditures (MBR)',\n"," '2020 Jewelry & Personal Effects Expenditures (MBR)'], dtype=object)"]},"execution_count":25,"metadata":{},"output_type":"execute_result"}],"source":["nz_df.fieldCategory.unique()"]},{"cell_type":"markdown","metadata":{},"source":["Looking at `fieldCategory` is a great way to clearly understand what the data collection is about. 
However, to query a data collection, its unique ID (`dataCollectionID`) must be used."]},{"cell_type":"markdown","metadata":{},"source":["#### Data Collections for Socio-demographic Factors"]},{"cell_type":"markdown","metadata":{},"source":["New Zealand has fewer data collections than the U.S. (12 versus 115). Let's look at data collections for Key Facts, Education, and Spending."]},{"cell_type":"markdown","metadata":{},"source":["__Data Collection for Key Facts__"]},{"cell_type":"code","execution_count":26,"metadata":{},"outputs":[{"data":{"text/html":["
<div>\n","<table border=\"1\" class=\"dataframe\">\n","<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>analysisVariable</th><th>alias</th><th>fieldCategory</th><th>vintage</th></tr>\n",
"<tr><th>dataCollectionID</th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n","<tbody>\n",
"<tr><th>KeyGlobalFacts</th><td>KeyGlobalFacts.TOTPOP</td><td>Total Population</td><td>Key Demographic Indicators</td><td>NaN</td></tr>\n",
"<tr><th>KeyGlobalFacts</th><td>KeyGlobalFacts.TOTHH</td><td>Total Households</td><td>Key Demographic Indicators</td><td>NaN</td></tr>\n",
"<tr><th>KeyGlobalFacts</th><td>KeyGlobalFacts.AVGHHSZ</td><td>Average Household Size</td><td>Key Demographic Indicators</td><td>NaN</td></tr>\n",
"<tr><th>KeyGlobalFacts</th><td>KeyGlobalFacts.TOTMALES</td><td>Male Population</td><td>Age: 5 Year Increments</td><td>NaN</td></tr>\n",
"<tr><th>KeyGlobalFacts</th><td>KeyGlobalFacts.TOTFEMALES</td><td>Female Population</td><td>Age: 5 Year Increments</td><td>NaN</td></tr>\n",
"</tbody>\n","</table>\n","</div>
"],"text/plain":[" analysisVariable alias \\\n","dataCollectionID \n","KeyGlobalFacts KeyGlobalFacts.TOTPOP Total Population \n","KeyGlobalFacts KeyGlobalFacts.TOTHH Total Households \n","KeyGlobalFacts KeyGlobalFacts.AVGHHSZ Average Household Size \n","KeyGlobalFacts KeyGlobalFacts.TOTMALES Male Population \n","KeyGlobalFacts KeyGlobalFacts.TOTFEMALES Female Population \n","\n"," fieldCategory vintage \n","dataCollectionID \n","KeyGlobalFacts Key Demographic Indicators NaN \n","KeyGlobalFacts Key Demographic Indicators NaN \n","KeyGlobalFacts Key Demographic Indicators NaN \n","KeyGlobalFacts Age: 5 Year Increments NaN \n","KeyGlobalFacts Age: 5 Year Increments NaN "]},"execution_count":26,"metadata":{},"output_type":"execute_result"}],"source":["nz_df.loc['KeyGlobalFacts']"]},{"cell_type":"markdown","metadata":{},"source":["__Data Collection for Education__"]},{"cell_type":"code","execution_count":27,"metadata":{},"outputs":[{"data":{"text/html":["
<div>\n","<table border=\"1\" class=\"dataframe\">\n","<thead>\n",
"<tr style=\"text-align: right;\"><th></th><th>analysisVariable</th><th>alias</th><th>fieldCategory</th><th>vintage</th></tr>\n",
"<tr><th>dataCollectionID</th><th></th><th></th><th></th><th></th></tr>\n",
"</thead>\n","<tbody>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC01A_CY</td><td>2020 Pop 15+/Edu: No Qualification</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC02A_CY</td><td>2020 Pop 15+/Edu: Level 1</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC03A_CY</td><td>2020 Pop 15+/Edu: Level 2</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC04A_CY</td><td>2020 Pop 15+/Edu: Level 3</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC05A_CY</td><td>2020 Pop 15+/Edu: Level 4</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC06B_CY</td><td>2020 Pop 15+/Edu: Level 5 Diploma</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC07A_CY</td><td>2020 Pop 15+/Edu: Level 6 Diploma</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC08A_CY</td><td>2020 Pop 15+/Edu: Bachelor Degree</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC09A_CY</td><td>2020 Pop 15+/Edu: Post-graduate and Honours de...</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC10A_CY</td><td>2020 Pop 15+/Edu: Master's Degree</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC11A_CY</td><td>2020 Pop 15+/Edu: Doctorate</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC12A_CY</td><td>2020 Pop 15+/Edu: Overseas Secondary School</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"<tr><th>EducationalAttainment</th><td>EducationalAttainment.EDUC13_CY</td><td>2020 Pop 15+/Edu: Not Included Elsewhere</td><td>2020 Educational Attainment (MBR)</td><td>2020</td></tr>\n",
"</tbody>\n","</table>\n","</div>
"],"text/plain":[" analysisVariable \\\n","dataCollectionID \n","EducationalAttainment EducationalAttainment.EDUC01A_CY \n","EducationalAttainment EducationalAttainment.EDUC02A_CY \n","EducationalAttainment EducationalAttainment.EDUC03A_CY \n","EducationalAttainment EducationalAttainment.EDUC04A_CY \n","EducationalAttainment EducationalAttainment.EDUC05A_CY \n","EducationalAttainment EducationalAttainment.EDUC06B_CY \n","EducationalAttainment EducationalAttainment.EDUC07A_CY \n","EducationalAttainment EducationalAttainment.EDUC08A_CY \n","EducationalAttainment EducationalAttainment.EDUC09A_CY \n","EducationalAttainment EducationalAttainment.EDUC10A_CY \n","EducationalAttainment EducationalAttainment.EDUC11A_CY \n","EducationalAttainment EducationalAttainment.EDUC12A_CY \n","EducationalAttainment EducationalAttainment.EDUC13_CY \n","\n"," alias \\\n","dataCollectionID \n","EducationalAttainment 2020 Pop 15+/Edu: No Qualification \n","EducationalAttainment 2020 Pop 15+/Edu: Level 1 \n","EducationalAttainment 2020 Pop 15+/Edu: Level 2 \n","EducationalAttainment 2020 Pop 15+/Edu: Level 3 \n","EducationalAttainment 2020 Pop 15+/Edu: Level 4 \n","EducationalAttainment 2020 Pop 15+/Edu: Level 5 Diploma \n","EducationalAttainment 2020 Pop 15+/Edu: Level 6 Diploma \n","EducationalAttainment 2020 Pop 15+/Edu: Bachelor Degree \n","EducationalAttainment 2020 Pop 15+/Edu: Post-graduate and Honours de... 
\n","EducationalAttainment 2020 Pop 15+/Edu: Master's Degree \n","EducationalAttainment 2020 Pop 15+/Edu: Doctorate \n","EducationalAttainment 2020 Pop 15+/Edu: Overseas Secondary School \n","EducationalAttainment 2020 Pop 15+/Edu: Not Included Elsewhere \n","\n"," fieldCategory vintage \n","dataCollectionID \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 \n","EducationalAttainment 2020 Educational Attainment (MBR) 2020 "]},"execution_count":27,"metadata":{},"output_type":"execute_result"}],"source":["nz_df.loc['EducationalAttainment']"]},{"cell_type":"markdown","metadata":{},"source":["__Data Collection for Spending__"]},{"cell_type":"code","execution_count":28,"metadata":{},"outputs":[{"data":{"text/html":["
"],"text/plain":[" analysisVariable alias \\\n","dataCollectionID \n","Spending Spending.CS01_CY 2020 Food & Beverage: Total \n","Spending Spending.CS01PRM_CY 2020 Food & Beverage: Per Mill \n","Spending Spending.CSPC01_CY 2020 Food & Beverage: Per Capita \n","Spending Spending.CS01IDX_CY 2020 Food & Beverage: Index \n","Spending Spending.CS02_CY 2020 Alcoholic Beverage: Total \n","... ... ... \n","Spending Spending.CS19IDX_CY 2020 Personal Care: Index \n","Spending Spending.CS20_CY 2020 Personal Effects: Total \n","Spending Spending.CS20PRM_CY 2020 Personal Effects: Per Mill \n","Spending Spending.CSPC20_CY 2020 Personal Effects: Per Capita \n","Spending Spending.CS20IDX_CY 2020 Personal Effects: Index \n","\n"," fieldCategory vintage \n","dataCollectionID \n","Spending 2020 Food & Beverage Expenditures (MBR) 2020 \n","Spending 2020 Food & Beverage Expenditures (MBR) 2020 \n","Spending 2020 Food & Beverage Expenditures (MBR) 2020 \n","Spending 2020 Food & Beverage Expenditures (MBR) 2020 \n","Spending 2020 Alcoholic Beverage Expenditures (MBR) 2020 \n","... ... ... \n","Spending 2020 Personal Care Expenditures (MBR) 2020 \n","Spending 2020 Jewelry & Personal Effects Expenditures (... 2020 \n","Spending 2020 Jewelry & Personal Effects Expenditures (... 2020 \n","Spending 2020 Jewelry & Personal Effects Expenditures (... 2020 \n","Spending 2020 Jewelry & Personal Effects Expenditures (... 2020 \n","\n","[80 rows x 4 columns]"]},"execution_count":28,"metadata":{},"output_type":"execute_result"}],"source":["nz_df.loc['Spending']"]},{"cell_type":"markdown","metadata":{},"source":["#### Analysis variables for Data Collections\n","Once we know the data collection we would like to use, we can look at all the unique variables available in that data collection using its unique ID. 
Let's discover `analysisVariable`s for some of the data collections we looked at earlier."]},{"cell_type":"markdown","metadata":{},"source":["__Analysis variables for `KeyGlobalFacts` data collection__"]},{"cell_type":"code","execution_count":29,"metadata":{},"outputs":[{"data":{"text/plain":["array(['KeyGlobalFacts.TOTPOP', 'KeyGlobalFacts.TOTHH',\n"," 'KeyGlobalFacts.AVGHHSZ', 'KeyGlobalFacts.TOTMALES',\n"," 'KeyGlobalFacts.TOTFEMALES'], dtype=object)"]},"execution_count":29,"metadata":{},"output_type":"execute_result"}],"source":["nz_df.loc['KeyGlobalFacts']['analysisVariable'].unique()"]},{"cell_type":"markdown","metadata":{},"source":["__Analysis variables for `EducationalAttainment` data collection__"]},{"cell_type":"code","execution_count":30,"metadata":{},"outputs":[{"data":{"text/plain":["array(['EducationalAttainment.EDUC01A_CY',\n"," 'EducationalAttainment.EDUC02A_CY',\n"," 'EducationalAttainment.EDUC03A_CY',\n"," 'EducationalAttainment.EDUC04A_CY',\n"," 'EducationalAttainment.EDUC05A_CY',\n"," 'EducationalAttainment.EDUC06B_CY',\n"," 'EducationalAttainment.EDUC07A_CY',\n"," 'EducationalAttainment.EDUC08A_CY',\n"," 'EducationalAttainment.EDUC09A_CY',\n"," 'EducationalAttainment.EDUC10A_CY',\n"," 'EducationalAttainment.EDUC11A_CY',\n"," 'EducationalAttainment.EDUC12A_CY',\n"," 'EducationalAttainment.EDUC13_CY'], dtype=object)"]},"execution_count":30,"metadata":{},"output_type":"execute_result"}],"source":["nz_df.loc['EducationalAttainment']['analysisVariable'].unique()"]},{"cell_type":"markdown","metadata":{},"source":["__Analysis variables for `Spending` data collection__"]},{"cell_type":"code","execution_count":31,"metadata":{},"outputs":[{"data":{"text/plain":["array(['Spending.CS01_CY', 'Spending.CS01PRM_CY', 'Spending.CSPC01_CY',\n"," 'Spending.CS01IDX_CY', 'Spending.CS02_CY', 'Spending.CS02PRM_CY',\n"," 'Spending.CSPC02_CY', 'Spending.CS02IDX_CY', 'Spending.CS03_CY',\n"," 'Spending.CS03PRM_CY', 'Spending.CSPC03_CY', 'Spending.CS03IDX_CY',\n"," 
'Spending.CS04_CY', 'Spending.CS04PRM_CY', 'Spending.CSPC04_CY',\n"," 'Spending.CS04IDX_CY', 'Spending.CS05_CY', 'Spending.CS05PRM_CY',\n"," 'Spending.CSPC05_CY', 'Spending.CS05IDX_CY', 'Spending.CS06_CY',\n"," 'Spending.CS06PRM_CY', 'Spending.CSPC06_CY', 'Spending.CS06IDX_CY',\n"," 'Spending.CS07_CY', 'Spending.CS07PRM_CY', 'Spending.CSPC07_CY',\n"," 'Spending.CS07IDX_CY', 'Spending.CS08_CY', 'Spending.CS08PRM_CY',\n"," 'Spending.CSPC08_CY', 'Spending.CS08IDX_CY', 'Spending.CS09_CY',\n"," 'Spending.CS09PRM_CY', 'Spending.CSPC09_CY', 'Spending.CS09IDX_CY',\n"," 'Spending.CS10_CY', 'Spending.CS10PRM_CY', 'Spending.CSPC10_CY',\n"," 'Spending.CS10IDX_CY', 'Spending.CS11_CY', 'Spending.CS11PRM_CY',\n"," 'Spending.CSPC11_CY', 'Spending.CS11IDX_CY', 'Spending.CS12_CY',\n"," 'Spending.CS12PRM_CY', 'Spending.CSPC12_CY', 'Spending.CS12IDX_CY',\n"," 'Spending.CS13_CY', 'Spending.CS13PRM_CY', 'Spending.CSPC13_CY',\n"," 'Spending.CS13IDX_CY', 'Spending.CS14_CY', 'Spending.CS14PRM_CY',\n"," 'Spending.CSPC14_CY', 'Spending.CS14IDX_CY', 'Spending.CS15_CY',\n"," 'Spending.CS15PRM_CY', 'Spending.CSPC15_CY', 'Spending.CS15IDX_CY',\n"," 'Spending.CS16_CY', 'Spending.CS16PRM_CY', 'Spending.CSPC16_CY',\n"," 'Spending.CS16IDX_CY', 'Spending.CS17_CY', 'Spending.CS17PRM_CY',\n"," 'Spending.CSPC17_CY', 'Spending.CS17IDX_CY', 'Spending.CS18_CY',\n"," 'Spending.CS18PRM_CY', 'Spending.CSPC18_CY', 'Spending.CS18IDX_CY',\n"," 'Spending.CS19_CY', 'Spending.CS19PRM_CY', 'Spending.CSPC19_CY',\n"," 'Spending.CS19IDX_CY', 'Spending.CS20_CY', 'Spending.CS20PRM_CY',\n"," 'Spending.CSPC20_CY', 'Spending.CS20IDX_CY'], dtype=object)"]},"execution_count":31,"metadata":{},"output_type":"execute_result"}],"source":["nz_df.loc['Spending']['analysisVariable'].unique()"]},{"cell_type":"markdown","metadata":{},"source":["### Perform Enrichment using Data Collections and Analysis Variables"]},{"cell_type":"markdown","metadata":{},"source":["Data Collections can be used to enrich various study areas. 
`data_collection`s and `analysis_variable`s can be passed to the `enrich()` method. Details about enriching study areas can be found in the __Enriching Study Areas__ section. \n","\n","Let's look at a few similar examples of GeoEnrichment here."]},{"cell_type":"markdown","metadata":{},"source":["#### Enrich using Data Collections"]},{"cell_type":"markdown","metadata":{},"source":["__Enrich with `Age` data collection__\n","\n","Here we see an address being enriched with data from the `Age` data collection."]},{"cell_type":"code","execution_count":32,"metadata":{},"outputs":[],"source":["# Enriching a single address as single-line input\n","age_coll = enrich(study_areas=[\"380 New York St Redlands CA 92373\"], \n"," data_collections=['Age'])"]},{"cell_type":"code","execution_count":33,"metadata":{},"outputs":[{"data":{"text/html":["
"],"text/plain":[" source_country x y area_type buffer_units \\\n","0 USA -117.19479 34.057265 RingBuffer esriMiles \n","\n"," buffer_units_alias buffer_radii \\\n","0 Miles 1.0 \n","\n"," aggregation_method \\\n","0 BlockApportionment:US.BlockGroups;PointsLayer:... \n","\n"," population_to_polygon_size_rating apportionment_confidence ... fem45 \\\n","0 2.191 2.576 ... 366.0 \n","\n"," fem50 fem55 fem60 fem65 fem70 fem75 fem80 fem85 \\\n","0 392.0 365.0 345.0 322.0 277.0 168.0 103.0 132.0 \n","\n"," SHAPE \n","0 {\"rings\": [[[-117.19479001927878, 34.071773611... \n","\n","[1 rows x 48 columns]"]},"execution_count":33,"metadata":{},"output_type":"execute_result"}],"source":["age_coll"]},{"cell_type":"code","execution_count":34,"metadata":{},"outputs":[{"data":{"text/plain":["Index(['source_country', 'x', 'y', 'area_type', 'buffer_units',\n"," 'buffer_units_alias', 'buffer_radii', 'aggregation_method',\n"," 'population_to_polygon_size_rating', 'apportionment_confidence',\n"," 'has_data', 'male0', 'male5', 'male10', 'male15', 'male20', 'male25',\n"," 'male30', 'male35', 'male40', 'male45', 'male50', 'male55', 'male60',\n"," 'male65', 'male70', 'male75', 'male80', 'male85', 'fem0', 'fem5',\n"," 'fem10', 'fem15', 'fem20', 'fem25', 'fem30', 'fem35', 'fem40', 'fem45',\n"," 'fem50', 'fem55', 'fem60', 'fem65', 'fem70', 'fem75', 'fem80', 'fem85',\n"," 'SHAPE'],\n"," dtype='object')"]},"execution_count":34,"metadata":{},"output_type":"execute_result"}],"source":["age_coll.columns"]},{"cell_type":"markdown","metadata":{},"source":["When a data collection is specified without specific analysis variables, all variables under the data collection are used for enrichment as can be seen above."]},{"cell_type":"markdown","metadata":{},"source":["__Enrich with `Health` data collection__\n","\n","Here we see a zip code being enriched by data from Health data collection."]},{"cell_type":"code","execution_count":35,"metadata":{},"outputs":[],"source":["redlands = 
usa.subgeographies.states['California'].zip5['92373']"]},{"cell_type":"code","execution_count":36,"metadata":{},"outputs":[],"source":["redlands_df = enrich(study_areas=[redlands], data_collections=['Health'] )"]},{"cell_type":"code","execution_count":37,"metadata":{},"outputs":[{"data":{"text/html":["
"],"text/plain":[" std_geography_level std_geography_name std_geography_id source_country \\\n","0 US.ZIP5 Redlands 92373 USA \n","\n"," aggregation_method population_to_polygon_size_rating \\\n","0 Query:US.ZIP5 2.191 \n","\n"," apportionment_confidence has_data rel65_hi2_oc acscivnins ... \\\n","0 2.576 1 2.0 31157.0 ... \n","\n"," pop85_cy pop18up_cy pop21up_cy medage_cy hhu18_c10 medhinc_cy \\\n","0 1205.0 28208.0 27076.0 42.2 3851.0 91009.0 \n","\n"," s27_bus s27_sales s27_emp \\\n","0 224.0 306371.0 4093.0 \n","\n"," SHAPE \n","0 {\"rings\": [[[-117.16767396036383, 33.976847519... \n","\n","[1 rows x 431 columns]"]},"execution_count":37,"metadata":{},"output_type":"execute_result"}],"source":["redlands_df"]},{"cell_type":"code","execution_count":38,"metadata":{},"outputs":[{"data":{"text/plain":["Index(['std_geography_level', 'std_geography_name', 'std_geography_id',\n"," 'source_country', 'aggregation_method',\n"," 'population_to_polygon_size_rating', 'apportionment_confidence',\n"," 'has_data', 'rel65_hi2_oc', 'acscivnins',\n"," ...\n"," 'pop85_cy', 'pop18up_cy', 'pop21up_cy', 'medage_cy', 'hhu18_c10',\n"," 'medhinc_cy', 's27_bus', 's27_sales', 's27_emp', 'SHAPE'],\n"," dtype='object', length=431)"]},"execution_count":38,"metadata":{},"output_type":"execute_result"}],"source":["redlands_df.columns"]},{"cell_type":"markdown","metadata":{},"source":["#### Enrich using Analysis Variables\n","\n","Data can be enriched by specifying specific analysis variables of a data collection with which we want to enrich our data. 
In this example, we will look at the `analysis_variables` for the `Age` `data_collection` and then use specific analysis variables to `enrich()` a study area."]},{"cell_type":"code","execution_count":39,"metadata":{},"outputs":[{"data":{"text/plain":["array(['Age.MALE0', 'Age.MALE5', 'Age.MALE10', 'Age.MALE15', 'Age.MALE20',\n"," 'Age.MALE25', 'Age.MALE30', 'Age.MALE35', 'Age.MALE40',\n"," 'Age.MALE45', 'Age.MALE50', 'Age.MALE55', 'Age.MALE60',\n"," 'Age.MALE65', 'Age.MALE70', 'Age.MALE75', 'Age.MALE80',\n"," 'Age.MALE85', 'Age.FEM0', 'Age.FEM5', 'Age.FEM10', 'Age.FEM15',\n"," 'Age.FEM20', 'Age.FEM25', 'Age.FEM30', 'Age.FEM35', 'Age.FEM40',\n"," 'Age.FEM45', 'Age.FEM50', 'Age.FEM55', 'Age.FEM60', 'Age.FEM65',\n"," 'Age.FEM70', 'Age.FEM75', 'Age.FEM80', 'Age.FEM85'], dtype=object)"]},"execution_count":39,"metadata":{},"output_type":"execute_result"}],"source":["# Unique analysis variables for Age data collection\n","usa = Country.get('US')\n","usa.data_collections.loc['Age']['analysisVariable'].unique()"]},{"cell_type":"markdown","metadata":{},"source":["Now, we will enrich our study area with the `Age.FEM45`, `Age.FEM55`, and `Age.FEM65` variables."]},{"cell_type":"code","execution_count":40,"metadata":{},"outputs":[{"data":{"text/html":["
"],"text/plain":[" source_country x y area_type buffer_units \\\n","0 USA -117.19479 34.057265 RingBuffer esriMiles \n","\n"," buffer_units_alias buffer_radii \\\n","0 Miles 1.0 \n","\n"," aggregation_method \\\n","0 BlockApportionment:US.BlockGroups;PointsLayer:... \n","\n"," population_to_polygon_size_rating apportionment_confidence has_data \\\n","0 2.191 2.576 1 \n","\n"," fem45 fem55 fem65 SHAPE \n","0 366.0 365.0 322.0 {\"rings\": [[[-117.19479001927878, 34.071773611... "]},"execution_count":40,"metadata":{},"output_type":"execute_result"}],"source":["enrich(study_areas=[\"380 New York St Redlands CA 92373\"], \n"," analysis_variables=[\"Age.FEM45\",\"Age.FEM55\",\"Age.FEM65\"])"]},{"cell_type":"markdown","metadata":{},"source":["## Enriching Spatially Enabled Dataframes"]},{"cell_type":"markdown","metadata":{},"source":["One of the most common use cases for GeoEnrichment is enriching existing data in feature layers. As a user, you may need to analyze and enrich data that already exists in feature layers. A Spatially Enabled DataFrame (SeDF) helps us bring the data from a layer into a dataframe, which can then be GeoEnriched. \n","\n","Let's look at an example using an existing layer of a Covid-19 dataset. This feature layer includes the latest Covid-19 Cases, Recovered, and Deaths data for the U.S. 
at the county level."]},{"cell_type":"code","execution_count":41,"metadata":{},"outputs":[{"name":"stdout","output_type":"stream","text":["\n"]},{"data":{"text/plain":[""]},"execution_count":41,"metadata":{},"output_type":"execute_result"}],"source":["# Get the layer\n","gis = GIS(set_active=False)\n","covid_item = gis.content.get('628578697fb24d8ea4c32fa0c5ae1843')\n","print(covid_item)\n","covid_layer = covid_item.layers[0]\n","covid_layer"]},{"cell_type":"markdown","metadata":{},"source":["We can query the layer as a dataframe and then use the dataframe for enrichment."]},{"cell_type":"code","execution_count":42,"metadata":{},"outputs":[{"data":{"text/plain":["(3272, 19)"]},"execution_count":42,"metadata":{},"output_type":"execute_result"}],"source":["covid_df = covid_layer.query(as_df=True)\n","covid_df.shape"]},{"cell_type":"code","execution_count":43,"metadata":{},"outputs":[{"data":{"text/html":["
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
OBJECTIDProvince_StateCountry_RegionLast_UpdateLatLong_ConfirmedRecoveredDeathsActiveAdmin2FIPSCombined_KeyIncident_RatePeople_TestedPeople_HospitalizedUIDISO3SHAPE
01AlabamaUS2022-10-27 17:22:3432.539527-86.64408218511<NA>228<NA>Autauga01001Autauga, Alabama, US33132.864379<NA><NA>84001001USA{\"x\": -86.64408226999996, \"y\": 32.539527450000...
12AlabamaUS2022-10-27 17:22:3430.72775-87.72207165973<NA>716<NA>Baldwin01003Baldwin, Alabama, US29553.293853<NA><NA>84001003USA{\"x\": -87.72207057999998, \"y\": 30.727749910000...
23AlabamaUS2022-10-27 17:22:3431.868263-85.3871296930<NA>103<NA>Barbour01005Barbour, Alabama, US28072.591752<NA><NA>84001005USA{\"x\": -85.38712859999998, \"y\": 31.868263000000...
34AlabamaUS2022-10-27 17:22:3432.996421-87.1251157575<NA>108<NA>Bibb01007Bibb, Alabama, US33826.024828<NA><NA>84001007USA{\"x\": -87.12511459999996, \"y\": 32.996420640000...
45AlabamaUS2022-10-27 17:22:3433.982109-86.56790617320<NA>258<NA>Blount01009Blount, Alabama, US29951.92474<NA><NA>84001009USA{\"x\": -86.56790592999994, \"y\": 33.982109180000...
\n","
"],"text/plain":[" OBJECTID Province_State Country_Region Last_Update Lat \\\n","0 1 Alabama US 2022-10-27 17:22:34 32.539527 \n","1 2 Alabama US 2022-10-27 17:22:34 30.72775 \n","2 3 Alabama US 2022-10-27 17:22:34 31.868263 \n","3 4 Alabama US 2022-10-27 17:22:34 32.996421 \n","4 5 Alabama US 2022-10-27 17:22:34 33.982109 \n","\n"," Long_ Confirmed Recovered Deaths Active Admin2 FIPS \\\n","0 -86.644082 18511 228 Autauga 01001 \n","1 -87.722071 65973 716 Baldwin 01003 \n","2 -85.387129 6930 103 Barbour 01005 \n","3 -87.125115 7575 108 Bibb 01007 \n","4 -86.567906 17320 258 Blount 01009 \n","\n"," Combined_Key Incident_Rate People_Tested People_Hospitalized \\\n","0 Autauga, Alabama, US 33132.864379 \n","1 Baldwin, Alabama, US 29553.293853 \n","2 Barbour, Alabama, US 28072.591752 \n","3 Bibb, Alabama, US 33826.024828 \n","4 Blount, Alabama, US 29951.92474 \n","\n"," UID ISO3 SHAPE \n","0 84001001 USA {\"x\": -86.64408226999996, \"y\": 32.539527450000... \n","1 84001003 USA {\"x\": -87.72207057999998, \"y\": 30.727749910000... \n","2 84001005 USA {\"x\": -85.38712859999998, \"y\": 31.868263000000... \n","3 84001007 USA {\"x\": -87.12511459999996, \"y\": 32.996420640000... \n","4 84001009 USA {\"x\": -86.56790592999994, \"y\": 33.982109180000... 
"]},"execution_count":43,"metadata":{},"output_type":"execute_result"}],"source":["covid_df.head()"]},{"cell_type":"markdown","metadata":{},"source":["To showcase GeoEnrichment, we will create a subset of the original data and then `enrich()` the subset."]},{"cell_type":"code","execution_count":44,"metadata":{},"outputs":[{"data":{"text/plain":["(100, 19)"]},"execution_count":44,"metadata":{},"output_type":"execute_result"}],"source":["# Create subset\n","test_df = covid_df.iloc[:100].copy()\n","test_df.shape"]},{"cell_type":"code","execution_count":45,"metadata":{},"outputs":[{"data":{"text/plain":["['point', None]"]},"execution_count":45,"metadata":{},"output_type":"execute_result"}],"source":["# Check geometry\n","test_df.spatial.geometry_type"]},{"cell_type":"markdown","metadata":{},"source":["A dataframe can be passed as the value of the `study_areas` parameter of the `enrich()` method. Here we are enriching our dataframe with specific variables from the `Age` data collection."]},{"cell_type":"code","execution_count":null,"metadata":{},"outputs":[],"source":["# Enrich dataframe\n","new_df = enrich(study_areas=test_df.spatial, \n"," analysis_variables=[\"Age.FEM45\",\"Age.FEM55\",\"Age.FEM65\"])"]},{"cell_type":"code","execution_count":44,"metadata":{},"outputs":[{"data":{"text/html":["
"],"text/plain":[" ID OBJECTID_0 sourceCountry Long_ Recovered Country_Region FIPS \\\n","0 0 1 US -82.461707 0 US 45001 \n","1 1 2 US -92.414197 0 US 22001 \n","2 2 3 US -75.632346 0 US 51001 \n","3 3 4 US -116.241552 0 US 16001 \n","4 4 5 US -94.471059 0 US 19001 \n","\n"," Last_Update Combined_Key ISO3 ... bufferUnitsAlias \\\n","0 1596857725000 Abbeville, South Carolina, US USA ... Miles \n","1 1596857725000 Acadia, Louisiana, US USA ... Miles \n","2 1596857725000 Accomack, Virginia, US USA ... Miles \n","3 1596857725000 Ada, Idaho, US USA ... Miles \n","4 1596857725000 Adair, Iowa, US USA ... Miles \n","\n"," bufferRadii aggregationMethod \\\n","0 1 BlockApportionment:US.BlockGroups \n","1 1 BlockApportionment:US.BlockGroups \n","2 1 BlockApportionment:US.BlockGroups \n","3 1 BlockApportionment:US.BlockGroups \n","4 1 BlockApportionment:US.BlockGroups \n","\n"," populationToPolygonSizeRating apportionmentConfidence HasData FEM45 \\\n","0 2.191 2.576 1 2 \n","1 2.191 2.576 1 3 \n","2 2.191 2.576 1 14 \n","3 2.191 2.576 1 0 \n","4 2.191 2.576 1 1 \n","\n"," FEM55 FEM65 SHAPE \n","0 2 2 {\"rings\": [[[-82.46170657999994, 34.2378420028... \n","1 3 2 {\"rings\": [[[-92.41419697999997, 30.3095821536... \n","2 16 14 {\"rings\": [[[-75.63234615, 37.781571251121655]... \n","3 0 0 {\"rings\": [[[-116.24155159999998, 43.467142851... \n","4 1 1 {\"rings\": [[[-94.47105873999998, 41.3452468224... 
\n","\n","[5 rows x 31 columns]"]},"execution_count":44,"metadata":{},"output_type":"execute_result"}],"source":["new_df.head()"]},{"cell_type":"code","execution_count":45,"metadata":{},"outputs":[],"source":["new_df.drop(['OBJECTID_0', 'ID','Last_Update'], axis=1, inplace=True)"]},{"cell_type":"code","execution_count":52,"metadata":{},"outputs":[{"data":{"text/plain":["(91, 28)"]},"execution_count":52,"metadata":{},"output_type":"execute_result"}],"source":["# Check shape\n","new_df.shape"]},{"cell_type":"markdown","metadata":{},"source":["We can see that enrichment returned 91 records and 31 columns. Enrichment information is not available for some areas in our dataframe, so we have 91 records instead of 100. GeoEnrichment adds some additional columns along with the analysis variables we requested, which is why we see 31 columns; dropping the duplicate and unnecessary columns brings the count down to 28."]},{"cell_type":"markdown","metadata":{},"source":["### Visualize on a Map\n","\n","Let's visualize the enriched dataframe on a map. We will use the `FEM65` column to classify our data for plotting on the map."]},{"cell_type":"code","execution_count":67,"metadata":{},"outputs":[{"data":{"text/html":[""],"text/plain":[""]},"execution_count":67,"metadata":{},"output_type":"execute_result"}],"source":["covid_map = gis.map('USA')\n","covid_map"]},{"cell_type":"code","execution_count":61,"metadata":{},"outputs":[{"data":{"text/plain":["True"]},"execution_count":61,"metadata":{},"output_type":"execute_result"}],"source":["# Plot on a map\n","new_df.spatial.plot(map_widget=covid_map)"]},{"cell_type":"markdown","metadata":{},"source":["## Conclusion"]},{"cell_type":"markdown","metadata":{},"source":["In this part of the `arcgis.geoenrichment` module guide series, you saw how the `data_collections` property of a `Country` object lists its available `data_collection`s and `analysis_variable`s. 
You explored different data collections, their analysis variables and then enriched study areas using the same. Towards the end, you experienced how spatially enabled dataframes can be enriched.\n","\n","In the subsequent pages, you will learn about Generating Reports and Standard Geography Queries."]}],"metadata":{"anaconda-cloud":{},"kernelspec":{"display_name":"arcgispro-dev","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.9.11 [MSC v.1931 64 bit (AMD64)]"},"livereveal":{"scroll":true},"toc":{"base_numbering":1,"nav_menu":{},"number_sections":true,"sideBar":true,"skip_h1_title":true,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":true,"toc_position":{"height":"calc(100% - 180px)","left":"10px","top":"150px","width":"274px"},"toc_section_display":true,"toc_window_display":true},"vscode":{"interpreter":{"hash":"07c13af76457a6f4e5d5b34d0e1bc42b2e017343b05b916c8871f483eed35ce6"}}},"nbformat":4,"nbformat_minor":2} +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "slide" + } + }, + "source": [ + "# Part 4 - What to enrich with? (What are Data Collections and Analysis Variables?)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data Collections and GeoEnrichment coverage\n", + "\n", + "As described earlier, a data collection is a preassembled list of attributes that will be used to enrich the input features. Collection attributes can describe various types of information, such as demographic characteristics and geographic context of the locations or areas submitted as input features. \n", + "\n", + "Some data collections (such as default) can be used in all supported countries. Other data collections may only be available in one or a collection of countries. 
[Data Browser](https://doc.arcgis.com/en/esri-demographics/data/data-browser.htm) can be used to examine the entire global listing of variables and associated datasets for each country." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### List Countries with GeoEnrichment Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "slideshow": { + "slide_type": "subslide" + } + }, + "source": [ + "The `get_countries()` function queries the countries for which GeoEnrichment data is available. As the output of the cell below shows, it returns a Pandas DataFrame with one row per country, including ISO codes, available datasets, and hierarchies. This list can also be viewed [here](https://developers.arcgis.com/rest/geoenrichment/api-reference/geoenrichment-coverage.htm)." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "from arcgis.gis import GIS\n", + "from arcgis.geoenrichment import Country, enrich, get_countries" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "# Create a GIS Connection\n", + "gis = GIS(profile='your_online_profile')" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "slideshow": { + "slide_type": "fragment" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Number of countries for which GeoEnrichment data is available: 177\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
iso2iso3namealt_namedatasetsdefault_datasetcontinenthierarchiesdefault_hierarchy
0ALALBAlbaniaALBANIA[ALB_MBR_2024]ALB_MBR_2024Europe[census]census
1DZDZAAlgeriaALGERIA[DZA_MBR_2025]DZA_MBR_2025Africa[census]census
2ADANDAndorraANDORRA[AND_MBR_2024]AND_MBR_2024Europe[census]census
3AOAGOAngolaANGOLA[AGO_MBR_2025]AGO_MBR_2025Africa[census]census
4AIAIAAnguillaANGUILLA[AIA_MBR_2025]AIA_MBR_2025North America[census]census
5ARARGArgentinaARGENTINA[ARG_MBR_2024]ARG_MBR_2024South America[census]census
6AMARMArmeniaARMENIA[ARM_MBR_2024]ARM_MBR_2024Europe[census]census
7AWABWArubaARUBA[ABW_MBR_2025]ABW_MBR_2025North America[census]census
8AUAUSAustraliaAUSTRALIA[AUS_ABS_2021, AUS_MBR_2024]AUS_ABS_2021Oceania[AUS_ABS, census]AUS_ABS
9ATAUTAustriaAUSTRIA[AUT_MBR_2024]AUT_MBR_2024Europe[census]census
\n", + "
" + ], + "text/plain": [ + " iso2 iso3 name alt_name datasets \\\n", + "0 AL ALB Albania ALBANIA [ALB_MBR_2024] \n", + "1 DZ DZA Algeria ALGERIA [DZA_MBR_2025] \n", + "2 AD AND Andorra ANDORRA [AND_MBR_2024] \n", + "3 AO AGO Angola ANGOLA [AGO_MBR_2025] \n", + "4 AI AIA Anguilla ANGUILLA [AIA_MBR_2025] \n", + "5 AR ARG Argentina ARGENTINA [ARG_MBR_2024] \n", + "6 AM ARM Armenia ARMENIA [ARM_MBR_2024] \n", + "7 AW ABW Aruba ARUBA [ABW_MBR_2025] \n", + "8 AU AUS Australia AUSTRALIA [AUS_ABS_2021, AUS_MBR_2024] \n", + "9 AT AUT Austria AUSTRIA [AUT_MBR_2024] \n", + "\n", + " default_dataset continent hierarchies default_hierarchy \n", + "0 ALB_MBR_2024 Europe [census] census \n", + "1 DZA_MBR_2025 Africa [census] census \n", + "2 AND_MBR_2024 Europe [census] census \n", + "3 AGO_MBR_2025 Africa [census] census \n", + "4 AIA_MBR_2025 North America [census] census \n", + "5 ARG_MBR_2024 South America [census] census \n", + "6 ARM_MBR_2024 Europe [census] census \n", + "7 ABW_MBR_2025 North America [census] census \n", + "8 AUS_ABS_2021 Oceania [AUS_ABS, census] AUS_ABS \n", + "9 AUT_MBR_2024 Europe [census] census " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "countries = get_countries()\n", + "print(\"Number of countries for which GeoEnrichment data is available: \" + str(len(countries)))\n", + "\n", + "#print a few countries for a sample\n", + "countries[0:10]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data Collections for U.S.\n", + "\n", + "The `data_collections` property of a `Country` object lists its available data collections and analysis variables under each data collection as a Pandas dataframe.\n", + "\n", + "In order to discover the data collections for a particular country, you may first access the reference variable to it using the `country.get()` method, and then fetch the data collections from `country.data_collections` property. 
Once we know the data collection we would like to use, we can look at `analysisVariable`s available in that data collection." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "arcgis.geoenrichment.enrichment.Country" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Get US as a country\n", + "usa = Country.get('US')\n", + "type(usa)" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
analysisVariablealiasfieldCategoryvintage
dataCollectionID
1yearincrements1yearincrements.AGE0_CY2025 Population Age <12025 Age: 1 Year Increments (Esri)2025
1yearincrements1yearincrements.AGE1_CY2025 Population Age 12025 Age: 1 Year Increments (Esri)2025
1yearincrements1yearincrements.AGE2_CY2025 Population Age 22025 Age: 1 Year Increments (Esri)2025
1yearincrements1yearincrements.AGE3_CY2025 Population Age 32025 Age: 1 Year Increments (Esri)2025
1yearincrements1yearincrements.AGE4_CY2025 Population Age 42025 Age: 1 Year Increments (Esri)2025
\n", + "
" + ], + "text/plain": [ + " analysisVariable alias \\\n", + "dataCollectionID \n", + "1yearincrements 1yearincrements.AGE0_CY 2025 Population Age <1 \n", + "1yearincrements 1yearincrements.AGE1_CY 2025 Population Age 1 \n", + "1yearincrements 1yearincrements.AGE2_CY 2025 Population Age 2 \n", + "1yearincrements 1yearincrements.AGE3_CY 2025 Population Age 3 \n", + "1yearincrements 1yearincrements.AGE4_CY 2025 Population Age 4 \n", + "\n", + " fieldCategory vintage \n", + "dataCollectionID \n", + "1yearincrements 2025 Age: 1 Year Increments (Esri) 2025 \n", + "1yearincrements 2025 Age: 1 Year Increments (Esri) 2025 \n", + "1yearincrements 2025 Age: 1 Year Increments (Esri) 2025 \n", + "1yearincrements 2025 Age: 1 Year Increments (Esri) 2025 \n", + "1yearincrements 2025 Age: 1 Year Increments (Esri) 2025 " + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_df = usa.data_collections\n", + "\n", + "# print a few rows of the DataFrame\n", + "usa_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(21033, 4)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_df.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Unique Data Collections for U.S." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Each data collection and analysis variable has a unique ID. When calling the `enrich()` method (explained earlier in this guide) these analysis variables can be passed in the `data_collections` and `analysis_variables` parameters.\n", + "\n", + "As an example, here we see a subset of the data collections for US showing 2 different data collections and multiple analysis variables for each collection." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
analysisVariablealiasfieldCategoryvintage
dataCollectionID
1yearincrements1yearincrements.FAGE75_FY2030 Females Age 752030 Age: 1 Year Increments (Esri)2030
1yearincrements1yearincrements.FAGE76_FY2030 Females Age 762030 Age: 1 Year Increments (Esri)2030
1yearincrements1yearincrements.FAGE77_FY2030 Females Age 772030 Age: 1 Year Increments (Esri)2030
1yearincrements1yearincrements.FAGE78_FY2030 Females Age 782030 Age: 1 Year Increments (Esri)2030
1yearincrements1yearincrements.FAGE79_FY2030 Females Age 792030 Age: 1 Year Increments (Esri)2030
...............
1yearincrements1yearincrements.AGE18C202020 Population Age 182020 Age: 1 Year Increments (U.S. Census)2020
1yearincrements1yearincrements.AGE19C202020 Population Age 192020 Age: 1 Year Increments (U.S. Census)2020
1yearincrements1yearincrements.AGE20C202020 Population Age 202020 Age: 1 Year Increments (U.S. Census)2020
1yearincrements1yearincrements.AGE21C202020 Population Age 212020 Age: 1 Year Increments (U.S. Census)2020
1yearincrements1yearincrements.MLU20POP202020 Male Pop <202020 Age: 1 Year Increments (U.S. Census)2020
\n", + "

100 rows × 4 columns

\n", + "
" + ], + "text/plain": [ + " analysisVariable alias \\\n", + "dataCollectionID \n", + "1yearincrements 1yearincrements.FAGE75_FY 2030 Females Age 75 \n", + "1yearincrements 1yearincrements.FAGE76_FY 2030 Females Age 76 \n", + "1yearincrements 1yearincrements.FAGE77_FY 2030 Females Age 77 \n", + "1yearincrements 1yearincrements.FAGE78_FY 2030 Females Age 78 \n", + "1yearincrements 1yearincrements.FAGE79_FY 2030 Females Age 79 \n", + "... ... ... \n", + "1yearincrements 1yearincrements.AGE18C20 2020 Population Age 18 \n", + "1yearincrements 1yearincrements.AGE19C20 2020 Population Age 19 \n", + "1yearincrements 1yearincrements.AGE20C20 2020 Population Age 20 \n", + "1yearincrements 1yearincrements.AGE21C20 2020 Population Age 21 \n", + "1yearincrements 1yearincrements.MLU20POP20 2020 Male Pop <20 \n", + "\n", + " fieldCategory vintage \n", + "dataCollectionID \n", + "1yearincrements 2030 Age: 1 Year Increments (Esri) 2030 \n", + "1yearincrements 2030 Age: 1 Year Increments (Esri) 2030 \n", + "1yearincrements 2030 Age: 1 Year Increments (Esri) 2030 \n", + "1yearincrements 2030 Age: 1 Year Increments (Esri) 2030 \n", + "1yearincrements 2030 Age: 1 Year Increments (Esri) 2030 \n", + "... ... ... \n", + "1yearincrements 2020 Age: 1 Year Increments (U.S. Census) 2020 \n", + "1yearincrements 2020 Age: 1 Year Increments (U.S. Census) 2020 \n", + "1yearincrements 2020 Age: 1 Year Increments (U.S. Census) 2020 \n", + "1yearincrements 2020 Age: 1 Year Increments (U.S. Census) 2020 \n", + "1yearincrements 2020 Age: 1 Year Increments (U.S. Census) 2020 \n", + "\n", + "[100 rows x 4 columns]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_df.iloc[500:600,:]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The table above shows 2 different data collections (1yearincrements and 5yearincrements). 
Since these are `Age` data collections, the `analysisVariable`s for these collections are similar. `vintage` shows the year that the demographic data represents. For example, a vintage of 2020 means that the data represents the year 2020.\n", + "\n", + "Let's get a list of unique data collections that are available for the U.S." + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "121" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_df.index.nunique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*The United States has 121 unique data collections.* Here are the first 10 data collections.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['1yearincrements',\n", + " '5yearincrements',\n", + " 'Age',\n", + " 'agebyracebysex',\n", + " 'agebyracebysex2010',\n", + " 'agebyracebysex2020',\n", + " 'AgeDependency',\n", + " 'AtRisk',\n", + " 'AutomobilesAutomotiveProducts',\n", + " 'BabyProductsToysGames']" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(usa_df.index.unique())[:10]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Looking at `fieldCategory` is a great way to understand what a data collection is about. Each `fieldCategory` value combines the vintage with a descriptive name of the data and its source, for example `2025 Age: 1 Year Increments (Esri)`. However, to query a data collection, its unique ID (`dataCollectionID`) must be used.\n", + "\n", + "Let's look at the `fieldCategory` column for a few data collections in the U.S." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['2025 Age: 1 Year Increments (Esri)',\n", + " '2030 Age: 1 Year Increments (Esri)',\n", + " '2010 Age: 1 Year Increments (U.S. Census)',\n", + " '2020 Age: 1 Year Increments (U.S. Census)',\n", + " '2025 Age: 5 Year Increments (Esri)',\n", + " '2030 Age: 5 Year Increments (Esri)',\n", + " '2010 Age: 5 Year Increments (U.S. Census)',\n", + " '2019-2023 Age: 5 Year Increments (ACS)',\n", + " '2020 Age: 5 Year Increments (U.S. Census)',\n", + " '2025 Age by Sex by Race (Esri)'], dtype=object)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_df.fieldCategory.unique()[:10]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Data Collections by Socio-demographic Factors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can filter the `data_collections` DataFrame with Pandas expressions to get the collections for a specific factor. Let's look at data collections for different socio-demographic factors such as `Age`, `Population`, and `Income`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Data Collections for Age__" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['2025 Age: 1 Year Increments (Esri)',\n", + " '2030 Age: 1 Year Increments (Esri)',\n", + " '2010 Age: 1 Year Increments (U.S. Census)',\n", + " '2020 Age: 1 Year Increments (U.S. Census)',\n", + " '2025 Age: 5 Year Increments (Esri)',\n", + " '2030 Age: 5 Year Increments (Esri)',\n", + " '2010 Age: 5 Year Increments (U.S. Census)',\n", + " '2019-2023 Age: 5 Year Increments (ACS)',\n", + " '2020 Age: 5 Year Increments (U.S. 
Census)',\n", + " '2025 Age by Sex by Race (Esri)', '2030 Age by Sex by Race (Esri)',\n", + " '2010 Age by Sex by Race (U.S. 
Census)',\n", + " '2020 Age by Sex by Race (U.S. Census)',\n", + " '2025 Age Dependency (Esri)', '2030 Age Dependency (Esri)',\n", + " '2025 Disposable Income by Age (Esri)',\n", + " '2010 Households by Age of Householder (U.S. Census)',\n", + " '2019-2023 Households by Type and Size and Age (ACS)',\n", + " '2010 Housing by Age of Householder (U.S. Census)',\n", + " '2020 Housing by Age of Householder (U.S. Census)',\n", + " '2025 Income by Age (Esri)', '2030 Income by Age (Esri)',\n", + " '2019-2023 Income by Age (ACS)', 'Age: 5 Year Increments',\n", + " '2025 Net Worth by Age (Esri)',\n", + " '2019-2023 Females by Age of Children and Employment Status (ACS)'],\n", + " dtype=object)" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Age_Collections = usa_df['fieldCategory'].str.contains('Age', na=False)\n", + "usa_df[Age_Collections].fieldCategory.unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Data Collections for Population__" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['2010 Population (U.S. Census)', '2020 Population (U.S. Census)',\n", + " '2019-2023 Population by Language Spoken at Home (ACS)',\n", + " '2025 Daytime Population (Esri)',\n", + " '2025 Population by Generation (Esri)',\n", + " '2030 Population by Generation (Esri)',\n", + " '2020 Group Quarters Population (U.S. Census)',\n", + " '2010 Group Quarters Population (U.S. Census)',\n", + " '2020 Hispanic Population of Two or More Races (U.S. Census)',\n", + " '2020 Hispanic Population <18 Years by Race (U.S. Census)',\n", + " '2020 Hispanic Population 18+ Years by Race (U.S. Census)',\n", + " '2020 Hispanic Population 18+ Years of Two or More Races (U.S. Census)',\n", + " '2025 Population Time Series (Esri)',\n", + " '2010 Population by Relationship and Household Type (U.S. 
Census)',\n", + " '2019-2023 Population by Relationship and Household Type (ACS)',\n", + " '2020 Population by Relationship and Household Type (U.S. Census)',\n", + " '2025 Tapestry (Population)',\n", + " '2020 Non Hispanic Population 18+ Years by Race (U.S. Census)',\n", + " '2020 Non Hispanic Population 18+ Years of Two or More Races (U.S. Census)',\n", + " '2020 Non Hispanic Population <18 Years by Race (U.S. Census)',\n", + " '2020 Non Hispanic Population of Two or More Races (U.S. Census)',\n", + " '2025 Population (Esri)',\n", + " '2020 Population of Two or More Races (U.S. Census)',\n", + " '2020 Population <18 Years by Race (U.S. Census)',\n", + " '2020 Population 18+ Years by Race (U.S. Census)',\n", + " '2020 Population 18+ Years of Two or More Races (U.S. Census)',\n", + " '2025 Urbanicity (Population)'], dtype=object)" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Pop_Collections = usa_df['fieldCategory'].str.contains('Population', na=False)\n", + "usa_df[Pop_Collections].fieldCategory.unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Data Collections for Income__" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['1yearincrements', '5yearincrements', 'Age', 'agebyracebysex',\n", + " 'agebyracebysex2010', 'agebyracebysex2020', 'AgeDependency', 'AtRisk',\n", + " 'AutomobilesAutomotiveProducts', 'BabyProductsToysGames',\n", + " ...\n", + " 'unitsinstructure', 'urbanicity', 'UrbanicityLandarea', 'vacant',\n", + " 'vehiclesavailable', 'veterans', 'Wealth', 'women', 'yearbuilt',\n", + " 'yearmovedin'],\n", + " dtype='object', name='dataCollectionID', length=121)" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Income_Collections = usa_df['fieldCategory'].str.contains('Income', 
na=False)\n", + 
"Income_Collections.index.unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that `Income_Collections` is a boolean mask that shares its index with `usa_df`, so its index on its own (shown above) still lists all 121 data collections. As mentioned earlier, a data collection's unique ID (`dataCollectionID`) is the best way to query it further, so let's apply the mask to `usa_df` to get the `dataCollectionID`s of the Income data collections." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['AtRisk', 'basicFactsForMobileApps', 'disposableincome',\n", + " 'foodstampsSNAP', 'Health', 'householdincome', 'households',\n", + " 'incomebyage', 'KeyUSFacts', 'Policy', 'population', 'Wealth'],\n", + " dtype='object', name='dataCollectionID')" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_df[Income_Collections].index.unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Analysis variables for Data Collections\n", + "Once we know the data collection we would like to use, we can look at all the unique variables available in that data collection using its unique ID. Let's discover `analysisVariable`s for some of the data collections." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Analysis variables for `Age` data collection__" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Age.MALE0', 'Age.MALE5', 'Age.MALE10', 'Age.MALE15', 'Age.MALE20',\n", + " 'Age.MALE25', 'Age.MALE30', 'Age.MALE35', 'Age.MALE40',\n", + " 'Age.MALE45', 'Age.MALE50', 'Age.MALE55', 'Age.MALE60',\n", + " 'Age.MALE65', 'Age.MALE70', 'Age.MALE75', 'Age.MALE80',\n", + " 'Age.MALE85', 'Age.FEM0', 'Age.FEM5', 'Age.FEM10', 'Age.FEM15',\n", + " 'Age.FEM20', 'Age.FEM25', 'Age.FEM30', 'Age.FEM35', 'Age.FEM40',\n", + " 'Age.FEM45', 'Age.FEM50', 'Age.FEM55', 'Age.FEM60', 'Age.FEM65',\n", + " 'Age.FEM70', 'Age.FEM75', 'Age.FEM80', 'Age.FEM85'], dtype=object)" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_df.loc['Age']['analysisVariable'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Analysis variables are typically represented as `dataCollectionID.variableName` (for example, `Age.MALE0`), as seen above." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Analysis variables for `DaytimePopulation` data collection__" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['DaytimePopulation.DPOP_CY', 'DaytimePopulation.DPOPWRK_CY',\n", + " 'DaytimePopulation.DPOPRES_CY', 'DaytimePopulation.DPOPDENSCY'],\n", + " dtype=object)" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "usa_df.loc['DaytimePopulation']['analysisVariable'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Data Collections for Another Country" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's look at data collections for New Zealand. 
[Data Browser](https://doc.arcgis.com/en/esri-demographics/data/data-browser.htm) can be used to examine the entire global listing of variables and associated datasets for New Zealand.\n", + "\n", + "To discover the data collections for a particular country, first get a reference to it using the `Country.get()` method, and then fetch the data collections from its `data_collections` property. Once we know the data collection we would like to use, we can look at the `analysisVariable`s available in that data collection." + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "arcgis.geoenrichment.enrichment.Country" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Get New Zealand as a country\n", + "nz = Country.get('New Zealand')\n", + "type(nz)" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
analysisVariablealiasfieldCategoryvintage
dataCollectionID
5YearIncrementsStatsNZ5YearIncrementsStatsNZ.Age5year_Total2023 5-Year Age Groups: Total2023 Population by Age (Stats NZ)2023
5YearIncrementsStatsNZ5YearIncrementsStatsNZ.Age5year_0_4_years2023 5-Year Age Group: 0 to 4 Years2023 Population by Age (Stats NZ)2023
5YearIncrementsStatsNZ5YearIncrementsStatsNZ.Age5year_5_9_years2023 5-Year Age Group: 5 to 9 Years2023 Population by Age (Stats NZ)2023
5YearIncrementsStatsNZ5YearIncrementsStatsNZ.Age5year_10_14_years2023 5-Year Age Group: 10 to 14 Years2023 Population by Age (Stats NZ)2023
5YearIncrementsStatsNZ5YearIncrementsStatsNZ.Age5year_15_19_years2023 5-Year Age Group: 15 to 19 Years2023 Population by Age (Stats NZ)2023
\n", + "
" + ], + "text/plain": [ + " analysisVariable \\\n", + "dataCollectionID \n", + "5YearIncrementsStatsNZ 5YearIncrementsStatsNZ.Age5year_Total \n", + "5YearIncrementsStatsNZ 5YearIncrementsStatsNZ.Age5year_0_4_years \n", + "5YearIncrementsStatsNZ 5YearIncrementsStatsNZ.Age5year_5_9_years \n", + "5YearIncrementsStatsNZ 5YearIncrementsStatsNZ.Age5year_10_14_years \n", + "5YearIncrementsStatsNZ 5YearIncrementsStatsNZ.Age5year_15_19_years \n", + "\n", + " alias \\\n", + "dataCollectionID \n", + "5YearIncrementsStatsNZ 2023 5-Year Age Groups: Total \n", + "5YearIncrementsStatsNZ 2023 5-Year Age Group: 0 to 4 Years \n", + "5YearIncrementsStatsNZ 2023 5-Year Age Group: 5 to 9 Years \n", + "5YearIncrementsStatsNZ 2023 5-Year Age Group: 10 to 14 Years \n", + "5YearIncrementsStatsNZ 2023 5-Year Age Group: 15 to 19 Years \n", + "\n", + " fieldCategory vintage \n", + "dataCollectionID \n", + "5YearIncrementsStatsNZ 2023 Population by Age (Stats NZ) 2023 \n", + "5YearIncrementsStatsNZ 2023 Population by Age (Stats NZ) 2023 \n", + "5YearIncrementsStatsNZ 2023 Population by Age (Stats NZ) 2023 \n", + "5YearIncrementsStatsNZ 2023 Population by Age (Stats NZ) 2023 \n", + "5YearIncrementsStatsNZ 2023 Population by Age (Stats NZ) 2023 " + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nz_df = nz.data_collections\n", + "\n", + "# print a few rows of the DataFrame\n", + "nz_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(718, 4)" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nz_df.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Unique Data Collections for New Zealand" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's get a list of unique data collections that are available for New Zealand." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['5YearIncrementsStatsNZ', 'AccesstoAmenitiesStatsNZ',\n", + " 'AccesstoTelecommunicationsStatsNZ', 'BirthplaceStatsNZ',\n", + " 'DwellingDampnessStatsNZ', 'EducationalAttainmentStatsNZ',\n", + " 'EmploymentStatusStatsNZ', 'EthnicityStatsNZ', 'FamilyStatsNZ',\n", + " 'HealthStatsNZ', 'HeatingSourceStatsNZ', 'HomeOwnershipStatusStatsNZ',\n", + " 'HoursWorkedStatsNZ', 'HouseholdIncomeStatsNZ', 'HousingbySizeStatsNZ',\n", + " 'HousingCostsStatsNZ', 'ImmigrationPeriodStatsNZ', 'IndustryStatsNZ',\n", + " 'JobSearchStatsNZ', 'KeyGlobalFacts', 'LabourForceStatusStatsNZ',\n", + " 'LandlordTypeStatsNZ', 'LanguageSpokenStatsNZ',\n", + " 'LifeCycleGroupsStatsNZ', 'MaoriDescentStatsNZ', 'MaritalStatusStatsNZ',\n", + " 'MethodofTraveltoWorkStatsNZ', 'NumberofBornChildrenStatsNZ',\n", + " 'OccupancyStatusStatsNZ', 'OccupationStatsNZ', 'PersonalIncomeStatsNZ',\n", + " 'PopulationTotalsStatsNZ', 'ReligiousAffiliationStatsNZ',\n", + " 'SmokingBehaviourStatsNZ', 'StructureTypeStatsNZ',\n", + " 'StudyParticipationStatsNZ', 'TraveltoSchoolStatsNZ',\n", + " 'UnpaidActivitiesStatsNZ', 'UsualResidenceStatsNZ', 'VehiclesStatsNZ'],\n", + " dtype='object', name='dataCollectionID')" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nz_df.index.unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "New Zealand has 40 unique data collections.\n", + "\n", + "We can look at the `fieldCategory` column to understand each category better." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['2023 Population by Age (Stats NZ)',\n", + " '2023 AccessToAmenities (Stats NZ)',\n", + " '2023 Access To Telecommunications (Stats NZ)',\n", + " '2023 Birthplace (Stats NZ)', '2023 Dwelling Dampness (Stats NZ)',\n", + " '2023 Dwelling Mould (Stats NZ)',\n", + " '2023 Educational Attainment (Stats NZ)',\n", + " '2023 Post-school Qualification Indicator (Stats NZ)',\n", + " '2023 Highest Secondary School Qualification (Stats NZ)',\n", + " '2023 Post-school Qualification (Stats NZ)',\n", + " '2023 Empolyment Status (Stats NZ)', '2023 Ethnicity (Stats NZ)',\n", + " '2023 Family Totals (Stats NZ)',\n", + " '2023 Dwelling by Family Type (Stats NZ)',\n", + " '2023 Number of People in Family (Stats NZ)',\n", + " '2023 Family Income (Stats NZ)',\n", + " '2023 Extended Family Totals (Stats NZ)',\n", + " '2023 Dwelling by Extended Family Type (Stats NZ)',\n", + " '2023 Extended Family Income (Stats NZ)',\n", + " '2023 Difficulty Seeing (Stats NZ)',\n", + " '2023 Difficulty Hearing (Stats NZ)',\n", + " '2023 Difficulty Walking (Stats NZ)',\n", + " '2023 Difficulty Remembering (Stats NZ)',\n", + " '2023 Difficulty Washing or Dressing (Stats NZ)',\n", + " '2023 Difficulty Communicating (Stats NZ)',\n", + " '2023 LGBTIQ+ Indicator (Stats NZ)',\n", + " '2023 Sexual Identity (Stats NZ)',\n", + " '2023 Disability Indicator (Stats NZ)',\n", + " '2023 Heating Source (Stats NZ)', '2023 Heating Fuel (Stats NZ)',\n", + " '2023 Households by Tenure (Stats NZ)',\n", + " '2023 Home Ownership Status (Stats NZ)',\n", + " '2023 Sector of Ownership (Stats NZ)',\n", + " '2023 Hours Worked (Stats NZ)', '2023 Household Income (Stats NZ)',\n", + " '2023 Dwelling By Number Of Rooms (Stats NZ)',\n", + " '2023 Dwelling By Number Of Bedrooms (Stats NZ)',\n", + " '2023 Household Crowding Index (Stats NZ)',\n", + " '2023 Household Composition (Stats NZ)',\n", + " 
'2023 Housing Costs (Stats NZ)',\n", + " '2023 Years Since Immigration (Stats NZ)',\n", + " '2023 Industry By Residence (Stats NZ)',\n", + " '2023 Industry by Workplace (Stats NZ)',\n", + " '2023 Job Search Methods (2023)', 'Key Demographic Indicators',\n", + " '2023 Labour Force Status (Stats NZ)',\n", + " '2023 Landlord Type (Stats NZ)',\n", + " '2023 Languages Spoken (Stats NZ)',\n", + " '2023 Life Cycle Group (Stats NZ)',\n", + " '2023 Māori Descent (Stats NZ)', '2023 Marital Status (Stats NZ)',\n", + " '2023 Partnership Status (Stats NZ)',\n", + " '2023 Travel To Work by Residence (Stats NZ)',\n", + " '2023 Travel To Work by Workplace (Stats NZ)',\n", + " '2023 Number Of Children (Stats NZ)',\n", + " '2023 Occupancy Status (Stats NZ)',\n", + " '2023 Occupation By Residence (Stats NZ)',\n", + " '2023 Occupation By Workplace (Stats NZ)',\n", + " '2023 Personal Income (Stats NZ)',\n", + " '2023 Source of Income (Stats NZ)',\n", + " '2023 Population Totals (Stats NZ)',\n", + " '2023 Sex at Birth (Stats NZ)',\n", + " '2023 Religious Affiliation (Stats NZ)',\n", + " '2023 Smoking Behaviour (Stats NZ)',\n", + " '2023 Dwelling Record Type (Stats NZ)',\n", + " '2023 Dwelling Structure Type (Stats NZ)',\n", + " '2023 Study Participation (Stats NZ)',\n", + " '2023 Travel To Education By Residence (Stats NZ)',\n", + " '2023 Travel To Education By Institution (Stats NZ)',\n", + " '2023 Unpaid Activities (Stats NZ)',\n", + " '2023 Years at Residence (Stats NZ)',\n", + " '2023 5-Year Residence History (Stats NZ)',\n", + " '2023 1-Year Residence History (Stats NZ)',\n", + " '2023 Number of Usual Residents (Stats NZ)',\n", + " '2023 Vehicles Available (Stats NZ)'], dtype=object)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nz_df.fieldCategory.unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Looking at `fieldCategory` is a great way to clearly understand what the data 
collection is about. However, to query a data collection, its unique ID (`dataCollectionID`) must be used." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Data Collections for Socio-demographic Factors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "New Zealand has fewer data collections than the U.S. Let's look at the data collections for Key Facts, Education, and Family." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Data Collection for Key Facts__" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
analysisVariablealiasfieldCategoryvintage
dataCollectionID
KeyGlobalFactsKeyGlobalFacts.TOTPOPTotal PopulationKey Demographic IndicatorsNaN
KeyGlobalFactsKeyGlobalFacts.TOTHHTotal HouseholdsKey Demographic IndicatorsNaN
KeyGlobalFactsKeyGlobalFacts.TOTFEMALESFemale PopulationKey Demographic IndicatorsNaN
KeyGlobalFactsKeyGlobalFacts.TOTMALESMale PopulationKey Demographic IndicatorsNaN
KeyGlobalFactsKeyGlobalFacts.AVGHHSZAverage Household SizeKey Demographic IndicatorsNaN
\n", + "
" + ], + "text/plain": [ + " analysisVariable alias \\\n", + "dataCollectionID \n", + "KeyGlobalFacts KeyGlobalFacts.TOTPOP Total Population \n", + "KeyGlobalFacts KeyGlobalFacts.TOTHH Total Households \n", + "KeyGlobalFacts KeyGlobalFacts.TOTFEMALES Female Population \n", + "KeyGlobalFacts KeyGlobalFacts.TOTMALES Male Population \n", + "KeyGlobalFacts KeyGlobalFacts.AVGHHSZ Average Household Size \n", + "\n", + " fieldCategory vintage \n", + "dataCollectionID \n", + "KeyGlobalFacts Key Demographic Indicators NaN \n", + "KeyGlobalFacts Key Demographic Indicators NaN \n", + "KeyGlobalFacts Key Demographic Indicators NaN \n", + "KeyGlobalFacts Key Demographic Indicators NaN \n", + "KeyGlobalFacts Key Demographic Indicators NaN " + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nz_df.loc['KeyGlobalFacts']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Data Collection for Education__\n", + "\n", + "Let's take a look at the first 5 rows for this collection." + ] + }, + { + "cell_type": "code", + "execution_count": 76, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
analysisVariablealiasfieldCategoryvintage
dataCollectionID
EducationalAttainmentStatsNZEducationalAttainmentStatsNZ.HighestQual_Total2023 Education Attainment: Total2023 Educational Attainment (Stats NZ)2023
EducationalAttainmentStatsNZEducationalAttainmentStatsNZ.HighestQual_TStated2023 Education Attainment: Total Stated2023 Educational Attainment (Stats NZ)2023
EducationalAttainmentStatsNZEducationalAttainmentStatsNZ.HighestQual_No_quali2023 Education Attainment: No Qualifications2023 Educational Attainment (Stats NZ)2023
EducationalAttainmentStatsNZEducationalAttainmentStatsNZ.HighestQual_L1_Certi2023 Education Attainment: Level 1 Certificate2023 Educational Attainment (Stats NZ)2023
EducationalAttainmentStatsNZEducationalAttainmentStatsNZ.HighestQual_L2_Certi2023 Education Attainment: Level 2 Certificate2023 Educational Attainment (Stats NZ)2023
\n", + "
" + ], + "text/plain": [ + " analysisVariable \\\n", + "dataCollectionID \n", + "EducationalAttainmentStatsNZ EducationalAttainmentStatsNZ.HighestQual_Total \n", + "EducationalAttainmentStatsNZ EducationalAttainmentStatsNZ.HighestQual_TStated \n", + "EducationalAttainmentStatsNZ EducationalAttainmentStatsNZ.HighestQual_No_quali \n", + "EducationalAttainmentStatsNZ EducationalAttainmentStatsNZ.HighestQual_L1_Certi \n", + "EducationalAttainmentStatsNZ EducationalAttainmentStatsNZ.HighestQual_L2_Certi \n", + "\n", + " alias \\\n", + "dataCollectionID \n", + "EducationalAttainmentStatsNZ 2023 Education Attainment: Total \n", + "EducationalAttainmentStatsNZ 2023 Education Attainment: Total Stated \n", + "EducationalAttainmentStatsNZ 2023 Education Attainment: No Qualifications \n", + "EducationalAttainmentStatsNZ 2023 Education Attainment: Level 1 Certificate \n", + "EducationalAttainmentStatsNZ 2023 Education Attainment: Level 2 Certificate \n", + "\n", + " fieldCategory vintage \n", + "dataCollectionID \n", + "EducationalAttainmentStatsNZ 2023 Educational Attainment (Stats NZ) 2023 \n", + "EducationalAttainmentStatsNZ 2023 Educational Attainment (Stats NZ) 2023 \n", + "EducationalAttainmentStatsNZ 2023 Educational Attainment (Stats NZ) 2023 \n", + "EducationalAttainmentStatsNZ 2023 Educational Attainment (Stats NZ) 2023 \n", + "EducationalAttainmentStatsNZ 2023 Educational Attainment (Stats NZ) 2023 " + ] + }, + "execution_count": 76, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nz_df.loc['EducationalAttainmentStatsNZ'].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Data Collection for Family__\n", + "\n", + "Let's take a look at the first 5 rows for this collection." + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
analysisVariablealiasfieldCategoryvintage
dataCollectionID
FamilyStatsNZFamilyStatsNZ.FamilyCount_Total2023 Count of Families: Total2023 Family Totals (Stats NZ)2023
FamilyStatsNZFamilyStatsNZ.FamType_Total2023 Family Type: Total2023 Dwelling by Family Type (Stats NZ)2023
FamilyStatsNZFamilyStatsNZ.FamType_CoupNoChildren2023 Family Type: Couple Without Children2023 Dwelling by Family Type (Stats NZ)2023
FamilyStatsNZFamilyStatsNZ.FamType_CoupWithChildren2023 Family Type: Couple With Child(ren)2023 Dwelling by Family Type (Stats NZ)2023
FamilyStatsNZFamilyStatsNZ.FamType_OneParent2023 Family Type: One Parent With Child(ren)2023 Dwelling by Family Type (Stats NZ)2023
\n", + "
" + ], + "text/plain": [ + " analysisVariable \\\n", + "dataCollectionID \n", + "FamilyStatsNZ FamilyStatsNZ.FamilyCount_Total \n", + "FamilyStatsNZ FamilyStatsNZ.FamType_Total \n", + "FamilyStatsNZ FamilyStatsNZ.FamType_CoupNoChildren \n", + "FamilyStatsNZ FamilyStatsNZ.FamType_CoupWithChildren \n", + "FamilyStatsNZ FamilyStatsNZ.FamType_OneParent \n", + "\n", + " alias \\\n", + "dataCollectionID \n", + "FamilyStatsNZ 2023 Count of Families: Total \n", + "FamilyStatsNZ 2023 Family Type: Total \n", + "FamilyStatsNZ 2023 Family Type: Couple Without Children \n", + "FamilyStatsNZ 2023 Family Type: Couple With Child(ren) \n", + "FamilyStatsNZ 2023 Family Type: One Parent With Child(ren) \n", + "\n", + " fieldCategory vintage \n", + "dataCollectionID \n", + "FamilyStatsNZ 2023 Family Totals (Stats NZ) 2023 \n", + "FamilyStatsNZ 2023 Dwelling by Family Type (Stats NZ) 2023 \n", + "FamilyStatsNZ 2023 Dwelling by Family Type (Stats NZ) 2023 \n", + "FamilyStatsNZ 2023 Dwelling by Family Type (Stats NZ) 2023 \n", + "FamilyStatsNZ 2023 Dwelling by Family Type (Stats NZ) 2023 " + ] + }, + "execution_count": 77, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nz_df.loc['FamilyStatsNZ'].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Analysis variables for Data Collections\n", + "Once we know the data collection we would like to use, we can look at all the unique variables available in that data collection using its unique ID. Let's discover `analysisVariable`s for some of the data collections we looked at earlier." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Analysis variables for `KeyGlobalFacts` data collection__" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['KeyGlobalFacts.TOTPOP', 'KeyGlobalFacts.TOTHH',\n", + " 'KeyGlobalFacts.TOTFEMALES', 'KeyGlobalFacts.TOTMALES',\n", + " 'KeyGlobalFacts.AVGHHSZ'], dtype=object)" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nz_df.loc['KeyGlobalFacts']['analysisVariable'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Analysis variables for `EducationalAttainmentStatsNZ` data collection__" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['EducationalAttainmentStatsNZ.HighestQual_Total',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_TStated',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_No_quali',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_L1_Certi',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_L2_Certi',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_L3_Certi',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_L4_Certi',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_L5_Diplo',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_L6_Diplo',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_Bachelor',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_PostGrad',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_Masters',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_Doctorat',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_OSSecSch',\n", + " 'EducationalAttainmentStatsNZ.HighestQual_NEI',\n", + " 'EducationalAttainmentStatsNZ.PostIndicator_No',\n", + " 'EducationalAttainmentStatsNZ.PostIndicator_NZ',\n", + " 'EducationalAttainmentStatsNZ.PostIndicator_Overseas',\n", + " 
'EducationalAttainmentStatsNZ.PostIndicator_NEI',\n", + " 'EducationalAttainmentStatsNZ.PostIndicator_Total',\n", + " 'EducationalAttainmentStatsNZ.PostIndicator_Tstated',\n", + " 'EducationalAttainmentStatsNZ.HighSecondQual_No_quali',\n", + " 'EducationalAttainmentStatsNZ.HighSecondQual_L1_Certi',\n", + " 'EducationalAttainmentStatsNZ.HighSecondQual_L2_Certi',\n", + " 'EducationalAttainmentStatsNZ.HighSecondQual_L3L4_Certi',\n", + " 'EducationalAttainmentStatsNZ.HighSecondQual_Overseas',\n", + " 'EducationalAttainmentStatsNZ.HighSecondQual_NEI',\n", + " 'EducationalAttainmentStatsNZ.HighSecondQual_Total',\n", + " 'EducationalAttainmentStatsNZ.HighSecondQual_TStated',\n", + " 'EducationalAttainmentStatsNZ.PostQual_Total',\n", + " 'EducationalAttainmentStatsNZ.PostQual_TStated',\n", + " 'EducationalAttainmentStatsNZ.PostQual_No_quali',\n", + " 'EducationalAttainmentStatsNZ.PostQual_L1_Certi',\n", + " 'EducationalAttainmentStatsNZ.PostQual_L2_Certi',\n", + " 'EducationalAttainmentStatsNZ.PostQual_L3_Certi',\n", + " 'EducationalAttainmentStatsNZ.PostQual_L4_Certi',\n", + " 'EducationalAttainmentStatsNZ.PostQual_L5_Diplo',\n", + " 'EducationalAttainmentStatsNZ.PostQual_L6_Diplo',\n", + " 'EducationalAttainmentStatsNZ.PostQual_Bachelor',\n", + " 'EducationalAttainmentStatsNZ.PostQual_PostGrad',\n", + " 'EducationalAttainmentStatsNZ.PostQual_Masters',\n", + " 'EducationalAttainmentStatsNZ.PostQual_Doctorat',\n", + " 'EducationalAttainmentStatsNZ.PostQual_NotGiven',\n", + " 'EducationalAttainmentStatsNZ.PostQual_NEI'], dtype=object)" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "nz_df.loc['EducationalAttainmentStatsNZ']['analysisVariable'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Analysis variables for `FamilyStatsNZ` data collection__" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ 
+ "array(['FamilyStatsNZ.FamilyCount_Total', 'FamilyStatsNZ.FamType_Total',\n", + " 'FamilyStatsNZ.FamType_CoupNoChildren',\n", + " 'FamilyStatsNZ.FamType_CoupWithChildren',\n", + " 'FamilyStatsNZ.FamType_OneParent', 'FamilyStatsNZ.NumberFam_Total',\n", + " 'FamilyStatsNZ.NumberFam_Two', 'FamilyStatsNZ.NumberFam_Three',\n", + " 'FamilyStatsNZ.NumberFam_Four', 'FamilyStatsNZ.NumberFam_Five',\n", + " 'FamilyStatsNZ.NumberFam_Six', 'FamilyStatsNZ.NumberFam_SevenMore',\n", + " 'FamilyStatsNZ.NumberFam_Average', 'FamilyStatsNZ.FamIncome_Total',\n", + " 'FamilyStatsNZ.FamIncome_20kOrLess',\n", + " 'FamilyStatsNZ.FamIncome_20kto30k',\n", + " 'FamilyStatsNZ.FamIncome_30kto50k',\n", + " 'FamilyStatsNZ.FamIncome_50kto70k',\n", + " 'FamilyStatsNZ.FamIncome_70kto100k',\n", + " 'FamilyStatsNZ.FamIncome_100kto150k',\n", + " 'FamilyStatsNZ.FamIncome_150kto200k',\n", + " 'FamilyStatsNZ.FamIncome_200korMore',\n", + " 'FamilyStatsNZ.FamIncome_Median',\n", + " 'FamilyStatsNZ.FamIncome_Tstated',\n", + " 'FamilyStatsNZ.FamIncome_NotStated',\n", + " 'FamilyStatsNZ.ExtFamilyCount_Total',\n", + " 'FamilyStatsNZ.ExtFamType_Total',\n", + " 'FamilyStatsNZ.ExtFamType_OneGen',\n", + " 'FamilyStatsNZ.ExtFamType_TwoGen',\n", + " 'FamilyStatsNZ.ExtFamType_ThreeMore',\n", + " 'FamilyStatsNZ.ExtFamType_NotClassi',\n", + " 'FamilyStatsNZ.ExtFamType_Tstated',\n", + " 'FamilyStatsNZ.ExtFamIncome_Total',\n", + " 'FamilyStatsNZ.ExtFamIncome_30kOrLess',\n", + " 'FamilyStatsNZ.ExtFamIncome_30kto50k',\n", + " 'FamilyStatsNZ.ExtFamIncome_50kto70k',\n", + " 'FamilyStatsNZ.ExtFamIncome_70kto100k',\n", + " 'FamilyStatsNZ.ExtFamIncome_100kto150k',\n", + " 'FamilyStatsNZ.ExtFamIncome_150kto200k',\n", + " 'FamilyStatsNZ.ExtFamIncome_200korMore',\n", + " 'FamilyStatsNZ.ExtFamIncome_Median',\n", + " 'FamilyStatsNZ.ExtFamIncome_Tstated',\n", + " 'FamilyStatsNZ.ExtFamIncome_NotStated'], dtype=object)" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"nz_df.loc['FamilyStatsNZ']['analysisVariable'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Perform Enrichment using Data Collections and Analysis Variables" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Data Collections can be used to enrich various study areas. `data_collection`s and `analysis_variable`s can be passed in the `enrich()` method. Details about enriching study areas can be found in __Enriching Study Areas__ section. \n", + "\n", + "Let's look at a few similar examples of GeoEnrichment here." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Enrich using Data Collections" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Enrich with `Age` data collection__\n", + "\n", + "Here we see an address being enriched by data from `Age` data collection." + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "# Enriching single address as single line imput\n", + "age_coll = enrich(study_areas=[\"380 New York St Redlands CA 92373\"], \n", + " data_collections=['Age'])" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
source_countryxyarea_typebuffer_unitsbuffer_units_aliasbuffer_radiiaggregation_methodpopulation_to_polygon_size_ratingapportionment_confidence...fem45fem50fem55fem60fem65fem70fem75fem80fem85SHAPE
0US-117.19483534.057242RingBufferesriMilesMiles1.0BlockApportionment:US.BlockGroups;PointsLayer:...2.1912.576...381.0375.0323.0341.0281.0255.0190.0132.0116.0{\"rings\": [[[-117.194835113918, 34.07175043587...
\n", + "

1 rows × 48 columns

\n", + "
" + ], + "text/plain": [ + " source_country x y area_type buffer_units \\\n", + "0 US -117.194835 34.057242 RingBuffer esriMiles \n", + "\n", + " buffer_units_alias buffer_radii \\\n", + "0 Miles 1.0 \n", + "\n", + " aggregation_method \\\n", + "0 BlockApportionment:US.BlockGroups;PointsLayer:... \n", + "\n", + " population_to_polygon_size_rating apportionment_confidence ... fem45 \\\n", + "0 2.191 2.576 ... 381.0 \n", + "\n", + " fem50 fem55 fem60 fem65 fem70 fem75 fem80 fem85 \\\n", + "0 375.0 323.0 341.0 281.0 255.0 190.0 132.0 116.0 \n", + "\n", + " SHAPE \n", + "0 {\"rings\": [[[-117.194835113918, 34.07175043587... \n", + "\n", + "[1 rows x 48 columns]" + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "age_coll" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['source_country', 'x', 'y', 'area_type', 'buffer_units',\n", + " 'buffer_units_alias', 'buffer_radii', 'aggregation_method',\n", + " 'population_to_polygon_size_rating', 'apportionment_confidence',\n", + " 'has_data', 'male0', 'male5', 'male10', 'male15', 'male20', 'male25',\n", + " 'male30', 'male35', 'male40', 'male45', 'male50', 'male55', 'male60',\n", + " 'male65', 'male70', 'male75', 'male80', 'male85', 'fem0', 'fem5',\n", + " 'fem10', 'fem15', 'fem20', 'fem25', 'fem30', 'fem35', 'fem40', 'fem45',\n", + " 'fem50', 'fem55', 'fem60', 'fem65', 'fem70', 'fem75', 'fem80', 'fem85',\n", + " 'SHAPE'],\n", + " dtype='object')" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "age_coll.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When a data collection is specified without specific analysis variables, all variables under the data collection are used for enrichment as can be seen above." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "__Enrich with `Health` data collection__\n", + "\n", + "Here we see a ZIP Code being enriched with data from the `Health` data collection." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "redlands = usa.subgeographies.states['California'].zip5['92373']" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "redlands_df = enrich(study_areas=[redlands], data_collections=['Health'])" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
std_geography_levelstd_geography_namestd_geography_idsource_countryaggregation_methodpopulation_to_polygon_size_ratingapportionment_confidencehas_datarel65_hi2_ocacscivnins...pop85_cypop18up_cypop21up_cymedage_cyhhu18_c10medhinc_cys27_buss27_saless27_empSHAPE
0US.ZIP5Redlands92373USQuery:US.ZIP52.1912.57611.032904.0...1409.028175.027097.041.83805.0105863.0245.0418153000.05296.0{\"rings\": [[[-117.12524300001411, 34.027986999...
\n", + "

1 rows × 431 columns

\n", + "
" + ], + "text/plain": [ + " std_geography_level std_geography_name std_geography_id source_country \\\n", + "0 US.ZIP5 Redlands 92373 US \n", + "\n", + " aggregation_method population_to_polygon_size_rating \\\n", + "0 Query:US.ZIP5 2.191 \n", + "\n", + " apportionment_confidence has_data rel65_hi2_oc acscivnins ... \\\n", + "0 2.576 1 1.0 32904.0 ... \n", + "\n", + " pop85_cy pop18up_cy pop21up_cy medage_cy hhu18_c10 medhinc_cy \\\n", + "0 1409.0 28175.0 27097.0 41.8 3805.0 105863.0 \n", + "\n", + " s27_bus s27_sales s27_emp \\\n", + "0 245.0 418153000.0 5296.0 \n", + "\n", + " SHAPE \n", + "0 {\"rings\": [[[-117.12524300001411, 34.027986999... \n", + "\n", + "[1 rows x 431 columns]" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "redlands_df" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['std_geography_level', 'std_geography_name', 'std_geography_id',\n", + " 'source_country', 'aggregation_method',\n", + " 'population_to_polygon_size_rating', 'apportionment_confidence',\n", + " 'has_data', 'rel65_hi2_oc', 'acscivnins',\n", + " ...\n", + " 'pop85_cy', 'pop18up_cy', 'pop21up_cy', 'medage_cy', 'hhu18_c10',\n", + " 'medhinc_cy', 's27_bus', 's27_sales', 's27_emp', 'SHAPE'],\n", + " dtype='object', length=431)" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "redlands_df.columns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Enrich using Analysis Variables\n", + "\n", + "Data can be enriched by specifying specific analysis variables of a data collection with which we want to enrich our data. In this example, we will look at `analysis_variables` for Age `data_collection` and then use specific analysis variables to `enrich()` a study area." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['Age.MALE0', 'Age.MALE5', 'Age.MALE10', 'Age.MALE15', 'Age.MALE20',\n", + " 'Age.MALE25', 'Age.MALE30', 'Age.MALE35', 'Age.MALE40',\n", + " 'Age.MALE45', 'Age.MALE50', 'Age.MALE55', 'Age.MALE60',\n", + " 'Age.MALE65', 'Age.MALE70', 'Age.MALE75', 'Age.MALE80',\n", + " 'Age.MALE85', 'Age.FEM0', 'Age.FEM5', 'Age.FEM10', 'Age.FEM15',\n", + " 'Age.FEM20', 'Age.FEM25', 'Age.FEM30', 'Age.FEM35', 'Age.FEM40',\n", + " 'Age.FEM45', 'Age.FEM50', 'Age.FEM55', 'Age.FEM60', 'Age.FEM65',\n", + " 'Age.FEM70', 'Age.FEM75', 'Age.FEM80', 'Age.FEM85'], dtype=object)" + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Unique analysis variables for the Age data collection\n", + "usa = Country.get('US')\n", + "usa.data_collections.loc['Age']['analysisVariable'].unique()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, we will enrich our study area with the `Age.FEM45`, `Age.FEM55`, and `Age.FEM65` variables." + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
source_countryxyarea_typebuffer_unitsbuffer_units_aliasbuffer_radiiaggregation_methodpopulation_to_polygon_size_ratingapportionment_confidencehas_datafem45fem55fem65SHAPE
0US-117.19483534.057242RingBufferesriMilesMiles1.0BlockApportionment:US.BlockGroups;PointsLayer:...2.1912.5761381.0323.0281.0{\"rings\": [[[-117.194835113918, 34.07175043587...
\n", + "
" + ], + "text/plain": [ + " source_country x y area_type buffer_units \\\n", + "0 US -117.194835 34.057242 RingBuffer esriMiles \n", + "\n", + " buffer_units_alias buffer_radii \\\n", + "0 Miles 1.0 \n", + "\n", + " aggregation_method \\\n", + "0 BlockApportionment:US.BlockGroups;PointsLayer:... \n", + "\n", + " population_to_polygon_size_rating apportionment_confidence has_data \\\n", + "0 2.191 2.576 1 \n", + "\n", + " fem45 fem55 fem65 SHAPE \n", + "0 381.0 323.0 281.0 {\"rings\": [[[-117.194835113918, 34.07175043587... " + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "enrich(study_areas=[\"380 New York St Redlands CA 92373\"], \n", + " analysis_variables=[\"Age.FEM45\",\"Age.FEM55\",\"Age.FEM65\"])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Enriching Spatially Enabled Dataframes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One of the most common use case for GeoEnrichment is enriching existing data in feature layers. As a user, you may need to analyze and enrich your data that already exists in feature layers. Spatially Enabled DataFrame (SeDF) helps us bring the data from layer into a dataframe which can then be GeoEnriched. \n", + "\n", + "Let's look at an example using an existing layer of Covid-19 dataset. This feature layer includes latest Covid-19 Cases, Recovered and Deaths data for U.S. at the county level." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Get the layer\n", + "gis = GIS(set_active=False)\n", + "covid_item = gis.content.get('628578697fb24d8ea4c32fa0c5ae1843')\n", + "print(covid_item)\n", + "covid_layer = covid_item.layers[0]\n", + "covid_layer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can query the layer as a dataframe and then use the dataframe for enrichment." + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(3272, 19)" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "covid_df = covid_layer.query(as_df=True)\n", + "covid_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
OBJECTIDProvince_StateCountry_RegionLast_UpdateLatLong_ConfirmedRecoveredDeathsActiveAdmin2FIPSCombined_KeyIncident_RatePeople_TestedPeople_HospitalizedUIDISO3SHAPE
01AlabamaUS2023-03-10 13:21:0232.539527-86.64408219790<NA>232<NA>Autauga01001Autauga, Alabama, US35422.14824<NA><NA>84001001USA{\"x\": -86.64408226999996, \"y\": 32.539527450000...
12AlabamaUS2023-03-10 13:21:0230.72775-87.72207169860<NA>727<NA>Baldwin01003Baldwin, Alabama, US31294.516068<NA><NA>84001003USA{\"x\": -87.72207057999998, \"y\": 30.727749910000...
23AlabamaUS2023-03-10 13:21:0231.868263-85.3871297485<NA>103<NA>Barbour01005Barbour, Alabama, US30320.82962<NA><NA>84001005USA{\"x\": -85.38712859999998, \"y\": 31.868263000000...
34AlabamaUS2023-03-10 13:21:0232.996421-87.1251158091<NA>109<NA>Bibb01007Bibb, Alabama, US36130.21345<NA><NA>84001007USA{\"x\": -87.12511459999996, \"y\": 32.996420640000...
45AlabamaUS2023-03-10 13:21:0233.982109-86.56790618704<NA>261<NA>Blount01009Blount, Alabama, US32345.311797<NA><NA>84001009USA{\"x\": -86.56790592999994, \"y\": 33.982109180000...
\n", + "
" + ], + "text/plain": [ + " OBJECTID Province_State Country_Region Last_Update Lat \\\n", + "0 1 Alabama US 2023-03-10 13:21:02 32.539527 \n", + "1 2 Alabama US 2023-03-10 13:21:02 30.72775 \n", + "2 3 Alabama US 2023-03-10 13:21:02 31.868263 \n", + "3 4 Alabama US 2023-03-10 13:21:02 32.996421 \n", + "4 5 Alabama US 2023-03-10 13:21:02 33.982109 \n", + "\n", + " Long_ Confirmed Recovered Deaths Active Admin2 FIPS \\\n", + "0 -86.644082 19790 232 Autauga 01001 \n", + "1 -87.722071 69860 727 Baldwin 01003 \n", + "2 -85.387129 7485 103 Barbour 01005 \n", + "3 -87.125115 8091 109 Bibb 01007 \n", + "4 -86.567906 18704 261 Blount 01009 \n", + "\n", + " Combined_Key Incident_Rate People_Tested People_Hospitalized \\\n", + "0 Autauga, Alabama, US 35422.14824 \n", + "1 Baldwin, Alabama, US 31294.516068 \n", + "2 Barbour, Alabama, US 30320.82962 \n", + "3 Bibb, Alabama, US 36130.21345 \n", + "4 Blount, Alabama, US 32345.311797 \n", + "\n", + " UID ISO3 SHAPE \n", + "0 84001001 USA {\"x\": -86.64408226999996, \"y\": 32.539527450000... \n", + "1 84001003 USA {\"x\": -87.72207057999998, \"y\": 30.727749910000... \n", + "2 84001005 USA {\"x\": -85.38712859999998, \"y\": 31.868263000000... \n", + "3 84001007 USA {\"x\": -87.12511459999996, \"y\": 32.996420640000... \n", + "4 84001009 USA {\"x\": -86.56790592999994, \"y\": 33.982109180000... " + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "covid_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To showcase GeoEnrichment, we will create a subset of the original data and then `enrich()` the subset." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(100, 19)" + ] + }, + "execution_count": 39, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Create subset\n", + "test_df = covid_df.iloc[:100].copy()\n", + "test_df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['point', None]" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Check geometry\n", + "test_df.spatial.geometry_type" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A dataframe can be passed as the value of the `study_areas` parameter of the `enrich()` method. Here we enrich our dataframe with specific variables from the `Age` data collection." + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [], + "source": [ + "# Enrich dataframe\n", + "new_df = enrich(study_areas=test_df.spatial, \n", + " analysis_variables=[\"Age.FEM45\",\"Age.FEM55\",\"Age.FEM65\"])" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr style=\"text-align: right;\">\n",
 + " <th></th><th>source_country</th><th>area_type</th><th>buffer_units</th><th>buffer_units_alias</th><th>buffer_radii</th><th>aggregation_method</th><th>population_to_polygon_size_rating</th><th>apportionment_confidence</th><th>has_data</th><th>fem45</th><th>fem55</th><th>fem65</th><th>SHAPE</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n",
 + " <tr><th>0</th><td>US</td><td>RingBuffer</td><td>esriMiles</td><td>Miles</td><td>1.0</td><td>BlockApportionment:US.BlockGroups;PointsLayer:...</td><td>2.191</td><td>2.576</td><td>1</td><td>5.0</td><td>5.0</td><td>5.0</td><td>{\"rings\": [[[-86.64408226999996, 32.5540396153...</td></tr>\n",
 + " <tr><th>1</th><td>US</td><td>RingBuffer</td><td>esriMiles</td><td>Miles</td><td>1.0</td><td>BlockApportionment:US.BlockGroups;PointsLayer:...</td><td>2.191</td><td>2.576</td><td>1</td><td>0.0</td><td>0.0</td><td>0.0</td><td>{\"rings\": [[[-87.72207057999998, 30.7422661988...</td></tr>\n",
 + " <tr><th>2</th><td>US</td><td>RingBuffer</td><td>esriMiles</td><td>Miles</td><td>1.0</td><td>BlockApportionment:US.BlockGroups;PointsLayer:...</td><td>2.191</td><td>2.576</td><td>1</td><td>2.0</td><td>2.0</td><td>3.0</td><td>{\"rings\": [[[-85.38712859999997, 31.8827767082...</td></tr>\n",
 + " <tr><th>3</th><td>US</td><td>RingBuffer</td><td>esriMiles</td><td>Miles</td><td>1.0</td><td>BlockApportionment:US.BlockGroups;PointsLayer:...</td><td>2.191</td><td>2.576</td><td>0</td><td>0.0</td><td>0.0</td><td>0.0</td><td>{\"rings\": [[[-87.12511459999996, 33.0109317454...</td></tr>\n",
 + " <tr><th>4</th><td>US</td><td>RingBuffer</td><td>esriMiles</td><td>Miles</td><td>1.0</td><td>BlockApportionment:US.BlockGroups;PointsLayer:...</td><td>2.191</td><td>2.576</td><td>1</td><td>7.0</td><td>8.0</td><td>5.0</td><td>{\"rings\": [[[-86.56790592999992, 33.9966179736...</td></tr>\n",
 + " </tbody>\n", + "</table>\n", + "
" + ], + "text/plain": [ + " source_country area_type buffer_units buffer_units_alias buffer_radii \\\n", + "0 US RingBuffer esriMiles Miles 1.0 \n", + "1 US RingBuffer esriMiles Miles 1.0 \n", + "2 US RingBuffer esriMiles Miles 1.0 \n", + "3 US RingBuffer esriMiles Miles 1.0 \n", + "4 US RingBuffer esriMiles Miles 1.0 \n", + "\n", + " aggregation_method \\\n", + "0 BlockApportionment:US.BlockGroups;PointsLayer:... \n", + "1 BlockApportionment:US.BlockGroups;PointsLayer:... \n", + "2 BlockApportionment:US.BlockGroups;PointsLayer:... \n", + "3 BlockApportionment:US.BlockGroups;PointsLayer:... \n", + "4 BlockApportionment:US.BlockGroups;PointsLayer:... \n", + "\n", + " population_to_polygon_size_rating apportionment_confidence has_data \\\n", + "0 2.191 2.576 1 \n", + "1 2.191 2.576 1 \n", + "2 2.191 2.576 1 \n", + "3 2.191 2.576 0 \n", + "4 2.191 2.576 1 \n", + "\n", + " fem45 fem55 fem65 SHAPE \n", + "0 5.0 5.0 5.0 {\"rings\": [[[-86.64408226999996, 32.5540396153... \n", + "1 0.0 0.0 0.0 {\"rings\": [[[-87.72207057999998, 30.7422661988... \n", + "2 2.0 2.0 3.0 {\"rings\": [[[-85.38712859999997, 31.8827767082... \n", + "3 0.0 0.0 0.0 {\"rings\": [[[-87.12511459999996, 33.0109317454... \n", + "4 7.0 8.0 5.0 {\"rings\": [[[-86.56790592999992, 33.9966179736... 
" + ] + }, + "execution_count": 43, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['source_country', 'area_type', 'buffer_units', 'buffer_units_alias',\n", + " 'buffer_radii', 'aggregation_method',\n", + " 'population_to_polygon_size_rating', 'apportionment_confidence',\n", + " 'has_data', 'fem45', 'fem55', 'fem65', 'SHAPE'],\n", + " dtype='object')" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "new_df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(97, 13)" + ] + }, + "execution_count": 45, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Check shape\n", + "new_df.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The enrichment returned 97 records and 13 columns. Enrichment information is not available for three of the 100 input areas, so the result has 97 records instead of 100." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Visualize on a Map\n", + "\n", + "Let's visualize the enriched dataframe on a map. We will use the `fem65` column to classify our data for plotting on the map."
+ ] + }, + { + "cell_type": "code", + "execution_count": 74, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "" + ], + "text/plain": [ + "" + ] + }, + "execution_count": 74, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "covid_map = gis.map('Alabama, USA')\n", + "covid_map" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 68, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Plot on a map\n", + "new_df.spatial.plot(covid_map)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "covid_map.basemap.basemap = 'arcgis-light-gray'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Conclusion" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this part of the `arcgis.geoenrichment` module guide series, you saw how the `data_collections` property of a `Country` object lists its available data collections and analysis variables. You explored different data collections and their analysis variables, then used them to enrich study areas. Finally, you saw how spatially enabled dataframes can be enriched.\n", + "\n", + "In the subsequent pages, you will learn about Generating Reports and Standard Geography Queries."
+ ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.11" + }, + "livereveal": { + "scroll": true + }, + "toc": { + "base_numbering": 1, + "nav_menu": {}, + "number_sections": true, + "sideBar": true, + "skip_h1_title": true, + "title_cell": "Table of Contents", + "title_sidebar": "Contents", + "toc_cell": true, + "toc_position": { + "height": "calc(100% - 180px)", + "left": "10px", + "top": "150px", + "width": "274px" + }, + "toc_section_display": true, + "toc_window_display": true + }, + "vscode": { + "interpreter": { + "hash": "07c13af76457a6f4e5d5b34d0e1bc42b2e017343b05b916c8871f483eed35ce6" + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +}