🚀 Java-to-Cloud BigQuery Quickstart Demo

Effortlessly provision Google BigQuery datasets from your Java applications.



πŸ“ Table of Contents


🌟 Overview

What is this Application?

The Java-to-Cloud-demo is a concise, educational project designed to showcase the fundamental process of integrating Java applications with Google Cloud's BigQuery service. Specifically, it provides a quickstart example (QuickstartSample.java) that demonstrates how to programmatically create a new dataset in Google BigQuery using the official Google Cloud Java Client Library. It serves as a foundation for learning how to interact with Google Cloud services from Java.

Why does it Matter?

Data warehousing and analytics are critical for modern applications. Google BigQuery offers a highly scalable, cost-effective, and fully managed enterprise data warehouse. This demonstration provides a direct, hands-on entry point for Java developers to:

  • Understand the basics of Google Cloud authentication.
  • Learn how to use the Google Cloud BigQuery client library in a Maven project.
  • Quickly set up and execute a Java application that interacts with a core GCP service. It solves the initial "how do I even start?" problem for BigQuery integration in Java.

Target Audience

This repository is ideal for:

  • Java Developers: Looking to integrate their applications with Google Cloud Platform.
  • Students & Learners: Especially those participating in GDSC (Google Developer Student Clubs) tutorials on Java and Google Cloud.
  • Cloud Enthusiasts: Interested in quick, practical examples of cloud service interaction.
  • Data Engineers: Seeking a basic boilerplate for BigQuery resource provisioning.

⬆️ Back to Top


✨ Feature Highlights

This demonstration focuses on a single, crucial feature to ensure clarity and ease of understanding.

  • ✅ BigQuery Dataset Creation:
    • Programmatically creates a new, empty BigQuery dataset in your specified Google Cloud project.
    • Leverages the robust Google Cloud BigQuery client library for reliable interaction.
    • Provides instant feedback on the creation status.
  • 📦 Maven Project Structure:
    • Pre-configured pom.xml to manage dependencies and build processes.
    • Includes necessary Google Cloud BigQuery client library as a dependency.
    • Ready-to-build and run with standard Maven commands.
  • 💡 Clear Example Code:
    • The QuickstartSample.java is well-commented and straightforward, making it easy to follow the BigQuery interaction logic.
    • Illustrates best practices for BigQuery client instantiation and usage.
  • 🚀 Quick Deployment & Execution:
    • Minimal setup required to get the sample running against your GCP project.
    • Designed for quick verification of BigQuery connectivity and functionality.

⬆️ Back to Top


πŸ—οΈ Architecture & Design

The Java-to-Cloud BigQuery Quickstart Demo follows a straightforward client-server architecture, where your local Java application acts as a client interacting directly with the Google Cloud BigQuery service.

High-Level Component Diagram

graph TD
    subgraph Local Development Environment
        A[Java Quickstart Application] --> B(Google Cloud BigQuery Client Library)
    end

    subgraph Google Cloud Platform
        B --> C{BigQuery API}
        C --> D["BigQuery Dataset (Storage & Processing)"]
    end

    style A fill:#f9f,stroke:#333,stroke-width:2px
    style B fill:#bbf,stroke:#333,stroke-width:2px
    style C fill:#ccf,stroke:#333,stroke-width:2px
    style D fill:#ddf,stroke:#333,stroke-width:2px

Component Responsibilities

  • Java Quickstart Application (QuickstartSample.java):
    • The entry point of the demonstration.
    • Initializes the BigQuery client.
    • Defines the target dataset name.
    • Executes the command to create the dataset.
    • Prints the result of the operation.
  • Google Cloud BigQuery Client Library:
    • Provides idiomatic Java interfaces for interacting with BigQuery.
    • Handles authentication, request serialization, and response deserialization.
    • Abstracts away the complexities of direct HTTP API calls.
  • BigQuery API:
    • The official RESTful API endpoint for Google BigQuery.
    • Receives requests from the client library.
    • Performs the actual dataset creation operation within Google Cloud.
    • Returns the status and details of the operation.
  • BigQuery Dataset:
    • The logical container within Google BigQuery where tables and views are stored.
    • The ultimate resource being created by this application.

Technology Stack

  • Programming Language: Java 8+
  • Build Tool: Apache Maven 3.6+
  • Cloud Service: Google Cloud BigQuery
  • Client Library: com.google.cloud:google-cloud-bigquery
  • Parent POM: com.google.cloud.samples:shared-configuration (for common style and testing)

⬆️ Back to Top


βš™οΈ Getting Started

Follow these instructions to get a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Before you begin, ensure you have the following installed and configured:

  • Java Development Kit (JDK): Version 8 or higher.
    • Verify with: java -version
  • Apache Maven: Version 3.6 or higher.
    • Verify with: mvn -v
  • Google Cloud Platform Account: An active GCP account with billing enabled.
  • Google Cloud Project: A GCP project where you have permission to create BigQuery datasets.
  • Google Cloud SDK (gcloud CLI): Installed and authenticated.

Installation Steps

  1. Clone the Repository: Navigate to your desired directory and clone the project:

    git clone https://github.com/GDSC-FSC/Java-to-cloud-demo.git
    cd Java-to-cloud-demo
  2. Build the Project with Maven: Compile the Java source code and download all necessary dependencies:

    mvn clean install
    💡 What does `mvn clean install` do? `clean` removes the target directory (compiled classes, JARs, etc.) from previous builds. `install` compiles the source code, runs tests (if any), and packages the compiled code into a JAR file, placing it in your local Maven repository for other projects to use. For this quickstart, it primarily compiles the application.

⬆️ Back to Top

Configuration (Google Cloud Authentication)

The application uses Google Cloud's Application Default Credentials (ADC) to authenticate with GCP services.

  1. Authenticate your gcloud CLI: If you haven't already, authenticate your gcloud CLI. This creates credentials that Java (and other client libraries) can automatically pick up.

    gcloud auth application-default login

    This command will open a browser window for you to log in with your Google account. Ensure this account has sufficient permissions to create BigQuery datasets in your target GCP project.

  2. Set Google Cloud Project (Optional but Recommended): While not strictly required for ADC to find credentials, explicitly setting your project ID for gcloud is good practice.

    gcloud config set project YOUR_GCP_PROJECT_ID

    Replace YOUR_GCP_PROJECT_ID with the actual ID of your Google Cloud Project.

    ⚠️ Important Note on Permissions: The authenticated user or service account needs the `BigQuery Data Editor` or `BigQuery Admin` role (or a custom role with the `bigquery.datasets.create` permission) in the target GCP project to successfully create a dataset. To confirm which project the client library will actually target, see the sketch below.
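
Before running the sample, it can help to confirm which project Application Default Credentials resolve to. The following is a minimal sketch using the same client library; the class name CheckAdcProject is illustrative and not part of this repository:

package com.example.bigquery;

import com.google.cloud.bigquery.BigQueryOptions;

public class CheckAdcProject {
  public static void main(String... args) {
    // The client options resolve the project ID from the environment
    // (GOOGLE_CLOUD_PROJECT, the ADC credentials, or `gcloud config`).
    String projectId = BigQueryOptions.getDefaultInstance().getProjectId();
    System.out.println("BigQuery operations will target project: " + projectId);
  }
}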

Running the Application

After installation and configuration, you can run the QuickstartSample directly using Maven:

mvn compile exec:java -Dexec.mainClass="com.example.bigquery.QuickstartSample"
πŸ” **Expected Output** If successful, you will see output similar to this: ``` [INFO] Scanning for projects... [INFO] [INFO] -----------------< com.google.cloud:google-cloud-bigquery-samples >----------------- [INFO] Building Google BigQuery Samples Parent 0.0.1-SNAPSHOT [INFO] --------------------------------[ pom ]--------------------------------- ... [INFO] --- exec-maven-plugin:3.0.0:java (default-cli) @ google-cloud-bigquery-samples --- Dataset my_new_dataset created. ```
  • Verifying Dataset Creation: You can verify the dataset creation by visiting the Google Cloud Console BigQuery page for your project, or by running the bq command-line tool that ships with the Google Cloud SDK:
    bq ls --project_id=YOUR_GCP_PROJECT_ID
    You should see my_new_dataset listed. A programmatic check from Java is sketched below.
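
Alternatively, the dataset can be verified from Java itself by listing datasets with the same client library. This is a minimal sketch assuming the same ADC setup; the class name ListDatasetsSample is illustrative only:

package com.example.bigquery;

import com.google.api.gax.paging.Page;
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;

public class ListDatasetsSample {
  public static void main(String... args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Iterate over every dataset in the default project and print its ID.
    Page<Dataset> datasets = bigquery.listDatasets();
    for (Dataset dataset : datasets.iterateAll()) {
      System.out.println(dataset.getDatasetId().getDataset());
    }
  }
}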

⬆️ Back to Top


💡 Usage & Workflows

This section guides you through the typical workflow of using this quickstart demo and explains the underlying code.

Quickstart Scenario

The primary workflow is designed to quickly demonstrate BigQuery dataset creation.

  1. Authenticate: Ensure your local environment is authenticated to Google Cloud using gcloud auth application-default login.
  2. Clone & Build: Get the project code and compile it with mvn clean install.
  3. Execute: Run the QuickstartSample.java program using mvn exec:java.
  4. Verify: Confirm the dataset my_new_dataset has been created in your Google Cloud Project via the GCP Console or the bq command-line tool.
sequenceDiagram
    participant User
    participant LocalMachine as Local Java App
    participant GcpAuth as Google Cloud Auth
    participant GcpBigQuery as Google BigQuery Service

    User->>LocalMachine: Clone repository
    User->>LocalMachine: Run `mvn clean install`
    User->>GcpAuth: `gcloud auth application-default login` (one-time setup)
    GcpAuth-->>User: Provides local credentials (e.g., in `~/.config/gcloud/`)
    User->>LocalMachine: Run `mvn exec:java ...`
    LocalMachine->>LocalMachine: Initialize BigQuery Client
    LocalMachine->>GcpBigQuery: Authenticate using ADC
    LocalMachine->>GcpBigQuery: Request: Create Dataset "my_new_dataset"
    GcpBigQuery-->>LocalMachine: Response: Dataset created successfully
    LocalMachine->>User: Print "Dataset my_new_dataset created."
    User->>GcpBigQuery: (Optional) Verify in GCP Console / `bq ls`

CLI Commands

Here are the essential CLI commands you'll use:

  • To authenticate your local environment for GCP access:

    gcloud auth application-default login

    Explanation: This command sets up the necessary credentials on your local machine, allowing Google Cloud client libraries (like the one used in QuickstartSample.java) to automatically find and use them for authentication.

  • To compile and package the Java application:

    mvn clean install

    Explanation: mvn clean removes any compiled .class files and target directories from previous builds. mvn install compiles the current project's code, runs any tests, and packages it into a .jar file, placing it in your local Maven repository.

  • To run the QuickstartSample Java class:

    mvn compile exec:java -Dexec.mainClass="com.example.bigquery.QuickstartSample"

    Explanation: mvn compile ensures all code is compiled. exec:java is a goal from the exec-maven-plugin that allows you to execute a Java application's main method from Maven. -Dexec.mainClass specifies which class contains the main method to run.

  • To list BigQuery datasets in your project (for verification):

    bq ls --project_id=YOUR_GCP_PROJECT_ID

    Explanation: bq is the BigQuery command-line tool installed with the Google Cloud SDK. This command lists all datasets within YOUR_GCP_PROJECT_ID and is useful for confirming that the Java application succeeded.

Code Snippet Walkthrough

Let's break down the core logic in src/main/java/com/example/bigquery/QuickstartSample.java:

/*
 * Copyright 2016 Google Inc.
 * ... (License header) ...
 */

package com.example.bigquery;

// Imports the Google Cloud client library
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.DatasetInfo;

public class QuickstartSample {
  public static void main(String... args) throws Exception {
    // 1. Instantiate a BigQuery client.
    // The client library automatically looks for credentials in the environment,
    // such as the GOOGLE_APPLICATION_CREDENTIALS environment variable or those
    // set by `gcloud auth application-default login`.
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // 2. Define the name for the new dataset
    String datasetName = "my_new_dataset"; // You can change this name!

    // 3. Prepare the dataset information
    // A DatasetInfo object holds metadata about the dataset,
    // like its ID, location, and access controls.
    DatasetInfo datasetInfo = DatasetInfo.newBuilder(datasetName).build();

    // 4. Create the dataset in BigQuery
    Dataset dataset = bigquery.create(datasetInfo);

    // 5. Print confirmation
    System.out.printf("Dataset %s created.%n", dataset.getDatasetId().getDataset());
  }
}

Key Points:

  • BigQueryOptions.getDefaultInstance().getService(): This is the standard way to initialize the BigQuery client. It automatically handles authentication using Application Default Credentials (ADC), looking for credentials in well-known locations (like GOOGLE_APPLICATION_CREDENTIALS env var or gcloud auth application-default login).
  • DatasetInfo.newBuilder(datasetName).build(): Constructs an object defining the properties of the new dataset. Here, we only set its name; a location, description, or default table expiration could also be set, as sketched below.
  • bigquery.create(datasetInfo): This is the core API call that sends a request to Google BigQuery to create the dataset.
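
As a hedged variation on the quickstart (not code that ships with this repository), the snippet below shows how a location and description could be set on the dataset, and how a BigQueryException might be handled explicitly instead of letting main throw:

package com.example.bigquery;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.DatasetInfo;

public class CreateDatasetWithOptions {
  public static void main(String... args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // Same dataset name as the quickstart, plus a location and description.
    DatasetInfo datasetInfo =
        DatasetInfo.newBuilder("my_new_dataset")
            .setLocation("US")
            .setDescription("Quickstart demo dataset")
            .build();

    try {
      Dataset dataset = bigquery.create(datasetInfo);
      System.out.printf("Dataset %s created.%n", dataset.getDatasetId().getDataset());
    } catch (BigQueryException e) {
      // Thrown, for example, when the dataset already exists or the caller
      // lacks the bigquery.datasets.create permission.
      System.err.println("Dataset creation failed: " + e.getMessage());
    }
  }
}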

Common Use Cases

This quickstart forms the basis for many BigQuery-related tasks in Java applications:

  • Automated Data Warehouse Setup: As part of an automated deployment pipeline, use similar code to provision the BigQuery datasets and tables needed for new microservices or analytical projects (a table-creation sketch follows this list).
  • Data Ingestion Pipelines: Create a dataset as the first step before streaming data into BigQuery tables from other systems (e.g., Kafka, Pub/Sub).
  • Ad-hoc Resource Management: For developers or administrators to quickly create temporary datasets for testing or specific analytical tasks.
  • Educational Demonstrations: A simple, clear example for teaching BigQuery integration in Java.
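
To illustrate the first two use cases, a natural next step after creating the dataset is to create a table with a schema inside it. The sketch below is hypothetical (class name, table name, and schema are illustrative), but it uses only standard calls from the google-cloud-bigquery library:

package com.example.bigquery;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.StandardTableDefinition;
import com.google.cloud.bigquery.Table;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TableInfo;

public class CreateTableSample {
  public static void main(String... args) {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // A two-column schema for an "events" table inside the quickstart dataset.
    Schema schema =
        Schema.of(
            Field.of("event_name", StandardSQLTypeName.STRING),
            Field.of("event_time", StandardSQLTypeName.TIMESTAMP));

    TableId tableId = TableId.of("my_new_dataset", "events");
    TableInfo tableInfo = TableInfo.of(tableId, StandardTableDefinition.of(schema));

    Table table = bigquery.create(tableInfo);
    System.out.printf("Table %s created.%n", table.getTableId().getTable());
  }
}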

⬆️ Back to Top


🚧 Limitations, Known Issues & Future Roadmap

Current Limitations

  • Single Functionality: This quickstart only creates a BigQuery dataset. It does not create tables, insert data, run queries, or delete resources.
  • Minimal Error Handling: The main method simply throws Exception, meaning production-ready error handling (e.g., specific try-catch blocks for BigQuery API exceptions) is not implemented.
  • Hardcoded Dataset Name: The dataset name (my_new_dataset) is hardcoded within the QuickstartSample.java file.
  • No Command-Line Arguments: The application does not accept configuration via command-line arguments. A possible workaround for these last two points is sketched after this list.
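
One way the hardcoded name could be addressed is to read the dataset name from the command line. This is a hypothetical variant of QuickstartSample, not code in the current repository:

package com.example.bigquery;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Dataset;
import com.google.cloud.bigquery.DatasetInfo;

public class ParameterizedQuickstart {
  public static void main(String... args) {
    // Use the first command-line argument as the dataset name, if provided.
    String datasetName = args.length > 0 ? args[0] : "my_new_dataset";

    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
    Dataset dataset = bigquery.create(DatasetInfo.newBuilder(datasetName).build());
    System.out.printf("Dataset %s created.%n", dataset.getDatasetId().getDataset());
  }
}

With the exec-maven-plugin already used above, such a variant could be invoked with an argument via its exec.args property, for example: mvn compile exec:java -Dexec.mainClass="com.example.bigquery.ParameterizedQuickstart" -Dexec.args="my_custom_dataset" (class name and invocation are illustrative).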

Known Issues

  • Authentication Failures: The most common issue is improper Google Cloud authentication.
    • Symptom: com.google.api.gax.rpc.UnauthenticatedException or similar errors.
    • Solution: Double-check gcloud auth application-default login and ensure the authenticated account has the necessary permissions. Verify GOOGLE_APPLICATION_CREDENTIALS is not set incorrectly.
  • Maven Build Issues: Network problems can prevent Maven from downloading dependencies.
    • Solution: Check your internet connection. Try mvn clean install -U to force update dependencies.

Future Roadmap

We have several ideas for enhancing this demonstration to cover more advanced BigQuery functionalities and improve usability:

  • ✅ Parameterized Dataset Creation: Allow the dataset name and location to be passed as command-line arguments or environment variables.
  • 🚀 Table Creation and Schema Definition: Extend the application to create tables within the dataset, including schema definition.
  • 💡 Data Insertion Examples: Add examples for inserting data into tables (e.g., from a CSV file, or using streaming inserts).
  • 🔍 Query Execution: Demonstrate how to run SQL queries against BigQuery and process the results (a hedged sketch follows this list).
  • 🗑️ Resource Deletion: Include functionality to safely delete created datasets or tables.
  • 📈 Advanced Error Handling: Implement robust try-catch blocks for specific BigQuery API exceptions and provide user-friendly error messages.
  • 🌐 Dataset Location Support: Allow specifying the BigQuery dataset location (e.g., US, EU).
  • 🧩 Modularization: Break down functionality into smaller, reusable methods or classes.
  • 🧪 Unit and Integration Tests: Add basic tests to ensure the BigQuery interactions work as expected.
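
To give a flavor of the Query Execution item, a hedged sketch of running a query with the same client library against a BigQuery public dataset might look like the following (illustrative only, not part of the current repository):

package com.example.bigquery;

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.bigquery.TableResult;

public class QuerySample {
  public static void main(String... args) throws Exception {
    BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

    // A simple aggregate query over a well-known BigQuery public dataset.
    QueryJobConfiguration queryConfig =
        QueryJobConfiguration.newBuilder(
                "SELECT name, SUM(number) AS total "
                    + "FROM `bigquery-public-data.usa_names.usa_1910_2013` "
                    + "GROUP BY name ORDER BY total DESC LIMIT 5")
            .build();

    // Runs the query, waits for it to finish, then prints each row.
    TableResult result = bigquery.query(queryConfig);
    for (FieldValueList row : result.iterateAll()) {
      System.out.printf("%s: %s%n",
          row.get("name").getStringValue(), row.get("total").getStringValue());
    }
  }
}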

⬆️ Back to Top


🤝 Contributing & Development Guidelines

We welcome contributions to enhance this quickstart demo! Whether it's bug fixes, new features, or improved documentation, your input is valuable.

How to Contribute

  1. Fork the Repository: Start by forking the GDSC-FSC/Java-to-cloud-demo repository to your GitHub account.
  2. Clone Your Fork: Clone your forked repository to your local machine:
    git clone https://github.com/YOUR_USERNAME/Java-to-cloud-demo.git
    cd Java-to-cloud-demo
  3. Create a New Branch: Create a new branch for your feature or bug fix:
    git checkout -b feature/your-feature-name-or-issue-id
    (e.g., git checkout -b feature/add-table-creation or git checkout -b bugfix/auth-issue)
  4. Make Your Changes: Implement your changes, add tests if applicable, and ensure existing tests pass.
  5. Commit Your Changes: Commit your changes with a clear and concise commit message:
    git commit -m "feat: Add support for creating BigQuery tables"
    (We follow Conventional Commits where possible.)
  6. Push to Your Fork: Push your new branch to your forked repository:
    git push origin feature/your-feature-name-or-issue-id
  7. Create a Pull Request: Go to the original GDSC-FSC/Java-to-cloud-demo repository on GitHub and open a new Pull Request from your branch. Provide a detailed description of your changes.

Branching and Pull Request Guidelines

  • Branch Naming: Use descriptive branch names like feature/new-feature, bugfix/issue-description, or docs/update-readme.
  • Pull Request Description: Clearly describe the problem your PR solves and the solution you've implemented. Reference any related issues.
  • Review: All pull requests require at least one reviewer approval before merging.
  • Squash and Merge: We typically use squash and merge to keep the main branch history clean.

Code Style and Quality

  • Java Code Style: Adhere to standard Java conventions. For Google Cloud projects, this often means following Google Java Format.
    • You can format your code using Maven: mvn com.coveo:fmt-maven-plugin:format
  • Linting: Ensure your code passes any static analysis checks (e.g., Checkstyle, SpotBugs if configured in pom.xml).
  • Testing: If adding new functionality, please include relevant unit or integration tests. For this simple demo, basic functionality testing is sufficient.

Development Setup

To replicate the development environment:

  1. IDE: We recommend using an IDE like IntelliJ IDEA or Eclipse. Import the project as a Maven project.
  2. Dependencies: Maven will automatically download dependencies when you build the project.
  3. Local Testing: Ensure you have your GCP credentials configured locally (as described in Configuration) to run and test BigQuery interactions.

⬆️ Back to Top


📄 License, Credits & Contact

License Information

This project is licensed under the Apache License, Version 2.0. You may obtain a copy of the License at:

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Acknowledgements

  • Google Cloud Platform: For providing the robust BigQuery service and comprehensive Java client libraries.
  • Google Developer Student Clubs (GDSC-FSC): For the initiative to provide hands-on learning resources and this foundational repository.
  • Google BigQuery Samples: This project's pom.xml and QuickstartSample.java are based on or heavily inspired by official Google BigQuery Java samples, particularly from googleapis/java-bigquery.

Maintainer Contact

This project is primarily maintained by the GDSC-FSC community.

For questions, feedback, or further collaboration, please reach out via:

  • GitHub Issues: For bug reports or feature requests, please open an issue.
  • GDSC-FSC Community Channels: Refer to your local GDSC chapter's communication channels (e.g., Discord, Slack, mailing list).

⬆️ Back to Top


📚 Appendix

Changelog

Version 0.0.1-SNAPSHOT (Initial Release)

  • Initial project setup with Maven.
  • Includes QuickstartSample.java for BigQuery dataset creation.
  • Basic README.md for getting started.

FAQ

Q: Can I change the dataset name?
A: Yes. Open src/main/java/com/example/bigquery/QuickstartSample.java and change the `datasetName` variable:

    String datasetName = "my_custom_new_dataset"; // Change this line

Then recompile and run the application.

Q: What if I get an `UnauthenticatedException`?
A: This means your Java application could not find valid Google Cloud credentials.
  1. Ensure you have run `gcloud auth application-default login`.
  2. Check that the Google account you logged in with has permission to access BigQuery in your selected GCP project.
  3. Make sure the `GOOGLE_APPLICATION_CREDENTIALS` environment variable is not pointing to an invalid or expired key. If you set it manually, try unsetting it to let ADC find the `gcloud` credentials.

Q: How do I specify a different Google Cloud Project?
A: The BigQuery client automatically uses the project associated with your authenticated credentials. To target a specific project, ensure your `gcloud` is configured for it:

    gcloud config set project YOUR_SPECIFIC_PROJECT_ID

Alternatively, you can explicitly specify the project ID when initializing the BigQuery client in Java:

    BigQuery bigquery = BigQueryOptions.newBuilder().setProjectId("YOUR_SPECIFIC_PROJECT_ID").build().getService();

Q: Why is the Maven build slow?
A: The first time you run `mvn install`, Maven needs to download all required dependencies from remote repositories, which can take some time depending on your internet connection. Subsequent builds are much faster because dependencies are cached locally.

Troubleshooting Guide

Issue: UnauthenticatedException
  Symptom: 401 Unauthorized or Invalid Credentials errors.
  Common solutions:
    1. Run gcloud auth application-default login again.
    2. Verify your authenticated Google account has the necessary BigQuery permissions in the target GCP project.
    3. Ensure the GOOGLE_APPLICATION_CREDENTIALS environment variable is not misconfigured.

Issue: NotFoundException or "Project Not Found"
  Symptom: Errors indicating the project or resource could not be found.
  Common solutions:
    1. Ensure gcloud config set project YOUR_GCP_PROJECT_ID has been run.
    2. Verify the project ID is correct and active.
    3. Check network connectivity to Google Cloud endpoints.

Issue: Maven "Could not resolve dependencies"
  Symptom: Errors like "Failed to read artifact descriptor" or "Missing artifact".
  Common solutions:
    1. Check your internet connection.
    2. Run mvn clean install -U to force an update of dependencies.
    3. Ensure your ~/.m2/repository directory is not corrupted (sometimes deleting the problematic dependency folder helps).

Issue: BigQueryException with 403 Forbidden
  Symptom: Errors indicating insufficient permissions to create the dataset.
  Common solutions:
    1. The authenticated account lacks the bigquery.datasets.create permission.
    2. Assign the BigQuery Data Editor or BigQuery Admin role to the account in the target GCP project via the IAM & Admin console.

Issue: Dataset not visible in the GCP Console
  Symptom: The application reports success, but the dataset doesn't appear in the BigQuery console.
  Common solutions:
    1. Ensure you are viewing the correct Google Cloud Project in the console.
    2. Refresh the BigQuery page.
    3. Run bq ls --project_id=YOUR_GCP_PROJECT_ID to confirm from the command line; there may be a slight delay before the console reflects new resources.

API Reference Links

  • Google Cloud BigQuery documentation: https://cloud.google.com/bigquery/docs
  • google-cloud-bigquery Java client library (googleapis/java-bigquery): https://github.com/googleapis/java-bigquery

⬆️ Back to Top
