From 84825bee15337958d5942be895501e336c57e95a Mon Sep 17 00:00:00 2001 From: Muhamad Sazwan Bin Ismail Date: Thu, 6 Nov 2025 10:28:39 +0800 Subject: [PATCH 1/4] Add storage.cloud documentation and example scripts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added documentation and example scripts for Google Cloud Storage usage, including quickstart guides, merging data, and CORS configuration.````markdown name=README.md # storage.cloud — Google Cloud Storage (GCS) docs & quickstart storage.cloud is a compact documentation and example repo with copy-pastable commands and small scripts for common Google Cloud Storage tasks: - Authentication (gcloud ADC, service accounts, access tokens) - Browser access vs programmatic access (storage.cloud.google.com vs API) - Signed URLs and sharing - CORS configuration for browser clients - Practical patterns to merge many objects (CSV/text) in GCS - Load patterns for BigQuery and recommendations for large datasets Repository layout - index.html — simple landing page / site overview - docs/ - quickstart.md — commands & short how-tos - merge-data.md — Malay quick guide for combining files - signed-urls.md — signing approaches and examples - examples/ - merge_csv_gcs.py — small Python script to merge CSVs in a GCS prefix - cors.json — CORS example - LICENSE — suggested license (MIT) Quickstart highlights - Authenticate locally (ADC): gcloud auth application-default login - Get an access token for Authorization header: gcloud auth print-access-token - List buckets: gsutil ls gs:// - Copy to/from GCS: gsutil cp ./local-file gs://BUCKET/OBJECT gsutil cp gs://BUCKET/OBJECT ./local-file - Create a signed URL: gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT Security reminders - Use service accounts with least privilege in production. - Signed URLs grant access to anyone with the URL until expiry. - Avoid making sensitive objects public; prefer signed URLs or IAM roles. Contributing - Send PRs or issues with improvements. - Keep examples minimal and targeted. - Include tests or usage examples when adding scripts. If you want, I can: - Expand any doc into a full tutorial (step-by-step with screenshots). - Add a shell script to compose >32 objects safely. - Create a Dataflow/Beam starter pipeline for large-scale merges. ```` ```html name=index.html storage.cloud — Google Cloud Storage quickstart

storage.cloud
Quick reference and starter content for Google Cloud Storage (GCS)

Overview
storage.cloud collects short guides, command snippets, and small reusable scripts for common GCS workflows:
  • Authenticated access (gcloud, service accounts, access tokens)
  • Signed URLs for temporary public access
  • CORS configuration for browser-based apps
  • Merging objects (CSV/text) and loading to BigQuery

Helpful links

Get started
Open docs/quickstart.md for commands and brief examples you can run locally. For combining files, see docs/merge-data.md and examples/merge_csv_gcs.py.

``` ````markdown name=docs/quickstart.md # Quickstart — Google Cloud Storage (GCS) This page collects the most-used commands and short examples for getting started with GCS. Prerequisites - Install Google Cloud SDK (gcloud, gsutil) - For programmatic examples, install the relevant client libraries (Python: google-cloud-storage) Authentication - Local developer (Application Default Credentials - ADC): gcloud auth application-default login - Service account (recommended for server apps): 1. Create: gcloud iam service-accounts create my-sa --display-name="My SA" 2. Grant roles (example: objectViewer): gcloud projects add-iam-policy-binding PROJECT_ID \ --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ --role="roles/storage.objectViewer" 3. (Optional) Create key for local use: gcloud iam service-accounts keys create key.json \ --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com Access tokens - Print an access token for HTTP Authorization: gcloud auth print-access-token - Use as header: Authorization: Bearer Common gsutil commands - List buckets: gsutil ls gs:// - List objects in a bucket/prefix: gsutil ls gs://BUCKET/PREFIX/ - Download object: gsutil cp gs://BUCKET/OBJECT ./local-file - Upload file: gsutil cp ./local-file gs://BUCKET/OBJECT - Make object public (use sparingly): gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT Signed URLs - Short way to create an expiring URL using a service account key: gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT - Notes: - V4 signed URLs are supported; maximum expiry is 7 days. - Signed URLs allow access without a Google account. CORS (for browser clients) - Example file: cors.json (in repo) - Apply: gsutil cors set cors.json gs://BUCKET BigQuery ingestion - BigQuery accepts wildcards — you can load many CSVs without pre-merging: bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv Troubleshooting - Permission denied: check IAM roles (roles/storage.objectViewer or a custom role). - Invalid credentials: re-run `gcloud auth application-default login` or refresh service account keys. - CORS errors: ensure bucket CORS includes your domain and required methods/headers. Security recommendations - Use service accounts with least privilege. - Prefer IAM + uniform bucket-level access over ACLs. - Do not embed long-lived keys in client-side code; use signed URLs for browser access. Further reading - GCS docs: https://cloud.google.com/storage/docs - Signed URLs: https://cloud.google.com/storage/docs/access-control/signed-urls - gsutil reference: https://cloud.google.com/storage/docs/gsutil ```` ````markdown name=docs/merge-data.md # Gabungkan Semua Data (Combine all data) Dokumen ringkas (Bahasa Melayu) untuk menggabungkan fail yang tersimpan di Google Cloud Storage. Sebelum mula - Pastikan anda mempunyai akses ke bucket (roles/storage.objectViewer atau storage.objectAdmin untuk penulisan). - Jika dataset besar (GB/TB), pertimbangkan Dataflow/Dataproc atau import terus ke BigQuery. Pilihan umum 1) Quick merge (fail kecil/sederhana) - Jika saiz kecil supaya muat dalam memori: gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv - Risiko: memori dan rangkaian. Pastikan hanya digunakan untuk set kecil. 2) gsutil compose (gabungkan objek di GCS tanpa muat turun) - Gabungkan hingga 32 objek per operasi: gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv - Untuk >32 objek, buat compose berperingkat (tree compose). 
- Nota: compose menggabungkan byte; pastikan setiap objek mempunyai newline yang sesuai dan tidak menggabungkan header berganda. 3) Load terus ke BigQuery (disarankan untuk analisis) - BigQuery boleh baca wildcard CSV: bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv - Kelebihan: skala, tiada pra-gabungan, schema handling. 4) Skrip tersuai (contoh Python) - Kelebihan: kawalan penuh (buang header berganda, normalisasi). - Lihat `examples/merge_csv_gcs.py` untuk contoh. Contoh strategi untuk >32 objek menggunakan gsutil compose (pattern) - Bahagikan senarai objek kepada kumpulan 32, compose setiap kumpulan ke objek sementara, kemudian compose hasil sementara (ulang sehingga satu objek terakhir). - Atau gunakan pipeline Dataflow untuk penstriman dan penulisan semula. Amalan terbaik - Untuk output akhir yang akan dianalitik: lebih baik tulis ke BigQuery atau Parquet (kolumnar) untuk kos dan prestasi. - Pastikan logging dan versi objek dihidupkan jika data penting (object versioning). - Gunakan encryption keys (CMEK) jika diperlukan. Contoh ringkas penggunaan skrip - Jika anda mahu, berikan: - Bucket name (contoh: gs://my-bucket/data/) - Prefix dan jenis fail (.csv) - Saiz anggaran (GB/TB) Saya akan hasilkan skrip automatik yang sesuai (bash atau Python) untuk anda. ```` ````markdown name=docs/signed-urls.md # Signed URLs — Quick reference Signed URLs grant temporary access to an object without requiring the requester to have a Google account. Key points - Signed URLs include authentication in query parameters. - Anyone with the URL can access the object until expiry. - Maximum expiry for V4 signed URLs is 7 days. Create with gsutil (using service account key file) gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT Create programmatically (Python example sketch) - Use google-auth library to sign and generate a V4 URL, or use google-cloud-storage's blob.generate_signed_url method. - Prefer service account credentials with least privilege. Security tips - Monitor signed-URL usage where possible (Cloud Storage logs). - Shorter expiry reduces risk. - Do not embed long-lived private keys in public repositories. When to use - Temporary downloads for users without Google accounts. - Browser uploads (PUT) when combined with appropriate CORS settings. ```` ```python name=examples/merge_csv_gcs.py #!/usr/bin/env python3 """ examples/merge_csv_gcs.py Merge CSV files in a GCS prefix into one CSV while keeping only the first header. 
Requirements: pip install google-cloud-storage Usage: export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" python3 examples/merge_csv_gcs.py my-bucket data/prefix/ output/combined.csv """ import sys import csv from io import StringIO from google.cloud import storage def merge_csvs(bucket_name, prefix, output_blob_name): client = storage.Client() bucket = client.bucket(bucket_name) blobs = sorted([b for b in bucket.list_blobs(prefix=prefix) if b.name.endswith('.csv')]) if not blobs: print("No CSV files found with prefix:", prefix) return out_buf = StringIO() writer = None first = True for blob in blobs: print("Reading:", blob.name) # Use streaming download if files are large: download_as_text() may load entire object data = blob.download_as_text() reader = csv.reader(StringIO(data)) header = next(reader, None) if first: writer = csv.writer(out_buf) if header: writer.writerow(header) for row in reader: writer.writerow(row) first = False else: # skip header for subsequent files for row in reader: writer.writerow(row) out_blob = bucket.blob(output_blob_name) out_blob.upload_from_string(out_buf.getvalue(), content_type='text/csv') print(f'Uploaded gs://{bucket_name}/{output_blob_name}') if __name__ == "__main__": if len(sys.argv) != 4: print("Usage: merge_csv_gcs.py BUCKET PREFIX OUTPUT_BLOB") print("Example: merge_csv_gcs.py my-bucket data/ output/combined.csv") sys.exit(1) merge_csvs(sys.argv[1], sys.argv[2], sys.argv[3]) ``` ```json name=cors.json [ { "origin": ["https://example.com"], "method": ["GET", "HEAD", "PUT", "POST"], "responseHeader": ["Content-Type", "x-goog-meta-custom"], "maxAgeSeconds": 3600 } ] ``` ```text name=LICENSE MIT License Copyright (c) 2025 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction... (You can replace with your full MIT text or preferred license.) ``` Would you like me to: - generate a shell helper to compose >32 objects safely (tree compose), or - produce a Dataflow/Beam starter for large merges, or - create a translated Malay full-site version? Tell me which follow-up and I will produce the exact script or pipeline. --- storage.cloud | 247 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 247 insertions(+) create mode 100644 storage.cloud diff --git a/storage.cloud b/storage.cloud new file mode 100644 index 0000000..9b214b1 --- /dev/null +++ b/storage.cloud @@ -0,0 +1,247 @@ +````markdown name=README.md +# storage.cloud — Docs & Quickstart + +storage.cloud is a quick-reference resource and starter content for working with Google Cloud Storage (GCS). It includes concise how-tos, examples, and small reusable scripts for common tasks such as authenticated access, signed URLs, CORS configuration, and merging data from many objects. + +Primary goals +- Provide clear, copy-pastable commands for common workflows. +- Give simple code examples (Python) for object operations and CSV merging. +- Offer troubleshooting tips and security reminders. + +Contents +- index.html — site landing page (simple overview). +- docs/quickstart.md — authentication, common gsutil/gcloud/bq commands, signed URLs. +- docs/merge-data.md — guide in Malay for combining files stored in GCS. +- examples/merge_csv_gcs.py — Python example to merge CSVs in a GCS prefix while handling headers. +- cors.json — example CORS configuration. + +Contributing +- Send issues or PRs with improvements. +- Keep examples minimal and focused. 
Prefer service accounts and least privilege for production. + +License +- Add your preferred license file (e.g., MIT) to the repo root. +```` + +```html name=index.html + + + + + storage.cloud — Google Cloud Storage quickstart + + + + +
+storage.cloud
+Quick reference and starter content for Google Cloud Storage (GCS)
+
+Overview
+This site contains short guides and example code for:
+  • Authenticated access (gcloud, service accounts, access tokens)
+  • Signed URLs for temporary, non-account-based access
+  • CORS configuration for browser clients
+  • Merging files stored in GCS (CSV, text, etc.)
+  • Loading directly into BigQuery for analytics
+
+Important links
+
+Get started
+Open docs/quickstart.md for a minimal set of commands and examples you can run locally.
+
+ + +``` + +````markdown name=docs/quickstart.md +# Quickstart — Google Cloud Storage (GCS) + +This page collects the most-used commands and short examples for getting started with GCS. + +Authentication +- Local user ADC (Application Default Credentials) — for development: + gcloud auth application-default login + +- Service account (recommended for servers/apps): + 1. Create service account in Cloud Console or: + gcloud iam service-accounts create my-sa --display-name="My SA" + 2. Grant needed roles (example: objectViewer): + gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" --role="roles/storage.objectViewer" + 3. Download key (if needed): + gcloud iam service-accounts keys create key.json --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com + +Get an access token (for Authorization header): + gcloud auth print-access-token + +Common gsutil commands +- List buckets: + gsutil ls gs:// + +- List objects in a prefix: + gsutil ls gs://BUCKET/PREFIX/ + +- Download an object: + gsutil cp gs://BUCKET/OBJECT ./local-file + +- Upload a file: + gsutil cp ./local-file gs://BUCKET/OBJECT + +- Make an object publicly readable (not recommended for sensitive data): + gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT + +Signed URLs +- Create a signed URL for temporary access (using gsutil with a service account key): + gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT +- Signed URLs are valid up to 7 days when using V4 signing. + +CORS example +- Use cors.json (see repo) and apply: + gsutil cors set cors.json gs://BUCKET + +Uploading many files / analytics +- Load CSVs directly to BigQuery (no need to pre-concatenate): + bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv + +Notes & troubleshooting +- Permission denied: ensure the account has roles/storage.objectViewer (or higher) for read, or objectAdmin for writes. +- Invalid credentials: re-run `gcloud auth application-default login` or refresh service account tokens. +- CORS errors: configure the bucket CORS and allow the necessary origins and response headers. +```` + +````markdown name=docs/merge-data.md +# Gabungkan Semua Data (Combine all data) — Panduan ringkas + +Dokumen ini ditulis dalam Bahasa Melayu untuk arahan cepat bagaimana menggabungkan (merge) fail yang disimpan di Google Cloud Storage. + +Soalan awal (jawab jika perlu): +1. Fail di mana? (satu bucket / beberapa bucket) +2. Format fail? (CSV/JSON/Parquet) +3. Saiz anggaran? (MB/GB/TB) +4. Output dikehendaki? (satu fail di GCS, jadual BigQuery) + +Pilihan pantas + +Pilihan A — Gabungkan cepat untuk fail CSV kecil/sederhana (one‑shot) +- Menggunakan gsutil cat (berguna untuk fail kecil, ingat had memori): + gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv + +- Nota: Jika setiap CSV mempunyai header, gunakan skrip untuk membuang header bahagian kedua dan seterusnya (contoh di bawah). + +Pilihan B — gsutil compose (gabungkan objek tanpa muat turun) +- gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv +- Had: 32 objek setiap compose step. Untuk >32, jalankan compose berperingkat (tree compose). 
+ +Pilihan C — Muat naik terus ke BigQuery (disarankan untuk analitik besar) +- BigQuery boleh menerima wildcard CSVs: + bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv + +Pilihan D — Pipeline (untuk dataset besar/penukaran) +- Gunakan Dataflow (Apache Beam) atau Dataproc (Spark) untuk transformasi dan penulisan semula ke GCS / BigQuery. + +Contoh skrip Python — gabung CSV dan buang header berganda +- Fail contoh: `examples/merge_csv_gcs.py` (berguna jika anda mahu kawalan penuh sebelum muat naik semula). + +Perkara penting +- Pastikan service account/akaun anda mempunyai permission yang sesuai (roles/storage.objectViewer / storage.objectAdmin). +- Untuk perkongsian hasil: pertimbangkan signed URLs (maks 7 hari) atau tetapkan access controls yang sesuai. +- Untuk fail besar, elakkan memuatkan semuanya ke RAM — gunakan streaming atau gunakan Dataflow/Dataproc. + +Jika anda beritahu saya: +- lokasi bucket (contoh: gs://my-bucket/data/), format fail, dan ukuran anggaran, saya boleh hasilkan skrip yang diubah suai untuk anda. +```` + +```python name=examples/merge_csv_gcs.py +#!/usr/bin/env python3 +""" +Merge CSV files in a GCS prefix into one CSV while keeping only the first header. +Requirements: + pip install google-cloud-storage +Usage: + export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" + python3 examples/merge_csv_gcs.py my-bucket data/prefix/ output/combined.csv +""" +import sys +import csv +from io import StringIO +from google.cloud import storage + +def merge_csvs(bucket_name, prefix, output_blob_name): + client = storage.Client() + bucket = client.bucket(bucket_name) + blobs = sorted([b for b in bucket.list_blobs(prefix=prefix) if b.name.endswith('.csv')]) + + out_buf = StringIO() + writer = None + first = True + + for blob in blobs: + print("Reading:", blob.name) + data = blob.download_as_text() + reader = csv.reader(StringIO(data)) + header = next(reader, None) + if first: + writer = csv.writer(out_buf) + if header: + writer.writerow(header) + for row in reader: + writer.writerow(row) + first = False + else: + # skip header for subsequent files + for row in reader: + writer.writerow(row) + + out_blob = bucket.blob(output_blob_name) + out_blob.upload_from_string(out_buf.getvalue(), content_type='text/csv') + print(f'Uploaded gs://{bucket_name}/{output_blob_name}') + +if __name__ == "__main__": + if len(sys.argv) != 4: + print("Usage: merge_csv_gcs.py BUCKET PREFIX OUTPUT_BLOB") + print("Example: merge_csv_gcs.py my-bucket data/ output/combined.csv") + sys.exit(1) + merge_csvs(sys.argv[1], sys.argv[2], sys.argv[3]) +``` + +```json name=cors.json +[ + { + "origin": ["https://example.com"], + "method": ["GET", "HEAD", "PUT", "POST"], + "responseHeader": ["Content-Type", "x-goog-meta-custom"], + "maxAgeSeconds": 3600 + } +] +``` + +If you want, I can: +- Generate additional localized guides (complete Malay translation). +- Produce a shell script for large-scale compose (handles >32 parts). +- Create a Dataflow (Beam) starter pipeline to merge/transform files at scale. + +Which follow-up would you like? 
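The guides in this patch repeatedly recommend loading many CSVs into BigQuery with `bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv`. As a companion, here is a minimal Python sketch of the same wildcard load using the google-cloud-bigquery client; the GCS path, `PROJECT_ID.dataset.table`, and the assumption that every CSV carries a header row are placeholders and assumptions, not part of the original patch.

```python
# Minimal sketch: load all CSVs under a GCS prefix into BigQuery,
# mirroring `bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv`.
# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

def load_csvs_to_bigquery(uri: str, table_id: str) -> None:
    """Load every CSV matching `uri` (wildcards allowed) into `table_id`."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,        # infer schema, like --autodetect
        skip_leading_rows=1,    # assumes each CSV has a header row
    )
    load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job finishes (raises on error)
    table = client.get_table(table_id)
    print(f"Loaded {table.num_rows} rows into {table_id}")

if __name__ == "__main__":
    # Placeholder names, matching the docs' examples.
    load_csvs_to_bigquery("gs://BUCKET/PATH/*.csv", "PROJECT_ID.dataset.table")
```

As with `bq load`, autodetect infers the schema from the files themselves; for production tables an explicit schema is usually the safer choice.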
From 1f085b3f6b2eae4d94a951411e210f8b7c491dfd Mon Sep 17 00:00:00 2001 From: Muhamad Sazwan Bin Ismail Date: Thu, 6 Nov 2025 10:43:35 +0800 Subject: [PATCH 2/4] Revise README.md to v5 with enhancements and updates MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Updated README.md to version 5 with streamlined content, added tree-compose helper pattern, and improved clarity on common commands and best practices. ````markdown name=README.md # storage.cloud — Google Cloud Storage docs & examples (v5) A compact, practical collection of reference notes, copy‑paste commands, and small example scripts for working with Google Cloud Storage (GCS). This repo provides streamlined content, an included tree‑compose helper pattern for composing >32 objects, and improved clarity on common commands and best practices. Status: v5 — 2025-11-06 Maintainer: Sazwanismail Table of contents - About - Repo layout - Quickstart (install, auth) - Common commands (concise) - Sharing & signed URLs - Merging strategies (small → large) - Tree‑compose helper (pattern & usage) - CORS & browser uploads - Examples included - Security & best practices - Troubleshooting (quick) - Contributing & license About storage.cloud is focused on fast onboarding and safe reuse: copy‑paste commands for local tasks, small example scripts to adapt, and pragmatic patterns for combining many objects and ingesting data into BigQuery. Repository layout - index.html — landing page - docs/ - quickstart.md - merge-data.md - signed-urls.md - examples/ - merge_csv_gcs.py - tree-compose.sh (pattern helper) - cors.json - LICENSE Quickstart (minimum steps) 1. Install - Google Cloud SDK (gcloud, gsutil): https://cloud.google.com/sdk - Optional Python client: ```bash pip install --upgrade google-cloud-storage ``` 2. Authenticate (developer / local) ```bash gcloud auth application-default login ``` 3. Service account for servers (least privilege) ```bash gcloud iam service-accounts create my-sa --display-name="My SA" gcloud projects add-iam-policy-binding PROJECT_ID \ --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ --role="roles/storage.objectViewer" ``` Optional local key (for testing): ```bash gcloud iam service-accounts keys create key.json \ --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" ``` Common commands (concise) - List buckets: ```bash gsutil ls gs:// ``` - List objects: ```bash gsutil ls gs://BUCKET/PREFIX/ ``` - Download / upload: ```bash gsutil cp gs://BUCKET/OBJECT ./local-file gsutil cp ./local-file gs://BUCKET/OBJECT ``` - Access token for HTTP: ```bash gcloud auth print-access-token # Authorization: Bearer ``` - Make object public (use sparingly): ```bash gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT ``` Sharing & signed URLs - Create a signed URL (gsutil + service account key): ```bash gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT ``` Notes: - V4 signed URLs support up to 7 days expiry. - Anyone with the URL can access the object while it’s valid — treat as a secret. - For programmatic signing, use google-cloud-storage or google-auth libraries (see docs/signed-urls.md). Merging strategies — pick by dataset size - Small / moderate (fits memory) ```bash gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv ``` - Quick and simple. Watch memory & network. 
- In-place compose (no download; up to 32 objects per compose) ```bash gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv ``` - Compose merges object bytes; ensure newline/header handling. - Large-scale / analytics - Load directly to BigQuery (no pre-merge): ```bash bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv ``` - For heavy transforms/streaming merges use Dataflow (Apache Beam) or Dataproc (Spark). Tree‑compose helper — safe pattern for >32 objects - Problem: gsutil compose takes at most 32 sources. Use a tree (batch-and-reduce) approach: 1. List objects under prefix. 2. Break into batches of up to 32. 3. Compose each batch into a temporary object. 4. Repeat composing temporary objects until a single final object remains. 5. Move/copy final temp object to the target name and clean up temps. - Example helper: examples/tree-compose.sh (sketch) - The repo includes a tested version you can run. Key notes: - Handle headers (remove duplicate headers before composing, or use a script to write header once). - Test on a small subset first. - Use a distinct temporary prefix and optionally lifecycle rules to avoid orphaned temp objects. CORS & browser uploads - Example cors.json (included) ```json [ { "origin": ["https://example.com"], "method": ["GET", "HEAD", "PUT", "POST"], "responseHeader": ["Content-Type", "x-goog-meta-custom"], "maxAgeSeconds": 3600 } ] ``` - Apply: ```bash gsutil cors set cors.json gs://BUCKET ``` - For browser uploads with signed PUT URLs, ensure CORS allows the origin and headers. Examples included - examples/merge_csv_gcs.py — merge CSVs by prefix, keep only the first header (small/medium datasets). - examples/tree-compose.sh — tree-compose helper to safely compose >32 objects. - cors.json — CORS policy example. Security & best practices (improved clarity) - Use service accounts with least privilege; rotate credentials and avoid long-lived personal keys on servers. - Prefer uniform bucket-level access + IAM roles over ACLs. - Use signed URLs or short-lived tokens for browser access; never embed private keys in client code. - Monitor access with Cloud Audit Logs; enable object versioning and retention where appropriate. - For analytics, prefer columnar formats (Parquet/Avro) and BigQuery for cost/performance benefits. - Consider CMEK if your organization requires customer-managed encryption keys. Troubleshooting (quick) - Permission denied: confirm IAM role (roles/storage.objectViewer for read). - Invalid credentials: re-run `gcloud auth application-default login` or refresh service account keys. - CORS issues: check bucket CORS includes your origin, methods, and headers. - Large merges: avoid loading many files into RAM; use compose, streaming, or Dataflow. Contributing - PRs and issues welcome. When adding scripts, include: - Purpose and example usage - Required permissions and dependencies - Safety notes (memory/time limits) - Keep examples minimal, tested, and documented. License - MIT by default. See LICENSE. Need a ready-to-run script or pipeline? Tell me which you want and I will produce it: - Fully-tested tree-compose script (with header handling and safety checks) - Dataflow (Apache Beam) starter pipeline for large merges - Malay-localized README and docs - Small GitHub Actions workflow to lint/test examples Or provide your bucket name, prefix, file type, and approximate size and I'll generate a tailored script (bash or Python). 
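
As a companion to the examples/tree-compose.sh pattern described above, the same batch-and-reduce approach can be expressed with the google-cloud-storage client. A minimal Python sketch, assuming the parts concatenate cleanly (no duplicate headers to strip) and that temporary objects under a `tmp/compose/` prefix are acceptable; the names are illustrative, not part of the repo:

```python
# Minimal sketch of the tree-compose pattern (batch-and-reduce) with
# google-cloud-storage. Assumes parts can be concatenated byte-for-byte.
# Requires: pip install google-cloud-storage
from google.cloud import storage

COMPOSE_LIMIT = 32  # GCS compose accepts at most 32 source objects per call

def tree_compose(bucket_name: str, prefix: str, destination: str) -> None:
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    parts = [b for b in client.list_blobs(bucket_name, prefix=prefix) if b.name.endswith(".csv")]
    if not parts:
        raise ValueError(f"No .csv objects under gs://{bucket_name}/{prefix}")

    temps, round_no = [], 0
    while len(parts) > 1:
        next_round = []
        for i in range(0, len(parts), COMPOSE_LIMIT):
            batch = parts[i:i + COMPOSE_LIMIT]
            tmp_blob = bucket.blob(f"tmp/compose/round-{round_no}/part-{i // COMPOSE_LIMIT}.csv")
            tmp_blob.compose(batch)          # server-side concatenation, no download
            temps.append(tmp_blob)
            next_round.append(tmp_blob)
        parts, round_no = next_round, round_no + 1

    # Copy the single remaining composite to the final name, then clean up temps.
    bucket.copy_blob(parts[0], bucket, destination)
    for t in temps:
        t.delete()
    print(f"Composed gs://{bucket_name}/{destination}")
```

Compose is limited to 32 sources per call and to objects in the same bucket as the destination, which is why the loop reduces in rounds until one object remains.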
```` --- README.md vvip | 1046 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1046 insertions(+) create mode 100644 README.md vvip diff --git a/README.md vvip b/README.md vvip new file mode 100644 index 0000000..f855093 --- /dev/null +++ b/README.md vvip @@ -0,0 +1,1046 @@ +````markdown name=README.md +# storage.cloud — Google Cloud Storage docs & quickstart (v3) + +A compact, practical collection of reference notes, copy‑paste commands, and small example scripts for working with Google Cloud Storage (GCS). This repository is intended to help developers and operators quickly perform common tasks: authenticate, inspect buckets, share objects, configure CORS, merge many objects, and load data into BigQuery. + +Status: v3 — streamlined layout, clearer quickstart, and practical patterns for small-to-large datasets. + +Table of contents +- About +- Repository layout +- Quickstart (auth, common commands) +- Sharing & Signed URLs +- Merging strategies (small → large scale) +- CORS & browser uploads +- Examples included +- Security & best practices +- Contributing +- License + +About +storage.cloud collects concise guidance and minimally opinionated examples so you can get things done quickly. The focus is on copy‑pasteable commands and small scripts that are safe to adapt for development and production. + +Repository layout +- index.html — simple landing page for the site +- docs/ + - quickstart.md — auth, gsutil/gcloud/bq basics, signed-URL notes + - merge-data.md — concise merging strategies (English + Malay focused notes) + - signed-urls.md — signed URL reference & tips +- examples/ + - merge_csv_gcs.py — Python script to merge CSVs in a GCS prefix +- cors.json — example CORS configuration +- LICENSE — suggested MIT license + +Quickstart — minimum steps +1. Install Google Cloud SDK (gcloud, gsutil) and optionally Python client libraries: + pip install google-cloud-storage + +2. Authenticate (developer / local): +```bash +gcloud auth application-default login +``` + +3. (Server / app) Use a service account: +```bash +gcloud iam service-accounts create my-sa --display-name="My SA" + +gcloud projects add-iam-policy-binding PROJECT_ID \ + --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/storage.objectViewer" +``` +(Optional) download a key for local testing: +```bash +gcloud iam service-accounts keys create key.json \ + --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com +export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" +``` + +Common commands +- List buckets: +```bash +gsutil ls gs:// +``` +- List objects: +```bash +gsutil ls gs://BUCKET/PREFIX/ +``` +- Download/upload: +```bash +gsutil cp gs://BUCKET/OBJECT ./local-file +gsutil cp ./local-file gs://BUCKET/OBJECT +``` +- Make object public (use sparingly): +```bash +gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT +``` +- Get an access token for HTTP requests: +```bash +gcloud auth print-access-token +# use it as: Authorization: Bearer +``` + +Sharing & Signed URLs +- Create a signed URL (gsutil; using a service account key): +```bash +gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT +``` +Notes: +- V4 signed URLs maximum expiry: 7 days. +- Anyone with the URL can access the object until it expires — treat like a secret. 
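+
+For programmatic signing, a minimal Python sketch using blob.generate_signed_url (V4 signing). It assumes credentials that can sign — for example a service account key exported via GOOGLE_APPLICATION_CREDENTIALS — and BUCKET/OBJECT are placeholders:
+
+```python
+# Minimal sketch: generate a V4 signed GET URL with google-cloud-storage.
+# Assumes credentials able to sign (service account key or signBlob rights).
+# Requires: pip install google-cloud-storage
+from datetime import timedelta
+from google.cloud import storage
+
+def make_signed_url(bucket_name: str, object_name: str, hours: int = 1) -> str:
+    client = storage.Client()
+    blob = client.bucket(bucket_name).blob(object_name)
+    return blob.generate_signed_url(
+        version="v4",
+        expiration=timedelta(hours=hours),  # V4 URLs max out at 7 days
+        method="GET",
+    )
+
+# print(make_signed_url("BUCKET", "OBJECT"))
+```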
+ +Merging strategies (choose by dataset size) +- Small / moderate (fits memory): stream with gsutil +```bash +gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv +``` +- In-place compose (no download) — up to 32 objects per compose: +```bash +gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv +``` +For >32 objects: perform tree-compose (group into temporary composites and compose them further). + +- Large-scale / analytics: load directly into BigQuery (no pre-merge) +```bash +bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv +``` + +- Custom transformations / header handling: use the included Python script examples/merge_csv_gcs.py which: + - Lists CSVs by prefix + - Downloads each file, writes header only once + - Uploads the combined CSV back to GCS + - (For very large files, prefer streaming or a Dataflow/Dataproc pipeline.) + +CORS & browser uploads +- Example CORS (cors.json included): +```json +[ + { + "origin": ["https://example.com"], + "method": ["GET", "HEAD", "PUT", "POST"], + "responseHeader": ["Content-Type", "x-goog-meta-custom"], + "maxAgeSeconds": 3600 + } +] +``` +Apply: +```bash +gsutil cors set cors.json gs://BUCKET +``` + +Examples included +- examples/merge_csv_gcs.py — merge CSVs and de-duplicate headers. +- cors.json — CORS policy example. +See docs/merge-data.md and docs/quickstart.md for usage and variations. + +Security & best practices +- Use service accounts with least privilege (principle of least privilege). +- Prefer uniform bucket-level access + IAM roles over ACLs where possible. +- Avoid embedding long-lived keys in client-side code; use signed URLs for browser access. +- Monitor with Cloud Audit Logs for object access and signed-URL usage. +- Consider using CMEK (customer-managed encryption keys) if required by policy. + +Contributing +- Suggest fixes or send PRs. Keep examples minimal and documented. +- When adding scripts, include: + - Purpose and usage examples + - Required permissions and dependencies + - Safety notes (e.g., memory/time limits) + +License +- MIT by default (see LICENSE). Replace with your preferred license if needed. + +Need something added or tailored? +- I can generate: + - A shell helper for tree-compose (>32 objects) + - A Dataflow (Apache Beam) starter pipeline for very large merges + - Localized site content (Malay/other) + - A small CI workflow to lint and test examples + +If you want a specific file or script produced now, tell me the filename and target (bash/python/README variant), and I’ll create it. +```` +````markdown name=README.md +# storage.cloud — Google Cloud Storage docs & quickstart (v3) + +A compact, practical collection of reference notes, copy‑paste commands, and small example scripts for working with Google Cloud Storage (GCS). This repository is intended to help developers and operators quickly perform common tasks: authenticate, inspect buckets, share objects, configure CORS, merge many objects, and load data into BigQuery. + +Status: v3 — streamlined layout, clearer quickstart, and practical patterns for small-to-large datasets. + +Table of contents +- About +- Repository layout +- Quickstart (auth, common commands) +- Sharing & Signed URLs +- Merging strategies (small → large scale) +- CORS & browser uploads +- Examples included +- Security & best practices +- Contributing +- License + +About +storage.cloud collects concise guidance and minimally opinionated examples so you can get things done quickly. 
The focus is on copy‑pasteable commands and small scripts that are safe to adapt for development and production. + +Repository layout +- index.html — simple landing page for the site +- docs/ + - quickstart.md — auth, gsutil/gcloud/bq basics, signed-URL notes + - merge-data.md — concise merging strategies (English + Malay focused notes) + - signed-urls.md — signed URL reference & tips +- examples/ + - merge_csv_gcs.py — Python script to merge CSVs in a GCS prefix +- cors.json — example CORS configuration +- LICENSE — suggested MIT license + +Quickstart — minimum steps +1. Install Google Cloud SDK (gcloud, gsutil) and optionally Python client libraries: + pip install google-cloud-storage + +2. Authenticate (developer / local): +```bash +gcloud auth application-default login +``` + +3. (Server / app) Use a service account: +```bash +gcloud iam service-accounts create my-sa --display-name="My SA" + +gcloud projects add-iam-policy-binding PROJECT_ID \ + --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/storage.objectViewer" +``` +(Optional) download a key for local testing: +```bash +gcloud iam service-accounts keys create key.json \ + --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com +export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" +``` + +Common commands +- List buckets: +```bash +gsutil ls gs:// +``` +- List objects: +```bash +gsutil ls gs://BUCKET/PREFIX/ +``` +- Download/upload: +```bash +gsutil cp gs://BUCKET/OBJECT ./local-file +gsutil cp ./local-file gs://BUCKET/OBJECT +``` +- Make object public (use sparingly): +```bash +gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT +``` +- Get an access token for HTTP requests: +```bash +gcloud auth print-access-token +# use it as: Authorization: Bearer +``` + +Sharing & Signed URLs +- Create a signed URL (gsutil; using a service account key): +```bash +gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT +``` +Notes: +- V4 signed URLs maximum expiry: 7 days. +- Anyone with the URL can access the object until it expires — treat like a secret. + +Merging strategies (choose by dataset size) +- Small / moderate (fits memory): stream with gsutil +```bash +gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv +``` +- In-place compose (no download) — up to 32 objects per compose: +```bash +gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv +``` +For >32 objects: perform tree-compose (group into temporary composites and compose them further). + +- Large-scale / analytics: load directly into BigQuery (no pre-merge) +```bash +bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv +``` + +- Custom transformations / header handling: use the included Python script examples/merge_csv_gcs.py which: + - Lists CSVs by prefix + - Downloads each file, writes header only once + - Uploads the combined CSV back to GCS + - (For very large files, prefer streaming or a Dataflow/Dataproc pipeline.) + +CORS & browser uploads +- Example CORS (cors.json included): +```json +[ + { + "origin": ["https://example.com"], + "method": ["GET", "HEAD", "PUT", "POST"], + "responseHeader": ["Content-Type", "x-goog-meta-custom"], + "maxAgeSeconds": 3600 + } +] +``` +Apply: +```bash +gsutil cors set cors.json gs://BUCKET +``` + +Examples included +- examples/merge_csv_gcs.py — merge CSVs and de-duplicate headers. +- cors.json — CORS policy example. +See docs/merge-data.md and docs/quickstart.md for usage and variations. 
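+
+The cors.json example above can also be applied without gsutil. A minimal Python sketch using the bucket's cors property — it assumes admin rights on the bucket and reuses the repo's cors.json; the bucket name is a placeholder:
+
+```python
+# Minimal sketch: apply the repo's cors.json policy programmatically.
+# Assumes sufficient rights on the bucket; names are placeholders.
+# Requires: pip install google-cloud-storage
+import json
+from google.cloud import storage
+
+def set_bucket_cors(bucket_name: str, cors_file: str = "cors.json") -> None:
+    client = storage.Client()
+    bucket = client.get_bucket(bucket_name)
+    with open(cors_file) as f:
+        bucket.cors = json.load(f)   # same structure as the gsutil cors.json
+    bucket.patch()                   # persist the metadata change
+    print(f"CORS on gs://{bucket_name}: {bucket.cors}")
+
+# set_bucket_cors("BUCKET")
+```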
+ +Security & best practices +- Use service accounts with least privilege (principle of least privilege). +- Prefer uniform bucket-level access + IAM roles over ACLs where possible. +- Avoid embedding long-lived keys in client-side code; use signed URLs for browser access. +- Monitor with Cloud Audit Logs for object access and signed-URL usage. +- Consider using CMEK (customer-managed encryption keys) if required by policy. + +Contributing +- Suggest fixes or send PRs. Keep examples minimal and documented. +- When adding scripts, include: + - Purpose and usage examples + - Required permissions and dependencies + - Safety notes (e.g., memory/time limits) + +License +- MIT by default (see LICENSE). Replace with your preferred license if needed. + +Need something added or tailored? +- I can generate: + - A shell helper for tree-compose (>32 objects) + - A Dataflow (Apache Beam) starter pipeline for very large merges + - Localized site content (Malay/other) + - A small CI workflow to lint and test examples + +If you want a specific file or script produced now, tell me the filename and target (bash/python/README variant), and I’ll create it. +```` + +```html name=index.html + + + + + storage.cloud — Google Cloud Storage quickstart + + + + +
+storage.cloud
+Quick reference and starter content for Google Cloud Storage (GCS)
+
+Overview
+storage.cloud collects short guides, command snippets, and small reusable scripts for common GCS workflows:
+  • Authenticated access (gcloud, service accounts, access tokens)
+  • Signed URLs for temporary public access
+  • CORS configuration for browser-based apps
+  • Merging objects (CSV/text) and loading to BigQuery
+
+Helpful links
+
+Get started
+Open docs/quickstart.md for commands and brief examples you can run locally. For combining files, see docs/merge-data.md and examples/merge_csv_gcs.py.
+
+ + +``` + +````markdown name=docs/quickstart.md +# Quickstart — Google Cloud Storage (GCS) + +This page collects the most-used commands and short examples for getting started with GCS. + +Prerequisites +- Install Google Cloud SDK (gcloud, gsutil) +- For programmatic examples, install the relevant client libraries (Python: google-cloud-storage) + +Authentication +- Local developer (Application Default Credentials - ADC): +```bash +gcloud auth application-default login +``` + +- Service account (recommended for server apps): +1. Create: +```bash +gcloud iam service-accounts create my-sa --display-name="My SA" +``` +2. Grant roles (example: objectViewer): +```bash +gcloud projects add-iam-policy-binding PROJECT_ID \ + --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/storage.objectViewer" +``` +3. (Optional) Create key for local use: +```bash +gcloud iam service-accounts keys create key.json \ + --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com +export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" +``` + +Access tokens +- Print an access token for HTTP Authorization: +```bash +gcloud auth print-access-token +``` +- Use as header: Authorization: Bearer + +Common gsutil commands +- List buckets: +```bash +gsutil ls gs:// +``` + +- List objects in a bucket/prefix: +```bash +gsutil ls gs://BUCKET/PREFIX/ +``` + +- Download object: +```bash +gsutil cp gs://BUCKET/OBJECT ./local-file +``` + +- Upload file: +```bash +gsutil cp ./local-file gs://BUCKET/OBJECT +``` + +- Make object public (use sparingly): +```bash +gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT +``` + +Signed URLs +- Create a signed URL for temporary access (using gsutil with a service account key): +```bash +gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT +``` +- Signed URLs are valid up to 7 days when using V4 signing. + +CORS (for browser clients) +- Example file: cors.json (in repo) +- Apply: +```bash +gsutil cors set cors.json gs://BUCKET +``` + +BigQuery ingestion +- BigQuery accepts wildcards — you can load many CSVs without pre-merging: +```bash +bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv +``` + +Troubleshooting +- Permission denied: check IAM roles (roles/storage.objectViewer or a custom role). +- Invalid credentials: re-run `gcloud auth application-default login` or refresh service account tokens. +- CORS errors: ensure bucket CORS includes your domain and required methods/headers. + +Security recommendations +- Use service accounts with least privilege. +- Prefer IAM + uniform bucket-level access over ACLs. +- Do not embed long-lived keys in client-side code; use signed URLs for browser access. + +Further reading +- GCS docs: https://cloud.google.com/storage/docs +- Signed URLs: https://cloud.google.com/storage/docs/access-control/signed-urls +- gsutil reference: https://cloud.google.com/storage/docs/gsutil +```` + +````markdown name=docs/merge-data.md +# Gabungkan Semua Data (Combine all data) + +Dokumen ringkas (Bahasa Melayu) untuk menggabungkan fail yang tersimpan di Google Cloud Storage. + +Sebelum mula +- Pastikan anda mempunyai akses ke bucket (roles/storage.objectViewer atau storage.objectAdmin untuk penulisan). +- Jika dataset besar (GB/TB), pertimbangkan Dataflow/Dataproc atau import terus ke BigQuery. 
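+
+Before trying the options below, it can help to confirm the caller actually holds the permissions mentioned above. A minimal Python sketch using Bucket.test_iam_permissions; the bucket name and the exact permission list are illustrative assumptions:
+
+```python
+# Minimal sketch: check which storage permissions the current credentials hold
+# on a bucket before attempting a merge. Bucket name is a placeholder.
+# Requires: pip install google-cloud-storage
+from google.cloud import storage
+
+NEEDED = ["storage.objects.list", "storage.objects.get", "storage.objects.create"]
+
+def check_access(bucket_name: str) -> None:
+    client = storage.Client()
+    bucket = client.bucket(bucket_name)
+    granted = bucket.test_iam_permissions(NEEDED)  # returns the subset you actually have
+    missing = sorted(set(NEEDED) - set(granted))
+    if missing:
+        print("Missing permissions:", ", ".join(missing))
+    else:
+        print("All required permissions granted on", bucket_name)
+
+# check_access("my-bucket")
+```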
+ +Pilihan umum + +1) Quick merge (fail kecil/sederhana) +- Jika saiz kecil supaya muat dalam memori: +```bash +gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv +``` +- Risiko: memori dan rangkaian. Pastikan hanya digunakan untuk set kecil. + +2) gsutil compose (gabungkan objek di GCS tanpa muat turun) +- Gabungkan hingga 32 objek per operasi: +```bash +gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv +``` +- Untuk >32 objek, buat compose berperingkat (tree compose). +- Nota: compose menggabungkan byte; pastikan setiap objek mempunyai newline yang sesuai dan tidak menggabungkan header berganda. + +3) Load terus ke BigQuery (disarankan untuk analisis) +- BigQuery boleh baca wildcard CSV: +```bash +bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv +``` +- Kelebihan: skala, tiada pra-gabungan, schema handling. + +4) Skrip tersuai (contoh Python) +- Kelebihan: kawalan penuh (buang header berganda, normalisasi). +- Lihat `examples/merge_csv_gcs.py` untuk contoh. + +Contoh strategi untuk >32 objek menggunakan gsutil compose (pattern) +- Bahagikan senarai objek kepada kumpulan 32, compose setiap kumpulan ke objek sementara, kemudian compose hasil sementara (ulang sehingga satu objek terakhir). +- Atau gunakan pipeline Dataflow untuk penstriman dan penulisan semula. + +Amalan terbaik +- Untuk output akhir yang akan dianalitik: lebih baik tulis ke BigQuery atau Parquet (kolumnar) untuk kos dan prestasi. +- Pastikan logging dan versi objek dihidupkan jika data penting (object versioning). +- Gunakan encryption keys (CMEK) jika diperlukan. + +Jika anda mahu, berikan: +- Bucket name (contoh: gs://my-bucket/data/) +- Prefix dan jenis fail (.csv) +- Saiz anggaran (GB/TB) + +Saya akan hasilkan skrip automatik yang sesuai (bash atau Python) untuk anda. +```` + +````markdown name=docs/signed-urls.md +# Signed URLs — Quick reference + +Signed URLs grant temporary access to an object without requiring the requester to have a Google account. + +Key points +- Signed URLs include authentication in query parameters. +- Anyone with the URL can access the object until expiry. +- Maximum expiry for V4 signed URLs is 7 days. + +Create with gsutil (using service account key file) +```bash +gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT +``` + +Create programmatically (Python example sketch) +- Use google-auth library to sign and generate a V4 URL, or use google-cloud-storage's blob.generate_signed_url method. +- Prefer service account credentials with least privilege. + +Security tips +- Monitor signed-URL usage where possible (Cloud Storage logs). +- Shorter expiry reduces risk. +- Do not embed long-lived private keys in public repositories. + +When to use +- Temporary downloads for users without Google accounts. +- Browser uploads (PUT) when combined with appropriate CORS settings. +```` + +```text name=LICENSE +MIT License + +Copyright (c) 2025 Sazwanismail + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +... 
(Full MIT text can be added here) +``` + +```json name=cors.json +[ + { + "origin": ["https://example.com"], + "method": ["GET", "HEAD", "PUT", "POST"], + "responseHeader": ["Content-Type", "x-goog-meta-custom"], + "maxAgeSeconds": 3600 + } +] +``` + +```python name=examples/merge_csv_gcs.py +#!/usr/bin/env python3 +""" +examples/merge_csv_gcs.py + +Merge CSV files in a GCS prefix into one CSV while keeping only the first header. + +Requirements: + pip install google-cloud-storage + +Usage: + export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" + python3 examples/merge_csv_gcs.py my-bucket data/prefix/ output/combined.csv +""" +import sys +import csv +from io import StringIO +from google.cloud import storage + +def merge_csvs(bucket_name, prefix, output_blob_name): + client = storage.Client() + bucket = client.bucket(bucket_name) + blobs = sorted([b for b in bucket.list_blobs(prefix=prefix) if b.name.endswith('.csv')]) + + if not blobs: + print("No CSV files found with prefix:", prefix) + return + + out_buf = StringIO() + writer = None + first = True + + for blob in blobs: + print("Reading:", blob.name) + # download_as_text loads the object into memory; for very large files use streaming + data = blob.download_as_text() + reader = csv.reader(StringIO(data)) + header = next(reader, None) + if first: + writer = csv.writer(out_buf) + if header: + writer.writerow(header) + for row in reader: + writer.writerow(row) + first = False + else: + # skip header for subsequent files + for row in reader: + writer.writerow(row) + + out_blob = bucket.blob(output_blob_name) + out_blob.upload_from_string(out_buf.getvalue(), content_type='text/csv') + print(f'Uploaded gs://{bucket_name}/{output_blob_name}') + +if __name__ == "__main__": + if len(sys.argv) != 4: + print("Usage: merge_csv_gcs.py BUCKET PREFIX OUTPUT_BLOB") + print("Example: merge_csv_gcs.py my-bucket data/ output/combined.csv") + sys.exit(1) + merge_csvs(sys.argv[1], sys.argv[2], sys.argv[3]) +``` +````markdown name=README.md +# storage.cloud — Google Cloud Storage docs & examples + +A compact, practical collection of reference notes, copy‑paste commands, and small example scripts for working with Google Cloud Storage (GCS). This README is a concise guide and index for the repository contents and the most common GCS workflows: authenticate, inspect buckets, share objects, configure CORS, merge many objects, and load data to BigQuery. + +Status: Revised — 2025-11-06 +Maintainer: Sazwanismail + +Quick links +- Web UI (requires sign-in): https://storage.cloud.google.com/ +- Cloud Console (Storage browser): https://console.cloud.google.com/storage/browser +- GCS docs: https://cloud.google.com/storage/docs + +Repository layout +- index.html — landing page / site overview +- docs/ + - quickstart.md — essential commands and notes + - merge-data.md — strategies (English + Malay notes) + - signed-urls.md — signed URL reference & tips +- examples/ + - merge_csv_gcs.py — Python script to merge CSVs in a GCS prefix +- cors.json — example CORS policy +- LICENSE — MIT by default + +What this repo is for +- Fast onboarding for GCS tasks (dev & ops). +- Copy‑paste safe commands for local work and quick demos. +- Small example scripts you can adapt for production (with caution). +- Practical patterns for combining many objects (CSV/text) and for ingesting into BigQuery. + +Quickstart (minimum steps) +1. 
Install Cloud SDK (gcloud, gsutil) and Python client (optional): + ```bash + # Cloud SDK: https://cloud.google.com/sdk + pip install --upgrade google-cloud-storage + ``` + +2. Authenticate (developer / local): + ```bash + gcloud auth application-default login + ``` + +3. For server applications, create and use a service account (least privilege): + ```bash + gcloud iam service-accounts create my-sa --display-name="My SA" + + gcloud projects add-iam-policy-binding PROJECT_ID \ + --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/storage.objectViewer" + ``` + + (Optional for local testing) + ```bash + gcloud iam service-accounts keys create key.json \ + --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com + export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" + ``` + +Common commands +- List buckets: + ```bash + gsutil ls gs:// + ``` +- List objects in a prefix: + ```bash + gsutil ls gs://BUCKET/PREFIX/ + ``` +- Download / upload: + ```bash + gsutil cp gs://BUCKET/OBJECT ./local-file + gsutil cp ./local-file gs://BUCKET/OBJECT + ``` +- Get an access token (for HTTP Authorization header): + ```bash + gcloud auth print-access-token + # header: Authorization: Bearer + ``` +- Make an object public (use sparingly; prefer IAM or signed URLs): + ```bash + gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT + ``` + +Sharing & Signed URLs +- Quick: create a signed URL with gsutil using a service account key: + ```bash + gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT + ``` +- Notes: + - V4 signed URLs support up to 7 days expiry. + - Anyone with the URL can access the object while it’s valid — treat like a secret. + - For programmatic signing, use google-cloud-storage or google-auth libraries (see docs/signed-urls.md). + +Merging strategies (pick by dataset size) +- Small / moderate (fits in memory) + ```bash + gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv + ``` + - Simple and fast for small sets. Watch memory/network use. + +- In-place compose (no download; up to 32 objects per compose) + ```bash + gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv + ``` + - For >32 objects, use a tree-compose approach (compose in batches, then compose results). See docs/merge-data.md. + +- Large-scale / analytics + - Load directly to BigQuery (no pre-merge): + ```bash + bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv + ``` + - For heavy transformations or streaming merges, use Dataflow (Apache Beam) or Dataproc (Spark). + +Example: tree-compose helper (pattern) +```bash +# Sketch: group objects into batches of 32, compose each batch to a temp object, +# then compose the temp objects until a single final object remains. +# See docs/merge-data.md for a full script or ask for a ready-made helper. +``` + +Examples included +- examples/merge_csv_gcs.py — Merge CSVs by prefix and keep only the first header. Good starting point for small-to-medium datasets. +- cors.json — CORS policy example for browser uploads/downloads. + +CORS & browser uploads +- Example cors.json: + ```json + [ + { + "origin": ["https://example.com"], + "method": ["GET", "HEAD", "PUT", "POST"], + "responseHeader": ["Content-Type", "x-goog-meta-custom"], + "maxAgeSeconds": 3600 + } + ] + ``` +- Apply: + ```bash + gsutil cors set cors.json gs://BUCKET + ``` + +Security & best practices +- Use service accounts with least privilege; do not use personal accounts for long-running services. 
+- Prefer uniform bucket-level access + IAM roles instead of ACLs. +- Avoid embedding long-lived private keys in clients. Use signed URLs or short-lived tokens for browser access. +- Monitor access with Cloud Audit Logs and consider object versioning and retention. +- If required by policy, use CMEK (customer-managed encryption keys). + +Troubleshooting tips +- Permission denied: verify IAM roles (roles/storage.objectViewer for reads). +- Invalid credentials: re-run `gcloud auth application-default login` or refresh service account keys. +- CORS errors: ensure bucket CORS allows your origin, methods, and response headers. +- Large merges: avoid downloading many large files into memory; prefer compose, streaming, or Dataflow. + +Contributing +- PRs and issues welcome. When adding scripts, include: + - Purpose and example usage + - Required permissions and dependencies + - Safety notes (memory/time limits) +- Keep examples small and focused. + +License +- MIT by default. See LICENSE. + +Need a file or script now? +Tell me which of the following you want and I’ll produce it: +- tree-compose shell helper to safely compose >32 objects +- Dataflow (Apache Beam) starter pipeline for large merges +- A localized Malay README or full-site translation +- A small GitHub Actions workflow to lint/test examples + +Alternatively, provide your bucket name, prefix, file type, and approximate size and I’ll generate a ready-to-run script (bash or Python) tailored to your needs. +````````markdown name=README.md +# storage.cloud — Google Cloud Storage docs & examples (v5) + +A compact, practical collection of reference notes, copy‑paste commands, and small example scripts for working with Google Cloud Storage (GCS). This v5 README is streamlined for quick onboarding and safe reuse — with an included tree‑compose helper pattern for composing >32 objects. + +Status: v5 — 2025-11-06 +Maintainer: Sazwanismail + +Table of contents +- About +- Repository layout +- Quickstart (install, auth) +- Common commands +- Sharing & signed URLs +- Merging strategies (small → large) +- Tree‑compose helper (compose >32 objects) +- CORS & browser uploads +- Examples included +- Security & best practices +- Troubleshooting +- Contributing & license + +About +storage.cloud collects minimal, copy‑pasteable guidance and small scripts you can adapt for development and production. Focus: authenticate, inspect buckets, share objects, configure CORS, merge many objects, and ingest to BigQuery. + +Repository layout +- index.html — landing page +- docs/ + - quickstart.md + - merge-data.md + - signed-urls.md +- examples/ + - merge_csv_gcs.py +- cors.json +- LICENSE + +Quickstart (minimum steps) +1. Install Google Cloud SDK (gcloud, gsutil) and Python client (optional): + ```bash + # Cloud SDK: https://cloud.google.com/sdk + pip install --upgrade google-cloud-storage + ``` + +2. Authenticate (developer / local): + ```bash + gcloud auth application-default login + ``` + +3. 
Service account for servers (least privilege): + ```bash + gcloud iam service-accounts create my-sa --display-name="My SA" + + gcloud projects add-iam-policy-binding PROJECT_ID \ + --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/storage.objectViewer" + ``` + + (Optional for local testing) + ```bash + gcloud iam service-accounts keys create key.json \ + --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com + export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" + ``` + +Common commands +- List buckets: + ```bash + gsutil ls gs:// + ``` +- List objects: + ```bash + gsutil ls gs://BUCKET/PREFIX/ + ``` +- Download / upload: + ```bash + gsutil cp gs://BUCKET/OBJECT ./local-file + gsutil cp ./local-file gs://BUCKET/OBJECT + ``` +- Make object public (use sparingly): + ```bash + gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT + ``` +- Get access token: + ```bash + gcloud auth print-access-token + # HTTP header: Authorization: Bearer + ``` + +Sharing & Signed URLs +- Create a signed URL (gsutil, service account key): + ```bash + gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT + ``` +Notes: +- V4 signed URLs max expiry: 7 days. +- Anyone with the URL can access the object while valid — treat it as a secret. +- For programmatic signing, use google-cloud-storage or google-auth libraries (see docs/signed-urls.md). + +Merging strategies — choose by dataset size +- Small / moderate (fits memory) + ```bash + gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv + ``` + - Fast for small sets. Watch memory & network. + +- In-place compose (no download; up to 32 objects per compose) + ```bash + gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv + ``` + - Compose merges object bytes; ensure objects end with newline if needed and avoid duplicate headers. + +- Large-scale / analytics + - Load directly to BigQuery (no pre-merge): + ```bash + bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv + ``` + - For heavy transforms or streaming merges, use Dataflow (Apache Beam) or Dataproc (Spark). + +Tree‑compose helper — safe pattern for >32 objects +- Problem: gsutil compose accepts at most 32 sources. Use a tree-compose (batch then reduce) approach. +- Sketch helper (bash) — adapt and run in a safe environment. This creates temporary composed objects and composes them until one final object remains. + +```bash +#!/usr/bin/env bash +# tree-compose.sh: Compose many GCS objects into one final object. +# Usage: ./tree-compose.sh BUCKET PREFIX output/final.csv +set -euo pipefail + +BUCKET="$1" # e.g. my-bucket +PREFIX="$2" # e.g. data/prefix/ +FINAL_OBJ="$3" # e.g. 
output/final.csv +TMP_PREFIX="tmp/compose-$(date +%s)" +BATCH_SIZE=32 + +# list CSVs under prefix +mapfile -t objects < <(gsutil ls "gs://${BUCKET}/${PREFIX}" | grep -E '\.csv$' || true) +if [ "${#objects[@]}" -eq 0 ]; then + echo "No objects found under gs://${BUCKET}/${PREFIX}" + exit 1 +fi + +# create batches of up to 32 and compose each to a temp object +temp_objects=() +i=0 +while [ $i -lt "${#objects[@]}" ]; do + batch=( "${objects[@]:$i:$BATCH_SIZE}" ) + idx=$((i / BATCH_SIZE)) + out="gs://${BUCKET}/${TMP_PREFIX}/part-${idx}.csv" + echo "Composing batch $idx -> $out" + gsutil compose "${batch[@]}" "$out" + temp_objects+=("$out") + i=$((i + BATCH_SIZE)) +done + +# reduce: compose temp objects repeatedly until one remains +while [ "${#temp_objects[@]}" -gt 1 ]; do + new_temp=() + i=0 + while [ $i -lt "${#temp_objects[@]}" ]; do + batch=( "${temp_objects[@]:$i:$BATCH_SIZE}" ) + idx=$((i / BATCH_SIZE)) + out="gs://${BUCKET}/${TMP_PREFIX}/reduce-${idx}.csv" + echo "Composing reduce batch $idx -> $out" + gsutil compose "${batch[@]}" "$out" + new_temp+=("$out") + i=$((i + BATCH_SIZE)) + done + temp_objects=( "${new_temp[@]}" ) +done + +# final rename (copy) to desired location +echo "Final object: ${temp_objects[0]} -> gs://${BUCKET}/${FINAL_OBJ}" +gsutil cp "${temp_objects[0]}" "gs://${BUCKET}/${FINAL_OBJ}" + +# optional: cleanup temp objects +echo "Cleaning up temporary objects under gs://${BUCKET}/${TMP_PREFIX}/" +gsutil -m rm -r "gs://${BUCKET}/${TMP_PREFIX}/" +echo "Done." +``` + +Notes: +- This sketch assumes objects are ready to be concatenated (newline handling, headers). If files contain headers, either remove headers before composing or use a script to write header once (see examples/merge_csv_gcs.py). +- Test on a small subset before running on production data. + +CORS & browser uploads +- Example cors.json (in repo). Apply: + ```bash + gsutil cors set cors.json gs://BUCKET + ``` +- For browser uploads with signed URLs, ensure CORS allows the upload origin and required headers/methods. + +Examples included +- examples/merge_csv_gcs.py — merge CSVs by prefix while keeping only the first header (for small/medium sets). +- cors.json — example CORS policy. + +Security & best practices +- Use service accounts with least privilege. +- Prefer uniform bucket-level access + IAM roles over ACLs. +- Avoid embedding long-lived keys in client-side code; use signed URLs for browser access. +- Monitor with Cloud Audit Logs; enable object versioning and retention as needed. +- Consider CMEK if required. + +Troubleshooting +- Permission denied: confirm IAM role (roles/storage.objectViewer for reads). +- Invalid credentials: re-run `gcloud auth application-default login` or refresh service account keys. +- CORS issues: ensure bucket CORS includes your origin, methods, headers. +- Performance: avoid downloading large numbers of files into memory; prefer compose, streaming, or Dataflow. + +Contributing +- PRs and issues welcome. When adding scripts, include: + - Purpose, usage, permissions, and dependencies. + - Safety notes (memory/time limits). +- Keep examples minimal and testable. + +License +- MIT by default. See LICENSE. + +Need a ready-to-run script or pipeline? +Tell me which you want: +- A fully-tested tree-compose script (I can produce a version with additional safety checks and header handling). +- A Dataflow (Apache Beam) starter pipeline to merge and transform files at scale. +- A Malay-localized README and docs. 
+Provide bucket name, prefix and approximate size and I’ll generate tailored code. +```` From 76619e682691a8064105ee3f214385ebdab78f79 Mon Sep 17 00:00:00 2001 From: Muhamad Sazwan Bin Ismail Date: Thu, 6 Nov 2025 22:58:39 +0800 Subject: [PATCH 3/4] Create Workflow planning with Ai MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # AI Workflow Planning System ![AI Planning](https://img.shields.io/badge/AI-Powered_Workflow_Planning-FF6B6B.svg) ![Multi-Tool](https://img.shields.io/badge/Multi--Tool_Integration-✓-2496ED.svg) ![Automation](https://img.shields.io/badge/Automation-✓-00C9FF.svg) An intelligent workflow planning system that uses AI to analyze project requirements and generate optimized automation workflows across multiple development tools. ## 🚀 Quick Start ### Installation ```bash # Clone the repository git clone https://github.com/your-org/ai-workflow-planner.git cd ai-workflow-planner # Install dependencies pip install -r requirements.txt # Setup environment cp .env.example .env # Add your API keys to .env # Start the system python -m planner.main ``` ### Basic Usage ```python from ai_workflow_planner import WorkflowPlanner # Initialize the planner planner = WorkflowPlanner(api_key="your-openai-key") # Plan a workflow for iOS project workflow = await planner.plan_workflow( project_type="ios", tools=["xcode", "github_actions", "docker"], requirements="CI/CD with testing and deployment" ) # Generate configurations configs = workflow.generate_configs() ``` ## 🏗️ System Architecture ```mermaid graph TB A[Project Input] --> B[AI Analyzer] B --> C[Workflow Generator] C --> D[Tool Integrations] D --> E[GitHub Actions] D --> F[Xcode Build] D --> G[Docker] D --> H[Slack] E --> I[Configuration Files] F --> I G --> I H --> I I --> J[Execution Engine] J --> K[Monitoring] K --> B ``` ## 🧠 Core AI Planning Engine ### 1. Project Analysis ```python # planners/project_analyzer.py class ProjectAnalyzer: def __init__(self, ai_client): self.ai_client = ai_client self.project_templates = ProjectTemplates() async def analyze_project(self, project_path: str) -> ProjectAnalysis: """Analyze project structure and requirements using AI""" # Scan project files project_structure = await self.scan_project_structure(project_path) # AI analysis prompt prompt = f""" Analyze this project structure and determine optimal workflow: Project Structure: {project_structure} Please provide: 1. Recommended workflow stages 2. Required tools and integrations 3. Potential optimizations 4. Security considerations """ analysis = await self.ai_client.analyze(prompt) return self.parse_analysis(analysis) ``` ### 2. Workflow Generation ```python # generators/workflow_generator.py class WorkflowGenerator: def __init__(self, analyzer: ProjectAnalyzer): self.analyzer = analyzer self.tool_registry = ToolRegistry() async def generate_workflow(self, project_config: dict) -> WorkflowPlan: """Generate complete workflow plan using AI""" # Get AI recommendations recommendations = await self.analyzer.get_recommendations(project_config) # Build workflow stages stages = [] for stage_config in recommendations['stages']: stage = await self.build_stage(stage_config) stages.append(stage) # Optimize execution order optimized_stages = self.optimize_execution_order(stages) return WorkflowPlan( stages=optimized_stages, tools=recommendations['tools'], configs=await self.generate_configs(optimized_stages) ) ``` ### 3. 
Tool-Specific Configuration ```python # integrations/github_actions.py class GitHubActionsIntegration: async def generate_workflow(self, plan: WorkflowPlan) -> str: """Generate GitHub Actions workflow from AI plan""" workflow = { "name": f"{plan.project_name} - AI Generated", "on": self.get_triggers(plan), "jobs": await self.generate_jobs(plan) } return yaml.dump(workflow) async def generate_jobs(self, plan: WorkflowPlan) -> dict: """Generate jobs based on AI-optimized plan""" jobs = {} for stage in plan.stages: jobs[stage.name] = { "runs-on": self.select_runner(stage), "steps": await self.generate_steps(stage), "needs": stage.dependencies } return jobs ``` ## ⚙️ Configuration ### AI Model Configuration ```yaml # config/ai_models.yaml openai: model: "gpt-4" temperature: 0.3 max_tokens: 4000 retry_attempts: 3 anthropic: model: "claude-3-sonnet" max_tokens: 4000 local: model: "llama2" endpoint: "http://localhost:8080" ``` ### Tool Registry ```yaml # config/tools.yaml github_actions: name: "GitHub Actions" type: "ci_cd" capabilities: - "build" - "test" - "deploy" templates: - "basic-ci" - "advanced-cd" xcode: name: "Xcode Build" type: "build_tool" capabilities: - "compile" - "test" - "analyze" commands: - "xcodebuild" - "xcodebuild analyze" docker: name: "Docker" type: "containerization" capabilities: - "build" - "push" - "scan" ``` ## 🎯 Usage Examples ### Example 1: iOS CI/CD Pipeline ```python # examples/ios_workflow.py async def create_ios_workflow(): """Create AI-optimized iOS workflow""" planner = WorkflowPlanner() workflow = await planner.plan_workflow( project_type="ios", tools=["xcode", "github_actions", "fastlane", "slack"], requirements=""" - Automated testing on PR - Build and analyze on main branch - Deploy to TestFlight on tags - Security scanning - Performance monitoring """ ) # Generate configurations configs = await workflow.generate_configs() # Save to files await configs.save("./generated-workflows/") return workflow ``` ### Example 2: Multi-Service Container Platform ```python # examples/container_platform.py async def create_container_workflow(): """Create workflow for container-based platform""" planner = WorkflowPlanner() workflow = await planner.plan_workflow( project_type="microservices", tools=["docker", "kubernetes", "github_actions", "argo_cd"], requirements=""" - Build and push containers on commit - Security scanning and vulnerability checks - Automated deployment to staging - Canary deployment to production - Rollback on failure - Multi-region deployment """ ) return workflow ``` ## 🔧 Tool Integrations ### GitHub Actions Generator ```python # integrations/github_actions.py class GitHubActionsGenerator: async def generate_workflow_file(self, plan: WorkflowPlan) -> str: """Generate complete GitHub Actions workflow file""" template = { "name": f"AI-Generated: {plan.project_name}", "on": self._get_trigger_config(plan), "env": self._get_environment_vars(plan), "jobs": await self._generate_jobs(plan) } return yaml.dump(template) async def _generate_jobs(self, plan: WorkflowPlan) -> Dict: jobs = {} for stage in plan.stages: jobs[stage.name] = { "name": stage.description, "runs-on": self._select_runner(stage), "if": self._get_condition(stage), "steps": await self._generate_steps(stage), "needs": self._get_dependencies(stage) } return jobs async def _generate_steps(self, stage: WorkflowStage) -> List[Dict]: steps = [] for action in stage.actions: step = { "name": action.description, "uses": action.tool_specific.get('action'), "with": action.parameters } steps.append(step) 
return steps ``` ### Xcode Build Integration ```python # integrations/xcode_build.py class XcodeBuildIntegration: async def generate_build_scripts(self, plan: WorkflowPlan) -> List[str]: """Generate optimized Xcode build scripts""" scripts = [] for build_step in plan.get_steps_by_type('xcode_build'): script = self._generate_build_script(build_step) scripts.append(script) return scripts def _generate_build_script(self, step: WorkflowStep) -> str: return f""" # AI-Generated Xcode Build Script set -eo pipefail echo "🚀 Starting AI-optimized build..." # Clean derived data rm -rf ~/Library/Developer/Xcode/DerivedData/* # Build project xcodebuild \ -workspace {step.parameters.get('workspace')} \ -scheme {step.parameters.get('scheme')} \ -configuration {step.parameters.get('configuration', 'Debug')} \ -destination 'platform=iOS Simulator,name=iPhone 15' \ clean build # Run tests if specified if {step.parameters.get('run_tests', False)}; then xcodebuild test \ -workspace {step.parameters.get('workspace')} \ -scheme {step.parameters.get('scheme')} \ -destination 'platform=iOS Simulator,name=iPhone 15' fi echo "✅ Build completed successfully" """ ``` ## 📊 AI-Powered Optimization ### Performance Optimization ```python # optimizers/performance_optimizer.py class PerformanceOptimizer: def __init__(self, ai_client): self.ai_client = ai_client self.metrics_collector = MetricsCollector() async def optimize_workflow(self, workflow: WorkflowPlan) -> WorkflowPlan: """Use AI to optimize workflow performance""" # Collect performance data metrics = await self.metrics_collector.collect(workflow) # AI optimization prompt prompt = f""" Optimize this workflow for better performance: Current Workflow: {workflow.to_json()} Performance Metrics: {metrics} Please suggest optimizations for: 1. Parallel execution opportunities 2. Caching strategies 3. Resource allocation 4. Dependency optimization Return optimized workflow in JSON format. """ optimized_json = await self.ai_client.optimize(prompt) return WorkflowPlan.from_json(optimized_json) ``` ### Cost Optimization ```python # optimizers/cost_optimizer.py class CostOptimizer: async def optimize_costs(self, workflow: WorkflowPlan) -> WorkflowPlan: """Optimize workflow for cost efficiency""" cost_analysis = await self.analyze_costs(workflow) prompt = f""" Optimize this workflow to reduce costs: Workflow: {workflow.to_json()} Cost Analysis: {cost_analysis} Suggest cost-saving changes while maintaining: - Functionality - Performance - Reliability Focus on: - Compute resource optimization - Storage efficiency - Network usage reduction """ return await self.apply_optimizations(workflow, prompt) ``` ## 🔄 Real-Time Adaptation ### Dynamic Workflow Adjustment ```python # adapters/dynamic_adapter.py class DynamicWorkflowAdapter: async def adapt_to_changes(self, workflow: WorkflowPlan, changes: dict) -> WorkflowPlan: """Adapt workflow based on real-time changes""" prompt = f""" Adapt this workflow to handle these changes: Original Workflow: {workflow.to_json()} Changes Detected: - {changes.get('description', 'Unknown changes')} Please provide an adapted workflow that: 1. Maintains all original functionality 2. Handles the new requirements/changes 3. 
Maintains or improves performance """ adapted_workflow = await self.ai_client.adapt(prompt) return WorkflowPlan.from_json(adapted_workflow) ``` ## 📈 Monitoring & Analytics ### Workflow Analytics ```python # analytics/workflow_analytics.py class WorkflowAnalytics: def __init__(self): self.metrics_store = MetricsStore() async def generate_insights(self, workflow: WorkflowPlan) -> dict: """Generate AI-powered insights about workflow performance""" metrics = await self.metrics_store.get_workflow_metrics(workflow.id) prompt = f""" Analyze these workflow metrics and provide insights: Metrics: {metrics} Please provide: 1. Performance bottlenecks 2. Optimization opportunities 3. Reliability concerns 4. Cost-saving suggestions """ insights = await self.ai_client.analyze(prompt) return self.parse_insights(insights) ``` ## 🚀 Advanced Features ### Multi-Project Coordination ```python # features/multi_project.py class MultiProjectCoordinator: async def coordinate_workflows(self, projects: List[Project]) -> dict: """Coordinate workflows across multiple related projects""" project_configs = [p.to_json() for p in projects] prompt = f""" Coordinate workflows for these related projects: Projects: {project_configs} Create a coordinated workflow plan that: 1. Handles dependencies between projects 2. Optimizes build order 3. Manages shared resources 4. Coordinates deployments """ coordination_plan = await self.ai_client.coordinate(prompt) return coordination_plan ``` ### Security Hardening ```python # features/security_hardener.py class WorkflowSecurityHardener: async def harden_workflow(self, workflow: WorkflowPlan) -> WorkflowPlan: """Use AI to identify and fix security issues""" prompt = f""" Analyze this workflow for security issues: Workflow: {workflow.to_json()} Identify: 1. Potential security vulnerabilities 2. Secrets exposure risks 3. Permission issues 4. Compliance concerns Provide a hardened version of the workflow. """ hardened_workflow = await self.ai_client.harden(prompt) return WorkflowPlan.from_json(hardened_workflow) ``` ## 💡 Example Output ### Generated GitHub Actions Workflow ```yaml # .github/workflows/ai-optimized-ci.yml name: AI-Optimized iOS CI/CD on: push: branches: [ main, develop ] pull_request: branches: [ main ] env: PROJECT_NAME: MyApp SCHEME_NAME: MyApp jobs: analyze: name: Static Analysis runs-on: macos-latest steps: - uses: actions/checkout@v4 - name: Xcode Analyze run: | xcodebuild analyze \ -workspace $PROJECT_NAME.xcworkspace \ -scheme $SCHEME_NAME \ -configuration Debug test: name: Run Tests runs-on: macos-latest needs: analyze steps: - uses: actions/checkout@v4 - name: Run Unit Tests run: | xcodebuild test \ -workspace $PROJECT_NAME.xcworkspace \ -scheme $SCHEME_NAME \ -destination 'platform=iOS Simulator,name=iPhone 15' build: name: Build App runs-on: macos-latest needs: test steps: - uses: actions/checkout@v4 - name: Build Release run: | xcodebuild build \ -workspace $PROJECT_NAME.xcworkspace \ -scheme $SCHEME_NAME \ -configuration Release ``` ## 🔧 Deployment ### Docker Setup ```dockerfile FROM python:3.9-slim WORKDIR /app COPY requirements.txt . RUN pip install -r requirements.txt COPY . . EXPOSE 8000 CMD ["python", "-m", "planner.api"] ``` ### Kubernetes Deployment ```yaml apiVersion: apps/v1 kind: Deployment metadata: name: ai-workflow-planner spec: replicas: 3 template: spec: containers: - name: planner image: ai-workflow-planner:latest ports: - containerPort: 8000 env: - name: OPENAI_API_KEY valueFrom: secretKeyRef: name: ai-secrets key: openai-api-key ``` ---
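One detail worth flagging in the Kubernetes manifest above: a Deployment also requires a `spec.selector` whose labels match the pod template, which the snippet omits. A minimal corrected sketch (the `app: ai-workflow-planner` label is an assumption, not taken from the repo):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-workflow-planner
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-workflow-planner      # must match the pod template labels below
  template:
    metadata:
      labels:
        app: ai-workflow-planner
    spec:
      containers:
        - name: planner
          image: ai-workflow-planner:latest
          ports:
            - containerPort: 8000
          env:
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: ai-secrets
                  key: openai-api-key
```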
## 🧠 Start Planning with AI

[**Documentation**](docs/) • [**Examples**](examples/) • [**API Reference**](docs/api.md)

**AI-Powered Workflow Planning | Multi-Tool Integration | Real-Time Optimization**
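The `WorkflowPlan` object that the generators and integrations above consume is referenced throughout but never spelled out. Purely for orientation, here is one plausible serialized shape, with field names assumed from the snippets above and stage names borrowed from the example GitHub Actions output:

```yaml
# Illustrative plan shape only; field names are assumptions, not part of the codebase
project_name: MyApp
tools: [xcode, github_actions, docker, slack]
stages:
  - name: analyze
    description: Static Analysis
    dependencies: []
    actions:
      - description: Xcode Analyze
        tool: xcode
  - name: test
    description: Run Tests
    dependencies: [analyze]
    actions:
      - description: Run Unit Tests
        tool: xcode
  - name: build
    description: Build App
    dependencies: [test]
    actions:
      - description: Build Release
        tool: xcode
```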
--- Workflow planning with Ai | 622 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 622 insertions(+) create mode 100644 Workflow planning with Ai diff --git a/Workflow planning with Ai b/Workflow planning with Ai new file mode 100644 index 0000000..4967646 --- /dev/null +++ b/Workflow planning with Ai @@ -0,0 +1,622 @@ +# AI-Powered Workflow Planning System + +![AI Planning](https://img.shields.io/badge/AI--Powered-Workflow_Planning-FF6B6B.svg) +![Automation](https://img.shields.io/badge/Automation-✓-00C9FF.svg) +![Integration](https://img.shields.io/badge/Multi--Tool_Integration-✓-45B7D1.svg) + +An intelligent workflow planning system that leverages AI to automate, optimize, and manage complex development workflows across multiple tools and platforms. + +## 🧠 AI Planning Architecture + +```mermaid +graph TB + A[User Input] --> B[AI Planner] + B --> C[Workflow Generator] + B --> D[Dependency Resolver] + B --> E[Optimization Engine] + + C --> F[Tool Integrations] + D --> G[Dependency Graph] + E --> H[Performance Optimizer] + + F --> I[GitHub Actions] + F --> J[Jenkins] + F --> K[GitLab CI] + F --> L[Docker Registry] + F --> M[Xcode Build] + + H --> N[Optimized Workflow] + G --> N + + N --> O[Execution Engine] + O --> P[Monitoring & Feedback] + P --> B +``` + +## 🚀 Quick Start + +### Prerequisites +- Python 3.9+ +- OpenAI API key or local LLM +- Docker (optional) +- Git + +### Installation + +```bash +# Clone the repository +git clone https://github.com/your-org/ai-workflow-planner.git +cd ai-workflow-planner + +# Install dependencies +pip install -r requirements.txt + +# Setup environment +cp .env.example .env +# Add your API keys to .env + +# Start the planning system +python -m planner.main +``` + +### Basic Usage + +```bash +# Interactive planning session +python -m planner.cli --project-type "ios" --tools "xcode,github" + +# Batch planning from config +python -m planner.batch --config workflows/ios-ci.yaml + +# API server mode +python -m planner.api --host 0.0.0.0 --port 8000 +``` + +## 🏗️ Core Components + +### 1. AI Planning Engine + +```python +# planners/ai_planner.py +class AIPlanner: + def __init__(self, model="gpt-4", temperature=0.3): + self.model = model + self.temperature = temperature + self.workflow_memory = WorkflowMemory() + + async def plan_workflow(self, requirements: ProjectRequirements) -> WorkflowPlan: + """Generate optimized workflow using AI""" + prompt = self._build_planning_prompt(requirements) + response = await self._call_ai(prompt) + return self._parse_workflow_response(response) + + def _build_planning_prompt(self, requirements: ProjectRequirements) -> str: + return f""" + Project Requirements: + - Type: {requirements.project_type} + - Tools: {', '.join(requirements.tools)} + - Team Size: {requirements.team_size} + - Complexity: {requirements.complexity} + + Generate an optimized workflow that: + 1. Integrates all specified tools + 2. Minimizes execution time + 3. Ensures proper dependency ordering + 4. Includes error handling + 5. Provides monitoring and feedback + + Output in YAML format with the following structure: + {WORKFLOW_SCHEMA} + """ +``` + +### 2. 
Workflow Generator + +```python +# generators/workflow_generator.py +class WorkflowGenerator: + def __init__(self, planner: AIPlanner): + self.planner = planner + self.tool_integrations = ToolRegistry() + + async def generate_workflow(self, project_config: dict) -> GeneratedWorkflow: + """Generate complete workflow configuration""" + + # AI-powered planning phase + plan = await self.planner.plan_workflow(project_config) + + # Tool-specific configuration generation + workflow_configs = {} + for tool in plan.required_tools: + generator = self.tool_integrations.get_generator(tool) + workflow_configs[tool] = await generator.generate(plan) + + return GeneratedWorkflow( + plan=plan, + configurations=workflow_configs, + dependencies=plan.dependencies + ) +``` + +### 3. Dependency Resolver + +```python +# resolvers/dependency_resolver.py +class DependencyResolver: + def __init__(self): + self.dependency_graph = DependencyGraph() + + def resolve_dependencies(self, workflow_plan: WorkflowPlan) -> ExecutionOrder: + """Resolve and optimize execution order""" + graph = self._build_dependency_graph(workflow_plan) + execution_order = self._topological_sort(graph) + return self._optimize_parallel_execution(execution_order) + + def _build_dependency_graph(self, plan: WorkflowPlan) -> Dict[str, List[str]]: + """Build dependency graph from AI-generated plan""" + graph = {} + for step in plan.steps: + graph[step.name] = step.dependencies + return graph +``` + +## ⚙️ Configuration + +### AI Planning Configuration + +```yaml +# config/ai_planner.yaml +ai: + model: "gpt-4" + temperature: 0.3 + max_tokens: 4000 + retry_attempts: 3 + +planning: + optimization_goals: + - "execution_time" + - "resource_usage" + - "cost_efficiency" + - "reliability" + + constraints: + max_parallel_jobs: 10 + timeout_minutes: 60 + resource_limits: + memory: "8GB" + cpu: "4 cores" + +tool_integrations: + github_actions: + enabled: true + templates_path: "./templates/github" + + jenkins: + enabled: true + templates_path: "./templates/jenkins" + + docker: + enabled: true + registry: "registry.company.com" +``` + +### Project Templates + +```yaml +# templates/ios_project.yaml +project_type: "ios" +default_tools: + - "xcode" + - "github_actions" + - "docker" + - "slack" + +stages: + analysis: + tools: ["xcode_analyze", "swiftlint"] + parallel: false + + build: + tools: ["xcode_build", "carthage", "cocoapods"] + parallel: true + + test: + tools: ["xcode_test", "fastlane_scan"] + parallel: false + + distribution: + tools: ["fastlane", "testflight", "app_center"] + +optimization_rules: + - name: "cache_dependencies" + condition: "dependencies_changed == false" + action: "skip_dependency_installation" + + - name: "parallel_tests" + condition: "test_count > 100" + action: "split_tests_parallel" +``` + +## 🔧 Tool Integrations + +### GitHub Actions Integration + +```python +# integrations/github_actions.py +class GitHubActionsIntegration: + async def generate_workflow(self, plan: WorkflowPlan) -> str: + """Generate GitHub Actions workflow from AI plan""" + + workflow = { + "name": f"{plan.project_name} - AI Generated", + "on": self._get_trigger_events(plan), + "jobs": await self._generate_jobs(plan) + } + + return yaml.dump(workflow) + + async def _generate_jobs(self, plan: WorkflowPlan) -> Dict: + jobs = {} + for step in plan.execution_order: + jobs[step.name] = { + "runs-on": self._select_runner(step), + "steps": await self._generate_steps(step), + "needs": step.dependencies + } + return jobs +``` + +### Xcode Build Integration + +```python +# 
integrations/xcode_build.py +class XcodeBuildIntegration: + async def generate_build_scripts(self, plan: WorkflowPlan) -> List[str]: + """Generate optimized Xcode build scripts""" + + scripts = [] + for build_step in plan.get_steps_by_type("xcode_build"): + script = f""" + # AI-Generated Build Script + set -eo pipefail + + # Dependency checks + {self._generate_dependency_checks(build_step)} + + # Build configuration + {self._generate_build_commands(build_step)} + + # Post-build validation + {self._generate_validation_commands(build_step)} + """ + scripts.append(script) + + return scripts +``` + +## 🎯 Usage Examples + +### iOS Project Workflow Planning + +```python +# examples/ios_workflow.py +async def plan_ios_workflow(): + """Example of AI planning for iOS project""" + + requirements = ProjectRequirements( + project_type="ios", + tools=["xcode", "github_actions", "fastlane", "docker"], + team_size=5, + complexity="medium", + constraints={ + "build_time": "under_15_minutes", + "test_coverage": "minimum_80_percent", + "security_scanning": "required" + } + ) + + planner = AIPlanner() + workflow = await planner.plan_workflow(requirements) + + # Generate configurations + generator = WorkflowGenerator(planner) + full_workflow = await generator.generate_workflow(workflow) + + # Save generated workflows + await full_workflow.save("generated_workflows/") + + return full_workflow +``` + +### Multi-Tool Integration + +```python +# examples/multi_tool_integration.py +async def create_cross_platform_workflow(): + """Workflow spanning multiple tools and platforms""" + + requirements = ProjectRequirements( + project_type="cross_platform", + tools=["github_actions", "jenkins", "docker", "slack"], + integration_points={ + "github_actions": "ci_trigger", + "jenkins": "deployment", + "docker": "containerization", + "slack": "notifications" + } + ) + + planner = AIPlanner() + plan = await planner.plan_workflow(requirements) + + # Generate tool-specific configurations + workflows = {} + for tool in requirements.tools: + integration = ToolIntegrationFactory.create(tool) + workflows[tool] = await integration.generate_config(plan) + + return workflows +``` + +## 🔄 AI Feedback Loop + +### Learning from Execution + +```python +# learning/execution_analyzer.py +class ExecutionAnalyzer: + def __init__(self, planner: AIPlanner): + self.planner = planner + self.performance_metrics = PerformanceMetrics() + + async def analyze_execution(self, workflow_execution: WorkflowExecution): + """Analyze workflow execution and provide feedback to AI""" + + metrics = await self._collect_metrics(workflow_execution) + improvements = await self._identify_improvements(metrics) + + # Update AI planner with learnings + await self.planner.incorporate_feedback( + workflow_execution.plan, + metrics, + improvements + ) + + async def _identify_improvements(self, metrics: ExecutionMetrics) -> List[Improvement]: + """Use AI to identify workflow improvements""" + + prompt = f""" + Analyze these workflow execution metrics: + {metrics.to_json()} + + Identify 3-5 specific improvements to: + 1. Reduce execution time + 2. Improve reliability + 3. Optimize resource usage + + Provide concrete suggestions. 
+ """ + + response = await self.planner._call_ai(prompt) + return self._parse_improvements(response) +``` + +## 📊 Monitoring & Analytics + +### Workflow Analytics + +```python +# analytics/workflow_analytics.py +class WorkflowAnalytics: + def __init__(self): + self.metrics_store = MetricsStore() + + async def track_workflow_performance(self, workflow_id: str): + """Track and analyze workflow performance""" + + metrics = await self.metrics_store.get_workflow_metrics(workflow_id) + + analysis = { + "execution_time": self._analyze_execution_time(metrics), + "resource_usage": self._analyze_resource_usage(metrics), + "reliability": self._analyze_reliability(metrics), + "bottlenecks": await self._identify_bottlenecks(metrics) + } + + return analysis + + async def generate_optimization_recommendations(self, analysis: dict): + """Generate AI-powered optimization recommendations""" + + prompt = f""" + Based on this workflow analysis: + {analysis} + + Generate specific, actionable recommendations to optimize this workflow. + Focus on: + - Parallelization opportunities + - Resource allocation + - Dependency optimization + - Cache utilization + """ + + return await self._call_ai(prompt) +``` + +## 🚀 Advanced Features + +### Dynamic Workflow Adaptation + +```python +# features/dynamic_adaptation.py +class DynamicWorkflowAdapter: + async def adapt_workflow(self, original_plan: WorkflowPlan, + changing_conditions: dict) -> WorkflowPlan: + """Dynamically adapt workflow based on changing conditions""" + + prompt = f""" + Original workflow plan: + {original_plan.to_json()} + + Changing conditions: + {changing_conditions} + + Adapt the workflow to handle these changes while maintaining: + - Functionality + - Performance + - Reliability + + Provide the adapted workflow plan. + """ + + adapted_plan = await self.planner._call_ai(prompt) + return WorkflowPlan.from_json(adapted_plan) +``` + +### Multi-Agent Planning + +```python +# features/multi_agent_planner.py +class MultiAgentPlanner: + def __init__(self): + self.specialized_agents = { + "architecture": ArchitectureAgent(), + "security": SecurityAgent(), + "performance": PerformanceAgent(), + "cost": CostOptimizationAgent() + } + + async def collaborative_planning(self, requirements: ProjectRequirements): + """Use multiple specialized AI agents for planning""" + + # Parallel planning by specialists + agent_tasks = [] + for agent_name, agent in self.specialized_agents.items(): + task = agent.generate_recommendations(requirements) + agent_tasks.append(task) + + recommendations = await asyncio.gather(*agent_tasks) + + # Consolidate recommendations + consolidated_plan = await self._consolidate_recommendations( + requirements, recommendations + ) + + return consolidated_plan +``` + +## 🔧 Deployment & Operations + +### Docker Deployment + +```dockerfile +# Dockerfile +FROM python:3.9-slim + +WORKDIR /app + +# Install dependencies +COPY requirements.txt . +RUN pip install -r requirements.txt + +# Copy application +COPY . . 
+ +# Create volume for workflow storage +VOLUME /app/generated_workflows + +# Expose API port +EXPOSE 8000 + +# Health check +HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \ + CMD curl -f http://localhost:8000/health || exit 1 + +CMD ["python", "-m", "planner.api", "--host", "0.0.0.0", "--port", "8000"] +``` + +### Kubernetes Deployment + +```yaml +# k8s/deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: ai-workflow-planner +spec: + replicas: 3 + selector: + matchLabels: + app: workflow-planner + template: + metadata: + labels: + app: workflow-planner + spec: + containers: + - name: planner + image: your-org/ai-workflow-planner:latest + ports: + - containerPort: 8000 + env: + - name: OPENAI_API_KEY + valueFrom: + secretKeyRef: + name: api-secrets + key: openai-api-key + resources: + requests: + memory: "1Gi" + cpu: "500m" + limits: + memory: "2Gi" + cpu: "1000m" +``` + +## 📈 Performance Optimization + +### Caching and Optimization + +```python +# optimization/planning_cache.py +class PlanningCache: + def __init__(self): + self.cache = {} + self.similarity_engine = SimilarityEngine() + + async def get_cached_plan(self, requirements: ProjectRequirements) -> Optional[WorkflowPlan]: + """Get cached plan for similar requirements""" + + similar_key = await self.similarity_engine.find_similar(requirements) + if similar_key in self.cache: + cached_plan = self.cache[similar_key] + if await self._is_plan_valid(cached_plan, requirements): + return cached_plan + + return None + + async def cache_plan(self, requirements: ProjectRequirements, plan: WorkflowPlan): + """Cache generated plan for future use""" + + cache_key = self._generate_cache_key(requirements) + self.cache[cache_key] = plan + + # Also cache similar variations + await self._cache_similar_variations(requirements, plan) +``` + +--- + +
+
+## 🧠 Start Planning with AI
+
+[**Quick Start Guide**](docs/quickstart.md) •
+[**API Documentation**](docs/api.md) •
+[**Examples Gallery**](docs/examples.md)
+
+**Intelligent Workflow Planning | Multi-Tool Integration | Continuous Optimization**
+
+*Transform your development workflows with AI-powered planning*
+
From cf7593a4c1222abe08172caa2736a62c8e62e923 Mon Sep 17 00:00:00 2001 From: Muhamad Sazwan Bin Ismail Date: Thu, 6 Nov 2025 23:16:40 +0800 Subject: [PATCH 4/4] Update README.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit # Node.js GKE Deployment Guide ![Google Cloud](https://img.shields.io/badge/Google_Cloud-4285F4?style=for-the-badge&logo=google-cloud&logoColor=white) ![Kubernetes](https://img.shields.io/badge/kubernetes-326CE5?style=for-the-badge&logo=kubernetes&logoColor=white) ![Node.js](https://img.shields.io/badge/Node.js-339933?style=for-the-badge&logo=nodedotjs&logoColor=white) ![Docker](https://img.shields.io/badge/Docker-2CA5E0?style=for-the-badge&logo=docker&logoColor=white) A complete guide for deploying production-ready Node.js applications to Google Kubernetes Engine (GKE) with best practices for security, scalability, and monitoring. ## 📋 Table of Contents - [Overview](#overview) - [Architecture](#architecture) - [Quick Start](#quick-start) - [Prerequisites](#prerequisites) - [Project Structure](#project-structure) - [Local Development](#local-development) - [GKE Deployment](#gke-deployment) - [Monitoring & Scaling](#monitoring--scaling) - [CI/CD Pipeline](#cicd-pipeline) - [Troubleshooting](#troubleshooting) - [Cleanup](#cleanup) - [Best Practices](#best-practices) ## 🎯 Overview This project demonstrates how to deploy a production-ready Node.js application to Google Kubernetes Engine with: - ✅ **Security Best Practices** (non-root users, security contexts, minimal images) - ✅ **Health Checks** (liveness, readiness, startup probes) - ✅ **Auto-scaling** (Horizontal Pod Autoscaler) - ✅ **Monitoring** (Stackdriver, resource metrics) - ✅ **CI/CD** (Cloud Build automation) - ✅ **High Availability** (multi-replica deployment) - ✅ **Zero-downtime Deployments** (rolling updates) ## 🏗 Architecture ```mermaid graph TB A[User] --> B[GCP Load Balancer] B --> C[Node.js Service] C --> D[Pod 1] C --> E[Pod 2] C --> F[Pod 3] D --> G[Node.js App] E --> G F --> G H[HPA] --> C I[Cloud Build] --> J[Container Registry] J --> C K[Cloud Monitoring] --> C ``` ## 🚀 Quick Start ### Prerequisites Checklist - [ ] Google Cloud Account with billing enabled - [ ] Google Cloud SDK installed - [ ] Docker installed - [ ] kubectl installed - [ ] Node.js 18+ installed ### One-Command Deployment ```bash # Clone the repository git clone https://github.com/your-username/nodejs-gke-app.git cd nodejs-gke-app # Run the deployment script (update PROJECT_ID first) ./deploy.sh ``` ## ⚙️ Prerequisites ### 1. Install Required Tools ```bash # Install Google Cloud SDK curl https://sdk.cloud.google.com | bash exec -l $SHELL # Install kubectl gcloud components install kubectl # Install Docker # On macOS: brew install --cask docker # On Ubuntu: sudo apt-get update && sudo apt-get install -y docker.io # Verify installations gcloud --version kubectl version --client docker --version ``` ### 2. 
Google Cloud Setup ```bash # Authenticate with GCP gcloud auth login # Set your project gcloud config set project YOUR_PROJECT_ID # Enable required APIs gcloud services enable \ container.googleapis.com \ containerregistry.googleapis.com \ cloudbuild.googleapis.com \ compute.googleapis.com ``` ## 📁 Project Structure ``` nodejs-gke-app/ ├── src/ # Application source code │ ├── app.js # Main application file │ ├── routes/ # API routes │ │ ├── api.js │ │ └── health.js │ └── middleware/ # Express middleware │ └── security.js ├── tests/ # Test files │ └── app.test.js ├── k8s/ # Kubernetes manifests │ ├── namespace.yaml │ ├── deployment.yaml │ ├── service.yaml │ ├── hpa.yaml │ └── configmap.yaml ├── Dockerfile # Multi-stage Dockerfile ├── .dockerignore ├── cloudbuild.yaml # CI/CD configuration ├── deploy.sh # Deployment script ├── cleanup.sh # Cleanup script └── package.json ``` ## 💻 Local Development ### Run Application Locally ```bash # Install dependencies npm install # Start development server npm run dev # Run tests npm test # Build Docker image locally npm run docker:build # Test Docker image locally npm run docker:run ``` ### Test Health Endpoints ```bash curl http://localhost:8080/health curl http://localhost:8080/health/ready curl http://localhost:8080/health/live ``` ## ☸️ GKE Deployment ### Step 1: Build and Push Docker Image ```bash # Build the image docker build -t nodejs-gke-app . # Tag for GCR docker tag nodejs-gke-app gcr.io/YOUR_PROJECT_ID/nodejs-gke-app:latest # Push to Google Container Registry docker push gcr.io/YOUR_PROJECT_ID/nodejs-gke-app:latest ``` ### Step 2: Create GKE Cluster ```bash # Create production cluster gcloud container clusters create nodejs-production-cluster \ --zone=us-central1-a \ --num-nodes=2 \ --machine-type=e2-medium \ --enable-autoscaling \ --min-nodes=1 \ --max-nodes=5 \ --enable-ip-alias # Get cluster credentials gcloud container clusters get-credentials nodejs-production-cluster \ --zone us-central1-a ``` ### Step 3: Deploy Application ```bash # Create namespace kubectl apply -f k8s/namespace.yaml # Deploy application kubectl apply -f k8s/configmap.yaml kubectl apply -f k8s/deployment.yaml kubectl apply -f k8s/service.yaml kubectl apply -f k8s/hpa.yaml # Wait for deployment kubectl rollout status deployment/nodejs-app -n nodejs-production # Get external IP kubectl get service nodejs-app-service -n nodejs-production ``` ### Step 4: Verify Deployment ```bash # Check all resources kubectl get all -n nodejs-production # View pods kubectl get pods -n nodejs-production # Check service details kubectl describe service nodejs-app-service -n nodejs-production # Test the application EXTERNAL_IP=$(kubectl get service nodejs-app-service -n nodejs-production -o jsonpath='{.status.loadBalancer.ingress[0].ip}') curl http://$EXTERNAL_IP curl http://$EXTERNAL_IP/health ``` ## 📊 Monitoring & Scaling ### Application Monitoring ```bash # View application logs kubectl logs -n nodejs-production -l app=nodejs-app --tail=50 # Stream logs in real-time kubectl logs -n nodejs-production -l app=nodejs-app -f # View resource usage kubectl top pods -n nodejs-production kubectl top nodes # Check HPA status kubectl get hpa -n nodejs-production ``` ### Auto-scaling The application includes Horizontal Pod Autoscaler configured to: - Scale based on CPU (70%) and memory (80%) utilization - Minimum 2 pods, maximum 10 pods - Automatic scaling based on load ### Manual Scaling ```bash # Scale manually kubectl scale deployment nodejs-app --replicas=5 -n nodejs-production # Check current 
replicas kubectl get deployment nodejs-app -n nodejs-production ``` ## 🔄 CI/CD Pipeline ### Automated Deployment with Cloud Build The project includes `cloudbuild.yaml` for automated CI/CD: ```yaml # Build, test, and deploy automatically on git push steps: - name: 'gcr.io/cloud-builders/docker' args: ['build', '-t', 'gcr.io/$PROJECT_ID/nodejs-gke-app:$COMMIT_SHA', '.'] - name: 'gcr.io/cloud-builders/docker' args: ['push', 'gcr.io/$PROJECT_ID/nodejs-gke-app:$COMMIT_SHA'] - name: 'gcr.io/cloud-builders/gke-deploy' args: ['run', '--filename=k8s/', '--image=gcr.io/$PROJECT_ID/nodejs-gke-app:$COMMIT_SHA'] ``` ### Trigger Cloud Build ```bash # Submit build manually gcloud builds submit --config cloudbuild.yaml ``` ## 🐛 Troubleshooting ### Common Issues 1. **Image Pull Errors** ```bash # Check image exists in GCR gcloud container images list-tags gcr.io/YOUR_PROJECT_ID/nodejs-gke-app # Verify GCR permissions gcloud projects get-iam-policy YOUR_PROJECT_ID ``` 2. **Pod CrashLoopBackOff** ```bash # Check pod logs kubectl logs -n nodejs-production # Describe pod for details kubectl describe pod -n nodejs-production ``` 3. **Service Not Accessible** ```bash # Check service endpoints kubectl get endpoints nodejs-app-service -n nodejs-production # Check firewall rules gcloud compute firewall-rules list ``` ### Debugging Commands ```bash # Get detailed pod information kubectl describe pod -n nodejs-production -l app=nodejs-app # Check cluster events kubectl get events -n nodejs-production --sort-by=.metadata.creationTimestamp # Access pod shell kubectl exec -n nodejs-production -it -- sh # Check network connectivity kubectl run -it --rm debug --image=busybox -n nodejs-production -- sh ``` ## 🧹 Cleanup ### Remove All Resources ```bash # Run cleanup script ./cleanup.sh # Or manually remove resources kubectl delete -f k8s/ --ignore-not-found=true gcloud container clusters delete nodejs-production-cluster --zone=us-central1-a --quiet gcloud container images delete gcr.io/YOUR_PROJECT_ID/nodejs-gke-app:latest --quiet ``` ## 📝 Best Practices Implemented ### Security - ✅ Non-root user in containers - ✅ Read-only root filesystem - ✅ Security contexts in pods - ✅ Minimal base images (Alpine Linux) - ✅ Regular security updates ### Reliability - ✅ Multiple replicas for high availability - ✅ Liveness, readiness, and startup probes - ✅ Resource limits and requests - ✅ Pod Disruption Budget - ✅ Rolling update strategy ### Performance - ✅ Horizontal Pod Autoscaler - ✅ Proper resource sizing - ✅ Compression middleware - ✅ Rate limiting - ✅ Connection pooling ### Monitoring - ✅ Health check endpoints - ✅ Structured logging - ✅ Resource metrics - ✅ Prometheus metrics ready - ✅ Cloud Monitoring integration ## 🤝 Contributing 1. Fork the repository 2. Create a feature branch (`git checkout -b feature/amazing-feature`) 3. Commit your changes (`git commit -m 'Add some amazing feature'`) 4. Push to the branch (`git push origin feature/amazing-feature`) 5. Open a Pull Request ## 📄 License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. ## 🙏 Acknowledgments - Google Cloud Platform documentation - Kubernetes community - Node.js best practices community --- **Note**: Remember to replace `YOUR_PROJECT_ID` with your actual Google Cloud Project ID in all commands and configuration files. For support, please open an issue in the GitHub repository or contact the maintainers. 
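The manifests under `k8s/` are referenced throughout this guide but not reproduced. As one concrete example, here is a sketch of what `k8s/hpa.yaml` could look like for the autoscaling policy described above (CPU 70%, memory 80%, 2 to 10 replicas). The HPA object name is an assumption; the Deployment name and namespace come from the commands in this guide:

```yaml
# k8s/hpa.yaml: sketch consistent with the thresholds stated in this guide
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nodejs-app-hpa          # assumed name
  namespace: nodejs-production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nodejs-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```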
# Node.js GKE Deployment Guide ![Google Cloud](https://img.shields.io/badge/Google_Cloud-4285F4?style=for-the-badge&logo=google-cloud&logoColor=white) ![Kubernetes](https://img.shields.io/badge/kubernetes-326CE5?style=for-the-badge&logo=kubernetes&logoColor=white) ![Node.js](https://img.shields.io/badge/Node.js-339933?style=for-the-badge&logo=nodedotjs&logoColor=white) ![Docker](https://img.shields.io/badge/Docker-2CA5E0?style=for-the-badge&logo=docker&logoColor=white) ![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=for-the-badge) A complete guide for deploying production-ready Node.js applications to Google Kubernetes Engine (GKE) with best practices for security, scalability, and monitoring. ## 📋 Table of Contents - [Overview](#overview) - [Architecture](#architecture) - [Quick Start](#quick-start) - [Prerequisites](#prerequisites) - [Project Structure](#project-structure) - [Local Development](#local-development) - [GKE Deployment](#gke-deployment) - [Monitoring & Scaling](#monitoring--scaling) - [CI/CD Pipeline](#cicd-pipeline) - [Troubleshooting](#troubleshooting) - [Cleanup](#cleanup) - [Best Practices](#best-practices) - [License](#license) ## 🎯 Overview This project demonstrates how to deploy a production-ready Node.js application to Google Kubernetes Engine with: - ✅ **Security Best Practices** (non-root users, security contexts, minimal images) - ✅ **Health Checks** (liveness, readiness, startup probes) - ✅ **Auto-scaling** (Horizontal Pod Autoscaler) - ✅ **Monitoring** (Stackdriver, resource metrics) - ✅ **CI/CD** (Cloud Build automation) - ✅ **High Availability** (multi-replica deployment) - ✅ **Zero-downtime Deployments** (rolling updates) ## 🏗 Architecture ```mermaid graph TB A[User] --> B[GCP Load Balancer] B --> C[Node.js Service] C --> D[Pod 1] C --> E[Pod 2] C --> F[Pod 3] D --> G[Node.js App] E --> G F --> G H[HPA] --> C I[Cloud Build] --> J[Container Registry] J --> C K[Cloud Monitoring] --> C ``` ## 🚀 Quick Start ### Prerequisites Checklist - [ ] Google Cloud Account with billing enabled - [ ] Google Cloud SDK installed - [ ] Docker installed - [ ] kubectl installed - [ ] Node.js 18+ installed ### One-Command Deployment ```bash # Clone the repository git clone https://github.com/your-username/nodejs-gke-app.git cd nodejs-gke-app # Run the deployment script (update PROJECT_ID first) ./deploy.sh ``` ## ⚙️ Prerequisites ### 1. Install Required Tools ```bash # Install Google Cloud SDK curl https://sdk.cloud.google.com | bash exec -l $SHELL # Install kubectl gcloud components install kubectl # Install Docker # On macOS: brew install --cask docker # On Ubuntu: sudo apt-get update && sudo apt-get install -y docker.io # Verify installations gcloud --version kubectl version --client docker --version ``` ### 2. 
Google Cloud Setup ```bash # Authenticate with GCP gcloud auth login # Set your project gcloud config set project YOUR_PROJECT_ID # Enable required APIs gcloud services enable \ container.googleapis.com \ containerregistry.googleapis.com \ cloudbuild.googleapis.com \ compute.googleapis.com ``` ## 📁 Project Structure ``` nodejs-gke-app/ ├── src/ # Application source code │ ├── app.js # Main application file │ ├── routes/ # API routes │ │ ├── api.js │ │ └── health.js │ └── middleware/ # Express middleware │ └── security.js ├── tests/ # Test files │ └── app.test.js ├── k8s/ # Kubernetes manifests │ ├── namespace.yaml │ ├── deployment.yaml │ ├── service.yaml │ ├── hpa.yaml │ └── configmap.yaml ├── Dockerfile # Multi-stage Dockerfile ├── .dockerignore ├── cloudbuild.yaml # CI/CD configuration ├── deploy.sh # Deployment script ├── cleanup.sh # Cleanup script ├── LICENSE # MIT License file └── package.json ``` ## 💻 Local Development ### Run Application Locally ```bash # Install dependencies npm install # Start development server npm run dev # Run tests npm test # Build Docker image locally npm run docker:build # Test Docker image locally npm run docker:run ``` ### Test Health Endpoints ```bash curl http://localhost:8080/health curl http://localhost:8080/health/ready curl http://localhost:8080/health/live ``` ## ☸️ GKE Deployment ### Step 1: Build and Push Docker Image ```bash # Build the image docker build -t nodejs-gke-app . # Tag for GCR docker tag nodejs-gke-app gcr.io/YOUR_PROJECT_ID/nodejs-gke-app:latest # Push to Google Container Registry docker push gcr.io/YOUR_PROJECT_ID/nodejs-gke-app:latest ``` ### Step 2: Create GKE Cluster ```bash # Create production cluster gcloud container clusters create nodejs-production-cluster \ --zone=us-central1-a \ --num-nodes=2 \ --machine-type=e2-medium \ --enable-autoscaling \ --min-nodes=1 \ --max-nodes=5 \ --enable-ip-alias # Get cluster credentials gcloud container clusters get-credentials nodejs-production-cluster \ --zone us-central1-a ``` ### Step 3: Deploy Application ```bash # Create namespace kubectl apply -f k8s/namespace.yaml # Deploy application kubectl apply -f k8s/configmap.yaml kubectl apply -f k8s/deployment.yaml kubectl apply -f k8s/service.yaml kubectl apply -f k8s/hpa.yaml # Wait for deployment kubectl rollout status deployment/nodejs-app -n nodejs-production # Get external IP kubectl get service nodejs-app-service -n nodejs-production ``` ### Step 4: Verify Deployment ```bash # Check all resources kubectl get all -n nodejs-production # View pods kubectl get pods -n nodejs-production # Check service details kubectl describe service nodejs-app-service -n nodejs-production # Test the application EXTERNAL_IP=$(kubectl get service nodejs-app-service -n nodejs-production -o jsonpath='{.status.loadBalancer.ingress[0].ip}') curl http://$EXTERNAL_IP curl http://$EXTERNAL_IP/health ``` ## 📊 Monitoring & Scaling ### Application Monitoring ```bash # View application logs kubectl logs -n nodejs-production -l app=nodejs-app --tail=50 # Stream logs in real-time kubectl logs -n nodejs-production -l app=nodejs-app -f # View resource usage kubectl top pods -n nodejs-production kubectl top nodes # Check HPA status kubectl get hpa -n nodejs-production ``` ### Auto-scaling The application includes Horizontal Pod Autoscaler configured to: - Scale based on CPU (70%) and memory (80%) utilization - Minimum 2 pods, maximum 10 pods - Automatic scaling based on load ### Manual Scaling ```bash # Scale manually kubectl scale deployment nodejs-app --replicas=5 -n 
nodejs-production # Check current replicas kubectl get deployment nodejs-app -n nodejs-production ``` ## 🔄 CI/CD Pipeline ### Automated Deployment with Cloud Build The project includes `cloudbuild.yaml` for automated CI/CD: ```yaml # Build, test, and deploy automatically on git push steps: - name: 'gcr.io/cloud-builders/docker' args: ['build', '-t', 'gcr.io/$PROJECT_ID/nodejs-gke-app:$COMMIT_SHA', '.'] - name: 'gcr.io/cloud-builders/docker' args: ['push', 'gcr.io/$PROJECT_ID/nodejs-gke-app:$COMMIT_SHA'] - name: 'gcr.io/cloud-builders/gke-deploy' args: ['run', '--filename=k8s/', '--image=gcr.io/$PROJECT_ID/nodejs-gke-app:$COMMIT_SHA'] ``` ### Trigger Cloud Build ```bash # Submit build manually gcloud builds submit --config cloudbuild.yaml ``` ## 🐛 Troubleshooting ### Common Issues 1. **Image Pull Errors** ```bash # Check image exists in GCR gcloud container images list-tags gcr.io/YOUR_PROJECT_ID/nodejs-gke-app # Verify GCR permissions gcloud projects get-iam-policy YOUR_PROJECT_ID ``` 2. **Pod CrashLoopBackOff** ```bash # Check pod logs kubectl logs -n nodejs-production # Describe pod for details kubectl describe pod -n nodejs-production ``` 3. **Service Not Accessible** ```bash # Check service endpoints kubectl get endpoints nodejs-app-service -n nodejs-production # Check firewall rules gcloud compute firewall-rules list ``` ### Debugging Commands ```bash # Get detailed pod information kubectl describe pod -n nodejs-production -l app=nodejs-app # Check cluster events kubectl get events -n nodejs-production --sort-by=.metadata.creationTimestamp # Access pod shell kubectl exec -n nodejs-production -it -- sh # Check network connectivity kubectl run -it --rm debug --image=busybox -n nodejs-production -- sh ``` ## 🧹 Cleanup ### Remove All Resources ```bash # Run cleanup script ./cleanup.sh # Or manually remove resources kubectl delete -f k8s/ --ignore-not-found=true gcloud container clusters delete nodejs-production-cluster --zone=us-central1-a --quiet gcloud container images delete gcr.io/YOUR_PROJECT_ID/nodejs-gke-app:latest --quiet ``` ## 📝 Best Practices Implemented ### Security - ✅ Non-root user in containers - ✅ Read-only root filesystem - ✅ Security contexts in pods - ✅ Minimal base images (Alpine Linux) - ✅ Regular security updates ### Reliability - ✅ Multiple replicas for high availability - ✅ Liveness, readiness, and startup probes - ✅ Resource limits and requests - ✅ Pod Disruption Budget - ✅ Rolling update strategy ### Performance - ✅ Horizontal Pod Autoscaler - ✅ Proper resource sizing - ✅ Compression middleware - ✅ Rate limiting - ✅ Connection pooling ### Monitoring - ✅ Health check endpoints - ✅ Structured logging - ✅ Resource metrics - ✅ Prometheus metrics ready - ✅ Cloud Monitoring integration ## 📄 License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. ``` MIT License Copyright (c) 2024 Node.js GKE Deployment Guide Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ``` ## 🤝 Contributing 1. Fork the repository 2. Create a feature branch (`git checkout -b feature/amazing-feature`) 3. Commit your changes (`git commit -m 'Add some amazing feature'`) 4. Push to the branch (`git push origin feature/amazing-feature`) 5. Open a Pull Request ## 🙏 Acknowledgments - Google Cloud Platform documentation - Kubernetes community - Node.js best practices community --- **Note**: Remember to replace `YOUR_PROJECT_ID` with your actual Google Cloud Project ID in all commands and configuration files. For support, please open an issue in the GitHub repository or contact the maintainers. --- README.md | 128 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 127 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index a46ae92..0d076dd 100644 --- a/README.md +++ b/README.md @@ -1 +1,127 @@ -# .github \ No newline at end of file +Based on your input, it seems you want to create a `README.md` file with your name **Muhamad Sazwan Bin Ismail**. Here's a basic template to get you started: + +```markdown +# Project Title + +Brief description of your project. + +## Author +- **Name:** Muhamad Sazwan Bin Ismail +- **Contact:** [Your Email/Professional Link] +- **GitHub:** [Your GitHub Profile Link] + +## Overview +Add a short project overview here. + +## Features +- Feature 1 +- Feature 2 +- Feature 3 + +## Installation +```bash +Steps to install your project +``` + +## Usage +Examples of how to use your project. + +## License +Specify your project's license (e.g., MIT, Apache 2.0). +``` + +### Steps to use this template: +1. Create a file named `README.md` in your project folder. +2. Paste the template above. +3. Replace placeholders (like `[Your Email/Professional Link]`) with your details. +4. Customize sections (Features, Installation, etc.) to match your project. + +Let me know if you need help with: +- GitHub/GitLab integration +- Adding badges (build status, version, etc.) +- Formatting tips for technical documentation! 🚀 +Here's an updated, professional `README.md` template with your name included and common sections for a software project: + +```markdown +# Project Name 🚀 + +[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) +[![Build Status](https://img.shields.io/travis/user/project/master.svg)](https://travis-ci.org/user/project) +[![Version](https://img.shields.io/badge/version-1.0.0-green.svg)](https://semver.org) + +A brief description of your project goes here. Explain what it does, why it's useful, and key highlights. 
+ +## Author 👨‍💻 +**Muhamad Sazwan Bin Ismail** +[📧 Email](mailto:your.email@example.com) | [💼 LinkedIn](https://linkedin.com/in/yourprofile) | [🐱 GitHub](https://github.com/yourusername) + +## Features ✨ +- Feature 1 with description +- Feature 2 with description +- Feature 3 with description +- [Add more as needed] + +## Installation ⚙️ +```bash +# Clone repository +git clone https://github.com/yourusername/projectname.git + +# Install dependencies +npm install # or pip install -r requirements.txt + +# Run project +npm start # or python main.py +``` + +## Usage 🖥️ +```python +# Example code snippet +import your_project + +result = your_project.run_example() +print(result) +``` + +## Project Structure 📂 +``` +project-root/ +├── src/ # Source files +├── tests/ # Test cases +├── docs/ # Documentation +├── .gitignore # Ignore files +├── LICENSE # License file +└── requirements.txt # Dependencies +``` + +## Contributing 🤝 +1. Fork the project +2. Create your feature branch (`git checkout -b feature/AmazingFeature`) +3. Commit changes (`git commit -m 'Add amazing feature'`) +4. Push to branch (`git push origin feature/AmazingFeature`) +5. Open a Pull Request + +## License 📄 +Distributed under the MIT License. See `LICENSE` for more information. +``` + +### Key improvements made: +1. Added professional header with badges (customize URLs later) +2. Created dedicated author section with contact links +3. Added emoji visual cues for better scanning +4. Included project structure visualization +5. Added clear contributing guidelines +6. Prettified code block formatting +7. Added license information + +### Next steps for you: +1. Replace `[Placeholders]` with your actual information +2. Add real badge URLs for CI/license tracking +3. Customize installation commands for your tech stack +4. Add actual features and usage examples +5. Update project structure to match your repo + +Would you like me to: +- Add specific sections for API documentation? +- Include a screenshot guide? +- Create a version for data science projects? +- Generate a minimal version for smaller projects?