diff --git a/README.md b/README.md index a46ae92..0d076dd 100644 --- a/README.md +++ b/README.md @@ -1 +1,127 @@ -# .github \ No newline at end of file +Based on your input, it seems you want to create a `README.md` file with your name **Muhamad Sazwan Bin Ismail**. Here's a basic template to get you started: + +```markdown +# Project Title + +Brief description of your project. + +## Author +- **Name:** Muhamad Sazwan Bin Ismail +- **Contact:** [Your Email/Professional Link] +- **GitHub:** [Your GitHub Profile Link] + +## Overview +Add a short project overview here. + +## Features +- Feature 1 +- Feature 2 +- Feature 3 + +## Installation +```bash +Steps to install your project +``` + +## Usage +Examples of how to use your project. + +## License +Specify your project's license (e.g., MIT, Apache 2.0). +``` + +### Steps to use this template: +1. Create a file named `README.md` in your project folder. +2. Paste the template above. +3. Replace placeholders (like `[Your Email/Professional Link]`) with your details. +4. Customize sections (Features, Installation, etc.) to match your project. + +Let me know if you need help with: +- GitHub/GitLab integration +- Adding badges (build status, version, etc.) +- Formatting tips for technical documentation! πŸš€ +Here's an updated, professional `README.md` template with your name included and common sections for a software project: + +```markdown +# Project Name πŸš€ + +[![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT) +[![Build Status](https://img.shields.io/travis/user/project/master.svg)](https://travis-ci.org/user/project) +[![Version](https://img.shields.io/badge/version-1.0.0-green.svg)](https://semver.org) + +A brief description of your project goes here. Explain what it does, why it's useful, and key highlights. + +## Author πŸ‘¨β€πŸ’» +**Muhamad Sazwan Bin Ismail** +[πŸ“§ Email](mailto:your.email@example.com) | [πŸ’Ό LinkedIn](https://linkedin.com/in/yourprofile) | [🐱 GitHub](https://github.com/yourusername) + +## Features ✨ +- Feature 1 with description +- Feature 2 with description +- Feature 3 with description +- [Add more as needed] + +## Installation βš™οΈ +```bash +# Clone repository +git clone https://github.com/yourusername/projectname.git + +# Install dependencies +npm install # or pip install -r requirements.txt + +# Run project +npm start # or python main.py +``` + +## Usage πŸ–₯️ +```python +# Example code snippet +import your_project + +result = your_project.run_example() +print(result) +``` + +## Project Structure πŸ“‚ +``` +project-root/ +β”œβ”€β”€ src/ # Source files +β”œβ”€β”€ tests/ # Test cases +β”œβ”€β”€ docs/ # Documentation +β”œβ”€β”€ .gitignore # Ignore files +β”œβ”€β”€ LICENSE # License file +└── requirements.txt # Dependencies +``` + +## Contributing 🀝 +1. Fork the project +2. Create your feature branch (`git checkout -b feature/AmazingFeature`) +3. Commit changes (`git commit -m 'Add amazing feature'`) +4. Push to branch (`git push origin feature/AmazingFeature`) +5. Open a Pull Request + +## License πŸ“„ +Distributed under the MIT License. See `LICENSE` for more information. +``` + +### Key improvements made: +1. Added professional header with badges (customize URLs later) +2. Created dedicated author section with contact links +3. Added emoji visual cues for better scanning +4. Included project structure visualization +5. Added clear contributing guidelines +6. Prettified code block formatting +7. Added license information + +### Next steps for you: +1. 
Replace `[Placeholders]` with your actual information +2. Add real badge URLs for CI/license tracking +3. Customize installation commands for your tech stack +4. Add actual features and usage examples +5. Update project structure to match your repo + +Would you like me to: +- Add specific sections for API documentation? +- Include a screenshot guide? +- Create a version for data science projects? +- Generate a minimal version for smaller projects? diff --git a/README.md vvip b/README.md vvip new file mode 100644 index 0000000..f855093 --- /dev/null +++ b/README.md vvip @@ -0,0 +1,1046 @@ +````markdown name=README.md +# storage.cloud β€” Google Cloud Storage docs & quickstart (v3) + +A compact, practical collection of reference notes, copy‑paste commands, and small example scripts for working with Google Cloud Storage (GCS). This repository is intended to help developers and operators quickly perform common tasks: authenticate, inspect buckets, share objects, configure CORS, merge many objects, and load data into BigQuery. + +Status: v3 β€” streamlined layout, clearer quickstart, and practical patterns for small-to-large datasets. + +Table of contents +- About +- Repository layout +- Quickstart (auth, common commands) +- Sharing & Signed URLs +- Merging strategies (small β†’ large scale) +- CORS & browser uploads +- Examples included +- Security & best practices +- Contributing +- License + +About +storage.cloud collects concise guidance and minimally opinionated examples so you can get things done quickly. The focus is on copy‑pasteable commands and small scripts that are safe to adapt for development and production. + +Repository layout +- index.html β€” simple landing page for the site +- docs/ + - quickstart.md β€” auth, gsutil/gcloud/bq basics, signed-URL notes + - merge-data.md β€” concise merging strategies (English + Malay focused notes) + - signed-urls.md β€” signed URL reference & tips +- examples/ + - merge_csv_gcs.py β€” Python script to merge CSVs in a GCS prefix +- cors.json β€” example CORS configuration +- LICENSE β€” suggested MIT license + +Quickstart β€” minimum steps +1. Install Google Cloud SDK (gcloud, gsutil) and optionally Python client libraries: + pip install google-cloud-storage + +2. Authenticate (developer / local): +```bash +gcloud auth application-default login +``` + +3. (Server / app) Use a service account: +```bash +gcloud iam service-accounts create my-sa --display-name="My SA" + +gcloud projects add-iam-policy-binding PROJECT_ID \ + --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/storage.objectViewer" +``` +(Optional) download a key for local testing: +```bash +gcloud iam service-accounts keys create key.json \ + --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com +export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" +``` + +Common commands +- List buckets: +```bash +gsutil ls gs:// +``` +- List objects: +```bash +gsutil ls gs://BUCKET/PREFIX/ +``` +- Download/upload: +```bash +gsutil cp gs://BUCKET/OBJECT ./local-file +gsutil cp ./local-file gs://BUCKET/OBJECT +``` +- Make object public (use sparingly): +```bash +gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT +``` +- Get an access token for HTTP requests: +```bash +gcloud auth print-access-token +# use it as: Authorization: Bearer +``` + +Sharing & Signed URLs +- Create a signed URL (gsutil; using a service account key): +```bash +gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT +``` +Notes: +- V4 signed URLs maximum expiry: 7 days. 
- Anyone with the URL can access the object until it expires β€” treat it like a secret.

Merging strategies (choose by dataset size)
- Small / moderate (fits memory): stream with gsutil
```bash
gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv
```
- In-place compose (no download) β€” up to 32 objects per compose:
```bash
gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv
```
For >32 objects, perform a tree compose (group objects into temporary composites, then compose those).

- Large-scale / analytics: load directly into BigQuery (no pre-merge)
```bash
bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv
```

- Custom transformations / header handling: use the included Python script examples/merge_csv_gcs.py, which:
  - Lists CSVs by prefix
  - Downloads each file, writing the header only once
  - Uploads the combined CSV back to GCS
  - (For very large files, prefer streaming or a Dataflow/Dataproc pipeline.)

CORS & browser uploads
- Example CORS (cors.json included):
```json
[
  {
    "origin": ["https://example.com"],
    "method": ["GET", "HEAD", "PUT", "POST"],
    "responseHeader": ["Content-Type", "x-goog-meta-custom"],
    "maxAgeSeconds": 3600
  }
]
```
Apply:
```bash
gsutil cors set cors.json gs://BUCKET
```

Examples included
- examples/merge_csv_gcs.py β€” merge CSVs and de-duplicate headers.
- cors.json β€” CORS policy example.
See docs/merge-data.md and docs/quickstart.md for usage and variations.

Security & best practices
- Use service accounts with least privilege.
- Prefer uniform bucket-level access + IAM roles over ACLs where possible.
- Avoid embedding long-lived keys in client-side code; use signed URLs for browser access.
- Monitor object access and signed-URL usage with Cloud Audit Logs.
- Consider CMEK (customer-managed encryption keys) if required by policy.

Contributing
- Suggest fixes or send PRs. Keep examples minimal and documented.
- When adding scripts, include:
  - Purpose and usage examples
  - Required permissions and dependencies
  - Safety notes (e.g., memory/time limits)

License
- MIT by default (see LICENSE). Replace with your preferred license if needed.

Planned additions
- A shell helper for tree compose (>32 objects)
- A Dataflow (Apache Beam) starter pipeline for very large merges
- Localized site content (Malay/other)
- A small CI workflow to lint and test the examples

To request one of these, open an issue naming the file and target format (bash/python/README variant).
````
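The signed-URL notes above (and docs/signed-urls.md below) mention programmatic signing; here is a minimal Python sketch using google-cloud-storage. The bucket and object names are placeholders, and the credentials behind the client must be able to sign (for example, a service account key exported via GOOGLE_APPLICATION_CREDENTIALS).

```python
from datetime import timedelta

from google.cloud import storage


def signed_download_url(bucket_name: str, object_name: str, hours: int = 1) -> str:
    """Return a V4 signed GET URL for one object (V4 signing allows at most 7 days)."""
    client = storage.Client()  # picks up GOOGLE_APPLICATION_CREDENTIALS / ADC
    blob = client.bucket(bucket_name).blob(object_name)
    return blob.generate_signed_url(
        version="v4",
        expiration=timedelta(hours=hours),
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder names; replace with a real bucket and object.
    print(signed_download_url("my-bucket", "reports/example.csv"))
```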

```html name=index.html
<!doctype html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>storage.cloud β€” Google Cloud Storage quickstart</title>
</head>
<body>
  <header>
    <h1>storage.cloud</h1>
    <p>Quick reference and starter content for Google Cloud Storage (GCS)</p>
  </header>

  <main>
    <section>
      <h2>Overview</h2>
      <p>storage.cloud collects short guides, command snippets, and small reusable scripts for common GCS workflows:</p>
      <ul>
        <li>Authenticated access (gcloud, service accounts, access tokens)</li>
        <li>Signed URLs for temporary public access</li>
        <li>CORS configuration for browser-based apps</li>
        <li>Merging objects (CSV/text) and loading to BigQuery</li>
      </ul>
    </section>

    <section>
      <h2>Helpful links</h2>
      <ul>
        <li><a href="https://cloud.google.com/storage/docs">GCS documentation</a></li>
        <li><a href="https://cloud.google.com/storage/docs/access-control/signed-urls">Signed URLs</a></li>
        <li><a href="https://cloud.google.com/storage/docs/gsutil">gsutil reference</a></li>
      </ul>
    </section>

    <section>
      <h2>Get started</h2>
      <p>Open docs/quickstart.md for commands and brief examples you can run locally. For combining files, see docs/merge-data.md and examples/merge_csv_gcs.py.</p>
    </section>
  </main>
</body>
</html>
+ + +``` + +````markdown name=docs/quickstart.md +# Quickstart β€” Google Cloud Storage (GCS) + +This page collects the most-used commands and short examples for getting started with GCS. + +Prerequisites +- Install Google Cloud SDK (gcloud, gsutil) +- For programmatic examples, install the relevant client libraries (Python: google-cloud-storage) + +Authentication +- Local developer (Application Default Credentials - ADC): +```bash +gcloud auth application-default login +``` + +- Service account (recommended for server apps): +1. Create: +```bash +gcloud iam service-accounts create my-sa --display-name="My SA" +``` +2. Grant roles (example: objectViewer): +```bash +gcloud projects add-iam-policy-binding PROJECT_ID \ + --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/storage.objectViewer" +``` +3. (Optional) Create key for local use: +```bash +gcloud iam service-accounts keys create key.json \ + --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com +export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" +``` + +Access tokens +- Print an access token for HTTP Authorization: +```bash +gcloud auth print-access-token +``` +- Use as header: Authorization: Bearer + +Common gsutil commands +- List buckets: +```bash +gsutil ls gs:// +``` + +- List objects in a bucket/prefix: +```bash +gsutil ls gs://BUCKET/PREFIX/ +``` + +- Download object: +```bash +gsutil cp gs://BUCKET/OBJECT ./local-file +``` + +- Upload file: +```bash +gsutil cp ./local-file gs://BUCKET/OBJECT +``` + +- Make object public (use sparingly): +```bash +gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT +``` + +Signed URLs +- Create a signed URL for temporary access (using gsutil with a service account key): +```bash +gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT +``` +- Signed URLs are valid up to 7 days when using V4 signing. + +CORS (for browser clients) +- Example file: cors.json (in repo) +- Apply: +```bash +gsutil cors set cors.json gs://BUCKET +``` + +BigQuery ingestion +- BigQuery accepts wildcards β€” you can load many CSVs without pre-merging: +```bash +bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv +``` + +Troubleshooting +- Permission denied: check IAM roles (roles/storage.objectViewer or a custom role). +- Invalid credentials: re-run `gcloud auth application-default login` or refresh service account tokens. +- CORS errors: ensure bucket CORS includes your domain and required methods/headers. + +Security recommendations +- Use service accounts with least privilege. +- Prefer IAM + uniform bucket-level access over ACLs. +- Do not embed long-lived keys in client-side code; use signed URLs for browser access. + +Further reading +- GCS docs: https://cloud.google.com/storage/docs +- Signed URLs: https://cloud.google.com/storage/docs/access-control/signed-urls +- gsutil reference: https://cloud.google.com/storage/docs/gsutil +```` + +````markdown name=docs/merge-data.md +# Gabungkan Semua Data (Combine all data) + +Dokumen ringkas (Bahasa Melayu) untuk menggabungkan fail yang tersimpan di Google Cloud Storage. + +Sebelum mula +- Pastikan anda mempunyai akses ke bucket (roles/storage.objectViewer atau storage.objectAdmin untuk penulisan). +- Jika dataset besar (GB/TB), pertimbangkan Dataflow/Dataproc atau import terus ke BigQuery. 
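As a complement to the last point above (and to Pilihan C below), a minimal Python sketch of loading the CSVs straight into BigQuery with the google-cloud-bigquery client could look like this; it assumes `pip install google-cloud-bigquery`, and the table and bucket names are placeholders:

```python
from google.cloud import bigquery


def load_csvs(table_id: str, gcs_uri: str) -> None:
    """Load every CSV matching a gs:// wildcard into one BigQuery table."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,       # same idea as `bq load --autodetect`
        skip_leading_rows=1,   # skip the header row of each file
    )
    job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    job.result()  # block until the load job finishes
    print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")


# Placeholder names:
# load_csvs("my-project.my_dataset.my_table", "gs://my-bucket/data/*.csv")
```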
+ +Pilihan umum + +1) Quick merge (fail kecil/sederhana) +- Jika saiz kecil supaya muat dalam memori: +```bash +gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv +``` +- Risiko: memori dan rangkaian. Pastikan hanya digunakan untuk set kecil. + +2) gsutil compose (gabungkan objek di GCS tanpa muat turun) +- Gabungkan hingga 32 objek per operasi: +```bash +gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv +``` +- Untuk >32 objek, buat compose berperingkat (tree compose). +- Nota: compose menggabungkan byte; pastikan setiap objek mempunyai newline yang sesuai dan tidak menggabungkan header berganda. + +3) Load terus ke BigQuery (disarankan untuk analisis) +- BigQuery boleh baca wildcard CSV: +```bash +bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv +``` +- Kelebihan: skala, tiada pra-gabungan, schema handling. + +4) Skrip tersuai (contoh Python) +- Kelebihan: kawalan penuh (buang header berganda, normalisasi). +- Lihat `examples/merge_csv_gcs.py` untuk contoh. + +Contoh strategi untuk >32 objek menggunakan gsutil compose (pattern) +- Bahagikan senarai objek kepada kumpulan 32, compose setiap kumpulan ke objek sementara, kemudian compose hasil sementara (ulang sehingga satu objek terakhir). +- Atau gunakan pipeline Dataflow untuk penstriman dan penulisan semula. + +Amalan terbaik +- Untuk output akhir yang akan dianalitik: lebih baik tulis ke BigQuery atau Parquet (kolumnar) untuk kos dan prestasi. +- Pastikan logging dan versi objek dihidupkan jika data penting (object versioning). +- Gunakan encryption keys (CMEK) jika diperlukan. + +Jika anda mahu, berikan: +- Bucket name (contoh: gs://my-bucket/data/) +- Prefix dan jenis fail (.csv) +- Saiz anggaran (GB/TB) + +Saya akan hasilkan skrip automatik yang sesuai (bash atau Python) untuk anda. +```` + +````markdown name=docs/signed-urls.md +# Signed URLs β€” Quick reference + +Signed URLs grant temporary access to an object without requiring the requester to have a Google account. + +Key points +- Signed URLs include authentication in query parameters. +- Anyone with the URL can access the object until expiry. +- Maximum expiry for V4 signed URLs is 7 days. + +Create with gsutil (using service account key file) +```bash +gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT +``` + +Create programmatically (Python example sketch) +- Use google-auth library to sign and generate a V4 URL, or use google-cloud-storage's blob.generate_signed_url method. +- Prefer service account credentials with least privilege. + +Security tips +- Monitor signed-URL usage where possible (Cloud Storage logs). +- Shorter expiry reduces risk. +- Do not embed long-lived private keys in public repositories. + +When to use +- Temporary downloads for users without Google accounts. +- Browser uploads (PUT) when combined with appropriate CORS settings. +```` + +```text name=LICENSE +MIT License + +Copyright (c) 2025 Sazwanismail + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +... 
(Full MIT text can be added here) +``` + +```json name=cors.json +[ + { + "origin": ["https://example.com"], + "method": ["GET", "HEAD", "PUT", "POST"], + "responseHeader": ["Content-Type", "x-goog-meta-custom"], + "maxAgeSeconds": 3600 + } +] +``` + +```python name=examples/merge_csv_gcs.py +#!/usr/bin/env python3 +""" +examples/merge_csv_gcs.py + +Merge CSV files in a GCS prefix into one CSV while keeping only the first header. + +Requirements: + pip install google-cloud-storage + +Usage: + export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" + python3 examples/merge_csv_gcs.py my-bucket data/prefix/ output/combined.csv +""" +import sys +import csv +from io import StringIO +from google.cloud import storage + +def merge_csvs(bucket_name, prefix, output_blob_name): + client = storage.Client() + bucket = client.bucket(bucket_name) + blobs = sorted([b for b in bucket.list_blobs(prefix=prefix) if b.name.endswith('.csv')]) + + if not blobs: + print("No CSV files found with prefix:", prefix) + return + + out_buf = StringIO() + writer = None + first = True + + for blob in blobs: + print("Reading:", blob.name) + # download_as_text loads the object into memory; for very large files use streaming + data = blob.download_as_text() + reader = csv.reader(StringIO(data)) + header = next(reader, None) + if first: + writer = csv.writer(out_buf) + if header: + writer.writerow(header) + for row in reader: + writer.writerow(row) + first = False + else: + # skip header for subsequent files + for row in reader: + writer.writerow(row) + + out_blob = bucket.blob(output_blob_name) + out_blob.upload_from_string(out_buf.getvalue(), content_type='text/csv') + print(f'Uploaded gs://{bucket_name}/{output_blob_name}') + +if __name__ == "__main__": + if len(sys.argv) != 4: + print("Usage: merge_csv_gcs.py BUCKET PREFIX OUTPUT_BLOB") + print("Example: merge_csv_gcs.py my-bucket data/ output/combined.csv") + sys.exit(1) + merge_csvs(sys.argv[1], sys.argv[2], sys.argv[3]) +``` +````markdown name=README.md +# storage.cloud β€” Google Cloud Storage docs & examples + +A compact, practical collection of reference notes, copy‑paste commands, and small example scripts for working with Google Cloud Storage (GCS). This README is a concise guide and index for the repository contents and the most common GCS workflows: authenticate, inspect buckets, share objects, configure CORS, merge many objects, and load data to BigQuery. + +Status: Revised β€” 2025-11-06 +Maintainer: Sazwanismail + +Quick links +- Web UI (requires sign-in): https://storage.cloud.google.com/ +- Cloud Console (Storage browser): https://console.cloud.google.com/storage/browser +- GCS docs: https://cloud.google.com/storage/docs + +Repository layout +- index.html β€” landing page / site overview +- docs/ + - quickstart.md β€” essential commands and notes + - merge-data.md β€” strategies (English + Malay notes) + - signed-urls.md β€” signed URL reference & tips +- examples/ + - merge_csv_gcs.py β€” Python script to merge CSVs in a GCS prefix +- cors.json β€” example CORS policy +- LICENSE β€” MIT by default + +What this repo is for +- Fast onboarding for GCS tasks (dev & ops). +- Copy‑paste safe commands for local work and quick demos. +- Small example scripts you can adapt for production (with caution). +- Practical patterns for combining many objects (CSV/text) and for ingesting into BigQuery. + +Quickstart (minimum steps) +1. 
Install Cloud SDK (gcloud, gsutil) and Python client (optional): + ```bash + # Cloud SDK: https://cloud.google.com/sdk + pip install --upgrade google-cloud-storage + ``` + +2. Authenticate (developer / local): + ```bash + gcloud auth application-default login + ``` + +3. For server applications, create and use a service account (least privilege): + ```bash + gcloud iam service-accounts create my-sa --display-name="My SA" + + gcloud projects add-iam-policy-binding PROJECT_ID \ + --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/storage.objectViewer" + ``` + + (Optional for local testing) + ```bash + gcloud iam service-accounts keys create key.json \ + --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com + export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" + ``` + +Common commands +- List buckets: + ```bash + gsutil ls gs:// + ``` +- List objects in a prefix: + ```bash + gsutil ls gs://BUCKET/PREFIX/ + ``` +- Download / upload: + ```bash + gsutil cp gs://BUCKET/OBJECT ./local-file + gsutil cp ./local-file gs://BUCKET/OBJECT + ``` +- Get an access token (for HTTP Authorization header): + ```bash + gcloud auth print-access-token + # header: Authorization: Bearer + ``` +- Make an object public (use sparingly; prefer IAM or signed URLs): + ```bash + gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT + ``` + +Sharing & Signed URLs +- Quick: create a signed URL with gsutil using a service account key: + ```bash + gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT + ``` +- Notes: + - V4 signed URLs support up to 7 days expiry. + - Anyone with the URL can access the object while it’s valid β€” treat like a secret. + - For programmatic signing, use google-cloud-storage or google-auth libraries (see docs/signed-urls.md). + +Merging strategies (pick by dataset size) +- Small / moderate (fits in memory) + ```bash + gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv + ``` + - Simple and fast for small sets. Watch memory/network use. + +- In-place compose (no download; up to 32 objects per compose) + ```bash + gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv + ``` + - For >32 objects, use a tree-compose approach (compose in batches, then compose results). See docs/merge-data.md. + +- Large-scale / analytics + - Load directly to BigQuery (no pre-merge): + ```bash + bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv + ``` + - For heavy transformations or streaming merges, use Dataflow (Apache Beam) or Dataproc (Spark). + +Example: tree-compose helper (pattern) +```bash +# Sketch: group objects into batches of 32, compose each batch to a temp object, +# then compose the temp objects until a single final object remains. +# See docs/merge-data.md for a full script or ask for a ready-made helper. +``` + +Examples included +- examples/merge_csv_gcs.py β€” Merge CSVs by prefix and keep only the first header. Good starting point for small-to-medium datasets. +- cors.json β€” CORS policy example for browser uploads/downloads. + +CORS & browser uploads +- Example cors.json: + ```json + [ + { + "origin": ["https://example.com"], + "method": ["GET", "HEAD", "PUT", "POST"], + "responseHeader": ["Content-Type", "x-goog-meta-custom"], + "maxAgeSeconds": 3600 + } + ] + ``` +- Apply: + ```bash + gsutil cors set cors.json gs://BUCKET + ``` + +Security & best practices +- Use service accounts with least privilege; do not use personal accounts for long-running services. 
+- Prefer uniform bucket-level access + IAM roles instead of ACLs. +- Avoid embedding long-lived private keys in clients. Use signed URLs or short-lived tokens for browser access. +- Monitor access with Cloud Audit Logs and consider object versioning and retention. +- If required by policy, use CMEK (customer-managed encryption keys). + +Troubleshooting tips +- Permission denied: verify IAM roles (roles/storage.objectViewer for reads). +- Invalid credentials: re-run `gcloud auth application-default login` or refresh service account keys. +- CORS errors: ensure bucket CORS allows your origin, methods, and response headers. +- Large merges: avoid downloading many large files into memory; prefer compose, streaming, or Dataflow. + +Contributing +- PRs and issues welcome. When adding scripts, include: + - Purpose and example usage + - Required permissions and dependencies + - Safety notes (memory/time limits) +- Keep examples small and focused. + +License +- MIT by default. See LICENSE. + +Need a file or script now? +Tell me which of the following you want and I’ll produce it: +- tree-compose shell helper to safely compose >32 objects +- Dataflow (Apache Beam) starter pipeline for large merges +- A localized Malay README or full-site translation +- A small GitHub Actions workflow to lint/test examples + +Alternatively, provide your bucket name, prefix, file type, and approximate size and I’ll generate a ready-to-run script (bash or Python) tailored to your needs. +````````markdown name=README.md +# storage.cloud β€” Google Cloud Storage docs & examples (v5) + +A compact, practical collection of reference notes, copy‑paste commands, and small example scripts for working with Google Cloud Storage (GCS). This v5 README is streamlined for quick onboarding and safe reuse β€” with an included tree‑compose helper pattern for composing >32 objects. + +Status: v5 β€” 2025-11-06 +Maintainer: Sazwanismail + +Table of contents +- About +- Repository layout +- Quickstart (install, auth) +- Common commands +- Sharing & signed URLs +- Merging strategies (small β†’ large) +- Tree‑compose helper (compose >32 objects) +- CORS & browser uploads +- Examples included +- Security & best practices +- Troubleshooting +- Contributing & license + +About +storage.cloud collects minimal, copy‑pasteable guidance and small scripts you can adapt for development and production. Focus: authenticate, inspect buckets, share objects, configure CORS, merge many objects, and ingest to BigQuery. + +Repository layout +- index.html β€” landing page +- docs/ + - quickstart.md + - merge-data.md + - signed-urls.md +- examples/ + - merge_csv_gcs.py +- cors.json +- LICENSE + +Quickstart (minimum steps) +1. Install Google Cloud SDK (gcloud, gsutil) and Python client (optional): + ```bash + # Cloud SDK: https://cloud.google.com/sdk + pip install --upgrade google-cloud-storage + ``` + +2. Authenticate (developer / local): + ```bash + gcloud auth application-default login + ``` + +3. 
Service account for servers (least privilege): + ```bash + gcloud iam service-accounts create my-sa --display-name="My SA" + + gcloud projects add-iam-policy-binding PROJECT_ID \ + --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" \ + --role="roles/storage.objectViewer" + ``` + + (Optional for local testing) + ```bash + gcloud iam service-accounts keys create key.json \ + --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com + export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" + ``` + +Common commands +- List buckets: + ```bash + gsutil ls gs:// + ``` +- List objects: + ```bash + gsutil ls gs://BUCKET/PREFIX/ + ``` +- Download / upload: + ```bash + gsutil cp gs://BUCKET/OBJECT ./local-file + gsutil cp ./local-file gs://BUCKET/OBJECT + ``` +- Make object public (use sparingly): + ```bash + gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT + ``` +- Get access token: + ```bash + gcloud auth print-access-token + # HTTP header: Authorization: Bearer + ``` + +Sharing & Signed URLs +- Create a signed URL (gsutil, service account key): + ```bash + gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT + ``` +Notes: +- V4 signed URLs max expiry: 7 days. +- Anyone with the URL can access the object while valid β€” treat it as a secret. +- For programmatic signing, use google-cloud-storage or google-auth libraries (see docs/signed-urls.md). + +Merging strategies β€” choose by dataset size +- Small / moderate (fits memory) + ```bash + gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv + ``` + - Fast for small sets. Watch memory & network. + +- In-place compose (no download; up to 32 objects per compose) + ```bash + gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv + ``` + - Compose merges object bytes; ensure objects end with newline if needed and avoid duplicate headers. + +- Large-scale / analytics + - Load directly to BigQuery (no pre-merge): + ```bash + bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv + ``` + - For heavy transforms or streaming merges, use Dataflow (Apache Beam) or Dataproc (Spark). + +Tree‑compose helper β€” safe pattern for >32 objects +- Problem: gsutil compose accepts at most 32 sources. Use a tree-compose (batch then reduce) approach. +- Sketch helper (bash) β€” adapt and run in a safe environment. This creates temporary composed objects and composes them until one final object remains. + +```bash +#!/usr/bin/env bash +# tree-compose.sh: Compose many GCS objects into one final object. +# Usage: ./tree-compose.sh BUCKET PREFIX output/final.csv +set -euo pipefail + +BUCKET="$1" # e.g. my-bucket +PREFIX="$2" # e.g. data/prefix/ +FINAL_OBJ="$3" # e.g. 
output/final.csv +TMP_PREFIX="tmp/compose-$(date +%s)" +BATCH_SIZE=32 + +# list CSVs under prefix +mapfile -t objects < <(gsutil ls "gs://${BUCKET}/${PREFIX}" | grep -E '\.csv$' || true) +if [ "${#objects[@]}" -eq 0 ]; then + echo "No objects found under gs://${BUCKET}/${PREFIX}" + exit 1 +fi + +# create batches of up to 32 and compose each to a temp object +temp_objects=() +i=0 +while [ $i -lt "${#objects[@]}" ]; do + batch=( "${objects[@]:$i:$BATCH_SIZE}" ) + idx=$((i / BATCH_SIZE)) + out="gs://${BUCKET}/${TMP_PREFIX}/part-${idx}.csv" + echo "Composing batch $idx -> $out" + gsutil compose "${batch[@]}" "$out" + temp_objects+=("$out") + i=$((i + BATCH_SIZE)) +done + +# reduce: compose temp objects repeatedly until one remains +while [ "${#temp_objects[@]}" -gt 1 ]; do + new_temp=() + i=0 + while [ $i -lt "${#temp_objects[@]}" ]; do + batch=( "${temp_objects[@]:$i:$BATCH_SIZE}" ) + idx=$((i / BATCH_SIZE)) + out="gs://${BUCKET}/${TMP_PREFIX}/reduce-${idx}.csv" + echo "Composing reduce batch $idx -> $out" + gsutil compose "${batch[@]}" "$out" + new_temp+=("$out") + i=$((i + BATCH_SIZE)) + done + temp_objects=( "${new_temp[@]}" ) +done + +# final rename (copy) to desired location +echo "Final object: ${temp_objects[0]} -> gs://${BUCKET}/${FINAL_OBJ}" +gsutil cp "${temp_objects[0]}" "gs://${BUCKET}/${FINAL_OBJ}" + +# optional: cleanup temp objects +echo "Cleaning up temporary objects under gs://${BUCKET}/${TMP_PREFIX}/" +gsutil -m rm -r "gs://${BUCKET}/${TMP_PREFIX}/" +echo "Done." +``` + +Notes: +- This sketch assumes objects are ready to be concatenated (newline handling, headers). If files contain headers, either remove headers before composing or use a script to write header once (see examples/merge_csv_gcs.py). +- Test on a small subset before running on production data. + +CORS & browser uploads +- Example cors.json (in repo). Apply: + ```bash + gsutil cors set cors.json gs://BUCKET + ``` +- For browser uploads with signed URLs, ensure CORS allows the upload origin and required headers/methods. + +Examples included +- examples/merge_csv_gcs.py β€” merge CSVs by prefix while keeping only the first header (for small/medium sets). +- cors.json β€” example CORS policy. + +Security & best practices +- Use service accounts with least privilege. +- Prefer uniform bucket-level access + IAM roles over ACLs. +- Avoid embedding long-lived keys in client-side code; use signed URLs for browser access. +- Monitor with Cloud Audit Logs; enable object versioning and retention as needed. +- Consider CMEK if required. + +Troubleshooting +- Permission denied: confirm IAM role (roles/storage.objectViewer for reads). +- Invalid credentials: re-run `gcloud auth application-default login` or refresh service account keys. +- CORS issues: ensure bucket CORS includes your origin, methods, headers. +- Performance: avoid downloading large numbers of files into memory; prefer compose, streaming, or Dataflow. + +Contributing +- PRs and issues welcome. When adding scripts, include: + - Purpose, usage, permissions, and dependencies. + - Safety notes (memory/time limits). +- Keep examples minimal and testable. + +License +- MIT by default. See LICENSE. + +Need a ready-to-run script or pipeline? +Tell me which you want: +- A fully-tested tree-compose script (I can produce a version with additional safety checks and header handling). +- A Dataflow (Apache Beam) starter pipeline to merge and transform files at scale. +- A Malay-localized README and docs. 
+Provide bucket name, prefix and approximate size and I’ll generate tailored code. +```` diff --git a/Workflow planning with Ai b/Workflow planning with Ai new file mode 100644 index 0000000..4967646 --- /dev/null +++ b/Workflow planning with Ai @@ -0,0 +1,622 @@ +# AI-Powered Workflow Planning System + +![AI Planning](https://img.shields.io/badge/AI--Powered-Workflow_Planning-FF6B6B.svg) +![Automation](https://img.shields.io/badge/Automation-βœ“-00C9FF.svg) +![Integration](https://img.shields.io/badge/Multi--Tool_Integration-βœ“-45B7D1.svg) + +An intelligent workflow planning system that leverages AI to automate, optimize, and manage complex development workflows across multiple tools and platforms. + +## 🧠 AI Planning Architecture + +```mermaid +graph TB + A[User Input] --> B[AI Planner] + B --> C[Workflow Generator] + B --> D[Dependency Resolver] + B --> E[Optimization Engine] + + C --> F[Tool Integrations] + D --> G[Dependency Graph] + E --> H[Performance Optimizer] + + F --> I[GitHub Actions] + F --> J[Jenkins] + F --> K[GitLab CI] + F --> L[Docker Registry] + F --> M[Xcode Build] + + H --> N[Optimized Workflow] + G --> N + + N --> O[Execution Engine] + O --> P[Monitoring & Feedback] + P --> B +``` + +## πŸš€ Quick Start + +### Prerequisites +- Python 3.9+ +- OpenAI API key or local LLM +- Docker (optional) +- Git + +### Installation + +```bash +# Clone the repository +git clone https://github.com/your-org/ai-workflow-planner.git +cd ai-workflow-planner + +# Install dependencies +pip install -r requirements.txt + +# Setup environment +cp .env.example .env +# Add your API keys to .env + +# Start the planning system +python -m planner.main +``` + +### Basic Usage + +```bash +# Interactive planning session +python -m planner.cli --project-type "ios" --tools "xcode,github" + +# Batch planning from config +python -m planner.batch --config workflows/ios-ci.yaml + +# API server mode +python -m planner.api --host 0.0.0.0 --port 8000 +``` + +## πŸ—οΈ Core Components + +### 1. AI Planning Engine + +```python +# planners/ai_planner.py +class AIPlanner: + def __init__(self, model="gpt-4", temperature=0.3): + self.model = model + self.temperature = temperature + self.workflow_memory = WorkflowMemory() + + async def plan_workflow(self, requirements: ProjectRequirements) -> WorkflowPlan: + """Generate optimized workflow using AI""" + prompt = self._build_planning_prompt(requirements) + response = await self._call_ai(prompt) + return self._parse_workflow_response(response) + + def _build_planning_prompt(self, requirements: ProjectRequirements) -> str: + return f""" + Project Requirements: + - Type: {requirements.project_type} + - Tools: {', '.join(requirements.tools)} + - Team Size: {requirements.team_size} + - Complexity: {requirements.complexity} + + Generate an optimized workflow that: + 1. Integrates all specified tools + 2. Minimizes execution time + 3. Ensures proper dependency ordering + 4. Includes error handling + 5. Provides monitoring and feedback + + Output in YAML format with the following structure: + {WORKFLOW_SCHEMA} + """ +``` + +### 2. 
Workflow Generator + +```python +# generators/workflow_generator.py +class WorkflowGenerator: + def __init__(self, planner: AIPlanner): + self.planner = planner + self.tool_integrations = ToolRegistry() + + async def generate_workflow(self, project_config: dict) -> GeneratedWorkflow: + """Generate complete workflow configuration""" + + # AI-powered planning phase + plan = await self.planner.plan_workflow(project_config) + + # Tool-specific configuration generation + workflow_configs = {} + for tool in plan.required_tools: + generator = self.tool_integrations.get_generator(tool) + workflow_configs[tool] = await generator.generate(plan) + + return GeneratedWorkflow( + plan=plan, + configurations=workflow_configs, + dependencies=plan.dependencies + ) +``` + +### 3. Dependency Resolver + +```python +# resolvers/dependency_resolver.py +class DependencyResolver: + def __init__(self): + self.dependency_graph = DependencyGraph() + + def resolve_dependencies(self, workflow_plan: WorkflowPlan) -> ExecutionOrder: + """Resolve and optimize execution order""" + graph = self._build_dependency_graph(workflow_plan) + execution_order = self._topological_sort(graph) + return self._optimize_parallel_execution(execution_order) + + def _build_dependency_graph(self, plan: WorkflowPlan) -> Dict[str, List[str]]: + """Build dependency graph from AI-generated plan""" + graph = {} + for step in plan.steps: + graph[step.name] = step.dependencies + return graph +``` + +## βš™οΈ Configuration + +### AI Planning Configuration + +```yaml +# config/ai_planner.yaml +ai: + model: "gpt-4" + temperature: 0.3 + max_tokens: 4000 + retry_attempts: 3 + +planning: + optimization_goals: + - "execution_time" + - "resource_usage" + - "cost_efficiency" + - "reliability" + + constraints: + max_parallel_jobs: 10 + timeout_minutes: 60 + resource_limits: + memory: "8GB" + cpu: "4 cores" + +tool_integrations: + github_actions: + enabled: true + templates_path: "./templates/github" + + jenkins: + enabled: true + templates_path: "./templates/jenkins" + + docker: + enabled: true + registry: "registry.company.com" +``` + +### Project Templates + +```yaml +# templates/ios_project.yaml +project_type: "ios" +default_tools: + - "xcode" + - "github_actions" + - "docker" + - "slack" + +stages: + analysis: + tools: ["xcode_analyze", "swiftlint"] + parallel: false + + build: + tools: ["xcode_build", "carthage", "cocoapods"] + parallel: true + + test: + tools: ["xcode_test", "fastlane_scan"] + parallel: false + + distribution: + tools: ["fastlane", "testflight", "app_center"] + +optimization_rules: + - name: "cache_dependencies" + condition: "dependencies_changed == false" + action: "skip_dependency_installation" + + - name: "parallel_tests" + condition: "test_count > 100" + action: "split_tests_parallel" +``` + +## πŸ”§ Tool Integrations + +### GitHub Actions Integration + +```python +# integrations/github_actions.py +class GitHubActionsIntegration: + async def generate_workflow(self, plan: WorkflowPlan) -> str: + """Generate GitHub Actions workflow from AI plan""" + + workflow = { + "name": f"{plan.project_name} - AI Generated", + "on": self._get_trigger_events(plan), + "jobs": await self._generate_jobs(plan) + } + + return yaml.dump(workflow) + + async def _generate_jobs(self, plan: WorkflowPlan) -> Dict: + jobs = {} + for step in plan.execution_order: + jobs[step.name] = { + "runs-on": self._select_runner(step), + "steps": await self._generate_steps(step), + "needs": step.dependencies + } + return jobs +``` + +### Xcode Build Integration + 
+```python +# integrations/xcode_build.py +class XcodeBuildIntegration: + async def generate_build_scripts(self, plan: WorkflowPlan) -> List[str]: + """Generate optimized Xcode build scripts""" + + scripts = [] + for build_step in plan.get_steps_by_type("xcode_build"): + script = f""" + # AI-Generated Build Script + set -eo pipefail + + # Dependency checks + {self._generate_dependency_checks(build_step)} + + # Build configuration + {self._generate_build_commands(build_step)} + + # Post-build validation + {self._generate_validation_commands(build_step)} + """ + scripts.append(script) + + return scripts +``` + +## 🎯 Usage Examples + +### iOS Project Workflow Planning + +```python +# examples/ios_workflow.py +async def plan_ios_workflow(): + """Example of AI planning for iOS project""" + + requirements = ProjectRequirements( + project_type="ios", + tools=["xcode", "github_actions", "fastlane", "docker"], + team_size=5, + complexity="medium", + constraints={ + "build_time": "under_15_minutes", + "test_coverage": "minimum_80_percent", + "security_scanning": "required" + } + ) + + planner = AIPlanner() + workflow = await planner.plan_workflow(requirements) + + # Generate configurations + generator = WorkflowGenerator(planner) + full_workflow = await generator.generate_workflow(workflow) + + # Save generated workflows + await full_workflow.save("generated_workflows/") + + return full_workflow +``` + +### Multi-Tool Integration + +```python +# examples/multi_tool_integration.py +async def create_cross_platform_workflow(): + """Workflow spanning multiple tools and platforms""" + + requirements = ProjectRequirements( + project_type="cross_platform", + tools=["github_actions", "jenkins", "docker", "slack"], + integration_points={ + "github_actions": "ci_trigger", + "jenkins": "deployment", + "docker": "containerization", + "slack": "notifications" + } + ) + + planner = AIPlanner() + plan = await planner.plan_workflow(requirements) + + # Generate tool-specific configurations + workflows = {} + for tool in requirements.tools: + integration = ToolIntegrationFactory.create(tool) + workflows[tool] = await integration.generate_config(plan) + + return workflows +``` + +## πŸ”„ AI Feedback Loop + +### Learning from Execution + +```python +# learning/execution_analyzer.py +class ExecutionAnalyzer: + def __init__(self, planner: AIPlanner): + self.planner = planner + self.performance_metrics = PerformanceMetrics() + + async def analyze_execution(self, workflow_execution: WorkflowExecution): + """Analyze workflow execution and provide feedback to AI""" + + metrics = await self._collect_metrics(workflow_execution) + improvements = await self._identify_improvements(metrics) + + # Update AI planner with learnings + await self.planner.incorporate_feedback( + workflow_execution.plan, + metrics, + improvements + ) + + async def _identify_improvements(self, metrics: ExecutionMetrics) -> List[Improvement]: + """Use AI to identify workflow improvements""" + + prompt = f""" + Analyze these workflow execution metrics: + {metrics.to_json()} + + Identify 3-5 specific improvements to: + 1. Reduce execution time + 2. Improve reliability + 3. Optimize resource usage + + Provide concrete suggestions. 
+ """ + + response = await self.planner._call_ai(prompt) + return self._parse_improvements(response) +``` + +## πŸ“Š Monitoring & Analytics + +### Workflow Analytics + +```python +# analytics/workflow_analytics.py +class WorkflowAnalytics: + def __init__(self): + self.metrics_store = MetricsStore() + + async def track_workflow_performance(self, workflow_id: str): + """Track and analyze workflow performance""" + + metrics = await self.metrics_store.get_workflow_metrics(workflow_id) + + analysis = { + "execution_time": self._analyze_execution_time(metrics), + "resource_usage": self._analyze_resource_usage(metrics), + "reliability": self._analyze_reliability(metrics), + "bottlenecks": await self._identify_bottlenecks(metrics) + } + + return analysis + + async def generate_optimization_recommendations(self, analysis: dict): + """Generate AI-powered optimization recommendations""" + + prompt = f""" + Based on this workflow analysis: + {analysis} + + Generate specific, actionable recommendations to optimize this workflow. + Focus on: + - Parallelization opportunities + - Resource allocation + - Dependency optimization + - Cache utilization + """ + + return await self._call_ai(prompt) +``` + +## πŸš€ Advanced Features + +### Dynamic Workflow Adaptation + +```python +# features/dynamic_adaptation.py +class DynamicWorkflowAdapter: + async def adapt_workflow(self, original_plan: WorkflowPlan, + changing_conditions: dict) -> WorkflowPlan: + """Dynamically adapt workflow based on changing conditions""" + + prompt = f""" + Original workflow plan: + {original_plan.to_json()} + + Changing conditions: + {changing_conditions} + + Adapt the workflow to handle these changes while maintaining: + - Functionality + - Performance + - Reliability + + Provide the adapted workflow plan. + """ + + adapted_plan = await self.planner._call_ai(prompt) + return WorkflowPlan.from_json(adapted_plan) +``` + +### Multi-Agent Planning + +```python +# features/multi_agent_planner.py +class MultiAgentPlanner: + def __init__(self): + self.specialized_agents = { + "architecture": ArchitectureAgent(), + "security": SecurityAgent(), + "performance": PerformanceAgent(), + "cost": CostOptimizationAgent() + } + + async def collaborative_planning(self, requirements: ProjectRequirements): + """Use multiple specialized AI agents for planning""" + + # Parallel planning by specialists + agent_tasks = [] + for agent_name, agent in self.specialized_agents.items(): + task = agent.generate_recommendations(requirements) + agent_tasks.append(task) + + recommendations = await asyncio.gather(*agent_tasks) + + # Consolidate recommendations + consolidated_plan = await self._consolidate_recommendations( + requirements, recommendations + ) + + return consolidated_plan +``` + +## πŸ”§ Deployment & Operations + +### Docker Deployment + +```dockerfile +# Dockerfile +FROM python:3.9-slim + +WORKDIR /app + +# Install dependencies +COPY requirements.txt . +RUN pip install -r requirements.txt + +# Copy application +COPY . . 
+ +# Create volume for workflow storage +VOLUME /app/generated_workflows + +# Expose API port +EXPOSE 8000 + +# Health check +HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \ + CMD curl -f http://localhost:8000/health || exit 1 + +CMD ["python", "-m", "planner.api", "--host", "0.0.0.0", "--port", "8000"] +``` + +### Kubernetes Deployment + +```yaml +# k8s/deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: ai-workflow-planner +spec: + replicas: 3 + selector: + matchLabels: + app: workflow-planner + template: + metadata: + labels: + app: workflow-planner + spec: + containers: + - name: planner + image: your-org/ai-workflow-planner:latest + ports: + - containerPort: 8000 + env: + - name: OPENAI_API_KEY + valueFrom: + secretKeyRef: + name: api-secrets + key: openai-api-key + resources: + requests: + memory: "1Gi" + cpu: "500m" + limits: + memory: "2Gi" + cpu: "1000m" +``` + +## πŸ“ˆ Performance Optimization + +### Caching and Optimization + +```python +# optimization/planning_cache.py +class PlanningCache: + def __init__(self): + self.cache = {} + self.similarity_engine = SimilarityEngine() + + async def get_cached_plan(self, requirements: ProjectRequirements) -> Optional[WorkflowPlan]: + """Get cached plan for similar requirements""" + + similar_key = await self.similarity_engine.find_similar(requirements) + if similar_key in self.cache: + cached_plan = self.cache[similar_key] + if await self._is_plan_valid(cached_plan, requirements): + return cached_plan + + return None + + async def cache_plan(self, requirements: ProjectRequirements, plan: WorkflowPlan): + """Cache generated plan for future use""" + + cache_key = self._generate_cache_key(requirements) + self.cache[cache_key] = plan + + # Also cache similar variations + await self._cache_similar_variations(requirements, plan) +``` + +--- + +
+ +## 🧠 Start Planning with AI + +[**Quick Start Guide**](docs/quickstart.md) β€’ +[**API Documentation**](docs/api.md) β€’ +[**Examples Gallery**](docs/examples.md) + +**Intelligent Workflow Planning | Multi-Tool Integration | Continuous Optimization** + +*Transform your development workflows with AI-powered planning* + +
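## Appendix: Wiring the Pieces Together

A minimal sketch of how the components described above (`AIPlanner`, `WorkflowGenerator`, `ProjectRequirements`) might be wired together, mirroring the iOS example. The import paths are assumptions, not part of the documented package layout, and the call pattern follows the class definitions shown earlier.

```python
# examples/minimal_plan.py (hypothetical)
import asyncio

# Assumed module paths; adjust to wherever the classes above actually live.
from planner.planners.ai_planner import AIPlanner
from planner.generators.workflow_generator import WorkflowGenerator
from planner.models import ProjectRequirements


async def main() -> None:
    requirements = ProjectRequirements(
        project_type="ios",
        tools=["xcode", "github_actions", "fastlane"],
        team_size=3,
        complexity="small",
    )

    planner = AIPlanner(model="gpt-4", temperature=0.3)
    generator = WorkflowGenerator(planner)

    # Plan and generate tool-specific configurations, as in examples/ios_workflow.py.
    workflow = await generator.generate_workflow(requirements)
    await workflow.save("generated_workflows/")


if __name__ == "__main__":
    asyncio.run(main())
```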
diff --git a/storage.cloud b/storage.cloud new file mode 100644 index 0000000..9b214b1 --- /dev/null +++ b/storage.cloud @@ -0,0 +1,247 @@ +````markdown name=README.md +# storage.cloud β€” Docs & Quickstart + +storage.cloud is a quick-reference resource and starter content for working with Google Cloud Storage (GCS). It includes concise how-tos, examples, and small reusable scripts for common tasks such as authenticated access, signed URLs, CORS configuration, and merging data from many objects. + +Primary goals +- Provide clear, copy-pastable commands for common workflows. +- Give simple code examples (Python) for object operations and CSV merging. +- Offer troubleshooting tips and security reminders. + +Contents +- index.html β€” site landing page (simple overview). +- docs/quickstart.md β€” authentication, common gsutil/gcloud/bq commands, signed URLs. +- docs/merge-data.md β€” guide in Malay for combining files stored in GCS. +- examples/merge_csv_gcs.py β€” Python example to merge CSVs in a GCS prefix while handling headers. +- cors.json β€” example CORS configuration. + +Contributing +- Send issues or PRs with improvements. +- Keep examples minimal and focused. Prefer service accounts and least privilege for production. + +License +- Add your preferred license file (e.g., MIT) to the repo root. +```` + +```html name=index.html + + + + + storage.cloud β€” Google Cloud Storage quickstart + + + + +
<header>
  <h1>storage.cloud</h1>
  <p>Quick reference and starter content for Google Cloud Storage (GCS)</p>
</header>

<main>
  <section>
    <h2>Overview</h2>
    <p>This site contains short guides and example code for:</p>
    <ul>
      <li>Authenticated access (gcloud, service accounts, access tokens)</li>
      <li>Signed URLs for temporary, non-account-based access</li>
      <li>CORS configuration for browser clients</li>
      <li>Merging files stored in GCS (CSV, text, etc.)</li>
      <li>Loading directly into BigQuery for analytics</li>
    </ul>
  </section>

  <section>
    <h2>Important links</h2>
    <ul>
      <li><a href="https://cloud.google.com/storage/docs">GCS documentation</a></li>
      <li><a href="https://cloud.google.com/storage/docs/access-control/signed-urls">Signed URLs</a></li>
      <li><a href="https://cloud.google.com/storage/docs/gsutil">gsutil reference</a></li>
    </ul>
  </section>

  <section>
    <h2>Get started</h2>
    <p>Open docs/quickstart.md for a minimal set of commands and examples you can run locally.</p>
  </section>
</main>
+ + +``` + +````markdown name=docs/quickstart.md +# Quickstart β€” Google Cloud Storage (GCS) + +This page collects the most-used commands and short examples for getting started with GCS. + +Authentication +- Local user ADC (Application Default Credentials) β€” for development: + gcloud auth application-default login + +- Service account (recommended for servers/apps): + 1. Create service account in Cloud Console or: + gcloud iam service-accounts create my-sa --display-name="My SA" + 2. Grant needed roles (example: objectViewer): + gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:my-sa@PROJECT_ID.iam.gserviceaccount.com" --role="roles/storage.objectViewer" + 3. Download key (if needed): + gcloud iam service-accounts keys create key.json --iam-account=my-sa@PROJECT_ID.iam.gserviceaccount.com + +Get an access token (for Authorization header): + gcloud auth print-access-token + +Common gsutil commands +- List buckets: + gsutil ls gs:// + +- List objects in a prefix: + gsutil ls gs://BUCKET/PREFIX/ + +- Download an object: + gsutil cp gs://BUCKET/OBJECT ./local-file + +- Upload a file: + gsutil cp ./local-file gs://BUCKET/OBJECT + +- Make an object publicly readable (not recommended for sensitive data): + gsutil acl ch -u AllUsers:R gs://BUCKET/OBJECT + +Signed URLs +- Create a signed URL for temporary access (using gsutil with a service account key): + gsutil signurl -d 1h /path/to/key.json gs://BUCKET/OBJECT +- Signed URLs are valid up to 7 days when using V4 signing. + +CORS example +- Use cors.json (see repo) and apply: + gsutil cors set cors.json gs://BUCKET + +Uploading many files / analytics +- Load CSVs directly to BigQuery (no need to pre-concatenate): + bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv + +Notes & troubleshooting +- Permission denied: ensure the account has roles/storage.objectViewer (or higher) for read, or objectAdmin for writes. +- Invalid credentials: re-run `gcloud auth application-default login` or refresh service account tokens. +- CORS errors: configure the bucket CORS and allow the necessary origins and response headers. +```` + +````markdown name=docs/merge-data.md +# Gabungkan Semua Data (Combine all data) β€” Panduan ringkas + +Dokumen ini ditulis dalam Bahasa Melayu untuk arahan cepat bagaimana menggabungkan (merge) fail yang disimpan di Google Cloud Storage. + +Soalan awal (jawab jika perlu): +1. Fail di mana? (satu bucket / beberapa bucket) +2. Format fail? (CSV/JSON/Parquet) +3. Saiz anggaran? (MB/GB/TB) +4. Output dikehendaki? (satu fail di GCS, jadual BigQuery) + +Pilihan pantas + +Pilihan A β€” Gabungkan cepat untuk fail CSV kecil/sederhana (one‑shot) +- Menggunakan gsutil cat (berguna untuk fail kecil, ingat had memori): + gsutil cat gs://BUCKET/PATH/*.csv | gsutil cp - gs://BUCKET/PATH/combined.csv + +- Nota: Jika setiap CSV mempunyai header, gunakan skrip untuk membuang header bahagian kedua dan seterusnya (contoh di bawah). + +Pilihan B β€” gsutil compose (gabungkan objek tanpa muat turun) +- gsutil compose gs://BUCKET/part1.csv gs://BUCKET/part2.csv gs://BUCKET/combined.csv +- Had: 32 objek setiap compose step. Untuk >32, jalankan compose berperingkat (tree compose). 
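If gsutil is not available, the same tree-compose pattern can be sketched with the Python client (google-cloud-storage). This sketch assumes the parts have no header rows (or that headers were stripped beforehand), leaves the temporary composites under tmp/compose/ for you to delete, and uses placeholder names:

```python
from google.cloud import storage


def compose_prefix(bucket_name: str, prefix: str, destination: str, batch_size: int = 32) -> None:
    """Compose every .csv under a prefix into one object, at most 32 sources per step."""
    client = storage.Client()
    bucket = client.bucket(bucket_name)
    sources = [b for b in client.list_blobs(bucket_name, prefix=prefix) if b.name.endswith(".csv")]
    if not sources:
        return

    round_no = 0
    while len(sources) > 1:
        next_round = []
        for i in range(0, len(sources), batch_size):
            part = bucket.blob(f"tmp/compose/round-{round_no}/part-{i // batch_size}.csv")
            part.compose(sources[i:i + batch_size])  # server-side; nothing is downloaded
            next_round.append(part)
        sources = next_round
        round_no += 1

    # Copy the last composite to its final name (clean up tmp/compose/ afterwards).
    bucket.copy_blob(sources[0], bucket, destination)


# Placeholder names:
# compose_prefix("my-bucket", "data/parts/", "data/combined.csv")
```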
+ +Pilihan C β€” Muat naik terus ke BigQuery (disarankan untuk analitik besar) +- BigQuery boleh menerima wildcard CSVs: + bq load --autodetect --source_format=CSV dataset.table gs://BUCKET/PATH/*.csv + +Pilihan D β€” Pipeline (untuk dataset besar/penukaran) +- Gunakan Dataflow (Apache Beam) atau Dataproc (Spark) untuk transformasi dan penulisan semula ke GCS / BigQuery. + +Contoh skrip Python β€” gabung CSV dan buang header berganda +- Fail contoh: `examples/merge_csv_gcs.py` (berguna jika anda mahu kawalan penuh sebelum muat naik semula). + +Perkara penting +- Pastikan service account/akaun anda mempunyai permission yang sesuai (roles/storage.objectViewer / storage.objectAdmin). +- Untuk perkongsian hasil: pertimbangkan signed URLs (maks 7 hari) atau tetapkan access controls yang sesuai. +- Untuk fail besar, elakkan memuatkan semuanya ke RAM β€” gunakan streaming atau gunakan Dataflow/Dataproc. + +Jika anda beritahu saya: +- lokasi bucket (contoh: gs://my-bucket/data/), format fail, dan ukuran anggaran, saya boleh hasilkan skrip yang diubah suai untuk anda. +```` + +```python name=examples/merge_csv_gcs.py +#!/usr/bin/env python3 +""" +Merge CSV files in a GCS prefix into one CSV while keeping only the first header. +Requirements: + pip install google-cloud-storage +Usage: + export GOOGLE_APPLICATION_CREDENTIALS="/path/to/key.json" + python3 examples/merge_csv_gcs.py my-bucket data/prefix/ output/combined.csv +""" +import sys +import csv +from io import StringIO +from google.cloud import storage + +def merge_csvs(bucket_name, prefix, output_blob_name): + client = storage.Client() + bucket = client.bucket(bucket_name) + blobs = sorted([b for b in bucket.list_blobs(prefix=prefix) if b.name.endswith('.csv')]) + + out_buf = StringIO() + writer = None + first = True + + for blob in blobs: + print("Reading:", blob.name) + data = blob.download_as_text() + reader = csv.reader(StringIO(data)) + header = next(reader, None) + if first: + writer = csv.writer(out_buf) + if header: + writer.writerow(header) + for row in reader: + writer.writerow(row) + first = False + else: + # skip header for subsequent files + for row in reader: + writer.writerow(row) + + out_blob = bucket.blob(output_blob_name) + out_blob.upload_from_string(out_buf.getvalue(), content_type='text/csv') + print(f'Uploaded gs://{bucket_name}/{output_blob_name}') + +if __name__ == "__main__": + if len(sys.argv) != 4: + print("Usage: merge_csv_gcs.py BUCKET PREFIX OUTPUT_BLOB") + print("Example: merge_csv_gcs.py my-bucket data/ output/combined.csv") + sys.exit(1) + merge_csvs(sys.argv[1], sys.argv[2], sys.argv[3]) +``` + +```json name=cors.json +[ + { + "origin": ["https://example.com"], + "method": ["GET", "HEAD", "PUT", "POST"], + "responseHeader": ["Content-Type", "x-goog-meta-custom"], + "maxAgeSeconds": 3600 + } +] +``` + +If you want, I can: +- Generate additional localized guides (complete Malay translation). +- Produce a shell script for large-scale compose (handles >32 parts). +- Create a Dataflow (Beam) starter pipeline to merge/transform files at scale. + +Which follow-up would you like?