feat: package caches #179

Draft: wants to merge 5 commits into base: master
13 changes: 11 additions & 2 deletions .editorconfig
@@ -1,14 +1,23 @@
root = true

[*]
indent_style = space
indent_style = tab
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf
# editorconfig-tools is unable to ignore long strings or URLs
max_line_length = null
quote_type = single

[{*.yaml, *.yml}]
[*.md]
indent_size = 2

[*.yml]
indent_size = 2
indent_style = space

[*.yaml]
indent_size = 2
indent_style = space
180 changes: 180 additions & 0 deletions .github/actions/cache-builder/.gitignore
@@ -0,0 +1,180 @@
# Based on https://raw.githubusercontent.com/github/gitignore/main/Node.gitignore

# Logs

logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
.pnpm-debug.log*

# Caches

.cache

# Diagnostic reports (https://nodejs.org/api/report.html)

report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json

# Runtime data

pids
*.pid
*.seed
*.pid.lock

# Directory for instrumented libs generated by jscoverage/JSCover

lib-cov

# Coverage directory used by tools like istanbul

coverage
*.lcov

# nyc test coverage

.nyc_output

# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)

.grunt

# Bower dependency directory (https://bower.io/)

bower_components

# node-waf configuration

.lock-wscript

# Compiled binary addons (https://nodejs.org/api/addons.html)

build/Release

# Dependency directories

node_modules/
jspm_packages/

# Snowpack dependency directory (https://snowpack.dev/)

web_modules/

# TypeScript cache

*.tsbuildinfo

# Optional npm cache directory

.npm

# Optional eslint cache

.eslintcache

# Optional stylelint cache

.stylelintcache

# Microbundle cache

.rpt2_cache/
.rts2_cache_cjs/
.rts2_cache_es/
.rts2_cache_umd/

# Optional REPL history

.node_repl_history

# Output of 'npm pack'

*.tgz

# Yarn Integrity file

.yarn-integrity

# dotenv environment variable files

.env
.env.development.local
.env.test.local
.env.production.local
.env.local

# parcel-bundler cache (https://parceljs.org/)

.parcel-cache

# Next.js build output

.next
out

# Nuxt.js build / generate output

.nuxt
dist

# Gatsby files

# Uncomment the public line if your project uses Gatsby and not Next.js

# https://nextjs.org/blog/next-9-1#public-directory-support

# public

# vuepress build output

.vuepress/dist

# vuepress v2.x temp and cache directory

.temp

# Docusaurus cache and generated files

.docusaurus

# Serverless directories

.serverless/

# FuseBox cache

.fusebox/

# DynamoDB Local files

.dynamodb/

# TernJS port file

.tern-port

# Stores VSCode versions used for testing VSCode extensions

.vscode-test

# yarn v2

.yarn/cache
.yarn/unplugged
.yarn/build-state.yml
.yarn/install-state.gz
.pnp.*

# IntelliJ based IDEs
.idea

# Finder (MacOS) folder config
.DS_Store

# caches
data/npm
data/yarn
data/yarn_tmp
72 changes: 72 additions & 0 deletions .github/actions/cache-builder/README.md
@@ -0,0 +1,72 @@
# Version Matrix Actions Script

This folder contains scripts that are used to create Actions matrices for building specific Docker images with the right version combinations of Apify SDK, Playwright/Puppeteer, and Crawlee.

These scripts are run using the [bun](https://bun.sh) runtime (for no reason other than ease of use).

## Adding a new Node version to the matrix

When a new version of Node is released, just update the `supportedNodeVersions` array in the `src/shares/constants.ts` file.

Then run `SKIP_CACHE_SET=true bun node:normal` locally to preview the new matrix. You can append `| jq -r '.include[] | "node-version=\(.["node-version"]) apify-version=\(.["apify-version"]) is-latest=\(.["is-latest"])"'` to get more readable output than the large JSON blob.

## Adding a new Python version to the matrix

When a new version of Python is released, just update the `supportedPythonVersions` array in the `src/shares/constants.ts` file.

Then run `SKIP_CACHE_SET=true bun python:normal` locally to preview the new matrix. You can append `| jq -r '.include[] | "python-version=\(.["python-version"]) playwright-version=\(.["playwright-version"]) apify-version=\(.["apify-version"]) is-latest=\(.["is-latest"])"'` to get more readable output than the large JSON blob.
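
If `jq` is not available, a similar summary can be produced in TypeScript. This is a minimal sketch, assuming the matrix JSON has the `include` shape described under "Creating new matrices". The `summarizeMatrix` helper and the sample values are hypothetical and not part of this folder.

```ts
// Hypothetical helper: turn a matrix JSON blob into one "key=value ..." line per entry.
type MatrixEntry = Record<string, string>;

interface Matrix {
    include: MatrixEntry[];
}

function summarizeMatrix(json: string): string[] {
    const matrix = JSON.parse(json) as Matrix;
    return matrix.include.map((entry) =>
        Object.entries(entry)
            .map(([key, value]) => `${key}=${value}`)
            .join(' '),
    );
}

// Sample input with made-up values, for illustration only.
const sample = JSON.stringify({
    include: [{ 'node-version': '20', 'apify-version': '3.4.2', 'is-latest': 'false' }],
});

for (const line of summarizeMatrix(sample)) {
    console.log(line);
}
```

Unlike the `jq` filter, this prints every key of each entry rather than a fixed selection.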

## Adding a new Python version range for specific Playwright version ranges

Sometimes a newer Python version is not compatible with Playwright versions released before a specific one. At the time of writing, this is the case for Python 3.13: Python 3.13.x can only run Playwright 1.48.0 and newer.

To add a new Python version range for a specific Playwright version, add a new entry to the `playwrightPythonVersionConstraints` array in the `python.ts` file.

The key is the Python version range from which the constraint takes effect. The value is the Playwright version range required for those Python versions.
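
The idea behind such a constraint can be sketched as follows. This is a simplified illustration, not the contents of `python.ts`: it assumes each entry is a pair of minimum versions rather than full version ranges, and `constraints`, `compareVersions`, and `isCompatible` are hypothetical names. The real script may rely on the `semver` package instead of a hand-written comparison.

```ts
// Hypothetical, simplified constraint table: [minimum Python version, minimum Playwright version].
// Plain minimums are used instead of version ranges to keep the sketch dependency-free.
const constraints: [string, string][] = [['3.13', '1.48.0']];

// Compare dotted numeric versions; missing parts count as 0.
function compareVersions(a: string, b: string): number {
    const pa = a.split('.').map(Number);
    const pb = b.split('.').map(Number);
    for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
        const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
        if (diff !== 0) return Math.sign(diff);
    }
    return 0;
}

// A combination is allowed when every constraint either does not apply to this
// Python version or is satisfied by this Playwright version.
function isCompatible(python: string, playwright: string): boolean {
    return constraints.every(
        ([minPython, minPlaywright]) =>
            compareVersions(python, minPython) < 0 || compareVersions(playwright, minPlaywright) >= 0,
    );
}

console.log(isCompatible('3.13.1', '1.47.0')); // false
console.log(isCompatible('3.13.1', '1.48.0')); // true
console.log(isCompatible('3.12.4', '1.47.0')); // true
```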

## Updating the runtime version that will be used for images that are referenced with just the build tag

When we build images, we also include a specific runtime version in the tag (as an example, we have `apify/actor-node:20`). We also provide images tagged with `latest` or `beta`. These images will default to the "latest" runtime version that is specified in the `src/shares/constants.ts` file under `latestPythonVersion` or `latestNodeVersion`.

When the time comes to bump these, just make a PR, edit those values, and merge it. The next time images get built, the `latest` and `beta` tags will point to the new versions.
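
As a rough illustration of how a "latest" runtime could be marked in a matrix, here is a minimal sketch. It assumes that the `is-latest` key seen in the `jq` filters flags the entry whose runtime equals `latestNodeVersion`. The version values and the `buildEntries` helper are hypothetical, so check the actual scripts before relying on this.

```ts
// Assumed values for illustration; the real ones are defined in the constants file.
const supportedNodeVersions = ['20', '22', '24'];
const latestNodeVersion = '22';

type MatrixEntry = Record<string, string>;

// Flag the entry whose runtime matches the "latest" version; matrix values are strings.
function buildEntries(versions: string[], latest: string): MatrixEntry[] {
    return versions.map((version) => ({
        'node-version': version,
        'is-latest': String(version === latest),
    }));
}

const entries = buildEntries(supportedNodeVersions, latestNodeVersion);
console.log(entries.filter((entry) => entry['is-latest'] === 'true'));
```

Under this assumption, bumping `latestNodeVersion` moves the flag to a different entry without touching the list of supported versions.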

## Creating new matrices

The structure for a GitHub Actions matrix is as follows:

```ts
interface Matrix {
    include: MatrixEntry[];
}

type MatrixEntry = Record<string, string>;
```

To integrate a new matrix into a workflow, follow these steps:

- have a step that outputs the matrix as a JSON blob

```yaml
matrix:
  outputs:
    matrix: ${{ steps.set-matrix.outputs.matrix }}

  steps:
    - name: Generate matrix
      id: set-matrix
      run: echo "matrix=$(bun python:normal)" >> $GITHUB_OUTPUT
      working-directory: ./.github/actions/version-matrix
```

(Optionally, you can also add a print step to check that the matrix is correct. Feel free to copy one from any workflow that uses the existing matrices.)

- ensure the actual build step needs the matrix and uses it like this (the `if` check is optional if the matrix will always have at least one entry):

```yaml
needs: [matrix]
if: ${{ toJson(fromJson(needs.matrix.outputs.matrix).include) != '[]' }}
strategy:
  matrix: ${{ fromJson(needs.matrix.outputs.matrix) }}
```

- reference matrix values based on the keys in the objects in the `include` array. For example, to get the Python version, you can use `${{ matrix.python-version }}`.
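
The steps above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not code from this folder: `toOutputLine` and `hasEntries` are hypothetical helpers. In the workflow itself, the shell wraps the script output as `matrix=<json>`, and the emptiness check is done by the `if` expression.

```ts
type MatrixEntry = Record<string, string>;

interface Matrix {
    include: MatrixEntry[];
}

// Format the line a workflow step would append to $GITHUB_OUTPUT.
function toOutputLine(matrix: Matrix): string {
    return `matrix=${JSON.stringify(matrix)}`;
}

// Mirror of the workflow's `if` guard: only build when there is at least one entry.
function hasEntries(outputLine: string): boolean {
    const matrix = JSON.parse(outputLine.slice('matrix='.length)) as Matrix;
    return JSON.stringify(matrix.include) !== '[]';
}

// Sample entry with made-up values, for illustration only.
const line = toOutputLine({
    include: [{ 'python-version': '3.13', 'playwright-version': '1.52.0' }],
});

console.log(line);
console.log(hasEntries(line)); // true
console.log(hasEntries(toOutputLine({ include: [] }))); // false
```

Keeping the output on a single line matters because each `name=value` pair written to `$GITHUB_OUTPUT` this way must fit on one line.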
Empty file.
37 changes: 37 additions & 0 deletions .github/actions/cache-builder/data/npm_state.json
@@ -0,0 +1,37 @@
{
    "crawlee": [
        "3.13.1",
        "3.13.2",
        "3.13.3",
        "3.13.4",
        "3.13.5"
    ],
    "apify": [
        "3.3.1",
        "3.3.2",
        "3.4.0",
        "3.4.1",
        "3.4.2"
    ],
    "playwright": [
        "1.50.0",
        "1.50.1",
        "1.51.0",
        "1.51.1",
        "1.52.0"
    ],
    "puppeteer": [
        "24.7.2",
        "24.8.0",
        "24.8.1",
        "24.8.2",
        "24.9.0"
    ],
    "typescript": [
        "5.6.3",
        "5.7.2",
        "5.7.3",
        "5.8.2",
        "5.8.3"
    ]
}
37 changes: 37 additions & 0 deletions .github/actions/cache-builder/data/yarn_state.json
@@ -0,0 +1,37 @@
{
    "crawlee": [
        "3.13.1",
        "3.13.2",
        "3.13.3",
        "3.13.4",
        "3.13.5"
    ],
    "apify": [
        "3.3.1",
        "3.3.2",
        "3.4.0",
        "3.4.1",
        "3.4.2"
    ],
    "playwright": [
        "1.50.0",
        "1.50.1",
        "1.51.0",
        "1.51.1",
        "1.52.0"
    ],
    "puppeteer": [
        "24.8.0",
        "24.8.1",
        "24.8.2",
        "24.9.0",
        "24.10.0"
    ],
    "typescript": [
        "5.6.3",
        "5.7.2",
        "5.7.3",
        "5.8.2",
        "5.8.3"
    ]
}
24 changes: 24 additions & 0 deletions .github/actions/cache-builder/package.json
@@ -0,0 +1,24 @@
{
    "name": "cache-builder",
    "type": "module",
    "private": true,
    "scripts": {
        "node:npm": "node src/caches/npm.ts",
        "node:yarn": "node src/caches/yarn.ts",
        "fmt": "biome format --write ./src",
        "typecheck": "tsc --noEmit"
    },
    "devDependencies": {
        "@types/node": "^22.15.32",
        "@types/semver": "^7.7.0",
        "typescript": "^5.8.3"
    },
    "dependencies": {
        "nano-spawn": "^1.0.2",
        "semver": "^7.7.2"
    },
    "volta": {
        "extends": "../../../package.json"
    },
    "packageManager": "[email protected]"
}