76 changes: 76 additions & 0 deletions Dockerfile.multiplatform
@@ -0,0 +1,76 @@
FROM alpine:3.19

# Install runtime dependencies
RUN apk add --no-cache \
    curl \
    git \
    nodejs \
    npm \
    bash \
    ca-certificates \
    libc6-compat

# Install Go 1.22.2 (required by go.mod)
ARG TARGETARCH
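# TARGETARCH is set automatically by BuildKit/buildx for the platform being built (e.g. amd64, arm64)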
RUN if [ "$TARGETARCH" = "arm64" ]; then \
GO_ARCH="arm64"; \
else \
GO_ARCH="amd64"; \
fi && \
curl -L "https://go.dev/dl/go1.22.2.linux-${GO_ARCH}.tar.gz" -o go.tar.gz && \
tar -C /usr/local -xzf go.tar.gz && \
rm go.tar.gz

# Set Go environment variables
ENV GOPATH="/go"
ENV PATH="/usr/local/go/bin:${PATH}"
ENV PATH="${GOPATH}/bin:${PATH}"

# Download Hugo binary directly (much more space efficient than compiling)
ARG TARGETARCH
RUN if [ "$TARGETARCH" = "arm64" ]; then \
HUGO_ARCH="arm64"; \
else \
HUGO_ARCH="amd64"; \
fi && \
echo "Downloading Hugo for architecture: ${HUGO_ARCH}" && \
curl -L "https://github.com/gohugoio/hugo/releases/download/v0.123.7/hugo_extended_0.123.7_linux-${HUGO_ARCH}.tar.gz" -o hugo.tar.gz && \
echo "Extracting Hugo..." && \
tar -xzf hugo.tar.gz && \
echo "Contents after extraction:" && \
ls -la && \
echo "Hugo binary details:" && \
ls -la hugo && \
echo "Moving Hugo binary..." && \
cp hugo /usr/local/bin/hugo && \
chmod +x /usr/local/bin/hugo && \
echo "Hugo binary location and permissions:" && \
ls -la /usr/local/bin/hugo && \
echo "Testing Hugo binary:" && \
ldd /usr/local/bin/hugo && \
/usr/local/bin/hugo version && \
rm hugo.tar.gz hugo

# Install global dependencies
RUN npm install -g postcss postcss-cli autoprefixer

# Copy entrypoint script
COPY scripts/entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/entrypoint.sh

# Create working directory
WORKDIR /src

# Configure Git to trust the working directory
RUN git config --global --add safe.directory /src

# Verify installations
RUN node --version && \
    npm --version && \
    npx --version && \
    hugo version && \
    go version

EXPOSE 1313

ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
30 changes: 26 additions & 4 deletions Makefile
@@ -1,20 +1,31 @@
# Hugo configuration
OUTPUT_DIR := output
DOCKER_IMAGE := hvishwanath/hugo:v0.123.7-ext
HUGO_BASE_IMAGE := hvishwanath/hugo:v0.123.7-ext-multiplatform
DOCKER_IMAGE := $(HUGO_BASE_IMAGE)
#PROD_IMAGE := hvishwanath/kafka-site-md:1.2.0
PROD_IMAGE := us-west1-docker.pkg.dev/play-394201/kafka-site-md/kafka-site-md:1.6.0

.PHONY: build serve clean docker-image prod-image prod-run buildx-setup
.PHONY: build serve clean docker-image hugo-base-multi-platform prod-image prod-run buildx-setup ghcr-prod-image

# Setup buildx for multi-arch builds
buildx-setup:
	docker buildx create --name multiarch --driver docker-container --use || true
	docker buildx inspect multiarch --bootstrap

# Build the Docker image
# Build the Docker image (single platform)
docker-image:
	docker build -t $(DOCKER_IMAGE) . --push

# Build and push multi-platform Hugo base image
hugo-base-multi-platform: buildx-setup
	docker buildx build \
		--platform linux/amd64,linux/arm64 \
		--tag $(HUGO_BASE_IMAGE) \
		--file Dockerfile.multiplatform \
		--build-arg BUILDKIT_INLINE_CACHE=1 \
		--push \
		.

# Build the static site using Docker
build:
	docker pull $(DOCKER_IMAGE)
@@ -48,8 +59,19 @@ prod-run: prod-image
	docker pull $(PROD_IMAGE)
	docker run --rm -p 8080:80 $(PROD_IMAGE)

# Build and push production image to GHCR
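# Tags combine the current branch name, short commit SHA, and a build timestamp; the repository directory name is used as the GHCR namespace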
ghcr-prod-image: build buildx-setup
	docker buildx build \
		--platform linux/amd64,linux/arm64 \
		--tag ghcr.io/$(shell basename $(shell git rev-parse --show-toplevel))/kafka-site-md:prod-$(shell git rev-parse --abbrev-ref HEAD) \
		--tag ghcr.io/$(shell basename $(shell git rev-parse --show-toplevel))/kafka-site-md:prod-$(shell git rev-parse --short HEAD) \
		--tag ghcr.io/$(shell basename $(shell git rev-parse --show-toplevel))/kafka-site-md:prod-$(shell date +%Y%m%d-%H%M%S) \
		--file Dockerfile.prod \
		--push \
		.

# Clean the output directory and remove Docker images
clean:
	rm -rf $(OUTPUT_DIR)
	docker rmi $(DOCKER_IMAGE) $(PROD_IMAGE)
	docker rmi $(DOCKER_IMAGE) $(HUGO_BASE_IMAGE) $(PROD_IMAGE)
	docker buildx rm multiarch || true
2 changes: 1 addition & 1 deletion README.md
@@ -167,4 +167,4 @@ make clean
4. Test locally using `make serve`
5. Submit a pull request

For more details about the migration to Markdown and the overall architecture, see [KIP-1133](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1133%3A+AK+Documentation+and+Website+in+Markdown).
For more details about the migration to Markdown and the overall architecture, see [KIP-1133](https://cwiki.apache.org/confluence/display/KAFKA/KIP-1133%3A+AK+Documentation+and+Website+in+Markdown).
2 changes: 1 addition & 1 deletion content/en/0110/streams/core-concepts.md
@@ -80,7 +80,7 @@ Kafka Streams allows direct read-only queries of the state stores by methods, th

# Processing Guarantees

In stream processing, one of the most frequently asked question is "does my stream processing system guarantee that each record is processed once and only once, even if some failures are encountered in the middle of processing?" Failing to guarantee exactly-once stream processing is a deal-breaker for many applications that cannot tolerate any data-loss or data duplicates, and in that case a batch-oriented framework is usually used in addition to the stream processing pipeline, known as the [Lambda Architecture](http://lambda-architecture.net/). Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics. In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that no duplicates will be generated throughout the pipeline. Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a [transactional and idempotent manner](/#semantics), and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features. More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations. Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects. To read more details on how this is done inside Kafka Streams, readers are recommended to read [KIP-129](https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics). In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the `processing.guarantee` config value to **exactly_once** (default value is **at_least_once**). More details can be found in the [**Kafka Streams Configs**](/0110/documentation#streamsconfigs) section.
In stream processing, one of the most frequently asked questions is "does my stream processing system guarantee that each record is processed once and only once, even if some failures are encountered in the middle of processing?" Failing to guarantee exactly-once stream processing is a deal-breaker for many applications that cannot tolerate any data loss or data duplicates, and in that case a batch-oriented framework is usually used in addition to the stream processing pipeline, known as the [Lambda Architecture](https://en.wikipedia.org/wiki/Lambda_architecture). Prior to 0.11.0.0, Kafka only provided at-least-once delivery guarantees, and hence any stream processing system that leveraged it as the backend storage could not guarantee end-to-end exactly-once semantics. In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that no duplicates will be generated throughout the pipeline. Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a [transactional and idempotent manner](/#semantics), and Kafka Streams has hence added end-to-end exactly-once processing semantics by leveraging these features. More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations. Note that the key difference between Kafka Streams' end-to-end exactly-once guarantee and other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensures that commits on the input topic offsets, updates on the state stores, and writes to the output topics are completed atomically, instead of treating Kafka as an external system that may have side-effects. To read more details on how this is done inside Kafka Streams, readers are recommended to read [KIP-129](https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics). In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the `processing.guarantee` config value to **exactly_once** (default value is **at_least_once**). More details can be found in the [**Kafka Streams Configs**](/0110/documentation#streamsconfigs) section.
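
As a minimal sketch (not taken from the Kafka documentation), the configuration change described above amounts to setting one `StreamsConfig` property in a Java Kafka Streams application; the application id and bootstrap server below are placeholder values:

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

public class ExactlyOnceConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholder application id and broker address; adjust for your deployment.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Switch from the default at-least-once guarantee to exactly-once;
        // equivalent to processing.guarantee=exactly_once in a properties file.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);

        // Pass `props` (together with a topology) to the KafkaStreams constructor as usual.
    }
}
```

No topology changes are needed to switch between the two guarantees; only the configuration differs.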

[Previous](/0110/streams/developer-guide) [Next](/0110/streams/architecture)

2 changes: 1 addition & 1 deletion content/en/10/streams/core-concepts.md
@@ -82,7 +82,7 @@ Kafka Streams allows direct read-only queries of the state stores by methods, th

# Processing Guarantees

In stream processing, one of the most frequently asked question is "does my stream processing system guarantee that each record is processed once and only once, even if some failures are encountered in the middle of processing?" Failing to guarantee exactly-once stream processing is a deal-breaker for many applications that cannot tolerate any data-loss or data duplicates, and in that case a batch-oriented framework is usually used in addition to the stream processing pipeline, known as the [Lambda Architecture](http://lambda-architecture.net/). Prior to 0.11.0.0, Kafka only provides at-least-once delivery guarantees and hence any stream processing systems that leverage it as the backend storage could not guarantee end-to-end exactly-once semantics. In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that no duplicates will be generated throughout the pipeline. Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a [transactional and idempotent manner](/#semantics), and Kafka Streams has hence added the end-to-end exactly-once processing semantics by leveraging these features. More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations. Note the key difference between Kafka Streams end-to-end exactly-once guarantee with other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensure that commits on the input topic offsets, updates on the state stores, and writes to the output topics will be completed atomically instead of treating Kafka as an external system that may have side-effects. To read more details on how this is done inside Kafka Streams, readers are recommended to read [KIP-129](https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics). In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the `processing.guarantee` config value to **exactly_once** (default value is **at_least_once**). More details can be found in the [**Kafka Streams Configs**](/10/documentation#streamsconfigs) section.
In stream processing, one of the most frequently asked questions is "does my stream processing system guarantee that each record is processed once and only once, even if some failures are encountered in the middle of processing?" Failing to guarantee exactly-once stream processing is a deal-breaker for many applications that cannot tolerate any data loss or data duplicates, and in that case a batch-oriented framework is usually used in addition to the stream processing pipeline, known as the [Lambda Architecture](https://en.wikipedia.org/wiki/Lambda_architecture). Prior to 0.11.0.0, Kafka only provided at-least-once delivery guarantees, and hence any stream processing system that leveraged it as the backend storage could not guarantee end-to-end exactly-once semantics. In fact, even for those stream processing systems that claim to support exactly-once processing, as long as they are reading from / writing to Kafka as the source / sink, their applications cannot actually guarantee that no duplicates will be generated throughout the pipeline. Since the 0.11.0.0 release, Kafka has added support to allow its producers to send messages to different topic partitions in a [transactional and idempotent manner](/#semantics), and Kafka Streams has hence added end-to-end exactly-once processing semantics by leveraging these features. More specifically, it guarantees that for any record read from the source Kafka topics, its processing results will be reflected exactly once in the output Kafka topic as well as in the state stores for stateful operations. Note that the key difference between Kafka Streams' end-to-end exactly-once guarantee and other stream processing frameworks' claimed guarantees is that Kafka Streams tightly integrates with the underlying Kafka storage system and ensures that commits on the input topic offsets, updates on the state stores, and writes to the output topics are completed atomically, instead of treating Kafka as an external system that may have side-effects. To read more details on how this is done inside Kafka Streams, readers are recommended to read [KIP-129](https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics). In order to achieve exactly-once semantics when running Kafka Streams applications, users can simply set the `processing.guarantee` config value to **exactly_once** (default value is **at_least_once**). More details can be found in the [**Kafka Streams Configs**](/10/documentation#streamsconfigs) section.

[Previous](/10/streams/tutorial) [Next](/10/streams/architecture)
