
Commit c9039d1

melissawm authored and Google-ML-Automation committed

Copybara import of the project:

-- 15b8bf0 by Melissa Weber Mendonça <[email protected]>:
   Update index with more recent README information

-- a1b4422 by Melissa Weber Mendonça <[email protected]>:
   Update with README contents from #2279

-- 67bc621 by Melissa Weber Mendonça <[email protected]>:
   Update announcements

COPYBARA_INTEGRATE_REVIEW=#2299 from melissawm:update-index 67bc621
PiperOrigin-RevId: 806337467
1 parent d972e35 commit c9039d1

File tree: 4 files changed, +79 −34 lines

docs/guides.md (1 addition, 1 deletion)

@@ -14,7 +14,7 @@
 limitations under the License.
 -->

-# How-to guides
+# How-to Guides

 ```{toctree}
 :maxdepth: 1
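
For readers unfamiliar with it, the `{toctree}` directive in the hunk above is how MyST/Sphinx pages declare their navigation tree; the hunk is truncated before the directive's entries. A minimal sketch of a complete block, with illustrative entries (the two guide pages touched by this commit, not the actual list in `docs/guides.md`):

```{toctree}
:maxdepth: 1

guides/monitor_goodput
guides/use_vertex_ai_tensorboard
```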

docs/guides/monitor_goodput.md (1 addition, 0 deletions)

@@ -14,6 +14,7 @@
 limitations under the License.
 -->

+(monitor-goodput)=
 # ML Goodput Measurement

 MaxText supports automatic measurement and upload of workload metrics such as Goodput, Badput Breakdown and Step Time Deviation using the ML Goodput Measurement library.
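
The `(monitor-goodput)=` line added in this hunk is a MyST target label placed before the heading, so other pages can cross-reference this guide by name rather than by file path. A small sketch of how another page might consume the label (the sentence is illustrative, not from this commit):

```md
See the {ref}`monitor-goodput` guide for measuring Goodput and Badput.
```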

docs/guides/use_vertex_ai_tensorboard.md (1 addition, 0 deletions)

@@ -14,6 +14,7 @@
 limitations under the License.
 -->

+(vertex-ai-tensorboard)=
 # Use Vertex AI Tensorboard

 MaxText supports automatic upload of logs collected in a directory to a Tensorboard instance in Vertex AI. For more information on how MaxText supports this feature, visit [cloud-accelerator-diagnostics](https://pypi.org/project/cloud-accelerator-diagnostics) PyPI package documentation.
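
The `(vertex-ai-tensorboard)=` label works the same way; `{ref}` also accepts explicit link text, as in this illustrative sketch:

```md
MaxText can upload logs automatically; see {ref}`Use Vertex AI Tensorboard <vertex-ai-tensorboard>`.
```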

docs/index.md (76 additions, 33 deletions)

@@ -1,53 +1,96 @@
 <!--
-Copyright 2024 Google LLC
+# Copyright 2023–2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+-->

-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
+# MaxText

-https://www.apache.org/licenses/LICENSE-2.0
+MaxText is a high-performance, highly scalable, open-source LLM library and reference implementation written in pure Python/[JAX](https://docs.jax.dev/en/latest/jax-101.html), targeting Google Cloud TPUs and GPUs for training.

-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
--->
+MaxText provides a library of high-performance models to choose from, including Gemma, Llama, DeepSeek, Qwen, and Mistral. For each of these models, MaxText supports pre-training (up to tens of thousands of chips) and scalable post-training, with popular techniques like Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO, a type of Reinforcement Learning).

-```{include} ../README.md
+MaxText achieves high Model FLOPs Utilization (MFU) and tokens/second from a single host to very large clusters while staying simple and largely "optimization-free", thanks to the power of JAX and the XLA compiler.

-```
+MaxText is the launching point for ambitious LLM projects in both research and production. We encourage you to start by experimenting with MaxText out of the box, then fork and modify MaxText to meet your needs.
+
+Check out our [Read The Docs site](https://maxtext.readthedocs.io/en/latest/) or directly [Get Started](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/tutorials/first_run.md) with your first MaxText run. If you're interested in diffusion models (Wan 2.1, Flux, etc.), see the [MaxDiffusion](https://github.com/AI-Hypercomputer/maxdiffusion) repository in our AI Hypercomputer GitHub organization.
+
+## 🔥 Latest news 🔥
+
+* [September 5, 2025] MaxText has moved to an `src` layout as part of [RESTRUCTURE.md](https://github.com/AI-Hypercomputer/maxtext/blob/main/RESTRUCTURE.md).
+* [August 13, 2025] The Qwen3 2507 MoE family of models is now supported: the 235B Thinking and 480B Coder MoEs, alongside the existing dense models (0.6B, 4B, 8B, 14B, and 32B).
+* [July 27, 2025] Updated the TFLOPS/s calculation ([PR](https://github.com/AI-Hypercomputer/maxtext/pull/1988)) to account for causal attention, dividing the attention FLOPs in half. Accounted for the reduced attention FLOPs of sliding-window and chunked attention in [PR](https://github.com/AI-Hypercomputer/maxtext/pull/2009) and [PR](https://github.com/AI-Hypercomputer/maxtext/pull/2030). These changes impact large-sequence configs, as explained in this [doc](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/guides/performance_metrics.md).
+* [July 16, 2025] We will be restructuring the MaxText repository for improved organization and clarity. Please review the [proposed structure](https://github.com/AI-Hypercomputer/maxtext/blob/main/RESTRUCTURE.md) and provide feedback.
+* [July 11, 2025] Multi-Token Prediction (MTP) training support! Adds an auxiliary loss based on predicting multiple future tokens, inspired by the [DeepSeek-V3 paper](https://arxiv.org/html/2412.19437v1), to enhance training efficiency.
+* [June 25, 2025] The DeepSeek R1-0528 variant is now supported.
+* [April 24, 2025] Llama 4 Maverick models are now supported.
+
+## Use cases
+
+MaxText provides a library of models and demonstrates how to perform pre-training or post-training with high performance and scale. MaxText leverages [JAX AI libraries](https://docs.jaxstack.ai/en/latest/getting_started.html) and presents a cohesive and comprehensive demonstration of training at scale by using [Flax](https://flax.readthedocs.io/en/latest/) (neural networks), [Tunix](https://github.com/google/tunix) (post-training), [Orbax](https://orbax.readthedocs.io/en/latest/) (checkpointing), [Optax](https://optax.readthedocs.io/en/latest/) (optimization), and [Grain](https://google-grain.readthedocs.io/en/latest/) (dataloading). In addition to pure text-based LLMs, we also support multi-modal training with Gemma 3 and Llama 4 VLMs.
+
+### Pre-training
+
+If you're building models from scratch, MaxText can serve as a reference implementation for experimentation, ideation, and inspiration: just fork and modify MaxText to train your model, whether it's a small dense model like Llama 8B or a large MoE like DeepSeek-V3. Experiment with configs and model design to build the most efficient model on TPU or GPU.
+
+MaxText provides opinionated implementations for achieving optimal performance across a wide variety of dimensions like sharding, quantization, and checkpointing.
+
+### Post-training

-## Learn more
+If you are post-training a model, whether it is proprietary or open source, MaxText provides a scalable framework using Tunix. For RL (like GRPO), we leverage vLLM for sampling and Pathways (soon) for multi-host.

-::::{grid} 1 1 2 2
-:gutter: 2
+Our goal is to provide a variety of models (dimension "a") and techniques (dimension "b"), so you can easily explore (a) × (b) combinations and efficiently train the perfect model for your use case.

-:::{grid-item-card}
-:link: full-finetuning
-:link-type: ref
-:class-card: sd-text-black sd-bg-light
+Check out these getting started guides:

-{material-regular}`settings;2em` Full finetuning and training with Llama3
-:::
+* [SFT](https://github.com/AI-Hypercomputer/maxtext/blob/main/end_to_end/tpu/llama3.1/8b/run_sft.sh) (Supervised Fine-Tuning)
+* [GRPO](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/grpo.md) (Group Relative Policy Optimization)

-:::{grid-item-card}
-:link: first-run
-:link-type: ref
-:class-card: sd-text-black sd-bg-light
+### Model library

-{material-regular}`rocket_launch;2em` First run
-:::
-::::
+MaxText aims to provide you with the best OSS models, whether as a reference implementation or to post-train and then serve with vLLM.

-## Code repository
+We "tier" each model by how optimized it is in the MaxText framework for a given hardware platform. When a new model is first added it is not yet optimized; then, as we see demand in the OSS ecosystem, we invest the resources to optimize its performance (e.g., MFU and tokens/sec).

-You can find the latest version of MaxText at https://github.com/AI-Hypercomputer/maxtext
+**Supported JAX models in MaxText**

-## In-depth documentation
+* Google
+  * Gemma 3 (4B, 12B, 27B)
+  * Gemma 2 (2B, 9B, 27B)
+  * Gemma 1 (2B, 7B)
+* Alibaba
+  * Qwen 3 MoE 2507 (235B, 480B)
+  * Qwen 3 MoE (30B, 235B)
+  * Qwen 3 Dense (0.6B, 1.7B, 4B, 8B, 14B, 32B)
+* DeepSeek
+  * DeepSeek-V2 (16B, 236B)
+  * DeepSeek-V3 0528 (671B)
+* Meta
+  * Llama 4 Scout (109B) & Maverick (400B)
+  * Llama 3.3 70B, 3.1 (8B, 70B, 405B), 3.0 (8B, 70B, 405B)
+  * Llama 2 (7B, 13B, 70B)
+* OpenAI
+  * GPT3 (52k, 6B, 22B, 175B)
+* Mistral
+  * Mixtral (8x7B, 8x22B)
+  * Mistral (7B)
+* Diffusion models
+  * See [MaxDiffusion](https://github.com/AI-Hypercomputer/maxdiffusion) (Wan 2.1, Flux, SDXL, etc.)

-You can find in-depth documentation at [the MaxText GitHub repository](https://github.com/AI-Hypercomputer/maxtext/blob/main/docs/advanced_docs/).
+## Get involved

+Please join our [Discord Channel](https://discord.com/invite/2H9PhvTcDU), and if you have feedback you can file a feature request, documentation request, or bug report [here](https://github.com/AI-Hypercomputer/maxtext/issues/new/choose).

 ```{toctree}
 :maxdepth: 2
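
The "Use cases" section added above names the Flax/Tunix/Orbax/Optax/Grain stack that MaxText composes. As a rough orientation for readers new to that stack, here is a minimal, hypothetical Flax Linen + Optax training step; it is not MaxText code, and the model, shapes, and hyperparameters are invented for illustration:

```python
# Minimal, hypothetical sketch of the Flax + Optax pattern named above.
# This is NOT MaxText code; model, shapes, and hyperparameters are invented.
import jax
import jax.numpy as jnp
import optax
from flax import linen as nn

class TinyMLP(nn.Module):
    """A toy two-layer MLP standing in for a real model."""
    hidden: int = 128

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(self.hidden)(x))
        return nn.Dense(1)(x)

model = TinyMLP()
x = jnp.ones((8, 32))                 # toy input batch
y = jnp.zeros((8, 1))                 # toy targets
params = model.init(jax.random.PRNGKey(0), x)

tx = optax.adamw(learning_rate=1e-3)  # Optax supplies the optimizer
opt_state = tx.init(params)

@jax.jit                              # XLA-compile the whole step
def train_step(params, opt_state, x, y):
    def loss_fn(p):
        pred = model.apply(p, x)
        return jnp.mean((pred - y) ** 2)
    loss, grads = jax.value_and_grad(loss_fn)(params)
    updates, opt_state = tx.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

params, opt_state, loss = train_step(params, opt_state, x, y)
```

In MaxText itself, Orbax would checkpoint `params` and `opt_state`, and Grain would feed the batches; this sketch shows only how the Flax model, Optax transformation, and `jax.jit` compose.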
