中文 | Website | Tip Overview | Using Tip | More Tip tricks | Youtu-Agent | Youtu-LLM | Performance
Tip is a proactive on-device AI assistant that intelligently understands your current work. As a more user-friendly extension of Youtu-Agent, Tip integrates desktop automation, agent invocation, and more. It is fully open source, supports offline on-device use, and keeps your privacy secure.
Tip is powered by a series of self-developed lightweight models:
- Youtu-LLM: A compact 1.96B model with powerful native agent capabilities.
- Youtu-VL: A 4B on-device multimodal model with comprehensive visual perception capabilities (open-source release coming soon).
You are also free to swap out the model for any alternative you prefer.
Click the image below to view the demo video:

Tip focuses on “better interaction, safer privacy, broader capability”:
- One hotkey, as the AI super entry: With minimal interaction, you get the model’s power. Press the hotkey and select text or an image—Tip prepares the context for you. We are building a smarter Spotlight-style entry for a smoother AI experience.
- On-device models for full privacy: We support fully offline calls to local model services. All data and processing can run against your own on-device models. The Youtu-LLM series provides strong performance and agent ability for secure local work.
- Read files, browse pages—no problem: GUI Agent and Youtu Agent capabilities let Tip simulate mouse/keyboard actions for desktop control, connect to agents/MCP servers/tools for complex tasks, and run a multifunction agent locally.
- Data and privacy safety: Many LLM agent apps default to processing data in the cloud. For privacy-sensitive scenarios like social platforms, users may not want screen content sent to cloud models and instead prefer private on-device solutions.
- The last mile of interaction: LLM apps usually start with a chat box and require typing. We want a smarter way to complete context: no manual typing, copy/paste, or image uploads—Tip understands what is on screen, completes context, infers intent, and suggests actions to reduce typing and close the interaction gap.
- On-device agent environment: Most agents live in the cloud, making it hard to run local tasks like “understand and organize local files” or “check chats on a social platform.” We aim to provide a mature framework and environment so users can run a more capable agent locally.
- Learn and master new desktop skills: We designed a "GUI skill" mechanism for the GUI Agent that lets Tip learn new skills from what users teach it, for example how to "perform a specific data cleanup" or "use user-specific tools to complete a task," so you can customize your own desktop automation skills.
We provide a download link: GitHub Release
Tip currently supports macOS devices with Apple Silicon (M-series). Support for more device types is being adapted and packaged.
After downloading, grant the required permissions:
- On first launch, enable screen recording and accessibility permissions so shortcuts and screenshots work correctly.
If Tip is not listed, click the + button, locate Tip, and add it. Permission scope: accessibility is used only to read the current selection and simulate keyboard/mouse input; screen and audio capture are used only for region screenshots.
- Press `ctrl + shift` to activate Tip and start using it.
In “Settings - Models” you can add models, including on-device offline models (Ollama) or OpenAI SDK-compatible endpoints (local or remote).
Three quick ways to invoke Tip:
- Press `ctrl + shift` to open the chat window and talk directly.
- Select some text, then press `ctrl + shift`; Tip will pick up the selection and continue the dialog with that context.
- Hold `ctrl + shift` to enter screenshot mode: while holding, drag to select a region; release to let Tip read the selected image area and continue the conversation.
We provide Claude-style “skills”: you can teach the model how to operate the computer and let it remember those actions for future use. For example, teach “find the cheapest flights”: open the site, click “sale flights,” then sort by price.
Add more skills under “Settings - GUI Agent” to help Tip operate the desktop more effectively.
Tip integrates Youtu Agent to give the model more abilities. In “Settings - Youtu Agent,” switch to a config file. Two demo configs are available: “File manager” (bash/file management) and “File manager plus” (adds some format-parsing ability).
When selecting a file, use “Right click - Open with - Tip” so Tip gets the file path. Click “Agent Execute” to have Tip interpret the file contents.
Our on-device model service supports two entry points:
Install and start Ollama, then pull and run a local model:
- Download: visit ollama.com and click "Download for macOS."
- Unzip the file, drag `Ollama.app` into Applications, run it, and finish setup (Next -> Install).
- Open Terminal and run: `ollama serve`
- Open another Terminal window and run: `ollama pull <model-name>`
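To confirm the local service is serving the model before wiring it into Tip, you can query Ollama's REST API directly. A minimal sketch, assuming the default port 11434 and that `<model-name>` has already been pulled (replace it with the actual name):

```python
# Minimal check against a local Ollama service (default port 11434).
# "<model-name>" is a placeholder for the model pulled in the previous step.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "<model-name>", "prompt": "Say hello.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])  # the model's completion text
```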
Once running, connect Tip:
- In “Settings - Models,” click Add.
- In “Channel,” choose “ollama” and enter the model name.
- Save, then connect it in “Settings - General.”
The Youtu-LLM on-device models are being submitted to the official Ollama model library and will be downloadable there soon.
We also support the standard OpenAI SDK entry. You can use any online provider or local services like llama-server.
- In “Settings - Models,” click Add.
- In “Channel,” choose “OpenAI SDK” and fill in `base_url`, `api_key`, `model`, etc.
- Save, then connect it in “Settings - General.”
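Before adding the endpoint in Tip, you can sanity-check it with the official `openai` Python package; the `base_url`, `api_key`, and model name below are placeholders (for example, a local llama-server typically exposes an OpenAI-compatible API at `http://localhost:8080/v1`):

```python
# Probe an OpenAI SDK-compatible endpoint with the same values you would
# enter in "Settings - Models". All three values below are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local llama-server, or a remote provider URL
    api_key="sk-placeholder",             # local servers usually accept any non-empty key
)
reply = client.chat.completions.create(
    model="your-model-name",
    messages=[{"role": "user", "content": "Say hello."}],
)
print(reply.choices[0].message.content)
```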
Due to their limited parameter counts, edge models offer relatively limited performance: they may fail to complete some tasks, and their output text may be less accurate than that of larger models. The table below gives a quick overview of what the current edge model can handle:
| Task Name | Specific Example | Edge Model | Large Model |
|---|---|---|---|
| Search Content | “Search xxx on this page” | ✅ | ✅ |
| Simple Visual Location | “Click the xxx button and enter xxx” | ✅ | ✅ |
| Single-Step Logic Task | “Fill out a form” | ❌ | ✅ |
| Multi-Step Reasoning Planning | “Search for flight tickets and compare prices” | ❌ | ✅ |
| Cross-Application Collaboration | “Copy content from application xx to application xx” | ❌ | ✅ |
| Anomaly Self-Correction | “Retry when an error is encountered” | ✅ | ✅ |
If you run into a task the edge model cannot solve, we recommend deploying a larger model behind a trusted endpoint to improve the experience.
The full source code and architecture are open. You can develop and package locally to customize any feature. See: README
We proudly introduce Youtu-LLM: a compact yet powerful LLM with 1.96B parameters, 128K context, and native agent ability. In general evaluations, Youtu-LLM significantly outperforms peers of similar size in commonsense, STEM, coding, and long-context tasks. In agent benchmarks, Youtu-LLM surpasses larger models and completes multiple end-to-end agent tasks.
Youtu-LLM’s main contributions:
- Designed for STEM capability: vocabulary, data mix, and multi-stage curriculum center on STEM and agent performance.
- Native agent ability: trained with 128K context plus Agentic Mid-training to enable more rounds of interaction on-device.
- SOTA performance: built on a dense MLA architecture, Youtu-LLM achieves SOTA results among lightweight LLMs, outperforming traditional dense GQA/MHA designs. MLA also makes integration into DSV3-oriented ecosystems straightforward.
We provide Base and Instruct models with strong results across benchmarks, plus evaluation code to reproduce scores. See README for details.
| Type | Benchmark (Metric) | # Shots | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
|---|---|---|---|---|---|---|---|---|
| Commonsense | MMLU-Pro (EM) | 5 | 34.9% | 35.3% | 29.4% | 46.1% | 36.2% | 48.4% |
| | MLQA-Zh (EM) | 3 | 38.1% | 38.0% | 40.3% | 47.2% | 43.0% | 43.5% |
| | MMLU-ProX-Zh (EM) | 5 | 32.5% | 26.7% | 24.2% | 45.2% | 25.4% | 40.7% |
| STEM | GSM8K (EM) | 8 | 68.2% | 67.3% | 38.5% | 80.8% | 47.8% | 77.6% |
| | MGSM-Zh (EM) | 8 | 57.1% | 40.7% | 33.0% | 69.7% | 35.9% | 68.9% |
| | MATH (EM) | 4 | 28.1% | 40.8% | 24.4% | 44.8% | 21.5% | 44.4% |
| | BBH (EM) | 3 | 53.0% | 59.8% | 51.6% | 70.8% | 62.9% | 59.8% |
| | GPQA-MC (Acc. Norm) | 5 | 30.4% | 26.6% | 28.6% | 37.8% | 30.1% | 33.3% |
| | HLE-MC (Acc. Norm) | 3 | 10.7% | 3.1% | 8.0% | 15.0% | 11.5% | 17.4% |
| Coding | MBPP (Pass@1) | 3 | 55.6% | 51.0% | 45.8% | 67.5% | 49.4% | 66.6% |
| | MBPP+ (Pass@1) | 3 | 71.0% | 66.1% | 61.9% | 80.8% | 62.7% | 81.8% |
| | HumanEval (Pass@1) | 0 | 49.9% | 34.8% | 36.6% | 57.6% | 36.0% | 64.6% |
| | HumanEval+ (Pass@1) | 0 | 41.3% | 28.1% | 28.1% | 49.9% | 28.1% | 57.3% |
| | LiveCodeBench v6 (Pass@1) | 3 | 5.1% | 2.9% | 2.9% | 6.9% | 3.4% | 9.7% |
| | CRUXEval (Pass@1) | 1 | 40.6% | 42.1% | 39.7% | 54.8% | 42.3% | 55.9% |
| | RepoBench (EM) | 3 | 21.0% | 21.8% | 23.0% | 25.3% | 25.2% | 22.7% |
| Long Context | LongBench v2 (Acc.) | 3 | 28.0% | 28.8% | 26.6% | 25.8% | 27.8% | 27.2% |
| | NIAH (Acc.) | / | 79.8% | 75.0% | 99.5% | 83.0% | 99.8% | 98.8% |
We use APTBench to evaluate the agentic capabilities of the base models.
| Category | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
|---|---|---|---|---|---|---|
| Code | 25.1% | 24.3% | 32.8% | 41.9% | 23.6% | 37.9% |
| Deep Research | 28.5% | 27.2% | 36.4% | 40.5% | 30.0% | 38.6% |
| Math | 59.9% | 60.7% | 59.8% | 70.5% | 60.1% | 68.0% |
| Tool | 56.7% | 59.1% | 61.7% | 65.8% | 64.1% | 64.2% |
| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | DeepSeek-R1-Distill-Llama-8B | Youtu-LLM-2B |
|---|---|---|---|---|---|---|
| Commonsense Knowledge Reasoning | ||||||
| MMLU-Redux | 53.0% | 74.1% | 75.6% | 83.8% | 78.1% | 75.8% |
| MMLU-Pro | 36.5% | 54.9% | 53.0% | 69.1% | 57.5% | 61.6% |
| Instruction Following & Text Reasoning | ||||||
| IFEval | 29.4% | 70.4% | 60.4% | 83.6% | 34.6% | 81.2% |
| DROP | 41.3% | 72.5% | 72.0% | 82.9% | 73.1% | 86.7% |
| MUSR | 43.8% | 56.6% | 54.1% | 60.5% | 59.7% | 57.4% |
| STEM | ||||||
| MATH-500 | 84.8% | 89.8% | 91.8% | 95.0% | 90.8% | 93.7% |
| AIME 24 | 30.2% | 44.2% | 46.7% | 73.3% | 52.5% | 65.4% |
| AIME 25 | 23.1% | 37.1% | 34.2% | 64.2% | 34.4% | 49.8% |
| GPQA-Diamond | 33.6% | 36.9% | 43.8% | 55.2% | 45.5% | 48.0% |
| BBH | 31.0% | 69.1% | 76.3% | 87.8% | 77.8% | 77.5% |
| Coding | ||||||
| HumanEval | 64.0% | 84.8% | 79.9% | 95.4% | 88.1% | 95.9% |
| HumanEval+ | 59.5% | 76.2% | 74.7% | 87.8% | 82.5% | 89.0% |
| MBPP | 51.5% | 80.5% | 66.7% | 92.3% | 73.9% | 85.0% |
| MBPP+ | 44.2% | 67.7% | 56.7% | 77.6% | 61.0% | 71.7% |
| LiveCodeBench v6 | 19.8% | 30.7% | 30.8% | 48.5% | 36.8% | 43.7% |
| Benchmark | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | Youtu-LLM-2B |
|---|---|---|---|---|
| Deep Research | ||||
| GAIA | 11.4% | 11.7% | 25.5% | 33.9% |
| xbench | 11.7% | 13.9% | 18.4% | 19.5% |
| Code | ||||
| SWE-Bench-Verified | 0.6% | 7.2% | 5.7% | 17.7% |
| EnConda-Bench | 10.8% | 3.5% | 16.1% | 21.5% |
| Tool | ||||
| BFCL V3 | 55.5% | 31.5% | 61.7% | 58.0% |
| τ²-Bench | 2.6% | 9.7% | 10.9% | 15.0% |
Usage:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("tencent/Youtu-LLM-2B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/Youtu-LLM-2B",
    device_map="auto",
    trust_remote_code=True
)
```

We provide a quick start covering “inference with transformers,” “configure thinking mode,” “tune decoding params,” and “deploy with vLLM and tool use.” See: README
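Building on the snippet above, here is a hedged single-turn generation sketch; the chat-template usage and decoding settings are assumptions, so check the README for the recommended configuration (including thinking mode and decoding parameters):

```python
# Hypothetical single-turn generation, continuing from the loading snippet above.
# Chat-template support and decoding settings are assumptions; see the README.
messages = [{"role": "user", "content": "Summarize what an on-device agent can do."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```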
Youtu-Tip and Youtu-LLM are open-sourced under LICENSE.
If you find this work useful, please consider citing:
@article{youtu-agent,
title={Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization},
author={Tencent Youtu Lab},
year={2025},
eprint={2512.24615},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2512.24615},
}
@article{youtu-llm,
title={Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models},
author={Tencent Youtu Lab},
year={2025},
eprint={2512.24618},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.24618},
}
