Youtu Tip Header

中文 | Website | Tip Overview | Using Tip | More Tip tricks | Youtu-Agent | Youtu-LLM | Performance

Tip is a proactive on-device AI assistant that intelligently understands your current work. As a more user-friendly extension of Youtu-Agent, Tip integrates desktop automation, agent invocation, and more. It is fully open source, supports offline on-device use, and keeps your privacy secure.

Tip is powered by a series of self-developed lightweight models (the Youtu-LLM series introduced below).

You are also free to swap out the model for any alternative you prefer.

Click the image below to view the demo video: Tip Demo


What is Tip

Tip’s core traits

Tip focuses on “better interaction, safer privacy, broader capability”:

  • One hotkey as the AI super-entry: with minimal interaction, you get the model's full power. Press the hotkey and select text or an image, and Tip prepares the context for you. We are building a smarter Spotlight-style entry for a smoother AI experience.
  • On-device models for full privacy: We support fully offline calls to local model services. All data and processing can run against your own on-device models. The Youtu-LLM series provides strong performance and agent ability for secure local work.
  • Read files, browse pages—no problem: GUI Agent and Youtu Agent capabilities let Tip simulate mouse/keyboard actions for desktop control, connect to agents/MCP servers/tools for complex tasks, and run a multifunction agent locally.

Why Tip was built

  • Data and privacy safety: Many LLM agent apps default to processing data in the cloud. For privacy-sensitive scenarios like social platforms, users may not want screen content sent to cloud models and instead prefer private on-device solutions.
  • The last mile of interaction: LLM apps usually start with a chat box and require typing. We want a smarter way to complete context: no manual typing, copy/paste, or image uploads—Tip understands what is on screen, completes context, infers intent, and suggests actions to reduce typing and close the interaction gap.
  • On-device agent environment: Most agents live in the cloud, making it hard to run local tasks like “understand and organize local files” or “check chats on a social platform.” We aim to provide a mature framework and environment so users can run a more capable agent locally.
  • Learn and master new desktop skills: we designed a "GUI skill" mechanism for the GUI Agent that lets Tip learn new skills from procedures users teach it. For example, you can teach the model how to "perform a specific data cleanup" or "use your own tools to complete a task," customizing your desktop automation skills.

How to use Tip

Installer

We provide a download link: GitHub Release

Tip currently supports macOS devices with Apple Silicon (M-series). Support for more device types is being adapted and packaged.

After downloading, grant the required permissions:

  • On first launch, enable screen recording and accessibility permissions so shortcuts and screenshots work correctly.

    If Tip is not listed, click the + button, locate Tip, and add it. Permission scope: accessibility is used only to read current selection and simulate keyboard/mouse; screen and audio capture are used only for region screenshots.

  • Press ctrl + shift to activate Tip and start using it.

Permissions screenshot

Quick start

In “Settings - Models” you can add models, including on-device offline models (Ollama) or OpenAI SDK-compatible endpoints (local or remote).

Three quick ways to invoke Tip:

  • Press ctrl + shift to open the chat window and talk directly.
  • Select some text, then press ctrl + shift; Tip will pick up the selection and continue the dialog with that context.
  • Hold ctrl + shift to enter screenshot mode: while holding, drag to select a region; release to let Tip read the selected image area and continue the conversation.

More Tip tricks

GUI skills

We provide Claude-style “skills”: you can teach the model how to operate the computer and let it remember those actions for future use. For example, teach “find the cheapest flights”: open the site, click “sale flights,” then sort by price.

Add more skills under “Settings - GUI Agent” to help Tip operate the desktop more effectively.

Youtu Agent

Tip integrates Youtu Agent to give the model more abilities. In “Settings - Youtu Agent,” switch to a config file. Two demo configs are available: “File manager” (bash/file management) and “File manager plus” (adds some format-parsing ability).

When selecting a file, use “Right click - Open with - Tip” so Tip gets the file path. Click “Agent Execute” to have Tip interpret the file contents.

Connect on-device models

Our on-device model service supports two entry points:

Use the Ollama endpoint

Install Ollama, start it, and pull a local model (a quick sanity check is sketched after the steps):

  1. Download: visit ollama.com and click “Download macOS.”
  2. Unzip the file, drag Ollama.app into Applications, run it, and finish setup (Next -> Install).
  3. Open Terminal and run: ollama serve
  4. Open another Terminal window and run: ollama pull <model-name>
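
Before wiring the endpoint into Tip, you can sanity-check the local Ollama service from Python. This is only a rough sketch against Ollama's HTTP API, assuming the default port 11434; replace <model-name> with the model you pulled in step 4.

import requests

# Ask the local Ollama server (default port 11434) for a short completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "<model-name>",  # the model pulled in step 4
        "prompt": "Say hello in one sentence.",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])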

Once running, connect Tip:

  1. In “Settings - Models,” click Add.
  2. In “Channel,” choose “ollama” and enter the model name.
  3. Save, then connect it in “Settings - General.”

Our Youtu-LLM on-device models are being submitted for official Ollama listings and will be downloadable from Ollama soon.

Use the OpenAI endpoint

We also support the standard OpenAI SDK entry. You can use any online provider or local services like llama-server.

  1. In “Settings - Models,” click Add.
  2. In “Channel,” choose “OpenAI SDK” and fill in base_url, api_key, model, etc.
  3. Save, then connect it in “Settings - General.”
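
For reference, this is roughly what an OpenAI SDK-compatible endpoint looks like from Python. It is only a sketch: the base_url, api_key, and model name below are placeholders for a local llama-server, so use whatever values you enter in Tip.

from openai import OpenAI

# Any OpenAI-compatible endpoint works: a local llama-server, vLLM, or a remote provider.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # placeholder: your endpoint's base URL
    api_key="sk-local-placeholder",       # placeholder: your endpoint's API key
)

reply = client.chat.completions.create(
    model="<model-name>",  # the model name your endpoint exposes
    messages=[{"role": "user", "content": "Summarize what Tip does in one sentence."}],
)
print(reply.choices[0].message.content)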

Capability Description

Because of their limited parameter count, edge models have relatively limited performance: they may be unable to complete some tasks, and their output text may be less accurate than a larger model's. The table below gives a quick overview of what the current edge model can handle:

| Task Name | Specific Example | Edge Model | Large Model |
| --- | --- | --- | --- |
| Search Content | "Search xxx on this page" |  |  |
| Simple Visual Location | "Click the xxx button and enter xxx" |  |  |
| Single-Step Logic Task | "Fill out a form" |  |  |
| Multi-Step Reasoning Planning | "Search for flight tickets and compare prices" |  |  |
| Cross-Application Collaboration | "Copy content from application xx to application xx" |  |  |
| Anomaly Self-Correction | "Retry when an error is encountered" |  |  |

If you run into a task the edge model cannot solve, we recommend deploying a model with more parameters behind a trusted endpoint to improve the experience.

Local development

The full source code and architecture are open. You can develop and package locally to customize any feature. See: README


Youtu-LLM: Small and powerful

We proudly introduce Youtu-LLM: a compact yet powerful LLM with 1.96B parameters, 128K context, and native agent ability. In general evaluations, Youtu-LLM significantly outperforms peers of similar size in commonsense, STEM, coding, and long-context tasks. In agent benchmarks, Youtu-LLM surpasses larger models and completes multiple end-to-end agent tasks.

Highlights

Youtu-LLM’s main contributions:

  • Designed for STEM capability: vocabulary, data mix, and multi-stage curriculum center on STEM and agent performance.
  • Native agent ability: trained with 128K context plus Agentic Mid-training to enable more rounds of interaction on-device.
  • SOTA performance: built on a dense MLA architecture, Youtu-LLM achieves SOTA results among lightweight LLMs, outperforming traditional dense GQA/MHA designs. MLA also makes integration into DeepSeek-V3 (DSV3)-oriented ecosystems straightforward.

Performance comparison

We provide Base and Instruct models with strong results across benchmarks, plus evaluation code to reproduce scores. See README for details.

Base Model

General Benchmarks

| Type | Benchmark (Metric) | # Shots | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Commonsense | MMLU-Pro (EM) | 5 | 34.9% | 35.3% | 29.4% | 46.1% | 36.2% | 48.4% |
| | MLQA-Zh (EM) | 3 | 38.1% | 38.0% | 40.3% | 47.2% | 43.0% | 43.5% |
| | MMLU-ProX-Zh (EM) | 5 | 32.5% | 26.7% | 24.2% | 45.2% | 25.4% | 40.7% |
| STEM | GSM8K (EM) | 8 | 68.2% | 67.3% | 38.5% | 80.8% | 47.8% | 77.6% |
| | MGSM-Zh (EM) | 8 | 57.1% | 40.7% | 33.0% | 69.7% | 35.9% | 68.9% |
| | MATH (EM) | 4 | 28.1% | 40.8% | 24.4% | 44.8% | 21.5% | 44.4% |
| | BBH (EM) | 3 | 53.0% | 59.8% | 51.6% | 70.8% | 62.9% | 59.8% |
| | GPQA-MC (Acc. Norm) | 5 | 30.4% | 26.6% | 28.6% | 37.8% | 30.1% | 33.3% |
| | HLE-MC (Acc. Norm) | 3 | 10.7% | 3.1% | 8.0% | 15.0% | 11.5% | 17.4% |
| Coding | MBPP (Pass@1) | 3 | 55.6% | 51.0% | 45.8% | 67.5% | 49.4% | 66.6% |
| | MBPP+ (Pass@1) | 3 | 71.0% | 66.1% | 61.9% | 80.8% | 62.7% | 81.8% |
| | HumanEval (Pass@1) | 0 | 49.9% | 34.8% | 36.6% | 57.6% | 36.0% | 64.6% |
| | HumanEval+ (Pass@1) | 0 | 41.3% | 28.1% | 28.1% | 49.9% | 28.1% | 57.3% |
| | LiveCodeBench v6 (Pass@1) | 3 | 5.1% | 2.9% | 2.9% | 6.9% | 3.4% | 9.7% |
| | CRUXEval (Pass@1) | 1 | 40.6% | 42.1% | 39.7% | 54.8% | 42.3% | 55.9% |
| | RepoBench (EM) | 3 | 21.0% | 21.8% | 23.0% | 25.3% | 25.2% | 22.7% |
| Long Context | LongBench v2 (Acc.) | 3 | 28.0% | 28.8% | 26.6% | 25.8% | 27.8% | 27.2% |
| | NIAH (Acc.) | / | 79.8% | 75.0% | 99.5% | 83.0% | 99.8% | 98.8% |

Agentic Benchmarks

We use APTBench to evaluate the agentic capabilities of the base models.

| Category | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
| --- | --- | --- | --- | --- | --- | --- |
| Code | 25.1% | 24.3% | 32.8% | 41.9% | 23.6% | 37.9% |
| Deep Research | 28.5% | 27.2% | 36.4% | 40.5% | 30.0% | 38.6% |
| Math | 59.9% | 60.7% | 59.8% | 70.5% | 60.1% | 68.0% |
| Tool | 56.7% | 59.1% | 61.7% | 65.8% | 64.1% | 64.2% |

Instruct Model

General Benchmarks

| Type | Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | DeepSeek-R1-Distill-Llama-8B | Youtu-LLM-2B |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Commonsense Knowledge Reasoning | MMLU-Redux | 53.0% | 74.1% | 75.6% | 83.8% | 78.1% | 75.8% |
| | MMLU-Pro | 36.5% | 54.9% | 53.0% | 69.1% | 57.5% | 61.6% |
| Instruction Following & Text Reasoning | IFEval | 29.4% | 70.4% | 60.4% | 83.6% | 34.6% | 81.2% |
| | DROP | 41.3% | 72.5% | 72.0% | 82.9% | 73.1% | 86.7% |
| | MUSR | 43.8% | 56.6% | 54.1% | 60.5% | 59.7% | 57.4% |
| STEM | MATH-500 | 84.8% | 89.8% | 91.8% | 95.0% | 90.8% | 93.7% |
| | AIME 24 | 30.2% | 44.2% | 46.7% | 73.3% | 52.5% | 65.4% |
| | AIME 25 | 23.1% | 37.1% | 34.2% | 64.2% | 34.4% | 49.8% |
| | GPQA-Diamond | 33.6% | 36.9% | 43.8% | 55.2% | 45.5% | 48.0% |
| | BBH | 31.0% | 69.1% | 76.3% | 87.8% | 77.8% | 77.5% |
| Coding | HumanEval | 64.0% | 84.8% | 79.9% | 95.4% | 88.1% | 95.9% |
| | HumanEval+ | 59.5% | 76.2% | 74.7% | 87.8% | 82.5% | 89.0% |
| | MBPP | 51.5% | 80.5% | 66.7% | 92.3% | 73.9% | 85.0% |
| | MBPP+ | 44.2% | 67.7% | 56.7% | 77.6% | 61.0% | 71.7% |
| | LiveCodeBench v6 | 19.8% | 30.7% | 30.8% | 48.5% | 36.8% | 43.7% |

Agentic Benchmarks

| Type | Benchmark | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | Youtu-LLM-2B |
| --- | --- | --- | --- | --- | --- |
| Deep Research | GAIA | 11.4% | 11.7% | 25.5% | 33.9% |
| | xbench | 11.7% | 13.9% | 18.4% | 19.5% |
| Code | SWE-Bench-Verified | 0.6% | 7.2% | 5.7% | 17.7% |
| | EnConda-Bench | 10.8% | 3.5% | 16.1% | 21.5% |
| Tool | BFCL V3 | 55.5% | 31.5% | 61.7% | 58.0% |
| | τ²-Bench | 2.6% | 9.7% | 10.9% | 15.0% |

Using Youtu-LLM

Usage:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the Youtu-LLM-2B tokenizer and weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("tencent/Youtu-LLM-2B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/Youtu-LLM-2B",
    device_map="auto",          # place the model on the available GPU/CPU automatically
    trust_remote_code=True,
)
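
Continuing the snippet above, a minimal generation call might look like the following. This is a sketch, not the official recipe: it assumes the tokenizer ships a chat template, and the decoding settings are placeholders (see the quick start below for the recommended parameters).

messages = [{"role": "user", "content": "Briefly introduce yourself."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Placeholder decoding settings; tune them per the quick start.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))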

We provide a quick start covering “inference with transformers,” “configure thinking mode,” “tune decoding params,” and “deploy with vLLM and tool use.” See: README
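
As a rough sketch of the vLLM route (assuming your vLLM build can load the model via trust_remote_code; the sampling values are placeholders, so defer to the linked README for the recommended deployment settings):

from vllm import LLM, SamplingParams

# Load Youtu-LLM-2B for batched offline inference with vLLM.
llm = LLM(model="tencent/Youtu-LLM-2B", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=256)  # placeholder settings

outputs = llm.generate(["Briefly introduce yourself."], params)
print(outputs[0].outputs[0].text)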

License

Youtu-Tip and Youtu-LLM are open-sourced under LICENSE.

📚 Citation

If you find this work useful, please consider citing:

@article{youtu-agent,
  title={Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization}, 
  author={Tencent Youtu Lab},
  year={2025},
  eprint={2512.24615},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.24615}, 
}

@article{youtu-llm,
  title={Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models},
  author={Tencent Youtu Lab},
  year={2025},
  eprint={2512.24618},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.24618}, 
}
