Youtu Tip Header

中文 | Website | Tip Overview | Using Tip | More Tip tricks | Youtu-Agent | Youtu-LLM | Performance

Tip is a proactive on-device AI assistant that intelligently understands your current work. As a more user-friendly extension of Youtu-Agent, Tip integrates desktop automation, agent invocation, and more. It is fully open source, supports offline on-device use, and keeps your privacy secure.

Tip is powered by a series of self-developed lightweight models (the Youtu-LLM series introduced below).

You are also free to swap out the model for any alternative you prefer.

Click the image below to view the demo video: Tip Demo


What is Tip

Tip’s core traits

Tip focuses on “better interaction, safer privacy, broader capability”:

  • One hotkey as the AI super-entry: with minimal interaction, you get the model's full power. Press the hotkey and select text or an image, and Tip prepares the context for you. We are building a smarter Spotlight-style entry for a smoother AI experience.
  • On-device models for full privacy: We support fully offline calls to local model services. All data and processing can run against your own on-device models. The Youtu-LLM series provides strong performance and agent ability for secure local work.
  • Read files, browse pages—no problem: GUI Agent and Youtu Agent capabilities let Tip simulate mouse/keyboard actions for desktop control, connect to agents/MCP servers/tools for complex tasks, and run a multifunction agent locally.

Why Tip was built

  • Data and privacy safety: Many LLM agent apps default to processing data in the cloud. For privacy-sensitive scenarios like social platforms, users may not want screen content sent to cloud models and instead prefer private on-device solutions.
  • The last mile of interaction: LLM apps usually start with a chat box and require typing. We want a smarter way to complete context: no manual typing, copy/paste, or image uploads—Tip understands what is on screen, completes context, infers intent, and suggests actions to reduce typing and close the interaction gap.
  • On-device agent environment: Most agents live in the cloud, making it hard to run local tasks like “understand and organize local files” or “check chats on a social platform.” We aim to provide a mature framework and environment so users can run a more capable agent locally.
  • Learn and master new desktop skills: we designed a "GUI skill" mechanism for the GUI Agent that lets Tip learn new skills from procedures users teach it. For example, you can teach the model how to "perform a specific data cleanup" or "use your own tools to complete a task," customizing your desktop automation skills.

How to use Tip

Installer

We provide a download link: GitHub Release

Tip currently supports macOS devices with Apple Silicon (M-series). Support for more device types is being adapted and packaged.

After downloading, grant the required permissions:

  • On first launch, enable screen recording and accessibility permissions so shortcuts and screenshots work correctly.

    If Tip is not listed, click the + button, locate Tip, and add it. Permission scope: accessibility is used only to read current selection and simulate keyboard/mouse; screen and audio capture are used only for region screenshots.

  • Press ctrl + shift to activate Tip and start using it.

Permissions screenshot

Quick start

In “Settings - Models” you can add models, including on-device offline models (Ollama) or OpenAI SDK-compatible endpoints (local or remote).

Three quick ways to invoke Tip:

  • Press ctrl + shift to open the chat window and talk directly.
  • Select some text, then press ctrl + shift; Tip will pick up the selection and continue the dialog with that context.
  • Hold ctrl + shift to enter screenshot mode: while holding, drag to select a region; release to let Tip read the selected image area and continue the conversation.

More Tip tricks

GUI skills

We provide Claude-style “skills”: you can teach the model how to operate the computer and let it remember those actions for future use. For example, teach “find the cheapest flights”: open the site, click “sale flights,” then sort by price.

Add more skills under “Settings - GUI Agent” to help Tip operate the desktop more effectively.

Youtu Agent

Tip integrates Youtu Agent to give the model more abilities. In “Settings - Youtu Agent,” switch to a config file. Two demo configs are available: “File manager” (bash/file management) and “File manager plus” (adds some format-parsing ability).

When selecting a file, use “Right click - Open with - Tip” so Tip gets the file path. Click “Agent Execute” to have Tip interpret the file contents.

Connect on-device models

Our on-device model service supports two entry points:

Use the Ollama endpoint

Install Ollama, start it, and pull a local model (a quick sanity check is sketched after the steps):

  1. Download: visit ollama.com and click “Download macOS.”
  2. Unzip the file, drag Ollama.app into Applications, run it, and finish setup (Next -> Install).
  3. Open Terminal and run: ollama serve
  4. Open another Terminal window and run: ollama pull <model-name>
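
Before wiring the endpoint into Tip, you can sanity-check the local Ollama service from Python. This is only a rough sketch against Ollama's HTTP API, assuming the default port 11434; replace <model-name> with the model you pulled in step 4.

import requests

# Ask the local Ollama server (default port 11434) for a short completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "<model-name>",  # the model pulled in step 4
        "prompt": "Say hello in one sentence.",
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])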

Once running, connect Tip:

  1. In “Settings - Models,” click Add.
  2. In “Channel,” choose “ollama” and enter the model name.
  3. Save, then connect it in “Settings - General.”

Our Youtu-LLM on-device models are being submitted for official Ollama listings and will be downloadable from Ollama soon.

Use the OpenAI endpoint

We also support the standard OpenAI SDK entry. You can use any online provider or local services like llama-server.

  1. In “Settings - Models,” click Add.
  2. In “Channel,” choose “OpenAI SDK” and fill in base_url, api_key, model, etc.
  3. Save, then connect it in “Settings - General.”
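
For reference, this is roughly what an OpenAI SDK-compatible endpoint looks like from Python. It is only a sketch: the base_url, api_key, and model name below are placeholders for a local llama-server, so use whatever values you enter in Tip.

from openai import OpenAI

# Any OpenAI-compatible endpoint works: a local llama-server, vLLM, or a remote provider.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # placeholder: your endpoint's base URL
    api_key="sk-local-placeholder",       # placeholder: your endpoint's API key
)

reply = client.chat.completions.create(
    model="<model-name>",  # the model name your endpoint exposes
    messages=[{"role": "user", "content": "Summarize what Tip does in one sentence."}],
)
print(reply.choices[0].message.content)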

Capability Description

Because of their limited parameter count, edge models have relatively limited performance: they may be unable to complete some tasks, and their output text may be less accurate than a larger model's. The table below gives a quick overview of what the current edge model can handle:

| Task Name | Specific Example | Edge Model | Large Model |
| --- | --- | --- | --- |
| Search Content | "Search xxx on this page" |  |  |
| Simple Visual Location | "Click the xxx button and enter xxx" |  |  |
| Single-Step Logic Task | "Fill out a form" |  |  |
| Multi-Step Reasoning Planning | "Search for flight tickets and compare prices" |  |  |
| Cross-Application Collaboration | "Copy content from application xx to application xx" |  |  |
| Anomaly Self-Correction | "Retry when an error is encountered" |  |  |

If you run into a task the edge model cannot solve, we recommend deploying a model with more parameters behind a trusted endpoint to improve the experience.

Local development

The full source code and architecture are open. You can develop and package locally to customize any feature. See: README


Youtu-LLM: Small and powerful

We proudly introduce Youtu-LLM: a compact yet powerful LLM with 1.96B parameters, 128K context, and native agent ability. In general evaluations, Youtu-LLM significantly outperforms peers of similar size in commonsense, STEM, coding, and long-context tasks. In agent benchmarks, Youtu-LLM surpasses larger models and completes multiple end-to-end agent tasks.

Highlights

Youtu-LLM’s main contributions:

  • Designed for STEM capability: vocabulary, data mix, and multi-stage curriculum center on STEM and agent performance.
  • Native agent ability: trained with 128K context plus Agentic Mid-training to enable more rounds of interaction on-device.
  • SOTA performance: built on a dense MLA architecture, Youtu-LLM achieves SOTA results among lightweight LLMs, outperforming traditional dense GQA/MHA designs. MLA also makes integration into DeepSeek-V3 (DSV3)-oriented ecosystems straightforward.

Performance comparison

We provide Base and Instruct models with strong results across benchmarks, plus evaluation code to reproduce scores. See README for details.

Base Model

General Benchmarks

| Type | Benchmark (Metric) | # Shots | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Commonsense | MMLU-Pro (EM) | 5 | 34.9% | 35.3% | 29.4% | 46.1% | 36.2% | 48.4% |
| | MLQA-Zh (EM) | 3 | 38.1% | 38.0% | 40.3% | 47.2% | 43.0% | 43.5% |
| | MMLU-ProX-Zh (EM) | 5 | 32.5% | 26.7% | 24.2% | 45.2% | 25.4% | 40.7% |
| STEM | GSM8K (EM) | 8 | 68.2% | 67.3% | 38.5% | 80.8% | 47.8% | 77.6% |
| | MGSM-Zh (EM) | 8 | 57.1% | 40.7% | 33.0% | 69.7% | 35.9% | 68.9% |
| | MATH (EM) | 4 | 28.1% | 40.8% | 24.4% | 44.8% | 21.5% | 44.4% |
| | BBH (EM) | 3 | 53.0% | 59.8% | 51.6% | 70.8% | 62.9% | 59.8% |
| | GPQA-MC (Acc. Norm) | 5 | 30.4% | 26.6% | 28.6% | 37.8% | 30.1% | 33.3% |
| | HLE-MC (Acc. Norm) | 3 | 10.7% | 3.1% | 8.0% | 15.0% | 11.5% | 17.4% |
| Coding | MBPP (Pass@1) | 3 | 55.6% | 51.0% | 45.8% | 67.5% | 49.4% | 66.6% |
| | MBPP+ (Pass@1) | 3 | 71.0% | 66.1% | 61.9% | 80.8% | 62.7% | 81.8% |
| | HumanEval (Pass@1) | 0 | 49.9% | 34.8% | 36.6% | 57.6% | 36.0% | 64.6% |
| | HumanEval+ (Pass@1) | 0 | 41.3% | 28.1% | 28.1% | 49.9% | 28.1% | 57.3% |
| | LiveCodeBench v6 (Pass@1) | 3 | 5.1% | 2.9% | 2.9% | 6.9% | 3.4% | 9.7% |
| | CRUXEval (Pass@1) | 1 | 40.6% | 42.1% | 39.7% | 54.8% | 42.3% | 55.9% |
| | RepoBench (EM) | 3 | 21.0% | 21.8% | 23.0% | 25.3% | 25.2% | 22.7% |
| Long Context | LongBench v2 (Acc.) | 3 | 28.0% | 28.8% | 26.6% | 25.8% | 27.8% | 27.2% |
| | NIAH (Acc.) | / | 79.8% | 75.0% | 99.5% | 83.0% | 99.8% | 98.8% |

Agentic Benchmarks

We use APTBench to evaluate the agentic capabilities of the base models.

| Category | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
| --- | --- | --- | --- | --- | --- | --- |
| Code | 25.1% | 24.3% | 32.8% | 41.9% | 23.6% | 37.9% |
| Deep Research | 28.5% | 27.2% | 36.4% | 40.5% | 30.0% | 38.6% |
| Math | 59.9% | 60.7% | 59.8% | 70.5% | 60.1% | 68.0% |
| Tool | 56.7% | 59.1% | 61.7% | 65.8% | 64.1% | 64.2% |

Instruct Model

General Benchmarks

| Type | Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | DeepSeek-R1-Distill-Llama-8B | Youtu-LLM-2B |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Commonsense Knowledge Reasoning | MMLU-Redux | 53.0% | 74.1% | 75.6% | 83.8% | 78.1% | 75.8% |
| | MMLU-Pro | 36.5% | 54.9% | 53.0% | 69.1% | 57.5% | 61.6% |
| Instruction Following & Text Reasoning | IFEval | 29.4% | 70.4% | 60.4% | 83.6% | 34.6% | 81.2% |
| | DROP | 41.3% | 72.5% | 72.0% | 82.9% | 73.1% | 86.7% |
| | MUSR | 43.8% | 56.6% | 54.1% | 60.5% | 59.7% | 57.4% |
| STEM | MATH-500 | 84.8% | 89.8% | 91.8% | 95.0% | 90.8% | 93.7% |
| | AIME 24 | 30.2% | 44.2% | 46.7% | 73.3% | 52.5% | 65.4% |
| | AIME 25 | 23.1% | 37.1% | 34.2% | 64.2% | 34.4% | 49.8% |
| | GPQA-Diamond | 33.6% | 36.9% | 43.8% | 55.2% | 45.5% | 48.0% |
| | BBH | 31.0% | 69.1% | 76.3% | 87.8% | 77.8% | 77.5% |
| Coding | HumanEval | 64.0% | 84.8% | 79.9% | 95.4% | 88.1% | 95.9% |
| | HumanEval+ | 59.5% | 76.2% | 74.7% | 87.8% | 82.5% | 89.0% |
| | MBPP | 51.5% | 80.5% | 66.7% | 92.3% | 73.9% | 85.0% |
| | MBPP+ | 44.2% | 67.7% | 56.7% | 77.6% | 61.0% | 71.7% |
| | LiveCodeBench v6 | 19.8% | 30.7% | 30.8% | 48.5% | 36.8% | 43.7% |

Agentic Benchmarks

| Type | Benchmark | Qwen3-1.7B | SmolLM3-3B | Qwen3-4B | Youtu-LLM-2B |
| --- | --- | --- | --- | --- | --- |
| Deep Research | GAIA | 11.4% | 11.7% | 25.5% | 33.9% |
| | xbench | 11.7% | 13.9% | 18.4% | 19.5% |
| Code | SWE-Bench-Verified | 0.6% | 7.2% | 5.7% | 17.7% |
| | EnConda-Bench | 10.8% | 3.5% | 16.1% | 21.5% |
| Tool | BFCL V3 | 55.5% | 31.5% | 61.7% | 58.0% |
| | τ²-Bench | 2.6% | 9.7% | 10.9% | 15.0% |

Using Youtu-LLM

Usage:

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the Youtu-LLM-2B tokenizer and weights from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("tencent/Youtu-LLM-2B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "tencent/Youtu-LLM-2B",
    device_map="auto",          # place the model on the available GPU/CPU automatically
    trust_remote_code=True,
)
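
Continuing the snippet above, a minimal generation call might look like the following. This is a sketch, not the official recipe: it assumes the tokenizer ships a chat template, and the decoding settings are placeholders (see the quick start below for the recommended parameters).

messages = [{"role": "user", "content": "Briefly introduce yourself."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Placeholder decoding settings; tune them per the quick start.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))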

We provide a quick start covering “inference with transformers,” “configure thinking mode,” “tune decoding params,” and “deploy with vLLM and tool use.” See: README
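
As a rough sketch of the vLLM route (assuming your vLLM build can load the model via trust_remote_code; the sampling values are placeholders, so defer to the linked README for the recommended deployment settings):

from vllm import LLM, SamplingParams

# Load Youtu-LLM-2B for batched offline inference with vLLM.
llm = LLM(model="tencent/Youtu-LLM-2B", trust_remote_code=True)
params = SamplingParams(temperature=0.7, max_tokens=256)  # placeholder settings

outputs = llm.generate(["Briefly introduce yourself."], params)
print(outputs[0].outputs[0].text)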

License

Youtu-Tip and Youtu-LLM are open-sourced under LICENSE.

📚 Citation

If you find this work useful, please consider citing:

@article{youtu-agent,
  title={Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization}, 
  author={Tencent Youtu Lab},
  year={2025},
  eprint={2512.24615},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2512.24615}, 
}

@article{youtu-llm,
  title={Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models},
  author={Tencent Youtu Lab},
  year={2025},
  eprint={2512.24618},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.24618}, 
}
