
Youtu-Embedding-V1

Introduction

Youtu-Embedding-V1 is a general-purpose text representation (embedding) model, accessible via an API. It stands out from competing models for three reasons:

  1. Innovative: Youtu-Embedding-V1 is trained with CoDi, a proprietary framework developed by Tencent. Our innovations in unified formatting, loss functions, and sampling strategies allow the model to converge well across multiple tasks with fewer parameters and at a lower training cost. For more details, please refer to our paper.
  2. Versatile: Youtu-Embedding-V1 demonstrates strong generalization across a range of tasks, including information retrieval, semantic textual similarity, natural language inference, text classification, and text clustering. Furthermore, we support custom instructions to adapt the model to specific downstream scenarios.
  3. Extensible: We plan to open-source the training framework used for Youtu-Embedding-V1 in the near future, which will allow you to easily transfer it to other types of tasks.
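As a rough illustration of the custom instructions mentioned in point 2, the sketch below formats a task description using the same "Instruction: ... \nQuery: " layout as the retrieval instruction in the Usage section. The helper name `build_instruction` is hypothetical, and reusing that layout for other tasks is an assumption rather than documented behavior.

```python
def build_instruction(task_description: str) -> str:
    """Wrap a task description in the instruction template.

    Assumption: custom instructions follow the same layout as the
    retrieval instruction shown in the Usage section.
    """
    return f"Instruction: {task_description.strip()} \nQuery: "


# Hypothetical task description for a similarity-style use case.
custom = build_instruction("Retrieve semantically similar sentences")
print(repr(custom))
```

The resulting string would be passed as the `Instruction` field of the request in place of the default retrieval instruction.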

Performance

C-MTEB
Model                    Mean(Task)  Mean(Type)  Class.  Clust.  Pair Class.  Rerank.  Retr.  STS
gte-Qwen2-1.5B-instruct  67.12       67.79       72.53   54.61   79.50        68.21    71.86  60.05
bge-multilingual-gemma2  67.64       68.52       75.31   59.30   86.67        68.28    73.73  55.19
ritrieve_zh_v1           72.71       73.85       76.88   66.50   85.98        72.86    76.97  63.92
Qwen3-Embedding-4B       72.27       73.51       75.46   77.89   83.34        66.05    77.03  61.26
Qwen3-Embedding-8B       73.84       75.00       76.97   80.08   84.23        66.99    78.21  63.53
Conan-embedding-v2       74.24       75.99       76.47   68.84   92.44        74.41    78.31  65.48
Seed1.6-embedding        75.63       76.68       77.98   73.11   88.71        71.65    79.69  68.94
QZhou-Embedding          76.99       78.58       79.99   70.91   95.07        74.85    78.80  71.89
Youtu-Embedding-V1       77.46       78.74       78.04   79.67   89.69        73.85    80.95  70.28
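The Mean(Type) column appears to be the unweighted average of the six task-type scores. The short check below recomputes it for the Youtu-Embedding-V1 row, using the values copied from the table. The interpretation of the column is an inference from the numbers, not something the source states, and the result can differ from the reported 78.74 in the last digit because the published per-type scores are already rounded.

```python
# Per-type C-MTEB scores for Youtu-Embedding-V1, copied from the table.
scores = {
    "Class.": 78.04,
    "Clust.": 79.67,
    "Pair Class.": 89.69,
    "Rerank.": 73.85,
    "Retr.": 80.95,
    "STS": 70.28,
}

# Assumption: Mean(Type) is the plain average over the six task types.
mean_type = sum(scores.values()) / len(scores)
print(f"Recomputed Mean(Type): {mean_type:.2f} (reported: 78.74)")
```

Mean(Task) cannot be recomputed the same way, since it would average over individual tasks whose scores are not listed here.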

Usage

API Integration Guide: Tencent Cloud API


import os
import json
from tencentcloud.common import credential
from tencentcloud.common.profile.client_profile import ClientProfile
from tencentcloud.common.profile.http_profile import HttpProfile
from tencentcloud.common.exception.tencent_cloud_sdk_exception import TencentCloudSDKException
from tencentcloud.lkeap.v20240522 import lkeap_client, models


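def encode(client, inputs, is_query=False):
    """Return one embedding per string in `inputs`.

    Queries are sent with a retrieval instruction; documents with an empty one.
    Reads the module-level `model_name` defined further down.
    """
    if is_query:
        instruction = "Instruction: Given a search query, retrieve web passages that answer the question \nQuery: "
    else:
        instruction = ""

    params = {
        "Model": model_name,
        "Inputs": inputs,
        "Instruction": instruction
    }

    # Populate the request object from a JSON payload.
    req = models.GetEmbeddingRequest()
    req.from_json_string(json.dumps(params))

    # Convert the SDK response object to a plain dict and collect the vectors.
    resp = client.GetEmbedding(req)
    resp = json.loads(resp.to_json_string())
    outputs = [item["Embedding"] for item in resp["Data"]]
    return outputs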

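# Read API credentials from the environment rather than hard-coding them.
secret_id = os.getenv("TENCENTCLOUD_SECRET_ID")
secret_key = os.getenv("TENCENTCLOUD_SECRET_KEY")
if not secret_id or not secret_key:
    raise RuntimeError("Set TENCENTCLOUD_SECRET_ID and TENCENTCLOUD_SECRET_KEY first.")

cred = credential.Credential(secret_id, secret_key)

http_profile = HttpProfile()
http_profile.endpoint = "lkeap.test.tencentcloudapi.com"

client_profile = ClientProfile()
client_profile.httpProfile = http_profile
client = lkeap_client.LkeapClient(cred, "ap-guangzhou", client_profile)
model_name = "youtu-embedding-v1"  # read by encode() as a module-level name

inputs = ["Regular exercise is the key to staying healthy."]
try:
    embeddings = encode(client, inputs, is_query=False)
    print(embeddings)
except TencentCloudSDKException as err:
    # Raised by the SDK when the request fails.
    print(f"Embedding request failed: {err}")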
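
Once query and passage embeddings are available, a typical next step for retrieval is to rank passages by similarity to the query. The sketch below uses cosine similarity, which is a common choice but an assumption here, since the source does not say which measure to use. The function names are hypothetical, and toy 3-dimensional vectors stand in for real API output so the example runs without credentials.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec: Sequence[float],
                  passage_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return passage indices ordered from most to least similar to the query."""
    sims = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)


# Toy vectors standing in for embeddings returned by the API.
query_vec = [1.0, 0.0, 0.0]
passage_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [0.9, 0.1, 0.0],  # nearly parallel to the query
    [0.5, 0.5, 0.0],  # 45 degrees from the query
]
order = rank_passages(query_vec, passage_vecs)
print(order)
```

In practice, `query_vec` would come from `encode(client, [query], is_query=True)[0]` and `passage_vecs` from `encode(client, passages, is_query=False)`.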