TokenizerType: {
    Enum: Tokenizer;
} | {
    String: string;
}

Specifies which tokenizer to use for the chunking process.

This type supports two ways of specifying a tokenizer:

  1. Using a predefined tokenizer from the Tokenizer enum
  2. Using any Hugging Face tokenizer by providing its model ID as a string (e.g. "facebook/bart-large" or "Qwen/Qwen-tokenizer")

A string value may be any valid Hugging Face tokenizer ID; the tokenizer is loaded using the Hugging Face tokenizers library.
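
The two forms can be sketched as follows. This is a minimal, self-contained illustration rather than the library's own code: the `Tokenizer` enum is a local stand-in whose member names are assumptions (this page does not list them), and `describeTokenizer` is a hypothetical helper added only to show how the union can be narrowed.

```typescript
// Local stand-in for the library's Tokenizer enum. The member names are
// assumptions for illustration; check the Tokenizer reference for the real ones.
enum Tokenizer {
  Word = "Word",
  Cl100kBase = "Cl100kBase",
}

// Mirrors the TokenizerType declaration shown on this page.
type TokenizerType = { Enum: Tokenizer } | { String: string };

// Option 1: a predefined tokenizer from the enum.
const predefined: TokenizerType = { Enum: Tokenizer.Word };

// Option 2: a Hugging Face tokenizer identified by its model ID.
const huggingFace: TokenizerType = { String: "facebook/bart-large" };

// Hypothetical helper: uses the `in` operator to tell the two variants apart.
function describeTokenizer(t: TokenizerType): string {
  return "Enum" in t ? `predefined:${t.Enum}` : `huggingface:${t.String}`;
}
```

Because each variant has exactly one distinguishing key, checking for `"Enum"` or `"String"` is enough for TypeScript to narrow the union.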

Type declaration

  • Enum: Tokenizer

    Use a predefined tokenizer from the Tokenizer enum.

  • String: string

    Use any Hugging Face tokenizer by specifying its model ID. Examples: "Qwen/Qwen-tokenizer", "facebook/bart-large".
