🎯 Fine-tuning Guide

Overview

OKai supports multiple AI model providers and offers extensive configuration options for fine-tuning model behavior, embedding generation, and performance optimization.

Model Providers

Each provider is identified by a member of the ModelProviderName enum:

enum ModelProviderName {
  OPENAI,
  ANTHROPIC,
  CLAUDE_VERTEX,
  GROK,
  GROQ,
  LLAMACLOUD,
  LLAMALOCAL,
  GOOGLE,
  REDPILL,
  OPENROUTER,
  HEURIST,
}

Provider Configuration

Each provider entry defines its generation settings, its API endpoint, and the model used for each model class:

const models = {
  [ModelProviderName.ANTHROPIC]: {
    settings: {
      stop: [],
      maxInputTokens: 200000,
      maxOutputTokens: 8192,
      frequency_penalty: 0.0,
      presence_penalty: 0.0,
      temperature: 0.3,
    },
    endpoint: "https://api.anthropic.com/v1",
    model: {
      [ModelClass.SMALL]: "claude-3-5-haiku",
      [ModelClass.MEDIUM]: "claude-3-5-sonnet-20241022",
      [ModelClass.LARGE]: "claude-3-opus-20240229",
    },
  },
  // ... other providers
};
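
As a minimal sketch of how such a table might be consulted, the hypothetical helper resolveModel below looks up the model name for a provider and class, and treats a missing or empty entry as "not configured". The trimmed enums, the ProviderEntry type, and the helper itself are assumptions made for illustration; they are not OKai's actual lookup code.

```typescript
// Trimmed copies of the guide's enums, redeclared so the sketch is self-contained.
enum ModelProviderName {
  ANTHROPIC,
  HEURIST,
}

enum ModelClass {
  SMALL,
  MEDIUM,
  LARGE,
  EMBEDDING,
}

// Hypothetical entry shape: only the fields this sketch needs.
type ProviderEntry = {
  endpoint: string;
  model: Partial<Record<ModelClass, string>>;
};

const models: Record<ModelProviderName, ProviderEntry> = {
  [ModelProviderName.ANTHROPIC]: {
    endpoint: "https://api.anthropic.com/v1",
    model: {
      [ModelClass.SMALL]: "claude-3-5-haiku",
      [ModelClass.MEDIUM]: "claude-3-5-sonnet-20241022",
    },
  },
  [ModelProviderName.HEURIST]: {
    endpoint: "https://llm-gateway.heurist.xyz",
    model: {
      [ModelClass.SMALL]: "hermes-3-llama3.1-8b",
      [ModelClass.EMBEDDING]: "", // placeholder: no embedding model configured
    },
  },
};

// Hypothetical helper: returns the configured model name, or undefined when
// the provider has no entry (or an empty one) for the requested class.
function resolveModel(
  provider: ModelProviderName,
  modelClass: ModelClass,
): string | undefined {
  const name = models[provider].model[modelClass];
  return name ? name : undefined;
}
```

Returning undefined for an empty string lets callers handle a placeholder entry the same way as a missing one, instead of sending a request with an empty model name.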

Model Classes

Models are categorized into different classes based on their capabilities:

enum ModelClass {
    SMALL,     // Fast, efficient for simple tasks
    MEDIUM,    // Balanced performance and capability
    LARGE,     // Most capable but slower/more expensive
    EMBEDDING, // Specialized for vector embeddings
    IMAGE,     // Image generation capabilities
}

Embedding System

Configuration

const embeddingConfig = {
  dimensions: 1536,
  modelName: "text-embedding-3-small",
  cacheEnabled: true,
};

Implementation

async function embed(runtime: IAgentRuntime, input: string): Promise<number[]> {
  // Check cache first
  const cachedEmbedding = await retrieveCachedEmbedding(runtime, input);
  if (cachedEmbedding) return cachedEmbedding;

  // Generate new embedding
  const response = await runtime.fetch(
    `${runtime.modelProvider.endpoint}/embeddings`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${runtime.token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        input,
        model: runtime.modelProvider.model.EMBEDDING,
        dimensions: 1536,
      }),
    },
  );

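  // Surface HTTP errors instead of trying to parse an error body as a result
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }

  const data = await response.json();
  const embedding = data?.data?.[0]?.embedding;
  if (!Array.isArray(embedding)) {
    throw new Error("Embedding response did not contain an embedding vector");
  }
  return embedding;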
}

Fine-tuning Options

Temperature Control

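Temperature and the penalty settings trade determinism for variety: lower values give more repeatable output, higher values give more varied output.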

const temperatureSettings = {
  creative: {
    temperature: 0.8,
    frequency_penalty: 0.7,
    presence_penalty: 0.7,
  },
  balanced: {
    temperature: 0.5,
    frequency_penalty: 0.3,
    presence_penalty: 0.3,
  },
  precise: {
    temperature: 0.2,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
  },
};
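
One way to use these presets, sketched under assumptions, is to overlay a preset on a provider's default settings. The SamplingSettings and GenerationSettings types and the applyPreset helper are hypothetical names introduced for this example; the numeric values are the ones shown in this guide.

```typescript
type SamplingSettings = {
  temperature: number;
  frequency_penalty: number;
  presence_penalty: number;
};

type GenerationSettings = SamplingSettings & {
  stop: string[];
  maxInputTokens: number;
  maxOutputTokens: number;
};

type PresetName = "creative" | "balanced" | "precise";

const temperatureSettings: Record<PresetName, SamplingSettings> = {
  creative: { temperature: 0.8, frequency_penalty: 0.7, presence_penalty: 0.7 },
  balanced: { temperature: 0.5, frequency_penalty: 0.3, presence_penalty: 0.3 },
  precise: { temperature: 0.2, frequency_penalty: 0.0, presence_penalty: 0.0 },
};

// Hypothetical helper: returns a copy of `base` with the preset's sampling
// values applied. Token limits and stop sequences are left untouched.
function applyPreset(
  base: GenerationSettings,
  preset: PresetName,
): GenerationSettings {
  return { ...base, ...temperatureSettings[preset] };
}

// Defaults copied from the Anthropic entry in this guide.
const anthropicDefaults: GenerationSettings = {
  stop: [],
  maxInputTokens: 200000,
  maxOutputTokens: 8192,
  frequency_penalty: 0.0,
  presence_penalty: 0.0,
  temperature: 0.3,
};

const creativeSettings = applyPreset(anthropicDefaults, "creative");
```

Because applyPreset returns a new object, the provider defaults stay unchanged and different presets can be applied per request.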

Context Window

Each provider caps the number of tokens it accepts as input and produces as output:

const contextSettings = {
  OPENAI: {
    maxInputTokens: 128000,
    maxOutputTokens: 8192,
  },
  ANTHROPIC: {
    maxInputTokens: 200000,
    maxOutputTokens: 8192,
  },
  LLAMALOCAL: {
    maxInputTokens: 32768,
    maxOutputTokens: 8192,
  },
};
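
The limits above can be enforced before a request is sent. The sketch below is one possible approach, not OKai's implementation: it estimates token counts with a rough heuristic of four characters per token (an assumption that only loosely holds for English text) and keeps the most recent part of the context when the estimate exceeds the budget. The names estimateTokens and truncateToLimit are hypothetical; a real tokenizer for the target model would give accurate counts.

```typescript
type ContextLimits = {
  maxInputTokens: number;
  maxOutputTokens: number;
};

// Assumption: roughly 4 characters per token. Replace with a real tokenizer
// when exact counts matter.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical helper: returns the text unchanged when it fits, otherwise
// drops the oldest characters and keeps the tail that fits the input budget.
function truncateToLimit(text: string, limits: ContextLimits): string {
  if (estimateTokens(text) <= limits.maxInputTokens) return text;
  const maxChars = limits.maxInputTokens * CHARS_PER_TOKEN;
  return text.slice(text.length - maxChars);
}
```

Keeping the tail favors recent conversation turns; a caller that needs to preserve a system prompt would truncate the middle of the context instead.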

Performance Optimization

Caching Strategy

import NodeCache from "node-cache";
import path from "path";

class EmbeddingCache {
  private cache: NodeCache;
  private cacheDir: string;

  constructor() {
    this.cache = new NodeCache({ stdTTL: 300 }); // stdTTL is in seconds: 5 minute TTL
    this.cacheDir = path.join(__dirname, "cache");
  }

  async get(key: string): Promise<number[] | null> {
    // Check memory cache first
    const cached = this.cache.get<number[]>(key);
    if (cached) return cached;

    // Fall back to the disk cache (readFromDisk is not shown here)
    return this.readFromDisk(key);
  }

  async set(key: string, embedding: number[]): Promise<void> {
    // Write through to both tiers (writeToDisk is not shown here)
    this.cache.set(key, embedding);
    await this.writeToDisk(key, embedding);
  }
}
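
To illustrate the expiry behavior the memory tier relies on, here is a dependency-free sketch of a TTL cache. It is not how NodeCache is implemented; TtlCache and its injectable clock are hypothetical constructs that make the five-minute expiry easy to demonstrate and test.

```typescript
// Minimal in-memory cache whose entries expire after a fixed time-to-live.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so that expiry can be exercised without waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage with a fake clock and a 5 minute TTL.
let fakeNow = 0;
const ttlCache = new TtlCache<number[]>(5 * 60 * 1000, () => fakeNow);
ttlCache.set("hello", [0.1, 0.2]);

fakeNow = 5 * 60 * 1000 - 1;
const beforeExpiry = ttlCache.get("hello"); // still cached

fakeNow = 5 * 60 * 1000;
const afterExpiry = ttlCache.get("hello"); // expired and evicted
```

Eviction here happens only on read, so entries that are never requested again stay in memory; a periodic sweep would be needed for the regular cleanup recommended under Best Practices.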

Model Selection

async function selectOptimalModel(
  task: string,
  requirements: ModelRequirements,
): Promise<ModelClass> {
  if (requirements.speed === "fast") {
    return ModelClass.SMALL;
  } else if (requirements.complexity === "high") {
    return ModelClass.LARGE;
  }
  return ModelClass.MEDIUM;
}
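
The order of the checks matters: a request that is both fast and highly complex resolves to SMALL, because speed is tested first. The sketch below restates the rule synchronously to make that precedence visible. The guide does not define ModelRequirements, so the interface shown is a hypothetical shape, and pickModelClass is an illustrative name.

```typescript
enum ModelClass {
  SMALL,
  MEDIUM,
  LARGE,
}

// Hypothetical shape: the guide does not define ModelRequirements.
interface ModelRequirements {
  speed?: "fast" | "normal";
  complexity?: "low" | "high";
}

// Same rule as selectOptimalModel above, without the unused task parameter.
function pickModelClass(requirements: ModelRequirements): ModelClass {
  if (requirements.speed === "fast") return ModelClass.SMALL;
  if (requirements.complexity === "high") return ModelClass.LARGE;
  return ModelClass.MEDIUM;
}

const picks = {
  fastOnly: pickModelClass({ speed: "fast" }),
  complexOnly: pickModelClass({ complexity: "high" }),
  // Speed is checked first, so it wins over complexity.
  fastAndComplex: pickModelClass({ speed: "fast", complexity: "high" }),
  neither: pickModelClass({}),
};
```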

Provider-Specific Optimizations

OpenAI

const openAISettings = {
  endpoint: "https://api.openai.com/v1",
  settings: {
    stop: [],
    maxInputTokens: 128000,
    maxOutputTokens: 8192,
    frequency_penalty: 0.0,
    presence_penalty: 0.0,
    temperature: 0.6,
  },
  model: {
    [ModelClass.SMALL]: "gpt-4o-mini",
    [ModelClass.MEDIUM]: "gpt-4o",
    [ModelClass.LARGE]: "gpt-4o",
    [ModelClass.EMBEDDING]: "text-embedding-3-small",
    [ModelClass.IMAGE]: "dall-e-3",
  },
};

Anthropic

const anthropicSettings = {
  endpoint: "https://api.anthropic.com/v1",
  settings: {
    stop: [],
    maxInputTokens: 200000,
    maxOutputTokens: 8192,
    temperature: 0.3,
  },
  model: {
    [ModelClass.SMALL]: "claude-3-5-haiku",
    [ModelClass.MEDIUM]: "claude-3-5-sonnet-20241022",
    [ModelClass.LARGE]: "claude-3-opus-20240229",
  },
};

Local LLM

const llamaLocalSettings = {
  settings: {
    stop: ["<|eot_id|>", "<|eom_id|>"],
    maxInputTokens: 32768,
    maxOutputTokens: 8192,
    repetition_penalty: 0.0,
    temperature: 0.3,
  },
  model: {
    [ModelClass.SMALL]: "NousResearch/Hermes-3-Llama-3.1-8B-GGUF",
    [ModelClass.MEDIUM]: "NousResearch/Hermes-3-Llama-3.1-8B-GGUF",
    [ModelClass.LARGE]: "NousResearch/Hermes-3-Llama-3.1-8B-GGUF",
    [ModelClass.EMBEDDING]: "togethercomputer/m2-bert-80M-32k-retrieval",
  },
};

Heurist Provider

const heuristSettings = {
  settings: {
    stop: [],
    maxInputTokens: 32768,
    maxOutputTokens: 8192,
    repetition_penalty: 0.0,
    temperature: 0.7,
  },
  imageSettings: {
    steps: 20,
  },
  endpoint: "https://llm-gateway.heurist.xyz",
  model: {
    [ModelClass.SMALL]: "hermes-3-llama3.1-8b",
    [ModelClass.MEDIUM]: "mistralai/mixtral-8x7b-instruct",
    [ModelClass.LARGE]: "nvidia/llama-3.1-nemotron-70b-instruct",
    [ModelClass.EMBEDDING]: "", // Add later
    [ModelClass.IMAGE]: "FLUX.1-dev",
  },
};

Testing and Validation

Embedding Tests

async function validateEmbedding(
  embedding: number[],
  expectedDimensions: number = 1536,
): Promise<boolean> {
  if (!Array.isArray(embedding)) return false;
  if (embedding.length !== expectedDimensions) return false;
  // Number.isFinite rejects non-numbers as well as NaN and Infinity
  if (embedding.some((n) => !Number.isFinite(n))) return false;
  return true;
}

Model Performance Testing

async function benchmarkModel(
  runtime: IAgentRuntime,
  modelClass: ModelClass,
  testCases: TestCase[],
): Promise<BenchmarkResults> {
  // Annotate the type so the empty arrays are not inferred as never[]
  const results: BenchmarkResults = {
    latency: [],
    tokenUsage: [],
    accuracy: [],
  };

  for (const test of testCases) {
    const start = Date.now();
    const response = await runtime.generateText({
      context: test.input,
      modelClass,
    });
    results.latency.push(Date.now() - start);
    // ... additional metrics
  }

  return results;
}
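
Raw latency samples are easier to compare once summarized. As a sketch, the hypothetical summarizeLatency helper below reduces the latency array collected above to a mean, a 95th percentile (nearest-rank method), and a maximum. The LatencySummary type is an assumption; it is not part of BenchmarkResults as shown in this guide.

```typescript
// Hypothetical summary shape for a list of latency samples in milliseconds.
interface LatencySummary {
  count: number;
  meanMs: number;
  p95Ms: number;
  maxMs: number;
}

function summarizeLatency(samples: number[]): LatencySummary {
  if (samples.length === 0) {
    throw new Error("Cannot summarize an empty list of latency samples");
  }
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...samples].sort((a, b) => a - b);
  const sum = sorted.reduce((acc, ms) => acc + ms, 0);
  // Nearest-rank percentile: the smallest sample with at least 95% of all
  // samples at or below it.
  const rank = Math.ceil(0.95 * sorted.length);
  return {
    count: sorted.length,
    meanMs: sum / sorted.length,
    p95Ms: sorted[rank - 1],
    maxMs: sorted[sorted.length - 1],
  };
}

const latencySummary = summarizeLatency([120, 80, 100, 400, 100]);
```

With only a handful of samples the 95th percentile coincides with the maximum, so tail latency figures become meaningful only for larger test sets.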

Best Practices

Model Selection Guidelines

Task Complexity
- Use SMALL for simple, quick responses
- Use MEDIUM for balanced performance
- Use LARGE for complex reasoning

Context Management
- Keep prompts concise and focused
- Use context windows efficiently
- Implement proper context truncation

Temperature Adjustment
- Lower for factual responses
- Higher for creative tasks
- Balance based on use case

Performance Optimization

Caching Strategy
- Cache embeddings for frequently accessed content
- Implement tiered caching (memory/disk)
- Clean up the cache regularly

Resource Management
- Monitor token usage
- Implement rate limiting
- Optimize batch processing
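
For batch processing, one simple building block is splitting a list of inputs into fixed-size groups so that each group can be sent in a single request. The chunk helper below is a hypothetical sketch; whether a given provider accepts several inputs per request, and what batch size it allows, are assumptions to verify against that provider's documentation.

```typescript
// Splits `items` into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Five texts grouped into batches of two; the last batch is smaller.
const textBatches = chunk(["a", "b", "c", "d", "e"], 2);
```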

Troubleshooting

Common Issues

Token Limits

function handleTokenLimit(error: Error) {
  if (error.message.includes("token limit")) {
    return truncateAndRetry();
  }
}

Embedding Errors

function handleEmbeddingError(error: Error) {
  if (error.message.includes("dimension mismatch")) {
    return regenerateEmbedding();
  }
}

Model Availability

async function handleModelFailover(error: Error) {
  if (error.message.includes("model not available")) {
    return switchToFallbackModel();
  }
}
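
The guide does not show switchToFallbackModel, so the following is only one plausible shape for it: step down through model classes until an available one is found. The fallbackOrder list, the pickAvailable helper, and the availability callback are all hypothetical; a real failover might instead switch providers.

```typescript
enum ModelClass {
  SMALL,
  MEDIUM,
  LARGE,
}

// Hypothetical policy: fall back from the most capable class to the cheapest.
const fallbackOrder: ModelClass[] = [
  ModelClass.LARGE,
  ModelClass.MEDIUM,
  ModelClass.SMALL,
];

// Returns the preferred class if it is available, otherwise the next
// available class further down the fallback order, or undefined if none is.
function pickAvailable(
  preferred: ModelClass,
  isAvailable: (modelClass: ModelClass) => boolean,
): ModelClass | undefined {
  const start = fallbackOrder.indexOf(preferred);
  if (start === -1) return undefined;
  return fallbackOrder.slice(start).find(isAvailable);
}

// Example: LARGE is reported as unavailable, so MEDIUM is chosen.
const unavailable = new Set<ModelClass>([ModelClass.LARGE]);
const chosenClass = pickAvailable(
  ModelClass.LARGE,
  (modelClass) => !unavailable.has(modelClass),
);
```

The helper never moves up the order, so a request that asked for SMALL is not silently upgraded to a slower, more expensive model.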
