System: You are a helpful AI assistant powered by LlamaPHP.

Thinking Mode

JSON Schema (Movie)

Ready

Prompt

Max Tokens

Temperature: 0.8 (range 0.0 to 2.0)

Top P: 0.9 (range 0.0 to 1.0)

Output will appear here...

Streaming shows tokens as they are generated, using the new generateStream() method.
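As a rough illustration of the streaming pattern described above, the sketch below consumes tokens one at a time and prints each as soon as it arrives. It is written in Python for illustration only: `generate_stream` and `stream_to_console` are hypothetical stand-ins with canned tokens, not the LlamaPHP `generateStream()` API, whose signature is not shown on this page.

```python
import sys
import time
from typing import Iterator, List


def generate_stream(prompt: str, max_tokens: int = 16) -> Iterator[str]:
    """Hypothetical stand-in for a streaming generator.

    Yields canned tokens one at a time; a real backend would yield
    tokens as the model produces them. The prompt is ignored here.
    """
    fake_tokens = ["Streaming", " shows", " tokens", " as", " they", " arrive", "."]
    for token in fake_tokens[:max_tokens]:
        time.sleep(0.01)  # simulate per-token latency
        yield token


def stream_to_console(prompt: str, max_tokens: int = 16) -> str:
    """Print each token immediately and return the full text at the end."""
    pieces: List[str] = []
    for token in generate_stream(prompt, max_tokens):
        sys.stdout.write(token)
        sys.stdout.flush()  # flush so the token is visible right away
        pieces.append(token)
    sys.stdout.write("\n")
    return "".join(pieces)


if __name__ == "__main__":
    stream_to_console("Tell me about streaming")
```

The key design point is that the consumer writes and flushes inside the loop rather than waiting for the complete response, which is what makes output appear incrementally.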

Prompt

Max Tokens

Temperature: 0.7 (range 0.0 to 2.0)

Ready to stream

Note: Your model must support embeddings (look for models with "all-MiniLM", "bge", or "e5" in the name).

[]

Dimension: -

[]

Dimension: -

Similarity: 0.00

Higher values indicate more similar meaning.
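The page does not say how the similarity score is computed. A common choice for comparing two embedding vectors is cosine similarity, so the sketch below assumes that metric. The function name `cosine_similarity` is illustrative and not part of LlamaPHP.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (assumed metric).

    Returns a value in [-1, 1]; higher means the vectors point in a more
    similar direction. Returns 0.0 if either vector has zero length.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treated as "no similarity"
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(f"{cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]):.2f}")
```

Because cosine similarity ignores vector length, two embeddings that differ only in scale score as identical, which is usually the desired behavior when comparing meaning.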

Model Information

/app/web/../models/qwen3-0.6b-q4_k_m.gguf

Size: 0.45 GB

Last modified: 2025-12-30 18:03:14

Model file found and accessible.

Llama.cpp Binary

/usr/local/bin/llama-cli

Executable: Yes

Permissions: 0755

Binary found and executable.

API Testing

Test the API endpoints directly: