[18:41:12] llama_context: flash_attn = auto
[18:41:12] llama_context: kv_unified = true
[18:41:12] llama_context: freq_base = 500000.0
[18:41:12] llama_context: freq_scale = 1
[18:41:12] llama_context: n_ctx_seq (50688) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
[18:41:12] llama_context: CUDA_Host output buffer size = 2.31 MiB
[18:41:12] llama_kv_cache: CUDA0 KV buffer size = 1980.00 MiB
[18:41:12] llama_kv_cache: size = 1980.00 MiB ( 50688 cells, 40 layers, 4/1 seqs), K (f16): 990.00 MiB, V (f16): 990.00 MiB
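The KV buffer size logged above can be sanity-checked. With 50688 cells, 40 layers, and f16 storage (2 bytes per element), a per-layer K/V width of 256 elements (an assumption — the log does not print this value; it is inferred so the numbers balance, e.g. from grouped-query attention) yields exactly 990 MiB for K and 990 MiB for V:

```python
# Sanity check of the logged KV cache size. The per-layer K/V width of 256
# elements is an inferred assumption chosen to match the logged 990 MiB,
# not a value printed by the server.
n_ctx = 50688      # KV cells (from the log line above)
n_layer = 40       # layers (from the log line above)
kv_width = 256     # assumed K elements per layer (same for V)
bytes_f16 = 2      # f16 element size in bytes

k_bytes = n_ctx * n_layer * kv_width * bytes_f16
k_mib = k_bytes / (1024 * 1024)
print(f"K (f16): {k_mib:.2f} MiB, K+V total: {2 * k_mib:.2f} MiB")
# prints: K (f16): 990.00 MiB, K+V total: 1980.00 MiB
```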
[18:41:12] sched_reserve: reserving ...
[18:41:12] sched_reserve: Flash Attention was auto, set to enabled
[18:41:12] sched_reserve: CUDA0 compute buffer size = 392.25 MiB
[18:41:12] sched_reserve: CUDA_Host compute buffer size = 107.02 MiB
[18:41:12] sched_reserve: graph nodes = 1487
[18:41:12] sched_reserve: graph splits = 2
[18:41:12] sched_reserve: reserve took 21.77 ms, sched copies = 1
[18:41:12] common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
[18:41:12] clip_model_loader: model name: Glm-4.6V
[18:41:12] clip_model_loader: description:
[18:41:12] clip_model_loader: GGUF version: 3
[18:41:12] clip_model_loader: alignment: 32
[18:41:12] clip_model_loader: n_tensors: 182
[18:41:12] clip_model_loader: n_kv: 33
[18:41:12] clip_model_loader: has vision encoder
[18:41:12] clip_ctx: CLIP using CUDA0 backend
[18:41:12] load_hparams: projector: glm4v
[18:41:12] load_hparams: n_embd: 1536
[18:41:12] load_hparams: n_head: 12
[18:41:12] load_hparams: n_ff: 10944
[18:41:12] load_hparams: n_layer: 24
[18:41:12] load_hparams: ffn_op: silu
[18:41:12] load_hparams: projection_dim: 4096
[18:41:12] --- vision hparams ---
[18:41:12] load_hparams: image_size: 336
[18:41:12] load_hparams: patch_size: 14
[18:41:12] load_hparams: has_llava_proj: 0
[18:41:12] load_hparams: minicpmv_version: 0
[18:41:12] load_hparams: n_merge: 2
[18:41:12] load_hparams: n_wa_pattern: 0
[18:41:12] load_hparams: image_min_pixels: 6272
[18:41:12] load_hparams: image_max_pixels: 3211264
[18:41:12] load_hparams: model size: 1639.67 MiB
[18:41:12] load_hparams: metadata size: 0.06 MiB
[18:41:17] warmup: warmup with image size = 1288 x 1288
[18:41:17] alloc_compute_meta: CUDA0 compute buffer size = 515.05 MiB
[18:41:17] alloc_compute_meta: CPU compute buffer size = 19.11 MiB
[18:41:17] alloc_compute_meta: graph splits = 1, nodes = 632
[18:41:17] warmup: flash attention is enabled
[18:41:17] srv load_model: loaded multimodal model, 'C:\Users\marcv\.lmstudio\models\unsloth\GLM-4.6V-Flash-GGUF\mmproj-F16.gguf'
[18:41:17] srv load_model: initializing slots, n_slots = 4
[18:41:17] slot load_model: id 0 | task -1 | new slot, n_ctx = 50688
[18:41:17] slot load_model: id 1 | task -1 | new slot, n_ctx = 50688
[18:41:17] slot load_model: id 2 | task -1 | new slot, n_ctx = 50688
[18:41:17] slot load_model: id 3 | task -1 | new slot, n_ctx = 50688
[18:41:17] srv load_model: prompt cache is enabled, size limit: 8192 MiB
[18:41:17] srv load_model: use `--cache-ram 0` to disable the prompt cache
[18:41:17] srv load_model: for more info see https://github.com/ggml-org/llama.cpp/pull/16391
[18:41:17] init: chat template, example_format: '[gMASK]<|system|>
[18:41:17] You are a helpful assistant<|user|>
[18:41:17] Hello<|assistant|>
[18:41:17]
[18:41:17] Hi there<|user|>
[18:41:17] How are you?<|assistant|>
[18:41:17] '
[18:41:17] srv init: init: chat template, thinking = 0
[18:41:17] main: model loaded
[18:41:17] main: server is listening on http://0.0.0.0:8080
[18:41:17] main: starting the main loop...
[18:41:17] srv update_slots: all slots are idle
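At this point the server accepts OpenAI-compatible requests on the address from the "server is listening" line; the later `POST /v1/chat/completions ... 200` entries confirm the endpoint. A minimal sketch of such a request body — the model name here is a placeholder (llama-server serves its single loaded model regardless), and only the host/port and route come from the log:

```python
import json

# Sketch of an OpenAI-compatible chat request for this server.
# "glm-4.6v-flash" is a placeholder model name, not taken from the log.
url = "http://127.0.0.1:8080/v1/chat/completions"
payload = {
    "model": "glm-4.6v-flash",  # placeholder; the server uses its loaded model
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello"},
    ],
    "max_tokens": 256,
}
body = json.dumps(payload)
print(body)
# Send with e.g.:
#   curl -X POST http://127.0.0.1:8080/v1/chat/completions \
#        -H 'Content-Type: application/json' -d "$body"
```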
[18:41:28] srv params_from_: Chat format: GLM 4.5
[18:41:28] slot get_availabl: id 3 | task -1 | selected slot by LRU, t_last = -1
[18:41:28] slot launch_slot_: id 3 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
[18:41:28] slot launch_slot_: id 3 | task 0 | processing task, is_child = 0
[18:41:28] slot update_slots: id 3 | task 0 | new prompt, n_ctx_slot = 50688, n_keep = 0, task.n_tokens = 275
[18:41:28] slot update_slots: id 3 | task 0 | n_tokens = 0, memory_seq_rm [0, end)
[18:41:28] slot update_slots: id 3 | task 0 | prompt processing progress, n_tokens = 275, batch.n_tokens = 275, progress = 1.000000
[18:41:28] slot update_slots: id 3 | task 0 | prompt done, n_tokens = 275, batch.n_tokens = 275
[18:41:28] slot init_sampler: id 3 | task 0 | init sampler, took 0.06 ms, tokens: text = 275, total = 275
[18:41:32] slot print_timing: id 3 | task 0 |
[18:41:32] prompt eval time = 189.57 ms / 275 tokens ( 0.69 ms per token, 1450.64 tokens per second)
[18:41:32] eval time = 3576.15 ms / 312 tokens ( 11.46 ms per token, 87.24 tokens per second)
[18:41:32] total time = 3765.72 ms / 587 tokens
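The throughput figures in the timing block above can be recomputed directly from the raw numbers (expect last-digit drift, since the printed millisecond values are themselves rounded — e.g. 1450.65 here versus the logged 1450.64):

```python
# Recompute tokens/second from the logged prompt-eval and eval timings.
prompt_ms, prompt_tokens = 189.57, 275
eval_ms, eval_tokens = 3576.15, 312

prompt_tps = prompt_tokens / (prompt_ms / 1000)  # ~1450.65 tok/s (log: 1450.64)
eval_tps = eval_tokens / (eval_ms / 1000)        # ~87.24 tok/s
print(f"prompt: {prompt_tps:.2f} tok/s, eval: {eval_tps:.2f} tok/s")
```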
[18:41:32] slot release: id 3 | task 0 | stop processing: n_tokens = 586, truncated = 0
[18:41:32] srv update_slots: all slots are idle
[18:41:32] srv log_server_r: request: POST /v1/chat/completions 100.x.x.x 200
[18:41:51] srv params_from_: Chat format: GLM 4.5
[18:41:51] slot get_availabl: id 2 | task -1 | selected slot by LRU, t_last = -1
[18:41:51] slot launch_slot_: id 2 | task -1 | sampler chain: logits -> penalties -> ?dry -> ?top-n-sigma -> top-k -> ?typical -> top-p -> ?min-p -> ?xtc -> temp-ext -> dist
[18:41:51] slot launch_slot_: id 2 | task 313 | processing task, is_child = 0
[18:41:51] slot update_slots: id 2 | task 313 | new prompt, n_ctx_slot = 50688, n_keep = 0, task.n_tokens = 53
[18:41:51] slot update_slots: id 2 | task 313 | n_tokens = 0, memory_seq_rm [0, end)
[18:41:51] slot update_slots: id 2 | task 313 | prompt processing progress, n_tokens = 53, batch.n_tokens = 53, progress = 1.000000
[18:41:51] slot update_slots: id 2 | task 313 | prompt done, n_tokens = 53, batch.n_tokens = 53
[18:41:51] slot init_sampler: id 2 | task 313 | init sampler, took 0.01 ms, tokens: text = 53, total = 53
[18:41:59] slot print_timing: id 2 | task 313 |
[18:41:59] prompt eval time = 160.95 ms / 53 tokens ( 3.04 ms per token, 329.29 tokens per second)
[18:41:59] eval time = 7080.81 ms / 612 tokens ( 11.57 ms per token, 86.43 tokens per second)
[18:41:59] total time = 7241.76 ms / 665 tokens
[18:41:59] slot release: id 2 | task 313 | stop processing: n_tokens = 664, truncated = 0
[18:41:59] srv update_slots: all slots are idle
[18:41:59] srv log_server_r: request: POST /v1/chat/completions 100.x.x.x 200