mixvideo-v2/cargos/tvai-v2/视觉-语言模型配置/ViT-M-16-alt.json

{
    "embed_dim": 384,
    "vision_cfg": {
        "image_size": 224,
        "layers": 12,
        "width": 512,
        "patch_size": 16,
        "ls_init_value": 1e-4
    },
    "text_cfg": {
        "context_length": 77,
        "vocab_size": 49408,
        "width": 384,
        "heads": 6,
        "layers": 12
    }
}