mixvideo-v2/cargos/tvai-v2/视觉-语言模型配置/mt5-base-ViT-B-32.json

{
    "embed_dim": 512,
    "vision_cfg": {
        "image_size": 224,
        "layers": 12,
        "width": 768,
        "patch_size": 32
    },
    "text_cfg": {
        "hf_model_name": "google/mt5-base",
        "hf_tokenizer_name": "google/mt5-base",
        "hf_pooler_type": "mean_pooler"
    }
}
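
As a rough illustration of what these fields imply, the sketch below parses the same configuration and derives the patch grid of the vision tower from `image_size` and `patch_size`. The helper name `summarize` and the inlined `CONFIG_TEXT` string are hypothetical and exist only for this example; the sketch assumes the image is split into non-overlapping square patches, which is the usual ViT convention but is not stated in the file itself.

```python
import json

# Inlined copy of the config above so the example is self-contained.
CONFIG_TEXT = """
{
    "embed_dim": 512,
    "vision_cfg": {
        "image_size": 224,
        "layers": 12,
        "width": 768,
        "patch_size": 32
    },
    "text_cfg": {
        "hf_model_name": "google/mt5-base",
        "hf_tokenizer_name": "google/mt5-base",
        "hf_pooler_type": "mean_pooler"
    }
}
"""


def summarize(cfg: dict) -> dict:
    """Derive a few quantities from the config (hypothetical helper)."""
    vision = cfg["vision_cfg"]
    image_size = vision["image_size"]
    patch_size = vision["patch_size"]
    # Assumption: non-overlapping square patches, so the sizes must divide evenly.
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return {
        "embed_dim": cfg["embed_dim"],
        "patches_per_side": per_side,
        "num_patches": per_side * per_side,
        "text_model": cfg["text_cfg"]["hf_model_name"],
    }


summary = summarize(json.loads(CONFIG_TEXT))
print(summary)
```

With the values in this file, a 224-pixel image and 32-pixel patches give a 7 by 7 grid, that is 49 patches per image. Both towers are expected to project into the shared 512-dimensional space named by `embed_dim`.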