Microsoft brings out a small language model that can look at pictures

We’re seeing more ‘multimodal’ models – ones that can listen to audio or view images as well as read text. Microsoft just introduced one at its Build event. From The Verge:

Microsoft says Phi-3-vision, now available in preview, is a 4.2 billion parameter model (parameters refer to how complex a model is and how much of its training it understands) that can do general visual reasoning tasks like asking questions about charts or images…Small models can be used to power AI features on devices like phones and laptops without the need to take up too much computer memory.

Expect something like this to be integrated into next-generation smartphones and desktops.

Read The Verge’s report here.