MedGemma 1.5: New Medical AI for Images & MedASR Speech Model

AI is being used more and more in health care. Google offers AI tools called MedGemma models. These help developers build new health apps. Last year, Google shared the first MedGemma models. They are free for developers to use, and they also run on Google Cloud. Many people have downloaded these models, and the community has made many new versions.

MedGemma workflow showing steps from use case definition and model selection to scaling on Google Cloud.

This chart shows how to use MedGemma. It starts with your idea. Then you pick a model. Finally, you can use it on Google Cloud.

We are now releasing MedGemma 1.5 4B. We are also holding a contest on Kaggle called the MedGemma Impact Challenge. We listened to what users wanted, so this new model helps with many kinds of medical images. It works well with:

Detailed medical images like CT scans, MRI scans, and tissue samples.
Series of medical images taken over time, like chest X-rays taken months apart.
Finding where things are in chest X-rays.
Understanding medical lab reports to get important data.

MedGemma 1.5 4B is better than the old version. It works better with text, health records, and 2D images. The 4B model is small and can run without the internet. You can still use our 27B model for harder text tasks. Learn more about MedGemma 1.5 4B in its model card.

We also released MedASR. This is a model that turns speech into text for medical use. MedASR works well with MedGemma for complex tasks.

MedGemma 1.5, MedASR, and other Health AI Developer Foundations models are free. You can download them from Hugging Face. You can also use them on Google Cloud. These models are for research and business use.

MedGemma Collection infographic showing multimodal AI models for 2D imaging, text, speech, and advanced radiology.

This image shows the different MedGemma models. They can help with images, text, and speech.

The MedGemma Impact Challenge

We want to see how people can use MedGemma. We are hosting the MedGemma Impact Challenge. It is a contest on Kaggle with $100,000 in prizes. Anyone can join. You can build new AI tools for health. We are excited to see what you create!

Better medical image analysis

MedGemma was made to understand many types of medical data. It can read 2D medical images like X-rays and skin images. It can also read tissue slides.

MedGemma 1.5 can now read 3D images. This includes CT scans and MRI scans. It can also read whole tissue slides. You can give it many slices of an image. You can also ask it to find things in the images.
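A CT or MRI scan can have hundreds of slices, so a developer often has to choose which ones to send. Here is a minimal sketch of one simple way to do that: pick evenly spaced slices from the scan. The function name `sample_slice_indices` and the `max_slices` limit are assumptions made for this example. This is not MedGemma's own preprocessing, so check the official guides for the real input format.

```python
from typing import List


def sample_slice_indices(num_slices: int, max_slices: int) -> List[int]:
    """Pick up to max_slices evenly spaced slice indices from a scan.

    Hypothetical helper for illustration only.
    """
    if num_slices <= 0 or max_slices <= 0:
        return []
    if num_slices <= max_slices:
        # The scan is small enough to send every slice.
        return list(range(num_slices))
    if max_slices == 1:
        # With a single pick, use the middle slice.
        return [num_slices // 2]
    # Spread the picks from the first slice to the last, inclusive.
    step = (num_slices - 1) / (max_slices - 1)
    return [round(i * step) for i in range(max_slices)]
```

For example, `sample_slice_indices(101, 5)` picks the first slice, the last slice, and three slices spaced evenly between them.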

MedGemma 1.5 is more accurate than the first version. It is 3% better at finding issues in CT scans. It is 14% better at finding issues in MRI scans. It is also much better on tissue slides. Its score there is 0.49, which is close to the score of another model called PolyPath.

This new 3D image feature is based on our old tool for CT scans. We think MedGemma 1.5 is the first public AI model that can read 3D medical data while still reading 2D images and text. These features are new and can still be improved. Developers can make MedGemma better by training it with their own data. We have guides to help you use this new feature for CT scans and tissue images.

Diagram of MedGemma 1.5 analyzing abdominal CT slices, followed by an AI response and a radiologist’s evaluation.

This example shows how MedGemma 1.5 reads CT scans. A doctor then checks the AI's answer. Remember, developers must test and change the model for their needs.

MedGemma 1.5 4B also works better for other image tasks:

Finding things in X-rays: It is 35% better at finding body parts in chest X-rays. See our guide on how to do this.
Looking at X-rays over time: It is 5% better at checking chest X-rays taken over time. See the example below and our guide.
Reading general medical images: It is 3% better at reading X-rays, skin images, tissue slides, and eye images.
Getting data from lab reports: It is 18% better at finding lab types, values, and units in lab reports.
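To use the lab data in an app, the model's answer has to be turned into clean records. Here is a minimal sketch of that step. It assumes you asked the model to reply with a JSON list, and the field names `lab_type`, `value`, and `unit` are made up for this example. They are not an official MedGemma output format.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class LabResult:
    lab_type: str  # for example "Hemoglobin"
    value: float   # the numeric result
    unit: str      # for example "g/dL"


def parse_lab_results(model_output: str) -> List[LabResult]:
    """Parse a JSON list of lab entries (hypothetical schema).

    Entries that are incomplete or malformed are skipped.
    """
    try:
        entries = json.loads(model_output)
    except json.JSONDecodeError:
        return []
    if not isinstance(entries, list):
        return []
    results = []
    for entry in entries:
        if not isinstance(entry, dict):
            continue
        try:
            results.append(
                LabResult(
                    lab_type=str(entry["lab_type"]),
                    value=float(entry["value"]),
                    unit=str(entry["unit"]),
                )
            )
        except (KeyError, TypeError, ValueError):
            # Skip bad entries so a person can review them later.
            continue
    return results
```

Skipping bad entries instead of guessing keeps wrong numbers out of the data. An expert should still check the results.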

Bar chart comparing MedGemma 1 and 1.5, showing performance improvements across various medical imaging and report metrics.

MedGemma 1.5 4B is better at many tasks. It helps with 3D images, finding things in X-rays, and reading lab reports.

Diagram of MedGemma 1.5 analyzing chest X-rays, followed by an AI response and a radiologist’s evaluation.

This shows how MedGemma 1.5 looks at X-rays over time. A doctor checks the AI's answer. Remember, developers must test and change the model for their needs.

You can now use DICOM files with MedGemma on Google Cloud. This makes it easier to use MedGemma for medical images.

Better text understanding

MedGemma 1.5 4B is also better at reading medical text. It is 5% better at medical question answering. It is 22% better at finding information in electronic health records.

Bar chart showing MedGemma 1.5 4B outperforming version 1 in MedQA and EHRQA medical text accuracy scores.

MedGemma 1.5 4B is better at reading and understanding medical text.

MedASR: Listen to medical speech

Talking is important in health care. We made MedASR to turn medical speech into text. This model knows medical words. MedASR can be used to write down what doctors say. It can also be used to talk to MedGemma. We compared MedASR to another speech model. MedASR had 58% fewer mistakes on X-ray dictations. It had 82% fewer mistakes on other medical dictations. We have guides to help you use MedASR with MedGemma. Learn more in the MedASR model card.
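Speech models are often scored by word error rate (WER). This is the number of word changes needed to fix a transcript, divided by the number of words in the correct text. Here is a minimal sketch of that calculation. It assumes that "fewer mistakes" above means a relative drop in WER. The function names are made up for this example, and this is not the exact scoring code used for MedASR.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] is the edit distance between the reference words
    # seen so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def relative_error_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline's errors that the new model removes."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

With this measure, a model with a WER of 0.10 compared to a baseline of 0.20 has a relative reduction of 0.5, or 50% fewer mistakes.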

Flowcharts showing MedASR converting spoken dictation to transcripts and spoken prompts into answers via MedGemma 1.5.

MedASR can turn speech into text (top). It can also help you talk to MedGemma (bottom).
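The second flow can be sketched as two steps joined together: speech goes to text, and then the text goes to the language model. The sketch below is only an outline. The `transcribe` and `generate` arguments are stand-ins for whatever code you use to call MedASR and MedGemma. They are assumptions for this example and not real library functions.

```python
from typing import Callable, Dict, Optional


def answer_spoken_prompt(
    audio: bytes,
    transcribe: Callable[[bytes], str],
    generate: Callable[[str], str],
) -> Dict[str, Optional[str]]:
    """Run speech-to-text, then pass the transcript to a text model.

    transcribe and generate are hypothetical callables supplied by the caller.
    """
    transcript = transcribe(audio).strip()
    if not transcript:
        # Nothing was recognized, so do not query the text model.
        return {"transcript": "", "answer": None}
    return {"transcript": transcript, "answer": generate(transcript)}
```

Returning the transcript with the answer lets a person check what the speech model heard before trusting the reply.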

How people use MedGemma

Many companies are using MedGemma to build new health tools. For example, Qmed Asia uses MedGemma. They made a chat tool for medical guidelines. Doctors find it very helpful. The image features in MedGemma have been popular.

Taiwan's health care group used MedGemma to look at reports for lung cancer surgery. They pulled key data from over 30,000 reports. This helps them make better decisions for patients and may improve patient outcomes.

Many research papers mention MedGemma. It is used for reading medical text. It supports decisions made by medical teams. It is also used for reading mammograms and in other health situations.

Get started

You can find all MedGemma models and MedASR on Hugging Face. You can also use them on Vertex AI. Join the MedGemma Impact Challenge to show your ideas for new medical AI.

Visit our GitHub page for more guides. You can learn how to use the models and how to train them. We have a new guide for using learning methods. These methods help models learn new things well.

Visit the Health AI Developer Foundations site for more information. Sign up for our news to stay updated. If you need help, use our forum.

We are excited to see what you build. Please share your thoughts with us.

This table compares models MedGemma 1.5 4B, MedSigLIP, and MedASR, categorizing them by typical use cases, such as image interpretation or speech dictation, and ranks their compute requirements from low to high.

This table helps you choose the right model for your needs. It shows what each model can do.

About the data

We used many data sources to train and test these models. We used public and private data. All data is kept private and safe. We protect patient privacy.

Important note

Health AI Developer Foundations models, like MedGemma and MedASR, are meant to help developers create new health apps. They are a starting point. Developers must test and change these models for their specific use. These models are not for direct medical advice. They should not be used to make health decisions. The results from these models are not perfect. They need to be checked by experts. Always follow research and development rules. See the terms of use and policies for more details.
