
The Ultimate Guide to nnU-Net for State of the Art Image Segmentation

François Porcher, Towards Data Science

During my research internship in Deep Learning and Neurosciences at Cambridge University, I used the nnU-Net a lot. It is an extremely strong baseline in semantic image segmentation. However, I struggled a little to fully understand the model and how to train it, and did not find much help on the internet. Now that I am comfortable with it, I created this tutorial to help you, either to better understand what is behind this model or to use it on your own dataset.

Throughout this guide, you will:

All code is available in this Google Colab notebook.

This work took me a significant amount of time and effort. If you find this content valuable, please consider following me to increase its visibility and help support the creation of more such tutorials!

The nnU-Net is recognized as a state-of-the-art model in image segmentation and performs strongly on both 2D and 3D images. Its performance is so robust that it serves as a strong baseline against which new computer vision architectures are benchmarked. In essence, if you are developing novel segmentation models, consider the nnU-Net your 'target to surpass'.

This powerful tool is based on the U-Net model, which made its debut in 2015 (you can find one of my tutorials here: Cook your first U-Net). The name "nnU-Net" stands for "No New U-Net", a nod to the fact that its design doesn't introduce revolutionary architectural changes. Instead, it takes the existing U-Net structure and squeezes out its full potential using a set of carefully engineered strategies.

Unlike many modern neural networks, the nnU-Net doesn't rely on residual connections, dense connections, or attention mechanisms.
Its strength lies in its meticulous configuration strategy. This covers resampling, normalization, the choice of loss function, optimizer settings, data augmentation, patch-based inference, and ensembling across models. This holistic approach allows the nnU-Net to push the boundaries of what's achievable with the original U-Net architecture.

While it might seem like a single model, the nnU-Net is in fact an umbrella term for three distinct types of U-Nets: the 2D U-Net, the 3D U-Net, and the U-Net Cascade.

Each of these architectures brings its own strengths and, inevitably, has certain limitations.

For instance, using a 2D U-Net for 3D image segmentation might seem counterintuitive, but in practice it can still be highly effective. This works by slicing the 3D volume into 2D planes and segmenting each slice.

A 3D U-Net may seem more sophisticated, given its higher parameter count, but it isn't always the best solution. In particular, 3D U-Nets often struggle with anisotropy, which occurs when the spatial resolution differs between axes (for example, 1 mm along the x-axis and 1.2 mm along the z-axis).

The U-Net Cascade variant is particularly handy for large images. It works in two stages:

1. A first 3D U-Net is trained on downsampled images and outputs a low-resolution segmentation.
2. That segmentation is upsampled and passed, together with the full-resolution image, to a second 3D U-Net, which refines it into the final output.

Typically, the methodology involves training all three model variants within the nnU-Net framework. The next step is either to choose the best performer among the three or to use ensembling, for example by combining the predictions of the 2D and 3D U-Nets.

However, this procedure can be quite time-consuming, and also costly, because you need GPU credits. If your constraints only allow you to train a single model, that is fine.
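Combining the 2D and 3D predictions can be done in more than one way. Below is a minimal NumPy sketch of two simple rules:

- Averaging the per-class probabilities ("soft voting").
- Following whichever model is most confident at each pixel.

The sketch assumes each model outputs a probability map of shape (n_classes, *spatial). The function names are hypothetical, and this is not the nnU-Net's own ensembling code; that code averages the softmax outputs of the models it combines.

```python
import numpy as np


def soft_vote(prob_maps):
    """Average per-class probabilities across models, then pick the argmax class per pixel.

    prob_maps: list of arrays, each of shape (n_classes, *spatial).
    """
    mean_probs = np.mean(np.stack(prob_maps), axis=0)  # (n_classes, *spatial)
    return np.argmax(mean_probs, axis=0)


def max_confidence_vote(prob_maps):
    """Per pixel, pick the class holding the single highest probability across all models."""
    stacked = np.stack(prob_maps)         # (n_models, n_classes, *spatial)
    best_per_class = stacked.max(axis=0)  # highest probability any model gives each class
    return np.argmax(best_per_class, axis=0)


if __name__ == "__main__":
    # Toy example: 3 classes, a single pixel, two models that disagree.
    model_2d = np.array([[0.5], [0.5], [0.0]])
    model_3d = np.array([[0.0], [0.4], [0.6]])
    print(soft_vote([model_2d, model_3d]))            # mean = [0.25, 0.45, 0.30] -> class 1
    print(max_confidence_vote([model_2d, model_3d]))  # max  = [0.5, 0.5, 0.6]   -> class 2
```

The toy example uses three classes because, with two models and two classes, both rules always pick the same class. The second rule matches what this article later calls hard voting. Note that "hard voting" more commonly refers to a majority vote over the predicted labels.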
You can choose to train only one model, since ensembling brings only marginal gains.

This table shows the best-performing model variant for specific datasets:

Image sizes vary enormously between datasets. Consider the median shape of 482 × 512 × 512 for liver images versus 36 × 50 × 35 for hippocampus images. The nnU-Net therefore automatically adapts the input patch size and the number of pooling operations per axis. This effectively adjusts the number of convolutional layers per dataset, so that spatial information can be aggregated over a sufficiently large context. Besides adapting to the image geometry, the configuration also takes technical constraints into account, such as the available GPU memory.

It's crucial to note that the model doesn't segment the entire image at once. Instead, it works on extracted patches with overlapping regions. The predictions on these patches are then averaged to produce the final segmentation.

However, a large patch means more memory usage, and the batch size also consumes memory. The trade-off is to always prioritize the patch size (the context the model sees) over the batch size (which mainly matters for optimization).

Here is the heuristic algorithm used to compute the optimal patch size and batch size:

And this is what it looks like for different datasets and input dimensions:

Great! Now let's quickly go over all the techniques used in the nnU-Net.

All models are trained from scratch and evaluated using five-fold cross-validation on the training set. The original training dataset is randomly divided into five equal parts, or 'folds'. In each round, four of these folds are used to train the model, and the remaining fold is used for evaluation.
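The five-fold scheme can be sketched in plain Python. This is a minimal illustration assuming cases are identified by strings. The function name and the fixed seed are hypothetical choices. The nnU-Net creates and stores its own split automatically, so you do not need to do this yourself.

```python
import random


def k_fold_splits(case_ids, n_folds=5, seed=0):
    """Return a list of (train_ids, val_ids) pairs, one per fold.

    Cases are shuffled once, then dealt round-robin into n_folds folds of
    (nearly) equal size; each fold serves exactly once as the validation set.
    """
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    folds = [ids[i::n_folds] for i in range(n_folds)]
    splits = []
    for k in range(n_folds):
        val_ids = folds[k]
        train_ids = [c for j, fold in enumerate(folds) if j != k for c in fold]
        splits.append((train_ids, val_ids))
    return splits


if __name__ == "__main__":
    cases = [f"subject_{i:02d}" for i in range(12)]
    for fold, (train_ids, val_ids) in enumerate(k_fold_splits(cases)):
        print(f"fold {fold}: {len(train_ids)} train / {len(val_ids)} val")
```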
This train-and-evaluate cycle is repeated five times, so that each fold is used exactly once as the evaluation set.

For the loss, we use a combination of Dice and Cross-Entropy loss, which is very common in image segmentation. More details on the Dice loss in V-Net, the U-Net's big brother.

The nnU-Net has a very strong data augmentation pipeline. The authors use random rotations, random scaling, random elastic deformations, gamma correction and mirroring.

NB: You can add your own transformations by modifying the source code.

As we said, the model does not predict directly on the full-resolution image. It predicts on extracted patches and then aggregates the predictions.

This is what it looks like:

NB: Within each patch, predictions near the center are given more weight than those near the borders (the nnU-Net uses a Gaussian importance map for this). The network sees more surrounding context there and performs better.

If you remember, we can train up to 3 different models: 2D, 3D, and cascade. But at inference time, can we only use one model at a time?

It turns out we can do better. Different models have different strengths and weaknesses, so we can combine the predictions of several models; for example, if one model is very confident, its prediction is prioritized. The nnU-Net tests every combination of 2 models among the 3 available and picks the best one.

In practice, there are 2 ways to do that:

Hard voting: For each pixel, we look at all the probabilities output by the 2 models and take the class with the single highest probability, i.e. we follow whichever model is most confident.

Soft voting: For each pixel, we average the probabilities of the models and then take the class with the maximum averaged probability.

Before we begin the practical part, you can download the dataset here and follow the Google Colab notebook.

If the first part did not fully make sense, no worries. This is the practical part: you just need to follow along, and you will still get very good results.

You need a GPU to train the model; otherwise it does not work.
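For reference, here is a minimal NumPy sketch of the combined Dice and Cross-Entropy loss for a single image. It assumes raw network outputs (logits) of shape (n_classes, *spatial) and an integer label map. It adds the pixel-wise cross-entropy to one minus the mean soft Dice over classes. The nnU-Net's own PyTorch implementation differs in details, such as how Dice is aggregated over a batch, so treat this as an illustration of the idea rather than a reimplementation. The function names and smoothing constants are hypothetical choices.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def dice_ce_loss(logits, target, smooth=1e-5, eps=1e-12):
    """Cross-entropy + (1 - mean soft Dice) for one image.

    logits: float array of shape (n_classes, *spatial)
    target: int array of shape (*spatial) with values in [0, n_classes)
    """
    n_classes = logits.shape[0]
    probs = softmax(logits, axis=0)
    one_hot = np.moveaxis(np.eye(n_classes)[target], -1, 0)  # (n_classes, *spatial)

    # Cross-entropy: average negative log-probability of the true class.
    ce = -np.mean(np.sum(one_hot * np.log(probs + eps), axis=0))

    # Soft Dice per class, computed over all spatial positions.
    spatial_axes = tuple(range(1, logits.ndim))
    intersection = np.sum(probs * one_hot, axis=spatial_axes)
    denominator = np.sum(probs, axis=spatial_axes) + np.sum(one_hot, axis=spatial_axes)
    dice = (2.0 * intersection + smooth) / (denominator + smooth)

    return ce + (1.0 - np.mean(dice))


if __name__ == "__main__":
    target = np.array([[0, 1], [1, 0]])
    uninformative = np.zeros((2, 2, 2))  # equal logits -> probability 0.5 everywhere
    print(dice_ce_loss(uninformative, target))  # about log(2) + 0.5, i.e. roughly 1.19
```

The cross-entropy term gives a per-pixel classification signal. The Dice term measures overlap per class, which makes it less sensitive to class imbalance, for example a small structure surrounded by a large background.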
You can train either locally or on Google Colab. On Colab, don't forget to switch the runtime to GPU (Runtime > Change runtime type).

First of all, you need a dataset with input images and their corresponding segmentations. You can follow my tutorial by downloading this ready-made dataset for 3D brain segmentation, and then replace it with your own dataset.

Download your data and place it in the data folder, in two subfolders named "input" (the images) and "ground_truth" (the segmentations).

For the rest of the tutorial I will use the MindBoggle dataset for image segmentation. You can download it from this Google Drive:

We are given 3D MRI scans of the brain, and we want to segment the white and gray matter:

It should look like this:

If you run this on Google Colab, set collab = True, otherwise collab = False.

Now we are going to define a function that creates folders for us:

And we use this function to create our "my_nnunet" folder, where everything is going to be saved.

Now we are going to install all the requirements. First, let's install the nnunet library. If you are in a notebook, run this in a cell:

Otherwise, you can install nnunet directly from the terminal with:

Next, we clone the nnU-Net git repository and NVIDIA apex. The repository contains the training scripts, and apex is an NVIDIA extension for PyTorch that provides mixed-precision training utilities.

nnU-Net requires a very specific folder structure.

The nnU-Net was originally designed for the Medical Segmentation Decathlon, a challenge with different tasks.
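For nnU-Net v1 (the nnunet package), the layout for one task can be sketched as follows. The sketch assumes v1 conventions:

- The three environment variables nnUNet_raw_data_base, nnUNet_preprocessed and RESULTS_FOLDER tell nnU-Net where to find raw data, preprocessed data and results.
- The raw data for each task goes under nnUNet_raw_data_base/nnUNet_raw_data/TaskXXX_Name, in imagesTr, labelsTr and imagesTs subfolders.

The helper names, the folder names for preprocessed data and results, and the task name Task501_BrainSeg are hypothetical; adapt them to your setup. nnU-Net v2 uses different variable and folder names. Variables set through os.environ only apply to the current process and the commands it launches.

```python
import os
from pathlib import Path


def make_dir(path):
    """Create `path` (and any missing parents) if it does not exist yet."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    return path


def setup_nnunet_v1_task(base_dir, task_name="Task501_BrainSeg"):
    """Create an nnU-Net v1 folder layout for one task and point the env vars at it."""
    base = Path(base_dir)
    raw_base = make_dir(base / "nnUNet_raw_data_base")
    task_dir = make_dir(raw_base / "nnUNet_raw_data" / task_name)
    for sub in ("imagesTr", "labelsTr", "imagesTs"):
        make_dir(task_dir / sub)
    preprocessed = make_dir(base / "nnUNet_preprocessed")
    results = make_dir(base / "nnUNet_trained_models")

    # nnU-Net v1 reads these variables to locate raw data, preprocessed data and results.
    os.environ["nnUNet_raw_data_base"] = str(raw_base)
    os.environ["nnUNet_preprocessed"] = str(preprocessed)
    os.environ["RESULTS_FOLDER"] = str(results)
    return task_dir
```

For example, calling setup_nnunet_v1_task("my_nnunet") builds the whole tree inside the my_nnunet folder. With several tasks, call it once per task name; the call is idempotent thanks to exist_ok=True.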
If you have several tasks, just run this cell for each of them.

You should now have a structure like this:

The scripts need to know where you put your raw data, where they can find the preprocessed data, and where they should save the results.

We define a function that will move our images to the right directories in the nnunet folder:

Now let's run this function for the input and ground truth images:

Now we have to rename the files so that they follow the nnU-Net naming format. For example, subject.nii.gz becomes subject_0000.nii.gz.

We are almost done!

You mostly need to modify 2 things:

This creates the dataset in the nnU-Net format.

We are now ready to train the models!

To train the 3D U-Net:

To train the 2D U-Net:

To train the cascade model:

Note: If you pause the training and want to resume it, add "-c" (for "continue") at the end of the command.

For example:

Now we can run the inference:

First, let's check the training loss. The curves look very healthy, and we reach a Dice score above 0.9 (green curve).

This is truly excellent for so little work on a 3D neuroimaging segmentation task.

Let's look at one sample:

The results are indeed impressive! The model has clearly learned to segment brain images with high accuracy. There may be minor imperfections, but the field of image segmentation is advancing rapidly, and we're making significant strides towards perfection.

In the future, there's scope to further optimize the performance of the nnU-Net, but that will be for another article.
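For reference, the two data-preparation steps from the practical part, renaming the images and writing dataset.json, can be sketched as below. The sketch assumes:

- Single-modality NIfTI images, hence the _0000 channel suffix; a second modality would use _0001.
- The nnU-Net v1 dataset.json format, where cases are listed without the channel suffix and label files keep their original names.

The function names, the dataset name and the label mapping in the usage example are hypothetical. Check which integer values your ground-truth files actually use for gray and white matter.

```python
import json
from pathlib import Path

SUFFIX = ".nii.gz"


def add_channel_suffix(image_dir, channel=0):
    """Rename case.nii.gz -> case_0000.nii.gz for every image in image_dir (not for labels)."""
    tag = f"_{channel:04d}"
    renamed = []
    for path in sorted(Path(image_dir).glob(f"*{SUFFIX}")):
        case_id = path.name[: -len(SUFFIX)]
        if case_id.endswith(tag):
            continue  # already renamed
        target = path.with_name(f"{case_id}{tag}{SUFFIX}")
        path.rename(target)
        renamed.append(target.name)
    return renamed


def write_dataset_json(task_dir, labels, modality="MRI", name="BrainSeg"):
    """Write an nnU-Net v1 style dataset.json listing the training and test cases."""
    task_dir = Path(task_dir)
    # Training cases come from the label files, which carry no channel suffix.
    train_ids = sorted(p.name[: -len(SUFFIX)] for p in (task_dir / "labelsTr").glob(f"*{SUFFIX}"))
    # Test cases come from the images, with the "_0000" channel suffix stripped.
    test_ids = sorted(
        p.name[: -len("_0000" + SUFFIX)] for p in (task_dir / "imagesTs").glob(f"*_0000{SUFFIX}")
    )
    dataset = {
        "name": name,
        "description": f"{name} segmentation",
        "tensorImageSize": "3D",
        "modality": {"0": modality},
        "labels": {str(value): label for value, label in labels.items()},
        "numTraining": len(train_ids),
        "numTest": len(test_ids),
        "training": [
            {"image": f"./imagesTr/{c}{SUFFIX}", "label": f"./labelsTr/{c}{SUFFIX}"}
            for c in train_ids
        ],
        "test": [f"./imagesTs/{c}{SUFFIX}" for c in test_ids],
    }
    (task_dir / "dataset.json").write_text(json.dumps(dataset, indent=4))
    return dataset
```

A typical call would be add_channel_suffix on imagesTr and imagesTs, followed by write_dataset_json(task_dir, {0: "background", 1: "gray matter", 2: "white matter"}); that label mapping is only an example. The renaming function skips files that already carry the suffix, so running it twice is harmless.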
François Porcher, AI Research Scientist at Meta | UC Berkeley X Cambridge. https://github.com/FrancoisPorcher



This post first appeared on VedVyas Articles; please read the original post here.
