October 5th 2023

Posted on Oct 5 by Markus Stoll • Originally published at Medium

In the field of machine learning, Vision Transformers (ViT) are a type of model used for image classification. Unlike traditional convolutional neural networks, ViTs process images with the transformer architecture, which was originally designed for natural language processing tasks. Fine-tuning these models for optimal performance can be a complex process.

In a previous article, I used an animation to show how the embeddings change during the fine-tuning process. I created it by performing Principal Component Analysis (PCA) on embeddings generated from the model checkpoints saved at various stages of fine-tuning.

Projection of embeddings with PCA during fine-tuning of a Vision Transformer (ViT) model [1] on CIFAR-10 [2]; Source: created by the author

The animation received over 200,000 impressions and was well received, with many readers asking how it was created. This article is for those readers and anyone else interested in creating similar visualizations.

In this article, I provide a comprehensive guide to creating such an animation, covering each step: fine-tuning, creation of embeddings, outlier detection, PCA, Procrustes, review, and creation of the animation. The complete code for the animation is also available in the accompanying notebook on GitHub.

The first step is to fine-tune the pre-trained google/vit-base-patch16-224-in21k Vision Transformer (ViT) model [1]. For this, we use the CIFAR-10 dataset [2], which contains 60,000 images in ten classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. To run the fine-tuning on CIFAR-10, you can follow the steps in the Hugging Face tutorial on image classification with transformers.
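As a concrete picture of how a ViT processes an image as a sequence: the model name google/vit-base-patch16-224-in21k indicates 16×16 patches at 224×224 resolution. An image is therefore split into (224/16)² = 196 patches, and each RGB patch flattens to 16 · 16 · 3 = 768 values before the learned linear projection. With the extra [CLS] token, the transformer sees 197 tokens. CIFAR-10 images are only 32×32, so they are resized to 224×224 during preprocessing. The NumPy sketch below covers only the splitting step. The function name patchify is hypothetical and not part of the Transformers library, and the real model additionally applies a learned projection and position embeddings.

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row (left to right, top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, p, cols, p, C) -> (rows, cols, p, p, C) -> (rows * cols, p * p * C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * c)

image = np.random.rand(224, 224, 3)  # stand-in for a CIFAR-10 image resized to 224x224
patches = patchify(image)
print(patches.shape)  # (196, 768)
```

The 196 patches plus the [CLS] token explain the sequence length of 197, and the flattened patch size of 768 happens to match the 768-dimensional hidden size of ViT-Base.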
Additionally, we use a TrainerCallback to store the loss values during training in a CSV file for later use in the animation.

To get enough checkpoints for the animation, it's important to save checkpoints frequently: set save_strategy="steps" and a low value for save_steps in TrainingArguments. Each frame in the animation corresponds to one checkpoint. During training, a folder is created for each checkpoint and the loss values are written to the CSV file; both are then ready for the next steps.

For each model checkpoint, we use AutoFeatureExtractor and AutoModel from the Transformers library to generate embeddings for the test split of the CIFAR-10 dataset. Each embedding is a 768-dimensional vector representing one of the 10,000 test images for one model checkpoint. Storing the embeddings in the same folder as their checkpoint keeps the results well organized.

We can use the OutOfDistribution class provided by the Cleanlab library to identify outliers based on the embeddings of each checkpoint. The resulting scores are then used to select the top 10 outliers for the animation.

```python
from cleanlab.outlier import OutOfDistribution

def get_ood(sorted_checkpoint_folder, df):
    ...  # omitted: load embedding_np, the embeddings for this checkpoint
    ood = OutOfDistribution()
    # One score per image; lower scores mean more atypical (likely outliers)
    ood_train_feature_scores = ood.fit_score(features=embedding_np)
    df["scores"] = ood_train_feature_scores
```

Using Principal Component Analysis (PCA) from the scikit-learn package, we visualize the embeddings in a 2D space by reducing the 768-dimensional vectors to 2 dimensions. Because PCA is recalculated for each time step, axis flips or rotations can cause large jumps in the animation. To address this issue, we apply an additional Procrustes analysis [3] from the SciPy package, which geometrically transforms each frame onto the last frame (passed in as pca_np) using only translation, rotation (which may include a reflection), and uniform scaling. This enables smoother transitions in the animation.

```python
from scipy.spatial import procrustes
from sklearn.decomposition import PCA

def make_pca(sorted_checkpoint_folder, pca_np):
    ...  # omitted: load embedding_np, the embeddings for this checkpoint
    embedding_np_flat = embedding_np.reshape(-1, 768)  # one 768-d vector per row
    pca = PCA(n_components=2)
    pca_np_new = pca.fit_transform(embedding_np_flat)  # shape (num_images, 2)
    # Align onto the reference frame pca_np (SciPy also standardizes both)
    _, pca_np_new, disparity = procrustes(pca_np, pca_np_new)
```

Before finalizing the entire animation, we review the results in Spotlight. For this, we use the first and the last checkpoint to generate embeddings, compute the PCA, and detect outliers. We then load the resulting DataFrame in Spotlight:

Embeddings for CIFAR-10: PCA and 8 worst outliers for the first and the last checkpoint of a short fine-tuning, visualized with Spotlight; Source: created by the author

Spotlight shows a table in the top left listing all fields of the dataset. On the top right, two PCA representations are displayed: one for the embeddings generated with the first checkpoint and one for the last checkpoint. The bottom section shows the selected images.

Disclaimer: The author of this article is also one of the developers of Spotlight.

For each checkpoint, we create an image, which we store alongside the checkpoint. We use the make_pca(...) and get_ood(...) functions to generate the 2D points representing the embeddings and to extract the top 8 outliers, respectively. The 2D points are plotted with colors corresponding to their classes. The outliers are sorted by their score, and their images are displayed in a high-score leaderboard. The training loss is loaded from the CSV file and plotted as a line graph.

Finally, all the images can be compiled into a GIF with a library such as imageio.

This article has provided a detailed guide on how to create an animation that visualizes the fine-tuning process of a Vision Transformer (ViT) model.
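The article does not show the loss-logging TrainerCallback from the fine-tuning step, so here is a minimal sketch of one possible version, assuming a CSV file with step and loss columns. The class name LossCSVCallback, the helper read_losses, and the file layout are hypothetical. on_log is the TrainerCallback hook that the Trainer calls whenever it logs metrics; training logs carry a "loss" key, while evaluation logs use keys such as "eval_loss". The import fallback only exists so that the sketch can run without transformers installed.

```python
import csv
import os

try:
    from transformers import TrainerCallback
except ImportError:  # lets this sketch run without transformers installed
    TrainerCallback = object


class LossCSVCallback(TrainerCallback):
    """Append (step, loss) rows to a CSV file whenever the Trainer logs a training loss."""

    def __init__(self, csv_path):
        self.csv_path = csv_path

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Training logs contain "loss"; evaluation logs use "eval_loss" and are skipped
        if not logs or "loss" not in logs:
            return
        write_header = not os.path.exists(self.csv_path)
        with open(self.csv_path, "a", newline="") as f:
            writer = csv.writer(f)
            if write_header:
                writer.writerow(["step", "loss"])
            writer.writerow([state.global_step, logs["loss"]])


def read_losses(csv_path):
    """Load the logged steps and losses, e.g. for the loss line graph."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return [int(r["step"]) for r in rows], [float(r["loss"]) for r in rows]
```

In a real run, the callback would be passed to the Trainer, for example Trainer(..., callbacks=[LossCSVCallback("loss.csv")]). How often a loss row is written depends on the logging_steps setting in TrainingArguments.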
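The get_ood and make_pca functions take a sorted_checkpoint_folder argument, but the article does not show how the checkpoints are ordered. The Trainer writes its checkpoints into folders named checkpoint-<step> inside the output directory, and a plain string sort would put checkpoint-1000 before checkpoint-200, scrambling the frame order of the animation. Here is a hedged sketch that sorts by the numeric step instead; the function name sorted_checkpoint_folders is hypothetical.

```python
import os
import re

_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def sorted_checkpoint_folders(output_dir):
    """Return the checkpoint folder paths in output_dir, ordered by training step.

    The step number is parsed and compared as an integer, because string
    sorting would place "checkpoint-1000" before "checkpoint-200".
    Other entries (files, log folders, ...) are ignored.
    """
    found = []
    for name in os.listdir(output_dir):
        match = _CHECKPOINT_RE.match(name)
        path = os.path.join(output_dir, name)
        if match and os.path.isdir(path):
            found.append((int(match.group(1)), path))
    return [path for _, path in sorted(found)]
```

Each returned folder then yields one frame: its embeddings, PCA projection, and outlier scores.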
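The embedding step leaves open which output of AutoModel becomes the 768-dimensional vector. A common choice, assumed here rather than taken from the article, is the [CLS] token (index 0) of last_hidden_state, which has shape (batch_size, 197, 768) for this model. The sketch works on NumPy arrays, which in PyTorch could be obtained via outputs.last_hidden_state.detach().cpu().numpy(). The helper names and the embeddings.npy file name are hypothetical.

```python
import os

import numpy as np

def cls_embeddings(hidden_state_batches):
    """Stack the [CLS] token of every image into an (N, hidden_size) array.

    Each batch is assumed to be a last_hidden_state array of shape
    (batch_size, num_tokens, hidden_size), with token 0 being [CLS].
    """
    return np.concatenate([batch[:, 0, :] for batch in hidden_state_batches], axis=0)

def save_embeddings(checkpoint_dir, embeddings, filename="embeddings.npy"):
    """Store the embeddings next to the checkpoint they were computed with."""
    path = os.path.join(checkpoint_dir, filename)
    np.save(path, embeddings)
    return path
```

Mean pooling over all tokens (batch.mean(axis=1)) would be an equally valid alternative; what matters for the animation is that every checkpoint uses the same choice and the same image order.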
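Cleanlab's OutOfDistribution returns one score per image, and lower scores indicate more atypical images, so the leaderboard picks the lowest scores. To illustrate the idea without Cleanlab, here is a simplified stand-in: Cleanlab's feature-based scoring is likewise built on distances to nearest neighbors, but knn_outlier_scores is not its exact formula, and both function names are hypothetical.

```python
import numpy as np

def knn_outlier_scores(features, k=10):
    """Score each row by its average distance to its k nearest neighbors.

    Returns scores in (0, 1]; lower means more atypical, the same direction
    as Cleanlab's OutOfDistribution scores. Simplified stand-in only.
    """
    # Pairwise Euclidean distances; O(N^2) memory, fine for a few thousand rows
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    dists = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    avg_knn = np.sort(dists, axis=1)[:, :k].mean(axis=1)
    scale = np.median(avg_knn)
    if scale == 0:
        scale = 1.0
    # Larger average neighbor distance -> lower score
    return np.exp(-avg_knn / scale)

def top_outliers(scores, n=8):
    """Indices of the n lowest-scoring (most outlying) rows, worst first."""
    return np.argsort(scores)[:n]
```

With a pandas DataFrame holding a scores column, like the one filled by get_ood, df.nsmallest(8, "scores") selects the same rows for the leaderboard.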
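To make the PCA and Procrustes step explicit, here is a NumPy-only sketch of the same idea. It swaps in hand-written SVD-based versions for scikit-learn's PCA and SciPy's procrustes, and the function names are hypothetical. PCA computed independently per checkpoint can flip the sign of an axis, which is a reflection; orthogonal Procrustes undoes this together with rotation, translation, and uniform scaling. The sketch assumes that every checkpoint's embeddings list the same test images in the same order, because Procrustes matches points row by row.

```python
import numpy as np

def pca_2d(embeddings):
    """Project (N, D) embeddings onto their first two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, sorted by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def align_to(reference, target):
    """Procrustes-align target (N, 2) onto reference (N, 2).

    Finds the translation, uniform scale, and orthogonal matrix (rotation,
    possibly with a reflection) that map target onto reference with the
    smallest squared error, and returns target in reference's coordinates.
    """
    ref_mean, tgt_mean = reference.mean(axis=0), target.mean(axis=0)
    a, b = target - tgt_mean, reference - ref_mean
    u, s, vt = np.linalg.svd(a.T @ b)
    rotation = u @ vt
    scale = s.sum() / np.sum(a ** 2)
    return scale * a @ rotation + ref_mean

def aligned_frames(embeddings_per_checkpoint):
    """Compute PCA per checkpoint and align each frame to the previous aligned one."""
    frames = [pca_2d(e) for e in embeddings_per_checkpoint]
    aligned = [frames[0]]
    for frame in frames[1:]:
        aligned.append(align_to(aligned[-1], frame))
    return aligned
```

One difference from SciPy: scipy.spatial.procrustes standardizes both inputs (centered, unit Frobenius norm) and returns them, whereas align_to expresses the target in the reference frame's own coordinates. Chaining each frame to the previously aligned one is one option; aligning every frame to a single fixed frame is another.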
We’ve walked through the steps of generating and analyzing embeddings, visualizing the results, and creating an animation that brings these elements together.

Creating such an animation not only helps in understanding the complex process of fine-tuning a ViT model but also serves as a powerful tool for communicating these concepts to others.

The complete code for the animation is available in the accompanying notebook on GitHub.

I am a professional with expertise in creating advanced software solutions for the interactive exploration of unstructured data. I write about unstructured data and use powerful visualization tools to analyze it and make informed decisions.

[1] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby, An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020), arXiv
[2] Alex Krizhevsky, Learning Multiple Layers of Features from Tiny Images (2009), University of Toronto
[3] John C. Gower, Generalized Procrustes Analysis (1975), Psychometrika

This post first appeared on VedVyas Articles.
