NEW REFERENCE ARCHITECTURE: Distributed training of deep learning models on Azure

Our sixth AI reference architecture (on the Azure Architecture Center) comes from AzureCAT Mike Wasson.

  • Distributed training of deep learning models on Azure

Reference architectures provide a consistent approach and best practices for a given solution. Each architecture includes recommended practices, along with considerations for scalability, availability, manageability, security, and more. This architecture includes a deployable solution as well. The full array of reference architectures is available on the Azure Architecture Center.

This reference architecture shows how to conduct distributed training of deep learning models across clusters of GPU-enabled virtual machines (VMs). The scenario is image classification, but the solution can be generalized to other deep learning scenarios, such as segmentation and object detection.
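To make the idea of distributed training more concrete, here is a minimal, self-contained simulation of synchronous data-parallel training, one common way to spread training across several GPU VMs. It is an illustrative sketch only, not the code shipped with the reference architecture: each simulated "worker" stands in for one VM, computes a gradient on its own shard of the data, and the gradients are averaged (the role an allreduce step plays on a real cluster) before every worker applies the same update. The toy linear model and the names `local_gradient`, `allreduce_mean`, and `train` are hypothetical.

```python
# Illustrative simulation of synchronous data-parallel SGD (hypothetical names).
# Each "worker" stands in for one GPU VM; here they run sequentially in one process.
import random


def local_gradient(w, b, shard):
    """Mean-squared-error gradient of the model y ~ w*x + b on one worker's shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def allreduce_mean(grads):
    """Average the per-worker gradients, as an allreduce step would on a cluster."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train(data, num_workers=4, lr=0.1, epochs=200):
    # Give each worker a disjoint shard of the dataset.
    shards = [data[i::num_workers] for i in range(num_workers)]
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # On a real cluster these gradients are computed in parallel on separate GPUs.
        grads = [local_gradient(w, b, s) for s in shards]
        gw, gb = allreduce_mean(grads)
        # Every worker applies the same averaged update, so model replicas stay in sync.
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    random.seed(0)
    xs = [random.uniform(-1, 1) for _ in range(400)]
    data = [(x, 3.0 * x + 0.5) for x in xs]
    w, b = train(data)
    print(f"w={w:.3f}, b={b:.3f}")
```

With equal-sized shards, the average of the per-worker gradients equals the full-batch gradient, which is why the workers' model copies remain identical after every step.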

This architecture consists of the following components:

  • Azure Batch AI plays the central role in this architecture, scaling resources up and down as needed.
  • Azure Blob Storage is used to stage the data.
  • Azure Files is used to store the scripts, logs, and final results of the training.
  • The Batch AI file server is a single-node NFS share used in this architecture to store the training data.
  • Docker Hub is used to store the Docker image that Batch AI uses to run the training. Azure Container Registry can be used instead.
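Because every node reads its training data from the same shared NFS share, each node needs a way to pick its own subset of the files. The sketch below shows one simple approach, rank-based round-robin sharding. It is a hypothetical illustration, not taken from the reference architecture's scripts: the function name `assign_shard` and the idea of passing in a file listing from the shared mount are assumptions.

```python
# Hypothetical sketch: split the files on a shared training-data mount across nodes.
def assign_shard(filenames, rank, world_size):
    """Return the files that node `rank` (0-based) out of `world_size` nodes should read."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    # Sort first so every node independently computes the same global order.
    ordered = sorted(filenames)
    # Round-robin striding yields disjoint shards whose sizes differ by at most one.
    return ordered[rank::world_size]


if __name__ == "__main__":
    files = [f"img_{i:03d}.jpg" for i in range(10)]
    print(assign_shard(files, 1, 3))  # ['img_001.jpg', 'img_004.jpg', 'img_007.jpg']
```

Sorting before striding matters: directory listings are not guaranteed to come back in the same order on every node, and without a shared order the shards could overlap or miss files.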

Topics covered include:

  • Performance considerations
  • Scalability considerations
  • Storage considerations
  • Security considerations
    • Restrict access to Azure Blob Storage
    • Encrypt data at rest and in motion
    • Secure data in a virtual network
  • Monitoring considerations
  • Deployment

Head over to the Azure Architecture Center to learn more about the Distributed training of deep learning models on Azure reference architecture.

See Also

Additional related AI reference architectures:

  • Batch scoring on Azure for deep learning models
  • Batch scoring of Python models on Azure
  • Real-time scoring of Python scikit-learn and deep learning models on Azure
  • Real-time scoring of R machine learning models
  • Build a real-time recommendation API on Azure

Find all our reference architectures on the Azure Architecture Center.

AzureCAT Guidance

"Hands-on solutions, with our heads in the Cloud!"
