Dhruv Kumar, Senior Solutions Architect, Databricks
Premal Shah, Azure Databricks PM, Microsoft
Bhanu Prakash, Azure Databricks PM, Microsoft
Written by: Priya Aswani, WW Data Engineering & AI Technical Lead

Table of Contents

Scalable ADB Deployments: Guidelines for Networking, Security, and Capacity Planning
  Deploy Workspaces in Multiple Subscriptions to Honor Azure Capacity Limits
  Consider Isolating Each Workspace in its own VNet
  Azure Databricks Deployment with limited private IP addresses
  Do not Store any Production Data in Default DBFS Folders
Deploying Applications on ADB: Guidelines for Selecting, Sizing, and Optimizing Clusters Performance
  Support Interactive analytics using Shared High Concurrency Clusters
  Support Batch ETL Workloads with Single User Ephemeral Standard Clusters
  Favor Cluster Scoped Init scripts over Global and Named scripts
  Use Cluster Log Delivery Feature to Manage Logs
  Arrive at Correct Cluster Size by Iterative Performance Testing
Running ADB Applications Smoothly: Guidelines on Observability and Monitoring
  Collect resource utilization metrics across Azure Databricks cluster in a Log Analytics workspace
    Querying VM metrics in Log Analytics once you have started the collection using the above document
  Cost Management, Chargeback and Analysis
Installation for being able to capture VM metrics in Log Analytics
  Step 1 - Create a Log Analytics Workspace

"A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away."

Planning, deploying, and running Azure Databricks (ADB) at scale requires making many architectural decisions. While each ADB deployment is unique to an organization's needs, we have found that some patterns are common across most successful ADB projects. Unsurprisingly, these patterns are also in line with modern cloud-centric development best practices.

This short guide summarizes these patterns into prescriptive and actionable best practices for Azure Databricks. We follow a logical path: planning the infrastructure, provisioning the workspaces, developing Azure Databricks applications, and finally, running Azure Databricks in production.

The audience of this guide is system architects, field engineers, and development teams of customers, Microsoft, and Databricks.