Databricks Integration
Integrating Databricks with Anaconda Package Security Manager (Cloud) enables organizations to maintain security and compliance while leveraging the power of both platforms.
For data science teams working in regulated environments, this integration provides essential security controls over Python package usage. Your organization can enforce security policies and maintain consistent environments across development and production. This helps prevent the use of unauthorized or vulnerable packages while providing comprehensive audit trails of package usage across your Databricks workspaces.
This guide explains how to set up a secure, customized Python environment in Databricks using packages from Anaconda’s Package Security Manager (Cloud).
Prerequisites
Before starting, ensure you have:
- Administrator access to an Anaconda organization
- An Anaconda organization access token
- Docker installed on your local machine
- A Databricks workspace with admin privileges
Setup and configuration
Create a Channel
- Sign in to Anaconda.com.
- Click Channels.
- Click Add Channel.
- Name your channel `databricks`.
- Set the channel’s Type to Virtual.
- Open the Source dropdown and select main.
- Set the channel’s Access to Internal.
- Click Save.
Create and apply a policy
- Click Create under POLICIES.
- Name your policy `databricks`.
- Configure the policy filter as follows:
  - Exclude package if: Platform Is not `linux-64` and Platform Is not `noarch`

  This filter excludes every platform except `linux-64` and `noarch`, the only platforms Databricks clusters use.
- Click Save.
- Apply your policy to the `databricks` channel you created earlier. For more information, see Applying a Policy.
Build a Custom Docker Image
To create a secure Python environment in Databricks, you’ll need to build a custom Docker image using Databricks Container Service. This image will contain your conda-based environment and can be used when launching your Databricks cluster.
For more information, see Customize containers with Databricks Container Service and GitHub - databricks/containers.
- Create a directory on your local machine called `dcs-conda` by running the following command:
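  For example, in a terminal:

  ```bash
  mkdir dcs-conda
  ```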
- Enter your new `dcs-conda` directory and create a `Dockerfile` inside it:
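  For example:

  ```bash
  cd dcs-conda
  touch Dockerfile
  ```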
- Add the following content to the `Dockerfile`:
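  A minimal sketch, modeled on the conda examples in the databricks/containers repository. The base image tag, the `/databricks/conda` install path, the `<YOUR_ORG>` channel URL, and the `DEFAULT_DATABRICKS_ROOT_CONDA_ENV` variable are assumptions to verify against that repository and your channel’s page on Anaconda.com:

  ```dockerfile
  # Example base image for Databricks Container Service; pick a tag that
  # matches your cluster's runtime (see the databricks/containers repo).
  FROM databricksruntime/minimal:latest

  # Install the tools needed to fetch the Miniconda installer.
  RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

  # Install Miniconda.
  RUN curl -fsSL https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -o /tmp/miniconda.sh \
      && bash /tmp/miniconda.sh -b -p /databricks/conda \
      && rm /tmp/miniconda.sh

  # Register your Package Security Manager channel. <YOUR_ORG> is a
  # placeholder; copy the exact (tokenized) URL from your channel's page.
  RUN /databricks/conda/bin/conda config --system --add channels \
      https://repo.anaconda.cloud/repo/<YOUR_ORG>/databricks

  # Create the environment defined in env.yml.
  COPY env.yml /databricks/.conda-env-def/env.yml
  RUN /databricks/conda/bin/conda env create --file /databricks/.conda-env-def/env.yml

  # Activate this environment on cluster start (name must match env.yml);
  # this variable comes from the conda images in databricks/containers.
  ENV DEFAULT_DATABRICKS_ROOT_CONDA_ENV=dcs-conda
  ENV PATH=/databricks/conda/envs/dcs-conda/bin:/databricks/conda/bin:$PATH
  ```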
- Create an `env.yml` file inside the `dcs-conda` directory:
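  For example:

  ```bash
  touch env.yml
  ```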
- Add the following content to the `env.yml` file:
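  A sketch only: the environment name must match the `DEFAULT_DATABRICKS_ROOT_CONDA_ENV` value in the Dockerfile, the channel URL placeholder is an assumption, and the pinned versions are examples:

  ```yaml
  name: dcs-conda
  channels:
    # Assumption - replace with your channel's exact URL from Anaconda.com
    - https://repo.anaconda.cloud/repo/<YOUR_ORG>/databricks
  dependencies:
    - python=3.11   # example - match your Databricks Runtime's version
    - pip
    - ipython
  ```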
  Please check the recommended package versions in the System environment section of the Databricks Runtime release notes and compatibility documentation.
- Build the Docker image:
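  For example (the image name `dcs-conda` is an example):

  ```bash
  docker build -t dcs-conda:latest .
  ```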
- Tag and push your custom image to a Docker registry by running the following commands:
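  For example, with Docker Hub (`<your-registry-account>` is a placeholder for your own account or registry path):

  ```bash
  docker tag dcs-conda:latest <your-registry-account>/dcs-conda:latest
  docker push <your-registry-account>/dcs-conda:latest
  ```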
Launch a Cluster using Databricks Container Service
Clients must be authorized to access Databricks resources using a Databricks account with appropriate permissions. Without proper access, CLI commands and REST API calls will fail. Permissions can be configured by a workspace administrator.
Databricks recommends using OAuth for authorization instead of Personal Access Tokens (PATs). OAuth tokens refresh automatically and reduce security risks associated with token leaks or misuse. For more information, see Authorizing access to Databricks resources.
- Open your Databricks workspace.
- Select Compute from the left-hand navigation, then click Create compute.
- On the New compute page, specify the Cluster Name.
- Under Performance, set the Databricks Runtime Version to a version that supports Databricks Container Service, for example Runtime: 15.4-LTS.
  This version is under long-term support (LTS). For more information, see Databricks support lifecycles.
  Databricks Runtime for Machine Learning does not support Databricks Container Service.
- Open the Advanced options dropdown and select the Spark tab.
- Add the Spark configurations your environment requires. To access volumes on Databricks Container Service, add the following configuration to the compute’s Spark config field as well: `spark.databricks.unityCatalog.volumes.enabled true`
- Select the Docker tab.
- Select the Use your own Docker container checkbox.
- Enter your custom Docker image in the Docker Image URL field.
- Open the Authentication dropdown and select an authentication method.
- Click Create compute.
Create a Notebook and connect it to your cluster
- Click New in the top-left corner, then click Notebook.
- Specify a name for the notebook.
- Click Connect, then select your cluster from the resource list.
Verify your conda installation
- In your notebook, run one of the following commands to check that conda is installed: `!conda --help` or `%sh conda --help`. Both commands run shell code from the notebook. `!conda --help` runs the command in the current shell; `%sh conda --help` starts a subshell, which is useful for multi-line scripts, but might not have the same environment or path.
- In your notebook, run the following command to check your source channels:
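  For example, conda’s standard command for listing the configured channels:

  ```bash
  !conda config --show channels
  ```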
Install MLflow from your Anaconda organization channel
MLflow is available through your Anaconda organization channel for use in your Databricks environment.
- In your notebook, install MLflow from your Anaconda organization channel:
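  A sketch, assuming the Dockerfile above already registered your channel with conda (otherwise, pass your channel’s URL with `-c`):

  ```bash
  !conda install -y mlflow
  ```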
  This command installs MLflow and all of its dependencies from your Package Security Manager channel.
- In your notebook, verify the installation:
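  For example, confirm that the package and its version appear in the environment:

  ```bash
  !conda list mlflow
  ```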