- Before you begin, ensure that conda is installed on your machine. You can install conda using either Anaconda Distribution or Miniconda.
- You must have a sentence-similarity type model downloaded onto your local machine.
Setting up your environment
When working on a new conda project, it is recommended that you create a new environment for development. Follow these steps to set up an environment for your embedding application:
- Open Anaconda Prompt (Terminal on macOS/Linux).
  This terminal can be opened from within an IDE (JupyterLab, PyCharm, VSCode, Spyder), if preferred.
- Create a conda environment named content-compare to develop your embedding application, and install the packages you’ll need into it.
- Activate your newly created conda environment by running conda activate content-compare.
Building the text comparator
Below, you’ll find the code snippets you need to build your text comparator, with explanations for each step to help you understand how the application works.
The text comparator combines two methods for comparing text: semantic similarity using embeddings, and structural similarity using Levenshtein distance. Semantic similarity tells us how close the meanings of two texts are, while Levenshtein distance measures how similar the actual characters are by counting the single-character edits (insertions, deletions, and substitutions) needed to turn one string into the other. Together, these methods show whether two text strings look alike, mean the same thing, or both.
Using your preferred IDE, create a new file and name it similarian.py.
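To make the structural half of the comparison concrete, here is a standalone illustration (not one of the snippets for similarian.py) of Levenshtein distance computed in plain Python. The tutorial’s own code relies on a library for this calculation; the conversion of the distance into a 0 to 1 score by dividing by the longer string’s length is an assumption made for this sketch, not necessarily what the application does.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # At the start of each outer iteration, previous[j] holds the distance
    # between the first i-1 characters of a and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


def structural_similarity(a: str, b: str) -> float:
    """ASSUMPTION: normalize by the longer string so 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein_distance(a, b) / longest


print(levenshtein_distance("kitten", "sitting"))             # 3
print(round(structural_similarity("kitten", "sitting"), 3))  # 0.571
```

Turning "kitten" into "sitting" takes three edits (k→s, e→i, and inserting g), so the normalized score is 1 - 3/7.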
Importing libraries
The application we are building requires libraries to handle HTTP requests, numerical operations, and string similarity calculations. Add the following lines of code to the top of your similarian.py file:
Setting the base_url
For the application to run server health checks, generate embeddings, and perform other actions programmatically, it needs to know exactly where to send each request: the API server and its endpoints.
The URLs for these API endpoints are constructed by combining a base_url with a specific /endpoint for each function. You can construct the base_url by combining the Server Address and Server Port specified in Anaconda AI Navigator, like this: http://<SERVER_ADDRESS>:<SERVER_PORT>.
Set the base_url to point to the default server address by adding the following line to your file:
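A minimal sketch of that setting, assuming placeholder values of 127.0.0.1 and port 8080; copy the Server Address and Server Port that AI Navigator actually shows. The endpoint_url helper is hypothetical, included only to show how base_url and an /endpoint combine into a full URL.

```python
# ASSUMPTION: placeholder address and port; use the values shown in
# Anaconda AI Navigator's API server settings.
SERVER_ADDRESS = "127.0.0.1"
SERVER_PORT = 8080

base_url = f"http://{SERVER_ADDRESS}:{SERVER_PORT}"


def endpoint_url(endpoint: str) -> str:
    """Hypothetical helper: join base_url with a path such as "/health"."""
    return f"{base_url}/{endpoint.lstrip('/')}"


print(base_url)                # http://127.0.0.1:8080
print(endpoint_url("/health"))  # http://127.0.0.1:8080/health
```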
localhost and 127.0.0.1 both refer to your own machine, so either one works in the base_url.
Adding the API calls
AI Navigator uses llama.cpp’s specification for interacting with the API server’s /embedding endpoint.
The API server is also compatible with OpenAI’s /embeddings API specification.
GET /health
Before sending any requests to the server, it’s a good idea to verify that the server is operational. This function sends a GET request to the /health endpoint and returns a JSON response that tells you the server’s status.
Add the following lines to your similarian.py file:
POST /embedding
To interact with a sentence-similarity model, you need a function that calls the server’s /embedding endpoint. This function sends the input text to the server and returns its vector representation (embedding).
Add the following lines to your similarian.py file:
Constructing the functions
Now that we have added the API calls to communicate with the API server, we’ll need to construct the core functionality of our application: comparing two strings of text. This involves measuring their semantic (meaning-based) and structural (character-based) similarities.
compare_texts
This function takes the two text inputs from the main function and calculates the semantic and structural similarity scores.
Add the following lines to your similarian.py file:
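A minimal sketch of compare_texts, assuming cosine similarity between the two embeddings for the semantic score and a length-normalized Levenshtein distance for the structural score; both scoring choices are assumptions, not confirmed details of the tutorial’s code. To stay self-contained it computes Levenshtein distance in plain Python, and it takes a hypothetical embed parameter so it can run without a server; in similarian.py you would call your /embedding function instead.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Semantic score: cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # undefined for zero vectors; treated here as no similarity
    return dot / (norm_u * norm_v)


def levenshtein_ratio(a: str, b: str) -> float:
    """Structural score: 1 - edit distance / length of the longer string."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - previous[-1] / longest


def compare_texts(text1: str, text2: str,
                  embed: Callable[[str], Sequence[float]]) -> dict:
    """Return semantic and structural similarity scores for two strings.

    embed is a hypothetical parameter: any function that turns text into a
    vector, such as one that calls the server's /embedding endpoint.
    """
    return {
        "semantic_similarity": cosine_similarity(embed(text1), embed(text2)),
        "structural_similarity": levenshtein_ratio(text1, text2),
    }
```

Cosine similarity ignores vector length and compares only direction, which is a common choice for sentence embeddings because it keeps scores on a fixed scale regardless of how large the vectors are.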
main
The main function ties the rest of the functions together and handles user input. It takes two inputs from the user and displays the results from the similarity calculations.
Add the following lines to your similarian.py file:
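A sketch of main under a few stated assumptions: it receives the health-check and comparison functions as parameters (the hypothetical names check_health and compare) so it can run on its own, it treats any health reply other than {"status": "ok"} as "not ready", and it expects compare to return a dict with semantic_similarity and structural_similarity keys. The read and show parameters default to input and print.

```python
from typing import Callable


def main(check_health: Callable[[], dict],
         compare: Callable[[str, str], dict],
         read: Callable[[str], str] = input,
         show: Callable[[str], None] = print) -> None:
    """Check the server, read two strings, and display their similarity scores."""
    # ASSUMPTION: a ready server reports {"status": "ok"} on GET /health.
    health = check_health()
    if health.get("status") != "ok":
        show(f"API server is not ready: {health}")
        return

    text1 = read("Enter the first text: ")
    text2 = read("Enter the text to compare it with: ")
    scores = compare(text1, text2)

    show(f"Semantic similarity: {scores['semantic_similarity']:.2f}")
    show(f"Structural similarity: {scores['structural_similarity']:.2f}")
```

In similarian.py you would call main from an if __name__ == "__main__": guard, passing your own health-check function and a comparison function that uses your /embedding call.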
Interacting with the API server
With your text comparator constructed, it’s time to compare some text!
- Open Anaconda AI Navigator and load a model into the API server.
  This must be a sentence-similarity type model!
- Leave the Server Address and Server Port at the default values and click Start.
- Open a terminal and navigate to the directory where you stored your similarian.py file.
  Make sure you are still in your content-compare conda environment.
- Start the text comparator by running python similarian.py.
  You’ll need to run this command every time you want to run the text comparator.
- Enter a string of text and press Enter (Windows)/Return (Mac).
- Enter a second string of text that you want to compare to the first one and press Enter (Windows)/Return (Mac) again.
- View the Anaconda AI Navigator API server logs. If everything is set up correctly, the server logs will populate with traffic from the application, starting with a health check.
Comparing sentences
Here are some examples that you can use to get a better feel for how text comparisons work:
Synonyms and rephrasing
Try writing the same phrase in two different ways to see how semantic similarity stays high even when structural similarity is low.
Effects of typos
Experiment with minor typos to observe how semantic similarity remains consistent, while structural similarity drops due to increased edit distance.
Opposite meanings
Compare sentences that are structurally similar but have opposite meanings. This highlights how semantic similarity can drop even when Levenshtein similarity remains high.
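The structural side of these experiments can be checked without a server. The sketch below scores one sample pair per experiment; the pairs are made up for illustration (they are not from the tutorial), and the score uses the same length-normalized Levenshtein similarity assumed elsewhere in this sketch. Semantic scores depend on the model you load, so they are left to the running application.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1, current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def structural_similarity(a: str, b: str) -> float:
    """ASSUMPTION: normalize by the longer string so 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


# Hypothetical sample pairs, one per experiment described above.
sample_pairs = {
    "Synonyms and rephrasing": ("The car is fast", "The automobile moves quickly"),
    "Effects of typos": ("The weather is nice today", "The wether is nice tday"),
    "Opposite meanings": ("I love this movie", "I hate this movie"),
}

for experiment, (first, second) in sample_pairs.items():
    score = structural_similarity(first, second)
    print(f"{experiment}: structural similarity {score:.2f}")
```

With these pairs, the typo pair scores 0.92 (two deleted letters out of 25 characters) and the opposite-meaning pair scores 0.82 (three substitutions out of 17 characters), while the rephrased pair can score no higher than about 0.54 because its strings differ in length by 13 characters.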