Create A Custom Skill For Azure AI Search
In this exercise, you'll create a custom skill that tabulates the frequency of individual
words in a document to generate a list of its most used words (up to ten per document),
and add it to a search solution for Margie's Travel, a fictitious travel agency.
You'll develop your search app using Visual Studio Code. The code files for your app
have been provided in a GitHub repo.
Note: If you have previously completed the Create an Azure AI Search solution
exercise, and still have these Azure resources in your subscription, you can skip this
section and start at the Create a search solution section. Otherwise, follow the steps
below to provision the required Azure resources.
Now that you have the necessary Azure resources, you can create a search solution
that consists of the following components:
● A data source that references the documents in your Azure storage container.
● A skillset that defines an enrichment pipeline of skills to extract AI-generated
fields from the documents.
● An index that defines a searchable set of document records.
● An indexer that extracts the documents from the data source, applies the
skillset, and populates the index.
In this exercise, you'll use the Azure AI Search REST interface to create these
components by submitting JSON requests.
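The JSON requests mentioned above can be sketched as follows. This is a minimal illustration rather than the lab's own tooling: the service name, admin key, index name, and API version are placeholder assumptions, and the helper name `buildSearchRequest` is hypothetical. It follows the documented REST pattern of sending a `PUT` to the component's collection (`datasources`, `skillsets`, `indexes`, or `indexers`) with an `api-key` header.

```javascript
// Hypothetical helper: builds (but does not send) a create-or-update request
// for one Azure AI Search component.
const API_VERSION = '2023-11-01'; // assumption: substitute the API version you target

function buildSearchRequest(serviceName, adminKey, collection, name, definition) {
  const allowed = ['datasources', 'skillsets', 'indexes', 'indexers'];
  if (!allowed.includes(collection)) {
    throw new Error(`Unknown collection: ${collection}`);
  }
  return {
    url: `https://${serviceName}.search.windows.net/${collection}/` +
      `${encodeURIComponent(name)}?api-version=${API_VERSION}`,
    method: 'PUT',
    headers: {
      'Content-Type': 'application/json',
      'api-key': adminKey,
    },
    body: JSON.stringify(definition),
  };
}

// Placeholder values only; nothing here comes from a real service.
const request = buildSearchRequest('my-search-service', '<admin-key>', 'indexes',
  'margies-index', { name: 'margies-index', fields: [] });
console.log(request.method, request.url);
```

To submit the request, the `method`, `headers`, and `body` properties can be passed as the options argument of `fetch(request.url, ...)`, which Node.js 18 and later provide globally.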
1. At the top of the blade for your Azure AI Search resource, select Search
explorer.
2. In Search explorer, in the Query string box, enter the following query string,
and then select Search:
search=London&$select=url,sentiment,keyphrases&$filter=metadata_author eq 'Reviewer' and sentiment eq 'positive'
This query retrieves the url, sentiment, and keyphrases fields for all documents that
mention London, were authored by Reviewer, and have a positive sentiment label (in
other words, positive reviews that mention London).
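The same query can also be expressed as a JSON body, which is the form the JSON view of Search explorer accepts. The sketch below assumes an unencoded query string like the one above; the helper name `toJsonQuery` is hypothetical. It splits the query string into its parameters and drops the `$` prefix from the OData options.

```javascript
// Hypothetical helper: converts an unencoded Search explorer query string into
// the equivalent JSON request body (OData options lose their "$" prefix).
function toJsonQuery(queryString) {
  const body = {};
  for (const pair of queryString.split('&')) {
    const eq = pair.indexOf('=');
    if (eq === -1) continue; // skip fragments that have no "name=value" shape
    const key = pair.slice(0, eq).replace(/^\$/, '');
    body[key] = pair.slice(eq + 1);
  }
  return body;
}

const query = "search=London&$select=url,sentiment,keyphrases" +
  "&$filter=metadata_author eq 'Reviewer' and sentiment eq 'positive'";
console.log(JSON.stringify(toJsonQuery(query), null, 2));
```

The result has `search`, `select`, and `filter` properties whose values are identical to those in the query string.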
The search solution includes a number of built-in AI skills that enrich the index with
information from the documents, such as the sentiment scores and lists of key phrases
seen in the previous task.
You can enhance the index further by creating custom skills. For example, it might be
useful to identify the words that are used most frequently in each document, but no
built-in skill offers this functionality.
To implement the word count functionality as a custom skill, you'll create an Azure
Function in your preferred language.
Note: In this exercise, you'll create a simple Node.js function using the code editing
capabilities in the Azure portal. In a production solution, you would typically use a
development environment such as Visual Studio Code to create a function app in your
preferred language (for example C#, Python, Node.js, or Java) and publish it to Azure
as part of a DevOps process.
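Before working in the portal, the core word-counting idea can be tried locally in Node.js. This is a minimal sketch, not the function code used in the steps that follow: the `topWords` helper is hypothetical and its stop-word list is deliberately abbreviated.

```javascript
// Hypothetical helper illustrating the word-frequency idea.
// STOP_WORDS is intentionally abbreviated for this sketch.
const STOP_WORDS = new Set(['', 'a', 'an', 'the', 'in', 'of', 'and', 'to', 'is']);

function topWords(text, limit = 10) {
  // Keep letters and whitespace only, then lowercase and split into words.
  const words = text.replace(/[^A-Za-z\s]/g, '').toLowerCase().split(/\s+/);

  // Count each word that is not a stop word.
  const counts = new Map();
  for (const word of words) {
    if (!STOP_WORDS.has(word)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }

  // Sort by descending count (ties keep first-seen order) and keep the top entries.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([word]) => word);
}

console.log(topWords('Tiger, tiger burning bright in the darkness of the night.'));
// prints [ 'tiger', 'burning', 'bright', 'darkness', 'night' ]
```

Using a `Map` for the counts avoids collisions with inherited object properties (for example, a document containing the word "constructor").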
1. In the Azure portal, on the Home page, create a new Function App resource
with the following settings:
○ Hosting Plan: Consumption
○ Subscription: Your subscription
○ Resource Group: The same resource group as your Azure AI Search
resource
○ Function App name: A unique name
○ Runtime stack: Node.js
○ Version: 18 LTS
○ Region: The same region as your Azure AI Search resource
○ Operating system: Windows
2. Wait for deployment to complete, and then go to the deployed Function App
resource.
3. On the Overview page, select Create function at the bottom of the page to
create a new function with the following settings:
○ Select a template
■ Template: HTTP Trigger
○ Template details:
■ Function name: wordcount
■ Authorization level: Function
4. Wait for the wordcount function to be created. Then in its page, select the Code
+ Test tab.
5. Replace the default function code with the following code:
javascript
module.exports = async function (context, req) {
    context.log('JavaScript HTTP trigger function processed a request.');

    if (req.body && req.body.values) {
        const vals = req.body.values;

        // Array of stop words to be ignored
        const stopwords = ['', 'i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves',
            'you', 'youre', 'youve', 'youll', 'youd', 'your', 'yours', 'yourself',
            'yourselves', 'he', 'him', 'his', 'himself', 'she', 'shes', 'her', 'hers',
            'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs',
            'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'thatll',
            'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being',
            'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the',
            'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by',
            'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during',
            'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out',
            'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
            'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few',
            'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own',
            'same', 'so', 'than', 'too', 'very', 'can', 'will', 'just', 'dont', 'should',
            'shouldve', 'now', 'arent', 'couldnt', 'didnt', 'doesnt', 'hadnt', 'hasnt',
            'havent', 'isnt', 'mightnt', 'mustnt', 'neednt', 'shant', 'shouldnt', 'wasnt',
            'werent', 'wont', 'wouldnt'];

        const res = {"values": []};

        for (const rec of vals) {
            // Get the record ID and text for this input
            const resVal = {recordId: rec.recordId, data: {}};
            let txt = (rec.data && rec.data.text) || '';

            // Remove punctuation and numerals, and convert to lowercase
            txt = txt.replace(/[^A-Za-z_\s]/g, '').toLowerCase();

            // Get an array of words
            const words = txt.split(/\s+/);

            // Count instances of words that are not stop words
            const wordCounts = Object.create(null);
            for (const word of words) {
                if (!stopwords.includes(word)) {
                    wordCounts[word] = (wordCounts[word] || 0) + 1;
                }
            }

            // Convert the counts to an array of [word, count] pairs,
            // sorted in descending order of count
            const topWords = Object.entries(wordCounts)
                .sort(function (a, b) { return b[1] - a[1]; });

            // Get the first ten words from the first array dimension
            resVal.data.text = topWords.slice(0, 10)
                .map(function (value) { return value[0]; });

            res.values.push(resVal);
        }

        context.res = {
            body: JSON.stringify(res),
            headers: {
                'Content-Type': 'application/json'
            }
        };
    }
    else {
        context.res = {
            status: 400,
            body: {"errors": [{"message": "Invalid input"}]},
            headers: {
                'Content-Type': 'application/json'
            }
        };
    }
};
6. Save the function, and then open the Test/Run pane.
7. In the Test/Run pane, replace the existing Body with the following JSON, which
submits the text of two documents in the request format used for a skill:
json
{
    "values": [
        {
            "recordId": "a1",
            "data":
            {
                "text": "Tiger, tiger burning bright in the darkness of the night.",
                "language": "en"
            }
        },
        {
            "recordId": "a2",
            "data":
            {
                "text": "The rain in spain stays mainly in the plains! That's where you'll find the rain!",
                "language": "en"
            }
        }
    ]
}
8. Click Run and view the HTTP response content that is returned by your function.
This reflects the schema expected by Azure AI Search when consuming a skill, in
which a response for each document is returned. In this case, the response
consists of up to 10 terms in each document in descending order of how
frequently they appear (stop words such as "youll" are excluded):
json
{
    "values": [
        {
            "recordId": "a1",
            "data": {
                "text": [
                    "tiger",
                    "burning",
                    "bright",
                    "darkness",
                    "night"
                ]
            }
        },
        {
            "recordId": "a2",
            "data": {
                "text": [
                    "rain",
                    "spain",
                    "stays",
                    "mainly",
                    "plains",
                    "thats",
                    "find"
                ]
            }
        }
    ]
}
9. Close the Test/Run pane and in the wordcount function blade, click Get
function URL. Then copy the URL for the default key to the clipboard. You'll
need this in the next procedure.
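Once you have the function URL, the same kind of test can be prepared from code. This is a sketch under assumptions: the URL below is a placeholder (the real one comes from Get function URL and normally carries the key in a `code` query parameter), and `buildSkillRequest` is a hypothetical helper that wraps documents in the `values` array the skill expects.

```javascript
// Hypothetical helper: wraps plain texts in the custom-skill request schema
// and prepares fetch() options for the function endpoint.
function buildSkillRequest(functionUrl, texts, language = 'en') {
  const values = texts.map((text, i) => ({
    recordId: `a${i + 1}`,
    data: { text, language },
  }));
  return {
    url: functionUrl,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ values }),
    },
  };
}

// Placeholder URL; substitute the one copied from the portal.
const skillRequest = buildSkillRequest(
  'https://<your-function-app>.azurewebsites.net/api/wordcount?code=<function-key>',
  ['Tiger, tiger burning bright in the darkness of the night.']
);
console.log(skillRequest.options.body);

// To send it (Node.js 18+ provides a global fetch), inside an async function:
//   const response = await fetch(skillRequest.url, skillRequest.options);
//   console.log(await response.json());
```

Each record's `recordId` is what lets Azure AI Search match a response entry back to the document that produced it, so the function must echo it unchanged.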
Now you need to include your function as a custom skill in the search solution skillset,
and map the results it produces to a field in the index.
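A custom skill of this kind is typically added to a skillset as a Web API skill definition, with a matching index field and an indexer output field mapping. The sketch below builds those three pieces as JavaScript objects. It is an illustration, not the lab's actual definition files: the input source paths (`/document/merged_content`, `/document/language`), the timeout, the field attributes, and the helper name `buildWordCountSkill` are assumptions, while `top_words` matches the field queried at the end of this exercise.

```javascript
// Hypothetical helper: returns a Web API skill definition that calls the
// wordcount function and exposes its "text" output as "top_words".
function buildWordCountSkill(functionUrl) {
  return {
    '@odata.type': '#Microsoft.Skills.Custom.WebApiSkill',
    description: 'Custom skill that returns the most frequent words',
    uri: functionUrl,
    httpMethod: 'POST',
    timeout: 'PT30S', // assumption: 30-second timeout
    context: '/document',
    batchSize: 1,
    inputs: [
      // Assumed enrichment paths; adjust them to match your skillset.
      { name: 'text', source: '/document/merged_content' },
      { name: 'language', source: '/document/language' },
    ],
    outputs: [
      { name: 'text', targetName: 'top_words' },
    ],
  };
}

// Assumed index field that holds the results as a collection of strings.
const topWordsField = {
  name: 'top_words',
  type: 'Collection(Edm.String)',
  searchable: true,
  filterable: true,
};

// Assumed indexer mapping from the enriched document to the index field.
const outputFieldMapping = {
  sourceFieldName: '/document/top_words',
  targetFieldName: 'top_words',
};

const skill = buildWordCountSkill(
  'https://<your-function-app>.azurewebsites.net/api/wordcount?code=<function-key>');
console.log(JSON.stringify(skill, null, 2));
```

The skill's input names (`text`, `language`) and output name (`text`) correspond to the property names inside `data` in the function's request and response.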
1. At the top of the blade for your Azure AI Search resource, select Search
explorer.
2. In Search explorer, change the view to JSON view, and then submit the
following search query:
json
{
"search": "Las Vegas",
"select": "url,top_words"
}
This query retrieves the url and top_words fields for all documents that
mention Las Vegas.
Clean-up
Now that you've completed the exercise, delete all the resources you no longer need.
Delete the Azure resources:
More information
To learn more about creating custom skills for Azure AI Search, see the Azure AI Search
documentation.