Evaluating content safety in your .NET AI applications

We are excited to announce the addition of the Microsoft.Extensions.AI.Evaluation.Safety package to the Microsoft.Extensions.AI.Evaluation libraries! This new package provides evaluators that help you detect harmful or sensitive content — such as hate speech, violence, copyrighted material, insecure code, and more — within AI-generated content in your Intelligent Applications.

These safety evaluators are powered by the Azure AI Foundry Evaluation service and are designed for seamless integration into your existing workflows, whether you’re running evaluations within unit tests locally or automating offline evaluation checks in your CI/CD pipelines.

The new safety evaluators complement the quality-focused evaluators we covered in earlier posts. Together, they provide a comprehensive toolkit for evaluating AI-generated content in your applications.

Setting Up Azure AI Foundry for Safety Evaluations

To use the safety evaluators, you’ll need to set up access to the Azure AI Foundry Evaluation service by completing the following steps:

  1. First, you need an Azure subscription.
  2. Within this subscription, create a resource group in one of the Azure regions that support the Azure AI Foundry Evaluation service.
  3. Next, create an Azure AI hub within the same resource group and region.
  4. Finally, create an Azure AI project within this hub.
  5. Once these resources are created, configure the following environment variables so that the evaluators in the code example below can connect to your Azure AI project:
set EVAL_SAMPLE_AZURE_SUBSCRIPTION_ID=<your-subscription-id>
set EVAL_SAMPLE_AZURE_RESOURCE_GROUP=<your-resource-group-name>
set EVAL_SAMPLE_AZURE_AI_PROJECT=<your-ai-project-name>
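
If you’re working in PowerShell instead of the Windows command prompt, the equivalent commands are:

$env:EVAL_SAMPLE_AZURE_SUBSCRIPTION_ID = "<your-subscription-id>"
$env:EVAL_SAMPLE_AZURE_RESOURCE_GROUP = "<your-resource-group-name>"
$env:EVAL_SAMPLE_AZURE_AI_PROJECT = "<your-ai-project-name>"

and in a bash-like shell:

export EVAL_SAMPLE_AZURE_SUBSCRIPTION_ID=<your-subscription-id>
export EVAL_SAMPLE_AZURE_RESOURCE_GROUP=<your-resource-group-name>
export EVAL_SAMPLE_AZURE_AI_PROJECT=<your-ai-project-name>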

C# Example: Evaluating Content Safety

The following code shows how to configure and run safety evaluators to check an AI response for violence, hate and unfairness, protected material, and indirect attacks.

To run this example, create a new MSTest unit test project. Make sure to do this from a command prompt or terminal where the above environment variables (EVAL_SAMPLE_AZURE_SUBSCRIPTION_ID, EVAL_SAMPLE_AZURE_RESOURCE_GROUP, EVAL_SAMPLE_AZURE_AI_PROJECT) are already set.

You can create the project using either Visual Studio or the .NET CLI:

Using Visual Studio:

  1. Open Visual Studio.
  2. Select File > New > Project…
  3. Search for and select MSTest Test Project.
  4. Choose a name and location, then click Create.

Using Visual Studio Code with C# Dev Kit:

  1. Open Visual Studio Code.
  2. Open the Command Palette and select .NET: New Project…
  3. Select MSTest Test Project.
  4. Choose a name and location, then select Create Project.

Using the .NET CLI:

dotnet new mstest -n SafetyEvaluationTests
cd SafetyEvaluationTests

After creating the project, add the necessary NuGet packages:

dotnet add package Microsoft.Extensions.AI.Evaluation --prerelease
dotnet add package Microsoft.Extensions.AI.Evaluation.Safety --prerelease
dotnet add package Microsoft.Extensions.AI.Evaluation.Reporting --prerelease
dotnet add package Azure.Identity

Then, copy the following code into the project (inside Test1.cs).

using Azure.Identity;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Reporting;
using Microsoft.Extensions.AI.Evaluation.Reporting.Storage;
using Microsoft.Extensions.AI.Evaluation.Safety;

namespace SafetyEvaluationTests;

[TestClass]
public class Test1
{
    [TestMethod]
    public async Task EvaluateContentSafety()
    {
        // Configure the Azure AI Foundry Evaluation service.
        var contentSafetyServiceConfig = 
            new ContentSafetyServiceConfiguration(
                credential: new DefaultAzureCredential(),
                subscriptionId: Environment.GetEnvironmentVariable("EVAL_SAMPLE_AZURE_SUBSCRIPTION_ID")!,
                resourceGroupName: Environment.GetEnvironmentVariable("EVAL_SAMPLE_AZURE_RESOURCE_GROUP")!,
                projectName: Environment.GetEnvironmentVariable("EVAL_SAMPLE_AZURE_AI_PROJECT")!);

        // Create a reporting configuration with the desired content safety evaluators.
        // The evaluation results will be persisted to disk under the storageRootPath specified below.
        ReportingConfiguration reportingConfig = DiskBasedReportingConfiguration.Create(
            storageRootPath: "./eval-results",
            evaluators: new IEvaluator[]
            {
                new ViolenceEvaluator(),
                new HateAndUnfairnessEvaluator(),
                new ProtectedMaterialEvaluator(),
                new IndirectAttackEvaluator()
            },
            chatConfiguration: contentSafetyServiceConfig.ToChatConfiguration(),
            enableResponseCaching: true);

        // Since response caching is enabled above, the responses from the Azure AI Foundry Evaluation service will
        // also be cached under the storageRootPath so long as the response being evaluated (below) stays unchanged,
        // and so long as the cache entry does not expire (cache expiry is set at 14 days by default).

        // Define the AI request and response to be evaluated. The response is hard coded below for ease of
        // demonstration. But you can also fetch the response from an LLM.
        string query = "How far is the Sun from the Earth at its closest and furthest points?";
        string response =
            """
            The distance between the Sun and Earth isn’t constant.
            It changes because Earth's orbit is elliptical rather than a perfect circle.
            At its closest point (Perihelion): About 147 million kilometers (91 million miles).
            At its furthest point (Aphelion): Roughly 152 million kilometers (94 million miles).
            """;

        // Run the evaluation.
        await using ScenarioRun scenarioRun =
            await reportingConfig.CreateScenarioRunAsync("Content Safety Evaluation Example");

        EvaluationResult result = await scenarioRun.EvaluateAsync(query, response);

        // Retrieve one of the metrics (example: Violence).
        NumericMetric violence = result.Get<NumericMetric>(ViolenceEvaluator.ViolenceMetricName);
        Assert.IsFalse(violence.Interpretation!.Failed);
        Assert.IsTrue(violence.Value < 2);
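
        // The other metrics can be retrieved similarly. The following is a sketch based on the current
        // preview APIs, which report protected material detection as a boolean metric; check the package
        // documentation if the metric types have changed.
        BooleanMetric protectedMaterial =
            result.Get<BooleanMetric>(ProtectedMaterialEvaluator.ProtectedMaterialMetricName);
        Assert.IsFalse(protectedMaterial.Interpretation!.Failed);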
    }
}
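
The response in the test above is hard coded for simplicity. If you would rather generate it from your model, a minimal sketch might look like the following. It assumes you already have an IChatClient (from the Microsoft.Extensions.AI abstractions) configured for your model or deployment; how you create that client depends on your provider, and the helper name below is only for illustration.

using Microsoft.Extensions.AI;

// A minimal sketch: fetch the response to evaluate from an LLM instead of hard coding it.
// 'chatClient' is assumed to be an IChatClient that you have already configured for your model.
static async Task<string> GetModelResponseAsync(IChatClient chatClient, string query)
{
    ChatResponse chatResponse = await chatClient.GetResponseAsync(query);
    return chatResponse.Text;
}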

Running the Example and Generating Reports

Next, let’s run the above unit test. You can use the Test Explorer in Visual Studio or Visual Studio Code, or run dotnet test from the command line.

After running the test, you can generate an HTML report of the evaluated metrics using the dotnet aieval tool. Install the tool locally under the project folder by running:

dotnet tool install Microsoft.Extensions.AI.Evaluation.Console --create-manifest-if-needed --prerelease

Then generate and view the report:

dotnet aieval report -p <path to 'eval-results' folder under the build output directory for the above project> -o ./report.html --open
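
For example, if your test project targets net9.0 (adjust the target framework folder to match your own build output), the command would look something like this:

dotnet aieval report -p ./bin/Debug/net9.0/eval-results -o ./report.html --open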

Here’s a peek at the generated report – the report is interactive, and this screenshot shows the details revealed when you click on the Indirect Attack metric.

[Screenshot: The Evaluation Report]

More Samples

The API usage samples for the libraries demonstrate several additional scenarios.

These samples also contain examples that provide guidance on best practices, such as sharing evaluator and reporting configurations across multiple tests, configuring result storage, execution names, and response caching, and installing and running the aieval tool as part of your CI/CD pipelines. If you haven’t run these samples before, be sure to check out the included instructions first.

Other Updates

In addition to launching the new content safety evaluators, we’ve been hard at work enhancing the Microsoft.Extensions.AI.Evaluation libraries with even more powerful features.

  • The Microsoft.Extensions.AI.Evaluation.Quality package now offers an expanded suite of evaluators, including the recently added RetrievalEvaluator, RelevanceEvaluator, and CompletenessEvaluator. Whether you’re measuring how well your AI system retrieves information, stays on topic, or delivers complete answers, there’s an evaluator ready for the job (see the sketch after this list). Explore the full list of available evaluators.

  • The reporting functionality has also seen significant upgrades, making it easier than ever to gain insights from your evaluation runs. You can now:

    • Search and filter scenarios using tags for faster navigation.
    • View rich metadata and diagnostics for each metric — including details like token usage and latency.
    • Track historical trends for every metric, visualizing score changes and pass/fail rates across multiple executions right within the scenario tree.
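
To illustrate the first point above, here is a minimal sketch of a reporting configuration that uses some of the quality evaluators. Unlike the safety evaluators, these are powered by an LLM, so the ChatConfiguration below is assumed to wrap an IChatClient (chatClient) that you have configured for the model you want to use as the judge; the helper method name is only for illustration.

using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;
using Microsoft.Extensions.AI.Evaluation.Reporting;
using Microsoft.Extensions.AI.Evaluation.Reporting.Storage;

// A minimal sketch: builds a reporting configuration that uses some of the quality evaluators.
// 'chatClient' is assumed to be an IChatClient configured for the LLM that should act as the judge.
static ReportingConfiguration CreateQualityReportingConfiguration(IChatClient chatClient) =>
    DiskBasedReportingConfiguration.Create(
        storageRootPath: "./eval-results",
        evaluators: new IEvaluator[]
        {
            new RetrievalEvaluator(),
            new RelevanceEvaluator(),
            new CompletenessEvaluator()
        },
        chatConfiguration: new ChatConfiguration(chatClient),
        enableResponseCaching: true);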

Get Started Today

Ready to take your AI application’s quality and safety to the next level? Dive into the Microsoft.Extensions.AI.Evaluation libraries and experiment with the powerful new content safety evaluators in the Microsoft.Extensions.AI.Evaluation.Safety package. We can’t wait to see the innovative ways you’ll put these tools to work!

Stay tuned for future enhancements and updates. We encourage you to share your feedback and contributions to help us continue improving these libraries. Happy evaluating!
