Giskard’s open source framework evaluates AI models before putting them into production | TechCrunch

Giscard is a French start-up company dedicated to developing an open source testing framework for large language models. It can alert developers to risks of bias, security vulnerabilities, and the model’s ability to generate harmful or toxic content.

While there is a lot of hype surrounding AI models, with the EU Artificial Intelligence Act and other countries on the verge of implementing regulations, machine learning testing systems will soon become a hot topic. Companies developing AI models will have to prove they comply with a set of rules and mitigate risks so they don’t have to pay hefty fines.

Giskard is a regulated AI startup and one of the first examples of development tools specifically focused on testing in a more efficient way.

“I’ve worked at Dataiku before, specifically on NLP model integration. And I could see that when I was in charge of testing, both of those things didn’t work well when you wanted to apply them to real cases, and that It’s hard to compare performance between vendors,” Giskard co-founder and CEO Alex Combessie told me.

Giskard’s testing framework consists of three components: First, the company releases An open source Python library Can be integrated into LLM projects, more specifically Retrieval Enhanced Generation (RAG) projects. It is already very popular on GitHub and is compatible with other tools in the ML ecosystem, such as Hugging Face, MLFlow, Weights & Biases, PyTorch, Tensorflow, and Langchain.

After initial setup, Giskard will help you generate a test suite that will be used regularly on your model. These tests cover a wide range of issues such as performance, illusions, error messages, non-factual output, bias, data exfiltration, harmful content generation and prompt injection.

“There are several aspects: you’re going to have the performance aspect, which is going to be the first thing that data scientists think about. But increasingly, both from a brand image ethics perspective and now from a They all have that aspect. The regulatory perspective,” Combesi said.

Developers can then integrate the tests into a continuous integration and continuous delivery (CI/CD) pipeline so that the tests are executed with every new iteration of the code base. If an issue occurs, developers receive a scan report in their GitHub repository, for example.

Tests are customized based on the model’s final use case. Companies working on RAG can provide Giskard with access to the vector database and knowledge repository so that test suites can be as relevant as possible. For example, if you are building a chatbot that can provide you with information about climate change based on the latest IPCC report and using OpenAI’s LLM, the Giskard test will check whether the model produces incorrect messages about climate change, contradicts itself, etc.

Image Source: Giscard

Giskard’s second product is the AI ​​Quality Center, which helps you debug large language models and compare them with other models.This Quality Center is part of Giskard quality productIn the future, the startup hopes to be able to produce documentation proving that the model complies with regulations.

“We started selling AI quality centers to companies like the Bank of France and L’Oréal to help them debug and find the cause of errors. In the future, this is where we will put all regulatory features,” Combesi said.

The company’s third product is called LLMon, an instant monitoring tool that evaluates LLM answers to the most common questions (toxicity, hallucinations, fact-checking…) before sending responses back to the user.

Currently, it works with companies that use OpenAI’s API and LLM as base models, but the company is working on integrations with Hugging Face, Anthropic, and others.

Specification use cases

There are many ways to regulate artificial intelligence models, and based on conversations with people in the artificial intelligence ecosystem, it is unclear whether the bill applies to the underlying models of OpenAI, Anthropic, Mistral, etc., or only to application cases.

In the latter case, Giskard seems particularly well-placed to alert developers to the potential misuse of externally-rich LLMs (or, as AI researchers call them, retrieval-augmented generation, or RAG).

Giskard currently has 20 employees. Combessie said: “We see a market that is a good fit for LLM customers, so we will double the size of our team to become the best LLM antivirus software on the market.”

Source link

Leave a Comment