Data-driven drug development: applied AI
21 Mar 2024
The EPO’s requirements for sufficiency in applied AI and machine learning inventions

AI is now a feature of all stages of the drug development pipeline, from rational drug design right through to diagnosis and prognosis. In our last article, we talked about the use of in silico data to support life sciences patent applications, but this is not the only way computer processing and data-related inventions are being used in the life sciences industry.

In this article, we provide some brief practical guidance for applicants preparing applications at the interface between applied AI and life sciences.

What we mean by applied AI

As brief background, applied AI usually takes the form of a machine learning model that has been “trained” so that, when presented with appropriate data, it can make a decision about that data. To create that trained model, an initial model with a set of parameters is iteratively adjusted by showing it examples in the form of training data.

One example of machine learning in MedTech is in analyte measurement, e.g. determining glucose levels when managing diabetes. A classifier could, for instance, be trained on 1000 glucose measurements to determine whether each measurement indicates a hyperglycaemic condition or not. In this scenario, after an initial model has been set up, the training process involves inputting the 1000 labelled glucose measurements so that the classifier learns to identify correctly whether the condition is present. Depending on whether its output for a given measurement is correct, the model adjusts itself before the next glucose measurement. Once the training phase is complete, the trained classifier, which has now learned to differentiate between glucose measurements, is given the next measurement (measurement 1001) and is able to decide whether measurement 1001 is hyperglycaemic or not, thereby replicating the activity of a healthcare professional.
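The training loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration rather than a clinical tool: it assumes a single-feature perceptron (one of the simplest classifiers that adjusts itself only when it gets an example wrong), synthetic measurements in mmol/L that are deliberately well separated, and labels of 1 (hyperglycaemic) and 0 (not hyperglycaemic). The class name `GlucoseClassifier`, the value ranges and the example reading are our own assumptions.

```python
import statistics
from dataclasses import dataclass


@dataclass
class GlucoseClassifier:
    """Hypothetical single-feature perceptron: 1 = hyperglycaemic, 0 = not."""
    weight: float = 0.0
    bias: float = 0.0
    mean: float = 0.0  # standardisation parameters, learned from the training data
    std: float = 1.0

    def _scale(self, mmol_per_l: float) -> float:
        return (mmol_per_l - self.mean) / self.std

    def predict(self, mmol_per_l: float) -> int:
        return 1 if self.weight * self._scale(mmol_per_l) + self.bias > 0 else 0

    def train(self, measurements, labels, max_epochs=100):
        self.mean = statistics.fmean(measurements)
        self.std = statistics.pstdev(measurements)
        for _ in range(max_epochs):
            mistakes = 0
            for x, y in zip(measurements, labels):
                error = y - self.predict(x)  # 0 if correct, +1 or -1 if wrong
                if error:
                    # Adjust the parameters only when the prediction was wrong.
                    self.weight += error * self._scale(x)
                    self.bias += error
                    mistakes += 1
            if mistakes == 0:  # every training example is now classified correctly
                break
        return self


# Hypothetical, synthetic training set: 1000 labelled measurements in mmol/L.
normal = [4.0 + 5.0 * i / 499 for i in range(500)]  # 4.0 to 9.0 mmol/L, label 0
high = [13.0 + 7.0 * i / 499 for i in range(500)]   # 13.0 to 20.0 mmol/L, label 1
measurements = normal + high
labels = [0] * 500 + [1] * 500

model = GlucoseClassifier().train(measurements, labels)
measurement_1001 = 15.2  # a new, unseen (hypothetical) reading
print(model.predict(measurement_1001))  # → 1
```

Because the two hypothetical classes do not overlap, the perceptron is guaranteed to stop making mistakes on the training set after a few dozen passes at most, at which point its learned weight and bias place the decision boundary somewhere between 9.0 and 13.0 mmol/L. Exactly where is a product of the training run, not something the wording of a claim would reveal.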

It is fair to say that this form of applied AI is, in general, very good at making ‘decisions’ on large data sets. Since many pharmaceutical and biotech applications require decisions to be made about data, the potential for applied AI innovations in this sector seems high. Nevertheless, there are certain pitfalls that need to be navigated when obtaining patents in Europe.

Sufficiency

Article 83 EPC requires that any “European patent application shall disclose the invention in a manner sufficiently clear and complete for it to be carried out by a person skilled in the art.”

For claims with an applied AI or machine learning component, the machine learning model is usually defined functionally by the effect that it produces, for example “a trained classifier for discriminating whether disease Y has been treated”. When such a classifier is presented with data of a similar type to the data on which it was trained, it will be able to discriminate between data representative of successful treatment and data that indicates unsuccessful treatment. In this way, it makes a decision about the data with which it is presented.

The EPO has recognised that functional definitions can be problematic in this technical area because the model is something of a black box. In other words, we don’t know exactly how it works because we don’t know the various weights and thresholds that have been given to the different processes within that model during its training, nor do we know how those weights and thresholds arrive at the ‘decision’ to provide a particular output.

The nature of machine learning means that it is difficult to predict whether a given model will provide a particular effect. Furthermore, if a machine learning model is trained differently (i.e., on different training data), the trained model will have different weights and thresholds, and so will be a different black box. In practice, this means that when the same input is given to two different machine learning models (say, one trained on a first set of data and another set up identically but trained on a second set of data), the two models can produce different outputs. This has significant implications for sufficiency (as noted in decision T 161/18), because it can be difficult or even impossible to reliably reproduce what the applicant is claiming unless enough information has been given about how to set up and train the model.
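The point about differently trained models can be illustrated with a deliberately simple stand-in for a black box: a one-dimensional classifier whose only learned parameter is a threshold placed halfway between the highest “normal” training reading and the lowest “high” training reading (a one-dimensional maximum-margin rule). The class name and both training sets below are hypothetical. Two models with an identical set-up, trained on different data, learn different thresholds and therefore give different answers for the same borderline reading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdClassifier:
    """Hypothetical toy model: its only learned parameter is a threshold (mmol/L)."""
    threshold: float

    @classmethod
    def fit(cls, normal, high):
        # 1-D maximum-margin rule: boundary halfway between the highest
        # "normal" training reading and the lowest "high" training reading.
        lo, hi = max(normal), min(high)
        if lo >= hi:
            raise ValueError("classes overlap; a single threshold cannot separate them")
        return cls(threshold=(lo + hi) / 2)

    def predict(self, mmol_per_l):
        # 1 = hyperglycaemic, 0 = not hyperglycaemic
        return 1 if mmol_per_l > self.threshold else 0


# Identical set-up, two different hypothetical training sets (mmol/L).
model_a = ThresholdClassifier.fit(normal=[4.5, 6.0, 7.75, 9.0], high=[13.0, 15.5, 18.0])
model_b = ThresholdClassifier.fit(normal=[5.0, 6.25, 8.0, 10.5, 11.75], high=[12.25, 14.0, 19.0])

borderline = 11.5
print(model_a.threshold, model_b.threshold)                      # → 11.0 12.0
print(model_a.predict(borderline), model_b.predict(borderline))  # → 1 0
```

Both models agree on clearly normal and clearly high readings and diverge only near their learned boundaries. A real neural network has thousands or millions of learned values rather than one, which is why its output for a given input cannot simply be read off a functional claim definition.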

Whilst this is a developing area, the trend so far is that a mere functional definition of a machine learning model is not enough, by itself, to enable the skilled person to reproduce the invention. Rather, the skilled person needs more information to be able to reliably reproduce the applied AI portion of the claim. It is therefore helpful to consider what information a patent application needs to contain if it is to meet the sufficiency requirement and be successfully prosecuted at the EPO.

What information is needed?

So, what does need to be provided? There is sadly no one-size-fits-all approach, since the information needed to enable the skilled person to put the invention into effect will differ depending on the technology in question. However, there is certain information that applications will require to sufficiently disclose an applied AI portion of a claim:

  • A definition of the applied AI to give some indication of the inner workings of the black box: One example would be disclosing that the model is an artificial neural network with a specified number of hidden layers, each having a specified number of nodes, alongside an input layer and an output layer. For example, if the machine learning model is being trained to recognise features in an image (say a photo of a patient’s skin), each node of the input layer might represent a single pixel in that image.
  • Information about the training data: It is not practical to include the whole set of training data, but some details are needed. These could be in the form of:
    • a reference to a publicly available dataset (i.e., a specific dataset), if available.
    • a description of the set of training data in enough detail to allow it to be recreated, e.g. hourly glucose measurements from a single patient, taken continuously over a period of one month and categorised by a healthcare practitioner.
  • Training parameters, such as the number of training epochs, the regularisation strength and the optimisation algorithm (i.e., the hyperparameters of the applied AI model).
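The kind of detail listed above can be captured in a structured record. The sketch below is hypothetical: the class and field names, the 64 x 64 greyscale image size (one input node per pixel), the layer widths and the hyperparameter values are illustrative assumptions, not recommended settings or a format the EPO requires. It also counts the weights and biases that training would set, which gives a sense of how many values sit inside the “black box” of even a small fully connected network.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkDisclosure:
    """Hypothetical record of model details a specification might recite."""
    image_height: int         # input layer: one node per greyscale pixel
    image_width: int
    hidden_layers: tuple      # number of nodes in each fully connected hidden layer
    output_nodes: int         # e.g. 2 for a binary decision
    training_epochs: int      # hyperparameters
    regularisation_strength: float
    optimiser: str
    training_data: str        # description detailed enough to recreate the dataset

    @property
    def input_nodes(self) -> int:
        return self.image_height * self.image_width

    def layer_sizes(self) -> list:
        return [self.input_nodes, *self.hidden_layers, self.output_nodes]

    def trainable_parameters(self) -> int:
        # Each fully connected layer has (inputs x outputs) weights plus one bias per output node.
        sizes = self.layer_sizes()
        return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


spec = NetworkDisclosure(
    image_height=64, image_width=64,
    hidden_layers=(256, 64), output_nodes=2,
    training_epochs=50, regularisation_strength=1e-4,
    optimiser="stochastic gradient descent",
    training_data="hypothetical: labelled 64 x 64 greyscale photographs of patients' skin",
)
print(spec.layer_sizes())           # → [4096, 256, 64, 2]
print(spec.trainable_parameters())  # → 1065410
```

A specification is unlikely to list those million-plus trained values; rather, disclosing the architecture, the training data and the hyperparameters is what allows the skilled person to generate a comparable set of values by training.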

Overall, the question to ask is whether the descriptions of the model and of the training data are each specific enough for the skilled person to reproduce the model and the dataset respectively, and thereby to reproduce the claimed machine learning model with the same function. The training data doesn’t itself need to be identical to the data that the applicant used to train its own model. However, after training as described in the application, a third party must be able to arrive at a machine learning-based model having the same effect recited in the claim. In this respect, the information requirement for applied AI is generally higher than for other computer-implemented inventions (where it is generally assumed that any functionally defined claim feature may be programmed into a computer by a sufficiently competent skilled person). Further, from experience, it appears that the information burden is higher in Europe than in the US.
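The “same effect” test described above can be illustrated with a small simulation. In this hypothetical sketch, a written dataset description (non-hyperglycaemic readings between 4.0 and 9.0 mmol/L, hyperglycaemic readings between 13.0 and 20.0 mmol/L) is recreated twice with different random seeds, standing in for the applicant’s data and a third party’s data, and the same disclosed training recipe (a simple halfway-threshold rule) is applied to each. The function names, ranges and test readings are our own assumptions.

```python
import random


def recreate_dataset(seed, n_per_class=200):
    """Hypothetical recreation of a dataset from its written description.
    Different seeds give different, non-identical data."""
    rng = random.Random(seed)
    normal = [rng.uniform(4.0, 9.0) for _ in range(n_per_class)]   # label 0
    high = [rng.uniform(13.0, 20.0) for _ in range(n_per_class)]   # label 1
    return normal, high


def train_threshold(normal, high):
    # Assumed disclosed training recipe: boundary halfway between the two classes.
    return (max(normal) + min(high)) / 2


def agreement(threshold_a, threshold_b, test_inputs):
    """Fraction of test inputs on which two trained models make the same decision."""
    same = sum((x > threshold_a) == (x > threshold_b) for x in test_inputs)
    return same / len(test_inputs)


applicant_model = train_threshold(*recreate_dataset(seed=1))
third_party_model = train_threshold(*recreate_dataset(seed=2))  # different data, same recipe
test_inputs = [5.0, 6.5, 8.0, 15.0, 17.5, 19.0]
print(agreement(applicant_model, third_party_model, test_inputs))  # → 1.0
```

The two recreated datasets, and therefore the two learned thresholds, will almost certainly differ, yet the models reach the same decision on every test reading. The test readings are deliberately chosen away from the decision boundary; readings close to it could still be classified differently, which illustrates why the level of detail in the data description matters.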

Practice points

The pharma and biotech spaces offer many opportunities for the development of applied AI inventions. If these innovations are to be protected by patents then, as ever, the onus is on the applicant to hold up its end of the bargain by explaining how to arrive at the claimed invention, including any applied AI component. In doing so, the applicant will need to address the EPO’s sufficiency of disclosure requirements.

As is often the case, this is an evolving area, and we await further guidance from the EPO and national courts as to what the precise requirements on patent applications will be, and how those requirements will develop as protection is sought for claims involving applied AI. Nevertheless, it seems safe to say that the points highlighted above will form at least some of the minimum requirements for a sufficient disclosure. For now and as ever, if you have further questions then please get in touch with your usual Carpmaels contact.