Data-driven drug development: applied AI - Carpmaels & Ransford

AI is now a feature of all stages of the drug development pipeline, from rational drug design right through to diagnosis and prognosis. In our last article, we talked about the use of in silico data to support life sciences patent applications, but this is not the only way computer processing and data-related inventions are being used in the life sciences industry.

In this article, we provide some brief practical guidance for applicants preparing applications at the interface between applied AI and life sciences.

What we mean by applied AI

As brief background, applied AI is usually a form of machine learning model that has been “trained” such that, when presented with appropriate data, it can make a decision about that data. As part of creating that trained model, an initial model with a series of parameters is continually adjusted by showing it examples in the form of training data.

One example of machine learning in MedTech is the interpretation of analyte measurements, e.g. glucose measurements in the management of diabetes. For instance, a classifier could be trained on 1000 labelled glucose measurements to determine whether each measurement indicates a hyperglycaemic condition or not. In this scenario, after an initial model has been set up, the training process involves inputting the 1000 labelled measurements so that the classifier learns to correctly identify the condition as being present or not. After each measurement, the model's prediction is checked against the label, and the model adjusts itself before the next glucose measurement is presented. Once the training phase is complete, the trained classifier, having learned to differentiate glucose measurements, is given the next measurement, 1001, and is able to decide whether measurement 1001 is hyperglycaemic or not, thereby replicating the activity of a healthcare professional.
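
The training loop described above can be sketched in a few lines of Python. This is a minimal illustration only, not a description of any real device: the measurements are synthetic, the cut-off used to label them (`LABEL_CUTOFF`) is an assumed value rather than clinical guidance, and the model is a simple two-parameter logistic classifier chosen for brevity. All function names are hypothetical.

```python
import math
import random

# Assumption for illustration only: a reading above this value (mmol/L) is
# labelled hyperglycaemic. It is not clinical guidance.
LABEL_CUTOFF = 10.0


def make_labelled_measurements(count, seed):
    """Return synthetic (glucose, label) pairs; label 1 means hyperglycaemic."""
    rng = random.Random(seed)
    readings = [rng.uniform(3.0, 20.0) for _ in range(count)]
    return [(g, 1 if g > LABEL_CUTOFF else 0) for g in readings]


def rescale(glucose):
    """Map the assumed 3-20 mmol/L range to roughly -1.7..1.7."""
    return (glucose - 11.5) / 5.0


def probability(weight, bias, glucose):
    """Logistic model: estimated probability that a reading is hyperglycaemic."""
    return 1.0 / (1.0 + math.exp(-(weight * rescale(glucose) + bias)))


def train(examples, epochs=20, learning_rate=0.1):
    """Adjust the two parameters after every labelled example."""
    weight, bias = 0.0, 0.0  # the initial, untrained model
    for _ in range(epochs):
        for glucose, label in examples:
            # Compare the prediction with the label, then nudge the parameters.
            error = label - probability(weight, bias, glucose)
            weight += learning_rate * error * rescale(glucose)
            bias += learning_rate * error
    return weight, bias


def classify(model, glucose):
    """Return 1 (hyperglycaemic) or 0 (not) for a single new reading."""
    weight, bias = model
    return 1 if probability(weight, bias, glucose) >= 0.5 else 0


training_set = make_labelled_measurements(1000, seed=0)
model = train(training_set)

measurement_1001 = 15.2  # a new, unlabelled reading
print(classify(model, measurement_1001))
```

The trained model here is nothing more than the pair of numbers returned by `train`, which is the sense in which the "decision" is encoded in learned parameters rather than in explicit rules.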

It is fair to say that this form of applied AI is, in general, very good at making ‘decisions’ on large data sets. Since many pharmaceutical or biotech applications require decisions about data to be made, the potential of applied AI being used in innovations in this sector seems high. Nevertheless, there are certain pitfalls that need to be navigated when obtaining patents in Europe.

Sufficiency

Article 83 EPC requires that any “European patent application shall disclose the invention in a manner sufficiently clear and complete for it to be carried out by a person skilled in the art.”

For claims with an applied AI or machine learning component, the machine learning model is usually defined functionally by the effect that it produces, for example “a trained classifier for discriminating whether disease Y has been treated”. When such a classifier is presented with data of a similar type to the data on which it was trained, it will be able to discriminate between data representative of successful treatment and data that indicates unsuccessful treatment. In this way, it makes a decision about the data with which it is presented.

The EPO has recognised that functional definitions can be problematic in this technical area because the model is something of a black box. In other words, we don’t know exactly how it works because we don’t know the various weights and thresholds that have been given to the different processes within that model during its training, nor do we know how those weights and thresholds arrive at the ‘decision’ to provide a particular output.

The nature of a machine learning model means that it is difficult to predict whether it will provide a particular effect. Furthermore, if a machine learning model is trained differently (i.e., on different training data), the trained model will have different weights and thresholds, which will lead to a different black box. In practice, this means that when the same input is given to two different machine learning models (say one trained on a first set of data, and another identically set up but trained on a second set of data), it can lead to two different outputs. This has significant implications for sufficiency (as noted in decision T 161/18), because it can be difficult or even impossible to reliably reproduce what the applicant was claiming unless enough information has been given about how to set up and train the model.
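
As a rough illustration of this point, the sketch below trains two identically configured models on two different synthetic training sets and then gives both the same input. Everything in it is an assumption made for the example: the data are randomly generated, and the two sets are imagined to have been labelled against different cut-offs (7.0 and 13.0), which is one simple way for training data to differ. The model is a toy two-parameter logistic classifier.

```python
import math
import random


def make_training_set(cutoff, seed, count=1000):
    """Synthetic (glucose, label) pairs labelled against a given cut-off."""
    rng = random.Random(seed)
    readings = [rng.uniform(3.0, 20.0) for _ in range(count)]
    return [(g, 1 if g > cutoff else 0) for g in readings]


def score(params, glucose):
    """Logistic output of a model defined by a (weight, bias) pair."""
    weight, bias = params
    return 1.0 / (1.0 + math.exp(-(weight * (glucose - 11.5) / 5.0 + bias)))


def train(examples, epochs=20, learning_rate=0.1):
    """Identical set-up for both models: same start, same hyperparameters."""
    params = (0.0, 0.0)
    for _ in range(epochs):
        for glucose, label in examples:
            error = label - score(params, glucose)
            params = (params[0] + learning_rate * error * (glucose - 11.5) / 5.0,
                      params[1] + learning_rate * error)
    return params


# Same architecture and training procedure, different training data.
model_a = train(make_training_set(cutoff=7.0, seed=1))
model_b = train(make_training_set(cutoff=13.0, seed=2))

same_input = 10.0
decision_a = score(model_a, same_input) >= 0.5
decision_b = score(model_b, same_input) >= 0.5
print(model_a, decision_a)
print(model_b, decision_b)
```

The two runs end with different learned parameters and disagree on the same reading, even though nothing about the model set-up changed. A reader given only the functional definition would have no way of knowing which of the two models was meant.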

Whilst this is a developing area, the trend so far is that a mere functional definition of a machine learning model is not enough, by itself, to enable the skilled person to reproduce the invention. The skilled person needs more information to reliably make the applied AI portion of the claim. It is therefore helpful to consider what information a patent application needs to contain in order to meet the sufficiency requirement and be successfully prosecuted at the EPO.

What information is needed?

So, what does need to be provided? There is sadly no one-size-fits-all approach, since the information needed to enable the skilled person to put the invention into effect will differ depending on the technology in question. However, there is certain information that an application will need in order to sufficiently disclose the applied AI portion of a claim:

A definition of the applied AI, to give some indication of the inner workings of the black box: One example would be disclosing that the model is an artificial neural network with a specified number of hidden layers, each having a specified number of nodes, alongside an input layer and an output layer. For example, if the machine learning model is being trained to recognise features in an image (say a photo of a patient’s skin), each node of the input layer might represent a single pixel in that image.
Information about the training data: It is not practical to include the whole set of training data, but some details are needed. These could be in the form of:
- a reference to a specific, publicly available dataset, if one is available.
- a description of the set of training data in enough detail to allow it to be recreated, e.g. hourly glucose measurements of a single patient, taken continuously over a period of 1 month, with those measurements categorised by a healthcare practitioner.
Training parameters (the hyperparameters of the applied AI model): for example, the number of training epochs, the regularization strength, and the optimization algorithm.
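
One way to picture these three categories is as a structured record that a drafter could check for gaps before filing. The sketch below is purely illustrative: the class and field names are hypothetical, the example values are invented placeholders, and the checks reflect the neural-network example above rather than any official EPO checklist.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelDefinition:
    model_type: str = ""                                    # e.g. "artificial neural network"
    hidden_layers: List[int] = field(default_factory=list)  # nodes per hidden layer
    input_description: str = ""                             # e.g. "one input node per pixel"
    output_description: str = ""


@dataclass
class TrainingData:
    public_dataset: Optional[str] = None  # a specific, publicly available dataset
    description: str = ""                 # enough detail to recreate the data


@dataclass
class TrainingParameters:
    epochs: Optional[int] = None
    regularization_strength: Optional[float] = None
    optimizer: str = ""


@dataclass
class DisclosureRecord:
    model: ModelDefinition
    data: TrainingData
    parameters: TrainingParameters

    def missing_items(self) -> List[str]:
        """List the categories of information that have not been filled in."""
        gaps = []
        if not self.model.model_type:
            gaps.append("model type")
        if not self.model.hidden_layers:
            gaps.append("hidden layers")
        if not (self.data.public_dataset or self.data.description):
            gaps.append("training data")
        if self.parameters.epochs is None:
            gaps.append("epochs")
        if self.parameters.regularization_strength is None:
            gaps.append("regularization strength")
        if not self.parameters.optimizer:
            gaps.append("optimizer")
        return gaps


# Placeholder values, invented for the example.
record = DisclosureRecord(
    model=ModelDefinition("artificial neural network", [64, 32],
                          "one input node per image pixel", "disease present or not"),
    data=TrainingData(description="hourly glucose measurements of a single patient "
                                  "over 1 month, categorised by a healthcare practitioner"),
    parameters=TrainingParameters(epochs=50, regularization_strength=0.01,
                                  optimizer="stochastic gradient descent"),
)
draft = DisclosureRecord(ModelDefinition(), TrainingData(), TrainingParameters())

print(record.missing_items())
print(draft.missing_items())
```

A record with no gaps is not, of course, a guarantee of sufficiency. The sketch only makes the three categories concrete.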

Overall, the question to ask is whether the description of the model and the description of the training data are each specific enough that someone can reproduce that model or dataset, and thereby reproduce the claimed machine learning model with the same function. The training data doesn’t itself need to be identical to the data that the applicant used to train its own model. However, after training as described in the application, a third party must be able to arrive at the machine learning-based model having the same effect recited in the claim. In this respect, the information requirement for applied AI is generally higher than for other computer-implemented inventions (where it is generally assumed that any functionally defined claim feature may be programmed into a computer by a sufficiently competent skilled person). Further, from experience, it appears that the information burden is higher in Europe than in the US.

Practice points

The pharma and biotech spaces offer many opportunities for the development of applied AI inventions. If these innovations are to be protected by patents then, as ever, the onus is on the applicant to hold up its end of the bargain and explain how to arrive at the claimed invention, including any applied AI component. In doing so, the applicant will need to address the EPO’s sufficiency of disclosure requirements.

As is often the case, this is an evolving area, and we await further guidance from the EPO and national courts as to what the precise requirements for patent applications will be, and how those requirements will develop when protection is sought for claims involving applied AI. Nevertheless, it seems safe to say that the points highlighted above will form at least some of the minimum requirements for a sufficient disclosure. For now and as ever, if you have further questions then please get in touch with your usual Carpmaels contact.

