
Training data is the foundation of many AI-implemented innovations. The trained model, the training method, and the use of the trained model for a particular application are all crucial but, in many cases, assembling the training data represents one of the most substantial investments on the road to delivering an AI-implemented innovation.
This is particularly true for MedTech AI innovations, where the training data often consists of large datasets including huge volumes of real-world data, for example patient records in medical diagnosis or wet experiments in drug discovery.
Innovators invest significant resources to generate this data and it requires careful protection.
In our previous article, we considered a number of unregistered rights which might provide protection for training data for AI-based MedTech innovations.
In particular, we looked at database rights, copyright, and trade secret provisions. Each of these is a valuable right, but importantly each can only be enforced where the training data in question has been copied.
In other words, these rights would not prevent a comparably well-resourced competitor from assembling its own training data set.
Patent rights, on the other hand, could be used to prevent a competitor from exploiting an independently obtained set of training data and/or a model trained on it. However, the claims must be drafted carefully to capture this.
In this article, we will first touch on how the selection of particular training data might support an inventive step for a patent claim. We will then look at the different claim forms which might be available for AI-implemented MedTech inventions, and what they might cover.
Can training data support an inventive step?
As discussed in previous articles, the requirement for an inventive step can be met by AI inventions in a number of ways. For example, we might consider the case of an applied MedTech AI invention in which a model, trained on a particular set of training data, serves a medical purpose.
Let us consider a hypothetical invention, where an AI model is used to identify the occurrence of a particular disease based on a combination of heart rate and skin temperature readings. The invention is based on a classifier model which has been trained on a large volume of heart rate and skin temperature readings, collated over a number of years across different patients, with meticulous record keeping as to which patients did and did not develop the disease.
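To make the hypothetical concrete, the short Python sketch below shows one way such a classifier could be trained on labelled readings and then applied to a new reading. It is an illustration only: the nearest-centroid method, the function names `train` and `predict`, and every data value are assumptions introduced here, not details of any real invention or clinical data set.

```python
from statistics import mean

# Hypothetical labelled readings: (heart_rate_bpm, skin_temp_celsius, developed_disease).
# All values are invented for illustration and carry no clinical meaning.
TRAINING_DATA = [
    (62.0, 33.1, False),
    (68.0, 33.4, False),
    (71.0, 32.9, False),
    (95.0, 35.2, True),
    (102.0, 35.6, True),
    (98.0, 34.9, True),
]


def train(records):
    """Return one centroid (mean heart rate, mean skin temperature) per label."""
    centroids = {}
    for label in (False, True):
        rows = [(hr, temp) for hr, temp, developed in records if developed == label]
        centroids[label] = (mean(r[0] for r in rows), mean(r[1] for r in rows))
    return centroids


def predict(centroids, heart_rate, skin_temp):
    """Classify a new reading by its nearest centroid (squared Euclidean distance)."""
    def distance(label):
        c_hr, c_temp = centroids[label]
        return (heart_rate - c_hr) ** 2 + (skin_temp - c_temp) ** 2
    return min(centroids, key=distance)


model = train(TRAINING_DATA)
print(predict(model, 99.0, 35.0))  # reading close to the "developed disease" group
print(predict(model, 65.0, 33.0))  # reading close to the "did not develop" group
```

In this sketch the trained model is nothing more than the two centroids computed from the labelled readings, so its behaviour is determined by the training data rather than by any sophistication in the model itself.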
Different aspects of the innovation might provide an inventive step which would support a patent claim. For example, the inventors might have identified a particular (previously known) classifier model which is particularly well-suited to identifying the disease in question – that is to say the invention might lie in the selection of a particular model.
Alternatively, the fact that the disease in question is associated with heart rate and skin temperature data might be previously unknown. In these circumstances, the selection of these data in particular for the purpose of training a model might support an inventive step. As a further alternative, it might be that the heart rate and skin temperature data has been acquired through some new and innovative means (e.g. an inventive new device).
In these circumstances, it may be that a patentable invention lies in training a model using that particular training data (irrespective of the specifics of the model itself).
Claims to protect training data inventions
Broadly speaking, patent claims can be directed to new and inventive products or methods. Under European practice the term ‘products’ includes computer program products. Protection conferred by a method or process claim is extended to include the direct products of the claimed method or process.
With this in mind, we can consider a few examples of different patent claims which could be used to protect inventions which rely on the use of an innovative training data set.
Claims directed to a trained model
One option would be to claim a computer program product which is defined by a trained model trained on a particular training data set. Another option might be to claim a method of obtaining a trained model, by training the model on a training data set.
In principle, if the specified training data can be used to establish novelty and inventive step, then it may be possible to establish patentability without being overly specific about the model architecture. Claims of this kind could cover the activities of any parties dealing in a model which has been trained on the training data – either by virtue of dealing in a patented product (in the case of the computer program product claim) or dealing in the product of a patented method (in the case of the claim to a method of obtaining the trained model).
Claims directed to the training data set – method claims
Whilst the claims described above could cover the activities of third parties dealing in a trained model, these claims are unlikely to cover a third party dealing only in the data itself. It might also be difficult to determine whether a particular trained model was derived from a training data set as defined in the claim.
Ideally, claims should be included which cover the training data set itself (without requiring it to be used in training a model). This could be helpful in instances where an intermediary was acquiring the training data to be used by a third party in training a model, or where a patent holder was only able to provide evidence for the collection of the training data (without being able to evidence that it was being used to train a model).
We could consider including a claim to a method of obtaining a training data set, setting out the steps used to acquire the data. If the steps used to acquire the data were technical and supported an inventive step, then the method itself should be patentable. The claim may cover a third party obtaining their own training data through the claimed method and moreover, because the direct product of this method is the training data set itself, the claim may cover any other party dealing in the training data set.
Claims directed to the training data set – product claims?
The downside to relying on method claims in respect of third parties dealing in the products of those methods is that, to enforce the claim, one would generally have to provide evidence that the product was obtained through the patented method (rather than any other method).
This leaves the question – can we draft product claims which cover the training data itself?
Here are two example claims illustrating how we might do that:
1. A training data set, comprising X.
2. A record carrier or computer readable medium embodying a training data set comprising X.
In principle, if selecting ‘X’ as training data was itself new and inventive, then a claim directed to the training data set itself might meet the novelty and inventive step requirements.
However, the EPO may consider our example claim 1 to lack ‘technical character’, in which case it would be ineligible for patent protection. For example, it may be deemed excluded from patentability under Article 52(2)(d) EPC, which stipulates that ‘presentations of information’ are not to be considered inventions. The EPO might object that claim 1 is directed to information as such.
Example claim 2, on the other hand, would not be excluded from patentability for this reason because it is directed to a record carrier or computer readable medium. Under established case law of the EPO, these are considered to have technical character such that the claim would not be excluded from patentability. However, under the EPO’s ‘two-hurdle’ approach to assessing computer-related inventions, in determining whether the claim was inventive, the EPO would consider whether the remaining part of the claim (i.e. the training data set) is technical in nature, i.e. whether it serves a technical purpose.
In this case, the remaining part of the claim is characterised by the training data itself. At the EPO, structured data can, in principle, contribute to the technical character of a claim in certain circumstances. In particular, EPO case law and the Guidelines draw a distinction between ‘functional’ and ‘cognitive’ data, with only functional data considered to have technical character.
As stated in the EPO’s Guidelines at G-II, 3.6.3:
“A data structure or format contributes to the technical character of the invention if it has an intended technical use and it causes a technical effect when used according to this intended technical use. Such a potential technical effect related to an implied technical use is to be taken into account in assessing inventive step (G 1/19). [This may happen if] the data structure or format is functional data, i.e. if it has a technical function in a technical system, such as controlling the operation of the device processing the data. Functional data inherently comprise, or map to, the corresponding technical features of the device. Cognitive data, on the other hand, are those data whose content and meaning are only relevant to human users and do not contribute to producing a technical effect.”
This means that, in principle, claim 2 could be found to lack an inventive step if the EPO were to consider our ‘training data comprising X’ as merely cognitive data. Could we argue that the training data is functional? This point is largely untested and ultimately might hinge on whether that data ‘comprise, or map to’ corresponding technical features of a device.
At first glance, we might assume that the data would not have a technical nature – it does not map to a computer device, nor does it control a device processing the training data, as such. Rather it will ultimately be used in the training of a model. However, it may not be so clear cut. Some case law at the EPO points toward a very broad reading of controlling the operation of a device processing the data. The decision in T 1351/04 found that “[A]n index structure used for searching a record in a database produces a technical effect since it controls the way the computer performs the search operation.”
By analogy, it could be argued that our training data controls the way in which the computer will perform the operation of training a model. If that argument were accepted, the data would be considered to contribute to the technical character of the invention and could therefore support an inventive step for our example claim 2.
In principle, we could draft a product claim directed specifically to the training data set itself. This could cover the activities of any party dealing in a training data set as defined in the claim (i.e. creating a training data set, using the training data set to train a model, or providing the training data set to another party).
Conclusion
Training data sets are valuable assets and we have discussed a number of ways in which patent claims can be drafted to offer protection for those training data sets and models derived therefrom. Of course, different claim forms will offer subtly different scopes of protection, and different claim forms will be more or less useful against different infringers in different circumstances. The most successful strategy should combine a number of different claim forms to provide comprehensive protection for MedTech AI inventions.
Carpmaels & Ransford has a wealth of experience in this field, working with innovators to create effective and comprehensive IP strategies that will protect innovations in the long term. If you have any questions about securing patents to protect your MedTech AI innovations, please do get in touch.