Protecting training data for AI innovations in the MedTech space – Part 1
Unregistered intellectual property rights

The heart of many AI-implemented innovations will be a model which has been trained on a set of training data.

Whilst the model, the training method, and the use of the trained model for a particular application are all crucial, the training data itself will often be fundamental to the practical realisation of an AI innovation.

Moreover, for AI innovations in the MedTech space in particular, the training data will often comprise substantial datasets representing large volumes of real-world data, whether results from wet-lab experiments in drug discovery or patient records in medical diagnosis.

Training data can represent a significant investment on the part of innovators. The question is – how can you use intellectual property rights to protect this investment?

In this article, we consider the different intellectual property rights available for protecting MedTech training data, looking in particular at unregistered rights (such as copyright, database rights, and trade secrets). In a later article, we will also look at the extent to which patent protection can be used to protect training data.

Generally, a mix of different types of rights may be applicable, and which rights the training data will attract depends on how the data has been created, curated, and utilised.


Copyright

Generally, copyright protects original creative works, allowing authors to prevent unauthorised copies of their works. On that basis, one might assume that it is not relevant to training data, particularly where that training data is not original but is instead an aggregation of data from other sources. However, the Agreement on Trade-Related Aspects of Intellectual Property Rights (TRIPs) requires that copyright extends to databases and other compilations if they constitute an ‘intellectual creation by virtue of the selection or arrangement of their contents’, even if some or all of their contents are not themselves original or otherwise protectable by copyright.

Of course, in assembling a training data set, innovators will typically be making key decisions on which data is and isn’t included, as well as preparing the dataset in a suitable format for training. For this reason, innovators will be making conscious decisions both on the selection and the arrangement of the contents of any training data set. As a result, the provisions for copyright protection for compilations may well apply to training data.

Outside of MedTech, we have seen disputes over copyright infringement arising from the use of third-party copyright works in training data for AI, particularly in relation to large language models (LLMs) like ChatGPT. The UK Government has recently published its response following consultations on a regulatory framework for AI in the UK. That response references the tension between the need for AI developers to “easily access a wide range of high-quality datasets to develop and train cutting-edge AI systems in the UK” and the concerns of rights holders about the use of their content in training data.

As often happens in copyright, a work can both be protected by copyright and infringe a third party’s copyright, and this is especially true of training data. Whatever rights an innovator has as a result of their selection and arrangement of content, they should also make sure at the outset that they have the right to use that content.

Database rights

Database rights are a unique form of intellectual property right which, similar to copyright, allow creators of databases to prevent unauthorised copies of the database which they have created.

For a database to attract database rights, there needs to have been a substantial investment in obtaining, verifying, or presenting the data. The ‘investment’ can be in terms of human, financial or technical resources, and is a bar which is likely to be met by the preparation of training data for MedTech applications.

The contents of the database do not have to be original, and, in contrast to copyright, there does not need to be any intellectual creation in the selection or arrangement of the contents of the database.

For example, even if there was no ‘intellectual creation’ in selecting a particular type of training data, or a particular format for the data to make it suitable for use in training a model (meaning that the database would not attract copyright), if it has nevertheless taken resources to acquire, aggregate and format the training data, then it will attract database rights.

Database rights are available both in the UK and in the European Economic Area (EEA), with the law currently harmonised through the EU Database Directive and the UK Copyright and Rights in Databases Regulations 1997. Following Brexit, however, reciprocal recognition between the EEA and the UK for database rights holders ceased on 1st January 2021. This means that databases created on or after that date are only protected within the creator’s jurisdiction (i.e. a database created in the UK would not be protected by database rights in the EEA, and vice versa).

Confidential information and trade secrets

The law surrounding trade secrets and confidential information varies from country to country, although there has been some degree of harmonisation through TRIPs and the EU Trade Secrets Directive (implemented in the UK through the Trade Secrets (Enforcement, etc.) Regulations 2018).

In the UK, the statutory definition of a ‘trade secret’ is information which: (a) is secret; (b) has commercial value because it is secret; and (c) has been subject to reasonable steps to keep it secret.

Trade secret provisions also generally allow rights holders to take action against parties who have disclosed, acquired or used their trade secrets (not just those who breached the confidentiality of the information by disclosing it). For example, an innovator may be able to take action against both the individual who unlawfully acquired the innovator’s trade secrets and any competitor who used those trade secrets without the innovator’s permission. Imagine this common scenario: a former employee absconds with a copy of an innovator’s confidential information and takes it to a competitor, which then uses it in its business. Under the Trade Secrets Regulations and the EU Directive, it may be possible for the innovator to take action against not just the ex-employee but also the competitor, who may have rather deeper pockets.

Because training data is so important to AI-implemented MedTech innovations, keeping it confidential will likely provide significant commercial value to an innovator. On that basis, so long as the training data is secret and has been subject to reasonable steps to keep it secret, it would likely be considered a trade secret. From a practical standpoint, of course, this means that innovators need to put in place internal processes to make sure their training data is kept secret. These internal processes can require considerable investment and should be considered in the round, together with the company’s innovation capture processes and its data protection and IT security policies and procedures.

What about patents?

In this article, we have focussed on various unregistered rights which may be used to protect training data for MedTech AI innovations. Each of these rights is important, but for any of them to be infringed, another party must have used a copy of the training data in question. In other words, these rights would not prevent a comparably well-resourced competitor from assembling its own training data set.

Patent rights on the other hand could be used to prevent a competitor from exploiting an independently obtained set of training data and/or model trained thereon. In our next article, we will look at how patents might be used to protect training data.


Training data sets are valuable assets and we have covered a number of unregistered intellectual property rights which innovators may be able to use to protect training data used in their AI MedTech innovations. The particular rights available will vary depending on the circumstances:

  • If the assembly of a set of training data is an ‘intellectual creation’ by virtue of the selection or arrangement of the data, the training data set as a whole may attract copyright (even where the individual data items do not).
  • If there has been a substantial investment in obtaining, verifying, or presenting the data in a training data set, then the data set may attract database rights.
  • If a training data set is secret, and reasonable steps have been taken to maintain its secrecy, the training data set may constitute a trade secret.

Carpmaels & Ransford has a wealth of experience in this field, working with innovators to create effective and comprehensive IP strategies that will protect innovations in the long term. If you have any questions about suitable intellectual property rights to protect your MedTech AI innovations, please do get in touch.