Data-driven drug development: in silico data
Using data derived from AI to support life sciences patents

AI has the potential to support all stages of the drug development pipeline, from the earliest stages of rational drug design through to analysis of clinical trial results. But what does this mean for inventions derived from AI, and how can data derived from AI and computer models be used to support EP applications in the life sciences field? There is limited guidance from the EPO Boards of Appeal to date specifically on in silico data. However, the general principles applied by the EPO to assessing all types of evidence in support of a technical effect can shed some light on how applicants having in silico data might best proceed.

In a previous article, we considered the patentability of AI inventions at the EPO. In this article, we consider how the EPO’s current approach to assessing patentability might affect applications that can use in silico data. In a later article, we will further consider the sufficiency requirements set by the EPO for applied-AI inventions.

Sufficiency and inventive step requirements at the EPO – is data needed?

First, the fundamentals.

For a patent application to meet the sufficiency requirement under Article 83 EPC, the application must disclose the invention in a manner sufficiently clear and complete for it to be carried out by the skilled person.

An NCE (new chemical entity) case typically claims the pharmaceutical product per se, often via Markush formulae and/or as an individualised disclosure of the compound itself. When assessing sufficiency, the primary question often asked by the EPO is whether the skilled person, based on the information in the application as filed and their common general knowledge, can obtain or make the claimed compound(s). The types of supporting information that might be required in the application as filed to meet this requirement include synthesis schemes or details regarding the manufacturing processes. For such NCE product claims, whether a technical effect is achieved should not typically be relevant for assessing sufficiency, because the claim itself does not recite the desired technical effect.

In contrast, a medical use claim or second medical use claim is purpose-limited, meaning that attaining the claimed therapeutic effect is a technical feature of the claim. Thus, to meet the requirements of sufficiency, the therapeutic efficacy of the claimed product for the claimed indication must be credible.

The case law confirms that data, let alone clinical data, is not always required to establish sufficiency. However, mere verbal statements in the application or patent that the product can treat the claimed indication will not always be considered enough by the EPO to make the therapeutic efficacy credible, for example if there is no understanding of the mechanism of action underlying the treatment, or if there are substantiated doubts that the underlying mechanism of action can be put into practice. Applicants should therefore aim to include as much data as possible in the application as filed that helps make the claimed therapeutic effect credible in order to meet the requirements of Article 83 EPC.

Evidence of a technical effect is usually required to establish an inventive step for both NCE product claims and medical use claims. Furthermore, for broad claims, such as broad Markush formulae encompassing a large number of compounds, evidence is often required to show that it is credible that substantially all claimed compounds possess this technical effect. Thus, applicants should also aim to include as much data as possible in the application as filed that demonstrates unexpected advantages and technical effects associated with their product(s) in order to help establish an inventive step under Article 56 EPC.

The potential use of in silico data

There is no definitive rule on the type of data that will be required to meet the sufficiency and inventive step requirements at the EPO. Indeed, with respect to sufficiency, the Board in T 801/06 confirmed that a “claimed therapeutic effect may be proven by any kind of data as long as they clearly and unambiguously reflect the therapeutic effect”, and the principle of free evaluation of evidence has been emphasised in numerous EPO Board of Appeal decisions. Considering the sufficiency of a medical use claim in T 1642/06, the Board found that “…the determinative factor for a finding of such support is that…i.e., the skilled person understands on the basis of generally accepted models that the results in the application directly and unambiguously reflect the claimed therapeutic applications…”.

Applicants may therefore be optimistic that the EPO should not dismiss in silico data out of hand, because this would seem to go against the EPO’s principle of free evaluation of evidence. However, applicants should expect the probative value of in silico data, just like other types of data, to be closely scrutinised by the EPO. This approach seems to have been endorsed by T 898/05 (albeit in the context of industrial applicability under Article 57 EPC), where the Board found that:

“…The fact that the putative function of the Zcytor1 receptor was assigned in the examples based on computer-assisted methods, rather than on the basis of traditional wet-lab techniques, does not mean that it has to be automatically disregarded or excluded from a careful and critical examination. There is no “all-encompassing” approach, and certainly not a “throw-into-the-bin” approach, for these in-silico examples. Their probative value has to be examined on a case-by-case basis regarding the nature of the invention and the prior art relating thereto. Such methods of analysis are increasingly becoming an integral part of scientific investigations and can often allow plausible conclusions to be made regarding the function of a product before it is actually tested…” (emphasis added).

The ability to rely on in silico data may be particularly beneficial for applicants because these techniques can allow innovators to investigate large sets of compounds quickly and cost-effectively. For example, if an applicant wanted to generate data to show that compounds across the entire scope of broad formulae are active inhibitors of a particular protein target, the applicant could generate binding data from a computer-assisted docking study far more easily than by synthesising numerous compounds and testing them in an in vitro or in vivo assay.

However, there is a risk that the EPO (or third parties) would object that such in silico data has a low probative value, particularly if they conclude that the computer model could have been designed to support (more or less) whatever assertion the applicant wished to make. If relying on in silico data, applicants should therefore consider how to make the in silico data and the underlying computer model as technically credible as possible. Returning to the computer-assisted docking study example, the following factors might be useful for emphasising the probative value of the data:

  • whether the target protein X-ray structure is known and widely accepted in the field
  • whether the modelling method accurately takes into account the different conformations a ligand could assume in a binding pocket
  • whether the modelling method takes into account factors such as solvation and the quality of the input ligand data
  • whether it is credible that any binding trends predicted by the in silico data would actually translate into improved inhibition of the target
  • whether the computer model can be validated, for example by providing evidence which shows alignment between some of the model output data and more widely accepted in vitro or in vivo output data.
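
As an illustration of the last factor, the following is a minimal sketch of one way alignment between model output and wet-lab data might be quantified: a Spearman rank correlation between predicted docking scores and measured potencies for the subset of compounds that has actually been tested. The compound values, variable names and the choice of Spearman's rho are all assumptions made for illustration; nothing here reflects a metric that the EPO has prescribed or endorsed.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for six tested compounds (not real data).
# Docking scores in kcal/mol: more negative means stronger predicted binding.
docking_scores = [-9.8, -9.1, -8.4, -7.9, -7.2, -6.5]
# Measured potency as pIC50: higher means a more potent inhibitor.
measured_pic50 = [8.1, 7.9, 7.0, 7.2, 6.1, 5.8]

# Negate the scores so that "better" points the same way in both lists.
rho = spearman([-s for s in docking_scores], measured_pic50)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is used in this sketch because docking scores are generally better at ordering compounds than at predicting absolute potency. A high correlation on the tested subset does not by itself establish that the model is reliable for untested compounds elsewhere in the claimed scope.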

Ultimately, in silico data is based on a prediction of the properties of compounds, whereas wet-lab data is obtained from making and testing the properties of the compounds. Therefore, wet-lab data might inherently be considered more credible than in silico data by the EPO. Whilst this does not mean that in silico data can never be useful, it does mean that the safest approach would be for applicants to avoid relying solely on in silico data, where possible.

When is data required before the EPO?

Applicants should aim to include as much data as possible in the original filing. It will sometimes be possible for applicants/patentees to rely on post-published data at the EPO, at least in the context of inventive step. However, the circumstances under which post-published data may be relied on are still being clarified by the Boards of Appeal of the EPO following the recent G2/21 Enlarged Board of Appeal (EBA) decision.

It is interesting to note that the questions referred to the EBA in G2/21 referred generally to “evidence, such as experimental data”. Therefore, the EBA’s decision in G2/21 would appear to apply to all types of post-published evidence, including post-published in silico data. The EBA decision confirms that post-published evidence may be used to prove a technical effect relied upon for acknowledgement of an inventive step, and states that:

“A patent applicant or proprietor may rely upon a technical effect for inventive step if the skilled person, having the common general knowledge in mind, and based on the application as originally filed, would derive said effect as being encompassed by the technical teaching and embodied by the same originally disclosed invention”.

There is no concrete guidance about what information might be required in the original application to ensure that a technical effect meets the requirement of being “derived as being encompassed by the technical teaching and embodied by the same originally disclosed invention”, although case law is already starting to develop in this area as Board of Appeal decisions start to apply the G2/21 tests. The uncertainty regarding the situations in which post-published evidence can be relied upon is another reason why it would be safest for applicants to include as much data as possible, including some wet-lab data, in their original filings.


The EPO’s approach to assessing sufficiency and inventive step continues to push applicants towards a later filing date, so that they can include as much supporting data as possible in the original application.

If in silico data is readily available to applicants (e.g., as part of their drug-discovery pipeline, or because it can be readily generated), applicants should consider incorporating this data into their original application. For example, a pragmatic approach for applicants striving for an early filing date might be to include wet-lab data for at least a subset of the invention (e.g., to show a technical effect achieved for lead compounds within a broad Markush formula), and to also include in silico data showing that the technical effect is also achieved across a broader scope of the formulae. We have seen successful instances of this approach at the EPO.

However, the probative value of the in silico data will depend on the specific facts, such as the level of detail given in the application as filed regarding the computer model(s) used, and the credibility of the simulation. In view of this uncertainty, applicants should continue to strive to include the same level and quality of in vitro and/or in vivo supporting data in the application as filed that would be expected for inventions that were not made using computer modelling techniques.

The use of AI and computer models in drug discovery is an area of IP that is likely to keep changing rapidly, and we should get more guidance from the EPO as the case law in this area develops.