It’s hard to miss the hype surrounding the most recent wave of developments in machine learning and “artificial intelligence”. Look no further than media coverage of DeepMind, the Alphabet-owned artificial intelligence company most famous for its AlphaGo program, which beat the world number one Go player, Ke Jie, in 2017. As reported elsewhere, a number of DeepMind’s patent applications were recently published. In this article, we’ll take a look at one of the products underlying some of those patent applications and how DeepMind is trying to protect it through these applications. We’ll finish up with a look at the difficulties patent applications for AI-based inventions are likely to face at the European Patent Office and some thoughts on how these might be overcome.
Two of those published patent applications (WO2018/048934 and WO2018/048945) relate to WaveNet, the technology behind Google Assistant’s surprisingly realistic-sounding voice. At its heart, a WaveNet uses a form of artificial neural network, a convolutional neural network, to generate audio data. Convolutional neural networks are particularly good at tasks that have a complex input and a relatively simple output, such as image recognition. A convolutional neural network can efficiently process large images and other big inputs by reducing the complexity of the input as part of its internal processing. Clusters of neurons process data from limited “receptive fields” of the input and pooling layers combine the outputs of several clusters into a single input for the next layer of the network.
However, in contrast to a conventional convolutional network, a WaveNet does not use pooling layers. Instead, a WaveNet uses dilated causal convolution: “dilated” in the sense that each filter skips over its inputs at a fixed interval, so that the outputs of some of the hidden layer nodes are left out of the calculation at each time step, and “causal” in the sense that the output can only depend on values in the past. This enables the WaveNet’s receptive field to be particularly large, covering thousands of previously generated audio samples, which is important when generating 16,000 samples for every second of audio output.
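To make the idea of dilated causal convolution more concrete, here is a minimal sketch in plain Python. It is not taken from the patent applications: the function names, the kernel size of 2 and the dilation pattern (doubling from 1 to 512, repeated three times) are assumptions chosen for illustration only.

```python
def causal_dilated_conv(signal, weights, dilation):
    """Causal dilated 1-D convolution.

    output[t] = sum over k of weights[k] * signal[t - k * dilation],
    where positions before the start of the signal count as zero.
    Only the current and past values are ever read, never future ones.
    """
    out = []
    for t in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation  # step back in time, skipping `dilation` samples
            if idx >= 0:
                total += w * signal[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence one output of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed configuration (illustrative, not from the applications):
# kernel size 2, dilations 1, 2, 4, ..., 512, with the pattern repeated three times.
dilations = [2 ** i for i in range(10)] * 3
rf = receptive_field(2, dilations)
```

Under these assumed settings, thirty layers are enough for a single output to depend on a few thousand earlier samples, which is how a stack of small filters can cover such a long stretch of audio without pooling.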
In use, text is input to the WaveNet as a sequence of linguistic and phonetic features containing information about the phonemes, syllables, words and so on in the text. The WaveNet processes this input, along with previously generated samples, and outputs a corresponding audio sample for the given time step. This step-by-step generation of the audio samples is computationally expensive but essential for generating complex, realistic-sounding audio.
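The step-by-step generation described above can be sketched as a simple loop. This is a hedged illustration only: `generate`, `toy_model` and `NUM_LEVELS` are hypothetical names, and the toy model is a stand-in for the trained network, not anything disclosed in the applications.

```python
import random

NUM_LEVELS = 8  # assumed number of possible sample values (illustrative)


def generate(next_sample_scores, features, num_levels, seed=0):
    """Generate one sample per time step, feeding each result back in."""
    rng = random.Random(seed)
    samples = []
    for feature in features:
        # The model sees every sample generated so far plus the current
        # conditioning feature, and returns one non-negative score per value.
        scores = next_sample_scores(samples, feature)
        samples.append(rng.choices(range(num_levels), weights=scores, k=1)[0])
    return samples


def toy_model(history, feature):
    """Hypothetical stand-in for the network: favours values close to the
    previous sample, nudged up or down by the conditioning feature."""
    previous = history[-1] if history else NUM_LEVELS // 2
    target = min(max(previous + feature, 0), NUM_LEVELS - 1)
    return [1.0 / (1 + abs(level - target)) for level in range(NUM_LEVELS)]


audio = generate(toy_model, [1, 0, -1, 0, 1], NUM_LEVELS)
```

Because each sample depends on the ones before it, the loop cannot be run in parallel across time steps, which is one way to see why this style of generation is computationally expensive.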
These two WaveNet applications include claims for a WaveNet for generating audio, as well as claims for the more general application of a WaveNet to generating arbitrary data sequences. Looking at the ‘934 publication in particular, the claims define a neural network including many of the key features described above, such as:
- a convolutional subnetwork comprising one or more audio-processing convolutional neural network layers
- an input that includes the audio sample from the preceding time step
- an output that defines a score distribution over a plurality of possible audio samples for the given time step
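As an illustration of the last feature in the list above, a “score distribution” over possible sample values is commonly produced by applying a softmax to the network’s raw outputs. The sketch below assumes that approach; the claims themselves do not name a softmax, and the four raw values are made up for the example.

```python
import math


def softmax(logits):
    """Turn raw network outputs into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs for four possible sample values.
scores = softmax([2.0, 1.0, 0.1, -1.0])
```

Each entry of `scores` is positive and the entries sum to one, so the next audio sample can be drawn from the list as a probability distribution.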
Of all the requirements for a patent to be granted by the EPO, two in particular stand out when trying to patent this kind of computer-implemented invention: excluded subject matter and inventive step.
In Europe, programs for computers and mathematical methods as such are excluded from patentability. Mixed-type inventions, which include features that fall within these categories and other features that do not, such as a computer itself, are permitted; however, only the technical features (that is, those features that do not fall within one of the categories of excluded subject matter) are taken into consideration in the assessment of inventive step.
An algorithm such as a neural network – which is implemented as a computer program – is generally not considered to be a technical feature per se, but when the algorithm is applied to a technical problem it can take on technical character and consequently be taken into account in the assessment of inventive step.
Thus, the key question the WaveNet applications will face, should they come before the EPO, will be whether an algorithm for generating a data sequence has technical character. More likely than not, this will come down to the specific application of the algorithm and whether the algorithm solves a technical problem. The problem of generating realistic-sounding audio in a text-to-speech system seems likely to be considered technical; however, it remains to be seen whether the general application of a WaveNet to non-specific sequences of data can overcome the inventive step hurdle in Europe without being tied to a specific application.
Once these hurdles are overcome, these applications may well find success at the EPO. In the words of the inventors themselves: “The fact that directly generating time step per time step with deep neural networks works at all for 16kHz audio is really surprising, let alone that it outperforms state-of-the-art TTS systems. We are excited to see what we can do with them next”. So am I.