Panel 4b

mine

Data as property?

by Poorna Mysoor

Data is increasingly becoming the subject matter of a vast number of businesses, be it in the form of football fixtures, horse racing results or flight information. However, data as such is not protectable as intellectual property (outside the realm of trade secrets and confidential information). When a subject matter is outside the protection of intellectual property, it remains in the public domain. A key feature of the public domain is that it is free for everyone to use. It is not hard to see that there is value in leaving data in the public domain, as it allows others to draw from it.

Nevertheless, businesses creating data or depending on existing data to produce value-added products routinely employ contracts to control access to and use of data by end users, as part of their business model. Contractual terms are fashioned in such a way as to gain property-like protection over data. This raises, on the one hand, philosophical questions as to what it means to ‘own’ data, which looks at the issue from a propertisation point of view. There are, on the other hand, more practical questions as to whether and to what extent a contract can regulate access to and use of data, which looks at the issue from a freedom of contract perspective.

It is arguable that the regime of intellectual property that comes closest to protecting data is database protection, given that most raw data is arranged in some order or form to make sense of it. Database Directive 96/9/EC lays down the criteria for the protection of databases, and presumably not data itself. However, the recent decision of the Court of Justice of the European Union (CJEU) in Case C-30/14 Ryanair Ltd v PR Aviation BV illustrates how there may be a risk of data itself being protected. In this decision, the CJEU held that the maker of a database not protected under copyright or the sui generis regime can still protect it under national laws, and can also contractually regulate its use, subject to national laws. If the content of such a database is only data, and if the database is not protected under the Directive, what is left is merely a collection of data. If national laws are able to protect this, one wonders if it is tantamount to extending property-like protection to data itself.

How does propertisation of data take place? The maker of an unprotected database such as Ryanair employs a contractual term to deny the users the ability to copy the database, unless the maker grants a licence for such use. Essentially, the maker of the unprotected database excludes the users from using the database, unless the maker authorises it by way of a licence. The act of exclusion is how the contractual term enables the database maker to protect the database, and the ability to issue a licence is how the contractual term enables the database maker to regulate the use of the database. The right to exclude and the right to permit use are both attributes of ownership of property (Penner, 1996). Excluding the unprotected database from use is exactly what its maker cannot do, because it is in the public domain, and everyone has the liberty to use it. Granting a licence to use the unprotected database is also something that its maker cannot do, because by definition a licence is that which makes lawful what is unlawful without its grant (Thomas v. Sorrell). There is nothing in the use of an unprotected database that its maker can make lawful by the grant of a licence, when the use has always been lawful. Although by law the maker of an unprotected database cannot exclude or license, she tries to achieve these by way of a contract.

There are two issues here – first, whether data can be the subject matter of property at all; and secondly, whether a contract can be used to create property rights as detailed above. Exclusive rights in any material in the public domain cannot be created, as doing so reduces what scholars regard as the raw material for innovation (Litman, 1990). A contract cannot be used to create property rights because it is not the domain of a contract to create property rights (Penner, 1996). If it were, then there would be no sanctity to any of the property laws.

One view is that if the users waive their use privilege in the public domain in exchange for a systematic display of and access to the database, a contract can still be employed to create property rights. However, there are pitfalls in contractual regulation too (Elkin-Koren, 1997). Standard form browse-wrap contracts, which characterise the way businesses in data function, underscore the unequal bargaining power and lack of freedom the end users have in negotiating contractual terms. Given what the users are being asked to waive, the concern is whether standard form browse-wrap agreements allow room for them to understand, deliberate and choose what they want to give up and what they want in return. Website terms of use normally govern both protectable and unprotectable content. This raises the question whether, by simply browsing a website, the users are able to discern these differences and provide the required assent and waiver.

Despite all these challenges, the reality is that a business in data involves significant investment. Given that database protection is only extended to investment in creating databases (by obtaining, verifying and presenting its contents), many other forms of investment are excluded. A return on these other forms of investment appears to drive the economic justification, supporting both the propertisation of data and the freedom of the parties to contract whatever terms they choose in relation to data. The normative challenge is to determine the appropriate extent of contractual regulation, so that a balance is maintained between the free use of the public domain, freedom of contract and the ability of data businesses to recoup their investments.

TEXT AND DATA MINING RESTRICTIONS – CREATING OWNERSHIP OF FACTS?

by Aislinn O’Connell*

A.    Introduction

Text and Data Mining (TDM), also referred to as text mining, data mining, or text analytics, is a process used to analyse large amounts of text using a computer, to obtain insights which would not be possible with a single human researcher. Thus, TDM is a range of techniques, all aimed at increasing knowledge in a variety of research areas. The rise of TDM techniques is compounded by the rise of ‘Big Data’ – experts suggest that we have created more data in the years since 2010 than in the entirety of human history preceding that. With the advent of digital technology, consumers, researchers, sales people, and even computers create data about their actions. The development of smart systems which can extract new insights from that excess of data is one which could lead to swathes of new discoveries – it has already been used to find new uses for existing drugs, by mining the existing corpus of research for side effects and then using this for knowledge discovery.[1] TDM falls foul of copyright legislation in that it requires a copy to be made in order to mine content – this is permitted by a single person with a photocopier under pre-existing copyright legislation, but the large-scale copying of potentially thousands of articles by a machine was not covered by the exceptions enshrined in legislation.

TDM was considered as early as 2011, in the submissions to Professor Ian Hargreaves’ Digital Opportunity, which was prepared and submitted to government in only six weeks, and made a swathe of recommendations, including the implementation of a variety of exceptions to UK copyright law – including an exception for Text and Data Mining. Since the recommendation by Hargreaves, there has been much debate in the academic and professional worlds about the wisdom of implementing such an exception.

B.    The Right to Read is the right to mine

With the development of TDM, initiatives were established to consider ways to deal with the issues arising from this new research method. Among these, in 2012, was the working group for TDM which formed a part of the Licences for Europe initiative.[2] However, several participants of the group walked out before the completion of the process, citing their reasons in a letter addressed to the Commissioners of the relevant DGs. These included a concern that the outcome of the working group was predetermined – that the solution to TDM was further licensing.[3] Thus, on the back of this and other TDM movements, the viewpoint that TDM is a normal exploitation of a copyrighted work arose – succinctly coined by Peter Murray-Rust in the phrase ‘the right to read is the right to mine’.[4]

This movement argues that lawful access to a copyrighted work is permission enough to mine for content – pointing out that content mining does exactly what a human reader would do, by parsing relevant information out of the paper, but on a much grander scale – allowing for a faster, more efficient, and more innovative way of drawing out links between information, which might not be possible if a single human researcher had to digest and parse all of the information contained in that copyrighted material.

This argument relies on the use of content mining as a normal exploitation of a copyrighted work, the result of which does not compete with the original copyrighted work, and in which the need to copy full-text articles is merely an incidental part of the process, and not the primary aim of the research. It further argues that content mining extracts only the underlying facts of the creative content, and not the expression of the content. This, as any copyright scholar will know, is the essential distinction of copyright – facts themselves are not protected, only the expression thereof.[5] Thus, the only issue which faces a potential miner is the fact that most mining processes require a copy of a work to be made for analysis by a computer.

C.    Mining as a normal exploitation of a work

On the other hand, then, is the argument put forth by publishers – that data mining is a normal exploitation of a copyrighted work, and thus should be included under the list of uses which may be permitted or denied under copyright law. This was the case in the UK until June of 2014. Given that data mining generally involves copying full-text, it is easy to see the justification for this position, and why it was taken by most publishers. That is not to say that publishers automatically denied permission to data mine – as of 2011, 90% of research-focused mining permission requests were granted, on a case-by-case basis.[6]

Although individual requests to mine each particular content item were, in the majority of research cases, granted, such a requirement is arduous at best for content miners. For this reason, mining APIs (such as that of Elsevier) and large scale mining tools (such as CrossRef and PLSClear’s TDM engine) were developed, to enable miners to submit multiple requests for permissions with greater ease, rather than having to submit each individual request separately.

D.    The UK Governmental Position

In 2011, the Hargreaves review recommended that the UK government implement an exception to copyright for the purpose of data analysis. This was accepted by the government in late 2011, and came into effect in June of 2014, through The Copyright and Rights in Performances (Research, Education, Libraries and Archives) Regulations 2014.[7]

The Regulations state that the creation of copies of a work for the purpose of computational analysis does not violate copyright provided that the copy is made by someone with lawful access, for the purpose of non-commercial research.

While the regulation is clear that it is not a violation of copyright to copy content for the purposes of mining, it does not offer any value judgements on whether or not the denial of mining to commercial customers is a violation of freedom of information, or otherwise denies access to the underlying facts of a particular copyright work. While the position of the UK government in creating an exception for non-commercial research can be interpreted as supporting the theory that ‘the right to read is the right to mine’, the fact that the exception does not apply to commercial research could also support the attitude that mining is simply a normal use of content, which can be permitted or denied as the licensor wishes, and is permitted for non-commercial uses because the requirement to seek permission for every non-commercial mining use would be too onerous, leading to a market failure.

E.    Conclusion

Regardless of how one interprets the reasoning behind the UK TDM exception, the reality is that unless publishers allow access to works in order to mine for content, that content will then not be as desirable for researchers. Thus, an exception, or lack thereof, could become meaningless, in as much as licence agreements will automatically include permission to mine, as it becomes an accepted and expected use of copyright content.

However, the distinction between commercial and non-commercial research remains important, as content mining and analysis requires large-scale downloading of content, which can create significant server loads for publishers. Thus, infrastructures must be put in place to allow for such large-scale downloads by researchers. The inability to obtain any financial benefit for mining (through, for example, standard licence clauses) may disincentivise publishers from creating structures which would allow for such work, thus damaging the potential for all parties involved.

On the other hand, if mining is not allowed by a particular rightholder, or if a particular rightholder’s mining framework is unwieldy or difficult to use, then this will disincentivise authors and researchers from using these works or publishing with that company in the future. A combination of these two things would lead to publishers suffering through lack of content or use of content, and ultimately would do damage to the business model that the rightholder may have been trying to protect.

* BCL (NUI), Master I (Lyon), LLM (NUI). PhD Candidate in the Department of Information Studies, University College London.

[1] Liber, ‘Text and Data Mining: The Need for a Change in Europe’ (November 2014) 1 <http://libereurope.eu/wp-content/uploads/2014/11/Liber-TDM-Factsheet-v2.pdf> accessed 21 March 2015.

[2] “Licences for Europe – A Stakeholder Dialogue” Working Group 4: Text and Data Mining.

[3] Letter from participants in response to “Licences for Europe – A Stakeholder Dialogue” text and data mining for scientific research purposes workshop. http://libereurope.eu/blog/2013/02/26/licences-for-europe-a-stakeholder-dialogue-text-and-data-mining-for-scientific-research-purposes-working-group/

[4] https://blogs.ch.cam.ac.uk/pmr/2012/05/31/the-right-to-read-is-the-right-to-mine/

[5] http://www.lib.umich.edu/copyright/facts-and-data

[6] Eefke Smit and Maurits van der Graaf, ‘Journal Article Mining, A research study into Practices, Policies, Plans … and Promises.’ (2011 BV Bronfonteyn).

[7] SI No 1372 of 2014.

Data Mining: A Potential Breakthrough Technology Impeded by Copyright and Sui Generis Database Rights? If so, what solutions are there?

by Christian Geib

I. What is data mining?
Data mining denotes an automatic or semi-automatic process of analysis of large quantities of data in order to discover patterns and rules (Fayyad, p. 28). The process of data mining allows researchers to extract explicit and implicit information from data.

II. Why is data mining prima facie infringing?
Data mining involves scanning data, often in the form of expressive content such as scientific journal articles, and placing it into repositories. During this process at least one copy is made. This, if not permitted by the author or publisher, is prima facie an infringement of copyright. The threat of infringement could impede the adoption of this beneficial technology.

However, infringement would depend on the content being protectable.
Data mining uses both explicit and implicit data. Data is explicit when its information is directly stated in natural language, as in scientific journal articles and their abstracts, which are both copyright protected. Data is implicit where the information needs to be ‘uncovered’ by applying inference and connectionist rules. While explicit data can contain copyright-protectable expressive content, implicit data rather includes unprotectable facts. However, while implicit data as such is not protectable, it can be contained in databases that themselves could be protectable, if deemed sufficiently ‘original’. Even those databases not arranged in an original manner can be protected in the EU and the UK by so-called sui generis database rights, as long as there has been ‘a substantial investment in obtaining, verifying or presenting the contents’. Therefore, mining journal articles and extracting content from scientific databases can be prima facie infringing.

III. Are present exceptions applicable to data mining?
If no permission can be obtained from the rightholders, data miners would need to rely on exceptions. In the UK the available copyright exceptions are the temporary copying exception of s. 28A CDPA 1988, the fair dealing exception of s. 29 CDPA 1988 and the new text and data analysis exception of s. 29A CDPA 1988.

Each of these exceptions has its limitations when being applied to data mining.
The temporary copying exception was originally intended to legally enable transient copying in the context of browsing and caching in random access memory (RAM) in order to make non-infringing internet use possible in the first place. However, the copies usually made in the context of data mining mostly do not appear to be sufficiently ‘temporary’ for this exception to apply. Most copies made for data mining operations appear to be deposited into repositories for further analysis and do not appear to be automatically deleted as in the case of RAM copies (cp. the CJEU’s Infopaq and Meltwater decisions or the US Ticketmaster case).

The fair dealing exception involves the limitation to ‘non-commercial’ research. This is problematic even in a purely academic context. It is unclear whether copying for a textbook by a professor would count as commercial, as this might involve receiving royalties. The same would apply to the research of a PhD student who is sponsored by a private enterprise. The scope of the new text and data analysis exception is limited by the same ‘non-commercial-ness’ requirement as the fair dealing exception. It is even further limited by the requirement of ‘lawful access’. This necessitates, for instance, that the data miner’s institution has lawfully subscribed to all the journals data mined for a pharmaceutical study. Given that this often involves thousands of journal articles, it can be challenging even for well-funded universities to subscribe to every journal in one field, let alone various fields. Furthermore, this requirement can be problematic even where the university has subscribed to all the journals being data mined: some scientific journal publishers require extra permissions/licences to data mine their journals. Although the new exception in s. 29A (5) tries to prevent such additional permissions, they could easily be enforced by technological protection measures (TPM). According to Art. 6 (1) of the Information Society Directive (InfoSoc) it is illegal to circumvent such TPM even if it is to enable access that is lawful under national copyright exceptions.

Apart from the copyright exceptions there are also sui generis database right exceptions. Sui generis database rights in the EU protect databases that do not possess the required degree of originality to qualify for copyright protection. One conceivable exception for data mining is that of Reg. 19(2) Copyright and Rights in Databases Regulations 1997, which renders the extraction of only insubstantial parts of a database non-infringing. However, given that data mining tends to involve the copying of entire databases or entire journal articles, extracting only insubstantial parts appears to be of limited practical relevance for most data mining operations.

The available exceptions thus appear to be too limited for most data mining uses, and it would therefore appear that most data mining operations under the current legal situation face the threat of being infringing.

IV. Conclusions: possible solutions

As stated by the Hargreaves Report, the UK should lobby at EU level for a commercial data mining exception, as the InfoSoc demands ‘non-commercial’ copying exceptions. This could address the vagueness that the present ‘non-commercial-ness’ requirement brings with it.

Furthermore, the UK should lobby at EU level for the issue of TPM to be reviewed, so that it would no longer be possible for content owners to undermine existing and lawful national copyright exceptions.

In order to ‘future-proof’ copyright legislation for the emergence of new unforeseen technologies such as data mining, the introduction of a broad copyright exception such as the US Fair Use exception should be considered. Not just since the recent Google Books decision has Fair Use proven its great flexibility in accommodating new technologies by finding for instance ‘transformative use’. Furthermore, in response to some European skeptics, Beebe and Samuelson have demonstrated that Fair Use can be highly predictable if the case law is grouped into relevant policy clusters.