Text and data mining (TDM) has emerged as a powerful technique for extracting valuable insights from large datasets, particularly in fields such as research, healthcare, and marketing. However, as the capabilities of TDM continue to expand, it is essential to consider the legal frameworks that govern its application. In Poland, this involves a complex interplay of national legislation and European Union directives, particularly regarding intellectual property rights, data protection, and exceptions for research.
Intellectual property rights in Poland are primarily governed by the Act of 4 February 1994 on Copyright and Related Rights (Copyright Act). This legislation establishes the rights of authors and creators over their works, which can include text, databases, and other forms of content. Under Polish law, the use of copyrighted material without permission is generally prohibited, which can pose challenges for TDM practitioners. However, on September 20, 2024, an amendment to the Copyright Act implementing the Directive on Copyright and Related Rights in the Digital Single Market went into effect with a three-year delay. One area of change is permitted use in TDM.
Generally, permitted use, as defined in the Copyright Act, is a certain restriction on the monopoly of rights held by the creator of a work. This restriction allows for the use of the work without the permission of the right holder, justified either by public interest (permitted public use) or by the individual interests of users (permitted private use). Permitted private use allows an already distributed work to be used for free within the scope of one’s own personal use without the author’s permission. Permitted public use, on the other hand, encompasses various categories of restrictions, such as the right to quote, the citation of works in news programs, the use of works for teaching or scientific purposes, and the permitted use granted to libraries, archives, and schools. TDM, as introduced into Polish law, is a new form of permitted use.
TDM is the analysis of texts and data in digital form, carried out exclusively by automated techniques, in order to generate specific information, including patterns, trends, and correlations. In Poland, permitted use for TDM purposes takes two forms. The first applies to cultural heritage institutions and entities such as universities. The second allows any already distributed work (e.g., texts on news portals) to be reproduced for TDM purposes, regardless of the type of work or of who the authorized entity is. There are, however, two restrictions on this second form. First, any entity that holds the economic copyright to a work may express a reservation against TDM (opt-out). Second, reproduced works may not be kept indefinitely but only for as long as necessary to achieve the TDM purpose, which in any case covers the period until their analysis is complete.
Currently, there are no specific recommendations or industry standards for opt-outs. Polish law requires only that such a reservation be explicit, appropriate (in relation to how the work itself is made available), and expressed in a machine-readable format with metadata. All of these conditions must be met jointly. The law gives no further detail for works made available to the public in such a way that anyone can access them at a time and place of their choice.
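For practitioners building mining pipelines, the conditions described above can be read as a simple decision rule. The sketch below is a minimal illustration of that reading for the second, general form of permitted use only; it is not a statement of the law. The class and function names (`OptOut`, `Work`, `may_reproduce_for_tdm`, `may_keep_copy`) are hypothetical, and how each flag would be established in practice is left open, since no standard for opt-outs currently exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OptOut:
    """Hypothetical record of a right holder's reservation against TDM."""
    explicit: bool          # the reservation is stated expressly
    appropriate: bool       # its form suits how the work is made available
    machine_readable: bool  # machine-readable format, including metadata

    def is_effective(self) -> bool:
        # The conditions apply jointly, so every one of them must hold.
        return self.explicit and self.appropriate and self.machine_readable


@dataclass(frozen=True)
class Work:
    """Hypothetical description of a work considered for mining."""
    already_distributed: bool
    opt_out: Optional[OptOut] = None


def may_reproduce_for_tdm(work: Work) -> bool:
    """Assumed reading: a distributed work with no effective opt-out may be reproduced."""
    if not work.already_distributed:
        return False
    return work.opt_out is None or not work.opt_out.is_effective()


def may_keep_copy(tdm_purpose_achieved: bool) -> bool:
    """Copies may be kept only for as long as the TDM purpose requires."""
    return not tdm_purpose_achieved
```

As a usage example, a news article carrying a reservation that is explicit and appropriate but not machine-readable would still pass `may_reproduce_for_tdm` under this reading, because the conditions for an effective opt-out are cumulative.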
Strictly speaking, a manufacturer of commercial artificial intelligence systems that wishes to extract data from the internet to build its knowledge base must follow these rules, and data for which an opt-out has been reserved cannot be used. Similarly, the AI Act stipulates that any use of copyright-protected content requires authorization from the right holder unless relevant copyright exceptions and limitations apply. In summary, producers of generative AI must comply with the opt-out restrictions as imposed by the respective Member State.
TDM presents significant opportunities for innovation and research across various fields. However, navigating the legal landscape in Poland requires a careful understanding of intellectual property rights, database protections, and data protection regulations. While there are exceptions that can facilitate TDM for research and educational purposes, practitioners must remain vigilant in ensuring compliance with relevant laws.
As the landscape of TDM continues to evolve, ongoing dialogue among policymakers, legal experts, and practitioners is essential. This will ensure that the legal framework remains conducive to innovation while protecting the rights and interests of authors, database creators, and individuals whose data may be involved in mining activities. Ultimately, a balanced approach can promote the responsible use of TDM, fostering advancements that benefit society as a whole.
By Daria Rutecka, Partner, Schoenherr
This article was originally published in Issue 11.11 of the CEE Legal Matters Magazine.