ISSN 0249-6399   ISRN INRIA/RR--9287--FR+ENG

SoK: Three Facets of Privacy Policies
Victor Morel, Raúl Pardo
Project-Teams Privatics
Research Report n° 9287, September 2020, 34 pages
Abstract: Privacy policies are the main way to obtain information related to personal data collection and processing. Originally, privacy policies were presented as textual documents. However, the unsuitability of this format for the needs of today's society gave birth to other means of expression. In this paper, we systematically study the different means of expression of privacy policies. In doing so, we have explored the three main categories, which we call facets, i.e., natural language, graphical and machine-readable privacy policies. Each of these facets focuses on the particular needs of the communities they come from, i.e., law experts, organizations and privacy advocates, and academics, respectively. We then analyze the benefits and limitations of each facet, and explain why solutions based on a single facet do not cover the needs of other communities. Finally, we set guidelines and discuss challenges of an approach to expressing privacy policies which brings together the benefits of each facet as an attempt to overcome their limitations.
Keywords: privacy policies, legal compliance, usability, enforcement
SoK : Trois facettes des politiques de protection de vie privée

Résumé: Privacy policies are the main way to obtain information related to the collection and processing of personal data. These policies were originally presented as textual documents. However, the unsuitability of this format for the needs of today's society gave rise to other means of expression. In this report, we systematically study the different means of expression of privacy policies. In doing so, we identify three main categories, which we call facets, i.e., natural language policies, graphical representations of policies, and machine-readable policies. Each of these facets focuses on the specific needs of the community it comes from, i.e., law experts, organizations and privacy advocates, and academics, respectively. We then analyze the benefits and limitations of each facet, and explain why solutions based on a single facet do not cover the needs of the other communities. Finally, we propose a new approach to expressing privacy policies which brings together the benefits of each facet, with the aim of overcoming their limitations.
Mots-clés: privacy policies, legal compliance, usability, enforcement
1 Introduction
As of today, the main way to obtain information related to personal data collection and pro-
cessing is through privacy policies. Privacy policies are typically presented as textual documents
describing details such as data collection, processing, disclosure and management. Organizations
collecting personal data (in what follows data controllers, or DC) commonly use privacy policies
to inform individuals (in what follows data subjects, or DS) about how personal data is handled.
DS are often required to read these policies — even though it rarely occurs [57] — or are at least
presumed to do so and to decide whether they accept the conditions. Alternatively, giving DS
the possibility of describing their own privacy policies has recently gained in popularity. This
approach gives DS the time to reflect on their choices, and the possibility to consult experts and peers. Nonetheless, privacy policies are hard to understand [21], for DS [68] as well as for experts [89].
We use DS policies to refer to the privacy policies of individuals, and DC policies to denote the
privacy policies of organizations collecting personal data.
Requirements and recommendations to express privacy policies come from different sources
such as privacy regulations, authorities and organizations. For instance, in Europe, the General
Data Protection Regulation (GDPR) [39] requires more transparency for data processing from
DC, and guidelines have been issued by the WP29¹ to present their expectations [111]. These
requirements are necessary for privacy policies to be compliant with the legislation. Recom-
mendations for drafting policies have also been made by different organizations to improve their
readability. For example, the National Telecommunications and Information Administration did so for mobile apps [72], and the WP29 for IoT devices [110]. Furthermore, Data Protection Authorities
(DPAs) should be able to audit data processing systems, to ensure their compliance with the
law and with the declared privacy policies.² All these requirements and recommendations can be summarized in three requirements:
• Privacy policies must be legally valid.
• Privacy policies must be understandable by all parties.
• Privacy policies must be enforceable and auditable in data processing systems.
Existing methods to express privacy policies address some of these requirements, but not all
of them. Different methods have arisen from different needs, and they target different audiences
from experts to lay-users. For instance, legal privacy policies are often written as long and complex documents which are necessary in court, but are not easy to understand for lay-
users. As an attempt to simplify these legal documents, organizations work on summarized
versions of privacy policies or use visual aids to help users understand the risks of having their
data collected. Ensuring that data is processed according to the requirements in privacy policies
is not an easy task either. Some work — coming mostly from academia — proposes an alternative
format for privacy policies that can be read by computers. These solutions aim at bridging the
gap between the textual legal requirements and their enforcement in the underlying system.
Furthermore, some of these proposals are equipped with auditing tools which facilitate, for
DPAs or DS, verifying that no violations of a privacy policy have occurred. Unfortunately, these
solutions are not widely used.
In this work, we analyze the state-of-the-art methods for expressing privacy policies. We
provide a comprehensive picture of existing proposals in order to identify gaps and challenges.
¹ WP29 stands for Working Party 29, a European advisory board.
² For instance, see the decision of the “Commission Nationale de l’Informatique et des Libertés” (CNIL), the French DPA, against Google LLC [25].
In doing so, we have studied the three main ways to express privacy policies: natural language, graphical and machine-readable, which we call the facets of privacy policies. Each of these facets has mostly arisen from a different community. Natural language comes from law experts, graphical from organizations and privacy advocates, and machine-readable from academics. Consequently, the content of this paper contextualizes knowledge often studied within different communities that have been working on similar issues, but with different objectives. Unsurprisingly, each facet mainly provides benefits to the specific community it was defined in. Therefore, we take the insights of our study to explore how the different facets of privacy policies can complement each other. This synergy combines the benefits of each facet and minimizes their limitations. We discuss guidelines and challenges of combining all facets in a single policy, which we denote multi-faceted privacy policies, and study recent efforts in this direction.
More concretely, in bringing the above ideas to the forefront, our contributions are:
1. A categorization of existing work in each facet according to a privacy taxonomy (introduced
in Section 2), and the specific features of each facet.
2. An in-depth study of the existing facets of privacy policies: i) privacy policies expressed
in natural language (Section 3), ii) graphical privacy policies (Section 4), and iii) machine-
readable privacy policies (Section 5). For each facet, we provide an overview of: its content;
the available tools; its benefits; and its limitations.
3. Insights from the study on future research on designing privacy policies (Section 6).
Section 7 discusses related work and concludes the paper.
2 Systematization Methodology
For each facet, we study the content of the privacy policies, the available tools, benefits and
limitations. The literature on privacy policies across the different facets is vast. An exhaustive analysis of all works is infeasible and undesirable. Instead, we focus on highlighting representative work; hence we do not provide an all-encompassing reference to every related work.
We categorize the content in each facet according to a taxonomy of privacy policies. Several
taxonomies have been proposed to categorize the content of privacy policies [96, 109, 77]. We
use a slight variation of [109]. This taxonomy is appropriate for our purposes for two main
reasons: 1) it was devised for a wide range of privacy policies and therefore reflects their content
across facets; and 2) it encompasses most requirements of the current legislations and guidelines,
such as the GDPR, the Fair Information Practice Principles (FIPPs) [40] in some cases, and the
California Consumer Privacy Act of 2018 (CCPA) [97].
Taxonomy
Wilson et al. [109] proposed a taxonomy for privacy policies composed of the following items:³ “First Party Collection: How and why a service provider collects user information”, “Third Party Collection: How user information may be shared with or collected by third parties”, “Access, Edit, Delete: If and how users may access, edit, or delete their information”, “Data Retention: How long user information is stored”, “Data Security: How user information is protected”, “Specific Audiences: Practices that pertain only to a specific group of users (e.g., children, Europeans, or California residents)”, “Do-Not-Track: If and how Do Not Track signals for online tracking and advertising are honored”, “Policy Change: If and how users will be informed about changes to the privacy policy”, “Other: Additional sub-labels for introductory or general text, contact information, and practices not covered by the other categories”, and “Choice Control: Choices and control options available to users”.
³ We use item to denote a piece of information provided in a privacy policy.
Our variation of Wilson et al.’s taxonomy accommodates it to the purposes of our study, but does not change the content. Concretely: i) we use DS rights to denote Access, Edit, Delete and Choice Control, as they relate to DS rights in the sense of the GDPR (see Chapter III of the GDPR); ii) we subsume Specific Audiences and Do-Not-Track under Other, as they are occasional items; and iii) we observe that a legal requirement is missing from the taxonomy, even though it is often found in natural language privacy policies: the legal basis of processing; we therefore
and Legal basis) and practices (for Other). Table 1 summarizes the taxonomy. Section 3.1
provides a detailed explanation of each taxonomy item together with illustrative examples.
| Taxonomy item | Description | GDPR | FIPPs | CCPA | HIPAA (a) | COPPA (b) |
|---|---|---|---|---|---|---|
| First Party collection | Type of data collected, purpose and collection mode. | ● | ● | ◐ | ● | ● |
| Third Party collection | Type of data collected, purpose and collection mode for third parties. | ● | ● | ◐ | ◐ | ● |
| Legal basis | Ground on which the lawfulness of processing is determined. | ● | ◐ | ○ | ○ | ○ |
| DS rights | Rights of the DS, e.g., right to access, to rectify, to port or to erase. | ● | ○ | ◐ | ● | ● |
| Data Retention | Duration of data storage. | ● | ○ | ○ | ○ | ◐ |
| Data Security | Modalities of protection of data, e.g., encrypted communication and storage. | ◐ | ● | ○ | ● | ◐ |
| Policy Change | Modalities of notification for policy changes. | ◐ | ○ | ○ | ○ | ○ |
| Other | Other items such as identity of DC, information related to Do-Not-Track, to children, etc. | ●/◐ | ●/◐ | ●/○ | ◐ | ● |

Table 1: Summary of our taxonomy with the legal requirements of items. We use ● to denote Required explicitly; ◐ to denote Addressed but not required; and ○ to denote Absent. The mark (a) means that HIPAA only considers health data. The mark (b) means that COPPA only considers personal information from children, and notice must be addressed to parents.
3 Natural language privacy policies
Most legislations require notices expressed in natural language to inform DS about the collection
and processing of their personal data: the use of natural language is necessary to ensure that the
policy has legal value (e.g., [39, Art. 13 & 14]). The ways these documents can be authored (i.e., drafted automatically or written manually) and the manners to assist their authoring can
vary greatly. We present the content expressed by natural language privacy policies in Section
3.1, the tools used to assist their authoring and to analyze existing natural language privacy
policies in Section 3.2, the benefits in Section 3.3, and the limitations in Section 3.4.
3.1 Content
Natural language privacy policies are familiar to the public as they have been adopted by a
large range of online services such as social networks, file hosting services, mobile applications,
etc. Because these privacy policies are expressed in natural language, they are not restricted in
terms of content; they cover all the items of our taxonomy (see Section 2). In the following we
examine the taxonomy items in detail, and discuss what legal requirements appear explicitly in
the GDPR, the FIPPs, or in the CCPA.
We focus on the requirements of the GDPR, the FIPPs, and the CCPA as they are the three
main texts (legislations or guidelines) that determine the content required when informing DS
of data collection and processing. The GDPR is the text regulating personal data collection
and processing in the EU, and many countries consider it since it has an extraterritorial scope.
The FIPPs are guidelines designed by the United States Federal Trade Commission (FTC). They represent common principles regarding fair information practices. However, they have been criticised by scholars [22], and have not been updated in decades. The CCPA is more recent (2018); it is a privacy-enhancing and consumer-protection bill for Californians. It has been deemed to provide “the strongest privacy controls of any state in the U.S” [2]. We also consider other widely-known regulations, such as HIPAA for health data [104, Notice and Other Individual Rights], or COPPA for children [41, § 312.4].
First Party Collection The most common item in natural language privacy policies is the
first party collection, which describes what data is collected, why it is collected, and sometimes
how. The type of data ranges from generic to more precise assertions, e.g., respectively “we collect your data” and “your email address is collected”. Cookies [31] often have a distinct treatment, most
likely because they are often collected by websites: it is common to find a dedicated paragraph
for their management. Location information is often treated in a separate section as well because
it can be collected from different sources — mobile applications, web browsers — or inferred from
metadata such as IP addresses. The purpose of processing often goes hand in hand with the
type of data collected. It is also possible to find the collection mode in some natural language
privacy policies: whether the data is collected automatically, by manual input of DS, or by any
other means.
Informing about the type of data, the purpose of processing, the recipients and the means of
collection is required by the GDPR and the FIPPs. The CCPA gives the right to request first
party collection, but does not require it. HIPAA requires informing of “the ways in which the covered entity may use and disclose protected health information”. COPPA requires informing of “what use, if any, the operator will make of the personal information collected”.
Third Party Collection Third Party Collection is a common item in natural language privacy
policies, and it is therefore usual to find the third parties to whom data will be transferred: they
can be advertisers, or other business partners. The notion of sharing can also refer to other DS
and subsidiary companies. It is usually composed of the same content as First Party Collection.
Informing about third party collection is required by the GDPR, the FIPPs, and COPPA. The CCPA gives the right to request categories of third parties, but does not require it, except if data is sold to those third parties. HIPAA considers it implicitly (see [104, Who is Covered by the Privacy Rule]).
Legal basis Legal basis (or legal ground) is regularly found as a complement to the purpose of processing.⁴ A common legal basis for processing is consent, which consists, for DC, in retrieving an authorization from DS to legally collect their data. Consent has to be informed and specific under the GDPR [39, Recital 32], and it is still often used as a legal basis. DC might
consider the reading of their natural language privacy policies as a proper consent, without
questioning the conditions to obtain consent [43]. Other legal bases can be found in natural
language privacy policies, such as the necessity for the performance of a contract, compliance
with legal obligations, protection of DS’s vital interests or public interest, and the legitimate
interests of a DC. These legal bases are listed in the GDPR [39, Art. 6], and major stakeholders generally invoke either all of them cumulatively, or a large subset.
⁴ Art. 13(1)(c) of the GDPR requires “The purposes of the processing for which the personal data are intended as well as the legal basis for the processing.” (Emphasis from the authors.)
Informing about the legal basis is required by the GDPR, and not explicitly by the FIPPs, which require informing “whether the provision of the requested data is voluntary or required, and the consequences of a refusal to provide the requested information”. Legal basis information is not required by the CCPA, HIPAA, or COPPA.
DS Rights DS can exercise rights regarding their data, and natural language privacy poli-
cies now often mention the rights to access, rectify, port and erase data, likely due to the influence
of the GDPR. DS rights can be seen more restrictively as possibilities to opt in or opt out. Thus, natural language privacy policies present how to subscribe to or unsubscribe from specific services.
Informing about DS rights is required by the GDPR, HIPAA and COPPA, but not by the
FIPPs. In CCPA, only the right to opt-out is explicit (other rights are ensured but Californians
do not have to be explicitly informed of them in a privacy policy).
Data Retention Natural language privacy policies often describe the period during which
personal data will be stored. It can be a fixed value, e.g., “30 days after data collection”, or variable, e.g., “as long as your account is active”. It often comes with the type of data, the purpose, and the legal basis of processing.
Informing about the retention time is required by the GDPR. COPPA addresses it but does not make it mandatory to inform about it. The FIPPs, the CCPA, and HIPAA do not require it.
Data Security DC regularly explain in their policies how data is stored, and whether its communication is secured or its storage encrypted.
Informing about the security of data is required by the FIPPs, but not by the GDPR although
it mentions that “Personal data should be processed in a manner that ensures appropriate se-
curity and confidentiality of the personal data” [39, Recital 39]. Similarly, COPPA states that
“The operator must establish and maintain reasonable procedures to protect the confidentiality, security, and integrity of personal information collected from children.” HIPAA requires informing of the “entity’s duties to protect privacy”. The CCPA does not require informing of this item.
Policy Changes The modalities of notification in the case of a change to the privacy policy can also be observed. Notification is usually by email or within the service’s interface; in some cases, notifications are sent by regular mail or by phone. This item is not required by the GDPR, the FIPPs, the CCPA, HIPAA, or COPPA.
Other We subsume the identity and contact of DC, requirements towards specific audiences
such as children, and Do-Not-Track under this item. The DC usually provides its identity, as well as its contact details in case the DS has to lodge a complaint. In many legislations [39, Recitals
38 & 58] [41] children have specific considerations. As a result, many natural language privacy
policies include a dedicated section, even if it is only to mention that personal data of children
under thirteen is not collected without parental consent.
The identity of DC is required by the GDPR, the FIPPs, and HIPAA; Do-Not-Track information is not. As for requirements for children, they are specifically addressed by these legislations. The CCPA more specifically requires to “Make available to consumers two or more designated methods for submitting requests for information”, and does not address data collected from children.
COPPA requires informing of the identity and contact of DC, and targets data collection related
to children.
3.2 Tools
Several solutions exist to assist in the authoring of natural language privacy policies, offering
different levels of automation. We denote these tools authoring tools. We distinguish templates,
generators, and retrievers. Authoring tools are often tailored to websites and mobile app owners,
and are constrained in terms of content. Furthermore, there exist analysis tools: software able
to parse and analyze natural language privacy policies, in order to produce a machine-readable
or a graphical version of a policy.
Templates and generators Tools such as Docracy [35], Termsfeed [102], SEQ Legal [100],
and 3DCart [3] provide a fill-in-the-gap form, where the author writes appropriate terms in
the fields. We refer to them as templates. These tools guide the authoring process, but do
not provide additional functional such as validation of the input data. It is not possible, for
instance, to check if the email of a service owner is valid. Generators comprise more complex
tools where authors input data only once regarding relevant fields of a privacy policy, and the
tool automatically generate the privacy policy. Most generators do not allow incorrect data:
email addresses without @ are highlighted, and the author cannot go further in the process and
generate the policy. Generators also give the option of expressing the same policy according
to different legal contexts. For instance, privacypolicies.com offers clauses specific to the
GDPR or COPPA. A set of what we denote light generators have a restricted set of parameters,
while another composed of more detailed generators offer a choice between a policy tailored
to websites or mobile applications, with a more exhaustive list of items [62]. Privacy Policy
Generator [82], Privacy Policy Online [32], or GetTerms [46] are examples of the former set, and
PrivacyPolicies.com [84] and FreePrivacyPolicy.com [44] of the latter.
Retrievers Retrievers, such as those offered by Miao [70], Apolinarski et al. [10] and Yu
et al. [113], automatically extract relevant information from the code of mobile applications using
static code analysis or user behavior analysis. For instance, in [10] the authors analyze sharing
behavior when using online collaboration tools. These prototypes work on Android applications,
in which personal data management is structured around the concept of permissions [47]. These
permissions define the type of data accessible by an application, such as contacts, content of text
messages, or Wi-Fi management. A retriever analyses those permissions, and interprets them
according to well-defined rules to author a natural language privacy policy. In that case, an
author does not necessarily have to input any information in addition to the code: the retriever
can parse the name of the DC, the permissions requested by a service, or the third-party libraries,
and can convert this information into natural language. However, retrievers are often tailored to
a specific solution — mobile applications in most cases — and could thus be difficult to implement
in other ecosystems. Furthermore, they cannot automatically retrieve certain information, such
as the purpose of collection or the retention time.
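To give a rough idea of this rule-based interpretation, the sketch below maps Android permission identifiers to policy sentences. The permission names are real Android constants, but the rule table, the wording, and the function are our own illustrative assumptions, not the mechanism of the cited retrievers.

```python
# Illustrative sketch of a retriever step: mapping Android permissions
# (as declared in an app's manifest) to policy sentences. The rule table
# and wording are illustrative assumptions, not those of [70, 10, 113].

PERMISSION_RULES = {
    "android.permission.READ_CONTACTS": "This application can read your contacts.",
    "android.permission.ACCESS_FINE_LOCATION": "This application can access your precise location.",
    "android.permission.READ_SMS": "This application can read your text messages.",
}

def draft_policy(dc_name: str, permissions: list[str]) -> str:
    """Draft a natural language policy fragment from declared permissions."""
    sentences = [f"{dc_name} collects the following data:"]
    for perm in permissions:
        # Unknown permissions are flagged for manual review rather than guessed.
        sentences.append(PERMISSION_RULES.get(perm, f"[review manually: {perm}]"))
    return "\n".join(sentences)

print(draft_policy("ExampleApp Inc.", ["android.permission.READ_CONTACTS"]))
```

As noted above, items such as the purpose of collection or the retention time cannot be derived from such rules and would still require manual input.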
Analysis tools Analysis tools have been developed for over a decade. They focused on using
Natural Language Processing or Information Extraction [26] to parse natural language privacy
policies. Early work was conducted by Brodie et al. [19]. The Usable Privacy Project, led by Sadeh [93], further investigated the automated classification of privacy policies. Within this project, Ammar et al. [8] conducted a pilot study for automatic text categorization. The Usable Privacy Project also developed a website privacy policy corpus [109], which was later notably used by Polisis [50]. Zimmeck et al. [116] devised a hybrid solution combining machine
learning classifiers (association rules) with crowdsourcing. Analysis tools achieve an accuracy
averaging around 80%, which has not significantly improved since the first attempts.
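To make the classification step concrete, here is a minimal sketch in the spirit of these tools, using an off-the-shelf text classifier (scikit-learn) to label policy paragraphs with taxonomy items. The four-example training set is obviously a toy of our own making; real systems such as [109, 116] rely on large annotated corpora and richer features.

```python
# Toy sketch of supervised paragraph classification for privacy policies.
# Requires scikit-learn. Real analysis tools train on large annotated
# corpora; this miniature example only shows the overall pipeline shape.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

paragraphs = [
    "We collect your email address and usage data to provide the service.",
    "Your data may be shared with advertising partners and affiliates.",
    "We retain your information for 30 days after account deletion.",
    "You may access, rectify, or erase your personal data at any time.",
]
labels = ["first_party", "third_party", "data_retention", "ds_rights"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(paragraphs, labels)

print(classifier.predict(["We share collected data with our business partners."]))
```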
The content produced by most authoring tools does not have legal value. Those tools do not provide legal advice, but rather general guidelines for policy authoring. These guidelines may be sufficient, but their legal validity is not guaranteed and should be verified by a lawyer. As an example, Iubenda advertises its “Attorney-level compliance” [54], but advocates a professional legal consultancy [55], and does not guarantee conformity with the law, which it claims “only a lawyer can do”. In other words: DC are responsible for compliance with the
law. They have to ensure that their privacy policies address all legal requirements, and to enforce
the claims made in their policies.
3.3 Benefits
Legal value Natural language privacy policies are the only type of privacy policies with legal
value, as it is the standard format for legal texts.⁵ Most legislations require DC to provide a lawful
statement detailing the processing of personal data, and natural language privacy policies often
aim to fulfill this obligation. Lawyers rely on natural language to evaluate whether privacy
policies are correctly drafted: these policies contain all details to determine whether there has
been a violation. Also, lawyers use these policies to check compliance with data protection
regulations. However, a document holding legal value is not necessarily compliant with the law.
For instance, lawyers or DPAs may check that all items required by the legislation are provided
to DS, and auditors can check that data processing is performed according to the policy. Legal
compliance is twofold: with respect to the information requirements, and with respect to the
actual processing.
3.4 Limitations
Ambiguity Natural language privacy policies can be ambiguous [88], as they may be inter-
preted in different ways. Reidenberg et al. [88, 87] presented privacy policies to privacy experts,
law and policy researchers, who were ultimately unable to agree on some aspects of the policies.
They proposed a crowd-sourcing annotation to tackle this issue, but admit that it would only
provide a partial solution. This ambiguity is mainly due to the fact that a statement in natural
language can be interpreted in different ways. Ambiguity has a direct impact on the understand-
ing, the enforcement, and the auditability of privacy policies. Ambiguity also explains the inaccuracy of the analysis tools described in Section 3.2.
Understanding McDonald and Cranor [68] showed that it would take 200 hours a year for an
average US citizen to read all the natural language privacy policies of the online services she used.
This is clearly impractical, and assuming that DS read privacy policies before using a service is a fictio juris. Moreover, nowadays it seems highly inconvenient to spend a significant amount of time reading a policy before using an online service.
Enforcement & auditability Because they are currently ambiguous, natural language pri-
vacy policies are difficult to enforce: natural language lacks precise semantics, making it difficult
to decide how data must be processed by the underlying system. Likewise, natural language privacy policies can also be hurdles to auditing: it can be difficult for an independent authority to compare stated and actual processing.
⁵ This does not mean, however, that other facets cannot have legal value, but rather that natural language privacy policies are mandatory.
Summary
Natural language privacy policies are the most used medium to express privacy policies, and the
tools used to assist their production can be categorized into templates, generators, and retriev-
ers. These tools are often tailored to specific solutions, such as websites or mobile applications, therefore restricting their scope. Analysis tools are not accurate enough to be trusted blindly. Natural language privacy policies are necessary for legal compliance, but suffer in practice from ambiguity and poor understandability.
4 Graphical privacy policies
In the previous section, we analyzed natural language privacy policies. They are necessary for
legal compliance, but they can mislead DS when they attempt to read them, as they are often
difficult to understand. As a consequence, other representations focused on DS understanding
have been devised: privacy policies can also be expressed with graphical (in the broad sense) rep-
resentations, that we denote graphical privacy policies. Graphical privacy policies cover icons sets
and standardized notices as well as solutions providing additional information, such as warnings
or judgments, sometimes combined with simple text [91]. Note that graphical privacy policies are
rarely specifications, rather additional representations of natural language privacy policies. For
instance, Android Permissions [47] represent privacy policies combining icons and text according
to XML specifications. Graphical privacy policies often come from privacy advocates, but this is
not only the case, notably since the WP29 explicitly mentioned icons as appropriate to convey
privacy notices in their guidelines for transparency [111]. Privacy notices are means to inform
DS, and are often understood as what we denote graphical privacy policies. Therefore, we include
them in our study. We categorize each work based on: i) the elements in the taxonomy presented
in Section 3 that it captures; ii) its features, whether it is made of icons, complementary text,
or variants of those; and iii) the intended audience of the language, e.g., DS or DC. Table 2 in
Section 4.4 summarizes our study.
4.1 Content
Based on their content, graphical privacy policies can be divided into three main types: icons,
standardized notices, and rating solutions. This distinction emerged from an empirical analysis
of the solutions studied, combined with impactful insights such as [28]. Graphical privacy poli-
cies based on icons intend to express the content of privacy policies, for DC and DS policies.
Some of these icons try to cover all the items of the taxonomy introduced in Section 3. Other
graphical privacy policies aim to express the same content as natural language privacy policies,
but in a standardized and often comparable manner. Some graphical privacy policies provide
rating information concerning certain aspects of privacy policies, such as the transparency level or potential risks. These solutions were not devised to meet the same requirements as natural language privacy policies; their content is therefore often restricted. In the following, we describe the content of graphical privacy policies according to
the elements of the taxonomy and their type.
Sets of icons
The content of the graphical privacy policies reviewed in this section lies in their icons, and sometimes in the simple explanations that come with them. As an example, the symbol @ can represent collection of an email address, and a stylized calendar can represent retention time. But certain items are harder to express graphically. For instance, describing the legal basis of processing with the help of icons can easily be misinterpreted, and can mislead the intended audience instead of simplifying understanding.
Usually, sets of icons (see [1, 37]) do not sufficiently express the items presented in the
taxonomy. Instead, they express specific items. A solution such as Privicons [63] focuses on informing mail correspondents of how the data should be handled. Other solutions, such as [92], include icons for selling and second use of data, but the type of data cannot be specified.
Recently, Rossi and Palmirani [90] proposed a Data Protection icon set, named DaPIS. The
interesting features of their approach are that they based the icon set on an ontology named PrOnto, and that they tested their set in order to refine it. Moreover, it stands out by emphasizing
items recently introduced in the legislations, such as DS rights or legal bases for processing.
However, it does not consider the type of data.
Many privacy advocates also contributed to this area and provided numerous sets of icons
(see [4, 86]). For instance, Mehldau [69] developed a set of 30 privacy icons describing the type of
data, third parties, the purpose of processing and the retention time. Recently, a set of privacy
icons was designed by Privacy Tech [83]. This set considers many types of data, as well as
advanced representations of sharing, such as adequacy transfer (see Figure 1).
Figure 1: Excerpt of the Privacy Tech icons: (a) EU transfer adequacy; (b) connection data; (c) one-year retention; (d) audience measurement.
Another notable example of privacy icons is the Android permissions [47], created by Google. They present icons combined with simple natural language. For each application installed on a mobile phone running Android, the permission manager presents a short graphical policy. Only little information is presented (the type of data collected and, in recent versions, the processing, but not the purpose, for instance), and DS have to look into the natural language privacy policy in order to find more information.
Standardized notices
Another line of work presents the content described by the taxonomy as standardized notices.
These standardized notices are often represented in tables [60], but the key concept is the common
vocabulary among notices. Kelley et al. [60] represent policies in a table modeled after the nutrition labels observed on food packaging, presenting fine-grained information in a standardized layout. Polisis by Harkous et al. [50] can represent natural
language privacy policies as a combination of icons, highlights of the corresponding paragraphs
in the natural language privacy policies, and a flow diagram (see Figure 2). Because Polisis relies
on supervised machine-learning, i.e., on a labeled corpus, it classifies natural language privacy
policies in a standardized way. Such summarization has been accomplished by PrivacyCheck as
well [114].
Figure 2: Example of a flow diagram in Polisis
Emami-Naeini et al. [38] conducted a survey in order to rank the factors driving IoT device purchases. They determined that security and privacy were among the most important factors of purchase, and consequently developed an IoT privacy label to improve information visualization. Cranor analyzes the impact of the development of standardized mechanisms of notice and choice in [28], and more specifically the efforts conducted around P3P. Cranor reconsiders the advances made in standardization,⁶ as well as the limitations, lack of adoption and enforcement.
Rating solutions
Certain graphical privacy policies do not focus on expressing the items in our taxonomy, but
present extra information related to privacy policies [108, 99, 49], often a judgment of the risk level associated with a DC policy, or a comparison between DS and DC policies. We denote them rating solutions.
Privacy Bird [29] is one of the first rating solutions. It consists of a colored bird, where the
color indicates the matching (green for a match between the DS policy and the website’s DC
policy, red for conflict, yellow for uncertain, gray when disabled) (see Figure 3).
Figure 3: Privacy Bird: (a) matching policies; (b) conflicting policies; (c) uncertain decision; (d) add-on disabled.
⁶ Note that icons are considered part of standardization efforts in [28].
It is represented as an add-on for Internet Explorer, restricted to Microsoft Windows. A
dedicated website [29] provides an explanatory tour as well as a feature named privacy finder:
Privacy Bird is then used as an indicator when browsing the web [20, 28]. Privacy finder displays
the search results of a search engine, combined with the analysis of Privacy Bird. The bird
was placed alongside search results, and was influential in the choice of shopping websites [103],
notably when the items being purchased were likely to have privacy concerns [36]. DS could rank
the results according to the matching between their DS policy and the websites’ DC policies.
In [75], Pardo and Le Métayer present a web interface to inform DS about the potential risks of
their privacy policies. The interface is composed of a user-friendly form for DS to input their
privacy policies and a set of risk analysis questions, e.g., “Can company X collect my data?”.
Additionally, DS may introduce risk assumptions in order to specify possible misbehaviors that
the collecting parties can perform. In Section 5, we describe in detail the underlying privacy
language and the automatic risk analysis. The ToS;DR initiative helps DS understand the risks associated with a DC policy [101]. It started in 2011 during the Chaos Communication Camp. ToS;DR comprises not only icons, but also results from crowdsourced analyses in simple language. The idea of the project is to assess the data practices of web services by giving them badges, awarded by the project’s community. Once a service has enough badges to assess the level of protection its terms offer users, a class is automatically assigned by weighting the average scores.
4.2 Tools
Tools for representing graphical privacy policies are tailored to the web, and are often found as
add-ons for web browsers.
Privacy Bird is represented as an add-on for Internet Explorer, restricted to Microsoft Win-
dows. ToS;DR is also an add-on, for both Firefox and Chrome, as it ranks policies based on
crowdsourced analyses by a community directly within the web browser. A website has been built
to present Polisis [80], and add-ons for Chrome [78] and Firefox [79] are available (the add-ons
redirect to the corresponding part of the website). The add-on “Disconnect Privacy Icons” [34], developed in collaboration with TRUSTe and evolved from Raskin’s set of icons for Mozilla [86], provided an interactive and comprehensive view of privacy policies within the browser. The add-on would display icons according to a website’s privacy policy, provided the website complied with the solution. All of these tools focus on the web, whereas the IoT remains unaddressed in that respect.
4.3 Benefits
Graphical privacy policies cannot be seen as legal commitments because they lack precise mean-
ing, but they have other benefits: they can foster understanding.
Designed for lay-user understandability Many solutions coming from privacy advocates
(e.g., [69, 86, 4, 85]) aim to provide intelligible information to lay-users. These solutions were
built with the aim of popularizing natural language privacy policies, and were designed to be understood quickly, with simplicity in mind. It is also the case for academic solutions such as
the privacy labels [60], which “allows participants to find information more quickly and accu-
rately”. Based on the principle that existing natural language privacy policies do not convey
intelligible information about data collection and processing, Kelley et al. strove to provide a
universal solution. Graphical privacy policies can also convey intelligible notices for scientists
and physicians using sensitive datasets, such as the DataTags [99, 15]. Their solution includes
a simplified interface for access requests to medical data, as this type of data is mostly
restricted to medical practitioners and researchers.
Attempts were made to analyze what icons were recognizable and to measure their reliability.
Egelman et al. [37] crowdsourced privacy indicators for the IoT. In their study, they found that some icons are well recognized (e.g., the camera symbol was recognized by more than
95% of participants as representing video recording), while others are not (only 3.6% recognized
the voice command & control icon). Kelley et al. [61] conducted a user study to refine their privacy label. They compared the accuracy of information retrieval between their proposal and traditional natural language privacy policies. As a result, they purposely included simple natural language to prevent confusion, notably for the terms opt-in and opt-out.⁷ A promising attempt to measure understandability has been conducted in [90]. They performed
three evaluations of their icon set in order to improve the recognition of icons. However, they
regret the lack of diversity in the participants’ panel, notably regarding educational level.
4.4 Limitations
Ambiguity Though accessible to and often designed for lay-users, graphical privacy policies
may be interpreted in different ways, thus leading to ambiguities (see Section 4.1). The same
icon can be interpreted in different ways according to differences in culture, education level, context, etc. For instance, a euro symbol € can represent the commercial use of collected data, or that DS will be paid for having their data collected. Little has been done to produce a reasonably recognized set of icons for privacy (e.g., validated by a user study), despite the attempts of [37] and [1, Chapter 15] to identify the most recognizable icons, and of [61] to provide a graphical policy in which information can be found accurately. The three-stage evaluation of Rossi and Palmirani [90], however, paves the way for less ambiguous graphical privacy policies.
Incompleteness Graphical privacy policies are limited by their restricted scope. As seen in
Section 4.1, existing graphical privacy policies are not as expressive as natural language privacy
policies, due to the limited number of icons available. Some aspects are rarely mentioned, others
only in complementary text and not in the graphical part of the policy. One aspect in particular
is never mentioned in graphical privacy policies (policy change), and another is considered in
only one work (legal basis).
Claim over legal compliance Some graphical privacy policies, such as cookie consent notices,
have been used to claim legal compliance when retrieving consent. Degeling et al. [31] observed that a significant share (16%) of websites added cookie consent notices after the GDPR, but these notices do not always comply with transparency requirements according to Utz et al. [106], as they tend to use Dark Patterns [48] to lure DS into giving their consent.
Summary
Graphical privacy policies are promising for conveying summarized versions of natural language
privacy policies, and they can rely on user-friendly tools to be adopted. However, they should
come with explanations to ensure human understanding and mitigate their restricted content.
See Table 2 for a visual and global overview of our study on graphical policies.
⁷ Note that they also test the speed of retrieval, as well as comparisons between DC policies, in addition to information retrieval.
Table 2: Categorization of graphical privacy policies. The rows cover Privacy Bird [29], Rundle [92], Mehldau [69], Privacy Commons [4], the Privacy Nutrition Label [60], Primelife [1, Chapter 15], Raskin [86], Privicons [63], the Privacy wheel [108], Android permissions [47], “Is this thing on?” [37], Datatags [15], Hagan [49], Polisis [50], Privacy Tech [83], DaPIS [90], the IoT label [38], and ToS;DR [101]. The columns indicate: Features, whether the solution is made of icons or provides ratings about policies, and whether it provides explanations in simple natural language; Type (type of policy), whether the solution expresses a DC or a DS policy; and Content, whether the solution can express the items enumerated in Section 3 (first party, third party, DS rights, data retention, data security, and legal basis). We use ● to denote that the solution can express most or all values; ◐ to denote that the solution expresses few values of the item, and is mostly insufficient; ○ to denote that the solution cannot express the item; and “—” to denote that the material does not permit judging whether the solution can express this item or not. The subscript a denotes that a solution extensively uses natural language besides graphical representations. Note that some items of the taxonomy are omitted since no solution includes them.
5 Machine-readable privacy policies
Many efforts have been devoted to the expression of machine-readable privacy policies i.e.,
privacy policies that can be automatically processed by computers. Most of these efforts were
made by academics, and result in what has been called privacy languages. According to Kasem
et al. [59], a privacy language is “a set of syntax and semantics that is used to express policies”.
Many privacy languages have been proposed in the past twenty years [59, 107]. We review here the
different ways in which privacy languages are used to express machine-readable privacy policies.
In particular, we categorize each work based on: i) the elements in the taxonomy presented in
Section 3 that it captures; ii) the type of enforcement mechanism it uses and whether it has been
implemented; iii) additional tools for policy analysis or comparison; iv) the intended audience
of the language, DS or DC; and v) whether it is intended to be used by lay-users. Table 3
summarizes our study.
5.1 Content
Here we describe the content that machine-readable privacy policies include. This content is
determined by the syntax of the privacy language. Many languages are defined using machine-
readable formats such as XML so that they can be automatically processed by machines. Other
languages, however, are based on mathematical definitions (e.g., logical languages), thus enabling the possibility of reasoning about them; these languages can easily be expressed in machine-readable formats due to their lack of ambiguity. Another important factor is the target audience
of a language, i.e., DC, DS or both. In what follows, we describe the content of machine-readable
languages (according to the items defined in Section 3), the format used to express the policies
and their target audience.
Access control languages such as XACML [9] and RBAC [94] have been among the first
languages used for the specification of machine-readable privacy policies. Typically, these policies
include the datatype to which they apply, and the set of entities with access privileges. Some
extensions such as GeoXACML [66] include conditions depending on geolocation information.
Usage control (UCON) [76, 81] extends access control with the notion of obligations, i.e., actions
to be executed after data has been received, e.g., “do not transfer data item i to Service
X” or “remove data on 25/05/2018”. These obligations make it possible to express items such
as retention time, purpose and allowed data transfers. The Obligation Specification Language
(OSL) [51] is a fully-fledged UCON language. OSL leverages Digital Rights Management systems
(DRMs) [42] to enforce the obligations in UCON policies. Both access control and UCON are
used by DC to define their policies, and do not offer mechanisms for DS to express their policies.
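To fix ideas, the sketch below models a UCON-style DC policy as data: an access rule plus obligations that apply after data has been received. The field names and the checking logic are our own simplification, not the syntax of XACML, UCON, or OSL.

```python
# Simplified, illustrative model of a UCON-style policy: a datatype, the
# entities with access privileges, and obligations holding after data has
# been received. Field names are our own, not actual language syntax.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class UconPolicy:
    datatype: str                # e.g., "email"
    allowed_entities: set[str]   # entities with access privileges
    purposes: set[str]           # allowed processing purposes
    delete_by: date              # obligation: remove data by this date
    forbidden_transfers: set[str] = field(default_factory=set)

    def obligations_violated(self, today: date, transfers: set[str]) -> list[str]:
        """Return the obligations violated by the current system state."""
        violations = []
        if today > self.delete_by:
            violations.append(f"{self.datatype} should have been deleted")
        for target in transfers & self.forbidden_transfers:
            violations.append(f"transfer to {target} is forbidden")
        return violations

policy = UconPolicy("email", {"DC"}, {"billing"},
                    delete_by=date(2018, 5, 25),
                    forbidden_transfers={"Service X"})
print(policy.obligations_violated(date(2018, 6, 1), {"Service X"}))
```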
Several languages focused on expressing privacy policies have been developed. One of the
first is the “Platform for Privacy Preferences” (P3P) [27]. P3P appeared as a policy language
for the web. P3P policies are specified in XML format, and include: purpose, retention time,
and conditions. Conditions may be opt-in and/or opt-out choices for DS, or preferences based on enterprise data, e.g., DS’s credit or service usage. Many extensions to P3P have been proposed [64, 11, 7]; for instance, E-P3P [11] extends P3P with obligations à la UCON. After
P3P, new languages with similar syntax have been proposed: “Enterprise Policy Authorization
Language” (EPAL) [12], “An Accountability Policy Language” (A-PPL) [13], “Customer Profile
Exchange” (CPExchange) [18], “Privacy Rights Markup Language” (PRML) [115], “Purpose-to-
Use” (P2U) [56] and “Layered Privacy Language” (LPL) [45]. These languages are similar to
P3P in terms of content, but bring numerous enhancements in terms of usability and enforcement
(see Section 5.2).
Formal privacy languages (formal languages in the following) comprise a different approach
to express privacy policies. CI [73], PrivacyAPIs [67], SIMPL [65], PrivacyLFP [33], S4P [17],
QPDL [107], and PILOT [75] are languages that have their syntax and semantics defined math-
ematically. More precisely, they use formal languages such as Linear Temporal Logic [53], First-
Order Logic [53] or Authorization Logic [5]. However, not all of these formal languages have the
same focus. S4P, SIMPL and PILOT are focused on expressing DS and DC policies. Similarly
to the languages above, it is possible to express types of data, conditions, purpose, retention
time and allowed data transfers. Conditions are often more sophisticated than those of the pre-
vious languages as they are based on logical languages. On the other hand, CI, PrivacyAPIs
and PrivacyLFP focus on encoding privacy regulations such as HIPAA [104], COPPA [41] or
GLBA [105]. As a consequence, their expressive power is greater than languages focusing on
DS and DC policies. They include temporal operators that make it possible to express policies
about past and future events. For example, Barth et al. [16] express the following statement from
COPPA “[...] an infant can only send identifiable information to a website, if her parent have
previously sent their consent for data collection”. QPDL is a meta-language to reason about
privacy languages. While privacy policies can be expressed in QPDL, it is not its intended use: it was designed to formally reason about policy languages. Jeeves [112] is a programming
language with built-in support for a limited form of privacy policies. It allows programmers to
specify confidentiality conditions based on the execution context. For instance, in a double-blind
conference management system, only organizers can see paper authors until the review process
is completed.
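To give the flavor of such temporal operators, the COPPA statement quoted above could be rendered in a linear temporal logic with past operators roughly as follows; the predicate names and notation are our own simplification and only approximate the contextual-integrity formulas of [16]:

\[
\mathit{send}(\mathit{child}, \mathit{site}, d) \;\Rightarrow\; \blacklozenge\, \mathit{consent}(\mathit{parent}, \mathit{site}, d)
\]

Here \(\blacklozenge\) is the past-time “eventually” operator (“at some point in the past”): a child may send an identifiable data item \(d\) to a website only if her parent has previously sent consent for its collection.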
5.2 Tools
In this section, we describe the mechanisms used to enforce machine-readable privacy policies,
and existing tools to compare and perform analyses on policies for instance.
Formal Semantics Formal languages give meaning to their privacy policies by means of formal
semantics. Typically, these semantics define what events may be executed depending on the
privacy policies selected by the actors interacting in the system. SIMPL, PILOT, S4P and CI
use trace semantics, i.e., they define the sequences of events (traces) that respect the privacy
policies of DS and DC. Jeeves has its semantics formalized using lambda calculus [23]. Rei’s semantics is defined in Prolog [24]. Though precise and unambiguous, there is a gap between the definition of a formal semantics and the actual implementation. Nevertheless, this gap may be very small: e.g., Jeeves’s lambda calculus semantics was implemented as a Scala library, Rei’s semantics is encoded in Prolog, and PILOT’s semantics is implemented as a Promela model [52].
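Trace semantics can be made concrete with a small sketch: below, a trace is a list of timestamped events, and a policy is satisfied iff no event in the trace violates it. The event vocabulary and the single retention-time rule are illustrative assumptions of ours, far simpler than the semantics of the languages above.

```python
# Minimal sketch of trace semantics: a policy denotes the set of traces it
# accepts. Here we check one rule, "data is never used after its retention
# time"; real trace semantics cover far richer event vocabularies.

Trace = list[tuple[int, str, str]]  # (timestamp, action, datatype)

def satisfies(trace: Trace, retention: dict[str, int]) -> bool:
    """A trace satisfies the policy iff no datatype is used past its deadline."""
    collected_at: dict[str, int] = {}
    for t, action, datatype in trace:
        if action == "collect":
            collected_at[datatype] = t
        elif action == "use":
            deadline = collected_at.get(datatype, t) + retention[datatype]
            if t > deadline:
                return False
    return True

trace = [(0, "collect", "email"), (5, "use", "email"), (40, "use", "email")]
print(satisfies(trace, {"email": 30}))  # False: email used at t=40, deadline t=30
```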
Informal Semantics Access control, UCON, and privacy-dedicated languages have their enforcement mechanisms specified as W3C specifications, in specification languages such as UML, or they are simply implemented using a general-purpose programming language. All these languages
use request evaluation engines to enforce privacy policies. Request evaluation engines take a data
request and evaluate whether the requester may access the data based on the privacy policies.
The content of data requests depends on the language. For instance, in RBAC, data requests
contain type of data and the role of the requester. If the role of the requester matches one of the
roles allowed by the policy associated with the data, then data can be accessed. Most languages
do not have mechanisms to enforce that data will be used according to the policies after being
collected (e.g., checking that data is deleted before the retention time), but there are some exceptions. LPL automatically erases data from the central repository after the retention time has elapsed. UCON-based languages use DRMs to enforce obligations.
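The request evaluation step can be sketched as follows for an RBAC-style language; the policy representation and function names are our own simplification, not actual RBAC or XACML syntax.

```python
# Sketch of a request evaluation engine in the RBAC style: a data request
# carries a datatype and the requester's role, and access is granted iff
# the role is among those the policy allows for that datatype.

POLICIES = {
    "medical_record": {"physician", "nurse"},
    "billing_info": {"accountant"},
}

def evaluate_request(datatype: str, role: str) -> bool:
    """Grant access iff the requester's role is allowed for the datatype."""
    return role in POLICIES.get(datatype, set())

print(evaluate_request("medical_record", "physician"))   # True
print(evaluate_request("medical_record", "accountant"))  # False
```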
Policy comparison For some languages, algorithms have been devised to automatically com-
pare policies. The goal is to determine, given two policies, which one is more restrictive. For
example, a policy that allows data processing for research purposes for 7 days is more restrictive than a policy that allows data processing for advertisement and research for 90 days.
Policy comparison is necessary to mechanize consent management. If a DC policy is more restric-
tive than a DS policy, then DS privacy preferences are satisfied. EPAL, P3P and PILOT include
algorithms and tools to compare policies. CI, SIMPL and S4P follow a different approach. They
define how restrictive a policy is, based on its semantics.⁸ Languages that do not distinguish DS and DC policies, such as RBAC, A-PPL, or OSL, do not have algorithms to compare policies. This is not surprising: their goal is to enforce a policy typically defined by DC or system administrators.
⁸ Using trace semantics, it is possible to compare policies based on the set of traces satisfying the policy. The fewer traces a policy satisfies, the more restrictive it is.
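The running example above can be captured with a simple restrictiveness check: one policy is at most as permissive as another if its purposes are a subset and its retention is shorter. This subset-based ordering is our own sketch of the general idea; the cited languages define comparison over their own, richer policy structure.

```python
# Sketch of policy comparison: decide whether policy `a` is at most as
# permissive as policy `b` (i.e., `a` is more restrictive). Used to
# mechanize consent: if the DC policy is at most as permissive as the DS
# policy, the DS preferences are satisfied. Fields are our simplification.
from dataclasses import dataclass

@dataclass
class Policy:
    purposes: frozenset[str]
    retention_days: int

def at_most_as_permissive(a: Policy, b: Policy) -> bool:
    """True iff `a` allows no more than `b`."""
    return a.purposes <= b.purposes and a.retention_days <= b.retention_days

research_7d = Policy(frozenset({"research"}), 7)
ads_research_90d = Policy(frozenset({"research", "advertisement"}), 90)

print(at_most_as_permissive(research_7d, ads_research_90d))  # True
print(at_most_as_permissive(ads_research_90d, research_7d))  # False
```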
Analysis tools Formal languages often come with tools to perform different types of automatic
analyses. PILOT uses model-checking [14] to perform risk analysis. Given a DS policy and a set
of risk assumptions, such as “Company X may transfer data to Company Y”, it is possible to
automatically answer questions such as “Can Company Z use my data for advertisement?”. Rei
comes with a Prolog interface where queries such as the above can be asked. PrivacyAPIs also
uses model-checking to automatically verify properties about the privacy regulation HIPAA. It
can, for instance, be used to determine who can access patients’ medical files depending on their
content or role.
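The flavor of such risk queries can be conveyed with a reachability sketch: given transfer assumptions between companies, we compute who may eventually hold the data, and thus answer questions like the one above. Actual tools such as PILOT’s analysis rely on model-checking over the language’s full semantics; this is only a simplified analogue of ours.

```python
# Simplified analogue of automatic risk analysis: given risk assumptions
# "A may transfer data to B", compute every party that may end up holding
# the data (transitive closure), then answer queries about a target party.
# Real tools (e.g., PILOT) use model-checking over richer semantics.

def may_hold_data(initial: str, assumptions: set[tuple[str, str]]) -> set[str]:
    holders = {initial}
    changed = True
    while changed:
        changed = False
        for src, dst in assumptions:
            if src in holders and dst not in holders:
                holders.add(dst)
                changed = True
    return holders

assumptions = {("Company X", "Company Y"), ("Company Y", "Company Z")}
holders = may_hold_data("Company X", assumptions)
# "Can Company Z use my data for advertisement?" -- at the least, Z may obtain it:
print("Company Z" in holders)  # True
```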
5.3 Benefits
Machine-readable privacy policies have four main benefits: 1) they can be automatically enforced;
2) they can be audited; 3) it is possible to reason about their correctness; and 4) they make it
possible to automate certain procedures. In what follows we explain each of these benefits in
detail.
Enforcement As opposed to natural language or graphical policies, machine-readable policies
can be automatically enforced. As described in Section 5.2, all policy languages have the means to
guarantee that data is accessed according to the policies. Languages based on request evaluation
are easier to implement (cf. Section 5.2), and are consequently more widespread. Typically, every party holding personal data must implement a part of the request evaluation engine. In general, the implementation of formal languages is more complicated: they require tracking the actions applied to the data, or inferring the purposes for which data is used, as opposed to simply controlling access to data.
Auditability Machine-readable privacy policies make it possible to audit whether data is being
handled according to the privacy policies. This functionality is of great value for DPAs. Auditing
mechanisms are typically implemented as logs recording the operations performed on sensitive
data. For instance, EPAL requires creating an audit trail of accesses to keep track of who has accessed personal data. In A-PPL, on the other hand, it is possible to specify auditable operations such as read or delete, and the enforcement mechanism records in a log every time such operations occur.
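A minimal version of such an audit trail, inspired by (but not reproducing) EPAL's access trails and A-PPL's auditable operations, is a log of sensitive operations that a DPA can later replay. The sketch below is our own illustration.

```python
import datetime

audit_log: list[dict] = []

def audited(operation: str, subject: str, data_type: str) -> None:
    """Record an auditable operation (e.g., read or delete, as in A-PPL)."""
    audit_log.append({
        "time": datetime.datetime.utcnow().isoformat(),
        "operation": operation,
        "subject": subject,       # who performed the operation
        "data_type": data_type,   # which personal data was affected
    })

audited("read", "analyst@dc.example", "search-history")
audited("delete", "retention-daemon", "search-history")
# A DPA can replay audit_log to check that operations match the declared policy.
```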
Correctness The lack of ambiguity in policy languages makes it possible to precisely reason about their correctness, i.e., that data is handled as stated in the privacy policies. This is especially true for formal languages: their semantics can be used to formally prove certain correctness properties. For example, S4P, SIMPL and PILOT have been used to prove global properties such as "data is never used after its retention time", or, "data is always used according to DS policies". Moreover, languages focused on modeling privacy regulation (CI, PrivacyAPIs and PrivacyLFP) can be used to find inconsistencies in the regulation, if any. For example, PrivacyAPIs made it possible to find unexpected ambiguities in HIPAA (these ambiguities were also found by commenters four years after it was enacted [67]). It is important to remark that there exists a gap between the formal semantics and its implementation: technical details not modeled in the semantics may lead to unforeseen violations of the properties. Therefore, formal languages should include auditing mechanisms, like the languages mentioned in the previous paragraph.
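To illustrate the flavor of such a global property, the toy check below tests "data is never used after its retention time" on a finite event trace. This is our own sketch: S4P, SIMPL, and PILOT prove the property over their formal semantics rather than over recorded traces.

```python
from datetime import datetime, timedelta

def never_used_after_retention(trace: list[tuple[datetime, str]],
                               collected_at: datetime,
                               retention: timedelta) -> bool:
    """Check that no 'use' event occurs after the retention time has elapsed."""
    deadline = collected_at + retention
    return all(not (event == "use" and t > deadline) for t, event in trace)

trace = [(datetime(2020, 1, 10), "use"), (datetime(2020, 5, 1), "use")]
print(never_used_after_retention(trace, datetime(2020, 1, 1),
                                 timedelta(days=90)))   # False: late use detected
```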
Automation Machine-readable privacy policies allow for automating procedures such as pol-
icy communication and consent management. Automatic policy communication facilitates trans-
parency by boosting DS awareness about how their data is being handled — notably in ubiquitous
systems where passive data collection is the norm [30]. Automatic consent management can empower DS if managed in a protective way, e.g., by mitigating the burden of choice [98], and facilitates the retrieval of informed consent for DC. Morel et al. [71] devise a generic framework to manage informed consent in the IoT, using DS and DC policies based on PILOT [75]. Automatic communication of privacy policies also makes it possible to negotiate privacy choices: DC and DS can interact more quickly by means of machines.
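In its simplest form, this automation reduces to the policy comparison of Section 5.2: a DS agent receives the announced DC policy and answers automatically. The sketch below is a toy illustration of ours, loosely inspired by, and much simpler than, the PILOT-based framework of [71].

```python
from dataclasses import dataclass

@dataclass
class SimplePolicy:          # same shape as in the policy-comparison sketch above
    purposes: set[str]
    retention_days: int

def ds_agent(preferences: SimplePolicy):
    """Return a handler that answers DC policy announcements automatically."""
    def on_policy_received(dc_policy: SimplePolicy) -> dict:
        granted = (dc_policy.purposes <= preferences.purposes
                   and dc_policy.retention_days <= preferences.retention_days)
        return {"consent": granted, "policy": dc_policy}   # sent back to the DC
    return on_policy_received

handler = ds_agent(SimplePolicy({"research", "advertisement"}, 90))
print(handler(SimplePolicy({"research"}, 7))["consent"])         # True
print(handler(SimplePolicy({"advertisement"}, 365))["consent"])  # False
```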
5.4 Limitations
The main limitations of machine-readable privacy policies are their lack of adoption and usability. As adoption relies, among other things, on human understandability, understandable and usable policies seem to be a sine qua non condition for their adoption.
Human understandability One of the most recurring criticisms of machine-readable privacy policies is their lack of human understandability. Only a handful of languages, such as SIMPL, LPL or PILOT, take it into account: they include a natural language version of each policy. It is however questionable whether they can actually be understood. To put things into perspective, the OECD [74] conducted a study showing that two thirds of adults from developed countries cannot complete a medium-difficulty task in ICT environments. Although privacy management was not mentioned in the study, it is a medium-difficulty task, and solutions tackling privacy management must take information illiteracy into account. Machine-readable privacy policies should be expressed in languages close to natural language in order to be understood, or be complemented by user-friendly interfaces.
Lack of adoption Another pitfall for machine-readable privacy policies is their lack of adoption, arguably a consequence of poor human understandability. Most of the work done on privacy languages has had little or no impact, apart from P3P. With the other solutions stemming from it (APPEL, E-P3P, . . . ) and the Privacy Bird extension for Internet Explorer, P3P obtained recognition beyond the academic scope: it has been an official set of W3C specifications, supported by the Internet Explorer web browser. Note that other languages were published as specifications by companies [18, 9] and can therefore be considered as having had some recognition. On the other hand, most formal languages lack a practical and scalable implementation, making them difficult to use in practice. Usability, implementation and widespread recognition are a rare combination in privacy languages.
Summary
Machine-readable privacy policies can provide means to express unambiguous privacy policies, and can be enforced as well as audited by authorities. However, they are often unintelligible to lay-users, which results in a lack of adoption. We provide a visual and global overview of machine-readable policies in Table 3.
[Table 3 (matrix not reproduced here): each of the 23 surveyed languages (P3P [27], CPExchange [18], PRML [115], APPEL [64], E-P3P [11], Rei [58], Xpref [7], XACML [9], EPAL [12], CI [16], SIMPL [65], S4P [17], Jeeves [112], P2U [56], QPDL [107], RBAC [94], OSL [51], GeoXACML [66], A-PPL [13], LPL [45], PrivacyAPIs [67], PrivacyLFP [33], PILOT [75]) is rated against four groups of columns, namely Features (Usability, Syntax, Enforcement, Implemented, Tools); Audience (DS, DC); Conditions (Time, Space); and Content (1st party, 3rd party, DS rights, Data security, Data retention).]
Table 3: Categorization of privacy languages. The columns indicate: Usability, whether the language is intended to be understood by DS; Syntax, whether the syntax of the language is defined in XML or a formal language; Enforcement, whether the language has a formally or informally defined enforcement; Implemented, whether the language has been implemented; Tools, the type of available tools for the language; Audience, whether the language can describe a DS or a DC policy; Conditions, whether the language supports conditional rules describing when and/or where data may be collected; Content, whether the language can express the items described in Section 3.1. We use ● to denote that the item is explicitly included in the language; ◐ to denote that the item is partially supported; and ○ that the item is not present in the language and cannot be encoded. Some items of the taxonomy are omitted since no solution includes them.
6 Insights
In this section, we provide several insights that we identified as a result of our study (we refer the reader to Table 4 in Appendix A for an overview juxtaposing the results of all facets). We show
that each facet is tailored to a specific audience, and that this is both 1) what makes it beneficial,
but also 2) an obstacle to the compliance with all the requirements stated in Section 1 (i.e., legal
validity, understandability by all parties, and enforceability through auditable mechanisms). In
Section 6.1, we discuss why a single facet cannot comply with every requirement. In Section 6.2,
we put in perspective the works which attempt to overcome the limitations of mono-faceted
solutions, and provide guidelines for designing privacy policies that aim to cover the three facets.
Finally, in Section 6.3, we discuss the coverage of the items in our taxonomy by privacy policies in each facet.
6.1 Limitations of mono-faceted solutions
A single facet cannot cover all the requirements of privacy policies. This is due to the tension
between: i) the suitability for lawyers, ii) the suitability for lay-users, and iii) the automatic
enforcement by machines. Concretely, there are details that only have meaning in one facet
and are irrelevant in others. Details specific to natural language privacy policies are used by
lawyers to check that the policy complies with privacy protection regulations such as the
GDPR. In general, lay-users may not have the knowledge to fully understand these details, which makes them less accessible. Likewise, natural language privacy policies do not include low-level details related to the enforcement of the policies by a machine; those details are often unnecessary for law enforcement and would require lawyers to be familiar with such technicalities. Graphical privacy policies have the objective of being understood by a general audience, but this form of privacy policy is of no use for lawyers or for enforcement by machines. Machine-readable privacy policies aim at being enforced by machines. They are written in a machine-readable format, and they include all the necessary details for the underlying system to enforce them. These details make them difficult to understand by humans and, consequently, unsuitable for lawyers and lay-users.
Illustrative example Consider the icon from Privacy Tech Icons (see Section 4) that denotes that data is deleted after 1 year, and this excerpt of Facebook's privacy policy: "[...] when you search for something on Facebook, you can access and delete that query from within your search history at any time, but the log of that search is deleted after six months." Facebook's policy is more precise than the icon: it refers to concrete data which is produced after a certain user action. However, these details may not be of prime interest for some lay-users, at least in a first stage. For instance, they may not know what a log entry or a query is. Hence, this level of detail may be counter-productive for lay-users. Yet, this information is required to determine whether Facebook is processing data according to the policy. Thus, it cannot be omitted for legal purposes or for users who may be interested in more detailed information. Consider also an excerpt of the policy in APPEL-P3P: <retention-time days="182" xmlns=".../P3P/retention-time/"/>. This policy includes details that are not present in the natural language policy above: the format of the policy, XML; the parameter xmlns, required so that the computer can retrieve the set of possible values for the element in the policy (the XML namespace); and the fact that retention time must be specified in days. These details are often irrelevant for law enforcement, and make the policy difficult to understand for lawyers and lay-users.
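The flip side is that this facet is trivially consumable by a program. For instance, the sketch below extracts the retention time from such an element; it is our illustration rather than an actual APPEL implementation, and since the namespace URI is elided in the excerpt above, a placeholder is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace: the real URI is elided (".../P3P/retention-time/") in the
# excerpt above, so a placeholder is used here purely for illustration.
NS = "http://example.org/P3P/retention-time/"
policy_xml = f'<retention-time days="182" xmlns="{NS}"/>'

element = ET.fromstring(policy_xml)
retention_days = int(element.attrib["days"])
print(retention_days)   # 182, i.e., the "six months" of the natural language excerpt
```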
6.2 Multi-faceted privacy policies
In many cases, limitations in one facet can be addressed by other facets. For instance, nat-
ural language privacy policies may use graphical privacy policies to enhance readability, and
machine-readable privacy policies to be automatically enforced. In this section, we study multi-
faceted privacy policies, i.e., policies that combine any of the three facets studied in this paper:
natural language, graphical, and machine-readable. We present existing works on multi-faceted
privacy policies, discuss challenges in defining them, and provide guidelines on addressing those
challenges.
[Figure 4: Works on multi-faceted privacy policies grouped by combination of facets. Natural language and graphical: [108, 6, 50]; natural language and machine-readable: [65, 45, 75]; graphical and machine-readable: [60, 90]; no work combines all three facets.]
Existing works on multi-faceted privacy policies.
Several initiatives are already proposing multi-faceted solutions. Harkous et al. [50] combine
natural language and graphical privacy policies, bringing together the accessibility of icons and
simple text with the legal value of a natural language privacy policy. Similarly, [108, 6] follow a
layered approach to combining graphical and natural language policies. Users are first presented
with a graphical or simplified version of the policy, which can be refined in several steps all the
way to the legal notice. SIMPL [65], PILOT [75] and LPL [45] combine natural language and
machine-readable privacy policies. They provide an enforceable policy that helps DS to better
understand their choices and provide informed consent as required by the GDPR. Similarly,
[60, 90] add graphical representations for P3P policies, resulting in intelligible and enforceable
privacy policies.
Figure 4 summarizes the landscape of work on multi-faceted policies. Note that existing
solutions combine at most two types of privacy policies, and their number is small. Notably,
no existing solution encompasses the requirements for legal compliance, understandability, and
enforceability.
Guidelines and challenges
Here we set guidelines and discuss challenges in designing multi-faceted privacy policies, ideally
covering all three facets. We describe two approaches: unified and compound.
Unified In a unified approach, a core facet is defined and the remaining facets are generated from the core using a policy generator. Natural language is a suitable candidate as the core facet: it is a legal requirement and cannot be omitted. Machine-readable privacy policies could also be considered as the core facet, since natural language could be automatically generated from them. Graphical representations are not suitable for this purpose; they lack the precision required to generate natural language or machine-readable privacy policies. The main challenge of this approach is ensuring the soundness of the generated policies. Existing solutions compromise the expressiveness of a facet to generate policies in other facets. For example, SIMPL uses constrained natural language so that privacy policies can be enforced by a machine. As mentioned above, each facet has details that are not addressed by other facets (see Section 6.1). Thus, another requirement for a policy generator is to include the details specific to the generated facets, even though they are absent from the core facet. To sum up, the challenge for a unified approach is to ensure that the facets generated from the core keep what makes them beneficial and preserve the meaning of the core facet.
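As a toy illustration of a unified approach with a machine-readable core, the sketch below renders a natural language facet from a structured policy. It is our own simplification (SIMPL and LPL implement far more elaborate generators), and CorePolicy is a hypothetical type.

```python
from dataclasses import dataclass

@dataclass
class CorePolicy:            # hypothetical machine-readable core facet
    controller: str
    purposes: set[str]
    retention_days: int

def to_natural_language(p: CorePolicy) -> str:
    """Generate a (deliberately simplified) natural language facet from the core."""
    purposes = " and ".join(sorted(p.purposes))
    return (f"{p.controller} processes your data for {purposes} purposes "
            f"and deletes it after {p.retention_days} days.")

print(to_natural_language(CorePolicy("ExampleCorp", {"research"}, 7)))
# ExampleCorp processes your data for research purposes and deletes it after 7 days.
```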
                   Graphical Policies     Machine-Readable Policies     Required by Legislations
                   ●     ◐     ○          ●     ◐     ○                 ●     ◐     ○
1st party          33%   33%   34%        69%   31%   0%                80%   20%   0%
3rd party          17%   50%   33%        47%   26%   27%               60%   40%   0%
Legal basis        5%    0%    95%        0%    0%    100%              20%   20%   60%
DS Rights          22%   5%    73%        13%   17%   70%               60%   20%   20%
Data Retention     28%   11%   61%        30%   43%   27%               20%   20%   60%
Data Security      22%   22%   56%        17%   30%   53%               40%   40%   20%
Policy Change      0%    0%    100%       0%    0%    100%              0%    20%   80%

Figure 5: Coverage of taxonomy items by different types of privacy policies, and by the privacy legislations in Table 1. Columns: fully covered (●), partially covered (◐), not covered (○).
Compound The compound approach consists in taking mono-faceted policies and using them together. For instance, one could use legal text, icons from DaPIS, and P3P. Most existing solutions work this way. Unlike the unified approach, the facets in the compound approach include the specific details of each facet, insofar as it combines existing mono-faceted solutions. The main challenge here is ensuring consistency between the different facets, e.g., does the machine-readable facet accurately represent the legal text? Currently, consistency is manually checked by designers, as it is a very difficult task to automate. Tool support and strict guidelines may systematize this process and reduce errors. The machine-readable facet is suitable for full automation, but we argue that the natural language and graphical facets require expert supervision (by lawyers and designers, respectively). In a nutshell, the main challenge is to devise methods ensuring consistency between the different facets.
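A first step towards such tool support could compare the few fields that appear in both facets, e.g., checking that the retention time of the machine-readable facet matches the one announced in the legal text. The sketch below is a deliberately naive illustration of ours, not an existing tool.

```python
import re

WORD_TO_NUMBER = {"one": 1, "three": 3, "six": 6, "twelve": 12}

def consistent_retention(legal_text: str, machine_readable_days: int) -> bool:
    """Naively check a 'deleted after N months' claim against the MR facet."""
    match = re.search(r"deleted after (\w+) months", legal_text)
    if not match or match.group(1) not in WORD_TO_NUMBER:
        return False   # no recognizable claim: flag for manual (lawyer) review
    months = WORD_TO_NUMBER[match.group(1)]
    return abs(machine_readable_days - months * 30) <= 3   # tolerate calendar drift

legal = "the log of that search is deleted after six months"
print(consistent_retention(legal, 182))   # True: 182 days is about six months
```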
6.3 Missing taxonomy items
As we saw in the previous sections, not all items of our taxonomy are covered by existing
solutions. Here we quantify the coverage of each item by the solutions we have studied. The aim
of this section is to shed light on "forgotten" items, and hopefully guide future research in these directions.
Figure 5 summarizes our results. Each cell of the heat map shows the percentage of the studied works (in a given facet) that cover an item of the taxonomy completely (●), partially (◐), or not at all (○). For instance, the item 1st party is completely covered by 69% of policy languages (fourth column, first row in Figure 5).
The 1st and 3rd party taxonomy items have the best coverage in our study, probably because they express the most relevant information regarding data collection and processing for DS. These items are followed (in terms of coverage) by Data Retention and Data Security, which are absent from almost 50% of the languages studied, except for Data Retention in machine-readable privacy policies. DS Rights is absent in around 70% of the graphical and machine-readable policy languages, most likely because they refer to information that is difficult to express graphically and lies outside the realm of what machine-readable languages are designed for. Finally, legal basis and policy change are absent from all the studied work, possibly due to their absence in most legislations (60% and 80%, respectively).
7 Final discussion
Related work This work is not the first to propose an overview of the different ways to express privacy policies. Schaub et al. [95] present the requirements and best practices for presenting privacy notices. Their work focuses on providing users with impactful notices; in other words, they study how well users understand the messages conveyed by privacy notices. Cranor [28] describes the notice and choice mechanisms, what P3P attempted to do to palliate the issues raised by these principles, and why it failed in doing so. These solutions focus on the design of privacy policies to enhance usability for lay-users. In our work, we focus on the connections of graphical and machine-readable privacy policies with legal requirements (i.e., natural language privacy policies). Although both of these articles consider the machine-readability of privacy notices, we highlight benefits that they did not consider, such as the possibilities it offers for enforcement, auditing or automatic consent management.
To the best of our knowledge, this work is the first to systematically study privacy policies
based on the different means of expression or “facets”, and how these facets must be combined
to provide legally valid, usable and enforceable privacy policies.
Conclusion In this paper, we have studied the different ways to express privacy policies: in
natural language, with graphical representations, and using machine-readable means. We have
categorized the existing work in each facet according to a taxonomy of privacy policies, as well as
their specific features. Additionally, we have studied the benefits and limitations of each facet,
and we have shown that the limitations of one facet can be addressed by the benefits of the
other facets. We have studied the combination of different facets, which overcomes limitations
by bringing together the benefits of each facet, and provided guidelines to design multi-faceted
privacy policies. We have made explicit the degree of coverage of the items in our taxonomy
by the surveyed work, thus shedding light on future research directions on the design of mono-
and multi-faceted privacy policies. We envision this work as an effort to facilitate and boost
collaborative work between the legal domain, design, and computer science, and to provide a big
picture of how transparency can be ensured through privacy policies.
Acknowledgments
This work has been partially funded by the CHIST-ERA project UPRISE-IoT (User-centric
Privacy and Security in the IoT) and the ANR project CISC (Certification of IoT Secure Com-
pilation). We address special thanks to Daniel Le Métayer for his insights and comments on the
early stages of this article.
References
[1] Privacy and Identity Management for Life, 2011. URL http://link.springer.com/
10.1007/978-3-642-20317-6.
[2] California’s New Privacy Law: It’s Almost GDPR in the US, 2018-07-02. URL
https://www.bankinfosecurity.com/californias-new-privacy-law-its-almost-
gdpr-in-us-a-11149.
[3] 3DCart. Create an online store with 3dcart store builder., 2019-03-26. URL https:
//www.3dcart.com/personalized-policy.html.
[4] Helton Aaron. Privacy Commons Icon Set .:aaron.helton:., 2009. URL https:
//web-beta.archive.org/web/20090601215200/http://aaronhelton.wordpress.com/
2009/02/20/privacy-commons-icon-set.
[5] Martín Abadi. Logic in access control. In Proceedings of 18th IEEE Symposium on Logic
in Computer Science (LICS 2003), 22-25 June 2003, Ottawa, Canada, page 228, 2003. doi:
10.1109/LICS.2003.1210062.
[6] Marty Abrams and Malcolm Crompton. Multi-layered privacy notices: A better way. 2
(1):1–4, 2005.
[7] Rakesh Agrawal, Jerry Kiernan, Ramakrishnan Srikant, and Yirong Xu. An XPath-based
preference language for P3P. In Proceedings of the 12th International Conference on World
Wide Web, pages 629–639. ACM, 2003. URL http://dl.acm.org/citation.cfm?id=
775241.
[8] Waleed Ammar, Shomir Wilson, Norman Sadeh, and Noah A. Smith. Automatic catego-
rization of privacy policies: A pilot study. 2012. URL http://repository.cmu.edu/lti/
199/.
[9] Anne Anderson, Anthony Nadalin, B. Parducci, D. Engovatov, H. Lockhart, M. Kudo,
P. Humenn, S. Godik, S. Anderson, S. Crocker, et al. Extensible access control markup
language (xacml) version 1.0. 2003. URL http://courses.cs.vt.edu/cs5204/fall05-
kafura/Papers/Security/XACML-Specification.pdf.
[10] Wolfgang Apolinarski, Marcus Handte, and Pedro Jose Marron. Automating the Gener-
ation of Privacy Policies for Context-Sharing Applications. pages 73–80. IEEE, 2015-07.
ISBN 978-1-4673-6654-0. doi: 10.1109/IE.2015.18. URL http://ieeexplore.ieee.org/
document/7194273/.
[11] Paul Ashley, Satoshi Hada, Günter Karjoth, and Matthias Schunter. E-P3P privacy
policies and privacy authorization. In Proceedings of the 2002 ACM Workshop on Pri-
vacy in the Electronic Society, pages 103–109. ACM, 2002. URL http://dl.acm.org/
citation.cfm?id=644538.
[12] Paul Ashley, Satoshi Hada, Günter Karjoth, Calvin Powers, and Matthias Schunter.
Enterprise privacy authorization language (EPAL). 2003.
[13] Monir Azraoui, Kaoutar Elkhiyaoui, Melek Önen, Karin Bernsmed, Anderson Santana de Oliveira, and Jakub Sendor. A-PPL: An Accountability Policy Language. In Data
de Oliveira, and Jakub Sendor. A-PPL: An Accountability Policy Language. In Data
Privacy Management, Autonomous Spontaneous Security, and Security Assurance - 9th
International Workshop, DPM 2014, 7th International Workshop, SETOP 2014, and 3rd
International Workshop, QASA 2014, Wroclaw, Poland, September 10-11, 2014. Revised
Selected Papers, volume 8872 of Lecture Notes in Computer Science, pages 319–326, 2014.
[14] Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT Press, 2008.
ISBN 978-0-262-02649-9.
[15] Michael Bar-Sinai, Latanya Sweeney, and Merce Crosas. DataTags, data handling pol-
icy spaces and the tags language. In Security and Privacy Workshops (SPW), 2016
IEEE, pages 1–8. IEEE, 2016. URL http://ieeexplore.ieee.org/abstract/document/
7527746/.
[16] Adam Barth, Anupam Datta, John C. Mitchell, and Helen Nissenbaum. Privacy and
contextual integrity: Framework and applications. In 2006 IEEE Symposium on Security
and Privacy (S&P’06), pages 15–pp. IEEE, 2006. URL http://ieeexplore.ieee.org/
xpls/abs_all.jsp?arnumber=1624011.
[17] Moritz Y. Becker, Alexander Malkis, and Laurent Bussard. S4P: A generic language for
specifying privacy preferences and policies. 2010. URL http://www.msr-waypoint.com/
pubs/122108/main.pdf.
[18] Kathy Bohrer and Bobby Holland. Customer Profile Exchange (Cpexchange) Specifica-
tion. 2000. URL http://mail.ctiforum.com/standard/standard/www.cpexchange.org/
cpexchangev1_0F.pdf.
[19] Carolyn A. Brodie, Clare-Marie Karat, and John Karat. An empirical study of natu-
ral language parsing of privacy policy rules using the SPARCLE policy workbench. In
Proceedings of the Second Symposium on Usable Privacy and Security - SOUPS ’06,
page 8. ACM Press, 2006. ISBN 978-1-59593-448-2. doi: 10/b3tswp. URL http:
//portal.acm.org/citation.cfm?doid=1143120.1143123.
[20] Simon Byers, Lorrie Faith Cranor, Dave Kormann, and Patrick McDaniel. Searching for
Privacy: Design and Implementation of a P3P-Enabled Search Engine. In David Martin
and Andrei Serjantov, editors, Privacy Enhancing Technologies, volume 3424, pages 314–
328. Springer Berlin Heidelberg, 2005. ISBN 978-3-540-26203-9 978-3-540-31960-3. doi:
10.1007/11423409_20. URL http://link.springer.com/10.1007/11423409_20.
[21] F. H. Cate. The Limits of Notice and Choice. 8(2):59–62, 2010-03. ISSN 1540-7993. doi:
10/cgjkcd.
[22] Fred H Cate. The Failure of Fair Information Practice Principles. page 38, 2008.
[23] Alonzo Church. An unsolvable problem of elementary number theory. American Journal
of Mathematics, 58(2):345–363, 1936. ISSN 00029327, 10806377.
[24] William F. Clocksin and Christopher S. Mellish. Programming in Prolog (4. ed.). Springer,
1994. ISBN 978-3-540-58350-9.
[25] CNIL. The CNIL’s restricted committee imposes a financial penalty of 50 Million euros
against GOOGLE LLC, 2019-01-21. URL https://www.cnil.fr/en/cnils-restricted-
committee-imposes-financial-penalty-50-million-euros-against-google-llc.
[26] Elisa Costante, Yuanhao Sun, Milan Petković, and Jerry den Hartog. A machine learning
solution to assess privacy policy completeness (short paper). In Proceedings of the 2012
ACM Workshop on Privacy in the Electronic Society, pages 91–96. ACM, 2012. URL
http://dl.acm.org/citation.cfm?id=2381979.
[27] Lorrie Cranor, Marc Langheinrich, Massimo Marchiori, Martin Presler-Marshall, and
Joseph Reagle. The platform for privacy preferences 1.0 (P3P1. 0) specification. 16,
2002. URL https://elearn.inf.tu-dresden.de/hades/teleseminare/wise0405/Act.%
208%20Models%20Languages%20Pierangela/Materials/P3P.pdf.
[28] Lorrie Faith Cranor. Necessary but not sufficient: Standardized mechanisms for pri-
vacy notice and choice. 10:273, 2012. URL http://heinonline.org/hol-cgi-bin/
get_pdf.cgi?handle=hein.journals/jtelhtel10&section=22.
[29] CyLab Usable Privacy and Security Laboratory. Privacy Bird, 2019-03-26. URL http:
//www.privacybird.org/.
[30] Anupam Das, Martin Degeling, Daniel Smullen, and Norman Sadeh. Personalized Privacy
Assistants for the Internet of Things. 2018, 2018. doi: 10.1109/MPRV.2018.03367733.
[31] Martin Degeling, Christine Utz, Christopher Lentzsch, Henry Hosseini, Florian Schaub, and
Thorsten Holz. We Value Your Privacy ... Now Take Some Cookies: Measuring the GDPR’s
Impact on Web Privacy. 2019. doi: 10/gfxgxm. URL http://arxiv.org/abs/1808.05096.
[32] Daniel DelPercio. Privacy Policy Online (2011), 2019-03-26. URL http://
www.PrivacyPolicyOnline.com.
[33] Henry DeYoung, Deepak Garg, Limin Jia, Dilsun Kirli Kaynar, and Anupam Datta. Expe-
riences in the Logical Specification of the HIPAA and GLBA Privacy Laws. In Proceedings
of the 2010 ACM Workshop on Privacy in the Electronic Society, WPES 2010, Chicago,
Illinois, USA, October 4, 2010, pages 73–82, 2010.
[34] Disconnect. Privacy Icons, 2016-03-04. URL https://web.archive.org/web/
20160304013156/https://disconnect.me/icons.
[35] Docracy. An open source privacy policy for mobile apps, 2012-07-24. URL
https://web.archive.org/web/20171124185357/https://blog.docracy.com/post/
27931026976/an-open-source-privacy-policy-for-mobile-apps.
[36] Serge Egelman, Janice Tsai, Lorrie Faith Cranor, and Alessandro Acquisti. Timing is
everything?: The effects of timing and placement of online privacy indicators. In Proceed-
ings of the SIGCHI Conference on Human Factors in Computing Systems, pages 319–328.
ACM, 2009. URL http://dl.acm.org/citation.cfm?id=1518752.
[37] Serge Egelman, Raghudeep Kannavara, and Richard Chow. Is This Thing On?: Crowd-
sourcing Privacy Indicators for Ubiquitous Sensing Platforms. pages 1669–1678. ACM
Press, 2015. ISBN 978-1-4503-3145-6. doi: 10.1145/2702123.2702251. URL http:
//dl.acm.org/citation.cfm?doid=2702123.2702251.
[38] Pardis Emami-Naeini, Henry Dixon, Yuvraj Agarwal, and Lorrie Faith Cranor. Exploring
How Privacy and Security Factor into IoT Device Purchase Behavior. In Proceedings of the
2019 CHI Conference on Human Factors in Computing Systems - CHI ’19, pages 1–12.
ACM Press, 2019. ISBN 978-1-4503-5970-2. doi: 10/gf5d6v. URL http://dl.acm.org/
citation.cfm?doid=3290605.3300764.
[39] European Parliament. General Data Protection Regulation, 2016-04-26.
[40] Federal Trade Commission. FAIR INFORMATION PRACTICES IN THE ELECTRONIC
MARKETPLACE. page 208, 2000-06.
[41] Federal Trade Commission. Children’s Online Privacy Protection Rule; Fi-
nal Rule, 2013-01-17. URL https://www.ftc.gov/system/files/documents/
federal register notices/2013/01/2012-31341.pdf.
[42] Joan Feigenbaum, Michael J. Freedman, Tomas Sander, and Adam Shostack. Privacy en-
gineering for digital rights management systems. In Security and Privacy in Digital Rights
Management, ACM CCS-8 Workshop DRM 2001, Philadelphia, PA, USA, November 5,
2001, Revised Papers, volume 2320 of Lecture Notes in Computer Science, pages 76–105.
Springer, 2001. doi: 10.1007/3-540-47870-1\ 6. URL https://doi.org/10.1007/3-540-
47870-1 6.
[43] Forbrukerrådet. Deceived by Design, 2018-06-27. URL https://
fil.forbrukerradet.no/wp-content/uploads/2018/06/2018-06-27-deceived-by-
design-final.pdf.
[44] FreePrivacyPolicies.com. Free Privacy Policy Generator & Template with GDPR - Free
Privacy Policy, 2019-03-26. URL https://www.freeprivacypolicy.com/.
[45] Armin Gerl, Nadia Bennani, Harald Kosch, and Lionel Brunie. LPL, Towards a GDPR-
Compliant Privacy Language: Formal Definition and Usage. Trans. Large-Scale Data- and
Knowledge-Centered Systems, 37:41–80, 2018.
[46] GetTerms. Getterms.io, 2019-03-26. URL http://getterms.io/.
[47] Google. Android Permissions overview, 2019-03-26. URL https://
developer.android.com/guide/topics/permissions/overview.
[48] Colin M. Gray, Yubo Kou, Bryan Battles, Joseph Hoggatt, and Austin L. Toombs. The
Dark (Patterns) Side of UX Design. In Proceedings of the 2018 CHI Conference on Human
Factors in Computing Systems - CHI ’18, pages 1–14. ACM Press, 2018. ISBN 978-1-4503-
5620-6. doi: 10/gfxvpz. URL http://dl.acm.org/citation.cfm?doid=3173574.3174108.
[49] Margaret D. Hagan. User-Centered Privacy Communication Design. 2016. URL https:
//www.usenix.org/system/files/conference/soups2016/wfpn16-paper-hagan.pdf.
[50] Hamza Harkous, Kassem Fawaz, Rémi Lebret, Florian Schaub, Kang G. Shin, and Karl
Aberer. Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep
Learning. 2018-02-07. URL http://arxiv.org/abs/1802.02561.
[51] Manuel Hilty, Alexander Pretschner, David A. Basin, Christian Schaefer, and Thomas
Walter. A Policy Language for Distributed Usage Control. In Proceedings of the 12th
European Symposium On Research in Computer Security, ESORICS’07, volume 4734 of
Lecture Notes in Computer Science, pages 531–546. Springer, 2007. ISBN 978-3-540-74834-
2.
[52] Gerard J. Holzmann. The SPIN Model Checker - Primer and Reference Manual. Addison-
Wesley, 2004. ISBN 978-0-321-22862-8.
[53] Michael Huth and Mark Dermot Ryan. Logic in computer science - modelling and reasoning
about systems (2. ed.). Cambridge University Press, 2004.
[54] Iubenda. Features — Compliance Solutions, 2019-03-26. URL https://www.iubenda.com/
en/features.
[55] Iubenda. Terms of service, 2019-03-26. URL https://www.iubenda.com/en/user/tos.
[56] Johnson Iyilade and Julita Vassileva. P2U: A Privacy Policy Specification Language for
Secondary Data Sharing and Usage. pages 18–22. IEEE, 2014-05. ISBN 978-1-4799-5103-1.
doi: 10.1109/SPW.2014.12. URL http://ieeexplore.ieee.org/document/6957279/.
[57] Carlos Jensen and Colin Potts. Privacy policies as decision-making tools: An evalu-
ation of online privacy notices. In Proceedings of the SIGCHI Conference on Human
Factors in Computing Systems, pages 471–478. ACM, 2004. URL http://dl.acm.org/
citation.cfm?id=985752.
[58] Lalana Kagal. Rei. 2002. URL http://ebiquity.umbc.edu/get/a/publication/57.pdf.
[59] Saffija Kasem-Madani and Michael Meier. Security and privacy policy languages: A survey,
categorization and gap identification. 2015. URL https://arxiv.org/abs/1512.00201.
[60] Patrick Gage Kelley, Joanna Bresee, Lorrie Faith Cranor, and Robert W. Reeder. A
nutrition label for privacy. In Proceedings of the 5th Symposium on Usable Privacy and
Security, page 4. ACM, 2009. URL http://dl.acm.org/citation.cfm?id=1572538.
[61] Patrick Gage Kelley, Lucian Cesca, Joanna Bresee, and Lorrie Faith Cranor. Standardizing
privacy notices: An online study of the nutrition label approach. In Proceedings of the
SIGCHI Conference on Human Factors in Computing Systems, pages 1573–1582. ACM,
2010. URL http://dl.acm.org/citation.cfm?id=1753561.
[62] Jan Kolter and Günther Pernul. Generating User-Understandable Privacy Preferences.
pages 299–306. IEEE, 2009. ISBN 978-1-4244-3572-2. doi: 10.1109/ARES.2009.89. URL
http://ieeexplore.ieee.org/document/5066486/.
[63] Ulrich König and Jan Schallaboeck. Privacy preferences for E-Mail messages. 2012. URL
https://tools.ietf.org/html/koenig-privicons-03.txt.
[64] Marc Langheinrich, Lorrie Cranor, and Massimo Marchiori. Appel: A p3p preference
exchange language. 2002. URL https://www.w3.org/TR/P3P-preferences/.
[65] Daniel Le Métayer. A formal privacy management framework. In International Work-
shop on Formal Aspects in Security and Trust, pages 162–176. Springer, 2008. URL
http://link.springer.com/chapter/10.1007/978-3-642-01465-9_11.
[66] Andreas Matheus and J Herrmann. Geospatial Extensible Access Control Markup Lan-
guage (GeoXACML). Open Geospatial Consortium Inc. OGC, 2008.
[67] Michael J. May, Carl A. Gunter, and Insup Lee. Privacy APIs: Access Control Techniques
to Analyze and Verify Legal Privacy Policies. In Proceedings of the 19th IEEE Computer
Security Foundations Workshop, CSFW’06, pages 85–97. IEEE Computer Society, 2006.
ISBN 0-7695-2615-2.
[68] Aleecia M. McDonald and Lorrie Faith Cranor. The cost of reading privacy policies. 4:543,
2008. URL http://heinonline.org/hol-cgi-bin/get_pdf.cgi?handle=hein.journals/
isjlpsoc4&section=27.
[69] Matthias Mehldau. Icons of privacy (original), 2007. URL https://netzpolitik.org/
wp-upload/data-privacy-icons-v01.pdf.
[70] Daniela Yidan Miao. PrivacyInformer: An Automated Privacy Description Genera-
tor for the MIT App Inventor, 2014. URL http://citeseerx.ist.psu.edu/viewdoc/
download?doi=10.1.1.1029.2434&rep=rep1&type=pdf.
[71] V. Morel, M. Cunche, and D. Le Métayer. A generic information and consent framework
for the IoT. In Proceedings of the 18th IEEE International Conference On Trust, Security
And Privacy In Computing And Communications(TrustCom), pages 366–373, 2019. doi:
10.1109/TrustCom/BigDataSE.2019.00056.
[72] National Telecommunications and Information Administration. Short Form Notice Code
of Conduct to Promote Transparency in Mobile Apps Practices, 2013. URL https://
www.ntia.doc.gov/files/ntia/publications/july_25_code_draft.pdf.
[73] Helen Nissenbaum. Privacy as contextual integrity. 79:119, 2004. URL
http://heinonline.org/hol-cgi-bin/get_pdf.cgi?handle=hein.journals/
washlr79&section=16.
[74] Organisation for Economic Co-operation and Development. Skills matter: Further results
from the survey of adult skills, 2016. OCLC: ocn953634518.
[75] Raúl Pardo and Daniel Le Métayer. Analysis of privacy policies to enhance informed
consent. In Proceedings of the 33rd Annual IFIP WG 11.3 Conference on Data and Appli-
cations Security and Privacy, volume 11559 of Lecture Notes in Computer Science, pages
177–198, 2019. doi: 10.1007/978-3-030-22479-0\ 10.
[76] Jaehong Park and Ravi S. Sandhu. The UCON_ABC Usage Control Model. ACM Trans.
Inf. Syst. Secur., 7(1):128–174, 2004.
[77] Niklas Paul, Welderufael B. Tesfay, Dennis-Kenji Kipker, Mattea Stelter, and Sebastian
Pape. Assessing Privacy Policies of Internet of Things Services. In Lech Jan Janczewski and
Mirosław Kutyłowski, editors, ICT Systems Security and Privacy Protection, volume 529,
pages 156–169. Springer International Publishing, 2018. ISBN 978-3-319-99827-5 978-3-319-
99828-2. doi: 10.1007/978-3-319-99828-2_12. URL http://link.springer.com/10.1007/978-3-319-99828-2_12.
[78] Polisis. Chrome Polisis, 2019-03-26. URL https://chrome.google.com/webstore/
detail/polisis/bkddolgokpghlbhhkflbbhhjghjdojck.
[79] Polisis. Firefox Polisis, 2019-03-26. URL https://addons.mozilla.org/en-US/firefox/
addon/polisis/.
[80] Polisis. Polisis, 2019-03-26. URL https://www.pribot.org/polisis.
[81] Alexander Pretschner, Manuel Hilty, and David A. Basin. Distributed Usage Control.
Commun. ACM, 49(9):39–44, 2006.
[82] Privacy Policy Generator. Privacy Policy Generator, 2019-03-26. URL https://
privacypolicygenerator.info/.
[83] Privacy Tech. Privacy icons, 2018. URL https://www.privacytech.fr/privacy-icons/.
[84] PrivacyPolicies.com. Privacy Policy Generator: Free, GDPR, CalOPPA - PrivacyPoli-
cies.com, 2019-03-26. URL https://www.privacypolicies.com/.
[85] Aza Raskin. Making Privacy Policies not Suck, 2010. URL http://www.azarask.in/blog/
post/making-privacy-policies-not-suck/.
[86] Aza Raskin. Privacy Icons - MozillaWiki, 2011. URL https://wiki.mozilla.org/
Privacy Icons.
[87] Joel R. Reidenberg, Travis Breaux, Lorrie Faith Cranor, Brian French, Amanda Gran-
nis, James T. Graves, Fei Liu, Aleecia McDonald, Thomas B. Norton, and Rohan Ra-
manath. Disagreeable privacy policies: Mismatches between meaning and users’ under-
standing. 30:39, 2015. URL http://heinonline.org/hol-cgi-bin/get_pdf.cgi?handle=hein.journals/berktech30&section=6.
[88] Joel R. Reidenberg, Jaspreet Bhatia, Travis Breaux, and Thomas B. Norton. Automated
comparisons of ambiguity in privacy policies and the impact of regulation. 2016. URL
http://papers.ssrn.com/sol3/papers.cfm?abstract id=2715164.
[89] Joel R. Reidenberg, Jaspreet Bhatia, Travis D. Breaux, and Thomas B. Norton. Ambi-
guity in Privacy Policies and the Impact of Regulation. 45(S2):S163–S190, 2016-06. ISSN
0047-2530, 1537-5366. doi: 10/gdcdzm. URL https://www.journals.uchicago.edu/doi/
10.1086/688669.
[90] Arianna Rossi and Monica Palmirani. DaPIS: An Ontology-Based Data Pro-
tection Icon Set. pages 181–195, 2019. ISSN 0922-6389. doi: 10/gf7fbn.
URL http://www.medra.org/servlet/aliasResolver?alias=iospressISBN&isbn=978-
1-61499-984-3&spage=181&doi=10.3233/FAIA190020.
[91] Arianna Rossi, Rossana Ducato, Helena Haapio, and Stefania Passera. When Design Met
Law: Design Patterns for Information Transparency. page 43, 2019.
[92] Mary Rundle. International Data Protection and Digital Identity Management Tools,
presentation at IGF 2006. 2006.
[93] Norman Sadeh, Alessandro Acquisti, Travis D. Breaux, Lorrie Faith Cranor, Aleecia M.
McDonald, Joel R. Reidenberg, Noah A. Smith, Fei Liu, N. Cameron Russell, Florian
Schaub, et al. The usable privacy policy project, 2013. URL http://ra.adm.cs.cmu.edu/
anon/usr0/ftp/home/anon/isr2013/CMU-ISR-13-119.pdf.
[94] Ravi S. Sandhu, Edward J. Coyne, Hal L. Feinstein, and Charles E. Youman. Role-Based
Access Control Models. IEEE Computer, 29(2):38–47, 1996. ISSN 0018-9162.
[95] Florian Schaub, Rebecca Balebako, Adam L. Durity, and Lorrie Faith Cranor. A de-
sign space for effective privacy notices. In Eleventh Symposium On Usable Privacy and
Security (SOUPS 2015), pages 1–17, 2015. URL https://www.usenix.org/conference/
soups2015/proceedings/presentation/schaub.
[96] Daniel J. Solove. A taxonomy of privacy. 154:477, 2005. URL http://heinonline.org/
hol-cgi-bin/get_pdf.cgi?handle=hein.journals/pnlr154&section=20.
[97] State of California. Assembly Bill No. 375 California Consumer Privacy Act, 2018-06-28.
[98] Cass R. Sunstein. Choosing Not to Choose. 2014. ISSN 1556-5068. doi: 10/gftmr3. URL
http://www.ssrn.com/abstract=2377364.
[99] Latanya Sweeney, Mercè Crosas, and Michael Bar-Sinai. Sharing sensitive data with
confidence: The datatags system. 2015. URL http://techscience.org/a/2015101601/.
[100] Alasdair Taylor. Privacy policy, 2012-01-23. URL https://
seqlegal.com/free-legal-documents/privacy-policy.
[101] Terms of Service; Didn’t Read. Terms of Service Classification, 2019-03-26. URL https:
//tosdr.org/classification.html.
[102] Termsfeed. Generic Privacy Policy template, 2019-03-26. URL https://
www.termsfeed.com/assets/pdf/privacy-policy-template.pdf.
[103] Janice Y. Tsai, Serge Egelman, Lorrie Cranor, and Alessandro Acquisti. The Ef-
fect of Online Privacy Information on Purchasing Behavior: An Experimental Study.
22(2):254–268, 2011-06. ISSN 1047-7047, 1526-5536. doi: 10/cxhgzz. URL http:
//pubsonline.informs.org/doi/abs/10.1287/isre.1090.0260.
[104] United States Congress. Health Insurance Portability and Accountability Act, 1996. URL
https://www.hhs.gov/sites/default/files/privacysummary.pdf.
[105] United States Congress. Gramm—Leach—Bliley Act, 1999.
[106] Christine Utz, Martin Degeling, Sascha Fahl, Florian Schaub, and Thorsten Holz.
(Un)informed Consent: Studying GDPR Consent Notices in the Field. page 18, 2019.
[107] Jasper van de Ven and Frank Dylla. Qualitative Privacy Description Language. In An-
nual Privacy Forum, pages 171–189. Springer, 2016. URL http://link.springer.com/
chapter/10.1007/978-3-319-44760-5 11.
[108] Bibi Van den Berg and Simone Van der Hof. What happens to my data? A novel approach
to informing users of data processing practices. 2012. doi: 10.5210/fm.v17i7.4010. URL
https://papers.ssrn.com/sol3/papers.cfm?abstract id=2100417.
[109] Shomir Wilson, Florian Schaub, Aswarth Abhilash Dara, Frederick Liu, Sushain
Cherivirala, Pedro Giovanni Leon, Mads Schaarup Andersen, Sebastian Zimmeck, Kan-
thashree Mysore Sathyendra, N. Cameron Russell, et al. The creation and analysis of a
website privacy policy corpus. In Proceedings of the 54th Annual Meeting of the Association
for Computational Linguistics (ACL), 2016. URL http://www.aclweb.org/anthology/P/
P16/P16-1126.pdf.
[110] WP29. Opinion 8/2014 on the Recent Developments on the Internet of Things. 2014.
[111] WP29. Guidelines on transparency under Regulation 2016/679, 2017-12-15.
[112] Jean Yang, Kuat Yessenov, and Armando Solar-Lezama. A language for automatically en-
forcing privacy policies. page 85. ACM Press, 2012. ISBN 978-1-4503-1083-3. doi: 10.1145/
2103656.2103669. URL http://dl.acm.org/citation.cfm?doid=2103656.2103669.
[113] Le Yu, Tao Zhang, Xiapu Luo, and Lei Xue. AutoPPG: Towards Automatic Generation of
Privacy Policy for Android Applications. pages 39–50. ACM Press, 2015. ISBN 978-1-4503-
3819-6. doi: 10.1145/2808117.2808125. URL http://dl.acm.org/citation.cfm?doid=
2808117.2808125.
[114] Razieh Nokhbeh Zaeem, Rachel L. German, and K. Suzanne Barber. PrivacyCheck: Auto-
matic Summarization of Privacy Policies Using Data Mining. 18(4):1–18, 2016. ISSN 1533-
5399, 1557-6051. doi: 10.1145/3127519. URL https://dl.acm.org/doi/10.1145/3127519.
[115] Zero-knowledge. Privacy Rights Markup Language Specification. 2001.
[116] Sebastian Zimmeck and Steven M. Bellovin. Privee: An Architecture for Automati-
cally Analyzing Web Privacy Policies. USENIX Association, 2014. ISBN 978-1-931971-
15-7. URL https://www.usenix.org/system/files/conference/usenixsecurity14/
sec14-paper-zimmeck.pdf. OCLC: 254320948.
A Appendix
Table 4 provides a condensed overview of the content, benefits, limitations and tools that we have studied for each facet. The content entries are a coarse-grained presentation of the results in Figure 5; the remaining entries list the benefits, limitations and tools of each facet.
Natural Language. Content: captures all eight items (1st Party, 3rd Party, Legal basis, DS rights, Data retention, Data security, Policy change, Others). Benefits: legal value. Limitations: ambiguity, understandability, enforceability. Tools: templates, generators, retrievers, analysis tools.

Graphical. Content: captures five of the eight items. Benefits: understandability. Limitations: ambiguity, incompleteness. Tools: DS notification, visual comparison.

Machine Readable. Content: captures four of the eight items. Benefits: enforcement, auditability, correctness, automation. Limitations: understandability, lack of adoption. Tools: enforcement engines, formal semantics, policy comparison, analysis tools.

Table 4: Summary of each facet's content, benefits, limitations and tools. An item counts as captured when at least one work on the corresponding facet captures it.