Friday, February 15, 2008

Can a website identify a user based on IP address?

There is a public debate about whether IP addresses should be considered to be “personally-identifiable data” (to use the US phrase) or “personal data” (to use the European phrase. The question is: when can a person be identified by an IP address? This is a question of significant import, since it’s relevant to every single web site on the planet, and indeed to every single packet of data being transferred on the Internet architecture. I’ve blogged about this before, but the debate has evolved:

http://peterfleischer.blogspot.com/2007/02/are-ip-addresses-personal-data.html

Last year, the Article 29 Working Party of EU data protection authorities published an official Opinion on the concept of personal information which included a thorough analysis of what is meant by “identified or identifiable” person. The Opinion pointed out that someone is identifiable if it is possible to distinguish that person from others. The recitals that precede the EU data protection directive explain that to decide which pieces of information qualify as personal information, it is necessary to consider all the means likely reasonably to be used to identify the individual. As the Working Party put it, this means that a mere hypothetical possibility to single out an individual is not enough to consider that person as identifiable. Therefore, if taking into account all the means likely reasonably to be used, that possibility does not exist or is negligible, a person should not be considered as identifiable and the information would not be considered as personal data.

Two recent decisions from the Paris Appeals Court followed this logic. The Court concluded that 'the IP address doesn't allow the identification of the persons who used this computer since only the legitimate authority for investigation (the law enforcement authority) may obtain the user identity from the ISP' (27 April ruling). The Court recognized in the same decision that 'it should also be reminded that each computer connected to the Internet is identified by a unique number called "Internet address" or IP address (internet protocol) that allows to find it among connected computers or to find back the sender of a message'. In its 15 May ruling, the Court considered that 'this series of numbers indeed constitutes by no means an indirectly nominative data of the person in that it only relates to a machine, and not to the individual who is using the computer in order to commit counterfeit.' The Court conclusion was then that this collection of IP addresses does not constitute a processing of personal data, and consequently was not subject to CNIL prior authorization, as required by the French Data Protection Act. The CNIL has protested loudly that these court decisions are incorrect, but the CNIL’s own position of declaring “all” IP addresses to be personal data, regardless of context, seems to be incorrect to me.

Paris Appeal Court decision - Anthony G. vs. SCPP (27.04.2007)http://www.legalis.net/jurisprudence-decision.php3?id_article=1954
Paris Appeal Court decision - Henri S. vs. SCPP (15.05.2007)http://www.legalis.net/jurisprudence-decision.php3?id_article=1955
IP address is a personal data for all the European DPAs (2.08.2007)http://www.cnil.fr/index.php?id=2244


Let’s take Google as an example. Like all websites, Google servers capture the IP addresses of its visitors. If a user is using non-authenticated Google Search (i.e., not using a Google Account to log in), then Google collects the user’s IP Address along with the search query and the date and time of the query. Can Google determine the identity of the person using that IP Address only on the basis of that information? No. The IP Address may locate a single computer or it may locate a computer network using Network Address Translation. Where the IP Address locates a single computer, can Google identify the person using that computer? The answer is still “no”. The IP Address enables to send data to one specific computer, but it does not disclose which actual computer that is, let alone who owns it. In order to get to that granular of a level, it would be necessary for Google to ask the ISP that issued the IP Address for the identity of the person that was using that IP Address. Even then, the ISP can only identify the account holder, not the person who was actually using the computer at any given time.

Also, the ISP is prohibited under US law from giving Google that information, and there are similar legal prohibitions under European laws. Surely, illegal means are not “reasonable” means in the terms of the Directive.

So the reality is that like any other web site on the Internet that logs the IP Address of the computer used to access that site, the chances of Google being able to combine an IP Address with other information held by the ISP that issued that IP Address in order to identify anyone are indeed negligible.

However, let’s hypothesize for now that Google could ask the ISP for that information. Could the ISP give Google the identity of the person? Again, the answer is “hardly.” Why is it so difficult? First, an ISP can only link an IP Address to an account. That means that if there are multiple people, like a family, logging into the same account, only the account holder’s name is associated with the IP Address.

Second, ISP’s are given a finite number of IP Addresses to assign to their subscribers. At this point there are not enough IP Addresses to cover the number of users that wish to access the Internet. So, many ISPs have resorted to the use of dynamic IP Addresses. This means that a user could be assigned a different IP Address as often as every time they access the Internet. In order for the ISP to track the account that is connected to an IP Address, the ISP may require the actual date and time of use.

Finally, almost all big organizations have their own private network that sits behind a firewall. They may use static or dynamic IP addresses, but in either case these are not visible outside the organization. They are using Network Address Translation (NAT). NAT enable multiple hosts on a private network to access the Internet using a single IP Address. NAT is also a standard feature in routers for home and small office Internet connections.

So again, on the balance of probabilities and taking into account any factors identified by the Working Party as relevant, the most obvious conclusion is that the IP Addresses obtained by Google and other websites are not sufficiently significant or revealing to qualify as personal data from the point of view of the EU data protection directive.

Some people have raised the question whether the government/law enforcement can identify an individual user from an IP address from Google’s logs. Google on its own cannot tie any IP to any specific ISP account or any specific computer. We simply know that the IP address locates a computer that is accessing our system. We don’t know who is using that computer. So, in order for someone to tie the IP to an account holder, there have to be at least two subpoenas issued: one to Google and a separate one to the ISP.

Others have suggested that IP addresses should be considered “personal data”, on the mistaken understanding that looking up an IP address in a “whois” directory allows IP addresses to be tied to identifiable human beings. But in reality, if you look up an IP address in a whois directory, you usually get the name of the organization that manages the IP address. So, normally, Google could determine that a user’s queries come from a particular IP address owned by, say, Comcast, but Google has no way of knowing the name or organization of the human being behind the IP address.

A different question altogether is whether identifiability should equate to individualization. As discussed above, identifiability is about the likelihood of an individual being distinguished from others. But for this distinction to merit the protection afforded by privacy laws, it must be necessary to establish a link between the person and their right to privacy. For example, during the course of an online transaction between a retail web site and a customer, that customer’s identity will be protected by data privacy laws that impose obligations on the website operator (like seeking the customer’s consent for ancillary uses of customer information) and give rights to the individual (like allowing the customer to opt out of direct marketing). However, if someone who visits the web site for the first time (therefore prior to any transaction taking place) is presented with a local language version of the web site as a result of the geographical identifier associated to the IP Address used to access the site, there will be an element of individualization that does not involve identifying the person. In other words, unless and until that user becomes a registered customer, the web site operator will not be able to identify that individual. But the language appearing on the pages accessed by anyone using that IP Address may be different from the language presented to those using an IP Address associated with a different geographic location.

Should privacy laws apply in this situation? There is an obvious danger in trying to apply privacy laws as we understand them today in terms of notice, choice, access rights or data transfer limitations, to these types of cases. For example, there is no way that websites can provide consumers with a so-called right of access to IP-address-based logs, since such databases provide no way of authenticating a user. Individualization of Internet users is a logical and beneficial result of the way in which Internet technology works and sometimes it is also indispensable in order to comply with legal obligations such as presenting or blocking certain information in certain territories. Attempting to impose privacy requirements to situations that do not affect someone’s right to privacy will not only hamper technological development, but will entirely contradict the common sense principles on which privacy laws were founded. Privacy laws should be about protecting identifiable individuals and their information, not about undermining individualization. No doubt some people think that the cause of privacy is advanced, if data protection is extended to ever-broader categories of numerical locators like IP addresses. But let’s think hard about when these numbers can identify someone, and when they can’t. Black and white slogans are usually wrong. The real world is more complicated than that.