How to fight browser fingerprinting?
https://panopticlick.eff.org/, aka "How unique and trackable is your browser". For example, it usually reports my browser as unique. The biggest entropy values come from navigator.plugins and from fonts enumerated via Java and Flash, but the linked PDF also points out that disabling these common plugins actually just adds to the uniqueness, as does simply altering the User Agent. Font detection also seems possible via CSS introspection.
What steps can one take, and what technology options exist, to counter fingerprinting of one's browser?
I think I should link the 2010 evercookie. `Its goal is to identify a client even after they've removed standard cookies, Flash cookies (Local Shared Objects or LSOs), and others.`
The easiest way to minimize tracking through things like panopticlick ... is to use something like NoScript.
- the untrue information you would need to send along changes and yours doesn't, making you unique -- and suspicious;
- the detection techniques change, and you aren't aware of it, so you become unique again;
or having a really awkward navigation.
Assuming that you can use Tor or a VPN or an openshell anywhere to tunnel away your IP address, the "safest" practice in my opinion would be to fire up a virtual machine, install a stock Windows Seven on it, and use that for any privacy-sensitive operation. Do not install anything unusual on the machine, and it will truthfully report to be a stock Windows Seven machine, one among a horde of similar machines.
You have also the advantage of the machine being insulated inside your true system, and you being able to snapshot/reinstall it in a flash. Which you can do every now and then - the "you" that did all the navigation before disappears and a fresh "you" appears, with a clean history.
This can be very useful, in that you could keep a "clean" snapshot and always restore it before sensitive operations such as home banking. Some VM's also allow 'sandboxing', i.e., nothing done in the VM will actually permanently change its contents -- all system changes, malware downloaded, virus installed, keyloggers injected, disappear as soon as the virtual machine is powered down.
In my opinion, not only would the total amount of work be the same (or even more), but it would be a much more complicated and less stable kind of work.
Install the most common OS, keep to the bundled browser and software, resist the temptation of pimping it, and what's to tell that machine apart from literally hundreds of thousands of similar just-installed, never-maintained, computers-are-not-my-thing machines on the Internet?
Update - browsing behaviour and side channels
Now I have installed a virtual Windows 7 machine, even upgraded it to Windows 10 as Joe Q. Average would do. I'm not using Tor or VPN; all that an external site can see is that I'm connecting from Florence, Italy. There are thirty thousand connections exactly like mine. Even knowing my provider, that still leaves around nine thousand candidates. Is this sufficiently anonymous?
It turns out not to be the case. There might still be correlations that could be investigated, having sufficient access. For example I'm playing an online game and my typing is sent straight away (character buffered, not line buffered). It becomes possible to fingerprint digram and trigram delays, and with a sufficiently large corpus, establish that online user A is the same person as online user B (within the same online game, of course). The same problem could happen elsewhere.
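A minimal sketch of how such a typing fingerprint could be compared, assuming the observer has a list of (character, timestamp) pairs per session. The function names and the two synthetic "typists" are made up for illustration; a real attacker would need a far larger corpus and a proper statistical test.

```python
from collections import defaultdict
from statistics import mean

def digram_profile(events):
    """events: list of (char, timestamp_seconds) in typing order.
    Returns {digram: mean delay between its two keystrokes}."""
    delays = defaultdict(list)
    for (c1, t1), (c2, t2) in zip(events, events[1:]):
        delays[c1 + c2].append(t2 - t1)
    return {digram: mean(values) for digram, values in delays.items()}

def profile_distance(p, q):
    """Mean absolute delay difference over the digrams both profiles share."""
    shared = p.keys() & q.keys()
    if not shared:
        return float("inf")
    return sum(abs(p[d] - q[d]) for d in shared) / len(shared)

def make_events(text, delay_for):
    """Build synthetic keystroke events; delay_for(digram) is a hypothetical typist model."""
    t, prev, events = 0.0, None, []
    for ch in text:
        if prev is not None:
            t += delay_for(prev + ch)
        events.append((ch, t))
        prev = ch
    return events

# Two invented typists with different rhythms (assumptions, not measured data).
def typist_a(digram):
    return 0.10 + 0.01 * (ord(digram[0]) % 5)

def typist_b(digram):
    return 0.20 + 0.01 * (ord(digram[1]) % 3)

user_a = digram_profile(make_events("the quick brown fox", typist_a))
user_b = digram_profile(make_events("the lazy dog then the fox", typist_a))
user_c = digram_profile(make_events("the lazy dog then the fox", typist_b))

same_person = profile_distance(user_a, user_b)   # same rhythm, different text
other_person = profile_distance(user_a, user_c)  # different rhythm
```

In this toy setup `same_person` comes out far smaller than `other_person`, which is the whole point: the text typed differs, the rhythm does not.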
When I surf the Internet, I tend to always hit the same sites in the same order. And of course I hit my "personal pages" on several sites, e.g. Stack Overflow, regularly. A bespoke distribution of images is already in my browser and is not downloaded at all or is bypassed with an `If-None-Match` request. This combination of habit and browser helpfulness also constitutes a signature.
Given the wealth of tagging methods available to websites, it's not safe to assume that only cookies and passive data may have been collected. A site might for example advertise the need to install a font called `Tracking-ff0a7a.otf`, and the browser would download it dutifully. This file would not necessarily be deleted upon cache clearing, and on subsequent visits the fact that it is not re-downloaded would be proof that I've already visited the site. The font need not be the same for all users, but could contain a unique combination of glyphs (e.g. the character "1" could contain a "d", "2" could contain an "e", "4" could contain a "d" again - or this could be done with rarely used font code points), and HTML5 can be used to draw the glyph string "12345678" on an invisible canvas and upload the result as an image. The image would then spell the hex sequence, unique to me, 'deadbeef'. And this is, to all intents and purposes, a cookie.
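To make the glyph trick concrete, here is a toy model of it. No real font or canvas is involved: `make_glyph_map` stands in for generating the per-visitor font and `render` stands in for drawing on the invisible canvas. Both names are invented for this sketch.

```python
import secrets

PROBE = "12345678"  # the string the page would draw on the hidden canvas

def make_glyph_map(visitor_id):
    """Per-visitor 'font': each probe character gets the shape of one hex digit of the ID."""
    if len(visitor_id) != len(PROBE):
        raise ValueError("visitor_id needs one hex digit per probe character")
    return dict(zip(PROBE, visitor_id))

def render(text, glyph_map):
    """Stand-in for canvas rendering: what a viewer of the image would actually read."""
    return "".join(glyph_map.get(ch, ch) for ch in text)

# First visit: the site would mint an ID and serve the matching font.
fresh_id = secrets.token_hex(4)          # 8 random hex digits
fresh_font = make_glyph_map(fresh_id)

# The example from the text: "1" looks like "d", "2" like "e", "4" like "d" again.
glyphs = make_glyph_map("deadbeef")
recovered = render(PROBE, glyphs)        # what the uploaded image spells
```

As long as the font file survives in the browser, every later rendering of `PROBE` hands the same ID back to the site.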
To fight this, I may need to:
- completely re-snapshot the VM after each browsing session (and reset the modem when I do). Keeping always the same VM wouldn't be enough.
- use several different virtual machines, or browsers, as well as well-known proxy services or Tor (it wouldn't do for me to use a proxy that's unique to me, or for which I'm the only Florence user, for anonymity purposes).
- routinely empty and/or sanitize the browser cache and remember not to always open, say, XKCD immediately after Questionable Content.
- adopt two or more different "personas" for those services I want anonymity in, and those I don't care about, and take care to keep them apart in separate VMs - it only takes one slip, logging in to one while believing it's the other, for a permanent link to possibly be established by a savvy enough external agency.
This is a very ingenious method of "fingerprinting" a system, which basically consists of turning off the audio system, then virtually playing a sound through HTML5 and analysing the result. The result will depend on the underlying audio system, which is tied to the hardware and not even to the browser. Of course, there are ways of fighting this by injecting random noise with a plugin, but this very act might turn out to be damning, since you might well be the only one using such a plugin (and therefore having a noisy audio channel) among the pool matching your "average" configuration.
A better way that I found working, but you'll have to experiment and verify this for yourself, is to use two different VM engines or configurations. For example I have two different fingerprints on the same Dell Precision M6800 in two different VMs using 4 and 8 GB of RAM and two different ICH7 AC97 Audio settings (I suspect that the additional RAM available makes the driver employ different sampling strategies, which in turn yield slightly different fingerprints. I discovered this totally by accident by the way). I assume that I could set up a third VM using VMware and/or maybe a fourth with sound disabled.
What I do not know, though, is whether the fingerprints I did manage to get are revealing the fact it's a VM (do all Windows 7 VirtualBoxen have `50b1f43716da3103fd74137f205420a8c0a845ec` as hash of full buffer? Do all M6800's?)
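For intuition on why a "hash of full buffer" is so sensitive, here is a rough sketch. It does not use the Web Audio API at all: `render_buffer` is a hypothetical stand-in for whatever the audio stack produces, and SHA-1 is only my assumption, picked because it yields 40 hex digits like the value above.

```python
import hashlib
import math
import struct

def render_buffer(n_samples=1000, gain=1.0):
    """Stand-in for an audio stack rendering a 440 Hz tone at 44.1 kHz."""
    return [gain * math.sin(2 * math.pi * 440 * i / 44100) for i in range(n_samples)]

def buffer_hash(samples):
    """Hash the raw little-endian float64 bytes of the whole buffer."""
    raw = struct.pack(f"<{len(samples)}d", *samples)
    return hashlib.sha1(raw).hexdigest()

stack_a = buffer_hash(render_buffer())
stack_a_again = buffer_hash(render_buffer())            # same stack, same result
stack_b = buffer_hash(render_buffer(gain=1.0 + 1e-12))  # inaudibly different stack
```

The same stack always yields the same hash, while a difference far below anything audible yields a completely different one. That would be consistent with two VMs on one laptop getting two fingerprints, but it says nothing about whether a given hash is shared by every similar VM.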
All of the above goes to show that I had better have a good reason to want anonymity, because achieving it reliably is going to be a royal pain in the rear end.
`and what's to tell that machine from literally hundreds of thousands of similar just-installed, never-maintained, computers-are-not-my-thing machines on the Internet?` Easy: the AudioContext fingerprinting hash.
@forest that would be an instance of browsing behaviour pseudo-cookie. You would need several VMs, one for each group of activities you needed separated.
AudioContext fingerprinting (often) even works within VMs, since it's a hardware thing.
@Guildenstern Not saying that anonymity is worthless or you shouldn't pursue it, mind you! But I won't say it'll be easy (I mean, can I? Can you?), and well, let's face it, it all boils down to economics. If you want anonymity (or anything else), then you had better be ready to pay the price. If you aren't, you won't be getting it, or not completely -- and the resources you *do* spend for whatever result you get will no longer be available for something else.
During fingerprinting, the following are taken:
- IP address / subnet / country / region
- User Agent
- Other HTTP headers
Now, for IP address, you can use:
- A public VPN service (e.g. one based on OpenVPN) that has many other users and nodes across multiple networks
- Online anonymous proxy
- Your own proxy server, which sets the User-Agent and some common headers
- A browser plugin which modifies headers
- A browser plugin which runs the browser on a remote anonymity service
BUT, you will always be fingerprinted and identified in some way. That is because you normally connect from the same subnet, and you use the same browser. Avoiding these two is really hard, because you would need VPN servers on an endless number of networks, as well as a User-Agent generator and header obfuscation that set different values each day, or each time you start them.
Personally, I use a free Amazon EC2 micro instance running a free OpenVPN server. It stops and restarts itself every day in a different region (it boots a new server on its own, with a script that sets it up through the AWS API) and updates DNS via the Route 53 API. It uses Squid as a proxy, with many rules to block advertising, tracking and some other things. It also has a full BGP table, as these VPN servers are working in the network, but you don't need this if you are not making a cover-up.
You can also make an EC2 instance change its IP address without actually rebooting it. You can use the AWS API to release the Elastic IP, allocate a new one, and associate it with the instance. If you change the User-Agent at the same time, you will avoid fingerprinting.
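The release/allocate/associate step could look roughly like this with boto3. This is a sketch, not the script I actually run: the function name `rotate_elastic_ip` is made up, it assumes a VPC instance with at most one Elastic IP attached, and `ec2` is expected to be a `boto3.client("ec2")` with permission to manage addresses.

```python
def rotate_elastic_ip(ec2, instance_id):
    """Replace the Elastic IP attached to `instance_id` with a freshly allocated one.

    `ec2` should behave like boto3.client("ec2"). Returns the new public IP.
    The instance is briefly without a public address between the two steps.
    """
    # Look up the Elastic IP(s) currently associated with the instance.
    current = ec2.describe_addresses(
        Filters=[{"Name": "instance-id", "Values": [instance_id]}]
    )["Addresses"]

    # Detach and give back the old address(es).
    for address in current:
        if "AssociationId" in address:
            ec2.disassociate_address(AssociationId=address["AssociationId"])
        ec2.release_address(AllocationId=address["AllocationId"])

    # Allocate a new address and attach it; the instance keeps running throughout.
    new = ec2.allocate_address(Domain="vpc")
    ec2.associate_address(InstanceId=instance_id, AllocationId=new["AllocationId"])
    return new["PublicIp"]

# Usage (needs AWS credentials; the region and instance ID are placeholders):
#   import boto3
#   print(rotate_elastic_ip(boto3.client("ec2", region_name="eu-west-1"), "i-0123456789abcdef0"))
```

Note that you only get whatever address AWS hands out next, and it will still sit in a published Amazon range.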
See this for example:
And indeed, with a different IP address and user agent every day, Google can't recognize me, so I see ads which are not "tracking me".
P.S. When you add a new instance, you need at least 20 GB of EBS root volume. Then you just run `yum install squid` or `apt-get install squid`, configure it through /etc/, and there you go. If you want to change the IP address, just change the Elastic IP on it. This works both ways: for how you reach the new instance and for how you get out to the Internet.
And then you will need only this:
Don't forget to update the instance once in a while.
I could maybe produce an AMI image that would work automatically: you simply launch the instance, it sets up OpenVPN, and through the AWS API it configures IPs, DNS, Squid, etc. Maybe there are some such AMIs at Amazon already.
Some websites will complain that you are connecting over a VPN, for example some sites that serve music or video. Most sites work fine; there are just a few exceptions.
@HendrikBrummermann Ha ha ha. You need to think like an analyst: if you don't keep the same IP address, the same subnet or the same user agent, then in practice it's impossible to track you across all the channels in use today (ads, network). It's practically proven to work very well.
@Andrew Smith: Your premise is completely off. You mentioned 3 pieces of information out of at least 9 that can be used to identify you. The remaining 6 WILL uniquely identify you. Regardless of how "anonymous" you appear to be, fingerprinting you proves that you actually are completely and uniquely identifiable and trackable. Specifically by looking at your system fonts and browser plug-ins installed which are highly likely to be unique, especially among the technically adept.
The so-called 'canvas fingerprinting', for example, will work well as long as JS is enabled, despite all other measures proposed in this question.
The technical requirement for avoiding any form of browser fingerprinting is that all potentially identifying characteristics returned by the browser are randomized, so that the probability of the browser returning a certain set of characteristics approximates the probability of observing that set of characteristics in the browser population within which we want to maintain anonymity.
Breaking this down, suppose we are only dealing with a single characteristic (say the User-Agent header) and we want to ensure we remain anonymous. For illustrative purposes, let's assume that there are 50 possible values for User-Agent, numbered 1 to 50.
One possible strategy is to use the most commonly occurring value of User-Agent, as observed according to the browser population. Under certain circumstances, this could be a poor strategy, as illustrated by the following figure:
According to the figure, the most commonly occurring User-Agent value in the browser population is only slightly more likely to occur than the remaining values. If we simply used the most common value (25), a fingerprinting service could (consistently) infer that we belong to the (relatively small) subset of 2.4% of browsers in the population. Thus, we have compromised our anonymity.
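To put a number on that, a quick back-of-the-envelope calculation. The 2.4% share is the figure quoted above; the population of one million browsers is purely an assumption for illustration.

```python
import math

def surprisal_bits(share):
    """Bits of identifying information revealed by a value that `share` of browsers report."""
    return -math.log2(share)

def anonymity_set(share, population):
    """Expected number of browsers in the population reporting the same value."""
    return share * population

bits = surprisal_bits(0.024)              # the most common User-Agent value
crowd = anonymity_set(0.024, 1_000_000)   # hypothetical population size
```

Here `bits` is a little over 5 and `crowd` is about 24,000: even the "most common" value narrows us down to roughly one browser in 42 from a single header.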
Note that simply generating values at random without regard to the browser population can severely compromise anonymity, as illustrated by the following figure:
According to the figure, the values of User-Agent returned by our browser are (uniformly) randomised. However, since the values never occur in the population, a (moderately sophisticated) fingerprinting service in this case could consistently infer that our browser type is rare--this is itself a potentially identifying characteristic! Thus, anonymity is compromised.
The only strategy which guarantees that anonymity is maintained is to match the distribution of values that we observe in the browser population, illustrated in the following figure:
By ensuring that values of User-Agent returned by our browser are randomized so that the probability of each value matches the probability of observing that value in the browser population, we never commit ourselves to identifying with a particular subset of the browser population, while ensuring that our browser doesn't use any rarely occurring User-Agent version. In this way, anonymity is maintained.
In the real-world scenario, where there are many potentially identifying browser characteristics, to maintain anonymity note that it is not sufficient to ensure individually for each characteristic that the probability of each value matches the population probability: we may risk returning rare combinations of values, thus compromising our anonymity; this may ultimately result in our browser being uniquely identifiable! Rather, it is necessary to apply the above considerations to the joint distribution of browser characteristics in the browser population, i.e. the probability of observing values for sets of browser characteristics.
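The difference between matching each characteristic separately and matching the joint distribution can be seen in a small simulation. The population below (two User-Agent values, two time zones, made-up counts) is entirely hypothetical.

```python
import random

# Hypothetical observed population of (user_agent, timezone) pairs.
POPULATION = (
    [("UA-1", "UTC+1")] * 50
    + [("UA-2", "UTC-5")] * 40
    + [("UA-1", "UTC-5")] * 5
    + [("UA-2", "UTC+1")] * 5
)
# Combinations that only 10% of the population exhibits.
RARE = {("UA-1", "UTC-5"), ("UA-2", "UTC+1")}

def sample_joint(rng):
    """Draw a whole profile, so combinations occur as often as in the population."""
    return rng.choice(POPULATION)

def sample_per_characteristic(rng):
    """Draw each characteristic on its own from its marginal distribution."""
    user_agent = rng.choice(POPULATION)[0]
    timezone = rng.choice(POPULATION)[1]
    return (user_agent, timezone)

def rare_fraction(sampler, draws=10_000, seed=0):
    """Share of sampled profiles that land on a rare combination."""
    rng = random.Random(seed)
    return sum(sampler(rng) in RARE for _ in range(draws)) / draws

joint_rare = rare_fraction(sample_joint)
marginal_rare = rare_fraction(sample_per_characteristic)
```

Joint sampling hits a rare combination about 10% of the time, as the population does. Per-characteristic sampling reproduces each marginal exactly, yet produces rare combinations about half the time (2 × 0.55 × 0.45), which is precisely the risk described above.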
Randomizing browser details is 1) not possible for the majority of fingerprinting vectors and 2) incredibly bad for your anonymity set. It literally brings your anonymity set down to one person: you. You will _always_ identify with a particular subset of the browser population, and it is better that it is a large subset than one so small that you are the only member. After all, randomness _itself_ is an identifying characteristic. **There is so much wrong with this answer.**
Today's fingerprinting mostly relies on generating as much entropy as possible by exploiting as many details of the browser as it can. Even subtle but short-term-stable details, like the exact (hardware and driver version dependent) rendering on a canvas, are exploited. The information gathered is most likely compressed into a sufficiently long hash value depending on all the details, e.g. by concatenating all details into a long string, or by actually drawing something that depends on the details onto a canvas, and then applying a hash function to it.
However, as long as the information is compressed and not stored in full, fingerprinting can be defended against by frontal attack: instead of trying to make the fingerprint as non-unique as possible, one can just try to make it as unique as possible, e.g. unique per connection.
This can easily be achieved by tampering with details that it would be very bad practice for a web page to rely on:
- most insignificant digits of version numbers
- hiding rare fonts
- plugin detail description strings
To circumvent this attack, the fingerprint would have to be fuzzy enough to work around the random parts, which means it cannot hash the data as a whole, and a lot of information would have to be transferred and stored.
Circumventing the attack by masking out the random parts would mean using only details that web pages also strongly rely on, which would require permanent adaptation of the fingerprinting algorithm. The number of necessarily non-random details will be somewhat limited by standardisation. However, it may still allow unique identification if selected properly.
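A small sketch of both sides of this argument. The attribute names and values are invented, and SHA-256 over a joined string is just one plausible way a tracker might compress the data, not a description of any real fingerprinting script.

```python
import hashlib
import random

def fingerprint(attrs):
    """Tracker side: hash all reported details as one string."""
    joined = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(joined.encode()).hexdigest()

def masked_fingerprint(attrs, stable_keys):
    """Tracker's counter-move: hash only details believed not to be randomised."""
    return fingerprint({key: attrs[key] for key in stable_keys})

def randomised(attrs, patch):
    """Browser side: replace the least significant version component for this connection."""
    out = dict(attrs)
    prefix, _, _ = out["browser_version"].rpartition(".")
    out["browser_version"] = f"{prefix}.{patch}"
    return out

# Hypothetical browser details.
BASE = {"browser_version": "99.0.4844", "plugin_desc": "PDF Viewer 1.2", "language": "en-US"}

# Two distinct per-connection values (sample() never repeats a value).
patch_1, patch_2 = random.Random(0).sample(range(10_000), 2)
conn_1 = randomised(BASE, patch_1)
conn_2 = randomised(BASE, patch_2)

whole_1, whole_2 = fingerprint(conn_1), fingerprint(conn_2)
stable_1 = masked_fingerprint(conn_1, ["language", "plugin_desc"])
stable_2 = masked_fingerprint(conn_2, ["language", "plugin_desc"])
```

The whole-data hashes differ on every connection, so a tracker storing only the hash cannot link them. The masked hashes are identical, which is the tracker's way around it, at the cost of having to know (and keep tracking) which details are still trustworthy.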
The panopticlick site has "self defense" recommendations to avoid tracking.
One of the recommendations is to use the torbutton. The torbrowser design docs have a good description of how they try to avoid browser fingerprinting and tracking.
All of these approaches are trying to normalize your profile to look like as many other people as possible.
P.S : didn’t notice the question was old till late.
I may not be technically qualified to answer this, but since the OP asks for
I can help with what I know.
The EFF have an article describing some tools here.
There are lots of explanations in the other answers, so I will only list tools (with at most a short description).
The tools are for Chromium; Firefox users are welcome to edit this answer to add their plugins (though most of these are available for Firefox too).
- HTTPS everywhere (web store) or official page
- Privacy Badger which blocks known trackers
- best ad blocker for chromium
- ZenMate VPN services, which also provide a free extension (to hide your real IP and location) or here
- Ghostery another tracking blocking plugin
- ZenMate Web Firewall (Free, Plus Ad Blocker) another zenmate product
Now you only have to deal with the browser fingerprint. To do so you may use the following combination of tools:
- Canvas Defender: this plugin hides your canvas fingerprint by adding a random hash to the canvas fingerprint of your browser at each new session or at your request. This gives the tracker a false fingerprint instead of blocking it (blocking it would make you stand out from the crowd).
- User-Agent Switcher for Chrome: use this tool together with Canvas Defender to blend further into the crowd. This plugin lets you change your user agent, so if you use Linux or a rarely used browser you can hide that.
- Flashcontrol: as the name says, this one blocks any Flash content unless you allow it. Outdated Flash can be used to track you or, worse, to hack you.
- Disable WebGL: this tool helps you hide your real WebGL fingerprint (it provides the tracker with a 0000000000 fingerprint, which will however make you stand out). The creator did not write the tool to hide, but simply to block WebGL features because they caused his browser to crash.
One last thing: using these tools may make your browser unusable at first, so you will have to spend some time tweaking them to best meet your goals.
The best answer to actually fight browser fingerprinting is to redesign the web, such that the job of a web server is strictly limited to offer content. Every other aspect, such as how to display the page (which includes time zone, canvas, audio stuff, geolocation, available fonts, etc) is to be determined locally by the client's settings.
The solution above requires strict separation between "content" and "format" --- something that we are extremely far away from in today's web, which is horrible at so many levels. One level is the fingerprinting that we are talking about, but another level is performance (the web would also be much faster if we achieved such separation, because "format" can be set locally, precompiled for all websites and re-used per "content" class).
A valid question is: is this content-format separation realistic? I argue "definitely yes, it is". The reason is that, if we survey the websites in existence today, we will notice that the content "types" do not increase linearly as we survey a new website. It rather follows a logistic curve.
If we survey websites, we will end up with content classes such as: header, footer, navbar, body, section, subsection, ..., related topics, comments, chat, images, audio, video.
A "website" made with that content-format separation, will only offer the content information, and cite resources (e.g. images, audio, video, ...) as needed. Then, the local client, will choose to render in however method seems applicable as per user's setting. E.g. placement of content types is based on local format definitions by the user, which also includes the cited resource placement policy (e.g. where should a cited image appear? in a sense it will be like how LaTeX works, except much more strict content-format separation).
But the W3C organisation has a systematic bias where they are constantly pushed to approve bad practices for the sake of allowing large organisations to track humans. There is absolutely no technical justification on why the web is so terrible at privacy. The only justification is that corporates and governments seek means to expose us.
So, a more fundamental answer is to ignore W3C, and leave it to an open source project with good leadership, such as that of Linus Torvalds, that strongly enforces the strict content-format separation to maintain users' liberty.
Of course, initially it will be hard, as most websites are following W3C's bad protocols and standards. Therefore, initially we will need to have lots of adapters that translate the "popular but bad W3C web" into "good web with strict content-format separation". Using such adapters, one can form public application-layer proxies that translate back and forth between those 2 worlds (the evil W3C world, and the good privacy-honouring protocol world). This way, we can simply use our local clients for the good protocol to browse evil sites, by connecting to such application-layer proxies.
It's a tough war between people who want to track us, and people who want to remain free. It's not going to be easy. But we need to start this sooner or later. W3C is not listening to us. We are many, but not united. They are a few, but united and richer. But if we get united, we will enforce the right order, and W3C will start to respect us to not totally lose its market share. Needs a lot of diligence.
Companies use fingerprinting partly to combat click fraud. AudioContext is a very strong fingerprinting technique, but is not widely used yet. You'd currently need to build a browser from source to get rid of it.
If you dislike fingerprinting, one of the things you can do is use an iOS device (in private mode), since iPhones all tend to look similar and Apple has built in some privacy protection.
Google is on the other side of things: they are researching how to use walking patterns to identify users for advertising. Google Analytics is the most pervasive script & cookie (see the other answers), so I try to avoid Google as much as possible (use Startpage.com).
If you are concerned about privacy, vote with your feet. Be especially wary of news and media outlets that use fingerprinting to tune content to your browsing behavior: freedom (as in democracy) is at stake.
There should be a W3 institute that helps both sides. Reduce click fraud (bots) and protect privacy.
How is AudioContext fingerprinting not widely used? It became well-known when it was revealed that it was used to fingerprint millions of people by major ad companies... Also, I believe Stack Exchange only uses that specific domain for jQuery, not for tracking. However, it does use _other_ tracking URLs from Google.
Here is my two cents to this:
- The easiest change for me was to use another browser, such as Librefox (a Firefox modification). This browser changes many fingerprint variables, including the user agent.
- The next is a project that creates a local proxy chain:
`Application <--port 3128--> Squid <--port 8118--> Privoxy <----> Internet`. It is a macOS project, but it can be used on any OS by adapting the installation script. It does not change the HTTPS headers yet.
from the author:
macOS-Fortress: Firewall, Blackhole, and Privatizing Proxy for Trackers, Attackers, Malware, Adware, and Spammers; with On-Demand and On-Access Anti-Virus Scanning
- This link shows many configuration values for Firefox privacy properties (can be used with Librefox)
I like @Andrew Smith's solution above,
adding a ssh tunnel/proxy/vpn to free instances in a cloud that self-terminate and self-build over and over.