Can someone using Wireshark obtain the full URL if my program uses HTTPS?
While perusing the contents of pcap files I've noticed some URLs appear to be visible despite being HTTPS. These mainly occur inside payloads that contain cert URLs too, but I also see HTTPS URLs inside what appear to be HTTP payloads.
Can someone say conclusively whether HTTPS URLs are truly kept secret?
I'm concerned about this because I want to put some parameters in the URL and I don't want these to be easily uncovered.
It sounds like you are looking at plain HTTP traffic which happens to contain hyperlinks to HTTPS pages. The fact that a hyperlink leads to an HTTPS page when followed doesn't magically encrypt the page that links to it.
The answer is simple: is the person running Wireshark an administrator of the computer that is running your program? If (and only if) yes, then yes.
@Izam: sounds plausible. So a partial answer to "are HTTPS URLs truly kept secret?" is "not if you publish those URLs on the internet" ;-)
So to summarize all the answers: the request URL (other than the hostname) is protected exactly as well as the rest of the data, except that it may be stored in browser history. Traffic analysis could still let attackers figure out what someone is doing on your site. Sensitive info in form parameters sent with GET requests is fine over the wire as long as SSL isn't defeated (several answers go off on that useful tangent), but should perhaps be avoided in ways that will end up in a user's browser history.
@Icann although not related to your specific question, it's important to remember that the other place HTTPS URLs can end up is in log files server-side (maybe even multiple logs, e.g. stunnel>haproxy>nginx>appserver). That's probably a good enough reason in itself to keep "secret" stuff out of the URL path and query string, regardless of whether it's HTTP or HTTPS.
With HTTPS the path and query string of the URL are encrypted, while the hostname is visible in plain text inside the SSL handshake if the client uses Server Name Indication (SNI). All modern clients use SNI, because it is the only way to host different sites with their own certificates behind the same IP address.
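You can confirm this without a packet capture. A small sketch (Python standard library only) drives a TLS handshake against an in-memory BIO and shows the hostname appearing as plain bytes in the ClientHello; `secret-host.example.com` is just a placeholder:

```python
# Sketch: the SNI hostname is sent as plain ASCII inside the ClientHello,
# before any encryption has been negotiated. No network access is needed
# because we drive the handshake against in-memory BIOs.
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False      # no real server, so skip verification
ctx.verify_mode = ssl.CERT_NONE

incoming = ssl.MemoryBIO()
outgoing = ssl.MemoryBIO()
tls = ctx.wrap_bio(incoming, outgoing, server_hostname="secret-host.example.com")

try:
    tls.do_handshake()          # cannot complete: there is no server
except ssl.SSLWantReadError:
    pass                        # but the ClientHello was already written out

client_hello = outgoing.read()
print(b"secret-host.example.com" in client_hello)  # True - visible in cleartext
```

This is exactly what Wireshark shows you in the `Client Hello` packet of any HTTPS connection.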
The rest of the URL (i.e. everything but the hostname) is only used inside the encrypted connection. Thus in theory it is hidden from the attacker unless the encryption itself is broken (a compromised private key, a man-in-the-middle attack, etc.). In practice an attacker may have indirect ways to learn the remaining part of the URL:
- Different pages on the same server serve different content with different sizes etc. If the attacker scans the site to find out all possible pages he might then be able to find out which pages you've accessed just by looking at the size of the transferred data.
- Links to other sites carry a Referer header. The Referer is usually stripped when linking from HTTPS to HTTP, but if the attacker controls one of the sites linked to via HTTPS, he might be able to find out where the link came from, i.e. which page you accessed.
But in most cases you are pretty safe with HTTPS, at least much safer than with plain HTTP.
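The size-fingerprinting idea can be sketched as a toy example. Assume the attacker has already crawled the public site and recorded each page's response size; the sizes and paths below are made up, and the tolerance accounts for padding and header variation:

```python
# Toy traffic-analysis sketch: map an observed (encrypted) response size back
# to candidate pages, assuming TLS leaks the plaintext length almost exactly.
def guess_pages(size_index, observed_size, tolerance=16):
    """Return candidate paths whose known size is within `tolerance` bytes."""
    return sorted(path for size, path in size_index.items()
                  if abs(size - observed_size) <= tolerance)

# Built by crawling the site beforehand (hypothetical numbers).
size_index = {4312: "/", 9870: "/pricing", 15204: "/admin/report"}

print(guess_pages(size_index, 9866))   # the capture showed ~9866 bytes
# -> ['/pricing']
```

On sites with many similar-sized pages or heavy dynamic content this gets noisier, but the principle holds.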
That's not a bad point, actually. I mean, if an attacker can read your browser history, they can probably do a lot more. But it might be a good idea to keep sensitive info out of URLs so it doesn't go into browser history, to limit the damage if someone's browser history is compromised.
Using Wireshark, you will be able to find out the hostname, as mentioned in other answers, due to SNI. You'll also be able to see parts of certificates. The HTTPS URLs you've seen were probably the URLs of CRLs or OCSP responders.
If someone could walk your site and compare the sizes of the returned pages with the size of what's returned over your encrypted connection, they could make assumptions about which page your program called. But since you want to put parameters into the URL and hide those, this isn't a big attack vector in your case. If your URL is
https://my.server/api?user=scott&password=tiger&highscore=12345, and your API always returns pages between 1000 and 1010 bytes, seeing 1007 bytes returned won't help anyone determine the user or password, or how to submit a high score.
HTTPS is only secure if you prevent MITM attacks. Tools like Fiddler, Charles, or mitmproxy redirect your traffic to themselves, present a fake certificate to the client, decrypt the traffic, log it, re-encrypt it, and forward it to the original site.
If your client relies on the OS's trust store, an attacker can insert his own certificate into that trust store, and your client will not notice anything. The READMEs of the above-mentioned tools include instructions for doing exactly this.
So, if you use HTTPS to prevent decryption, you need to check that the certificate returned by the server is correct before you actually send your URL. Look into how certificate pinning works for your OS/programming language, and use the above tools to verify that your client detects a fake certificate and refuses to send the URL, before you publish your program.
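As a rough illustration (not a production implementation), a client can pin the server's leaf certificate by hashing its DER encoding and comparing it against a hard-coded fingerprint. `PINNED_SHA256` below is a placeholder you would compute from your real server's certificate:

```python
# Sketch of certificate pinning: refuse to talk unless the server's leaf
# certificate matches a fingerprint baked into the client at build time.
import hashlib
import socket
import ssl

# Placeholder pin - replace with the SHA-256 of your server's DER certificate.
PINNED_SHA256 = "0" * 64

def fingerprint_matches(der_cert: bytes, pinned_hex: str) -> bool:
    """Compare the SHA-256 of the DER-encoded certificate to the pin."""
    return hashlib.sha256(der_cert).hexdigest() == pinned_hex

def connect_pinned(host: str, port: int = 443) -> ssl.SSLSocket:
    ctx = ssl.create_default_context()
    sock = socket.create_connection((host, port))
    tls = ctx.wrap_socket(sock, server_hostname=host)
    der = tls.getpeercert(binary_form=True)
    if not fingerprint_matches(der, PINNED_SHA256):
        tls.close()
        raise ssl.SSLError("pinned certificate mismatch - possible MITM proxy")
    return tls  # only now is it safe to send the request URL
```

The point of the pin: a MITM proxy whose root CA was installed into the trust store will pass the default chain verification, but its forged leaf certificate cannot match the hard-coded fingerprint.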
Good answer for mentioning the possibility of the browser contacting the CRL publishing site for a cert. I've never understood why CAs don't publish these over HTTPS. They seem to be mostly on HTTP sites, which doesn't make much sense.
@SteveSether The CRL file is signed, and encrypting it isn't necessary, so insecure http is fine. Plus, using https would be tricky: How do you connect to `https://crl.example.com/` to validate `crl.example.com`'s own certificate? http sidesteps that issue.
@MattNordhoff Thank you! That answers the question. Though I believe the CRL is just supposed to be for revocation, so you're trusting the CA, and the CA's root cert. I'm assuming it's signed with the CA's root cert?
It is possible to perform traffic analysis by crawling the website and comparing packet sizes, etc. Depending on how much dynamic content is present, or how many images are referenced from the URL, it is possible to get a very accurate picture of which URLs were visited.
The presentation "SSL Traffic Analysis Attacks" by Vincent Berg (YouTube) explains it and includes some working demonstrations.
While interesting, this clearly doesn't answer the question as wireshark isn't doing that. I don't like being a pedant, but the answer only tangentially connects to the question. This seems more appropriate as a comment somewhere than an answer.
It is possible if the attacker has access to the client, for an overview see: https://wiki.wireshark.org/SSL#Using_the_.28Pre.29-Master-Secret
For example, on Java clients an agent like jSSLKeylog can be attached to intercept and log the supposedly encrypted content/URL. If the pre-master secret of a communication can be obtained by any means in the process, captured encrypted traffic can be decoded afterwards.
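For illustration, Python exposes the same mechanism directly: setting `keylog_filename` on an `SSLContext` (or the `SSLKEYLOGFILE` environment variable, which many clients including browsers honor) writes session secrets in the format Wireshark reads. The path below is just an example:

```python
# Sketch: dump TLS session secrets in NSS key-log format. Point Wireshark at
# this file (Preferences > Protocols > TLS > "(Pre)-Master-Secret log filename")
# and it can decrypt captures of connections made through this context.
import ssl

ctx = ssl.create_default_context()
ctx.keylog_filename = "/tmp/tls_keys.log"   # example path

# Any connection made through ctx now appends lines like
#   CLIENT_RANDOM <client_random_hex> <master_secret_hex>
# to the key-log file as handshakes complete.
```

This is why "attacker has access to the client" is the key condition: whoever can set that file (or attach an agent, as with jSSLKeylog on Java) gets everything, URL included.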