Difference between authentication, integrity and data origin authentication
I first thought all these terms were synonyms, but I sometimes see those terms used in the same document. For instance, on MSDN:
data origin authentication, which enables the recipient to verify that messages have not been tampered with in transit (data integrity) and that they originate from the expected sender (authenticity).
I don't completely understand how integrity is different from authenticity.
How could it be possible to ensure only authenticity of the sender without data integrity? If an attacker is able to modify the content, how can we trust the sender field to be correct?
Similarly, what does it mean to know that some data has integrity without knowing the sender?
To me, it's just a matter of whether or not the "Sender" field of the header is included in the part of the message that's checked for integrity (or authenticity, I'm confused now).
As far as I know, digital signatures solve both integrity and authenticity; maybe that's why I can't see the difference between the two.
Integrity is about making sure that some piece of data has not been altered from some "reference version". Authenticity is a special case of integrity, where the "reference version" is defined as "whatever it was when it was under control of a specific entity". Authentication is about making sure that a given entity (with whom you are interacting) is who you believe it to be.
In that sense, you get authenticity when integrity and authentication are joined together. If you prefer, authenticity is authentication applied to a piece of data through integrity.
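One way to see the distinction in code is to compare a plain hash with a keyed MAC. The sketch below uses only Python's standard `hashlib` and `hmac` modules; the key, the messages, and the helper names (`digest`, `make_tag`, `verify_tag`) are made up for illustration. A bare SHA-256 digest tells you only that the data matches some reference version, while an HMAC tag additionally ties the data to whoever holds the shared key.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Unkeyed hash: anyone, including an attacker, can compute it."""
    return hashlib.sha256(data).hexdigest()


def make_tag(key: bytes, data: bytes) -> bytes:
    """Keyed MAC: only holders of `key` can produce a valid tag."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_tag(key: bytes, data: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, data), tag)


# Hypothetical values, for illustration only.
shared_key = b"hypothetical shared secret"
message = b"pay 10 EUR to Bob"
forged = b"pay 9999 EUR to Mallory"

# Integrity against a reference version: says nothing about who wrote it.
reference = digest(message)
print(digest(message) == reference)  # True
print(digest(forged) == reference)   # False

# Integrity bound to a key holder: the "reference version" is now
# "whatever the holder of shared_key produced".
tag = make_tag(shared_key, message)
print(verify_tag(shared_key, message, tag))  # True
print(verify_tag(shared_key, forged, tag))   # False

# An attacker without the key can still hash the forgery, but cannot tag it.
attacker_tag = make_tag(b"attacker's own key", forged)
print(verify_tag(shared_key, forged, attacker_tag))  # False
```

The digest is useful only if the reference value itself reaches you over a channel you trust; the MAC moves that trust into the key, which is where authentication comes in.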
For instance, consider that you use your browser to connect to some https:// Web site. This means SSL. There is authentication during the initial handshake: the server sends its certificate and uses its private key, and the server's certificate contains the server's name; your browser checks that the server's name matches what was expected (the server name part in the URL). Then all the exchanged data is sent as "records" which are encrypted and protected against alteration: this is integrity. Since your browser receives data that is guaranteed unmodified from what it was when it was sent by a duly authenticated server, the data can be said to be "authentic".
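The browser behaviour described here can be approximated with Python's standard `ssl` module. This is only a rough sketch: the helper name `describe_connection` is invented for illustration, and a real browser does considerably more. The default client context performs the authentication step (certificate chain validation plus the host name check), and the wrapped socket then carries records that are protected against alteration.

```python
import socket
import ssl

# Default client settings: validate the certificate chain and check that
# the certificate matches the host name we asked for.
context = ssl.create_default_context()


def describe_connection(host: str, port: int = 443) -> dict:
    """Hypothetical helper: connect, authenticate the server, report details."""
    with socket.create_connection((host, port), timeout=10) as sock:
        # The handshake happens here; it fails if authentication fails.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return {
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
                "subject": tls.getpeercert().get("subject"),
            }


# These settings are what make the handshake an authentication step.
print(context.check_hostname)                    # True
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
```

Calling `describe_connection` needs network access, so it is not invoked above. With these defaults, a certificate that does not match the requested host name should cause the handshake to fail with `ssl.SSLCertVerificationError` rather than return data.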
Don't overthink things. The terminology is at least half traditional, meaning that it is not necessarily practical. We like to talk about the triad "Confidentiality - Integrity - Authenticity" mostly because it makes the acronym "CIA", which looks cool.
The "triad" has been repurposed several times, depending on who is talking about it.
@ThomasPornin: Whether before or after I read your answer, it didn't seem to me like it was possible to have either of these without the other -- meaning that, as far as I can tell, they're synonyms. Like, how can you simultaneously be convinced a message wasn't tampered with after being authored (integrity) if you're unable to certify the identity of its author (authenticity)? Could you give an example of data that has integrity but lacks authenticity? (or vice versa if that's even possible?)