Why is storing passwords in version control a bad idea?
My friend just asked me: "why is it actually that bad to put various passwords directly in a program's source code, when we only store it on our private Git server?"
I gave him an answer that highlighted a couple of points, but I felt it wasn't organized enough, so I decided it might make sense to create a canonical question for this.
Also, how does not storing passwords in the source code relate to the principle of least privilege and other foundations of information security?
The answer is pretty short, because you should never store a password in plaintext. Not even in a file called TotallyNotMyPasswords.txt. Nowhere.
@EpicKip good luck deploying any kind of web server without plain text passwords stored in any files.
It's more of an industry concept: developer turnover, separation of dev and ops roles, overall control, and having multiple environments for one application are all reasons not to have passwords in the code (source, if you will).
One good reason is that it means you'd need two different processes for open source and closed source products. You definitely can't store passwords in source control for open source products. It's simpler to use a common process.
@DaveCarruthers There are setups in which this is possible (for instance with Windows and AD authentication). Other than that, passwords can be saved in protected configuration.
I'd like a little clarification: The answers to this question should be focussed on the security side of storing passwords in version control? (I think yes, because it's posted on "information security".) I gave this question some thought and I think there are reasons for not storing passwords together with source code that go beyond security.
What's wrong with using version control for passwords? /deliberate misinterpretation
@DaveCarruthers You can either store an example configuration file in your repository and manually add that configuration file on the first deploy, or you can store the plaintext passwords in CI variables like Gitlab has and generate those configuration files + deploy in a CI script, which hides that information from everyday use, unless you have administrator rights/compromised the private git server.
Why would devs even *know* the passwords for any environments besides dev/test?
@DaveCarruthers Many setups use environment variables instead. They're populated either by hand (ex. for dev work) or from an encrypted value (ex. Travis CI) or via an encrypted database (ex. HashiCorp Vault) or via a secure service (ex. Heroku). The idea is to reduce your attack surface down to a single key stored either in a daemon (like Vault) or a service (like Heroku or Travis).
The way I see it, not storing passwords in Git (or other version control) is a convention. I suppose one could decide not to enforce it with various results, but here's why this is generally frowned upon:
- Git makes it painful to remove passwords from source code history, which might give people a false idea that the password was already removed in the current version.
- By putting the password in source control, you basically decide to share the password with anyone who has access to the repository, including future users. This complicates establishing roles within a developer team, which might have different privileges.
- Source control software tends to get pretty complicated, especially "all-in-one" systems. This means that there's a risk this system might eventually get compromised, leading to password leakage.
- Other developers might be unaware that the password is stored and might mishandle the repository - having keys in the source means that extra care would have to be taken when sharing the code (even within the company; this might create a need for encrypted channels).
I cannot say that every pattern related to infosec is good, but before breaking them it's always a good idea to consider your threat model and attack vectors. If this particular password got leaked, how difficult would it be for an attacker to use it to harm the company?
I'm not sure it's fair to call a known anti-pattern of "short passwords" an "infosec convention".
@Adonalsium What I meant to say is that it's a pattern related to infosec that is not within best practices and one shouldn't mindlessly follow other people's patterns.
I'm with @Adonalsium : convention fails to carry your meaning and much better terms are available: poor idea. needlessly risky. etc
Also, and to a lesser degree, password *history* is exposed. If the password has been changed over time, knowing previous passwords may be useful in guessing future passwords should access be revoked.
I mean for "stuff other people do" convention works, but it also can mean "stuff other (knowledgeable) people think is a good idea and is common place for a reason".
"Convention", to me, implies "everyone generally agrees to do it this way, despite it not having any particularly significant advantage, apart from everyone agreeing that it's to be done this way", but you're giving some pretty significant dis/advantages, which leans more towards "this is a bad idea". Removing the parts about convention would improve the answer quite a bit IMO.
Note that when removing passwords from source control, you don't need to purge them from the history: you just need to change the passwords.
@OrangeDog unless that history of passwords is a good indicator for future passwords (which does not hold true if your passwords are secure and random, but this isn't always the case).
In addition to item 4, git is a distributed repo system. Anyone who can clone the repo to, say, a development laptop could lose that laptop, and now passwords are part of that data breach.
@NotThatGuy No, convention just means "everyone generally agrees to do it this way". It's important to note, because following conventions is generally really important and valuable in its own right.
Also, there may be employees that occasionally work from home. Some of these (and I've known some) use their home PC to develop, instead of their company laptop. This means relying on their home PC's security instead of the company one's.
I would add that in many environments, devs are not allowed access to production systems at all and ops/admins will have to maintain all passwords. Since passwords will expire in 45/60/90/180 days on many systems, devs **cannot** maintain the passwords and you don't want ops to have to go into the code, so you put all credentials in a separate config file and when ops deploys the build they put the current production password(s) into the config file and update that file when passwords expire. It's possible that devs not knowing production passwords is legally required on federal systems.
I'd agree that calling this a convention is inaccurate and misleading. Conventions are just customs, or "social norms" where you could do it completely differently and it'd be OK as long as everyone else did it that way. The classic example is which side of the street we drive on. We all agree which side, and follow it. But left/right is arbitrary and would (and does) work equally well no matter which the society picked. That isn't the case here. If everyone put passwords in version control, it'd still be bad.
First, the non-security reason:
Password Change Workflow
Passwords change independently of an application's code. If a DBA changes a database password, does it make sense for developers to have to update the code, get a new build and release to production, and try to time it all? Passwords are a runtime configuration artifact, not a development artifact. They should be injected via configuration files, environment variables, or whatever configuration paradigm you are using.
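As a rough illustration of injecting the password at runtime, here is a minimal Python sketch. It assumes the password arrives in an environment variable; the name `DB_PASSWORD`, the function `get_db_password`, and the exception `MissingSecretError` are all made up for this example and are not part of any particular framework.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def get_db_password(env=None):
    """Return the database password from the DB_PASSWORD environment variable.

    DB_PASSWORD is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    password = env.get("DB_PASSWORD")
    if not password:
        # Fail fast instead of falling back to a value baked into the code.
        raise MissingSecretError("DB_PASSWORD is not set")
    return password
```

With something like this, a DBA rotating the password only requires changing the variable and restarting the process; no commit, build, or release is involved.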
Generally, version control systems provide authorization control so only authorized users can access it. But within a repository, permissions are generally either read/write, or read-only, or possibly some bells and whistles like GitHub provides. You're unlikely to find security constraints that let developers get source code, but not passwords. If you can clone a repo, you can clone a repo. Period. In many environments, developers don't have full access to production. Sometimes they have read-only access to production databases, sometimes no access. What if developers do generally have access, but you don't want all of the interns, new hires, etc. to have access?
What if you have multiple environments, like a development environment, QA environment, staging, and production? Would you store all of their passwords in version control? How would the app know which one to use? The app would need to have a way to know the environment, which would end up being a configuration setting. If you can make the environment name a configuration setting, you can make the database password a configuration setting (along with the connection string, username, etc.)
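To make the "everything is a configuration setting" point concrete, here is a small Python sketch that reads the environment name and the database credentials from one JSON file deployed next to the application rather than committed with it. The file layout, the key names, and the `DbSettings` / `load_db_settings` names are assumptions for illustration only.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DbSettings:
    environment: str  # e.g. "development", "qa", "staging", "production"
    host: str
    user: str
    password: str


def load_db_settings(path):
    """Load settings from a JSON file that lives outside the source repository."""
    with open(path, encoding="utf-8") as handle:
        raw = json.load(handle)
    # A KeyError here means the deployed file is missing a required key.
    return DbSettings(
        environment=raw["environment"],
        host=raw["host"],
        user=raw["user"],
        password=raw["password"],
    )
```

Each environment gets its own copy of the file, maintained by whoever operates that environment, so the same build can be promoted from QA to production unchanged.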
As others have mentioned, version control is designed to preserve history. So older passwords will still be retrievable, which may not be the best thing.
How many times have we seen headlines in the news about source code for some product being leaked to the world? Source code is whatever is in version control. This is probably the top reason to not put passwords in version control!
That said, passwords need to be stored somewhere. What the above concerns are alluding to is not necessarily that they shouldn't be stored in version control at all, rather, that they should not be stored in the product's source code repository in version control. Having a separate repo for config management, operations, etc. is very reasonable. The key is to split operations from development when it comes to security credentials, not necessarily to avoid version control altogether.
Also, for non-production environments, I've stored passwords and API keys for external systems in configuration files within the product source code as defaults. Why? To make it painless when a developer checks out the code, they can just build and run it and not have to go fetch extra config files. These are credentials that rarely change and do not lead to any trade secrets. But I would never do this with production secrets.
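One way to keep that convenience without letting the defaults leak into other environments is sketched below in Python. It assumes a hypothetical `APP_ENV` variable that names the environment and treats a missing value as development; `DEV_DEFAULTS` and `get_secret` are illustrative names, and the values are placeholders.

```python
import os

# Hypothetical low-value defaults that only work against local dev services.
DEV_DEFAULTS = {"API_KEY": "dev-only-key", "DB_PASSWORD": "dev-only-password"}


def get_secret(name, env=None):
    """Return a secret from the environment, with defaults only in development."""
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    if env.get("APP_ENV", "development") == "development":
        # Checked-in defaults are acceptable here because they protect nothing.
        return DEV_DEFAULTS[name]
    raise RuntimeError(f"{name} must be provided outside development")
```

A fresh checkout then runs with no extra setup, while any other environment refuses to start unless real credentials are supplied from outside the repository.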
So the bottom-line is... it depends.
Your first point is bigger than it sounds - having passwords tied to the source-code versioning can make it impossible to reliably `git bisect` to find when bugs originated, for example.
@TobySpeight We must have very different definitions of "impossible". In a dev environment it'd be rather strange to change passwords anyhow - they are after all by definition not a secret - so there's rarely a reason to change them. And if there is, it should be trivial to replace the old config files with the new ones as part of your bisect script. I mean that's by far the easiest part of the script that has to build your whole project and then host it.
It's always important to keep in mind that suggestions do have to be tailored to fit your use-case. Security safeguards taken by, say, the NSA to protect all the zero-days they keep around for a rainy day should be much more stringent than security safeguards taken to protect up-votes on a cat-picture-posting site. On one extreme, I can think of an example when keeping passwords/tokens/etc in a private git repository might be reasonable:
If that repository is only accessed by one person and the application doesn't store any information of any value.
In that case, go to town! Just keep in mind what @d33tah said in his answer - scrubbing things out of a git repository can be surprisingly difficult (the whole point, after all, is to keep a history of everything forever) so if down the road you decide you want to collaborate with more people but don't want them gaining full access to all your systems, then you're going to have a bit of a headache on your hands.
That is what it really comes down to. Your code repository is in some way "public", even if only shared with collaborators. Just because you want to collaborate with someone doesn't mean you want to give them full access to your infrastructure (which is effectively what happens when you put passwords/tokens in your code repository). It is easy to think "yes, but I trust the people that I'm collaborating with!", but that is simply the wrong way to look at the scenario, for a number of reasons:
- In actual practice a large fraction of data leaks start with internal actors (collaborators, employees). By one study about 43% of data breaches started with an internal actor. Half of those were intentional, and half accidental.
- That last point is important and why this isn't just about trust. Roughly 20% of data breaches are accidental and caused by internal people. I'm smart enough to know that I'm not smart enough to get everything right all the time. What if I turn my repository public to make room for some fun integration tool and forget that I have credentials in it? What if I have a backup hard drive that I forgot to encrypt and it walks out of my office? What if one of my passwords gets leaked and someone uses it to guess the password for the account to my git repository (cough gentoo github repository cough)? There are any number of accidental mistakes that I can make that might result in my git repository being exposed, and if that happens and I have credentials in there and the person who finds them knows what they are looking at, I might very well be hosed. That sounds like a lot of and's but the reality is that this is how breaches happen.
- Even if I trust someone, that doesn't mean that I have to give them full access to everything [Insert cheesy quote about how power corrupts]. Again, putting credentials in a git repository means that I am potentially giving full access to all of my infrastructure to all of my collaborators. There is always the chance that you don't know that person as well as you think, and they might decide to do something they shouldn't. Even if you personally know and trust every one of your collaborators with your life, that doesn't mean that they won't make some of the above mistakes (or any of the endless options that I didn't mention) and accidentally release your code.
In short, good security is about having layered defenses. Most hacks happen because hackers find weaknesses in one area that allow them to exploit weaknesses in another area, which leads somewhere else, which finally takes them to the big payoff. Having as many security safeguards in place as makes sense for your situation is what keeps the whole thing secure. This way, when a backup hard drive walks out of your office and you realize that it wasn't encrypted, you don't have to ask "Was this a targeted attack and do I now have to change every single credential I have everywhere because they were all in the git repository on that backup drive?". Again, depending on your situation this may not apply to you, but hopefully this helps explain in what scenarios this might matter.
+1 for "...this is how breaches happen." Life is chaotically random, and it is utterly amazing how many times a long string of seemingly random acts can collude together to form the perfect accident (breaches, fires, vehicle collisions). Keeping the password out of the repo removes the possibility that a breach will expose the password.
Keeping secrets (passwords, certificates, keys) separate from source code makes it possible to manage source and secrets according to different policies. For example, all engineers can read the source code, while only the people who are directly responsible for production servers can access the secrets.
This makes life easier for developers because they're not bound by the strict security policies that are needed to protect the secrets. Source control policies can be made much more convenient for them.
As a person directly responsible for production servers, I can tell you that under no circumstances do I want developers to have QA or production-environment credentials. If they have them, then they'll "fix" a problem by tweaking things until they start working, won't document their changes, and the whole thing will fall apart because the ratio of Wolfsbane to Eye of Newt is off. Do whatever you want in your Dev environment, but give me a clean code package to deploy, with documentation of what I have to do to make it work, and the only access I might grant you is read-only to logs.
@MontyHarder That's definitely due to the tension between the ops goal of "don't break, and don't leak" and dev goal "fix stuff now".
@WayneWerner It's not just "don't break, and don't leak", it's "make damn sure I can replicate this reliably". The separation of roles between dev and ops forces them to explicitly communicate dependencies so that we can do that reliable replication. Fix it now in dev, document for me what you did to fix it so I can fix it in QA, and if that didn't work, you didn't document it well enough. Try again.
This convention, like many other security "best practices" is a handy way of making sure that things don't go wrong because of bad habits or routine.
If you always remember that your sensitive passwords are in your version control system, and if you never give anyone who shouldn't have the password access to the repository, and if your repository is stored as safely as these secrets require, and if your deployment process is likewise safe, secure and protected from unauthorized people, and if you can be sure that all the backups and copies of your repository are likewise protected, and... you can probably add a couple more "if"s specific to your context.
...then you can well store your passwords in your version control system.
Under most circumstances, at least one of those "if"s either does not hold in your situation or is outside of your control.
For example, in many settings your developers should not have access to the production database. Or the backup system is outsourced to another department. Or your build process is outsourced to the cloud.
That is why in general, not storing your passwords in your version control system is a best practice that makes sure you don't fall into one of those traps because you didn't think about them. "no passwords in git" is easier to remember than a long list of conditions that you need to satisfy to safely store passwords under version control.
Other answers are great, but consider also the problem of backups. You have much more flexibility on where, how, and for how long you store your code repository backups if they don't contain sensitive information about how to access your server or admin accounts.
From another more security-focused angle, just your code backup might be an interesting target for unethical competitors in your business. Depending on your business, most competitors in a "rule of law" country wouldn't touch it anyway. A code backup with passwords will be a target for any criminal organization interested in your users' data, or interested in blackmailing your company.
The damage from a code repository breach would be greater with the passwords for similar reasons. Would you rather have your source code stolen and published, or have all your users exposed to identity theft, with a potential lawsuit or even criminal case against you for failing to secure their private data (think GDPR)?
"why is it actually that bad to put your keys directly in your main door, when we live far away from civilization?"
Because people who want to do bad things will then do them easily, and the cost of making it hard for them is not too high. Keep your keys with you, not next to the door. Common sense.
Private repo? How many people have access to download it? And how many people are connected to those computers? And then... Are they all free from danger?
It also depends on the source code control system.
Git has the attitude that the source code control system should give you a reliable history of the project. Which is not a bad attitude to have. But there's the effect that if you checked a password into git, it's very hard to remove it from your repository.
Perforce has a command "obliterate" for that purpose. If you checked in a password, or someone checks in an 8 gigabyte uncompressed video file, or someone checked in some third party code that is infringing someone's copyright and should never have been checked in, or someone who got fired checked in lots of porn on their last day at work, you can "obliterate" it. It's completely gone from the repository and from the history.