How would one crack a weak but unknown encryption protocol?

I was reading this interesting question:
Is my developer's homebrew password security right or wrong, and why?
It shows a weak homebrew algorithm developed by "Dave", and the answers discuss why this is a bad idea. (Actually hashing algorithm rather than encryption, but my question applies to both.)
It makes sense to me that a homebrew algorithm is a very bad idea, but there's one thing I'm not understanding.
Assume I'm an attacker, and I am faced with an weakbutunknown encryption algorithm developed by "Dave". How would I crack it? I wouldn't even know where to begin. It would be a seemingly meaningless string of characters.
For example, say that the homebrew algorithm is like this:
 Use a weak wellknown encryption algorithm on the original data, then:
 Do a bitwisenegative on any byte whose serial number in the file has a repeated digit sum which is prime. (Or any other such mathematical manipulation, this is just an example.)
How would one hack a file produced by such an algorithm without knowing it in advance?
Edit: Everybody, please don't try to convince me of how hard it is to keep an algorithm secret. Please answer this question on the assumption that the algorithm is kept completely secret, despite of how difficult that is to achieve in real life.
Also, assume that I have no access at all to the algorithm, only to the resulting data.
Feel free to answer the question either for the case of multiple ciphertexts or just one. Regarding the type of data: Say binary.
Also: See here: http://old.honeynet.org/scans/scan16/som/som34.html
No 'unkown' encryption algorithm can be said to be 'weak' until it is understood and cracked. Unknown doesn't imply weakness, it is the attempt at discovery which reveals the nature of the beast and henceforth its strength at obsfucation.
Well, it appears that you made your point. That's it, you win. A bunch of guys on a security site weren't able to crack your "encryption".
Hello RamRachum, Challenge to decode something stylequestions are considered off topic here.
Hi Ram, I think you have a few misplaced assumptions. First of all, as I understand it, you are *only* asking about the pure mathematical aspects of cryptanalysis. If this is the case, the question should probably be moved to [crypto.se], instead of remaining on [security.se]  Here we *do* look at the whole picture, and we definitely *do* remain in the realm of reality and not a perfectly spherical chicken (as one of the commenters mentioned).
In realworld scenarios, if the confidentiality of your data is dependant on the secrecy of your encryption algorithm, it would probably be easier to steal the details of that algorithm (assuming the algorithm is not trivial). Weakest link, and all that. Moreover, in nontrivial realworld usage scenarios, it is nearly impossible to keep the algorithm to any level of secrecy, even *without* the attacker's involvement  whether it is other consumers of the algorithm, developers with access to the source code repository, or other inside attackers.
Since we are dealing with the real world, it is often much simpler to throw some automated tools at the task of basic cryptanalysis, without having to delve in to the mathy details. Okay, fine  it is precisely these details you want to know about, but again that would be off topic here. (Just as the details of an encryption algorithm would be off topic).
Lastly  I doubt anyone here would be inclined to spend more than a trivial amount of time on breaking your encryption, that of course does not mean that it cannot be done with appropriate time and resources. Which brings up my final point  how easy it is to break your encryption, is relevant to your risk profile and threat model. Is this a banking application, or your little sister's cat blog? The commensurate rewards will define how much effort  by how many attackers  will likely try to break your encryption. It is actually *easier*  let alone more secure  to just use strong encryption.
Your question really lacking the details @RoryMcCune mentions, with severe influence on the "correct" answer: If all you have are encrypted messages and no background knowledge on it, the worst case assumption is that somebody used something equivalent to a onetime pad  if a truly random, irreproducible key was used you _can't_ obtain the message without further interaction with the sender or recipient (direct or indirect). But it might be a trivial scheme as well, that an experienced cryptanalyst can break for breakfast. You can only try and see.
@Thomas wow! Apparently I'm a l33th4x0r now too. That took me < 5 minutes using python (I'm learning python) and I have no notable cryptanalysis experience... :)
I can recommend you `The Code Book` by Simon Singh. Do you want to know how Enigma was cracked during WW II? This book explains it in easy words. It also lists many of the famous ciphers from the history and how they were cracked. It's a nice read for everybody who are interested in cryptography.
"How would I crack it? I wouldn't even know where to begin." You're assuming there that the person attempting to crack it is as poorly versed in crypto cracking as you or I.
By the way: in real life you get the encrypted data from a certain source(that you know) and at a certain time, probably knowing already something about the source. This allows to do a bunch of hypothesis about the content of the text, that can help a lot in decrypting. The same was done during WWII by Turing et al. when dealing with Enigma: they knew that at 06 am there was a message regarding wheather and assumed some parts of the encrypted message referred to weather terms, allowing them to break the cipher in much less time(although they knew *more or less* the algorithm).
I now asked a similar question on crypto.stackexchange: http://crypto.stackexchange.com/questions/6752/howwouldonecrackaweakbutunknownencryptionprotocol
Please don't ask the same question on two different sites. That is discouraged/not allowed. Instead, you can click the "flag" button to ask that the question be moved to Crypto.stackexchange.

Assume I'm an attacker, and I am faced with an weakbutunknown encryption algorithm developed by "Dave". How would I crack it? I wouldn't even know where to begin. It would be a seemingly meaningless string of characters.
That's correct, you wouldn't. Here's some encrypted data (4587556841584465455874588). Got a clue what that means? Absolutely not.
However, you're missing the core, fundamental most integrally important central pillar key to the universe that holds cryptography together. The idea is simple:
the key is everything
That's it. That's the bit you have to protect. The bit you must guard with your life and hope nobody is going to hit you with a hammer until you tell them what it is.
On this basis, you must assume that your algorithim can be read by the attacker. They know how it works. They can document its process. If there are any weaknesses, they'll find them. And they'll exploit them. Like that angry CIA Dad from Taken.
This, it turns out, is less of an assumption and more of the practical case in use. Dave, the home brew cryptographer, wants to include an encryption algorithm in his program. Deciding to eschew all the testing and design work cryptographers have done for him for free over the years, he writes something involving the odd xor, compiles his program and helpfully gives it to friends.
That algorithm is now in their hands. Game over.
Now, you might ask "can't I just keep the algorithm secret? That'll work, right?" Oh Dave, plz stop. Nonono. The problem with secret algorithms is that they're much more likely to be stolen . After all, the key is different for each user (actually, this is not a requirement, but, let's just assume it is for simplicity) but the algorithm remains unchanged. So you only need one of your implementations to be exposed to an attacker and it is game over again.
Edit: Ok, in response to the OP's updated question. Let us assume for a moment that the algorithm is totally unknown. Each of the two participants in an encrypted conversation have perfect security of their algorithm implementation.
In this case, you've got data to analyse. You could do any one of the following:
 Analyze for frequently known letters. This is how you'd break a typical caesarshift cipher.
 Attempt to guess the length of the key. With this information, you can move into looking for repeated ciphertext blocks which may correspond to the same plaintext.
 Attempt index of coincidence and other such measures used to break the vigenere cipher, since many polyalphabetic ciphers are (possibly) just variants of this.
 Watch for patterns. Any pattern might give you the key.
 Look for any other clues. Do the lengths correspond to a certain measure, are they for example multiples of a certain value such as a byte boundary and so are (possibly) padded?
 Attempt to analyze with one of the symmetric cipher cryptanalysis techniques. These rely on knowing the algorithm in many cases, so may not apply here.
 If you believe the the data in question represents a key exchange, you can try one of the many techniques for breaking public key algorithms.
The fact is that a short piece of data from an unknown algorithm could well be undecryptable. However, this does not mean you should rely on this being the case. The more data a cryptanalyst can recover, the more likely they are to break your algorithm. You probably don't know without serious cryptanalysis what that boundary is  for example, it is reasonable to assume that one could bruteforce a caesercipher algorithm for three letter words, since there are few that make sense.
You are up against reuse problems too. In WWII, the Engima overcame this problem by having programmable settings for their secret algorithm, but this was broken too.
There is also the human element of cryptography to consider. I realise the label on the tin says "use once, do not digest" etc, but humans are humans and will likely use twice, three times etc. Any such behaviour plays into the hands of the cryptanalyst.
You've made an argument that keeping the secrecy of the algorithm is very hard. I completely agree. However, this is not what the question is about so this answer is not helpful.
@RamRachum: You might want to update your answer to specify that our starting assumptions are a perfectly spherical chicken on a frictionless plane.
Scott: usually, the chicken occupies a point mass, because air resistance is really complicated with all those feathers and uh, yeah. @Ram ok, I've updated the answer to give some general pointers on where you'd start with some random data and what constitutes a risk to the "random" data. Disclaimer: if you rely on that information, you do so at your own risk.
@AntonyVennard Thanks for adding useful information!
Now, would any of the approaches that you mentioned have worked in the example encryption that I've provided in my question above?
@Ram probably. Altering a file per byte position might confuse some of the text, but if you have enough you'd soon work out that's the pattern, particularly if parts of words are appearing. You'd essentially break it in two stages: 1) the weak algorithm, then 2) ok, I have parts of words, now, what's the pattern with the remaining ones?
If you don't know the algorithm, and depending on how good its design is, it is not trivial, but it has been proven you just need either enough ciphered messages or a few clear messages and their ciphered versions to remove the noise and infer the transformation by applying a correlation algorithm between ciphered messages and their most probable decryption (using the known messages as training sequence, if available). If you are interested on the maths involved, reading about information theory, signal processing and machine learning may help you.
I don't know who you are, but +1 for the Taken reference.
This answer would have been helpful when I had the gall to try to solve Kryptos a few years ago: http://en.wikipedia.org/wiki/Kryptos

An unknown "encryption" algorithm has been historically achieved at least once. I am speaking of Minoan Linear B script, a writing method which was used in Crete around 1300 BC. The method was lost a few centuries later, with the death of all practitioners and the overall collapse of civilization during the socalled Greek Dark Ages. When archaeologists began sifting the earth around Knossos and other locations, at the end of the 19th century, all they got was a bunch of tablets with unknown signs, without a clue about the writing system which was used to produce them.
The interesting story here is that Linear B was unravelled in 1950s, using the same analysis tools which were employed against encryption systems of that time. In effect, the writing was considered as an "unknown encryption algorithm". It succumbed to statistical analyses, chained inferences, and some hypotheses on the plaintext (basically, the assumption that the base language for a variant of Greek). This is a classic and masterful illustration of how cryptanalysis works against "manual cryptosystems".
Of course, assuming that a cryptographic algorithm can be in use and still remain secret, is implausible. By the same assumption, there is no piracy of video games or media contents. The real world implacably reminds us that this is not true. The only known way by which an algorithm may remain secret is to kill its inventors and practitioners, destroy their apparatus, and wait for a few centuries. This has a few inconvenient side effects.
And even if, in a given specific instance, details on an algorithm have not leaked yet, there is no way of quantifying how much secret the algorithm is, i.e. how much time it will take for reverse engineering, bribes or wholesome theft to rebuild the algorithm. This is the main reason why cryptographers, about 40 years ago, decided that key and algorithm should be split, with the key being secret and the algorithm being nonsecret: you can quantify the secrecy of a key, not the secrecy of an algorithm.
This gives us an insight into your specific question. Your "secret algorithm" hinges on the notion of a "mathematical manipulation". How many of these are they ? Can you estimate or describe the set of "mathematical manipulations" ? You will find that an encryption algorithm is itself a "mathematical manipulation", so your question is rather illdefined.
+1 this has a few inconvenient side effects.
Excellent answer, shows clearly why the question is "flawed" in itself.
I don't really understand all this "The real world implacably reminds us that this is not true." in all the answers. Real life example: one uses a reversible encryption algorithm to protect sensitive user data on the server. That means that it can't be a oneway algorithm like we can use to store passwords, so it must have a key. So now how exactly protecting this key is different from protecting the algorithm? Just assume that the guy who wrote this algorithm is the same guy who generates/manages the encryption keys. Bribes, stealing etc. would apply in the same way to both methods.
I'm not saying that it is the way to go and I will never use something like this in production, but all the answers are just lazy, imho.
The algorithm exists as compiled code in files, and also source code on developers' machines, revision control software, backups... and there are design documents, as printed paper, emails, and in the head of several people. It would be very hard to track them all and ensure secrecy. This contrasts with a _key_, which exists only in RAM or, at worst, in a single file, and not in all the other mediums I just listed. You can abduct all the developers, none of them has the slightest clue about the _value_ of the key since it never entered their brain in the first place.
@Tom Leek,When I did read the OP question I expected to see interesting answers explaining how hard it is to design an algorithm that can't be broken using modern cryptanalysis methods (for example goo.gl/IEkfh), instead I just got the message "you can't keep the algorithm secret". You write "It would be very hard to track them all and ensure secrecy" but it is quite possible and the reason why people don't do it is not because it is hard, but because even the most complex homebrewed algorithms will most likely have "holes" that can be exploited by attackers even without knowing the algorithm
Please, don't misunderstand this, I didn't downvote your answer and it is right in a lot of cases and I should have written my own answer if I want to change something in yours. Just... I'm not a cryptographer and while I'm interested in this theme, I'm not really qualified to give a complete answer to this question and I expected more from one of the top users :)
@XzKto: well, designing an algorithm which resists cryptanalysis is a quite different subject. The question was about cracking an _unknown_ algorithm (emphasis already in the original question) as opposed to a known algorithm. Cryptographers assume that the algorithm is known for all the reasons I explained; how known algorithms still resist cryptanalysis is a very large subject best handled in crypto.SE. When the algorithm is totally unknown, well... it never really happens. So there is no possible answer except wisecracking about archaeology.
@Tom Leek "The question was about cracking an unknown algorithm"  yes, I was writing about it too. "Cryptographers assume that the algorithm is known"  I guess that is what we disagree on. I don't know many popular cases of reverseengineering an unknown algorithm, but I know that it happens and there are methods to do it. I didn't find anything good, but this can be an example of what I am talking about: http://goo.gl/sgvBY . I think this should be the reason why homebrewed algorithms are bad but looking again at all the answers I admit that maybe I was wrong.

To attack a cryptographic protocol, you have the following attack methods
Known plaintext: Trying to find correlations between the plaintext you have and the corresponding ciphertext.
Chosen plaintext: Encrypting specific plaintext and studying the changes to the ciphertext as the plaintext changes.
Choosen ciphertext: Decrypting specific ciphertext and studying the changes to the plaintext and the cipher text changes.
Known cipher text: Where all you have is the cipher text, below is a simple example.
Long time ago I took a cryptography class, in one the lectures we were taught the cryptonalysis of substitution ciphers. This is not how things are done now, but this is where the science of cryptography had started, and this is how cryptonalysis had begun.
Let's say you can across this cipher text.
Mx qeoiw wirwi xs qi xlex e lsqifvia epksvmxlq mw e zivc feh mhie, fyx xlivi'w sri xlmrk M'q rsx yrhivwxerhmrk.
You don't know the algorithm, you don't know the key. How should you start?
 Analyze the letter frequency: Total length is 87 letters. We see that
i
was used 12 times > ~13%. According to Wikipedia's article on letter frequency, this letter is likely to bee
. Our cipher text is now:
Mx qeoew werwe xs qe xlex e lsqefvea epksvmxlq mw e zevc feh mhee, fyx xleve'w sre xlmrk M'q rsx yrhivwxerhmrk.
 Now the second most frequent letter is
x
was used 11 times > ~11%, so it's likely to bet
. Our cipher text is now:
Mt qeoew werwe ts qe tlet e lsqefvea epksvmtlq mw e zevc feh mhee, fyt tleve'w sre tlmrk M'q rst yrhivwterhmrk.
 Now we're starting to see the patterns. Replacing
i>e
andx>t
suggests that the key could be4
. Let's try it:
It makes sense to me that a homebrew algorithm is a very bad idea, but there's one thing I'm not understanding.
Ahaa! We got it! Now you've done your first cryptonalysis. This is one way the ciphertext could be analyzed.

I think nobody has said it aloud here, so I will.
If a cryptographer is given only one ciphertext with no means to get more, the ciphertext is short and no knowledge of the plaintext is given, it is near impossible to decrypt the text. The only way this is still possible is if the cipher is around the difficulty level of a substitution cipher.
Given the same algorithm, if there is a way to get more ciphertexts on demand, if the ciphertext is sufficiently long or if there are some known parts of the plaintext to help, it is likely that the algorithm can be cracked given enough effort.
But even still, cryptanalysis takes a lot of effort compared to the effort of creating a simple cryptoalgorithm from scratch, so it is unlikely anyone will expend the effort unless there's a good reason for doing so.
Hmm, so the ultimate security arrangement would be to couple a wellknown encryption algorithm with a homebrew one, no? Because then you enjoy the advantages of both: You enjoy the security of the wellknown algorithm, but if a vulnerability was discovered in it, you're still protected by the homebrew one. No?
Not an unreasonable position, if you are proficient enough to combine the algorithms safely without introducing new vulnerabilities. Another choice, if you are proficient enough, is to make a trivial modification to a well known algorithm and keep the modification secret. However, in the real world, you are almost 100% guaranteed to make things worse by adding something to it  cryptography is hard, even though it might at times seem like it isn't.
If, in symmetric encryption, you apply your homebrew algorithm before or after a wellknown algorithm, you will in no case make things worse. I think that applying your homebrew algorithm after a wellknown algorithm in encryption (and so before the wellknown algorithm in decryption) would be better than applying it before the wellknown algorithm. What you're stating in your comment is probably true, Ram.
To give an other example of the 'security through obscurity' and 'wellknown algorithms' combination approach: When doing a cryptographic hash, applying an injective function (such as salting or xoring with a constant) on the data before or after hashing cannot decrease security, if the hash function has no weaknesses.

If you're going to distribute a secret algorithm, why not just distribute onetime pads instead? It's more secure.
If you don't like the idea of onetime pads because too much data is moving over the wire, then why are you assuming that the attacker only has one cyphertext?
Assuming somebody only has one cyphertext, and doesn't have the algorithm, (two bad assumptions), then your weak, but wellknown underlying encryption system probably doesn't have any vulnerabilities to begin with.

There are several ways.
The first, and most obvious is that the attackers compromised your server to the extent where they managed to obtain your source code. In that particular case, your homegrown scheme is as good as nothing.
The second way is that the attacker might be able to submit his own values to your algorithm and see the before/after result. This is known as the Chosen Plaintext Attack. A good encryption scheme should not be vulnerable to it. A homegrown scheme probably is.
Even without a chosen plaintext attack, a homegrown scheme is usually laughably weak. A laymen like you and I might not be able to make sense of the output of a homegrown scheme. However, there are a class of very smart people who devote their time and effort to breaking such cryptographic schemes usually in return for a good paycheck. You might have heard of them, we call them Cryptographers.
I'm possibly being a bit pedantic here but isn't it "as good as nothing" rather than being "worse than nothing"?
@AndySmith Fine fine. :P Edited.
@RamRachum, please tell us, why do you think this is useless? Maybe with more information Terry or somebody else could help you. As far as I can see, this is a pretty good answer.
@RamRachum I'm sorry, but why is this useless? You can't expect me to tell you how to break every single homegrown scheme out there can you? The crux of the matter is that homegrown schemes are weak, and there are very smart people out there breaking them.
@RamRachum So you want me to give an answer specific to your algorithm with frankly very unrealistic limitations? Sorry no can do. That would be "Too Localized". My answer is sound as long as you are living in the real world.
@Ram  how about remaining polite, okay?
Worse than nothing because it breeds a false sense of security. At least if you don't have a control in place you continue to rationalise future security decisions on that assumption, rather than relying upon the assumption that you have a control in place that is better than it really is.

Please answer this question on the assumption that the algorithm is kept completely secret, despite of how difficult that is to achieve in real life.
The problem with this is that you are ignoring Kerckhoffs's principle, which says that the security of an encryption scheme should not depend on the secrecy of the algorithm.
Anyway if you are really interested in crypto you should take a course like this one.
It's a principle. It's meant to guide decision making. It's not a law, rule or theory that is actually fact.
It **is** a rule.
@Rushyo Given how well Kerckhoff's principle has worked for modern algorithms, I'd say it's a fact.
@SmitJohnth For it to be a rule it would need an obligation. You can apply a rule to a principle (i.e. everyone must follows Kerckhoff's principle to work on this library) but that doesn't make the principle itself a rule.
@TerryChia A principle can be applied successfully 100% of the time and still doesn't become a fact. A fact isn't actionable, like a principle is. You can enact a principle by following it, you can't enact a fact. You don't say "We need to apply the laws of gravity for this". You do say "We need to apply Kerckhoff's principle for this". Someone can choose to ignore Kerckhoff's principle (at their peril). They can't choose to ignore the laws of gravity.

Since it hasn't been mentioned and this question has been around a while...
A computer scientist helped decipher an 18th century secret society's encrypted text. The text was very ornate, with symbols and glyphs. It stumped literary experts for centuries. The trick was to guess some of the letters and what they may stand for, and to guess the original language also, since German has different letter frequencies than English or Italian.
Here is the description of the cipher text and how it was unraveled.
http://phys.org/news/201110scientistmysteriouscopialecipher.html
http://stp.lingfil.uu.se/~bea/copiale/
http://www.wired.com/dangerroom/2012/11/ffthemanuscript/all/ (Very long, very interesting.)
With the Copiale Cipher, the codebreaking team began not even knowing the language of the encrypted document. But they had a hunch about the Roman and Greek characters distributed throughout the manuscript, so they isolated these from the abstract symbols and attacked it as the true code.
"It took quite a long time and resulted in complete failure," Knight says. After trying 80 languages, the cryptography team realized the Roman characters were "nulls," intended to mislead to reader. It was the abstract symbols that held the message.
The team then tested the hypothesis that abstract symbols with similar shapes represented the same letter, or groups of letters. Eventually, the first meaningful words of German emerged: "Ceremonies of Initiation," followed by "Secret Section."
License under CCBYSA with attribution
Content dated before 7/24/2021 11:53 AM
Rory McCune 8 years ago
If you're looking for responses related to Cryptanalysis you might want to specify a couple of additional pieces of information. Firstly, how many cipher texts would the attacker have access to, secondly how many ciphertexts would the attacker have access to and thirdly what type of data is encrypted by the ciphertext (e.g. is it binary data or text).