Disclaimer: I disclosed this security issue to GitHub, and they chose not to fix it (“We have reviewed your report and determined that this functionality is working as expected”). This is undocumented behavior, so I am describing it here. Also, note that I am not asking anyone to hack GitHub, nor am I going to do so myself. All I want is better security and a level playing field: everyone should know how safe their files are on GitHub.com.
Short version: This is a security weakness in private GitHub repositories. I am not talking about public repositories, which are available to anyone without entering a username or password. Source files stored in Git are also safe. However, attachments from GitHub’s private issue tracker can be viewed without any authentication or authorization. Not all attachments are at risk. Word, Excel and PDF documents (.doc, .docx, .xls, .pdf) are kept private. Image files (.jpg and .png) are accessible to anyone, without providing any username or password, even if the repository is not shared with anyone and even if you are paying GitHub for a Developer, Team or Business account. The attacker only needs to know the URL of the image… and that is disturbing. Also, once uploaded, the photo, image or blueprint cannot be removed. It stays there forever…
Long version: You are paying from $7 to $21 per user per month to host your private or business projects on GitHub (I have tested only the $7 account, but my guess is that all of them work the same way). You have read the GitHub Security article (an advertisement). You think that they do what they say they do (“We know your code is extremely important to you and your business, and we’re very protective of it”), but… as every big company does, they fail… and they fail to fix security issues even when they know about them. I don’t know why this happens, but I think that bounty programs play a big role here. They need to give away money to random people to fix their software defects. Sometimes they choose to do nothing: because of the high volume of false positives among bug reports, they cannot afford to have security experts look at every issue submitted.
About the security issue:
Create a new issue in any of your private repositories.
Drag and drop a .jpg or .png file into the newly created issue. GitHub immediately creates Markdown with the embedded image. Notice the path: https://cloud.githubusercontent.com/assets/ (Example URL: https://cloud.githubusercontent.com/assets/8530649/23546842/1265c69c-000a-11e7-9b14-c8c55b243b83.jpg)
At this point your private images, photos and blueprints are publicly available, without any authentication or authorization. You don’t even need to submit the issue. Your images are already uploaded! Even more alarming is that you cannot remove the images, drawings or blueprints once they are uploaded. There is no delete button. Deleting the issue’s comment, or even deleting the whole GitHub repository, does not help. The images are still there.
When you delete a private GitHub repository, nothing tells you that some files will be left on the web forever.
What is more, they say that “This will permanently delete the repository, wiki, issues, and comments, and remove all collaborator associations.” Of course, this is not the case. Your issue attachments are kept there forever.
I can speculate that writing the image cleanup code is more expensive than HDD space nowadays. I have not tested over very long periods. Maybe there is some code that scans comments and removes unused (unlinked) images once a quarter or once a year, but I have no evidence of such a thing. So the truth is: once uploaded to GitHub, your images stay there forever, or at least for weeks.
I have tested both .jpg and .png images. They are hosted on the https://cloud.githubusercontent.com/ server and are accessible to anyone. Document files such as .pdf and .docx, on the other hand, are hosted at https://github.com/YOUR_ACCOUNT_NAME/your-repository/files/your-document.pdf and are kept safe.
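The difference between the two hosting locations can be sketched in Python. This is only an illustration, assuming the two URL shapes quoted in this post; the names `classify_attachment_url` and `status_without_credentials` are hypothetical helpers, not part of any GitHub API. The second function makes a real network request, so use it only on your own attachment URLs.

```python
import urllib.error
import urllib.request
from urllib.parse import urlparse


def classify_attachment_url(url: str) -> str:
    """Classify an attachment URL by the two patterns described in this post."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    segments = [s for s in parts.path.split("/") if s]
    # Images: https://cloud.githubusercontent.com/assets/...
    if host == "cloud.githubusercontent.com" and segments[:1] == ["assets"]:
        return "image-host"
    # Documents: https://github.com/ACCOUNT/REPOSITORY/files/...
    if host == "github.com" and len(segments) >= 3 and segments[2] == "files":
        return "repository-files"
    return "unknown"


def status_without_credentials(url: str, timeout: float = 10.0) -> int:
    """Send a HEAD request with no cookies or tokens and return the HTTP status.

    Redirects are followed, so a 200 alone is not conclusive: it may be
    a login page rather than the file itself.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code
```

Calling `classify_attachment_url` on a link copied from a private issue tells you which of the two locations it points to, without touching the network.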
And to be fair, here is GitHub’s view on the issue. They say two things:
- The links have a UUID / GUID in them (more than 120 bits of entropy), so an attacker cannot guess them.
- Do not share links from a private issue tracker with third parties.
Let me begin with the second point: do not share links… If you are computer savvy or security oriented, you may try not to share the link with third parties. In practice, however, this is not an easy task. First, there are toolbars and browser add-ons/plugins that grab all links for various purposes such as virus checking (private files end up shared with third parties because of the lack of proper authentication). Also, not so long ago, the popular Google Toolbar and Alexa Toolbar used such information to index the deep web, so hidden URLs started to appear in the search index. There are countless examples of this happening:
- Tracking down how “hidden” URLs showed up on search engine
- Does the Google Toolbar Index URLs?
- How Did Google Find this Hidden File? (When you use Chrome, the typical setup is to have its “Under Hood” features enabled. Such as “Use a web-service to resolve navigation”, “Use a prediction service to complete searches”, and “Enable Phishing and malware protection”. All these services make a very liberal use of various services at Google.)
- How can I figure out how a search engine is finding hidden pages?
- The URLs don’t need to be posted online
There were some myths and word games between Google and the webmaster community about whether Google uses URLs for indexing purposes or not, but one thing is certain: Google Toolbar was sending URLs to Google. So, if you visited your GitHub issues page with Google Toolbar enabled, your URLs were shared with Google, and you can only imagine how the search giant used that information. There are some more excuses from Matt Cutts. But even one of the brightest minds at Google says: “Security through obscurity is not a great way to keep a url from being crawled”. GitHub, if you do not want to listen to me, maybe listen to Google? According to Wikipedia, “security through obscurity” was rejected as a method of security in 1851, but we are still doing it in 2017.
I see some hands rising: “but… but… GitHub has a robots.txt file at the root of https://cloud.githubusercontent.com/”
But again: do you want your important files to be protected by one text file that is only a recommendation, not a law, or do you want proper authentication, where a username and password are required to access your files? The robots.txt file stops only robots that obey its rules. The text file does not protect against humans or hackers.
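To illustrate why robots.txt is only a recommendation, here is a minimal Python sketch using the standard `urllib.robotparser` module. The robots.txt content below is hypothetical (the real file on the image host is not quoted in this post), and `polite_crawler_may_fetch` is a made-up helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def polite_crawler_may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return what a well-behaved crawler would decide for this URL."""
    parser = RobotFileParser()
    parser.parse(HYPOTHETICAL_ROBOTS_TXT.splitlines())
    # The answer is computed entirely on the client side. Nothing on the
    # server enforces it, so a client that never asks is not stopped.
    return parser.can_fetch(user_agent, url)
```

The check lives in the crawler, not in the server: a browser, a script or an impolite bot simply skips it and requests the image directly.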
And there are countless other ways you can share links with third parties without intending to. Antivirus and antispyware tools send URLs for virus scanning, which sometimes requires human interaction (a third party sees your images). There are the toolbars and browser add-ons/extensions mentioned earlier. You can also send a link to a misspelled email address: you think that only you and your team have access to the GitHub repository, but sorry… now the person at the misspelled email address has your file too.
The first argument, about 120 bits of entropy and UUIDs / GUIDs, is also very disturbing. It is clear that the person at GitHub who reviews submitted issues has no clue about UUIDs/GUIDs and entropy. I am not here to teach you about all the historical problems with their uniqueness, guessability or security. Just read some information from the links provided.
- How securely unguessable are GUIDs? (Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access))
- GUIDs are designed to be unique, not random (The GUID generation algorithm was designed for uniqueness. It was not designed for randomness or for unpredictability)
- UUIDs generally do not meet security requirements (The RFC Section 6 is very clear about this: “Do not assume that UUIDs are hard to guess; they should not be used as security capabilities”)
- From Wikipedia – Universally unique identifier (Pseudorandom number generation often lacks necessary entropy, and RFC 4122 recommends that when a high-grade source of randomness is not available, that one of the other UUID versions be used instead. Some implementations of version 4 UUIDs do not satisfy this requirement)
Here are some UUID examples. All of them are from GitHub. Maybe they are not guessable on the spot, and maybe they have enough entropy, but what if they use a weak PRNG, or what if some other programmer replaces the UUID-generating function without any clue that the UUIDs are (improperly) used for security? What if they use a standard UUID generation algorithm and are easily predictable?
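As a small check, the UUID from the example image URL earlier in this post can be inspected with Python’s standard `uuid` module. It reports version 1, the time-based variant from RFC 4122, and the embedded timestamp decodes to March 2017. This says nothing about how GitHub generates identifiers internally and does not prove the URLs are guessable in practice (the path also contains other components). It only shows that this identifier is not a purely random version 4 UUID. The helper name `uuid1_timestamp` is my own.

```python
import uuid
from datetime import datetime, timedelta, timezone

# UUID taken from the example image URL earlier in this post.
EXAMPLE_UUID = uuid.UUID("1265c69c-000a-11e7-9b14-c8c55b243b83")

# Version 1 timestamps count 100-nanosecond intervals since 1582-10-15 (RFC 4122).
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_timestamp(value: uuid.UUID) -> datetime:
    """Decode the creation time embedded in a version 1 UUID."""
    if value.version != 1:
        raise ValueError("only version 1 UUIDs embed a timestamp")
    # value.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)


if __name__ == "__main__":
    print("version:", EXAMPLE_UUID.version)
    print("created:", uuid1_timestamp(EXAMPLE_UUID).isoformat())
```

A version 4 UUID would have 122 random bits. A version 1 UUID is built from a timestamp, a clock sequence and a node identifier, which is exactly the kind of structure the RFC warning quoted above is about.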
I hope that GitHub will plug the hole… soon :)