Terabytes of Internal Private Data Accidentally Leaked by Microsoft AI Research Team
An accidental leak by AI researchers at Microsoft exposed 38TB of sensitive company information dating back to July 2020. The accident occurred while the team was publishing open-source AI training data to a public GitHub repository.
The leak, discovered by cloud security company Wiz, contained private keys, passwords, and over 30,000 internal Microsoft Teams messages. Wiz's ongoing research into accidental exposure of cloud-hosted data traced the leak to a GitHub repository managed by Microsoft's AI division, "robust-models-transfer."
Although readers of the repository were only meant to download open-source code and AI models for image recognition from an Azure Storage URL, Wiz researchers found that the URL was mistakenly configured to grant access to the entire storage account.
A reader could not only access terabytes of company information, but "[..] the token was also misconfigured to allow 'full control' permissions instead of read-only. Meaning, not only could an attacker view all the files in the storage account, but they could delete and overwrite existing files as well," Wiz researchers revealed.
The joint Microsoft and Wiz investigation revealed that the storage account wasn't directly exposed; rather, the misconfigured URL included "an overly-permissive Shared Access Signature (SAS) token."
These tokens grant access to Azure Storage data and can be customized to allow either read-only or full-control permissions. A user can also create tokens that never expire.
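As an illustration of how such a token can be audited, the sketch below parses a SAS URL's query string and flags the two problems described here: write-level permissions and a far-off expiry. The `sp` (signed permissions) and `se` (signed expiry) parameter names are part of the real Azure SAS query-string format; the URL, token values, and the `audit_sas` helper itself are hypothetical examples, not the actual leaked token.

```python
from urllib.parse import urlparse, parse_qs
from datetime import datetime, timezone

# Hypothetical SAS URL for illustration -- NOT the actual leaked token.
# Azure SAS tokens encode permissions in `sp` (r=read, a=add, c=create,
# w=write, d=delete, l=list) and the expiry timestamp in `se`.
SAS_URL = (
    "https://example.blob.core.windows.net/container/models.zip"
    "?sv=2020-08-04&sp=racwdl&se=2051-10-01T00:00:00Z&sig=REDACTED"
)

def audit_sas(url: str) -> list[str]:
    """Return warnings for an overly-permissive or long-lived SAS URL."""
    params = parse_qs(urlparse(url).query)
    warnings = []

    # Anything beyond read/list lets the holder modify or destroy data.
    permissions = params.get("sp", [""])[0]
    risky = set(permissions) & set("acwd")
    if risky:
        warnings.append(f"token grants write-level permissions: {sorted(risky)}")

    # A token valid for decades is effectively never-expiring.
    expiry = params.get("se", [""])[0]
    if expiry:
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if (expires_at - datetime.now(timezone.utc)).days > 365:
            warnings.append(f"token expiry is far in the future: {expiry}")

    return warnings

for warning in audit_sas(SAS_URL):
    print("WARNING:", warning)
```

Run against the sample URL above, the audit flags both issues, which mirror the misconfiguration Wiz described: full-control permissions where read-only was intended, and an expiry so distant the token is effectively permanent.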
According to Wiz, SAS tokens can pose a major challenge to an organization's security. "Due to a lack of monitoring and governance, SAS tokens pose a security risk, [..]. These tokens are very hard to track, as Microsoft does not provide a centralized way to manage them within the Azure portal."
On receiving Wiz's report, Microsoft immediately launched an investigation and invalidated and replaced the SAS token on GitHub.
Furthermore, Microsoft's investigation concluded that the leak did not contain any customer information and that no other internal services were affected by the incident.