In a recent incident, a Microsoft AI research team unintentionally exposed 38TB of personal data while trying to provide open-source code and AI models for image recognition to other researchers. Cybersecurity firm Wiz discovered a link within the files that pointed to storage containing backups of Microsoft employees’ computers. These backups included sensitive information such as passwords to Microsoft services, secret keys, and over 30,000 internal Teams messages from hundreds of employees at the tech giant.
According to Microsoft’s own report on the incident, no customer data was exposed and no other internal services were put at risk. The link itself was intentional: the researchers wanted interested individuals to be able to download pretrained models. To share them, they used an Azure feature called “SAS tokens” (shared access signatures), which let users generate shareable links granting others access to data in an Azure Storage account. Users control what a SAS link can reach, whether a single file, a full container, or the entire storage account. Unfortunately, in this case, the researchers shared a link that granted access to the entire storage account.
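Conceptually, a SAS token is an HMAC-signed query string appended to a URL that encodes the scope (which resource, which permissions) and the expiry of the access being granted. The sketch below illustrates that signing idea with Python’s standard library only; the `sp`/`se`/`sig` parameter names mirror Azure’s SAS query keys, but the string-to-sign format and the account key here are simplified, hypothetical stand-ins, not Azure’s actual scheme:

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

# Hypothetical account key for illustration; real Azure keys are
# long base64-encoded secrets held by the storage account owner.
ACCOUNT_KEY = base64.b64encode(b"example-secret-key-32-bytes-long").decode()

def make_signed_url(base_url: str, path: str, permissions: str, ttl_hours: int) -> str:
    """Build a simplified SAS-style URL: scope + expiry, HMAC-signed.

    `permissions` uses Azure-like letters, e.g. "r" for read-only.
    This illustrates the signing concept, not Azure's exact format.
    """
    expiry = (datetime.now(timezone.utc) + timedelta(hours=ttl_hours)).strftime("%Y-%m-%dT%H:%MZ")
    # The signature covers everything the server must later verify:
    # which resource, which permissions, and until when.
    string_to_sign = "\n".join([path, permissions, expiry])
    sig = hmac.new(base64.b64decode(ACCOUNT_KEY), string_to_sign.encode(), hashlib.sha256).digest()
    query = urlencode({"sp": permissions, "se": expiry, "sig": base64.b64encode(sig).decode()})
    return f"{base_url}/{path}?{query}"

# Grant read-only access to a single blob for 48 hours -- the kind of
# narrow scope the researchers' link could have carried instead of
# access to the whole storage account.
url = make_signed_url("https://example.blob.core.windows.net", "models/model.ckpt", "r", 48)
```

Because the signature binds the path, permissions, and expiry together, a recipient cannot widen the scope without invalidating the token; the risk arises only when the issuer signs an overly broad scope in the first place.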
Wiz promptly discovered and reported the security issue to Microsoft on June 22, and by June 23, the company had revoked the SAS token. Microsoft acknowledged that its scanning system had mistakenly dismissed the particular link as a “false positive” when rescanning its public repositories. The company has since rectified this issue and updated its system to detect overly permissive SAS tokens in the future. While the specific link identified by Wiz has been revoked, any misconfigured SAS token could potentially result in data leaks and significant privacy concerns. Microsoft emphasizes the importance of creating and handling SAS tokens appropriately and has published a list of best practices for their usage.
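The traits that make a SAS link dangerous are visible in its own query parameters. As a hedged sketch of the kind of check a scanner might run (the `sp`, `se`, and `srt` keys are Azure’s real SAS query parameters, but the specific risk thresholds below are illustrative assumptions, not Microsoft’s policy):

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

def audit_sas_url(url: str, max_days: int = 90) -> list:
    """Flag risky traits of a SAS URL: write/delete rights, broad
    account-level scope, or a far-off expiry. Thresholds are illustrative."""
    params = parse_qs(urlparse(url).query)
    findings = []
    perms = params.get("sp", [""])[0]           # signed permissions, e.g. "rl"
    if set(perms) & set("wdc"):                 # write, delete, create
        findings.append(f"grants write-level permissions: {perms}")
    if "srt" in params and "c" in params["srt"][0]:
        findings.append("account SAS scoped to containers (broad access)")
    expiry = params.get("se", [""])[0]          # signed expiry, ISO 8601
    if expiry:
        exp = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if (exp - datetime.now(timezone.utc)).days > max_days:
            findings.append(f"expiry more than {max_days} days away: {expiry}")
    return findings

# A token with full permissions, account-wide scope, and a distant
# expiry -- the pattern behind this incident -- triggers every check.
risky = audit_sas_url(
    "https://acct.blob.core.windows.net/c?sp=racwdl&srt=sco&se=2040-01-01T00:00:00Z"
)
```

A read-only, short-lived link passes the same audit cleanly, which is why the published best practices center on minimal permissions and short expiries.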
The inadvertent exposure of personal data by Microsoft’s AI research team highlights the potential risks associated with open-source code and the handling of sensitive information. It serves as a reminder that even well-intentioned actions can have unintended consequences if proper security measures are not in place. This incident also emphasizes the critical role of cybersecurity firms like Wiz in identifying and reporting vulnerabilities to protect user data.
Microsoft has assured its customers that their data remains secure and that immediate steps were taken to address the issue. However, incidents like this serve as a wake-up call for organizations to review their security protocols and ensure that everyone involved in handling sensitive data follows best practices. It is crucial for businesses to have comprehensive cybersecurity measures in place, including regular security assessments, employee training, and effective incident response plans.
This incident should not discourage the sharing of open-source code or the advancement of AI research. Open-source initiatives play a vital role in fostering collaboration and innovation. However, it is essential for organizations to be mindful of the potential risks involved and implement robust security measures to protect both their own data and that of their users.
In conclusion, the accidental exposure of 38TB of personal data by Microsoft’s AI research team serves as a cautionary tale for organizations handling sensitive information. It underscores the importance of proper security protocols and vigilant oversight when sharing open-source code and utilizing cloud storage. By learning from incidents like this, organizations can strengthen their cybersecurity practices and ensure the protection of both their data and that of their customers.