
Activists and hackers at Anna’s Archive have reportedly scraped nearly the entire catalog of Spotify, the largest music streaming service. They claim to have collected metadata for 256 million tracks and to have downloaded the audio files themselves: 86 million songs, totaling approximately 300 TB.
Anna’s Archive is a metasearch engine for underground libraries, launched in 2022 by an anonymous activist named Anna, shortly after law enforcement attempted to shut down Z-Library. The project aggregates content from Z-Library, Sci-Hub, Library Genesis (LibGen), the Internet Archive, and other sources. The activists describe their work as “preserving human knowledge and culture.”
Members of Anna’s Archive have announced the creation of the first “music preservation archive.” According to activists, they recently discovered a way to mass-scrape Spotify and decided to use this opportunity to archive content.
“A while ago, we discovered a way to extract data from Spotify at scale. We saw this as an opportunity to create a music archive focused primarily on content preservation,” the group wrote on its blog. “Sure, Spotify doesn’t have all the music in the world, but it’s a great start.”
According to the activists, all existing music collections, both physical and digital, have serious shortcomings: they focus primarily on popular artists, they strive for the highest audio quality (e.g., lossless FLAC), which inflates file sizes, and they lack a centralized torrent directory. Anna’s Archive has decided to fill these gaps.
It’s worth noting that the Archive typically focuses on books and academic articles, as text contains the highest density of information. However, the group’s mission—the preservation of human knowledge and culture—does not distinguish between different types of media. “Sometimes the opportunity arises to preserve non-textual content. This is precisely one of those cases,” the activists note.
The resulting metadata dump is said to cover 99.9% of all tracks on the platform, or approximately 256 million compositions. The group says this makes it the largest publicly available music metadata database in the world. By comparison, other music metadata databases hold between 50 and 150 million records, and MusicBrainz has only 5 million unique ISRCs, compared to 186 million in Anna’s Archive.
However, metadata alone wasn’t enough. The activists also archived the audio files of 86 million tracks. While this represents only 37% of the songs available on Spotify, these tracks account for 99.6% of all streams on the platform. In other words, there is a 99.6% chance that any given stream on Spotify is of a track that ended up in the activists’ archive.
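The gap between “37% of tracks” and “99.6% of streams” comes from weighting each track by how often it is played. The sketch below shows that calculation on made-up numbers; the per-track play counts, the track IDs, and the function names are all hypothetical, and this is not the group’s actual tooling.

```python
def stream_coverage(play_counts: dict[str, int], archived: set[str]) -> float:
    """Share of all plays that land on archived tracks (hypothetical helper)."""
    total = sum(play_counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for tid, n in play_counts.items() if tid in archived)
    return covered / total


def track_share(play_counts: dict[str, int], archived: set[str]) -> float:
    """Share of distinct tracks that are archived, ignoring how often they play."""
    if not play_counts:
        return 0.0
    return sum(1 for tid in play_counts if tid in archived) / len(play_counts)


# Toy catalog: a few hits and a long tail of rarely played tracks (invented data).
plays = {"a": 1000, "b": 500, "c": 300, "d": 2, "e": 1,
         "f": 1, "g": 0, "h": 0, "i": 0, "j": 0}
kept = {"a", "b", "c"}

print(track_share(plays, kept))                # 0.3
print(round(stream_coverage(plays, kept), 4))  # 0.9978
```

Archiving just 30% of this toy catalog covers almost every play, which is the same long-tail effect the group describes at Spotify’s scale.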
Spotify’s popularity metric, a value from 0 to 100 calculated mainly from the total number of plays a track has had and how recent those plays are, was used to sort the songs. Songs with a popularity above 0 were preserved in their original Ogg Vorbis quality at 160 kbps. Songs with a popularity of 0 were transcoded to Ogg Opus at 75 kbps. According to the group, most listeners will not notice the difference, and it saves space.
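The storage rule described above is simple enough to sketch. The example below assumes a hypothetical `Track` record with a popularity score and a duration, and it estimates file size from the nominal bitrate only. Real Ogg files will differ somewhat because of variable bitrate encoding and container overhead, so treat the numbers as rough.

```python
from dataclasses import dataclass

# Bitrates as reported in the article, in kilobits per second.
ORIGINAL_VORBIS_KBPS = 160
OPUS_TRANSCODE_KBPS = 75


@dataclass
class Track:
    track_id: str      # hypothetical identifier
    popularity: int    # Spotify popularity score, 0-100
    duration_s: float  # track length in seconds (assumed to be known)


def storage_plan(track: Track) -> tuple[str, int]:
    """Return (codec, bitrate_kbps) following the reported rule."""
    if track.popularity > 0:
        return ("vorbis", ORIGINAL_VORBIS_KBPS)  # keep the original file
    return ("opus", OPUS_TRANSCODE_KBPS)         # transcode to save space


def estimated_bytes(track: Track) -> int:
    """Rough size from nominal bitrate: kbps * 1000 bit/s * seconds / 8 bits per byte."""
    _, kbps = storage_plan(track)
    return int(kbps * 1000 * track.duration_s / 8)


hit = Track("a", popularity=42, duration_s=180)
obscure = Track("b", popularity=0, duration_s=180)
print(storage_plan(hit), estimated_bytes(hit))          # ('vorbis', 160) 3600000
print(storage_plan(obscure), estimated_bytes(obscure))  # ('opus', 75) 1687500
```

For a three-minute track, dropping from 160 kbps to 75 kbps cuts the nominal size from about 3.6 MB to about 1.7 MB, a saving of roughly 53% on each long-tail song.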
The entire archive will be distributed via torrent in Anna’s Archive Containers (AAC), the project’s own file distribution standard. The release will be divided into several phases: all collected metadata has already been published, to be followed by the songs themselves (sorted by popularity, from most to least popular), additional metadata, album art, and a patch to restore the original files.
The activists added as much metadata as possible to each file: track title, URL, ISRC code, UPC, album art, replay gain data, and other information. The original Spotify files contained no metadata, so the group embedded it into the Ogg files without re-encoding the audio.
Spotify representatives told the media that a data breach had indeed occurred. They emphasized that the company had already identified and blocked the accounts involved in the illegal scraping and had implemented new security measures to prevent similar attacks in the future.
“Spotify has identified and suspended the nefarious accounts used for illegal scraping. We’ve implemented new security measures to combat such attacks and are actively monitoring for suspicious activity. Since day one, we have stood with artists in the fight against piracy, and we actively work with industry partners to protect content creators and safeguard their rights,” said Spotify spokesperson Laura Batey.
For now, the archive focuses exclusively on content preservation and is only accessible via torrents. However, the group admits that, if there is enough interest, it may add the ability to download individual files directly from its website.
