Shazam, audio search algorithm, spectrogram, fingerprint, audio signal, song recognition, multiple audio source recognition, STFT, anchor frequency, point frequency
The Shazam algorithm recognizes exact tracks by generating a spectrogram of the audio signal, identifying its relative peaks, and retaining only these peaks to obtain a simplified version of the spectrogram.
[...] Each address of the recording is used to search the fingerprint database for associated pairs ["absolute time of the anchor in the song", "song ID"]. In terms of time complexity, if the fingerprint database is in memory, the cost of the search is proportional to the number of addresses sent to Shazam (1500 in our case). This search returns a large number of pairs; for the rest of the article, let's say it returns M pairs.
1.4. Audio Fingerprint Comparison
Although M is huge, it is much smaller than the number of notes (time-frequency points) in all the songs. [...]
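The address lookup described above can be sketched as a hash-table search. This is a minimal illustration, not Shazam's actual implementation: the address layout (anchor frequency, point frequency, time delta) and the names `build_index` and `lookup` are assumptions made for this example. With an in-memory dictionary, each address costs one average constant-time lookup, so the total cost grows with the number of addresses sent.

```python
from collections import defaultdict

def build_index(entries):
    """Build a hypothetical in-memory fingerprint database.

    entries: iterable of (address, anchor_time, song_id), where address is
    assumed to be a hashable tuple such as (anchor_freq, point_freq, delta_t).
    """
    index = defaultdict(list)
    for address, anchor_time, song_id in entries:
        index[address].append((anchor_time, song_id))
    return index

def lookup(index, recording_addresses):
    """Return every (anchor_time, song_id) pair matching the recording's addresses.

    One dictionary access per address, so the cost is proportional to the
    number of addresses (plus the number of pairs returned).
    """
    pairs = []
    for address in recording_addresses:
        pairs.extend(index.get(address, []))
    return pairs

# Toy database: two songs share one address, a third address belongs to song 1 only.
index = build_index([
    ((10, 20, 3), 5, 1),
    ((10, 20, 3), 42, 2),
    ((11, 25, 1), 7, 1),
])
matches = lookup(index, [(10, 20, 3), (99, 99, 9)])
```

Here `matches` holds the M pairs for the toy recording; the address `(99, 99, 9)` is absent from the database and contributes nothing.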
[...] These tools have transformed the way we interact with music. Today, we can identify songs simply by letting our phones listen for a few seconds. The growth of these applications is not just about their practical aspect; it's also about creating links between people and the songs that matter to them. This technology is now part of everyday life, whether in a café, at a party or while remembering a tune from the past. Among these technologies, Shazam stands out as one of the most influential tools in music recognition. [...]
[...] For example, suppose our search has returned: 150 pairs for a song with 0 target zones in common with the recording; 20 pairs for a song with 0 target zones in common; 80 pairs for a song with 0 target zones in common; 110 pairs for a song with 0 target zones in common; 120 pairs for a song with 30 target zones in common; and 350 pairs for song 14, with 100 target zones in common. [...]
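The example above can be turned into a small ranking sketch. It assumes that songs sharing no target zone with the recording can be discarded and that the remaining candidates are ordered by the number of common target zones; the excerpt suggests this but does not state it. Only song 14 is named in the excerpt, so the other labels are placeholders, and `rank_candidates` is a hypothetical helper.

```python
def rank_candidates(results):
    """Keep songs with at least one common target zone, best match first.

    results: list of (song_label, pair_count, common_target_zones).
    Dropping zero-zone songs is an assumption made for this sketch.
    """
    kept = [r for r in results if r[2] > 0]
    return sorted(kept, key=lambda r: r[2], reverse=True)

# Figures from the example; labels other than "song 14" are placeholders.
results = [
    ("unnamed-1", 150, 0),
    ("unnamed-2", 20, 0),
    ("unnamed-3", 80, 0),
    ("unnamed-4", 110, 0),
    ("unnamed-5", 120, 30),
    ("song 14", 350, 100),
]
candidates = rank_candidates(results)
```

Four of the six songs drop out at this stage, which illustrates why counting common target zones is a cheap first filter before any finer comparison.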
[...] Real-time Processing and Latency Constraints
Implementing the Shazam algorithm in real time poses several challenges. The system must quickly process the incoming audio, transform it into a digital fingerprint, and search for a match in a large database, all within just a few seconds. This rapid processing is crucial to provide users with an immediate song identification. Managing the background noise and distortions present in live audio captures is a first major challenge, as seen in section 2.1. Shazam addresses this by focusing on the important frequencies, creating a unique and stable audio fingerprint that remains robust despite the surrounding noise. [...]