Distributed Hash Table, Music Recognition, Shazam, Data Serialization, Search Efficiency, Audio Fingerprint, Hash Function, Data Layout Optimization, Memory Management
This document discusses the operation of a distributed hash table for a music recognition system, specifically the Shazam application, and its benefits in improving search efficiency and reducing search time.
[...] This information allows instant access to any set of values, ensuring fast searches even in large databases.
Global Advantages
Data serialization improves both the performance and the scalability of music recognition systems. By eliminating fragmentation issues and optimizing access, it provides a solid foundation for efficiently managing millions, or even billions, of audio fingerprints.
To maximize search efficiency, a serialized data layout is used in the hash table. This technique organizes data sequentially in memory, allowing for much faster read operations. [...]
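A serialized layout of this kind can be illustrated with a minimal Python sketch. It assumes that the information giving instant access to a set of values is a (start, count) pair per key; the excerpt does not spell this out, so `serialize`, `lookup`, and the 64-bit integer identifiers are hypothetical choices made for illustration only.

```python
from array import array

def serialize(table):
    """Flatten {key: [track ids]} into one contiguous array plus an index.

    Assumption: the index stores (start, count) per key, so the values of
    a key occupy one contiguous slice of the flat array.
    """
    values = array("Q")          # unsigned 64-bit integers, stored contiguously
    index = {}
    for key in sorted(table):    # sorted only to make the layout deterministic
        ids = table[key]
        index[key] = (len(values), len(ids))
        values.extend(ids)
    return index, values

def lookup(index, values, key):
    """Return all values stored for key with a single contiguous read."""
    start, count = index.get(key, (0, 0))
    return values[start:start + count].tolist()

table = {5: [10, 11], 2: [7], 9: [1, 2, 3]}
index, values = serialize(table)
print(lookup(index, values, 9))
```

Because all values live in one flat array, the same buffer could in principle be written to disk as raw bytes (for example with `values.tobytes()`) and read back sequentially.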
[...] Principles of operation of a hash table
A hash table is a data structure that maps keys to values using a hash function. In the context of audio fingerprints, the key represents a parameter extracted from the audio signal (for example, a combination of frequency and energy), while the stored value can contain metadata of the corresponding track (such as a unique identifier) and additional information for similarity calculation (Wang & Sun, 2014). Each audio fingerprint is represented by a unique key, which allows lookups in constant time on average, O(1), provided the hash table is well designed. [...]
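The mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key packing in `make_key`, the class name `FingerprintTable`, and the use of a time offset as the "additional information" are all hypothetical and are not taken from the article.

```python
from collections import defaultdict

def make_key(freq_bin, energy_bin):
    """Pack two quantized features into one integer key (hypothetical scheme)."""
    return (freq_bin << 16) | (energy_bin & 0xFFFF)

class FingerprintTable:
    """Maps a fingerprint key to the (track_id, offset) pairs that produced it."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, key, track_id, offset):
        # offset is an assumed extra field usable for similarity scoring
        self._buckets[key].append((track_id, offset))

    def lookup(self, key):
        # Average-case constant time: one hash computation, one bucket read
        return self._buckets.get(key, [])

table = FingerprintTable()
key = make_key(440, 7)
table.add(key, track_id=1, offset=12)
table.add(key, track_id=2, offset=40)
print(table.lookup(key))
```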
[...]
- Hard Disk Compatibility: The structure is adapted to be stored on disk. This is all the more important when the size of the database exceeds the system's main memory capacity. Sophisticated and/or genetic algorithms allow for a significant reduction in data access time.
- Memory Fragmentation Reduction: By grouping the data associated with the same key in contiguous locations, this method minimizes fragmentation and ensures optimal use of resources. [...]
[...]
- Distribution of hash tables: Retrieving tracks reliably and with high performance is a major challenge. Modern systems therefore adopt a distributed approach, in which the hash table is spread across several nodes in a computing environment with good processing capabilities. According to Wang & Sun (2014), two distribution methods coexist: the first is key-oriented and the second is content-oriented (value-oriented). In the context of this work, the focus will be on the second, which, according to the authors, has several advantages that are detailed below. [...]
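One possible reading of the content-oriented method is sketched below. It assumes that each node indexes a disjoint subset of tracks in its own local hash table, that a query is sent to every node, and that the partial match counts are merged. The excerpt does not describe the query path, so the `Node` class, the `search` function, and the simple vote-counting score are assumptions made for illustration.

```python
from collections import Counter, defaultdict

class Node:
    """One cluster node holding a local hash table for its own tracks (assumed design)."""

    def __init__(self):
        self.table = defaultdict(list)

    def add_track(self, track_id, keys):
        for key in keys:
            self.table[key].append(track_id)

    def query(self, keys):
        # Count how many query keys match each locally stored track
        hits = Counter()
        for key in keys:
            for track_id in self.table.get(key, ()):
                hits[track_id] += 1
        return hits

def search(nodes, keys):
    """Send the query to every node and merge the partial match counts."""
    total = Counter()
    for node in nodes:
        total.update(node.query(keys))
    return total.most_common(1)[0][0] if total else None

nodes = [Node(), Node()]
nodes[0].add_track("track_a", [1, 2, 3])
nodes[1].add_track("track_b", [3, 4, 5])
print(search(nodes, [3, 4, 5]))
```

In a key-oriented scheme, by contrast, the key itself would determine which single node to contact; that variant is not shown here.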
[...]
- Simplified Scaling: Adding a new node to the cluster simply requires redistributing some of the existing shards to this node and generating a new hash table for these shards. This modularity greatly facilitates the scaling of the system (Wang & Sun, 2014).
- Optimization for Frequent Workloads: Systems can take advantage of the Pareto principle by identifying the most frequently accessed shards and either storing them on specialized nodes or replicating them on multiple nodes. This significantly reduces the response time for the most common queries (Wang & Sun, 2014).
- Data Layout Optimization: To maximize search efficiency, a serialized data layout is used in the hash table. [...]
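The scaling and hot-shard ideas above can be sketched as follows. Both functions are hypothetical: the balancing rule in `add_node` (take shards from the most loaded node until loads are roughly even) and the 80% threshold in `hot_shards` are assumptions chosen for illustration, not the authors' procedure. Rebuilding the hash table for the moved shards is left out.

```python
def add_node(assignment, new_node):
    """Move shards from the most loaded nodes to new_node until loads are roughly even.

    assignment maps node name -> list of shard names and is modified in place.
    """
    assignment[new_node] = []
    total = sum(len(shards) for shards in assignment.values())
    target = total // len(assignment)
    while len(assignment[new_node]) < target:
        donor = max(
            (n for n in assignment if n != new_node),
            key=lambda n: len(assignment[n]),
        )
        assignment[new_node].append(assignment[donor].pop())
    return assignment

def hot_shards(access_counts, share=0.8):
    """Return the most accessed shards that together cover `share` of all accesses."""
    total = sum(access_counts.values())
    hot, covered = [], 0
    ranked = sorted(access_counts.items(), key=lambda kv: kv[1], reverse=True)
    for shard, count in ranked:
        if covered >= share * total:
            break
        hot.append(shard)
        covered += count
    return hot

cluster = {"n1": ["s1", "s2", "s3"], "n2": ["s4", "s5", "s6"]}
add_node(cluster, "n3")
print(cluster)
print(hot_shards({"s1": 70, "s2": 15, "s3": 10, "s4": 5}))
```

The shards returned by `hot_shards` are the candidates for replication or for placement on specialized nodes.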