So I decided to create my own database for chapters, but at the moment I am struggling to identify audio files reliably.
With an ffmpeg build that includes chromaprint support, it is possible to create fingerprints of audio files:
ffmpeg -i "input.mp3" -f chromaprint fingerprint.txt
This works nicely, but the longer the audio file, the bigger the fingerprint (which is reasonable). Since I would like to store the fingerprint in a database, the smaller it is, the better. For a 25h+ audiobook, however, the process takes extremely long and produces a fingerprint file of roughly 5 MB (!).
I can think of three ways to solve this problem:
- Only fingerprint the first X minutes of the audio file (fast, relatively small fingerprint, but inaccurate)
- Hash the full fingerprint with e.g. SHA-512 (small stored value, but slow to compute, and is it accurate?)
- Hash the fingerprint of the first X minutes (fastest, small stored value, but the most inaccurate)
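To make the third option concrete, here is a minimal Python sketch of what I have in mind. The function names (`build_fingerprint_command`, `hash_fingerprint`, `fingerprint_and_hash`) and the 10-minute limit are just placeholders of mine, and it assumes ffmpeg is on the PATH and was built with chromaprint support. One thing I am fairly sure of: a SHA-512 digest only matches when the fingerprint is byte-for-byte identical, so two slightly different fingerprints of the same book would not match after hashing.

```python
import hashlib
import subprocess
from pathlib import Path
from typing import List


def build_fingerprint_command(audio_path: str, out_path: str, minutes: int) -> List[str]:
    """Build an ffmpeg call that fingerprints only the first `minutes` minutes."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-t", str(minutes * 60),   # read only this many seconds of the input
        "-i", audio_path,
        "-f", "chromaprint",       # same output format as in the command above
        out_path,
    ]


def hash_fingerprint(fingerprint: bytes) -> str:
    """Reduce a fingerprint to a fixed-size SHA-512 hex digest (128 characters)."""
    return hashlib.sha512(fingerprint).hexdigest()


def fingerprint_and_hash(audio_path: str, out_path: str, minutes: int = 10) -> str:
    """Run ffmpeg, then hash the fingerprint file it wrote (requires ffmpeg)."""
    subprocess.run(build_fingerprint_command(audio_path, out_path, minutes), check=True)
    return hash_fingerprint(Path(out_path).read_bytes())


if __name__ == "__main__":
    # Show the command that would be run, without actually invoking ffmpeg.
    print(" ".join(build_fingerprint_command("input.mp3", "fingerprint.txt", 10)))

    # A one-byte difference in the fingerprint gives a completely different digest,
    # so the hash can only be used for exact lookups, not for similarity matching.
    print(hash_fingerprint(b"fingerprint-a") == hash_fingerprint(b"fingerprint-b"))
```

The stored value is always 128 hex characters regardless of the audio length, which is the appeal, but the exact-match limitation is the part I am unsure about.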
Which would be the best way?
Are there other ways I did not think of? Thank you.