Feature request: statistics tracking
  • Would it be possible to implement stats tracking like this?

    https://docs.google.com/document/d/16n8YdBEYvvLNvbu2JMk43ZujpDLUZG8KvWn9dvUaekA/edit

    Thanks in advance.
  • The problem with this kind of tracking is performance.
    Also, most statistics and metadata such as playcounts, ratings, labels and tags can be written directly to the files, so if you move them you just need to update the collection.
  • I don't want to write stats to my files because there is more than one user of those files, so that's a non-solution.

    Can you explain what the performance problem is, in more detail? Thanks in advance.
  • Do the other users use Guayadeque too? Playcounts, ratings and labels will only be visible to Guayadeque users.

    Things like creating a unique id for each file will make the scanning process very slow. Even if it takes just a second per song, scanning a collection of tens of thousands of songs will take several hours to complete.
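To put rough numbers on that estimate, here is a back-of-the-envelope sketch. It assumes a strictly sequential scan at a fixed cost per file; the track counts are illustrative, not measurements of any real collection.

```python
def scan_hours(track_count: int, seconds_per_track: float) -> float:
    """Total time of a sequential scan, in hours."""
    return track_count * seconds_per_track / 3600.0


# Illustrative collection sizes, one second per track as in the post above.
for tracks in (10_000, 30_000, 50_000):
    print(f"{tracks:>6} tracks at 1 s each: {scan_hours(tracks, 1.0):.1f} h")
```

At one second per track, 10,000 tracks come to roughly 2.8 hours, so the "several hours" figure holds only if fingerprinting really costs that much per file and blocks the scan.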
  • Yes, other users use Guayadeque too, they have their own independent stats. As I said, writing per-user information in files is a non-solution. Perhaps you would consider writing this information in user.XXXX.YYYY extended attributes instead?

    ---------------------------------------

    Creating a unique ID for each file should take less than a second with the algorithm I proposed. It would be content-based such that you don't have to worry about tags changing. It would *also* be optional.

    Finally, generating unique file IDs should be done *in the background*, *after* scanning the library, rather than at the same time as the scan. That way, even if the process turns out to be slow, it does not affect the use of Guayadeque: it can always be completed incrementally and eventually.
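A minimal sketch of what "in the background, after the scan" could look like. This is not Guayadeque code: `start_uid_worker`, `compute_uid` and the `store` dictionary are hypothetical names, and a real implementation would write to the database instead of a dict.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def start_uid_worker(paths: Iterable[str],
                     compute_uid: Callable[[str], str],
                     store: Dict[str, str]) -> threading.Thread:
    """Fingerprint `paths` on a background thread, filling store[path] = uid."""
    pending = queue.Queue()
    for path in paths:
        pending.put(path)
    pending.put(None)  # sentinel: tells the worker there is nothing left

    def run() -> None:
        while True:
            path = pending.get()
            if path is None:
                break
            try:
                store[path] = compute_uid(path)
            except OSError:
                # Unreadable file: leave it without a UID so that a later
                # pass can try again.
                pass

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker


# Usage: the scan finishes first, then the files that still lack a UID are
# handed to the worker. The lambda stands in for a real fingerprint function.
uids: Dict[str, str] = {}
worker = start_uid_worker(["a.mp3", "b.mp3"], lambda p: "uid-" + p, uids)
worker.join()  # a player would keep running instead of waiting here
```

Because the work list is just "tracks with no UID yet", the process can be interrupted and resumed at any time, which is what makes it incremental.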
  • I remember a similar suggestion was made in the old thread in Ubuntu forums. I supported the idea, but anonbeat said it would make the program slow.

    Anyway, it could be investigated, the problem is always time and priorities. I suggest you write an idea in the IdeaTorrent so that others can vote: http://sourceforge.net/apps/ideatorrent/guayadeque/ideatorrent/
  • anonbeat is right if the implementation is poor. If the implementation is solid, it would make the program no slower than it is today.
  • I posted an idea on ideatorrent.
  • It's now accepted.
  • I still have my doubts about how this could impact the performance.
    Suppose that you rename a directory. Guayadeque will find a new directory about which it has no information. To check whether those files were in the library before, and restore their statistics, it will have to generate the UIDs for each of the files and, if it finds matches, update the entries. All this would happen while updating the database and not "in the background".

    And what if you have a file copied several times? Each copy will have its own statistics. Now suppose that you move 2 of them. How would it know which is which?
  • Yes. You are correct. In order to reassign statistics rows to the newly discovered files, guayadeque would have to compute the fingerprints for each of the files (as per the spec I wrote). To get this right, guayadeque would do what it already does today (namely, scan the new directory) and *then* generate the UID in a background thread. As new UIDs are generated for files that were already scanned but not fingerprinted yet, guayadeque can then start reassigning statistics based on this.

    Example

    /d/music/AlbumX/{Track1,Track2,Track3} gets renamed to /d/music/AlbumZ/{Track1,Track2,Track3}

    Then guayadeque is told to update its collection

    guayadeque updates its track table. This marks the tracks within AlbumX as deleted, and adds rows for /d/music/AlbumZ/{T1..T3}.

    So far, guayadeque has done nothing new here.

    Then when the scan is done (or perhaps in parallel as new elements are found), guayadeque generates UIDs for those files that are missing in the UID table.

    INTERESTINGLY, once you are done generating the UID for AlbumZ/Track1, you can immediately query your UID table, asking "do you have this UID here?". If there are rows with that UID, you immediately know all the old paths (if you did this right, you will only have one "obsolete" path), so you can safely update all tables that have said paths everywhere, with the new path of AlbumZ/Track1.

    And so on, and so forth.

    If you have two copies of the same file, guayadeque will add them both to the collection as it does today. Then, while scanning for missing UIDs, generating the second file's UID triggers the discovery process above ("hey, this file happens to be at that path already"), which causes the stats of the second file to be redirected to the first file. Care must be taken to *merge* stats in the case that there are already stats for both files.

    For caution's sake, I would make this feature opt-in, and mutually exclusive with "write statistics to tracks".
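The lookup-and-reassign step described above can be sketched with SQLite. Everything here is an assumption made for illustration: the `uids` and `stats` tables, the function names, and the merge policy (sum the playcounts, keep the higher rating). None of it is Guayadeque's actual schema.

```python
import os
import sqlite3


def create_schema(conn):
    # Hypothetical tables: one UID per known path, one stats row per path.
    conn.execute("CREATE TABLE uids (path TEXT PRIMARY KEY, uid TEXT NOT NULL)")
    conn.execute("CREATE TABLE stats (path TEXT PRIMARY KEY, "
                 "playcount INTEGER NOT NULL, rating INTEGER NOT NULL)")


def merge_stats(conn, src, dst):
    """Fold the stats row of `src` into `dst` and drop the `src` row."""
    row = conn.execute("SELECT playcount, rating FROM stats WHERE path = ?",
                       (src,)).fetchone()
    if row is None:
        return  # nothing recorded for src
    old = conn.execute("SELECT playcount, rating FROM stats WHERE path = ?",
                       (dst,)).fetchone()
    if old is None:
        # dst has no stats yet: simply re-point the row.
        conn.execute("UPDATE stats SET path = ? WHERE path = ?", (dst, src))
    else:
        # Both have stats: merge (assumed policy: sum plays, keep best rating).
        conn.execute("UPDATE stats SET playcount = ?, rating = ? WHERE path = ?",
                     (old[0] + row[0], max(old[1], row[1]), dst))
        conn.execute("DELETE FROM stats WHERE path = ?", (src,))


def adopt_uid(conn, path, uid, exists=os.path.exists):
    """Record `uid` for `path`; return the path that now owns the stats."""
    others = [r[0] for r in conn.execute(
        "SELECT path FROM uids WHERE uid = ? AND path <> ? ORDER BY path",
        (uid, path))]
    live = [p for p in others if exists(p)]      # duplicates still on disk
    gone = [p for p in others if not exists(p)]  # obsolete (renamed/moved)
    target = live[0] if live else path
    for src in gone + [path]:
        if src != target:
            merge_stats(conn, src, target)
    conn.executemany("DELETE FROM uids WHERE path = ?", [(p,) for p in gone])
    conn.execute("INSERT OR REPLACE INTO uids (path, uid) VALUES (?, ?)",
                 (path, uid))
    conn.commit()
    return target


# Usage: AlbumX was renamed to AlbumZ, so the old path no longer exists.
conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO uids VALUES ('/d/music/AlbumX/Track1', 'abc')")
conn.execute("INSERT INTO stats VALUES ('/d/music/AlbumX/Track1', 5, 4)")
owner = adopt_uid(conn, "/d/music/AlbumZ/Track1", "abc", exists=lambda p: False)
```

In the rename case the stats follow the file to its new path. In the duplicate case (the other path still exists) the new copy's stats are folded into the earlier file, which matches the "redirect and merge" behaviour described above.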
  • Remember that UID generation, at least in the spec I wrote, is really fast. It only needs a handful of audio frames (which you can read with posix_fadvise hints to prevent readahead), and those frames won't change when the tags themselves change.
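A rough sketch of a tag-independent fingerprint along those lines. It is not the algorithm from the linked spec, which is not reproduced in this thread: it assumes MP3-style files with an optional leading ID3v2 tag, hashes a fixed chunk of raw bytes after that tag rather than parsing audio frames, and uses `content_uid`, `audio_start_offset` and the 64 KiB chunk size as made-up names and values.

```python
import hashlib
import os


def audio_start_offset(f) -> int:
    """Return the offset just past a leading ID3v2 tag, or 0 if there is none."""
    f.seek(0)
    header = f.read(10)
    if len(header) == 10 and header[:3] == b"ID3":
        # The tag size is stored as four 7-bit ("synchsafe") bytes and does
        # not include the 10-byte header itself.
        size = 0
        for byte in header[6:10]:
            size = (size << 7) | (byte & 0x7F)
        footer = 10 if header[5] & 0x10 else 0  # optional ID3v2.4 footer
        return 10 + size + footer
    return 0


def content_uid(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a fixed-size chunk of data that follows the leading tag."""
    with open(path, "rb") as f:
        if hasattr(os, "posix_fadvise"):
            try:
                # Hint that access is random so the kernel skips readahead.
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_RANDOM)
            except OSError:
                pass  # advice is only a hint; carry on without it
        f.seek(audio_start_offset(f))
        data = f.read(chunk_size)
    return hashlib.sha1(data).hexdigest()
```

Because only one small chunk per file is read, the cost is dominated by a seek and a short read. One known gap in this sketch: for files shorter than the chunk size, a trailing ID3v1 tag would be included in the hash.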
