linux sound tags

Categorizing Linux audio applications

As a first step towards cleaning up the tag mess at http://apps.linuxaudio.org, I started to visualize “the problem”. It turns out there are ~1800 tagged pages with ~200 unique tags 1).

Most of the tags go back to headlines in the original data of linux-sound.org. All applicable headings (<h1>..<h5> and <li><b>..) were used as tags for a given link in a flat structure. However, the interrelations between the tags (headings) have been maintained by tagging the tags themselves. 57 of the 203 tags are not tagged (orphaned, loose); these are either top-level tags or new additions that have not yet been categorized. An index of all tags can be generated at apps.linuxaudio.org; a text file is available with the sources below.
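
The idea of "tagging the tags" can be sketched in a few lines of Python. This is only an illustration: the parent/child relations below are hypothetical toy data (only the tag names are taken from the wiki), and the real data lives in dokuwiki pages, not in a dictionary.

```python
# Hypothetical toy data: each tag maps to the set of tags assigned to it.
# The relations are invented for illustration; only the names are real.
tag_parents = {
    "midi_software": {"linux_audio_tools"},
    "all_things_jack": {"linux_audio_tools"},
    "linux_audio_tools": set(),   # a top-level tag
    "on-line_articles": set(),    # a tag that is not yet categorized
}

def orphaned_tags(parents):
    """Return, sorted by name, all tags that carry no tag themselves."""
    return sorted(tag for tag, tagged_with in parents.items() if not tagged_with)

print(orphaned_tags(tag_parents))
```

Note that such a check cannot tell top-level tags from uncategorized new additions; both simply show up as untagged.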

The most commonly used tags are:

  154 software_sound_synthesis_and_music_composition_packages
  145 other_documentation_and_newsworthy_items
  137 all_things_jack
  119 midi_software
  115 linux_audio_tools
  113 tools_to_make_tools
  110 signal_analysis_processing_software
  102 on-line_articles
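
A usage count like the list above could be reproduced roughly as follows. This is a minimal sketch and not the script that produced the numbers: it assumes the pages store their tags in the `{{tag>...}}` syntax of the DokuWiki tag plugin, and the sample page texts are invented.

```python
import re
from collections import Counter

# Assumption: tags are written as {{tag>tag_a tag_b}} (DokuWiki tag plugin syntax).
TAG_RE = re.compile(r"\{\{tag>([^}]*)\}\}")

def count_tags(pages):
    """Count how many pages use each tag; `pages` is an iterable of page texts."""
    counts = Counter()
    for text in pages:
        tags = set()  # a tag counts once per page, even if it is repeated
        for match in TAG_RE.finditer(text):
            tags.update(match.group(1).split())
        counts.update(tags)
    return counts

# Invented sample pages, for illustration only.
sample_pages = [
    "Some synth. {{tag>midi_software all_things_jack}}",
    "A sequencer. {{tag>midi_software}}",
    "A page without tags.",
]

for tag, n in count_tags(sample_pages).most_common():
    print(f"{n:5d} {tag}")
```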

Conclusion

I dropped my initial intention of reducing the number of tags to a manageable amount. On the contrary: some of the multi_word_tags should be broken up, which adds tags rather than removing them. The overall conclusion is that we just need a better user interface to browse and assign tags.

dokubookmark already prototypes a check-box interface to manage tags. For the apps-wiki a tag-cloud or hierarchical display would be needed to ultimately improve usability.

More discoveries

Now that the tags are visible, I discovered a bug in the tag-inheritance parser that imported the data from linux-sound to the apps-wiki: content picked up additional, false tags wherever a sub-level heading was not closed. Most prominently affected are SOAL and openAL.
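
This kind of bug is easy to reproduce. The following is a minimal sketch of heading-based tag inheritance, not the original import parser: the simplified input format, the function name and the sample data are all hypothetical. Each link inherits every heading that is open when it appears; if headings are never closed, stale tags leak onto later links.

```python
def tag_links(items, close_headings=True):
    """Assign to each link the headings that are open when it appears.

    `items` is a list of ("h", level, title) or ("link", name) tuples.
    With close_headings=False, open headings are never closed, which
    mimics the bug: stale headings leak onto later links.
    """
    open_headings = []  # stack of (level, title)
    result = {}
    for item in items:
        if item[0] == "h":
            _, level, title = item
            if close_headings:
                # A new heading closes all open headings of the same or a deeper level.
                while open_headings and open_headings[-1][0] >= level:
                    open_headings.pop()
            open_headings.append((level, title))
        else:
            result[item[1]] = [title for _, title in open_headings]
    return result

# Hypothetical input, for illustration only.
doc = [
    ("h", 1, "games"),
    ("h", 2, "3d_audio"),
    ("link", "app_a"),
    ("h", 1, "midi_software"),
    ("link", "app_b"),
]

print(tag_links(doc))                        # app_b is tagged midi_software only
print(tag_links(doc, close_headings=False))  # app_b also inherits the stale headings
```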

I'm pondering a re-import of the data, but that would require magic to merge the updates. Magic is the cue, so I'm now writing a Perl script to manage, rename and clean up the tags: dokutagitor.pl

Source image of all tags (1.8MB)

The data was generated from the wiki content (dokuwiki text files) using this greppy shell script and graphviz. The PNG/JPEG images turned out to be rather large (2400×2200 pixels, ~2MB). I suggest you download and look at the SVG or render images from source.

Legend

The color of each tag gives a hint of how often it is used. This value is also printed in brackets after each tag name. Red increases with more common usage. White indicates end-points, and the shades of blue/green jump at usage counts < 20, < 40 and >= 40.
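
As a rough illustration of such a threshold-based coloring, here is a small Python sketch that emits a graphviz node statement. It is not the script used for the image: the function names, the concrete color names and the exact label format are assumptions; only the thresholds (20 and 40) and the white end-points come from the legend.

```python
def node_color(count, is_endpoint=False):
    """Pick a graphviz fill color from a tag's usage count (colors are placeholders)."""
    if is_endpoint:
        return "white"
    if count < 20:
        return "lightblue"
    if count < 40:
        return "lightgreen"
    return "salmon"

def dot_node(tag, count, is_endpoint=False):
    """Build one DOT node statement with the usage count appended to the label."""
    color = node_color(count, is_endpoint)
    return f'"{tag}" [label="{tag} ({count})", style=filled, fillcolor="{color}"];'

print(dot_node("midi_software", 119))
```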

Arrows should go from parent topic-tag to child tag, but the current script does not filter out back-links, so the direction is more or less random. It is not relevant for this analysis anyway.

1) Numbers as of August 9, 2008
 
blog/linux_sound_tags.txt · Last modified: 11.08.2008 21:37 (external edit)