Categorizing linux audio applications
As a first step toward cleaning up the tag mess at http://apps.linuxaudio.org, I started to visualize “the problem”. It turns out there are ~1800 tagged pages with ~200 unique tags.
Most of the tags go back to headlines in the original data of linux-sound.org. All applicable headings (<h1>..<h5> and <li><b>..) were used as tags for a given link in a flat structure. However, interrelations between the tags (headings) have been maintained by tagging the tags themselves. 57 of the 203 tags are not tagged (orphaned, loose); these are either top-level tags or new additions that have not yet been categorized. An index of all tags can be generated at apps.linuxaudio.org; a text file is available with the sources below.
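The orphaned tags mentioned above are simply the tags whose own tag page carries no tags. A minimal sketch of that check, assuming the tag-to-parent relations have already been extracted into a dictionary (the function name and the sample relations are hypothetical, not taken from the actual scripts):

```python
def find_orphans(tag_parents):
    """Return the tags that are not tagged themselves, sorted by name."""
    return sorted(tag for tag, parents in tag_parents.items() if not parents)


# Hypothetical sample: key = tag, value = tags found on that tag's own page.
# The tag names are real, the relations between them are made up.
sample = {
    "midi_software": {"linux_audio_tools"},
    "linux_audio_tools": set(),
    "all_things_jack": set(),
}

print(find_orphans(sample))
```

Whether an orphan is a genuine top-level tag or an uncategorized addition still has to be decided by hand.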
The most commonly used tags are:
154 software_sound_synthesis_and_music_composition_packages
145 other_documentation_and_newsworthy_items
137 all_things_jack
119 midi_software
115 linux_audio_tools
113 tools_to_make_tools
110 signal_analysis_processing_software
102 on-line_articles
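A usage ranking like the one above can be reproduced by counting tag occurrences across the wiki text files. The sketch below assumes the pages mark their tags with the DokuWiki tag plugin syntax `{{tag>tag_a tag_b}}`; the function name and the sample pages are hypothetical.

```python
import re
from collections import Counter

# Assumption: tags are listed space-separated inside {{tag>...}}.
TAG_RE = re.compile(r"\{\{tag>([^}]*)\}\}")


def count_tags(pages):
    """Count how often each tag occurs across an iterable of page texts."""
    counts = Counter()
    for text in pages:
        for match in TAG_RE.finditer(text):
            counts.update(match.group(1).split())
    return counts


# Hypothetical page contents, for illustration only.
pages = [
    "A synth. {{tag>midi_software all_things_jack}}",
    "A patchbay. {{tag>all_things_jack}}",
]

for tag, n in count_tags(pages).most_common():
    print(n, tag)
```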
Conclusion
I abandoned my initial intention to reduce the number of tags to a manageable amount. On the contrary, some of the multi_word_tags should be broken up. The overall conclusion is that we just need a better user interface to browse and assign tags.
dokubookmark already prototypes a check-box interface to manage tags. For the apps-wiki a tag-cloud or hierarchical display would be needed to ultimately improve usability.
More discoveries
With the tags now visualized, I discovered a bug in the tag-inheritance parser that imported the data from linux-sound to the apps-wiki: wherever a sub-level heading was not closed, content received additional false tags. Most prominently affected are SOAL and openAL.
I'm pondering a re-import of the data, but that would require magic to merge the updates. Magic is the cue, so I'm now writing a Perl script to manage, rename and clean up the tags: dokutagitor.pl
The data was generated from the wiki content (DokuWiki text files) using this greppy shell script and graphviz. The png/jpeg images turned out to be rather large (2400×2200 pixels, ~2MB). I suggest you download and look at the SVG or render the images from source.
fdp tags.dot -Tpng > graph_all.png
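For readers who want to build their own tags.dot for the command above, here is a minimal sketch of emitting Graphviz DOT from tag relations and usage counts. It is not the original shell script; the function name is hypothetical, and the sample edge is made up (only the two counts come from the list of common tags).

```python
def to_dot(edges, counts):
    """Build a DOT digraph with one node per tag, labelled 'name (count)'."""
    lines = ["digraph tags {"]
    for tag in sorted(counts):
        lines.append(f'  "{tag}" [label="{tag} ({counts[tag]})"];')
    for parent, child in edges:
        lines.append(f'  "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical relation between two real tags.
counts = {"linux_audio_tools": 115, "midi_software": 119}
edges = [("linux_audio_tools", "midi_software")]

with open("tags.dot", "w") as handle:
    handle.write(to_dot(edges, counts) + "\n")
```

Quoting the node names keeps tags such as on-line_articles valid, since an unquoted hyphen is not allowed in a DOT identifier.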
Legend
The color of each tag hints at how often it is used. This value is also printed in brackets after each tag name. Red increases with more common usage. White indicates end-points, and the shades of blue/green change at usage counts < 20, < 40 and >= 40.
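The thresholds in the legend amount to a simple bucketing of the usage count. A minimal sketch, assuming end-points take priority over the count; the function name and bucket labels are hypothetical, and the actual color values used in the graph are not reproduced here:

```python
def usage_bucket(count, is_endpoint=False):
    """Map a usage count to one of the legend's buckets."""
    if is_endpoint:
        # Assumption: end-points are drawn white regardless of count.
        return "endpoint"
    if count < 20:
        return "under_20"
    if count < 40:
        return "under_40"
    return "40_or_more"


for n in (5, 20, 39, 40, 154):
    print(n, usage_bucket(n))
```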
Arrows should go from parent topic-tag to child tag, but the current script does not filter out back-links, so the direction is more or less random. It is not relevant for this analysis anyway.
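If the back-links ever need to be filtered, one possible approach is to keep only the first edge seen between any two tags and drop its reverse. This is a sketch of that idea, not something the current script does; it cannot tell which direction is the true parent-to-child one, it merely avoids drawing both. The function name is hypothetical.

```python
def drop_back_links(edges):
    """Keep the first edge seen between two tags; skip duplicates and reverses."""
    kept = []
    seen = set()
    for parent, child in edges:
        if (parent, child) in seen or (child, parent) in seen:
            continue
        seen.add((parent, child))
        kept.append((parent, child))
    return kept


print(drop_back_links([("a", "b"), ("b", "a"), ("a", "c")]))
```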