Properties Hack Week

DISCLAIMER: This spec is still updated on a frequent basis!

Properties Hack Week is an initiative to fix up all property names (and friends) in Beagle. Until now, the authors of backends and filters have added and used property names at will, without following any naming convention. This has made the API very difficult to use: the same types from different sources don't share property names, and property names change very often.

The goal of this event is to create a specification for naming the basic properties of result elements, and to add extended properties for specific types of results (artists in media files, dimensions in image files, etc).

Since this event will surely break API compatibility, we may consider fixing/cleaning up other stuff as well. I would like to see the Beagle API become easy, fun and feasible to use.

I would still like to point out one of the early things which fascinated me about the Beagle backends and filters. All the code shipping in Beagle will be changed according to the specification below, and future authors are encouraged to follow the specification. But the specification is not binding in general. Authors can use different names and add new ones at will if they are using their own search frontend. Note, however, that beagle-search and probably other Beagle frontends won't handle properties with names outside this specification.

However, it is crucial that any new backends/filters you want included in the mainstream follow this specification.

People

Put your name here if you want to join the initiative.

  • Lukas Lipka
  • dBera
  • Arun Raghavan
  • Kevin Kubasik
  • Roman Telicak
  • Lukas Zboron

To-Do

Active:

  • Write the property naming spec (Lukas - in progress)

Depends on previous:

  • Update individual backends (see below)
  • Update individual filters (see below)
  • Since several properties in the "beagle" namespace are modified, verify that the LuceneQueryingDriver, LuceneIndexingDriver and LuceneCommon and the code for other beagle-specific properties work as expected (needs extensive testing)
  • Update property search mapping in query syntax (PropertyKeywordFu)
  • Update beagle-search tiles
  • Revisit search/TypeFilter.cs, it contains some code with respect to property types
    • This is about a special "@" syntax in beagle-search. Currently, beagle-search has a hackish (IMO) way of specifying types, mimetypes etc. using i18n-ized strings (see search/TypeFilter.cs). If a user wants to handwrite a query that searches only images, they have to type "filetype:image" in their own language. Since beagle-search has to map the i18n-ized string for "filetype:image" to the ASCII "filetype:image" anyway, why not go full throttle and remove the requirement that the user know whether to query by mimetype, filetype, source or hittype (or any other property, for that matter)? How about using i18n-ized macros, only in beagle-search, which are expanded to the right string when sent to beagled? E.g. "@image" (i18n-ized) would be a macro for "filetype:image" (ASCII), "@java" would be a macro for "ext:java", "@pidgin" would be a macro for "source:pidgin" etc. So, searching for emails about property hack week would look like "property hack week @email" (with @email in my own sweet language). It would make the query syntax easier and more intuitive. To search all images, just do "@image". It would be great to have the i18n mapping in beagled itself, but currently that does not happen. Note that this is not about any UI; the UI will have a drop-down list or options to add the correct QueryPart based on the user's choice.
  • Update web interface property name mapping
  • Update xesam-adaptor ontology mapping
  • Up-up-up the Lucene index version
  • Backends "opening" results (Lukas)
  • Food for thought - what if we indexed IMs by line instead of using the whole-file view? The file-based view really isn't the best one, because in Pidgin, for example, it only reflects when the user closed the window.
  • Backends can add their own specific properties that will be used only within them (for example properties that are key to be able to open the file), but these should not be sent out to the clients --- is this already possible? (And they should most likely be in their own namespace - backend:foo?)
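The "@" macro idea from the TypeFilter item above could be sketched roughly as follows. This is only an illustration: the QueryMacros class and its Expand method are hypothetical names, not existing beagle-search code, and the table holds only the three mappings mentioned above, with plain ASCII macro names standing in for the i18n-ized ones.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Beagle: expands "@" macros into
// the ASCII property queries that would be sent to beagled.
public static class QueryMacros
{
    // Only the mappings named in the text above. A real table would be
    // keyed by the translated (i18n-ized) macro names.
    private static readonly Dictionary<string, string> Macros =
        new Dictionary<string, string>
    {
        { "@image", "filetype:image" },
        { "@java", "ext:java" },
        { "@pidgin", "source:pidgin" },
    };

    // Replaces every whitespace-separated word that is a known macro;
    // all other words, including unknown "@" words, pass through unchanged.
    public static string Expand(string query)
    {
        string[] words = query.Split(' ');
        for (int i = 0; i < words.Length; i++) {
            string expansion;
            if (Macros.TryGetValue(words[i], out expansion))
                words[i] = expansion;
        }
        return String.Join(" ", words);
    }
}
```

With this sketch, QueryMacros.Expand ("holiday @image") gives "holiday filetype:image". Keeping the table inside beagle-search matches the note above that beagled currently has no i18n mapping of its own.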

Cheat sheet

The clean up will take place on the beagle-cleanup-branch.

$ svn checkout http://svn.gnome.org/svn/beagle/branches/beagle-cleanup-branch
$ make && sudo make install

Hit types

We should consider renaming the current hit types to be nice and short.

Having a way to distinguish where data came from and their types is nice, but keep in mind that making it too complicated will make it hard for third party applications/developers to use.

Hit type | Previous hit type name
beagle:file | File
beagle:im | IMLog
beagle:email | MailMessage
beagle:webpage | WebHistory, Bookmark
beagle:note | Note
beagle:task | Task
beagle:calendar | Calendar
beagle:contact | Contact
beagle:feed | FeedItem
beagle:documentation | MonodocEntry, DocbookEntry
PUNTED | MonodocEntry (rename --- this is horrid)
PUNTED | DocbookEntry (rename --- this is horrid)
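For code that still carries the old names, the renaming above amounts to a lookup table. The sketch below is an assumption about how a migration helper might look; HitTypeMigration is a hypothetical name, and the two documentation entries are tentative because the table also lists them as PUNTED.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Beagle: maps the previous hit type
// names to the names proposed in the table above.
public static class HitTypeMigration
{
    private static readonly Dictionary<string, string> Renames =
        new Dictionary<string, string>
    {
        { "File", "beagle:file" },
        { "IMLog", "beagle:im" },
        { "MailMessage", "beagle:email" },
        { "WebHistory", "beagle:webpage" },
        { "Bookmark", "beagle:webpage" },
        { "Note", "beagle:note" },
        { "Task", "beagle:task" },
        { "Calendar", "beagle:calendar" },
        { "Contact", "beagle:contact" },
        { "FeedItem", "beagle:feed" },
        // Tentative: the table also marks these two as PUNTED.
        { "MonodocEntry", "beagle:documentation" },
        { "DocbookEntry", "beagle:documentation" },
    };

    // Returns the proposed name, or the input unchanged if no rename is listed.
    public static string Rename(string oldType)
    {
        string renamed;
        return Renames.TryGetValue(oldType, out renamed) ? renamed : oldType;
    }
}
```

Note that WebHistory and Bookmark collapse into one hit type, so the mapping cannot be reversed without an extra property such as webpage:type.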

Property types

All properties are stored as strings in the Lucene index. The following list defines the property types and, for each, the equivalent C# type whose Parse method converts the stored string into a first-class value (beagle:string needs no conversion). You are not required to use these C# types; they are listed for reference only.

Type | C# equivalent
beagle:string | System.String
beagle:boolean | System.Boolean
beagle:integer | System.Int32
beagle:double | System.Double
beagle:datetime | System.DateTime
beagle:timespan | System.TimeSpan

For example, to get the page count of a document, where dc:extent is stored as beagle:integer:

// 'hit' is a Beagle.Hit taken from a query result.
int pages = System.Int32.Parse (hit ["dc:extent"]);
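The same idea extends to the other property types. The following is a minimal sketch, not Beagle code: PropertyParser is a hypothetical name, and it assumes the stored strings are in a form the standard .NET Parse methods accept under the invariant culture. This spec does not say how Beagle actually serializes dates or timespans, so treat that part as an assumption.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of Beagle: converts a stored property
// string into the C# type listed for its beagle type.
public static class PropertyParser
{
    public static object Parse(string beagleType, string raw)
    {
        // Assumption: values were written in an invariant-culture format.
        CultureInfo inv = CultureInfo.InvariantCulture;
        switch (beagleType) {
        case "beagle:string":
            return raw;
        case "beagle:boolean":
            return Boolean.Parse(raw);
        case "beagle:integer":
            return Int32.Parse(raw, inv);
        case "beagle:double":
            return Double.Parse(raw, inv);
        case "beagle:datetime":
            return DateTime.Parse(raw, inv);
        case "beagle:timespan":
            return TimeSpan.Parse(raw, inv);
        default:
            throw new ArgumentException("Unknown property type: " + beagleType);
        }
    }
}
```

A caller would cast the result, for instance (int) PropertyParser.Parse ("beagle:integer", "12"). Parsing with an explicit culture avoids surprises with decimal separators on localized desktops.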

DCMI terms

DCMI Term | Type | Description | Status
This is a basic set of properties each result element will *ALWAYS* contain.
dc:title | beagle:string | A name given to the resource. | ACCEPTED
dc:date | beagle:datetime | A point or period of time associated with an event in the lifecycle of the resource. | Duplication?
dc:identifier | beagle:string | An unambiguous reference to the resource within a given context. | Duplication?
These properties are optional but should always be defined if available.
dc:subject | beagle:string | The topic of the resource. | Use only dc:title?
dc:creator | beagle:string | An entity primarily responsible for making the resource. | ACCEPTED
dc:contributor | beagle:string | An entity responsible for making contributions to the resource (other than the author). | dc:creator?
dc:language | beagle:string | A language of the resource. | ACCEPTED
dc:rights | beagle:string | Information about rights held in and over the resource. | ACCEPTED
dc:extent | variable | The extent of the resource. |
dc:format | beagle:string | The file format of the resource (MIME). | Duplication?
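A filter or backend author could check the "always" set mechanically. The sketch below is an assumption, not Beagle code: RequiredTerms is a hypothetical name, and a plain dictionary stands in for a hit's properties. Keep in mind that dc:date and dc:identifier are still marked "Duplication?" above, so the required set may change.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of Beagle: reports which of the
// always-required DCMI terms are absent or empty.
public static class RequiredTerms
{
    // The basic set from the table above (subject to the open
    // "Duplication?" questions).
    private static readonly string[] Always =
        { "dc:title", "dc:date", "dc:identifier" };

    public static List<string> Missing(IDictionary<string, string> properties)
    {
        var missing = new List<string>();
        foreach (string term in Always) {
            string value;
            if (!properties.TryGetValue(term, out value) || String.IsNullOrEmpty(value))
                missing.Add(term);
        }
        return missing;
    }
}
```

An empty list means the basic set is complete; anything else could be reported under MISSING in the update messages described later on this page.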

Global properties

Beagle metadata
Property | Type | Description | Multi-property | Status
beagle:type | beagle:string | The hit type. | No | ACCEPTED
beagle:application | beagle:string | The name of the application associated with the hit. | No | ACCEPTED
beagle:source | beagle:string | The name of the backend this hit came from. | No | ACCEPTED

This is a proposal for the introduction of user-metadata properties. The main reason behind this is the possibility of doing queries like "show me all data with rating over 4" or "show me all data tagged with work". This is based on the assumption that there will be a way to rate/tag/etc. data globally in the future, directly from the desktop.

User metadata
Property | Type | Description | Multi-property | Status
user:tag | beagle:string | One or more tags associated with the data object. | Yes | ACCEPTED
user:rating | beagle:integer | Rating of the data object on a scale of 1 - 5. | No | ACCEPTED
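The two example queries ("rating over 4", "tagged with work") can be illustrated on the client side. This is only a sketch under assumptions: UserMetadataQuery is a hypothetical name, and a dictionary of string lists stands in for a hit, since user:tag is a multi-property. It says nothing about how beagled would evaluate such a query.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper, not part of Beagle: tests user metadata on a
// property bag where each name maps to one or more string values.
public static class UserMetadataQuery
{
    // True if any user:tag value equals the given tag.
    public static bool HasTag(IDictionary<string, List<string>> props, string tag)
    {
        List<string> tags;
        return props.TryGetValue("user:tag", out tags) && tags.Contains(tag);
    }

    // True if user:rating is present, parses as an integer and is
    // strictly greater than the threshold.
    public static bool RatedOver(IDictionary<string, List<string>> props, int threshold)
    {
        List<string> values;
        if (!props.TryGetValue("user:rating", out values) || values.Count == 0)
            return false;
        int rating;
        return Int32.TryParse(values[0], NumberStyles.Integer,
                              CultureInfo.InvariantCulture, out rating)
            && rating > threshold;
    }
}
```

On a 1 - 5 scale, "rating over 4" matches only items rated 5, which is what RatedOver (props, 4) checks.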

Extended properties

A green title marks a type whose property names are complete; it does not mean that all of them have been accepted. The Status field records whether each individual property has been accepted.

FIXME: Add possible values example for each property
FIXME: Add the always required DCMI terms for each type.
FIXME: Some of the enforced dublin core names are misleading, consider renaming them.
Property | Type | Description | Multi-property | Status
beagle:file
dc:title | beagle:string | Filename (overridden below). | No | ACCEPTED
dc:date | beagle:datetime | Date the file was last modified. | No | ACCEPTED
file:name | beagle:string | Filename. | No | ACCEPTED
file:type | beagle:string | The type of the file (see below). | No | ACCEPTED
file:size | beagle:integer | Length of the file in bytes. | No | ACCEPTED
file:extension | beagle:string | Extension of the filename (if any). | No | ACCEPTED
beagle:file, where file:type is document
dc:title | beagle:string | Title of the document. | No | ACCEPTED
dc:subject | beagle:string | Summary of the document. | No | ACCEPTED
dc:creator | beagle:string | Author of the document. | No | ACCEPTED
dc:contributor | beagle:string | Contributors other than the author. | Yes | dc:creator?
dc:extent | beagle:integer | Number of pages. | No | ACCEPTED
document:words | beagle:integer | Word count in document. | No | ACCEPTED
document:characters | beagle:integer | Character count in document. | No | ACCEPTED
document:version | beagle:string | Iteration of the document. | No | ACCEPTED
beagle:file, where file:type is image
dc:title | beagle:string | Title of the image file. | No | ACCEPTED
dc:subject | beagle:string | Description of the image file. | Yes | ACCEPTED
dc:creator | beagle:string | Author of the image file. | No | ACCEPTED
image:width | beagle:integer | The width of the image file in pixels. | No | ACCEPTED
image:height | beagle:integer | The height of the image file in pixels. | No | ACCEPTED
image:depth | beagle:integer | The color depth of the image file. | No | ACCEPTED
image:orientation | beagle:string | Orientation of the image (landscape, portrait). | No | ACCEPTED
image:colorspace | beagle:string | The colorspace used by the image (rgb, cmyk, etc.). | No | ACCEPTED
image:location | beagle:string | The location where the image was taken. | No | ACCEPTED
fspot:indexed | beagle:boolean | Present in F-Spot photo manager. | No |
digikam:indexed | beagle:boolean | Present in Digikam. | No |
beagle:file, where file:type is video
dc:title | beagle:string | Title of the video file. | No | ACCEPTED
dc:subject | beagle:string | Description of the video file. | No | ACCEPTED
dc:creator | beagle:string | Author of the video file. | | ACCEPTED
dc:extent | beagle:timespan | Duration of the video file. | No | ACCEPTED
video:width | beagle:integer | The width of the video file. | No | ACCEPTED
video:height | beagle:integer | The height of the video file. | No | ACCEPTED
video:depth | beagle:integer | The color depth of the video file. | No | ACCEPTED
video:codec | beagle:string | The encoding format of the video. | No | ACCEPTED
video:bitrate | | Video content bitrate. | No | ACCEPTED
video:aspect | beagle:string | The aspect ratio of the video (16:9, 14:3). | No | ACCEPTED
video:fps | beagle:integer | Frames per second. | No | ACCEPTED
video:year | beagle:integer | Year the video was published. | No | ACCEPTED
audio:bitrate | | Audio content bitrate. | No |
audio:codec | beagle:string | The encoding format of the audio. | No |
audio:channels | beagle:integer | Number of audio channels. | No |
beagle:file, where file:type is audio
dc:title | beagle:string | Title of the track. | No | audio:title?
dc:creator | beagle:string | The artist of the audio file. | | audio:creator?
dc:extent | beagle:timespan | The length of the track. | No | ACCEPTED
audio:composer | beagle:string | The composer of the audio file. | | dc:contributor?
audio:performer | beagle:string | The performer in the audio file. | |
audio:album | beagle:string | The album the audio file belongs to. | No | ACCEPTED
audio:genre | | Genre. | | ACCEPTED
audio:year | beagle:integer | Year when the track was recorded. | No | ACCEPTED
audio:channels | beagle:integer | Number of audio channels. | No | ACCEPTED
audio:bitrate | | Bit rate sampling of the track. | No | ACCEPTED
audio:codec | beagle:string | The encoding format of the audio. | No | ACCEPTED
audio:trackcount | beagle:integer | Total number of tracks. | No | ACCEPTED
audio:tracknumber | beagle:integer | Number of the current track. | No | ACCEPTED
audio:disccount | beagle:integer | Number of discs. | No | ACCEPTED
audio:discnumber | beagle:integer | Disc number of the current track. | No | ACCEPTED
beagle:file, where file:type is application
dc:title | beagle:string | Application name. | No | ACCEPTED
dc:subject | beagle:string | Description of the application. | No | ACCEPTED
application:icon | beagle:string | Icon. | No | ACCEPTED
application:category | beagle:string | The categories the application belongs to. | Yes | ACCEPTED
application:type | beagle:string | The application type (application, capplet). | No | ACCEPTED
application:executable | beagle:string | The application executable file name. | No | ACCEPTED
application:keyword | beagle:string | Keywords. | Yes | ACCEPTED
beagle:file, where file:type is package
dc:subject | beagle:string | Name of the packaged program. | No | dc:title?
package:description | beagle:string | Description of the package. | No | dc:subject?
package:architecture | beagle:string | Target architecture of the package (386, x64, etc). | Yes | ACCEPTED
package:version | beagle:string | Version of the package. | No | ACCEPTED
package:size | beagle:integer | Size of the extracted data in the package. | No | ACCEPTED
beagle:file, where file:type is archive
Child properties?
dc:extent | beagle:integer | Number of files in archive. | No | archive:filecount?
beagle:im
dc:title | beagle:string | This is a problem --- IMs don't have a title. Use buddy name for now. | No |
dc:date | beagle:datetime | Date/Time when the IM was initiated. | No | ACCEPTED
dc:creator | beagle:string | Our identity (buddyname). | No | im:identity?
im:buddyname | beagle:string | The buddyname of the person we are speaking to. | No | Sucks
im:protocol | beagle:string | The protocol (AIM, ICQ, MSN, etc). | No | im:service?
beagle:email
dc:title | beagle:string | Subject of the email message. | No | ACCEPTED
dc:date | beagle:datetime | Date sent/received. | No | ACCEPTED
email:type | beagle:string | Status of the email message (sent, received). | No | ACCEPTED
email:to | beagle:string | The recipient of the email message. | Yes |
email:from | beagle:string | Author of the email message. | Yes | dc:creator?
email:cc | beagle:string | | Yes | ACCEPTED
email:bcc | beagle:string | | Yes | ACCEPTED
email:mailinglist | beagle:string | Mailing list. | No | ACCEPTED
email:replyto | beagle:string | | No | ACCEPTED
email:attachment | beagle:string | Attachment titles (if any). | Yes |
email:folder | beagle:string | The folder the email message is located in. | No | ACCEPTED
email:priority | beagle:string | Priority of the email message (low, medium, high). | No | ACCEPTED
email:id | beagle:string | Used to group messages together into conversations. | No | ACCEPTED
beagle:webpage
dc:title | beagle:string | Title of the webpage. | No | ACCEPTED
dc:date | beagle:datetime | Date visited/bookmarked. | No | ACCEPTED
dc:identifier | beagle:string | URI of the webpage. | No |
webpage:type | beagle:boolean | Specifies the type of webpage (history, bookmark). | No | ACCEPTED
webpage:generator | beagle:string | Generator of the webpage. | No |
webpage:referrer | beagle:string | Referrer to the webpage. | No | ACCEPTED
beagle:note
dc:title | beagle:string | Title of the note. | No | ACCEPTED
note:priority | beagle:string | Priority of the note (low, medium, high). | No | ACCEPTED
note:category | beagle:string | The categories this note belongs to. | Yes |
note:folder | beagle:string | The folder this note is filed under. | No |
beagle:task
dc:title | beagle:string | Summary of the task. | No | ACCEPTED
dc:subject | beagle:string | Comment, description. | No | ACCEPTED
dc:date | beagle:datetime | Start of the task. | No | ACCEPTED
task:start | beagle:datetime | Date/Time of start. | No | Duplication?
task:end | beagle:datetime | Date/Time of end. | No | ACCEPTED
task:completed | beagle:datetime | Date/Time of completion. | No | ACCEPTED
task:priority | beagle:string | Priority of the task (low, medium, high). | No | ACCEPTED
task:participant | beagle:string | Participants of the task. | Yes | ACCEPTED
task:status | beagle:string | Status (not-started, in-progress, finished). | No | ACCEPTED
task:percentage | beagle:integer | Percent completed. | No |
task:category | beagle:string | The categories this task belongs to. | Yes | ACCEPTED
task:folder | beagle:string | The folder this task is filed under. | No |
beagle:calendar
dc:title | beagle:string | Summary of the event. | No | ACCEPTED
dc:subject | beagle:string | Comment, description. | No | ACCEPTED
dc:date | beagle:datetime | Start of the event. | No | ACCEPTED
dc:extent | beagle:timespan | Duration of the event. | No | calendar:duration?
calendar:attendee | beagle:string | Attendees of the event. | Yes | ACCEPTED
calendar:location | beagle:string | Location of the event. | No | ACCEPTED
calendar:start | beagle:datetime | Date/Time of start. | No | Duplication?
calendar:end | beagle:datetime | Date/Time of end. | No | ACCEPTED
calendar:timezone | beagle:string | The timezone for this event. | No |
calendar:event | beagle:string | Type of the event (private, public, all-day). | No |
calendar:category | beagle:string | Categories this event belongs to. | Yes | ACCEPTED
calendar:folder | beagle:string | The folder this event is filed under. | No | ACCEPTED
beagle:contact
dc:title | beagle:string | Display name. | No | contact:displayname?
dc:subject | beagle:string | Contact note. | No |
contact:fullname | beagle:string | Contact's full name. | No |
contact:title | beagle:string | Contact's title. | No | ACCEPTED
contact:nickname | beagle:string | Contact's nickname. | No | ACCEPTED
contact:email | beagle:string | Contact's email. | Yes | ACCEPTED
contact:im | beagle:string | IM address. | No | Protocol?
contact:pager | beagle:string | | No | ACCEPTED
contact:telex | beagle:string | | No | ACCEPTED
contact:tty | beagle:string | | No | ACCEPTED
contact:radio | beagle:string | | No | ACCEPTED
contact:profession | beagle:string | Profession. | No | ACCEPTED
contact:cellphone | beagle:string | | No | ACCEPTED
contact:homeaddress | beagle:string | Contact's home address. | No | ACCEPTED
contact:homephone | beagle:string | | No | ACCEPTED
contact:homefax | beagle:string | | No | ACCEPTED
contact:workaddress | beagle:string | Contact's work address. | No | ACCEPTED
contact:workphone | beagle:string | | No | ACCEPTED
contact:workfax | beagle:string | | No | ACCEPTED
contact:company | beagle:string | Company name. | No | ACCEPTED
contact:department | beagle:string | Department name. | No | ACCEPTED
contact:assistant | beagle:string | Assistant's name. | No | ACCEPTED
contact:assistantphone | beagle:string | | No | ACCEPTED
contact:manager | beagle:string | Manager's name. | No | ACCEPTED
contact:managerphone | beagle:string | | No | ACCEPTED
contact:birthday | beagle:datetime | Birthday. | No | ACCEPTED
contact:spouse | beagle:string | | No | ACCEPTED
contact:webpage | beagle:string | Webpage URI. | No | ACCEPTED
contact:blog | beagle:string | Blog URI. | No | ACCEPTED
contact:calendar | beagle:string | Calendar URI. | No | ACCEPTED
contact:folder | beagle:string | The folder this contact is filed under. | No | ACCEPTED
contact:category | beagle:string | Categories this contact belongs to. | Yes | ACCEPTED
beagle:feed
dc:title | beagle:string | Title of the feed item. | No | ACCEPTED
dc:creator | beagle:string | Author of the feed item. | No | ACCEPTED
dc:date | beagle:datetime | Date/Time the feed item was published. | No | ACCEPTED
feed:generator | beagle:string | Generator of the feed item. | No | ACCEPTED
feed:source | beagle:string | Source of the feed item. | No | ACCEPTED
beagle:documentation, where documentation:type is docbook
Properties?
beagle:documentation, where documentation:type is monodoc
Properties?

Backends and Filters

Example update message (please follow for consistency):

* Filters/SampleFilter.cs --- DONE
  MISSING:
    * contact:telex

  INCOMPLETE:
    * fixme:obscurepropertyname
    * fixme:unneededproperty

This will allow us to track the status of each filter/backend after the event is over, and to see what still needs to be fixed and updated. (INCOMPLETE marks a property for which no new name is available yet; MISSING marks properties that are in this specification but are not provided.)

The following filter files need altering for the new specification.

* Filters/FilterAbiword.cs
* Filters/FilterArchive.cs
* Filters/FilterAudio.cs
* Filters/FilterBmp.cs
* Filters/FilterBoo.cs
* Filters/FilterC.cs
* Filters/FilterChm.cs
* Filters/FilterCpp.cs
* Filters/FilterCSharp.cs
* Filters/FilterDeb.cs
* Filters/FilterDesktop.cs
* Filters/FilterDocbook.cs
* Filters/FilterDOC.cs
* Filters/FilterEbuild.cs
* Filters/FilterEmpathyLog.cs
* Filters/FilterExternal.cs
* Filters/FilterFortran.cs
* Filters/FilterGif.cs
* Filters/FilterHtml.cs
* Filters/FilterIgnore.cs
* Filters/FilterImage.cs
* Filters/FilterJava.cs
* Filters/FilterJpeg.cs
* Filters/FilterJs.cs
* Filters/FilterKAddressBook.cs
* Filters/FilterKCal.cs
* Filters/FilterKNotes.cs
* Filters/FilterKonqHistory.cs
* Filters/FilterKopeteLog.cs
* Filters/FilterKOrganizer.cs
* Filters/FilterLabyrinth.cs
* Filters/FilterLisp.cs
* Filters/FilterM3U.cs
* Filters/FilterMail.cs
* Filters/FilterMan.cs
* Filters/FilterMatlab.cs
* Filters/FilterMonodoc.cs
* Filters/FilterMPlayerVideo.cs
* Filters/FilterOle.cs
* Filters/FilterOpenOffice.cs
* Filters/FilterPackage.cs
* Filters/FilterPascal.cs
* Filters/FilterPdf.cs
* Filters/FilterPerl.cs
* Filters/FilterPhp.cs
* Filters/FilterPidginLog.cs
* Filters/FilterPls.cs
* Filters/FilterPng.cs
* Filters/FilterPPT.cs
* Filters/FilterPython.cs
* Filters/FilterRPM.cs
* Filters/FilterRTF.cs
* Filters/FilterRuby.cs
* Filters/FilterScilab.cs
* Filters/FilterScribus.cs
* Filters/FilterShellscript.cs
* Filters/FilterSource.cs
* Filters/FilterSpreadsheet.cs
* Filters/FilterSvg.cs
* Filters/FilterTeX.cs
* Filters/FilterTexi.cs
* Filters/FilterText.cs
* Filters/FilterTiff.cs
* Filters/FilterTotem.cs
* Filters/FilterVideo.cs
* Filters/FilterXslt.cs

The following backends need altering for the new specification.

* beagled/AkregatorQueryable
* beagled/BlamQueryable
* beagled/EmpathyQueryable
* beagled/EvolutionDataServerQueryable
* beagled/EvolutionMailQueryable
* beagled/FileSystemQueryable
* beagled/IndexingServiceQueryable
* beagled/KAddressBookQueryable
* beagled/KMailQueryable
* beagled/KNotesQueryable
* beagled/KonqBookmarkQueryable
* beagled/KonqHistoryQueryable
* beagled/KonversationQueryable
* beagled/KopeteQueryable
* beagled/KOrganizerQueryable
* beagled/LabyrinthQueryable
* beagled/LifereaQueryable
* beagled/NautilusMetadataQueryable
* beagled/NetworkServicesQueryable
* beagled/OperaQueryable
* beagled/PidginQueryable
* beagled/ThunderbirdQueryable
* beagled/TomboyQueryable

References

  • Dublin Core Metadata
  • DCMI Metadata Terms
  • DCMI Type Vocabulary
  • Beagle's filter properties
  • Xesam ontology draft
  • Tracker ontology
  • Spotlight metadata spec
  • Google metadata schema

Approval

This document was signed-off by: <name> on <date>


This page was last modified 15:21, 21 January 2008.