Properties Hack Week
DISCLAIMER: This spec is still being updated frequently!
Properties Hack Week is an initiative to fix up all property names (and related items) in Beagle. Until now, the authors of backends and filters added and used property names at will, without following any naming convention. This has made the API very difficult to use: the same types from different sources don't share property names, and property names change very often.
The goal of this event is to create a specification for naming the basic properties of result elements, and to add extended properties for specific types of results (artists in media files, dimensions in image files, etc.).
Since this event will surely break API compatibility, we may consider fixing and cleaning up other things as well. I would like to see the Beagle API become easy, fun and practical to use.
I would still like to point out one of the early things that fascinated me about Beagle backends and filters: the freedom to add and name properties at will. All the code shipping in Beagle will be changed according to the specification below, and future authors are encouraged to follow it. However, the specification is not binding in general: authors who use their own search frontend can use different names and add new ones at will. Note, though, that beagle-search and probably other Beagle frontends won't handle properties whose names are outside this specification.
However, it is crucial that any new backend or filter you want included in mainstream Beagle follows this specification.
People
Put your name here if you want to join the initiative.
- Lukas Lipka
- dBera
- Arun Raghavan
- Kevin Kubasik
- Roman Telicak
- Lukas Zboron
To-Do
Active:
- Write the property naming spec (Lukas - in progress)
Depends on previous:
- Update individual backends (see below)
- Update individual filters (see below)
- Since several properties in the "beagle" namespace are modified, verify that LuceneQueryingDriver, LuceneIndexingDriver, LuceneCommon and the code handling other beagle-specific properties work as expected (needs extensive testing)
- Update property search mapping in query syntax (PropertyKeywordFu)
- Update beagle-search tiles
- Revisit search/TypeFilter.cs, it contains some code with respect to property types
- This is about a special "@" syntax in beagle-search. Currently, beagle-search has a rather hackish (IMO) way of specifying types, MIME types, etc. using i18n-ized strings (see search/TypeFilter.cs). If a user wants to hand-write a query that searches only images, they have to type "filetype:image" in their own language. Since beagle-search has to map the i18n-ized string "filetype:image" to the ASCII "filetype:image" anyway, why not go full throttle and remove the requirement that the user knows whether to query by mimetype, filetype, source or hittype (or any other property, for that matter)? How about using i18n-ized macros, only in beagle-search, that are expanded to the right string when the query is sent to beagled? E.g. "@image" (i18n-ized) would be a macro for "filetype:image" (ASCII), "@java" for "ext:java", "@pidgin" for "source:pidgin", etc. Searching for emails about property hack week would then look like "property hack week @email" (with @email in my own sweet language). This would make the query syntax easier and more intuitive. To search all images, just type "@image". It would be great to have the i18n mapping in beagled itself, but currently that does not happen. Note that this is not about any UI; the UI will have a drop-down list or options to add the correct QueryPart based on the user's choice.
- Update web interface property name mapping
- Update xesam-adaptor ontology mapping
- Up-up-up the Lucene index version
- Backends "opening" results (Lukas)
- Food for thought: what if we indexed IMs per line instead of per log file? The file-based view isn't the best one because, in Pidgin for example, it only reflects when the user closed the window.
- Backends can add their own specific properties that are used only internally (for example, properties that are needed to open the file), but these should not be sent to clients. Is this already possible? (They should most likely live in their own namespace, e.g. backend:foo.)
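The "@" macro proposal above amounts to a token rewrite performed by beagle-search before the query reaches beagled. The following is a hypothetical sketch, not beagle-search code: the `QueryMacros` class is invented for illustration, and its table contains only the three expansions given in the proposal. The proposal does not say what "@email" should expand to, so that macro is left out and passes through unchanged. In beagle-search the macro keys would be i18n-ized, while the expansions stay ASCII.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the proposed beagle-search "@" macros.
// Keys would be i18n-ized in beagle-search; expansions stay ASCII.
public static class QueryMacros
{
	static readonly Dictionary<string, string> macros = new Dictionary<string, string> {
		{ "@image",  "filetype:image" },
		{ "@java",   "ext:java" },
		{ "@pidgin", "source:pidgin" }
	};

	// Replaces every known macro token with its property query.
	// Unknown tokens, including unknown "@" words, are passed through unchanged.
	// Assumption: tokens are separated by spaces (quoted phrases are not handled).
	public static string Expand (string query)
	{
		string[] tokens = query.Split (new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
		for (int i = 0; i < tokens.Length; i++) {
			string expanded;
			if (macros.TryGetValue (tokens [i], out expanded))
				tokens [i] = expanded;
		}
		return String.Join (" ", tokens);
	}
}
```

For example, `QueryMacros.Expand ("property hack week @pidgin")` returns `"property hack week source:pidgin"`. Keeping the expansion in the frontend means beagled only ever sees plain ASCII property queries.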
Cheat sheet
The clean up will take place on the beagle-cleanup-branch.
```shell
$ svn checkout http://svn.gnome.org/svn/beagle/branches/beagle-cleanup-branch
$ cd beagle-cleanup-branch
$ make && sudo make install
```
Hit types
We should consider renaming the current hit types to be nice and short.
Having a way to distinguish where data came from and their types is nice, but keep in mind that making it too complicated will make it hard for third party applications/developers to use.
| Hit type | Previous hit type name |
|---|---|
| beagle:file | File |
| beagle:im | IMLog |
| beagle:email | MailMessage |
| beagle:webpage | WebHistory, Bookmark |
| beagle:note | Note |
| beagle:task | Task |
| beagle:calendar | Calendar |
| beagle:contact | Contact |
| beagle:feed | FeedItem |
| beagle:documentation | MonodocEntry, DocbookEntry |
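
Client code written against the previous hit type names will need to translate them to the new ones. Below is a minimal sketch of such a lookup. The `HitTypeNames` class is hypothetical, and its table simply mirrors the hit type table above. Note that the mapping is many-to-one (WebHistory and Bookmark both become beagle:webpage), so a client that needs to tell them apart has to look at an additional property such as webpage:type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: translate previous hit type names to the proposed ones.
public static class HitTypeNames
{
	static readonly Dictionary<string, string> oldToNew = new Dictionary<string, string> {
		{ "File",         "beagle:file" },
		{ "IMLog",        "beagle:im" },
		{ "MailMessage",  "beagle:email" },
		{ "WebHistory",   "beagle:webpage" },
		{ "Bookmark",     "beagle:webpage" },
		{ "Note",         "beagle:note" },
		{ "Task",         "beagle:task" },
		{ "Calendar",     "beagle:calendar" },
		{ "Contact",      "beagle:contact" },
		{ "FeedItem",     "beagle:feed" },
		{ "MonodocEntry", "beagle:documentation" },
		{ "DocbookEntry", "beagle:documentation" }
	};

	// Returns the new hit type, or null if the old name has no proposed mapping.
	public static string ToNew (string oldName)
	{
		string newName;
		return oldToNew.TryGetValue (oldName, out newName) ? newName : null;
	}
}
```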
Property types
All properties are stored as strings in the Lucene index. The following table lists each property type together with the equivalent C# type whose Parse method converts the stored string into a first-class value. These types are listed for reference only; you are not required to use them.
| Type | C# equivalent |
|---|---|
| beagle:string | System.String |
| beagle:boolean | System.Boolean |
| beagle:integer | System.Int32 |
| beagle:double | System.Double |
| beagle:datetime | System.DateTime |
| beagle:timespan | System.TimeSpan |
For example, to get the page count of a document, where dc:extent is stored as beagle:integer:
```csharp
// 'hit' is a Beagle.Hit whose file:type is document; dc:extent holds a beagle:integer
int pages = System.Int32.Parse (hit ["dc:extent"]);
```
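The same idea generalizes to a helper that dispatches on the property type name and calls the matching Parse method from the table. This is a sketch under an assumption the spec does not settle: that numbers, dates and time spans are stored in a culture-independent format, hence `CultureInfo.InvariantCulture`. The `PropertyTypes` class is hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: converts a stored property string into its first-class
// value using the C# types listed in the property type table.
// Assumption: values are stored in a culture-invariant format.
public static class PropertyTypes
{
	public static object Parse (string beagleType, string value)
	{
		switch (beagleType) {
		case "beagle:string":
			return value;
		case "beagle:boolean":
			return Boolean.Parse (value);
		case "beagle:integer":
			return Int32.Parse (value, CultureInfo.InvariantCulture);
		case "beagle:double":
			return Double.Parse (value, CultureInfo.InvariantCulture);
		case "beagle:datetime":
			return DateTime.Parse (value, CultureInfo.InvariantCulture);
		case "beagle:timespan":
			return TimeSpan.Parse (value, CultureInfo.InvariantCulture);
		default:
			throw new ArgumentException ("Unknown property type: " + beagleType);
		}
	}
}
```

For instance, `(int) PropertyTypes.Parse ("beagle:integer", "42")` yields 42. Rejecting unknown type names with an exception, rather than silently returning the raw string, makes typos in type names visible early.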
DCMI terms
| DCMI Term | Type | Description | Status |
|---|---|---|---|
| This is a basic set of properties each result element will *ALWAYS* contain. | |||
| dc:title | beagle:string | A name given to the resource. | ACCEPTED |
| dc:date | beagle:datetime | A point or period of time associated with an event in the lifecycle of the resource. | Duplication? |
| dc:identifier | beagle:string | An unambiguous reference to the resource within a given context. | Duplication? |
| These properties are optional but should always be defined if available. | |||
| dc:subject | beagle:string | The topic of the resource. | Use only dc:title? |
| dc:creator | beagle:string | An entity primarily responsible for making the resource. | ACCEPTED |
| dc:contributor | beagle:string | An entity responsible for making contributions to the resource (other than the author). | dc:creator? |
| dc:language | beagle:string | A language of the resource. | ACCEPTED |
| dc:rights | beagle:string | Information about rights held in and over the resource. | ACCEPTED |
| dc:extent | variable | The extent of the resource. | |
| dc:format | beagle:string | The file format of the resource (MIME type). | Duplication? |
Global properties
Beagle metadata
| Property | Type | Description | Multi-property | Status |
|---|---|---|---|---|
| beagle:type | beagle:string | The hit type. | No | ACCEPTED |
| beagle:application | beagle:string | The name of the application associated with the hit. | No | ACCEPTED |
| beagle:source | beagle:string | The name of the backend this hit came from. | No | ACCEPTED |
This is a proposal to introduce user-metadata properties. The main motivation is the possibility of queries like "show me all data with a rating over 4" or "show me all data tagged with work". This assumes that in the future there will be a way to rate, tag, etc. data globally, directly from the desktop.
User metadata
| Property | Type | Description | Multi-property | Status |
|---|---|---|---|---|
| user:tag | beagle:string | One or more tags associated with the data object. | Yes | ACCEPTED |
| user:rating | beagle:integer | Rating of the data object on a scale of 1 to 5. | No | ACCEPTED |
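
The two example queries from the proposal can be sketched over an in-memory list of hits. This is illustrative only and not the Beagle.Hit API: each hit is modelled (hypothetically) as a map from property name to all of its values, because user:tag is a multi-property and can carry several tags, while user:rating is single-valued. The `UserMetadataQueries` class and the `PropertyMap` alias are invented names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using PropertyMap = System.Collections.Generic.Dictionary<string, System.Collections.Generic.List<string>>;

// Hypothetical sketch: a hit is a map from property name to all of its values.
public static class UserMetadataQueries
{
	// "Show me all data with rating over N". Hits without user:rating never match.
	public static IEnumerable<PropertyMap> RatedOver (IEnumerable<PropertyMap> hits, int rating)
	{
		return hits.Where (h => h.ContainsKey ("user:rating")
			&& h ["user:rating"].Count > 0
			&& Int32.Parse (h ["user:rating"] [0]) > rating);
	}

	// "Show me all data tagged with TAG". Matches if any user:tag value equals TAG.
	public static IEnumerable<PropertyMap> TaggedWith (IEnumerable<PropertyMap> hits, string tag)
	{
		return hits.Where (h => h.ContainsKey ("user:tag") && h ["user:tag"].Contains (tag));
	}
}
```

Treating user:tag as a list rather than a single string is what lets a hit tagged both "work" and "beagle" match a query for either tag.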
Extended properties
A green title means that the property names for that type are complete, but not that all of them have been accepted. The Status column shows whether an individual property has been accepted.
FIXME: Add example values for each property.
FIXME: Add the always-required DCMI terms for each type.
FIXME: Some of the enforced Dublin Core names are misleading; consider renaming them.
| Property | Type | Description | Multi-property | Status |
|---|---|---|---|---|
| beagle:file | ||||
| dc:title | beagle:string | Filename (overridden below). | No | ACCEPTED |
| dc:date | beagle:datetime | Date the file was last modified. | No | ACCEPTED |
| file:name | beagle:string | Filename. | No | ACCEPTED |
| file:type | beagle:string | The type of the file (see below). | No | ACCEPTED |
| file:size | beagle:integer | Length of the file in bytes. | No | ACCEPTED |
| file:extension | beagle:string | Extension of the filename (if any). | No | ACCEPTED |
| beagle:file, where file:type is document | ||||
| dc:title | beagle:string | Title of the document. | No | ACCEPTED |
| dc:subject | beagle:string | Summary of the document. | No | ACCEPTED |
| dc:creator | beagle:string | Author of the document. | No | ACCEPTED |
| dc:contributor | beagle:string | Contributors other than the author. | Yes | dc:creator? |
| dc:extent | beagle:integer | Number of pages. | No | ACCEPTED |
| document:words | beagle:integer | Word count in document. | No | ACCEPTED |
| document:characters | beagle:integer | Character count in document. | No | ACCEPTED |
| document:version | beagle:string | Iteration of the document. | No | ACCEPTED |
| beagle:file, where file:type is image | ||||
| dc:title | beagle:string | Title of the image file. | No | ACCEPTED |
| dc:subject | beagle:string | Description of the image file. | Yes | ACCEPTED |
| dc:creator | beagle:string | Author of the image file. | No | ACCEPTED |
| image:width | beagle:integer | The width of the image file in pixels. | No | ACCEPTED |
| image:height | beagle:integer | The height of the image file in pixels. | No | ACCEPTED |
| image:depth | beagle:integer | The color depth of the image file. | No | ACCEPTED |
| image:orientation | beagle:string | Orientation of the image (landscape, portrait). | No | ACCEPTED |
| image:colorspace | beagle:string | The colorspace used by the image. (rgb, cmyk, etc.) | No | ACCEPTED |
| image:location | beagle:string | The location where the image was taken. | No | ACCEPTED |
| fspot:indexed | beagle:boolean | Present in F-Spot photo manager. | No | |
| digikam:indexed | beagle:boolean | Present in Digikam. | No | |
| beagle:file, where file:type is video | ||||
| dc:title | beagle:string | Title of the video file. | No | ACCEPTED |
| dc:subject | beagle:string | Description of the video file. | No | ACCEPTED |
| dc:creator | beagle:string | Author of the video file. | | ACCEPTED |
| dc:extent | beagle:timespan | Duration of the video file. | No | ACCEPTED |
| video:width | beagle:integer | The width of the video file. | No | ACCEPTED |
| video:height | beagle:integer | The height of the video file. | No | ACCEPTED |
| video:depth | beagle:integer | The color depth of the video file. | No | ACCEPTED |
| video:codec | beagle:string | The encoding format of the video. | No | ACCEPTED |
| video:bitrate | | Video content bitrate. | No | ACCEPTED |
| video:aspect | beagle:string | The aspect ratio of the video (16:9, 4:3). | No | ACCEPTED |
| video:fps | beagle:integer | Frames per second. | No | ACCEPTED |
| video:year | beagle:integer | Year the video was published. | No | ACCEPTED |
| audio:bitrate | | Audio content bitrate. | No | |
| audio:codec | beagle:string | The encoding format of the audio. | No | |
| audio:channels | beagle:integer | Number of audio channels. | No | |
| beagle:file, where file:type is audio | ||||
| dc:title | beagle:string | Title of the track | No | audio:title? |
| dc:creator | beagle:string | The artist of the audio file. | | audio:creator? |
| dc:extent | beagle:timespan | The length of the track | No | ACCEPTED |
| audio:composer | beagle:string | The composer of the audio file. | | dc:contributor? |
| audio:performer | beagle:string | The performer in the audio file. | ||
| audio:album | beagle:string | The album the audio file belongs to. | No | ACCEPTED |
| audio:genre | | Genre. | | ACCEPTED |
| audio:year | beagle:integer | Year when the track was recorded. | No | ACCEPTED |
| audio:channels | beagle:integer | Number of audio channels. | No | ACCEPTED |
| audio:bitrate | | Bit rate of the track. | No | ACCEPTED |
| audio:codec | beagle:string | The encoding format of the audio. | No | ACCEPTED |
| audio:trackcount | beagle:integer | Total number of tracks. | No | ACCEPTED |
| audio:tracknumber | beagle:integer | Number of the current track. | No | ACCEPTED |
| audio:disccount | beagle:integer | Number of discs. | No | ACCEPTED |
| audio:discnumber | beagle:integer | Disc number of the current track. | No | ACCEPTED |
| beagle:file, where file:type is application | ||||
| dc:title | beagle:string | Application name. | No | ACCEPTED |
| dc:subject | beagle:string | Description of the application. | No | ACCEPTED |
| application:icon | beagle:string | Icon. | No | ACCEPTED |
| application:category | beagle:string | The categories the application belongs to. | Yes | ACCEPTED |
| application:type | beagle:string | The application type (application, capplet). | No | ACCEPTED |
| application:executable | beagle:string | The application executable file name. | No | ACCEPTED |
| application:keyword | beagle:string | Keywords. | Yes | ACCEPTED |
| beagle:file, where file:type is package | ||||
| dc:subject | beagle:string | Name of the packaged program. | No | dc:title? |
| package:description | beagle:string | Description of the package. | No | dc:subject? |
| package:architecture | beagle:string | Target architecture of the package (386, x64, etc). | Yes | ACCEPTED |
| package:version | beagle:string | Version of the package. | No | ACCEPTED |
| package:size | beagle:integer | Size of the extracted data in the package. | No | ACCEPTED |
| beagle:file, where file:type is archive | ||||
| Child properties? | ||||
| dc:extent | beagle:integer | Number of files in archive. | No | archive:filecount? |
| beagle:im | ||||
| dc:title | beagle:string | This is a problem: IMs don't have a title. Use the buddy name for now. | No | |
| dc:date | beagle:datetime | Date/Time when the IM was initiated. | No | ACCEPTED |
| dc:creator | beagle:string | Our identity (buddyname). | No | im:identity? |
| im:buddyname | beagle:string | The buddyname of the person we are speaking to. | No | Sucks |
| im:protocol | beagle:string | The protocol (AIM, ICQ, MSN, etc) | No | im:service? |
| beagle:email | ||||
| dc:title | beagle:string | Subject of the email message. | No | ACCEPTED |
| dc:date | beagle:datetime | Date sent/received. | No | ACCEPTED |
| email:type | beagle:string | Status of the email message (sent, received). | No | ACCEPTED |
| email:to | beagle:string | Recipients of the email message. | Yes | |
| email:from | beagle:string | Author of the email message. | Yes | dc:creator? |
| email:cc | beagle:string | Carbon copy (CC) recipients. | Yes | ACCEPTED |
| email:bcc | beagle:string | Blind carbon copy (BCC) recipients. | Yes | ACCEPTED |
| email:mailinglist | beagle:string | Mailing list. | No | ACCEPTED |
| email:replyto | beagle:string | Reply-To address. | No | ACCEPTED |
| email:attachment | beagle:string | Attachment titles (if any). | Yes | |
| email:folder | beagle:string | The folder the email message is located in. | No | ACCEPTED |
| email:priority | beagle:string | Priority of the email message (low, medium, high). | No | ACCEPTED |
| email:id | beagle:string | Used to group messages together into conversations. | No | ACCEPTED |
| beagle:webpage | ||||
| dc:title | beagle:string | Title of the webpage. | No | ACCEPTED |
| dc:date | beagle:datetime | Date visited/bookmarked. | No | ACCEPTED |
| dc:identifier | beagle:string | URI of the webpage. | No | |
| webpage:type | beagle:string | Specifies the type of webpage (history, bookmark). | No | ACCEPTED |
| webpage:generator | beagle:string | Generator of the webpage. | No | |
| webpage:referrer | beagle:string | Referrer to the webpage. | No | ACCEPTED |
| beagle:note | ||||
| dc:title | beagle:string | Title of the note. | No | ACCEPTED |
| note:priority | beagle:string | Priority of the note (low, medium, high). | No | ACCEPTED |
| note:category | beagle:string | The categories this note belongs to. | Yes | |
| note:folder | beagle:string | The folder this note is filed under. | No | |
| beagle:task | ||||
| dc:title | beagle:string | Summary of the task. | No | ACCEPTED |
| dc:subject | beagle:string | Comment, description. | No | ACCEPTED |
| dc:date | beagle:datetime | Start of the task. | No | ACCEPTED |
| task:start | beagle:datetime | Date/Time of start. | No | Duplication? |
| task:end | beagle:datetime | Date/Time of end. | No | ACCEPTED |
| task:completed | beagle:datetime | Date/Time of completion. | No | ACCEPTED |
| task:priority | beagle:string | Priority of the task (low, medium, high). | No | ACCEPTED |
| task:participant | beagle:string | Participants of the task. | Yes | ACCEPTED |
| task:status | beagle:string | Status (not-started, in-progress, finished). | No | ACCEPTED |
| task:percentage | beagle:integer | Percent completed. | No | |
| task:category | beagle:string | The categories this task belongs to. | Yes | ACCEPTED |
| task:folder | beagle:string | The folder this task is filed under. | No | |
| beagle:calendar | ||||
| dc:title | beagle:string | Summary of the event. | No | ACCEPTED |
| dc:subject | beagle:string | Comment, description. | No | ACCEPTED |
| dc:date | beagle:datetime | Start of the event. | No | ACCEPTED |
| dc:extent | beagle:timespan | Duration of the event. | No | calendar:duration? |
| calendar:attendee | beagle:string | Attendees of the event. | Yes | ACCEPTED |
| calendar:location | beagle:string | Location of the event. | No | ACCEPTED |
| calendar:start | beagle:datetime | Date/Time of start. | No | Duplication? |
| calendar:end | beagle:datetime | Date/Time of end. | No | ACCEPTED |
| calendar:timezone | beagle:string | The timezone for this event. | No | |
| calendar:event | beagle:string | Type of the event (private, public, all-day). | No | |
| calendar:category | beagle:string | Categories this event belongs to. | Yes | ACCEPTED |
| calendar:folder | beagle:string | The folder this event is filed under. | No | ACCEPTED |
| beagle:contact | ||||
| dc:title | beagle:string | Display name. | No | contact:displayname? |
| dc:subject | beagle:string | Contact note. | No | |
| contact:fullname | beagle:string | Contact's full name. | No | |
| contact:title | beagle:string | Contact's title. | No | ACCEPTED |
| contact:nickname | beagle:string | Contact's nickname. | No | ACCEPTED |
| contact:email | beagle:string | Contact's email. | Yes | ACCEPTED |
| contact:im | beagle:string | IM address. | No | Protocol? |
| contact:pager | beagle:string | Pager number. | No | ACCEPTED |
| contact:telex | beagle:string | Telex number. | No | ACCEPTED |
| contact:tty | beagle:string | TTY number. | No | ACCEPTED |
| contact:radio | beagle:string | Radio number. | No | ACCEPTED |
| contact:profession | beagle:string | Profession. | No | ACCEPTED |
| contact:cellphone | beagle:string | Cell phone number. | No | ACCEPTED |
| contact:homeaddress | beagle:string | Contact's home address. | No | ACCEPTED |
| contact:homephone | beagle:string | Home phone number. | No | ACCEPTED |
| contact:homefax | beagle:string | Home fax number. | No | ACCEPTED |
| contact:workaddress | beagle:string | Contact's work address. | No | ACCEPTED |
| contact:workphone | beagle:string | Work phone number. | No | ACCEPTED |
| contact:workfax | beagle:string | Work fax number. | No | ACCEPTED |
| contact:company | beagle:string | Company name. | No | ACCEPTED |
| contact:department | beagle:string | Department name. | No | ACCEPTED |
| contact:assistant | beagle:string | Assistant's name. | No | ACCEPTED |
| contact:assistantphone | beagle:string | Assistant's phone number. | No | ACCEPTED |
| contact:manager | beagle:string | Manager's name. | No | ACCEPTED |
| contact:managerphone | beagle:string | Manager's phone number. | No | ACCEPTED |
| contact:birthday | beagle:datetime | Birthday. | No | ACCEPTED |
| contact:spouse | beagle:string | Spouse's name. | No | ACCEPTED |
| contact:webpage | beagle:string | Webpage URI. | No | ACCEPTED |
| contact:blog | beagle:string | Blog URI. | No | ACCEPTED |
| contact:calendar | beagle:string | Calendar URI. | No | ACCEPTED |
| contact:folder | beagle:string | The folder this contact is filed under. | No | ACCEPTED |
| contact:category | beagle:string | Categories this contact belongs to. | Yes | ACCEPTED |
| beagle:feed | ||||
| dc:title | beagle:string | Title of the feed item. | No | ACCEPTED |
| dc:creator | beagle:string | Author of the feed item. | No | ACCEPTED |
| dc:date | beagle:datetime | Date/Time the feed item was published. | No | ACCEPTED |
| feed:generator | beagle:string | Generator of the feed item. | No | ACCEPTED |
| feed:source | beagle:string | Source of the feed item. | No | ACCEPTED |
| beagle:documentation, where documentation:type is docbook | ||||
| Properties? | ||||
| beagle:documentation, where documentation:type is monodoc | ||||
| Properties? | ||||
Backends and Filters
Example update message (please follow for consistency):
```
* Filters/SampleFilter.cs - DONE
MISSING:
* contact:telex
INCOMPLETE:
* fixme:obscurepropertyname
* fixme:unneededproperty
```
This will let us track the status of each filter and backend after the event is over, and see what still needs to be fixed and updated. INCOMPLETE marks properties for which no new name is available yet; MISSING marks properties that are in this specification but are not provided.
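The MISSING and INCOMPLETE lists can be derived by comparing the property names the specification defines for a file type with the names a filter actually emits: MISSING is what the spec has and the filter lacks, INCOMPLETE is what the filter emits without a spec name. Below is a hypothetical sketch that computes both set differences and prints them in the update-message format. `StatusReport` and its parameters are invented names, both property lists are supplied by hand, and the sketch assumes "DONE" is always appropriate once the file has been updated.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch: build an update message from two property name lists.
public static class StatusReport
{
	public static string Build (string file, IEnumerable<string> specProperties,
	                            IEnumerable<string> providedProperties)
	{
		var spec = new HashSet<string> (specProperties);
		var provided = new HashSet<string> (providedProperties);

		// MISSING: defined in the specification but not provided.
		var missing = spec.Where (p => !provided.Contains (p)).OrderBy (p => p, StringComparer.Ordinal);
		// INCOMPLETE: still provided, but no new name exists in the specification.
		var incomplete = provided.Where (p => !spec.Contains (p)).OrderBy (p => p, StringComparer.Ordinal);

		var lines = new List<string> { "* " + file + " - DONE" };
		AddSection (lines, "MISSING:", missing);
		AddSection (lines, "INCOMPLETE:", incomplete);
		return String.Join ("\n", lines.ToArray ());
	}

	// Appends a header and one "* name" line per property; empty sections are omitted.
	static void AddSection (List<string> lines, string header, IEnumerable<string> properties)
	{
		var items = properties.ToList ();
		if (items.Count == 0)
			return;
		lines.Add (header);
		foreach (var p in items)
			lines.Add ("* " + p);
	}
}
```

Sorting each section with an ordinal comparison keeps the messages stable between runs, which makes it easier to see what changed from one update to the next.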
The following filter files need altering for the new specification.
- Filters/FilterAbiword.cs
- Filters/FilterArchive.cs
- Filters/FilterAudio.cs
- Filters/FilterBmp.cs
- Filters/FilterBoo.cs
- Filters/FilterC.cs
- Filters/FilterChm.cs
- Filters/FilterCpp.cs
- Filters/FilterCSharp.cs
- Filters/FilterDeb.cs
- Filters/FilterDesktop.cs
- Filters/FilterDocbook.cs
- Filters/FilterDOC.cs
- Filters/FilterEbuild.cs
- Filters/FilterEmpathyLog.cs
- Filters/FilterExternal.cs
- Filters/FilterFortran.cs
- Filters/FilterGif.cs
- Filters/FilterHtml.cs
- Filters/FilterIgnore.cs
- Filters/FilterImage.cs
- Filters/FilterJava.cs
- Filters/FilterJpeg.cs
- Filters/FilterJs.cs
- Filters/FilterKAddressBook.cs
- Filters/FilterKCal.cs
- Filters/FilterKNotes.cs
- Filters/FilterKonqHistory.cs
- Filters/FilterKopeteLog.cs
- Filters/FilterKOrganizer.cs
- Filters/FilterLabyrinth.cs
- Filters/FilterLisp.cs
- Filters/FilterM3U.cs
- Filters/FilterMail.cs
- Filters/FilterMan.cs
- Filters/FilterMatlab.cs
- Filters/FilterMonodoc.cs
- Filters/FilterMPlayerVideo.cs
- Filters/FilterOle.cs
- Filters/FilterOpenOffice.cs
- Filters/FilterPackage.cs
- Filters/FilterPascal.cs
- Filters/FilterPdf.cs
- Filters/FilterPerl.cs
- Filters/FilterPhp.cs
- Filters/FilterPidginLog.cs
- Filters/FilterPls.cs
- Filters/FilterPng.cs
- Filters/FilterPPT.cs
- Filters/FilterPython.cs
- Filters/FilterRPM.cs
- Filters/FilterRTF.cs
- Filters/FilterRuby.cs
- Filters/FilterScilab.cs
- Filters/FilterScribus.cs
- Filters/FilterShellscript.cs
- Filters/FilterSource.cs
- Filters/FilterSpreadsheet.cs
- Filters/FilterSvg.cs
- Filters/FilterTeX.cs
- Filters/FilterTexi.cs
- Filters/FilterText.cs
- Filters/FilterTiff.cs
- Filters/FilterTotem.cs
- Filters/FilterVideo.cs
- Filters/FilterXslt.cs
The following backends need altering for the new specification.
- beagled/AkregatorQueryable
- beagled/BlamQueryable
- beagled/EmpathyQueryable
- beagled/EvolutionDataServerQueryable
- beagled/EvolutionMailQueryable
- beagled/FileSystemQueryable
- beagled/IndexingServiceQueryable
- beagled/KAddressBookQueryable
- beagled/KMailQueryable
- beagled/KNotesQueryable
- beagled/KonqBookmarkQueryable
- beagled/KonqHistoryQueryable
- beagled/KonversationQueryable
- beagled/KopeteQueryable
- beagled/KOrganizerQueryable
- beagled/LabyrinthQueryable
- beagled/LifereaQueryable
- beagled/NautilusMetadataQueryable
- beagled/NetworkServicesQueryable
- beagled/OperaQueryable
- beagled/PidginQueryable
- beagled/ThunderbirdQueryable
- beagled/TomboyQueryable
Approval
This document was signed off by: <name> on <date>
