Yet Another Defense of FRBR in a Linked Data World

There is periodically some discussion on the nets, including recently on the RDA listserv, where people attack the FRBR Work-Expression-Manifestation-Item model, from what they believe is a forward-looking linked-data-oriented perspective.

I still think the WEMI model (or ‘ontology’, the model of the ‘things’ to be dealt with in the system or data ecology) is in fact crucial for linked data applications, rather than problematic. (My earlier post on the FRBR model as set theory may be useful for reacquainting yourself with the model, and explains how I think it’s most productive to think of it.)

Linked data applications rely on taking data from multiple sources, and being able to tell when it’s about the same ‘thing’.

But what is a ‘thing’, in the ‘bibliographic universe’? 

Different editions/versions of a work may matter.

For instance, if there are multiple editions/versions of a book, they may have different pagination. If you care about what’s on “page 123” of a book, it’s crucial to know what version/edition is being talked about. In fact, the FRBR “manifestation” entity is exactly the set of things where (e.g.) word number 4 on page 123 will be identical. Different versions/editions may also contain different revised text. For a “close reading” criticism of a text, it may matter exactly what revision of the text it was based on, but not exactly what the pagination was; this set of all printings that share the exact same text is exactly what an ‘expression’ is. Other data may be about the work as a whole, regardless of edition/version/revision, or we may not know exactly what edition/version/revision it is about. We may have a user review of “Hamlet” that doesn’t indicate exactly what edition of Hamlet is being reviewed; it is appropriate to link such a thing to the ‘work’ as a whole, for lack of more precise information if nothing else.

In a linked data world without the WEMI ontology, we have a mishmash of data and no way to know what ‘thing’ the data is really about. If there’s a citation to page 123 of Hamlet, but it’s just attached to an identifier for “Hamlet” as a whole (the “work”), we have no way of knowing, or even of later establishing, which edition of Hamlet that page 123 belongs to. If it’s instead attached to an identifier for a ‘manifestation’, that identifier can later be linked to the edition in Amazon or a local library catalog or online fulltext etc., even if it wasn’t initially. That establishes which edition every statement using that manifestation identifier is talking about, and allows a user to track down that page 123 citation.
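To make the point concrete, here is a minimal sketch in plain Ruby of statements attached at different WEMI levels. Everything in it (the Struct names, the identifiers like "manif:hamlet-ed1", the `work_of` helper) is made up for illustration and is not any real FRBR or RDA vocabulary; it skips the Item level entirely.

```ruby
# Hypothetical, simplified WEMI entities (Item level omitted).
Work          = Struct.new(:id, :title)
Expression    = Struct.new(:id, :work)
Manifestation = Struct.new(:id, :expression)

# A statement from some data source, attached to the entity it is about.
Statement = Struct.new(:subject, :text)

hamlet   = Work.new("work:hamlet", "Hamlet")
revision = Expression.new("expr:hamlet-rev1", hamlet)       # one revision of the text
edition  = Manifestation.new("manif:hamlet-ed1", revision)  # one paginated edition

statements = [
  Statement.new(hamlet,   "user review that names no edition"),
  Statement.new(revision, "close reading of this revision of the text"),
  Statement.new(edition,  "citation to page 123")
]

# Walk up from any entity to the work it belongs to.
def work_of(entity)
  case entity
  when Work          then entity
  when Expression    then entity.work
  when Manifestation then entity.expression.work
  end
end

# Everything said about Hamlet, at any level of precision.
about_hamlet = statements.select { |s| work_of(s.subject) == hamlet }

# Only the statements precise enough to resolve a page number.
page_level = statements.select { |s| s.subject.is_a?(Manifestation) }

puts about_hamlet.size
puts page_level.map(&:text).inspect
```

Because the page citation hangs off the manifestation identifier, it can still be rolled up to the work when that is all a user cares about, but the reverse is not possible: a citation attached only to the work can never be narrowed down to an edition later.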

What matters, users’ mental models, or meeting user needs?

It doesn’t in fact matter whether WEMI matches users’ mental models.  The bibliographic universe is complex in an abstract way, and users are not used to thinking about it analytically. Different users may have different mental models, and users’ mental models may not be internally consistent or complete or practical for basing actual answers to user questions on.

WE, and our community and tradition, are the experts at thinking analytically and logically about the bibliographic universe.  That WEMI is the formalization of a library tradition of modelling the bibliographic universe is not a flaw, it’s to its credit.

Interfaces do not need to present things using the WEMI language or even the structure, and maybe shouldn’t — but it’s that structure that lets us answer user questions and meet user needs ‘under the hood’, and that common language that lets us talk about things ourselves in setting up common systems, using ‘terms of art’ understood by us.

Now, what would matter is if the WEMI ontology is useless or counter-productive for actually serving user needs and answering user questions. But, while it’s surely not perfect, it is clear to me that it is in fact useful and productive and better than anything else we have or are likely to come up with anytime soon for serving user needs and answering user questions.  The linked data world absolutely demands a common shared ontology making sense of the ‘bibliographic universe’, not just a mishmash of mental models where it’s unclear what the ‘things’ being talked about are, and the FRBR WEMI is the best we’ve got, and pretty darn good.

From ontology to formal vocabularies

Now, it may be that the particular vocabulary formalization of FRBR represented by RDA or the RDA formal vocabularies is not right, or has serious problems, or is counter-productive. I am not experienced enough in using those vocabularies or in the actual on-the-ground techniques of linked data to have an opinion one way or another, but it may be. But that would be a criticism of the particular way FRBR was fleshed out and further formalized in RDA and the corresponding vocabularies, not of the WEMI ontology.

Forget the FRBR user tasks if you like

FRBR presents itself as being based on ‘user tasks’ and ‘requirements’. It’s even there in the name, ‘functional requirements’.

It made sense to try to base everything we’re doing on user tasks and ‘functional requirements’; it was a noble effort. But the bibliographic universe and our users’ contexts are enormously complex, and that part may well have failed.

But while it may be ideal to approach things that way (starting from “user tasks”), we can’t always pull it off, and it doesn’t mean we throw up our hands and do nothing, or spend decades trying to get ‘user tasks’ right. We go forward anyway.

I don’t write to defend the FRBR user tasks or ‘functional requirements’; I am agnostic on them.

But the FRBR WEMI ontology is useful anyway. Perhaps the idea that it’s based on user tasks is a fiction that should be dismissed. We can accept that the FRBR WEMI model really comes out of a rigorous analysis and formalization of traditional cataloging mental models of the bibliographic universe as expressed in our metadata.  And that’s just fine: for the reasons I try to justify above, it is useful as that, and useful in a linked data universe going forwards as that.

So, if you like, give up on the actual user tasks or functional requirements from FRBR — heck, everyone else has, who actually pays them any attention anyway?

The most useful part of FRBR is actually the WEMI ontology, and its use value is in fact not dependent on the user tasks or ‘functional requirements’, even if the FRBR report itself would have you believe differently.

Or,  maybe the user tasks and ‘functional requirements’ are actually useful. I dunno, I’m agnostic. I’m just asserting that the value of the WEMI ontology is clear either way.

Really

It is quite clear to me that the WEMI ontology is not only useful but crucial for a useful linked data environment, and especially for one that preserves the hard-earned and useful semantics in our present data (which DO make a distinction between ‘manifestation’ and ‘work’, although we’ve generally not analyzed ‘expression’, no big deal).  And it continues to dismay and frustrate me that people are so negative towards the FRBR WEMI ontology, thinking they are being linked-data forward-looking.

Posted in General | 2 Comments

Forcing a one column page in Blacklight 3.2

I have a Blacklight 3.2 app which uses the default CSS (which is now based on compass with susy grids for layout), and a rails layout that’s a barely customized version of the stock layout.

This layout is normally a two-column display, with a sidebar and main area.

But on some pages, I don’t want a sidebar. I want the main area to take up the whole width.  If there’s any content in the ‘sidebar’ area, it should just be beneath the main area in normal page flow, not beside it. (Sometimes I might hide the sidebar entirely, other times let it flow beneath).

I don’t want to have to give different pages different layouts. The DOM should be the same after all, it’s just a question of forcing the sidebar below instead of side-by-side.

I had a way to do this with the previous yui-grids based Blacklight layout. Here’s a way to do it with the compass susy based layout too. Thanks to James Stuart for the hints. All errors mine; I confess I don’t entirely understand what’s going on here (I understand about 85% of it), and haven’t thoroughly tested this yet, but it makes sense and appears to work. There’s probably a nicer way to do this.

First, in my layout, I add a class to the <body> whenever a class (or several space-separated class names) is set in the @body_class ivar.

  <body class="<%= @body_class %>">
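In the real app, the ivar would presumably be set in a controller action or filter (something like `@body_class = "one_column"`). As a rough sketch of what that layout line does, here is the same ERB snippet rendered with Ruby's stdlib ERB, outside of Rails. The `FakeViewContext` class is a hypothetical stand-in for the Rails view context, not anything from Blacklight.

```ruby
require "erb"

# Hypothetical stand-in for the Rails view context: it only holds the
# ivar that the layout line reads.
class FakeViewContext
  def initialize(body_class = nil)
    @body_class = body_class
  end

  # Render an ERB template string with access to this object's ivars.
  def render(template)
    ERB.new(template).result(binding)
  end
end

LAYOUT_LINE = '<body class="<%= @body_class %>">'

# A page that asked for the one-column treatment.
one_column_html = FakeViewContext.new("one_column").render(LAYOUT_LINE)

# A page that never set the ivar: nil renders as an empty string.
default_html = FakeViewContext.new.render(LAYOUT_LINE)

puts one_column_html
puts default_html
```

Pages that never set the ivar just get an empty class attribute, so the same layout serves both cases.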

So far, that won’t do anything though, the CSS is still the same. So now we want to add CSS overrides such that, when the page is inside a .one_column wrapper, both the main and sidebar areas are forced to full width. We’re going to do that with plain old CSS overrides (although using scss/compass/susy to generate the CSS). There’s probably a cleverer way to integrate into BL’s existing scss context to make things cleaner and re-use variables already declared and such, but this is good enough.

We’re going to add a new .css.scss file in our app, and require it in our ‘application.css’ manifest after the Blacklight styles are included, so it can override them CSS cascade style.

*= require force_one_column

./app/assets/stylesheets/force_one_column.css.scss

/* give us access to the compass susy mixins like
prefix/columns/omega. possibly also set up BL's default column widths
and such? */
@import "blacklight/grids/susy_framework"; 

.one_column {
  #bd {
    #main {
      @include columns(24, 24);
      @include omega(24);
    }
    #sidebar {
      @include columns(24, 24);
      @include omega(24);
    }
  }
}

That’s it, it works.

Note that we needed to know that the standard BL susy layout is 24 columns. If someone changes that, our stuff will break and have to be fixed. There’s probably a way to make this more robust by incorporating this in the single existing BL scss (with access to its variables) in your local ./app/assets/blacklight_themes/standard.css.scss (or similar), but this is good enough for me for now. Susy docs say one ought to be able to use `@include full(24)` instead of `@include columns(24,24); @include omega(24);` but for some reason that didn’t work for me, I dunno.

Alan Lomax archives to be digitized and made open access

From the New York Times. Folklorist’s Global Jukebox Goes Digital.

 Just as he dreamed, his vast archive — some 5,000 hours of sound recordings, 400,000 feet of film, 3,000 videotapes, 5,000 photographs and piles of manuscripts, much of it tucked away in forgotten or inaccessible corners — is being digitized so that the collection can be accessed online. About 17,000 music tracks will be available for free streaming by the end of February, and later some of that music may be for sale as CDs or digital downloads.

It’s not entirely clear to me if every single piece of content will be free, but the article seems to say a huge chunk will. But even before digitization,  the Association for Cultural Equity, which apparently is custodian of this material, had an admirable policy aimed at getting the material out there for use in cultural/creative projects, without letting cost be a barrier:

“We go from the attitude that we just want everyone to use it, whatever their budget is,” Mr. Fleming said. “If it’s educational or for the press, it’s usually no charge, and when someone has a budget, well, then we just want to get roughly what other people are getting.”

How many ostensibly not-for-profit library and archival special collections have similar policies?  

As the digitization rush in mass produced publications changes the role of libraries, our unique rare/special materials are what may still distinguish libraries. Getting them out in use without letting cost be a barrier will not only fulfill our missions (books are for use), but remind the public that libraries (not Google, not Amazon) really are just about the only institutions whose primary interests are in serving our users, not in making a buck off them.  It’s from that standpoint that libraries can expect the popular support needed to make our funding sustainable as the environment changes around us. On the other hand, miseducating patrons about copyright in order to try to maintain/maximize income streams  is counter-productive to our missions (in at least a couple different ways), and will teach the public that we’re no more on their side than any of the commercial information institutions and that there’s no reason to support us over them.

I have a personal interest in folk music, and am very excited to see/hear the digitized and released archives. Much respect to the custodians of these archives for prioritizing the public interest as per their mission.

(The article is somewhat vague on exactly what’s going on organizationally and who’s doing it, so I’m not entirely sure who deserves the credit. It was obviously written for an audience more interested in Lomax’s music than in who’s doing the work and how, which is what library geeks like us care about. The Library of Congress’s American Folklife Center may also deserve some of the credit?  We’ll also have to wait to see whether the entire digitization output will be open access, or whether the open access component will be as large as the article suggests. I sadly wouldn’t be surprised if the article has over-stated that aspect, but I wait hopefully.)

Another part also sounds like an awesome show of responsibility to the folk communities that generated the content, cultural repatriation instead of appropriation, very unusual even amongst non-profit cultural heritage institutions:

The Association for Cultural Equity also has what it calls a repatriation program, meant to make Lomax’s work available to the communities where it was obtained and to pay royalties to the heirs of those whose music was recorded.

An inside scoop on Harvard library reorg

Daily Kos published a useful short essay by a former Harvard librarian, reflecting on the Harvard reorg/layoff news. 

I see a couple interesting points here.

Harvard has a famously byzantine library system comprising over forty libraries, and administratively divided into two separate library systems (confusingly called the Harvard University Library or HUL, and the Harvard College Library, or HCL) has changed very little in terms of organizational structure since the late 19th century.

Harvard is not alone here. In fact, I’d suggest that the oldest academic libraries, and ironically especially the old ones that really excelled 80+ years ago, are most likely to have completely dysfunctional organizational structures and organizational behavior today.

Libraries today aren’t the same as libraries 80+ years ago, especially with regard to electronic content we purchase, which has different workflows to manage and different economies to purchase; and in terms of metadata maintenance as well. The blog author rightly points out that libraries realized the benefits of cooperating/coordinating/sharing metadata many years ago, but sharing cards (or data to print cards) through LC is a different beast than modern metadata control needs.

I also generally agree with the blogger’s conclusion — but with less optimism:

But second, the importance of catalogers, and more broadly speaking, librarians is not necessarily diminishing into nothingness.  The environment has changed radically, and there are sure to be plenty of future “massacre-like” events that will painfully remind us of these changes.  But librarians do have a future, and I think it may even be a bright one: they just need to accept that it won’t be quite the same as the past.

I fully agree that there is still as much of a need for the tasks librarians have always done as ever — most definitely and even especially including cataloging/metadata control.

However, despite agreeing with that, I am actually not optimistic, as that blogger is.  We are running out of time to demonstrate that our profession, community, and industry are capable of meeting the metadata control needs of the 21st century.  We are not doing a good job of it. We do not seem to be capable of changing our priorities, expertise, organizational structures, and inter-organizational collaborative infrastructures, to deal with it.

The traditional goals of libraries are still useful and needed just as much as ever, but with different ways of accomplishing them. There is still a great need for an organization specializing in information management on behalf of a user community, and without trying to make a profit off that user community.  But I am, sadly, no longer particularly optimistic that libraries as they are are actually capable of accomplishing those goals.  However, even in the best of cases, trying will result in some painful reorgs, because nobody likes change. (It’s of course also possible for painful reorgs to end up entirely useless or even counter-productive, or simply admissions of defeat as libraries slowly die).

Hint: If you or your organization thinks that just putting all our metadata into RDF as quickly as possible, and therefore “doing linked data”, is necessary and sufficient to handle modern metadata control needs, you have not only missed the boat, you are on the wrong boat.   I have lately been seeing a worrying increase of people suggesting “oh, we just need linked data to solve that problem”, with “linked data” meaning “the data we’ve already got expressed in RDF”, with a worrying ignorance/disregard for what good data actually entails in the 21st century systems environment.

Popular press on ebooks on libraries

I posted a few days ago about my worries that publisher unwillingness to allow library ebook lending (made possible by the fact that publishers have more legal right to block such activities than with print) imperils the future of public lending libraries. 

I worry that there isn’t enough patron education on this issue. Patrons need to know that it’s publishers standing in the way, not library traditionalism or incompetence (well, there might be some of that too).

I was heartened to see a popular press article highlighting publisher resistance to library ebook lending, and the barriers publishers put in place. I hope this starts getting more coverage (and wish the ALA’s advocacy and popular education wings would work on it; what’s the ALA for, anyway?)

As demand for e-books soars, libraries struggle to stock their virtual shelves. By Christian Davenport. Published: January 14. Washington Post.

And in a very related topic, an Amazon press release (blogged and analyzed here) suggests that making a title available through the Kindle lending program increases ebook sales for that title compared to if it were not available for lending, as well as resulting in royalties from Amazon’s library program.  Perhaps the publishers don’t need to be scared of library lending?

Of course, Amazon (as well as perhaps the publishers) would like to see an Amazon-controlled platform take control with no intermediation by pesky non-profit public libraries, and with a new per-use royalty payment model (which the US First Sale Doctrine makes unenforceable for print, but not ebooks).  Remember, not-for-profit libraries (public, academic, and sometimes special use) are pretty much the only institutions in the publishing chain/universe whose only interest is the benefit of their patrons/customers, rather than squeezing as much profit as possible out of their customers.

web app security

I just discovered the Rails Security Guide, which is actually a pretty darn good intro text to web application security issues and attack vectors, whether you work in rails or not.

(In fact, in some places it lacks Rails-specific content I’d expect it to have, and it seems to mostly be a general text!  The chapter on HTML Injection oddly doesn’t mention Rails 3.x auto-escaping in ERB <%= %>, or #html_escape, etc.)
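For anyone who hasn't seen it, here is a small sketch of what that escaping does, using only Ruby's stdlib `ERB::Util.html_escape` rather than Rails itself. The `user_input` string is a made-up example. Note that plain stdlib ERB does not auto-escape `<%= %>` on its own; the automatic behavior is something Rails 3.x adds in its views.

```ruby
require "erb"

# Made-up example of hostile input, e.g. from a comment form.
user_input = "<script>alert(1)</script>"

# Escape the characters that are significant in HTML, so the browser
# shows the text instead of running it as a script.
escaped = ERB::Util.html_escape(user_input)

puts escaped
```

The angle brackets come out as `&lt;` and `&gt;`, which is the basic defense against HTML injection that the guide's chapter is about.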

But anyhow, recommended if you want to brush up on your knowledge of categories of attacks and types of defenses for web apps.

mobile and control

The nytimes allows you to read only 20 articles a month by direct browsing (unlimited for articles you follow from links from another site, presumably implemented by checking the http referer).

It is quite easy for a user to get around this. For instance, with a simple javascript bookmarklet that deletes or modifies the cookies the site uses to track how many views you’ve had.
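Here is a rough sketch of how such a meter might work and why it is so easy to defeat. This is purely my guess at the general shape, not the nytimes' actual implementation: the cookie name `article_views`, the host `news.example.com`, and both function names are hypothetical.

```ruby
require "uri"

FREE_ARTICLES_PER_MONTH = 20
SITE_HOST = "news.example.com" # hypothetical host of the metered site

# True when the request claims to arrive from a different site.
def external_referer?(referer)
  return false if referer.nil? || referer.empty?
  host = URI.parse(referer).host
  !host.nil? && host != SITE_HOST
rescue URI::InvalidURIError
  false
end

# Returns [allowed, updated_cookies]. All state lives in the client's
# cookie, which is exactly why the client can reset it.
def check_meter(cookies, referer)
  # Visits that follow a link from another site are not counted.
  return [true, cookies] if external_referer?(referer)

  views = cookies.fetch("article_views", "0").to_i
  return [false, cookies] if views >= FREE_ARTICLES_PER_MONTH

  [true, cookies.merge("article_views" => (views + 1).to_s)]
end

used_up = { "article_views" => "20" }

blocked       = check_meter(used_up, nil)                             # over the limit
after_delete  = check_meter({}, nil)                                  # cookie deleted, counter starts over
from_outside  = check_meter(used_up, "http://blog.example.org/post")  # external referer bypass

puts blocked.inspect
puts after_delete.inspect
puts from_outside.inspect
```

Since both the counter and the Referer header come from the client, anything that can edit cookies or headers gets past a scheme like this.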

Being a programmer, I started thinking of ways the nytimes site could try to defend against this. Right now it has no defenses at all. I thought of possible defenses that would at least require much more complicated javascript, and possibly ways that would require an actual browser plugin (say, to modify and spoof the Referer header) to defeat.

The nytimes doesn’t seem too interested in this game of cat and mouse; their defenses haven’t changed much since the paywall was deployed. Perhaps that’s because it’s enough that many/most users are ‘honest’, and because they realize that unless they change their usage policies to be much more locked down (perhaps requiring a login to view any articles at all, if not ending the lack of limits from external referrers), it will always be possible to defeat.

Mobile lockdown

But then I also thought of iOS.  I haven’t checked to see if the bookmarklet approach to delete cookies works on iOS. It is possible to install clickable js bookmarklets on iOS (iPhone or iPad) Mobile Safari, although it’s a pain, requiring either manual copy-paste/editing of JS code, or syncing with your desktop Safari bookmarks.  It’s accessible to many fewer users than saving a bookmarklet on a desktop device.

It is not possible to spoof Referer headers on an iPad or an iPhone.

Unlike a desktop computer where you can install any software you want (including, say,  an open source browser with a feature to spoof referers, or a browser that takes plugins, and a plugin that does this), on iOS you can only install software approved by Apple.

[Unless you 'jailbreak' your device, which at least for the moment is not actually illegal (Apple would surely like it to be; does it surprise you that it quite likely would be without the special exemption from the Librarian of Congress?), but is something that typical users won't want to do, for various reasons.]

The App Store rules prohibit any alternate non-Safari/built-in-Webkit browsers.  Apple sometimes approves alternate browsers — only when they use the built in Webkit.  I am confident that part of the approval process for any app that involves a browser component is ensuring that it doesn’t let users do ‘untoward’ things like spoof referer headers or other parts of an http request.

The reasons for these restrictions are not just (or even mostly) about a consistent UI experience. They are about making sure websites that want to have DRM-like restrictions, like the nytimes (or netflix, or hulu), have those restrictions be airtight on iOS.  (These restrictions implemented by a website may or may not technically be ‘DRM’, under the DMCA etc. The actual implementation by the website probably is not, although it serves the same ends as DRM. The restrictions on the device itself or its built-in software to try to keep you from an end-run around the website’s implementation probably would count as DRM, thus the LoC’s specific exemption for jailbreaking your phone).

The nytimes may or may not have yet implemented paywall protection that is impossible to get around on an iPhone.  But hulu already has. You can’t watch hulu on an iPhone for free, although you can on a desktop.  If the iPhone were a platform that gave users control, it would be easy to install a browser or browser plugin that hid from the hulu servers the fact that it was an iPhone.

But instead, the iPhone is a platform where users can only install software that Apple approves, and Apple’s policies and approval processes are in part designed to protect and enforce content provider restrictions.  Note that not all content owner technical restrictions simply enforce the law: DRM keeps users from doing things that would be legal, for fair use or other reasons, too.

A future where most people have a mobile device as their main or only web browsing computer seems quite plausible.  The iOS ‘closed-shop’ platform model becoming prevalent also seems quite plausible, as it’s been quite successful (and I wouldn’t be shocked to see larger form factor non-mobile OSs adopt this model too; perhaps the Apple desktop app store is an exploratory shot).  If it does, this could be the end of the era where computer owners have the freedom to install whatever they want on their computers, and the beginning of an era where computer owners can only install what the platform vendors say they can install.  And that permission to install will be subject to the vendors’ own business models and interests, and the business models and interests of their business partners.  This is not a welcome course.

(Note the paucity of open source software on the iOS app store — can anyone find me any examples?  I don’t think the app store rules actually prohibit open source, but the nature of the ecosystem discourages it or makes it less attractive to developers for several reasons. It’s pretty difficult to hack on a fork of an open source project for iOS, let alone distribute your mods to others.)
