Tuesday 25 November 2008

Bloggers take German national library to task

From: FT November 25 2008

For someone writing under the name Robert Basic, it seemed too good to be true.
“My parents are never going to believe I’m going to be catalogued by the German national library,” the blogger wrote about the library’s plans to collect things German on the web to add to its century-old collection of the nation’s books.

But such expressions of delight were drowned out by outraged disbelief as websites reported that the Nationalbibliothek, based in Frankfurt and Leipzig, could force every private website owner and amateur blogger to submit material – and fine the noncompliant up to €10,000 ($13,000, £8,500).

Blogs have since been alive with jokes about German thoroughness, and calls to resist.
“Every home page owner should shunt them a pdf [file] with a copy of their website in highest quality, preferably all on the same day,” one blogger wrote on heise.de, a popular site among techies. “Then [the library’s] server would burst.”

Another blogger, writing under the pseudonym “night watchman”, published a screed on his homepage. The hassle of submitting pages and the threat of fines would kill the German-speaking internet as a forum of free speech, he thundered. His site was a “personal archive” that was of no value to a public institution.

The internet is often praised for its “viral” qualities, which set it apart from the methods of traditional mass media. But in this case, word-of-mouth authenticity morphed into unreliable Chinese whispers, as many of the things criticised about the library’s plans turned out to be incorrect.

The library had indeed received a government mandate in 2006 to collect web publications and to fine the unco-operative – as a last resort.

On October 22, Berlin released more details: the library should choose what it collected – based on its as-yet modest capacity and what it deemed to be of public interest. But the webbies’ frenzy had touched on an important and unresolved issue.

Faced with the deluge of online information and limited budgets for gathering and archiving, what could and should a public archive preserve for the nation – and when should easily tapped home pages be considered private rather than public?

With the internet already in its second decade and host to reams of material for which paper was too expensive or too cumbersome, it is startling to realise that the German national library and its worldwide peers are only just beginning to grapple with the problems of systematically archiving the web.

While the US Library of Congress started looking at “web capture” in 2000, and founded an international group to do the same in 2003, its internet archive remains selective. It boasts 17 thematic collections – but its archive about web coverage of September 11 2001, say, gives no idea how news sites’ top story developed that day.

The Germans started making plans in earnest two years ago to save web publications for posterity. Ute Schwens, director of the Nationalbibliothek in Frankfurt, said the collection was still a work in progress, one that was taking shape in consultation with national libraries in France, the Netherlands, the UK and North America.

“At the moment, we’re only collecting e-books and online dissertations but we’re going to be moving into the areas of blogs and websites fairly soon,” she said. “It’s got to be information other people might need but nothing purely commercial” – basically eliminating a huge crop of online shopping and corporate websites.

“We’re talking to [newspaper and magazine] publishers about their sites,” Ms Schwens said. “And we’re interested in blogs by people in public life – but not in every site of every private individual.” The limiting factors were technical: What file types to accept? How often should a library archive an ever-changing website?

With the focus currently on e-books and dissertations, much of the collecting in Frankfurt is still done by hand. The 20,000 publishers and academic institutions registered with the library are obliged to submit web material to the library’s server, run by an outside provider, or leave files on their own systems for the library to pick up.

Soon a lot of collecting will be done by machine. Material from news sites, for example, can already be secured automatically using a technique called “harvesting”. The question for each library will be how often to instruct its computers to do this: Ms Schwens said there were 12m active German websites, although not all would deserve a look.
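The library’s actual harvesting software is not described in the article, but the idea sketched above – visit a page on a schedule, follow its links, and store a timestamped snapshot only when the content has changed – can be illustrated in a few lines. This is a minimal sketch in Python, assuming a hypothetical `Archive` class and an example URL; a real harvester would fetch pages over HTTP and respect robots.txt, which is omitted here to keep the sketch self-contained.

```python
import hashlib
from datetime import datetime, timezone
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects href targets so a harvester can follow links on a site."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

class Archive:
    """A tiny in-memory stand-in for a web archive.

    Stores timestamped snapshots keyed by URL. A real harvester would
    download each page (e.g. with urllib.request); here the page content
    is passed in directly so the example needs no network access."""
    def __init__(self):
        self.snapshots = {}  # url -> list of (timestamp, checksum, content)

    def harvest(self, url, content):
        stamp = datetime.now(timezone.utc).isoformat()
        checksum = hashlib.sha256(content.encode("utf-8")).hexdigest()
        versions = self.snapshots.setdefault(url, [])
        # The "how often" question from the article: revisiting an
        # unchanged page should not create a duplicate snapshot.
        if versions and versions[-1][1] == checksum:
            return False
        versions.append((stamp, checksum, content))
        return True

# Hypothetical example page and URL, for illustration only.
page = '<html><body><a href="/impressum">Impressum</a></body></html>'
extractor = LinkExtractor()
extractor.feed(page)

archive = Archive()
stored_first = archive.harvest("http://example.de/", page)
stored_again = archive.harvest("http://example.de/", page)  # unchanged, skipped
```

The checksum comparison is one simple answer to Ms Schwens’s frequency problem: the crawler can visit often, but the archive only grows when a site actually changes.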

The library has already collected 40,000 e-books, 60,000 online dissertations and 1,200 e-journals, still a modest number compared with a physical archive that counts 24.5m items.
“But in the next few years, we’re going to collect millions of files,” she said – perhaps even the web encyclopaedia Wikipedia. Now that should make all webbies happy.

Indeed, the German blogosphere seems to be coming round to the idea. Admitting it was caught off guard by the deluge of misinformed protest, the Nationalbibliothek now gives comprehensive information about its plans – on its website, of course. One newly enthused webbie recently said on heise.de: “Let’s stop digital amnesia!”

Source: http://www.ft.com/cms/s/0/fb9fb642-ba81-11dd-aecd-0000779fd18c.html
