Costing data management

November 8, 2010

There have been a few events of late on costing research data management. Two that I’ve attended are:

Roles and responsibilities were a key theme. Is data management the concern of researchers, their institutions, funders or disciplinary data centres? At the RDMF, Jeff Haywood, Vice Principal for Knowledge Management at University of Edinburgh, described the institution as the place of last resort for preserving data. They hope to direct researchers to external data centres where possible but are concerned to keep a register of the data so they know where their assets are and can act to secure these if external services are under the threat of closure.

A breakout session at the RDMF on institutional solutions versus national data centres reached a similar conclusion. It isn’t a matter of choice – we have to live with a mixed landscape. It was argued there should be more services at local level: a sort of first-step data management service. A series of handovers could then scale up to various levels as appropriate, based on the nature of the data, the available infrastructure and the specific requirements of each case. Jeff’s argument holds well in this scenario – HEIs don’t need to provide a complete infrastructure, just add to existing provision where required and, most importantly, know what they own and where it is.

At the JISC workshop, Andrew Bush of KPMG addressed how costs can be built into research funding bids when there’s a gap in provision.  He recommended that data management support costs should be recovered through indirects, as this is apparently where research councils see them being placed. He advised not to class data management infrastructure as research facilities, as the cost of these should only be applied when the facility is used by a project – not on every bid – so you need to work at capacity. Also, as projects typically draw on data management infrastructure once finished, it’s better to include this as an indirect cost. It seems research funders are willing to meet data management costs but it’s quite an untested area so examples of how people have costed in support would be welcome.

One aspect where headway has been made is in defining some of those costs. The JISC MRD projects have been asked to identify researcher needs and pilot services to address these. At Leicester they’ve been investigating the provision of ‘good enough’ data centres, which provide robust but cheaper storage to researchers. The cost comparison was £400 per TB per year versus the usual £1 a GB a day on university SANs. Jonathan Tedds reported that the reception to this has been overwhelming, as researchers often struggle to manage their own storage and back-up efficiently. Comparable charges were noted by other JISC projects too.
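The two Leicester figures are quoted in different units, so here is a rough sketch that puts them on the same per-terabyte, per-year footing. It takes the numbers exactly as reported above; the decimal terabyte (1 TB = 1000 GB), the 365-day year and the function name are my own assumptions for illustration, not anything presented at the workshop.

```python
# Rough normalisation of the two quoted storage prices to GBP per TB per year.
# Assumptions (not from the workshop): 1 TB = 1000 GB and a 365-day year.
GB_PER_TB = 1000
DAYS_PER_YEAR = 365


def annual_cost_per_tb(rate_gbp, unit, period):
    """Convert a price quoted per `unit` per `period` into GBP per TB per year."""
    units_per_tb = {"TB": 1, "GB": GB_PER_TB}[unit]
    periods_per_year = {"year": 1, "day": DAYS_PER_YEAR}[period]
    return rate_gbp * units_per_tb * periods_per_year


# Figures as quoted in the post, taken literally.
good_enough = annual_cost_per_tb(400, "TB", "year")
university_san = annual_cost_per_tb(1, "GB", "day")
ratio = university_san / good_enough

print(f"'Good enough' storage: £{good_enough:,} per TB per year")
print(f"University SAN:        £{university_san:,} per TB per year")
print(f"Ratio (SAN / good enough): {ratio:.1f}")
```

Read literally, the quoted rates differ by a very large factor, so it would be worth checking the units against the original Leicester presentation before reusing these numbers in a bid.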

More work is underway across the MRD programme on defining benefits and business models for sustainability. This will be presented at the International workshop in Birmingham in March 2011.

Initial test user thoughts, and my cautious optimism

November 5, 2010

We recently had our very first guinea pig researcher evaluation exercise of the nascent Cambridge web pages for data management support. This was, as you might imagine, very helpful. We’ve been squinting into our computer screens for a while on these, and this gave us an opportunity to take a quick step back and make some adjustments.

We have now received a bit of support and encouragement for our (a) FAQ-centric format, (b) light-weight level of detail with many ‘further reading’ links for adventurous souls, (c) categories of support/guidance, i.e. ‘managing your data’, ‘organising your data’, ‘accessing your data’, and ‘looking after your data’, and (d) topics. So, with this first data point, I breathe a tentative sigh of relief. Whew!

There were also some well-earned criticisms and helpful suggestions. Here are some of those:

  • Practice what you preach! (We have instructed researchers to use open formats, and, where necessary, use PDF/A rather than PDF formats. But we had a lone Excel spreadsheet attachment on one of our pages. Hmm…)
  • ‘Teaser’ text explaining links or categories must be precise and complete. (This may sound obvious, but it’s encouraging to see that our test user read and depended on this text, and we will give it some more thought).
  • Including Pros & Cons is pretty much always helpful, especially when a question has no definitive answer that will serve all users best.
  • ‘Return to top’ buttons are a site user’s best friend!
  • Reminder: There are preferred formats for preserving file content in the long term, and preferred formats for preserving maximum file usability in the short/medium term, and the two aren’t always the same; we need to make sure that users understand this.

We’re continuing the work on the Cambridge and Glasgow websites, associated training resources, and data-management related workshops hosted at each institution.

As part of this process, we are hoping to send some of our draft resources your way soon, for your thoughts and appraisal. More on that soon – watch this space!

And, of course, as always, please send your thoughts on website usability and communication in our direction! Finally: have a safe, fun, and scalding-free fireworks weekend!

Incremental in the press!

November 2, 2010

Well, in the Digital Preservation Coalition newsletter, at least!  Find us in the ‘Who’s Who: Sixty Second Interview’ section here:

One of the most difficult things is keeping responses short and pithy.  The more I thought about the questions, the more I wanted to say.  (The other most difficult thing is finding a photo I was prepared to put in the article!)

So, have a quick read and let us know what you thought.  Was I talking nonsense?  Let us know in the comments!

FOI and researchers: an update

October 25, 2010

Following on from our earlier post, ‘Panic? What Panic? FOI and researchers’, we’ve noticed that the JISC has produced some FOI guidance for researchers. Although primarily for England and Wales, the resource does mention the Scottish position. You can find it at

and we’d love to know what you think.  Have you used this JISC guide?  Is it useful / does it make sense to you?  Let us know in the comments.

The crossing point

August 20, 2010

I’ve been looking at existing online provision of research data management guidance at leading UK universities and, yes, I’ve found something of a trend.  There may be some useful guidance on each website, but it’s anyone’s guess where it is, and it’s certainly not all in one, easy-to-find place.

Many – if not all – UK universities have some webpages aimed at researchers.  These are usually called ‘Research Support’, ‘Our Research Environment’, ‘Research Services’ or something scary relating to commercialisation and knowledge transfer.  Anyway, they’re usually pretty obvious when you find them because they’re full of photos of attractive people wearing safety specs and looking intently at things in test-tubes.  The text is reassuring, generally promising to hold the hands of researchers through all aspects of finding, bidding for and managing funding for research.  Oddly, though, they don’t often say anything about looking after that valuable information which people are going to the lengths of giving you money to gather in the first place.

Then, in another place on the website entirely, usually in the ‘Staff’ webpages, we find information on training and development. We must look elsewhere again for the usually-bewildering IT support department website and the services and tools they provide.

And then we must look elsewhere for the information – if it’s online at all – about records management or information management.  If you’re lucky enough to actually locate these pages, you’ve often followed a trail – entirely by chance, I should imagine – that goes something like, ‘Home > About the university > Governance and administration > University directorates > Records management and information access > Legal compliance > Records management’.  I wish I was making this up.  Alternatively, you could try the free text Google search box and hope that your choice from ‘records management’, ‘information management’, ‘data management’, ‘research data management’, ‘managing research data’ or ‘research information’ comes up trumps.

Elsewhere, we find some webpages aimed at library users.  These pages, naturally, take the reader through using the library, how to find things, how to get hold of your subject librarian, should you still be lucky enough to have one, and any special collections or galleries the library may be attached to.  This is often – but not always – where you find any mention of the institutional repository.

Yes, the institutional repository, or ‘IR’: often as not, it’s not linked from anywhere except maybe an obscure corner of the IT services website, or maybe a dusty by-way on the library webpages.  Sometimes we only know it exists because the SHERPA list tells us so.  Sometimes even then, it doesn’t turn up online.  When this is the case, you could be forgiven for resorting to your university website’s ‘A-Z’ index – but wait!   It turns out that the IR is very, very unlikely to be listed there under ‘institutional’ or ‘repository’ or ‘archive’ or even ‘research’.  Most university IRs seem to be called something cute, often a name from classical mythology which nobody can remember the relevance of, or a witty acronym from which a highly unlikely title has been tortuously back-formed.  Sometimes they’re just plain baffling and you may as well just search the whole site for ‘EPrints’ and hope for the best.

My point is this – if you are a researcher in need of data management guidance (in the widest, ‘lifecycle’, understanding of the term), you need a little bit of input from each of these places, throughout the life of your project.

  • You need to know from the library where to find the resources you need for your work, if you don’t want to trust your review of literature to the likes of Google.
  • You need the staff training or development service to provide you with training on the research software or methods you want to use, and which will allow you to preserve your data in a meaningful way.
  • You need the records management people to let you know what the university thinks you should be keeping, what you should be getting rid of and what the best ways to do these things are.
  • You need to know from the institutional repository how you can submit your work, what format it should be in, what your rights are if you do submit a piece of research to them, and how other people are going to find your work.
  • You need the research support people for funder-specific data management requirements, and to let you know if there’s a research-specific data management policy that differs from the general, institutional records management and/or retention policy.
  • You need to know from IT support what your IT people are prepared to offer you in terms of access to specialist software, equipment, data storage, back-up services and the rest of it.
  • And – crucially – when you’re writing that last-minute bid for funding, you need smooth interaction between these departments to answer questions like, ‘What’s the best way to record my findings during the project and share them with the rest of the team?’, ‘Where and how should I store my data?’, ‘Are IT services responsible for backing-up my research data?’, ‘Will my funder pay for the cost of a new server and staff time to administer it?’, ‘Will my funder let me publish my findings in the institutional repository?’, ‘Should I keep my research data once I’ve published or submitted my findings, and if so, where?’ and probably ‘What is a technical appendix anyway?’

The information needed to reliably answer such questions often falls between the realms of IT services and research support services, or research support and the institutional repository, or research support and the training people, or – well, you get the idea.

Help with managing research data is provided by many institutions, but delivery is fragmented and inconsistent.  In many institutions, these resources or pieces of guidance are separate islands, with no crossing points between them. This is no good to researchers – it makes finding guidance much more difficult and time-consuming than it needs to be. You may have found contacts through your personal network or the protocol of your department to help you with this stuff but if you’re new, out of the loop or just not so lucky, bids can be faulty or delayed, funding missed out on and, as a result, research careers damaged.

I say all this based, as mentioned at the start, on a survey I recently undertook of the websites of twenty leading UK universities, which I studied as a random visitor. I found evidence of just under a third offering any kind of researcher-specific data management advice online (although it should be noted that I didn’t have access to staff-only intranets). The other two-thirds of university websites apparently provided only records management advice for either unspecified types of records or specifically for administrative records only (although of course a lot of the practice outlined was still highly relevant to research data).

I gave myself five minutes on each site to get to the research data management advice, if it existed, by navigation of likely-looking links.  After that time, I resorted to the free text search box.   In ninety percent of cases, I had to use the all-site search in order to find any records management or information management guidance at all.   Only one of the twenty university websites appeared to offer any link between the data management advice pages and the IR.  (I’d be interested to know what percentage of university research staff at each institution know a) what an institutional repository is; b) whether they have one; c) what it’s called and d) where it is online. Hey, I think I’ll find that out …)

Only fifteen percent of websites visited listed their IR in the website A-Z index in a way that you’d be able to find it without knowing its cute, in-house name, and a quarter of institutions listed it only under this name.
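For anyone who prefers head-counts to percentages, the survey figures above can be turned back into numbers of websites. This is only a back-of-envelope sketch: the sample size of twenty and the percentages come from the post, while the variable and function names are made up for illustration.

```python
# Convert the survey percentages quoted in the post into counts of websites.
# The sample size (20) and the percentages are from the post; names are illustrative.
SITES_SURVEYED = 20


def sites(percent):
    """Number of surveyed sites that a whole-number percentage corresponds to."""
    count, remainder = divmod(percent * SITES_SURVEYED, 100)
    if remainder:
        raise ValueError(f"{percent}% of {SITES_SURVEYED} is not a whole number of sites")
    return count


needed_site_search = sites(90)         # had to use the all-site search
ir_findable_in_index = sites(15)       # IR findable in the A-Z without its in-house name
ir_only_under_house_name = sites(25)   # IR listed only under its in-house name
guidance_linked_to_ir_percent = 1 * 100 // SITES_SURVEYED  # "only one of the twenty"

print(needed_site_search, ir_findable_in_index, ir_only_under_house_name)
print(f"{guidance_linked_to_ir_percent}% linked data management advice to the IR")
```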

So, in short, to improve matters, universities need to consider the pieces of guidance they already supply their research staff about data management, and draw them together to form comprehensive, simple resources that will make sense to the working researcher, who has little time and no specialist data-management knowledge. These resources should act as crossing points between previously-separate realms. And this is where the opportunity is for Incremental to make things better for researchers.

If we can find good practice in UK university research data management guidance, whether that’s in a well-written list of FAQs, or a well-organised website pulling together guidance from across a university website into one accessible, obvious place, then all to the good.  If we can’t find this, or find enough of it, we need to start making it and positioning it on our respective university websites in a way that is prominent and intuitive for research staff of that university. These connections can be the crossing points to help researchers get to the guidance they need, when they need it, and if we manage that, I think Incremental’s job is done!

Does your university offer meaningful help with data management?  Or are you struggling to find the assistance you need to look after your data?  Are you responsible for promoting one of these services at your institution?  Let us know in the comments.

Panic? What panic? FOI and researchers.

August 10, 2010

With all the recent brouhaha around the forced disclosure of research data, after climate-change researchers at the University of East Anglia and Queen’s University Belfast were required to make their data available whether they liked it or not, it seems that Incremental really is working on a hot topic!

However hair-raising the various reports have been for researchers, though, a couple of sensible points have been made, particularly in the Times Higher Education Supplement article and its subsequent reader comments [at

Chris Rusbridge, erstwhile director of the Digital Curation Centre, reminds us therein that in the Queen’s University case, the request for information was made specifically under the Environmental Information Regulations, which he describes as ‘stricter’. Further, Rodney Breen points out that “the Freedom of Information Act has exemptions to protect data which is collected with a reasonable expectation of confidentiality, and data which is commercially sensitive. Under the Scottish Act, there is specific protection for research data. There is no reason why material for which researchers have legitimate need for protection should need to be disclosed.”

I had a look at the Freedom of Information (Scotland) Act 2002, and Part 2 (Exempt Information) does indeed say:

27 Information intended for future publication

(1) Information is exempt information if—

(a) it is held with a view to its being published by—

(i) a Scottish public authority; or

(ii) any other person,

at a date not later than twelve weeks after that on which the request for the information is made;

(b) when that request is made the information is already being held with that view; and

(c) it is reasonable in all the circumstances that the information be withheld from disclosure until such date as is mentioned in paragraph (a).

(2) Information obtained in the course of, or derived from, a programme of research is exempt information if—

(a) the programme is continuing with a view to a report of the research (whether or not including a statement of that information) being published by—

(i) a Scottish public authority; or

(ii) any other person; and

(b) disclosure of the information before the date of publication would, or would be likely to, prejudice substantially—

(i) the programme;

(ii) the interests of any individual participating in the programme;

(iii) the interests of the authority which holds the information; or

(iv) the interests of the authority mentioned in sub-paragraph (i) of paragraph (a) (if it is a different authority from that which holds the information).

I’m (obviously) no legal expert, but I read this as describing certain types of research data as exempt from FoI requests in Scotland, which means, as Chris Rusbridge puts it, ‘This is not all as bleak as it’s painted!’

Clarification, comments, and any differences in the situation for Wales, Northern Ireland and England are, of course, very welcome.

Decoding the Digital

August 3, 2010

This time last week, Catharine and I attended the joint DPC and BL Preservation Advisory Centre workshop: Decoding the Digital. So, what was it all about? Well, Caroline Peach began by explaining their intentions in putting the event on. Much is similar between traditional collection care and digital preservation, so rather than differentiating between our work, we were encouraged to build on and learn lessons from each other’s practices.

The theme of the day – defining a common language – helped make sure these intersections were addressed. Gareth Knight spoke about InSPECT and their work on significant properties. He acknowledged this to be a very opaque digital preservation term, but promptly gave a clear definition: they’re the characteristics you feel must be preserved for your data to remain accessible, usable etc. Our talk, given by Catharine, also reflected on the language being used, as we’ve found many researchers are bamboozled by terms such as ‘digital curation’ and ‘data management’. As William Kilbride concluded, we’ve spent the last 10 years working on this and have managed to make it harder for others to get started by using such fancy words to express ourselves!

There were several useful talks for our project, specifically Joel Eaton’s explanation of the points to consider when choosing file formats and Alexandra Everleigh’s pragmatic description of how to curate digital material with virtually no budget and little practical experience. I found these a very encouraging case for action. Procrastination and theorising won’t save digital material. Even if we’re not certain of the right approach (and can we ever be?), we need to take some practical action. The kind of guidance and support we’re aiming for will have the same ‘no-nonsense’ approach.

Thanks to DPC, BLPAC and all the speakers for giving us more ideas.