Draft Research Data Policy

We currently do not have an official policy that addresses critical aspects of research data, such as collection, retention, security, and access.

The Office of the Vice Provost for Research, in collaboration with others, has produced a Draft Policy.

It’s a 3-pager. Please take a look and leave comments below. Comments are totally anonymous unless you identify yourself in the post itself.

Policies at other institutions: Stanford, Columbia, MIT, Johns Hopkins, Michigan


9 thoughts on “Draft Research Data Policy”

  1. There are two main issues with this proposed policy.

    First, keeping data for 7 years is infeasible for most social science work. It is particularly difficult in areas where researchers do not collect and own the data themselves. For example, many governments allow special access to data. The researcher does not own the data and must return it to the government agency when the research project is complete. Often there are limits (around 2-3 years) on how long the researcher can hold the data. Most government agencies would balk at a researcher's request to hold the data for a period of time after the research project ends. They would be concerned that this type of policy would increase the potential for breaches of privacy. This type of policy will limit Cornell researchers' access to this type of administrative data. Without access to such data, social science research will suffer. I think the policy could get around this with an exception for research using data where the provider does not allow researchers to retain the data. In this case, Cornell policy could follow policies at journals, where researchers are required either to deposit data or to clearly make public the process by which they obtained the data and agree to assist others in obtaining the data to the extent possible.

    Second, as others have mentioned, this is an unfunded mandate on faculty. If Cornell is going to require this type of thing, it needs to fund it and make facilities available. There could be a central pool of funds, distributed to faculty based on the expenses required for each specific project. Alternatively, each faculty member could have a set space for storage, with increases in capacity for those that need it.

  2. For those wondering what is included in the definition of data, 2 CFR 200.315 states the following:

    “Research data means the recorded factual material commonly accepted in the scientific community as necessary to validate research findings, but not any of the following: preliminary analyses, drafts of scientific papers, plans for future research, peer reviews, or communications with colleagues. This ‘recorded’ material excludes physical objects (e.g., laboratory samples). Research data also do not include:
    (i) Trade secrets, commercial information, materials necessary to be held confidential by a researcher until they are published, or similar information which is protected under law; and
    (ii) Personnel and medical information and similar information the disclosure of which would constitute a clearly unwarranted invasion of personal privacy, such as information that could be used to identify a particular person in a research study.”

  3. Should the reference to federal regulations be 2 CFR §200.0-521, not 0-251? Record retention requirements are covered by 2 CFR §200.333-337.

  4. A fine example of “sound management principles” (i.e., the corporate model of university governance) as a rationale for requiring research staff to cover the cost of protecting institutional interests.

  5. The requirement to store data for 7 years post-closeout is fine, but to then argue that the cost should be borne as a direct cost of sponsored research is problematic, because there is no longer a grant at that point. It seems that archival storage needs to be supported by some other means for this post-award period. Personally, I believe archival storage that the university mandates beyond the support of a grant (if possible) should be considered overhead.
    I have not seen an NIH grant where archival data storage was a valued component of direct cost expenditure. A future grant is not going to allow storage of unrelated data as a direct cost. It is likely that such resource expenditure will have to be massaged into modular budgets, which means fewer resources for actual people and supplies to conduct new research.
    I agree that cloud storage is inexpensive (in particular Globus) but does this count as “in facility” storage as required by this draft? I agree that lab notebooks are one form of data, but as you know we have terabytes of image files, genetic databases, statistical analyses etc. that wouldn’t show up in a lab notebook. That seems to be unclear enough to pose an issue down the road.
    The policy currently dictates that it is the PI’s responsibility to make sure the data is available for others to replicate, but there is sometimes a certain fog of war with respect to file names and such that would be super onerous for PIs to have to completely manage. I’ve developed file tree organizational rules for all my students to follow, etc., but there is little incentive for them to fix confusions with these when they are finishing up, and the only leverage is the threat of not signing the thesis document (while still paying for the student). I think there should instead be a collaboration with the graduate school to emphasize the importance of organized data storage, and to make students agree to some text stating that they will organize or have organized their data as a condition of finishing their degree. Making it the student’s responsibility, mentored and verified by the PI, is much more transparent and appropriate as a training feature than a last-minute PI responsibility that a student can forgo in the name of getting work done.

  6. An interesting aspect not addressed by the proposed policy pertains to how the Cornell administration is enabling storage and dissemination of data for an extended period AFTER the end of the project (‘Research data must be retained for a minimum of seven years after the final project closeout.’). What IT infrastructure will the Cornell administration be purchasing to enable faculty members to operate within FAIR data principles? Without such infrastructure, the policy commits faculty to ‘Collecting, maintaining, retaining, and providing access to research data for the periods required by this policy;’ without enabling such activities. This is a significant omission, and the responsibility appears to be handed back to faculty members with the following: “Note: Costs associated with the preservation and security of research data during the term of a sponsored award are typically allowable direct costs of conducting research. The Cornell University IT department provides secure cloud storage at very affordable rates. For example, a few terabytes, more than sufficient for several years of research data for most faculty, will cost less than a few hundred dollars per year. Those faculty who work with extremely large data sets may want to take advantage of archives available to their discipline.” So far as I am aware, costs AFTER the end of the project period are inadmissible in federal grants. Note: placing data on any commercial cloud does not constitute compliance with FAIR data principles.

  7. If we have to keep data for 7 years past closeout, how do we pay for it? We can’t spend money past the end of the grant, but in this policy we have to retain data past the grant’s end (we also typically can’t spend money from other, current, grants to do this, obviously). For projects such as those on which I work, with 10s to 100s of TB, this would be a very substantial cost for which there is no domain archive to host it free of charge, and no obvious mechanism to find the necessary funds.

  8. In general this looks like a reasonable policy. However, I see a couple of areas where some refinement is needed.

    1. Many forms of data have open repositories that are mandated by the federal program that supports the research. For example, all of my primary field data are required to be deposited in one by the end of data collection, and to be openly available within 2 years or less of collection. Many of the systems that collect those data send them straight from the field equipment to the data center. In these cases, the policy should make clear that the data do not also have to sit at Cornell, since there is open access within the conventions of the discipline.

    2. Open access after some limited moratorium is now a common feature of all NSF awards I receive. Any Cornell archive would need to comply with that; there should be some acknowledgement.

    3. What constitutes “data”? Is this policy about the primary records of instruments, or about more advanced secondary data products that more closely resemble what appears in publications? There is a big debate about this in my general discipline (Earth and ocean science), with different answers in different subdisciplines. In some areas, the reproducibility/fraud-potential issues are within the later processing and the algorithms that go with it. I am not at all sure what I would need to archive under the proposed Cornell policy.

    4. When the researcher leaves Cornell, whose responsibility is it to shoulder the continued cost? Often the lab closes down, and the infrastructure goes away. Putting this on the original department seems potentially onerous.

    5. Some kinds of “data,” such as physical samples, are hugely costly to maintain, as they take up considerable space (think rocks, water column measurements, etc.). Will Cornell help develop infrastructure to keep them archived? If not, point #4 in particular seems like an issue.

    Geoff Abers, EAS
