LibGuides: Data Management and Sharing Plans: During your research

Data Management during your research

OVERVIEW

Good data management practices occur throughout the research process. Even if your project did not require a formal data management plan, thinking about data management during your research will help ensure your data are secure, well-documented, and better prepared for later dissemination and archiving.

This page will examine the following issues:

Documentation & Metadata
Finding data
Managing sensitive data

I. DOCUMENTATION & METADATA

Getting the basics down

Who? Who contributed to the project (authors, research assistants, etc.)?
What? What kind(s) of data and analysis were used?
When? When were the data collected? When was analysis performed? Any other pertinent dates?
Where? Does the project involve a particular geographic area, such as the state of Minnesota, or the Twin Cities, or Antarctica?
Why? What is the impetus for the project? What questions are you trying to answer?

Getting a little more in-depth
Imagine that you have to leave the project as is for a couple of months and then come back to it. What are the most important aspects of the project you'd need help remembering? Some examples:
file handling (how files are named and how they are divided)
processing steps (how to get from point A to point B)
field abbreviation/name glossary (now what does ABC3130 stand for again?)

Now imagine if you had to leave the project and come back after six months or a year. What else would you add to the list?

Need Help? Download an Example Readme.txt (plain text file) template that can be adapted for your data.
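
As a rough illustration of how the questions above can be turned into a reusable readme skeleton, here is a minimal Python sketch. The section names follow the prompts in this guide, but the layout, the README_FIELDS list, and the render_readme function are assumptions made for this example; they are not the library's Readme.txt template.

```python
# Hypothetical readme skeleton; headings mirror the questions in this guide.
README_FIELDS = [
    ("WHO", "Who contributed to the project?"),
    ("WHAT", "What kind(s) of data and analysis were used?"),
    ("WHEN", "When were the data collected and analyzed?"),
    ("WHERE", "What geographic area does the project cover?"),
    ("WHY", "What questions is the project trying to answer?"),
    ("FILES", "How are files named and divided?"),
    ("PROCESSING", "What steps lead from raw data to results?"),
    ("GLOSSARY", "What do field names and abbreviations stand for?"),
]


def render_readme(answers):
    """Return plain readme text; unanswered sections are marked TODO."""
    lines = []
    for heading, prompt in README_FIELDS:
        lines.append(heading)
        lines.append(f"  ({prompt})")
        lines.append(f"  {answers.get(heading, 'TODO')}")
        lines.append("")  # blank line between sections
    return "\n".join(lines)


# Invented sample answer, for illustration only.
text = render_readme({"WHO": "J. Doe (principal investigator)"})
print(text)
```

Printing the skeleton with TODO markers makes it easy to see which sections still need attention before the text is saved as a plain text file alongside the data.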

Standardizing your documentation

With the “raw material” for documenting your project written down, the next step is to standardize the formatting. The standard to use depends on the discipline and/or format of your data. A few standards are listed below. This list isn’t intended to be exhaustive, but rather illustrative.

Type of Data	Discipline/s	Standard
----	Social and Behavioral Sciences	Data Documentation Initiative (DDI)
----	Ecology	Ecological Metadata Language (EML)
Spatial	----	Content Standard for Digital Geospatial Metadata (CSDGM)/FGDC/ISO 19115
Biodiversity	Life Sciences	Darwin Core

A more comprehensive list of disciplinary metadata standards is available from the Digital Curation Centre.

Naming your files

File names should be:

unique
consistent
informative when they are quickly scanned
It is also best to use names that will help the files fall into a useful order when they are saved.

One goal for file naming is to give enough information so that either the creator or a new user can figure out where the information in the file fits into the project.

Elements that may be included in your file names are date, project name, type of data, location, and version. Other features to consider as you design your file naming plan are described in this Google Doc.
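
One way to keep names unique, consistent, and sortable is to build them from the same elements in the same order every time. The Python sketch below is only an illustration of that idea: the build_filename function, the element order, the underscore and hyphen separators, and the sample values are all assumptions, not a required convention.

```python
import re
from datetime import date


def build_filename(project, data_type, location, collected_on, version, extension):
    """Join file name elements in a fixed order, date first."""
    def clean(part):
        # Replace runs of spaces and punctuation with a single hyphen.
        return re.sub(r"[^A-Za-z0-9]+", "-", part).strip("-").lower()

    parts = [
        collected_on.strftime("%Y%m%d"),  # YYYYMMDD sorts in calendar order
        clean(project),
        clean(data_type),
        clean(location),
        f"v{version:02d}",                # zero-padded so v02 sorts before v10
    ]
    return "_".join(parts) + "." + extension.lstrip(".")


# Invented sample values, for illustration only.
name = build_filename("Lake Survey", "water temp", "Twin Cities",
                      date(2023, 6, 5), 2, "csv")
print(name)  # 20230605_lake-survey_water-temp_twin-cities_v02.csv
```

Putting the date first in YYYYMMDD form means an alphabetical listing of a folder is also a chronological one, which helps files fall into a useful order when they are saved.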

Recommendations and Best Practices

Documentation tools, such as a general metadata schema: Librarians and metadata specialists have produced a number of useful schemas to describe research data and scholarship.
Training in data management: The libraries offer training and best practices for managing digital information, such as better file names and preservation file formats.

II. FINDING DATA

The library subscribes to many data archives and resources for you to find and access essential data for your research.

Consult this guide to get a comprehensive overview of the many archives & resources CSUN University Library subscribes to.

III. MANAGING SENSITIVE DATA

One of the challenges of sharing human subjects data is the risk that your data may identify an individual, either directly or indirectly. Additionally, the information in your dataset may be legally protected or sensitive, which could lead to legal repercussions for you and/or bring harm to the individual if that information is released and linked to that individual’s identity.

Data Sharing Concerns

Disclosure is the unauthorized release of information that may identify an individual research participant or organization. Examples of disclosive information include:

Direct identifiers or Personally Identifiable Information (PII), such as name, address, social security number, and phone number.
Indirect identifiers, such as zip code, birthdate, education, and race/ethnicity, that could be used in combination to uniquely identify an individual.
Information in a dataset that can be linked with outside information, from sources such as social media, administrative data, or other public datasets, that results in identification of an individual.

Legally protected data have restrictions placed on them by law. Examples include:

Family Educational Rights and Privacy Act (FERPA) protected educational records data, such as grades
Health Insurance Portability and Accountability Act (HIPAA) protected medical or healthcare data

Sensitive data include any information that may cause harm, legal jeopardy, or reputational damage to the subject if disclosed. Such data may or may not be legally protected. Examples include:

Criminal or illegal behaviors, such as drug use
Mental health information
Sexual behaviors
Information about minors or other vulnerable populations

Before sharing human subjects data publicly, the dataset should have a low disclosure risk or be free of disclosive information. This involves removing both direct identifiers AND indirect identifiers that may pose a disclosure risk.
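
A first-pass check for the two steps described above might look like the following Python sketch. It drops direct identifier columns, then flags any record whose combination of indirect identifiers is shared by fewer than a chosen number of records. The column names, the min_group_size threshold, and the sample records are all invented for this example, and the check does not cover linkage with outside information.

```python
from collections import Counter

# Hypothetical column names; adjust to match your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "ssn", "phone"}
INDIRECT_IDENTIFIERS = ["zip_code", "birthdate", "education"]


def drop_direct_identifiers(rows):
    """Return copies of the records without direct identifier columns."""
    return [{k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
            for row in rows]


def indirect_key(row):
    """Combine a record's indirect identifiers into one comparable tuple."""
    return tuple(row[col] for col in INDIRECT_IDENTIFIERS)


def risky_rows(rows, min_group_size=2):
    """Return records whose indirect identifier combination is too rare."""
    counts = Counter(indirect_key(row) for row in rows)
    return [row for row in rows if counts[indirect_key(row)] < min_group_size]


# Invented sample records, for illustration only.
rows = [
    {"name": "A", "phone": "555-0100", "zip_code": "91330",
     "birthdate": "1990", "education": "BA", "score": 3},
    {"name": "B", "phone": "555-0101", "zip_code": "91330",
     "birthdate": "1990", "education": "BA", "score": 5},
    {"name": "C", "phone": "555-0102", "zip_code": "91331",
     "birthdate": "1985", "education": "PhD", "score": 4},
]
shared = drop_direct_identifiers(rows)
flagged = risky_rows(shared)
print(len(flagged))  # 1
```

Flagged records are candidates for further treatment, such as coarsening a value (for example, birth year instead of birthdate) or sharing under restricted access.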

If your data contain legally protected or sensitive data, or if the removal of identifiers limits the usefulness of your data, consider sharing through a restricted-access repository, such as the Inter-University Consortium for Political and Social Research (ICPSR).

In addition to the content of the data, the agreement made with participants under your IRB-approved protocol can also limit the extent to which human subjects data can be shared.