Research data management

The process of collecting, processing, storing and making data accessible must be planned in advance. To achieve this, a Data Management Plan (DMP) must be prepared, specifying how the research data created during the project should be managed before and after its completion.

Data Management Plan creation tools

Since June 2019, the National Science Center (NCN) has required all applicants to complete a Data Management Plan as part of the grant application. The guidelines for the Data Management Plan are available on the website. When filing an NCN grant application, we recommend designating the Polish Platform of Medical Research of Wroclaw Medical University as the repository for the research data.


Short instructions for completing the DATA MANAGEMENT PLAN in the project application


Authors:
Izabela Czeszek, Justyna Zawada
Department of Scientific Information and Library Promotion,
Library of Wroclaw Medical University
November 2024

Preliminary information

The Data Management Plan is a brief description of how the research data generated or used in the project will be collected, processed, stored and shared (1-2 sentences for each section). The plan can be modified during the course of the scientific project. Information regarding the sharing of research data created during the project and related publications should be included in the annual report. All changes made to the plan by the time the project is finished have to be included in the final report.

Useful contacts:

IT infrastructure at WMU

  • Network resource – created by CI at the request of the researcher; can be shared with various individuals locally
  • Cloud – every employee has 2 GB of cloud storage
  • External Cloud (OneDrive) – every employee and student is entitled to 100 GB of cloud storage in OneDrive with automatic backups of stored files
  • More information.

Institutional repository at WMU

WMU has infrastructure prepared for depositing research data resulting from scientific research: the Polish Platform of Medical Research of Wroclaw Medical University (Repository PPM-UMW). The Repository assigns a unique digital identifier (DOI) to the deposited research data sets, meets the FAIR requirements (Findable, Accessible, Interoperable, Reusable), and is indexed in international repository registers, including Re3data.org.

1. Description of Data and Acquisition or Reuse of Existing Data
1.1. How will new data be acquired or generated, or how will existing data be reused?
  • It should be indicated whether new research data will be generated in the project or whether secondary (existing) data will be used.
  • For new data, it is necessary to:
    • specify how the data will be collected, e.g., during interviews, observations, experimental studies, measurements, etc. In the case of experimental studies involving humans, consider contacting the Bioethics Committee
    • specify the type of data, e.g., textual (notes, transcriptions), numerical, measurement, statistical, survey questionnaires, images, sketches, photos (X-rays, CT, ultrasound, from interviews), samples, codes, etc.
    • indicate the equipment and software used
    • indicate whether any data will need to be digitized, e.g. analogue data or data collected/generated on paper, such as maps, photographs or notes
  • For secondary data, it is necessary to:
    • indicate the source
    • specify the terms (conditions) under which they will be used (type of license)
1.2. What data (i.e., types, formats, volumes) will be acquired or generated in the project?
  • It is necessary to specify the types of data generated in the project, e.g. textual, numerical, sequenced data, measurement data, statistical data, survey results, images, photos.
  • It is necessary to specify open formats used for storing data, e.g.:
    • text files 🠊 .txt, .rtf, .odt;
    • spreadsheets 🠊 .csv;
    • online surveys 🠊 .html;
    • recordings 🠊 .wav, .aif, .flac, .wv, .apl, .mp3;
    • photos, images 🠊 .bmp, .gif, .jpg, .png;
    • databases 🠊 .spv, .dbs;

    or other formats that are specific for files related to research data

  • Indicate the estimated volume of data in MB/GB/TB (this value may change during the course of the project)
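
A rough volume estimate can be obtained from pilot or sample data. The sketch below is only an illustration: it assumes the data sit in a single folder, the function names are hypothetical, and LISTED_FORMATS simply copies the extensions listed above (extend it for formats specific to your discipline).

```python
from collections import defaultdict
from pathlib import Path

# Extensions listed in section 1.2 of this guide; add discipline-specific ones as needed.
LISTED_FORMATS = {
    ".txt", ".rtf", ".odt", ".csv", ".html",
    ".wav", ".aif", ".flac", ".wv", ".apl", ".mp3",
    ".bmp", ".gif", ".jpg", ".png", ".spv", ".dbs",
}


def summarize_data_folder(root):
    """Return (bytes per extension, total bytes, extensions not in LISTED_FORMATS)."""
    sizes = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file():
            sizes[path.suffix.lower()] += path.stat().st_size
    total = sum(sizes.values())
    unlisted = sorted(ext for ext in sizes if ext not in LISTED_FORMATS)
    return dict(sizes), total, unlisted


def human_readable(num_bytes):
    """Format a byte count in MB, GB or TB (decimal units)."""
    for unit, factor in (("TB", 10**12), ("GB", 10**9)):
        if num_bytes >= factor:
            return f"{num_bytes / factor:.2f} {unit}"
    return f"{num_bytes / 10**6:.2f} MB"
```

Calling summarize_data_folder on a pilot data folder and scaling the total by the planned number of participants or measurements gives a first figure for the plan; the list of unlisted extensions shows which files may need converting before deposit.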
2. Documentation and data quality
2.1. What metadata and documentation (e.g., methodology and data acquisition and organization methods) will accompany the data in the project?

Research data generated in a project that is financed from public funds will be shared publicly in an open repository. That is why it is important to record metadata and maintain documentation that will enable future reuse of these data.

Metadata – information (data) that describes research data in a way that can be read by both humans and machines (computers) in the future.

  • It is necessary to specify which metadata will be recorded: data type, author(s), permanent digital identification of the researcher (ORCID), title of the research data set, file titles, keywords, abstract, year of creation, etc.

Documentation – should include information that allows understanding and interpretation of the data. It can be a text file (README).

  • It is necessary to specify what information about the produced data will be collected, e.g., project description; description of research methodology; context of the research (method of interpreting the research); legend of abbreviations used in the data files
  • It is necessary to specify how folders and files will be organized
  • At this stage, it is recommended to choose a repository and to check which metadata standard it uses. Then prepare the project documentation in accordance with that standard.

If the research data are to be deposited in the Polish Platform of Medical Research of Wroclaw Medical University, the data deposition form will require the following metadata: data type, author/authors, contact person, language of the research data, language of the metadata, title of the dataset, keywords, abstract, year of creation, README file. Additionally, information about the related project along with the project number, as well as related publications and other research data linked to the research data collected in the project, if any, will be required. The entered metadata will be saved in the Dublin Core standard.
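
Keeping a draft metadata record from the start of the project makes the deposition form easier to complete. The following is a minimal sketch, not the repository's actual schema: the field list follows the metadata named above, while the key names, the missing_metadata helper and the sample values are illustrative assumptions.

```python
# Metadata named in this guide for the PPM-UMW deposition form.
# The snake_case keys are illustrative, not the repository's real field names.
REQUIRED_FIELDS = [
    "data_type", "authors", "contact_person", "data_language",
    "metadata_language", "title", "keywords", "abstract",
    "year_of_creation", "readme_file",
]


def missing_metadata(record):
    """Return the required fields that are absent or empty in a record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder values only, for illustration.
draft = {
    "data_type": "numerical",
    "authors": ["Jane Example"],
    "title": "Example dataset",
    "keywords": ["example"],
    "year_of_creation": 2024,
}

print(missing_metadata(draft))
```

For this draft the check reports the contact person, both language fields, the abstract and the README file as still missing.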

2.2. What kind of quality control tools will be used?
  • It should be specified what control measures will be applied during data acquisition, for example:
    • calibration of equipment before each data acquisition cycle
    • repetition of experiments, for example: by two different authorized researchers
    • a data entry validation system, for example: one person enters the data and another verifies it, or the entered data is checked by two independent persons
    • other control methods typical for the specific scientific discipline
  • It should also be specified what kind of safeguards against unauthorized data modification will be implemented.
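
One common way to make unintended or unauthorized modification detectable is to record a checksum for every data file and compare the files against that record later. The sketch below assumes SHA-256 checksums and a single data folder; the function names are hypothetical. It detects changes but does not prevent them, so access control is still needed, and the manifest should be kept apart from the data.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """List files that were added, removed or modified since the manifest was built."""
    current = build_manifest(root)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

A manifest built right after each data acquisition cycle and checked before analysis gives a simple, documentable control step.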
3. Storing data and creating backups during the project
3.1. How will the data and metadata be stored during the project? How will their backup copies be created?

It is recommended to follow the 3-2-1 rule: keep 3 copies of the data on 2 different media, with 1 copy stored in a separate location (see IT infrastructure at WMU).

  • Specify where the data will be stored during the project (for example: in a database available after logging in to a password-protected, secured computer, in the university cloud with automatic backups, or in the OneDrive cloud as an external tool)
  • Specify whether and how often backup copies of the saved data and metadata will be made on the computer (for example: once a day) and updated to the latest version
  • Specify whether automatic backups will be created using cloud-based solutions
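
The 3-2-1 rule can be checked against a planned storage layout before it is written into the plan. This is a minimal sketch under simple assumptions: each copy is described by a location, a medium and an offsite flag, and the Copy class and satisfies_3_2_1 function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    location: str  # e.g. "office workstation"
    medium: str    # e.g. "local disk", "network drive", "cloud"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies):
    """Check for at least 3 copies, 2 distinct media and 1 offsite copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.medium for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


plan = [
    Copy("office workstation", "local disk", offsite=False),
    Copy("university network resource", "network drive", offsite=False),
    Copy("OneDrive", "cloud", offsite=True),
]
print(satisfies_3_2_1(plan))
```

The example layout passes the check; removing the cloud copy would fail it on both the copy count and the offsite requirement.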
3.2. How will sensitive data be protected during the project?

To ensure the protection of sensitive data, it is recommended to consult, for example, the Data Protection Officer (IOD) and the IT center.

  • It should be stated which laws and university regulations will govern the storage of sensitive data, for example: the Personal Data Protection Policy of the Wroclaw Medical University (Regulation No. 93/XV R/2018 of the Rector of Wroclaw Medical University of August 1, 2018, Appendix No. 1, as amended), together with the guidelines of the Data Protection Officer (IOD) and the Chief Information Officer (CI).
  • It should be stated how the data lost due to any incident will be recovered, for example: by using backup created in the cloud.
4. Legal requirements, codes of conduct
4.1. If personal data is processed, how will compliance with personal data regulations and protection be ensured?
  • It should be stated in the plan if personal data will not be collected
  • If personal data will be used and processed:
    • It is recommended to contact Data Protection Officer (IOD);
    • It should be stated whether the data will be collected and processed in accordance with the GDPR (RODO)
    • It should be indicated whether the study participants have been informed about the GDPR (RODO) and the university’s Personal Data Protection Policy, and whether the researchers have obtained informed written consent for the processing of personal data from the participants
    • It should be indicated whether the data will be secured through anonymization, pseudonymization or encryption before being made openly accessible (note that the encryption key should be stored in a location separate from the data itself)
    • It should be indicated who will be authorized to access sensitive data and under what conditions
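
Pseudonymization can be illustrated with a keyed hash: each direct identifier is replaced by a code that cannot be recomputed without a secret key, and the key is stored separately from the data. The sketch below is one possible approach, not a prescribed method; the function names and the sample identifier are hypothetical, and the choice of technique should be agreed with the Data Protection Officer (IOD).

```python
import hashlib
import hmac
import secrets


def generate_key():
    """Create a random secret key; store it separately from the data."""
    return secrets.token_bytes(32)


def pseudonymize(identifier, key):
    """Replace an identifier with a keyed SHA-256 code (first 16 hex characters)."""
    code = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return code.hexdigest()[:16]


key = generate_key()
print(pseudonymize("patient-001", key))
```

The same identifier always maps to the same code under one key, so records can still be linked across files. Pseudonymized data remain personal data for as long as the key exists, so access to the key should be restricted in the same way as access to the sensitive data.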
4.2. How do you plan on ensuring compliance with other regulations, such as: intellectual property rights and ownership rights? Which regulations are applicable in that case?
  • It should be indicated who will be the author – the owner of the copyright and intellectual property rights to the data obtained during the research
  • If data will be obtained from a third party, it should be indicated whether there will be any restrictions on the reuse of the data
  • It should be indicated under which license the obtained data will be made available
    • It is recommended to use the licenses required by the grant provider – in the case of NCN these would be the Creative Commons CC 0 or CC BY licenses. More about licenses.
5. Sharing and long-term storage of data
5.1. When and how will the project data be shared? Are there any restrictions/prohibitions regarding their disclosure?
  • It should be specified when the data will be made available: during the project or after it is finished. NCN allows data to be published no later than the publication of the article that is based on it
  • It should be specified in which repository the data will be shared and for how long it will be kept there (NCN requires at least 10 years)
  • If there will be an embargo, its duration and the reason for it should be indicated
  • It is necessary to indicate any potential restrictions on data sharing, which may arise, for example, from planned commercialization, legal reasons, data confidentiality or lack of consent from research participants
  • It is recommended that the data sharing follows the rule:
    • Data should be as open as possible and as closed as necessary
5.2. How will the selection of data to be preserved be approached and where will it be stored for the long term (data repository, archive)?
  • It is necessary to indicate how the data will be selected and how it will be decided which data will be kept, deleted or shared
  • Remember that data underlying publications resulting from the project must be made publicly available
  • It should be indicated where the data that won’t be shared will be kept
  • It should be indicated in which repository the project data will be shared and if it adheres to the principles of FAIR data

The Polish Platform of Medical Research of Wroclaw Medical University repository makes deposited data locatable through a unique digital identifier (DOI) and meets the requirements of the FAIR principles.

5.3. What methods or software will allow access to the data and its use?
  • It should be indicated which formats will be used to store data for the declared period of time
  • It should be indicated whether reading and reusing the data will require specialized tools/software.
  • When sharing research data from publicly funded projects, the use of open formats should be pursued, for example:
    • text files 🠊 .txt, .rtf, .odt;
    • spreadsheets 🠊 .csv;
    • online surveys 🠊 .html;
    • recordings 🠊 .wav, .aif, .flac, .wv, .apl, .mp3;
    • photos, images 🠊 .bmp, .gif, .jpg, .png;
    • databases 🠊 .spv, .dbs;
5.4. How will you ensure the use of a unique and permanently assigned identifier (such as a digital object identifier DOI) for each dataset?

A unique digital identifier is an identifier that is permanently assigned to a document or project dataset and that makes it possible to locate the document/dataset along with its associated metadata, track its citations and reuse it.

Unique digital identifiers are assigned to research datasets in research data repositories.

  • It should be indicated whether the shared research data will be permanently assigned a unique digital identifier, such as DOI (it is recommended to check in advance in a chosen open repository)
  • If other permanent digital identifiers will be assigned to the data, it should be specified which ones

In the Polish Platform of Medical Research of Wroclaw Medical University repository, a unique digital identifier DOI is assigned to the deposited project data.

6. Tasks related to data management and resources
6.1. Who will be responsible for data management (who will be looking after the data)?

The data steward is a person or an institution responsible for data management over a long period of time: initially during data creation and collection, taking care of data quality and security, processing, creating backups and archiving, and then after the data is transferred to a repository (long-term archiving, sharing).

  • It should be indicated who will be responsible for project data during the project and to what extent – the project manager or other team member or data steward
  • It should be indicated who will be responsible for data management after the project is finished, for example the data steward
  • In the case of partnership projects, the division of responsibilities related to comprehensive research data management should be planned
6.2. What resources will be allocated to data management and ensuring compliance with FAIR principles?
  • It should be indicated whether resources (people, time, hardware or software) and funds will have to be secured for data management during the project and for long-term archiving of project data after its completion
  • The estimated cost and the source of financing should be indicated, for example: 2% of indirect costs allocated for this purpose in the project