The EURISCO-EVA Information System, an innovative approach to the data management of multi-site crop evaluation data


Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, 06466, Germany
European Cooperative Programme for Plant Genetic Resources (ECPGR), Rome, 00153, Italy

Abstract

This paper introduces EURISCO-EVA, an extension of the European Search Catalogue for Plant Genetic Resources (EURISCO), which is hosted and maintained by the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben on behalf of the European Cooperative Programme for Plant Genetic Resources (ECPGR). This information system facilitates standardized data collection, sharing and analysis for the characterization and evaluation of plant genetic resources for food and agriculture (PGRFA). Within the European Evaluation Network (EVA), a set of public–private partnerships aimed at evaluating crop accessions conserved in European genebanks, EURISCO-EVA provides a standardized data repository for multi-site evaluations of different crops. By centralizing metadata maintenance, EURISCO-EVA ensures uniform trait definitions, experimental designs and passport data, promoting the efficient exchange of observed phenotypic data. EURISCO-EVA currently stores more than half a million phenotypic data points for 4,845 PGRFA accessions from 6 genera and 17 species, collected in 382 phenotypic experiments conducted at 115 experimental locations across 33 countries and involving 89 project partners. The platform offers a user-friendly web interface with features such as map-based filtering of trial locations, statistical overviews and customizable reports. Its administrative functionalities, coupled with standardization efforts, enhance data quality and harmonization, yielding a robust and scalable system for storing and accessing crop evaluation data that could be further enhanced with analysis modules. EURISCO-EVA also formed the basis for the data management of two research projects (AGENT and INCREASE) funded under the European Union Horizon 2020 programme, providing the background organization of complex datasets used to address future challenges in European agriculture.

Keywords

Crop evaluation, Genebank, Information system, Plant Genetic Resources, Metadata

Introduction

In the coming decades, a growing population, climate change and the need to protect ecosystems will create new challenges for agriculture and global food security. To address these challenges, sustainable farming and increased crop production are required. Developing crop varieties with resilient traits like disease resistance, drought and heat tolerance will be crucial to achieving these goals (McCouch et al., 2013; Pixley, Cairns, Lopez-Ridaura, & Ojiewo, 2023). This requires access to a diverse pool of plant genetic resources for food and agriculture (PGRFA) conserved ex situ by genebanks to identify and incorporate valuable traits into new crop varieties (King et al., 2024; Sanchez et al., 2023).

The accessibility of these PGRFA accessions and their related passport, characterization and evaluation data depends closely on the existence and regular updating of information systems. This involves gathering data for accessions from many germplasm collections and projects into centralized sources, facilitating a smooth flow of germplasm material and data among institutions. Therefore, developing an information system that collects data from various sources in a standardized format and creates searchable datasets on genetic resources is key to PGRFA access and sustainable use (ECPGR, GenRes Bridge Project Consortium, ERFP & EUFORGEN, 2021; Guzzon & Ardenghi, 2018; Khoury, Laliberté, & Guarino, 2010). A standardized phenotypic data platform is essential for interoperability: it enables the integration and comparison of data from diverse sources and can also feed international information systems on plant genetic resources (PGR) (Weise, Lohwasser, & Oppermann, 2020). A standardized information system promotes reproducibility and validation of research findings, fostering transparency and collaboration among researchers. Standardization streamlines data integration and analysis, improves efficiency, reduces duplication of effort and resources, and supports the implementation of the FAIR (Findable, Accessible, Interoperable, Reusable) data principles (Papoutsoglou, Athanasiadis, Visser, & Finkers, 2023; Wilkinson, Dumontier, & Aalbersberg, 2016). Data standardization also ensures data quality and integrity by defining clear guidelines for collection, storage and validation. In this framework, MIAPPE (Minimum Information About a Plant Phenotyping Experiment) provides a community data standard for the plant phenotyping domain (Krajewski et al., 2015; Papoutsoglou et al., 2020).

To exploit the genetic wealth of PGR conserved ex situ in genebanks, multi-site pre-breeding characterization and evaluation of PGRFA are fundamental: they inform and speed up the complex process of crossing, selecting and testing the plant material needed to produce a new elite cultivar, and they prepare material ready to be incorporated into crop breeding programmes (Cockel, Guzzon, Gianella, & Müller, 2022). In Europe, the European Evaluation Network (EVA) for PGRFA (http://www.ecpgr.org/eva/), coordinated by the European Cooperative Programme for Plant Genetic Resources (ECPGR), is an international initiative aimed at increasing the use of crop genetic diversity as well as the diversity of stakeholders in plant breeding. Jointly with partners from both the public and private sectors, EVA is producing standardized evaluation data for crop cultivars and landraces held in European genebanks. These data include both phenotypic characteristics and genotypic information, which can be used to identify suitable breeding materials and genetic markers for relevant traits. EVA operates through crop-specific networks, including cereals and vegetables. EVA was initially established with five crop networks (carrot, lettuce, maize, pepper, and wheat and barley); in 2024, ECPGR launched a new EVA network on legumes covering seven crop groups (chickpea, common bean, faba bean, lentil, lupin, pea and orphan legumes), thus vastly expanding the project partnership. EVA provides an opportunity to promote the sustainable use of PGRFA, thereby facilitating the adaptation of European agriculture to climate change and contributing towards the related Sustainable Development Goals.

In this paper we describe the EURISCO-EVA Information System, which was developed as an extension of the European Search Catalogue for Plant Genetic Resources (EURISCO; see Kotni, Hintum, Maggioni, Oppermann, and Weise (2023); Weise, Oppermann, Maggioni, Hintum, and Knüpffer (2017)) and is a service provided by ECPGR to the PGRFA user community. The system provides partners with a central data repository and allows the collection of standardized phenotypic data in the framework of EVA. It features filter and display options and can facilitate the publication of datasets after the project embargo through integration with EURISCO. EURISCO-EVA served as a blueprint for data management infrastructures developed in other European projects like AGENT (https://agent-project.eu/) and INCREASE (https://www.pulsesincrease.eu/). By adopting a common model and protocols, these projects standardize and thereby facilitate the exchange of data and information among different databases and systems. This interoperability enhances collaboration and coordination among various stakeholders involved in PGRFA conservation and breeding efforts across Europe and beyond.

The EURISCO-EVA Information System

Database content

The EURISCO-EVA database currently stores the data of five crop networks: carrot, lettuce, pepper, maize, and wheat and barley. The wheat and barley network covers three crops: barley (Hordeum vulgare L.), durum wheat (Triticum turgidum L. subsp. durum (Desf.) Husn.) and common wheat (Triticum aestivum L.). In the lettuce network, data on wild prickly lettuce (Lactuca serriola L.) are stored together with data on cultivated lettuce (Lactuca sativa L.). In the pepper network, five species are considered: Capsicum annuum L., C. baccatum L., C. chacoense Hunz., C. chinense Jacq. and C. frutescens L. At the time of writing, these five networks thus cover eight crops, 4,845 accessions and 282 phenotypic traits with data. The 89 network partners work in 33 countries, carrying out characterization and evaluation activities at 115 experiment locations. As of June 2024, more than 500,000 phenotypic data points had been collected in 382 phenotypic trials, and this number is continuously growing. Table 1 provides an overview of the data of the different crop networks, and Figure 1 summarizes the data points, phenotypic trials, traits and evaluated accessions by country as of June 2024.

Table 1: Summary overview of the data available in EURISCO-EVA for the current five EVA networks (as of 11 June 2024). Countries of operation are those where trials are performed; experiment locations refer to locations within countries.

|  | All networks | Carrot | Lettuce | Maize | Pepper | Wheat & Barley |
|---|---|---|---|---|---|---|
| Crops | 8 | 1 | 2 | 1 | 1 | 3 |
| Accessions evaluated | 4,845 | 67 | 291 | 861 | 181 | 3,445 |
| Partner institutes | 89 | 14 | 12 | 18 | 15 | 47 |
| Countries of operation | 33 | 8 | 8 | 9 | 13 | 25 |
| Experiment locations | 115 | 14 | 6 | 30 | 10 | 58 |
| Phenotyping experiments evaluated | 382 | 27 | 13 | 63 | 15 | 264 |
| Traits evaluated | 282 | 138 | 21 | 51 | 26 | 46 |
| Phenotypic data points | 510,097 | 88,199 | 10,217 | 90,359 | 19,327 | 301,995 |

Figure 1: Summary of evaluation data and metadata in EURISCO-EVA for all crop networks disaggregated by evaluation countries. a) Total number of evaluation data points obtained by country, b) Total number of phenotypic trials evaluated by country, c) Total number of traits evaluated by country, and d) Total number of accessions evaluated by country (data as of 11 June 2024).

In addition to phenotypic evaluation data, EURISCO-EVA stores relevant metadata such as accession passport data, trait and method definitions, information on phenotypic trials, network partners and genotyping experiments. New metadata records can be created and existing records updated as needed. A new partner can join an existing network; however, they will only have access to the network’s data created on or after their joining date. The passport data of EVA accessions follow the Multi-Crop Passport Descriptors standard (Alercia, Diulgheroff, & Mackay, 2015), complemented by EVA-specific identifiers, e.g. material type (original accession, single-seed descent line, cross or check), EVA ID, male and female parents for crosses, and parent DOIs, where applicable. These project-specific parameters allow categorization of the accessions in the network and can be adjusted centrally by the EVA coordinator. Each trait is defined by a unique trait acronym, a trait name, a detailed description of the trait method, a Crop Ontology term, a trait group, a measurement unit and allowed scores. Allowed trait scores are of two types, metric and rating scores, which define the range of allowed values or the allowed scoring entries, respectively. Traits can be further grouped into categories, such as morphological, agronomic, quality or (a-)biotic stress traits, to facilitate searches and filtering. The trial definition broadly consists of the trial location (ideally as GPS coordinates), the experimental and field design, and meteorological and soil conditions, which are important parameters when comparing data from multiple locations. The evaluation data are stored as trait scores observed for the accessions under defined trial conditions. The system can also host additional data such as accession images and links to genotypic data repositories. For every project partner, the system holds the location of the organization and its trials as well as contact details of the responsible persons.
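To make the distinction between metric and rating scores concrete, the following Python sketch models a trait definition and checks whether an observed score is allowed. The class, its field names and the two example traits are hypothetical illustrations rather than the EURISCO-EVA schema; fields such as the method description, Crop Ontology term and trait group are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Trait:
    """Simplified, hypothetical trait definition (not the actual EURISCO-EVA table layout)."""

    acronym: str                        # unique trait acronym
    name: str                           # trait name
    unit: Optional[str]                 # measurement unit (None for rating traits)
    score_type: str                     # "metric" or "rating"
    min_value: Optional[float] = None   # metric traits: lower bound (None = unbounded)
    max_value: Optional[float] = None   # metric traits: upper bound (None = unbounded)
    allowed_scores: FrozenSet[str] = field(default_factory=frozenset)  # rating traits

    def is_valid(self, value) -> bool:
        """Return True if an observed score is allowed by this trait definition."""
        if self.score_type == "metric":
            try:
                number = float(value)
            except (TypeError, ValueError):
                return False  # non-numeric entry for a metric trait
            low = float("-inf") if self.min_value is None else self.min_value
            high = float("inf") if self.max_value is None else self.max_value
            return low <= number <= high
        if self.score_type == "rating":
            return str(value) in self.allowed_scores  # must be one of the listed scores
        raise ValueError(f"unknown score type: {self.score_type!r}")


# Hypothetical example traits, for illustration only
plant_height = Trait("PH", "Plant height", "cm", "metric", min_value=20, max_value=250)
rust_score = Trait("YR", "Yellow rust severity", None, "rating",
                   allowed_scores=frozenset({"1", "3", "5", "7", "9"}))
```

With these definitions, `plant_height.is_valid("95.5")` accepts a numeric entry within range, while `rust_score.is_valid(4)` rejects a score that is not on the rating scale.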

Technical requirements, features and technology

EURISCO-EVA was developed by IPK as an extension of EURISCO, which IPK operates in Gatersleben, Germany, on behalf of and under the supervision of ECPGR, to provide the additional functionality required by the EVA networks beyond the existing EURISCO infrastructure. Like EURISCO, the EURISCO-EVA Information System is maintained on behalf of ECPGR. It has been available online to the EVA network partners since 2022, and its management is handled by the ECPGR EVA coordinator.

The primary access point to EURISCO-EVA is its web interface (https://eva.ipk-gatersleben.de/), developed using the Oracle Application Express (APEX) technology, version 21. Partners can use their login credentials to access their network and view their network’s data collected on or after their joining date. There are two entry points: a general homepage and a main page for each crop network. After logging in, users first access the common homepage, which introduces the EVA project, offers user manuals, provides downloadable templates (see Supplemental data 1 and 2) for recording or uploading data, and includes general information about the data and metadata stored in the system.
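The access rule above (partners see only network data created on or after their joining date) can be expressed as a simple filter. This is a minimal sketch under assumed record and membership structures; the real rule is enforced inside the Oracle APEX application, whose implementation may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class Membership:
    """Hypothetical record of a partner joining a crop network."""
    partner: str
    network: str
    joined: date


@dataclass(frozen=True)
class Record:
    """Hypothetical data record with its network and creation date."""
    network: str
    created: date
    value: str


def visible_records(records: Iterable[Record], membership: Membership) -> List[Record]:
    """Keep records of the partner's network created on or after the joining date."""
    return [
        r for r in records
        if r.network == membership.network and r.created >= membership.joined
    ]
```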

Every crop network has its own homepage with the same type of information, accessible only to the network partners. The network homepage has four tabs (Figure 2). The first tab, ‘Trial Locations’, shows a map giving users an overview of the geographic diversity of the experimental trial locations. The data on the map can be filtered and searched using one or more of the available filters, such as crop, year, experiment group, organization, country or trial ID. The second tab, ‘All Data’, shows on the left-hand side a statistical overview of the data available for the crop network and for all crop networks combined (Figure 2). It also contains several cards that act as hyperlinks to reports on accession passport and phenotypic data, partner information, phenotypic trait specifics, and phenotyping and genotyping experiments. The third tab, ‘Available Metadata’, provides numerical summaries of the relevant metadata, including passport data, trial details, trait definitions and network partner details, pre-grouped by important parameters such as accession material type or trait group. The hyperlinks in these statistical reports open a report filtered by the selected parameter. The fourth tab, ‘Available Observed Data’, displays phenotypic data pre-grouped by crop or species, predefined experiment group or year, and by the accessions’ country of origin and the institutes maintaining the material.
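The map filters on the ‘Trial Locations’ tab can be combined freely. As a hedged illustration, the sketch below treats each filter as optional and keeps only the trials that match all active filters; the trial records, coordinates and field names are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Trial = Dict[str, Any]


def filter_trials(trials: Iterable[Trial], **criteria: Any) -> List[Trial]:
    """Return trials matching every supplied criterion.

    Criteria passed as None are treated as "filter not applied", so any
    combination of filters can be switched on or off independently.
    """
    active = {key: value for key, value in criteria.items() if value is not None}
    return [t for t in trials if all(t.get(key) == value for key, value in active.items())]


# Hypothetical trial records; lat/lon are what a map layer would plot
trials = [
    {"trial_id": "T1", "crop": "Barley", "year": 2022, "country": "DEU", "lat": 51.8, "lon": 11.3},
    {"trial_id": "T2", "crop": "Barley", "year": 2023, "country": "ITA", "lat": 45.1, "lon": 9.9},
    {"trial_id": "T3", "crop": "Durum wheat", "year": 2022, "country": "ITA", "lat": 41.5, "lon": 15.5},
]
```

For example, `filter_trials(trials, crop="Barley", year=2022)` returns only T1, whereas calling it without criteria returns every trial.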

Figure 2: EURISCO-EVA crop homepage for the EVA Wheat and Barley network, showing overview statistics of the network on the left and the shortcut cards leading to reports on metadata and phenotypic data. Additional tabs provide access to pre-filtered data and metadata reports (data as of 25 September 2024).

The user interface of this application is designed to offer multiple methods for retrieving and filtering data to suit different needs and preferences. Below is a detailed overview of these methods and how they work:

1. Default search and filter options: The default search feature allows users to quickly find information by entering keywords or criteria into a search bar. For example, if a user is looking for data on ‘Genotype A,’ they can simply type this term into the search bar, and the system will display all relevant records containing the string ‘Genotype A.’ Additionally, users can apply basic filters from a predefined set of options such as date ranges or data categories to narrow down the search results.

2. One-click filters: For more common searches, the interface provides one-click filters. By clicking on them, users can instantly retrieve the data without having to apply a filter manually. For example, to retrieve phenotypic data specifically for wheat, simply click on ‘Wheat’ in the one-click filter menu labelled ‘Grouped by Crop’.

3. Advanced searches with interconnected drop-down filters: The advanced search functionality offers a more detailed and precise approach to data retrieval through a series of interconnected drop-down menus, in which the selection made in one filter dynamically updates the options available in the next. Only options that lead to actual records in the search results are offered. For instance, if ‘Trial 2024’ is selected in one drop-down menu, the subsequent trait options are limited to those associated with ‘Trial 2024’. This dynamic interaction ensures that users are presented only with relevant choices, enabling a more refined and accurate search.

4. Phenotypic data reports: The interface also includes several types of phenotypic data reports to cater to different reporting needs:

  • Default report: Provides a general overview of the data, with each individual data point listed.

  • Overview report: Groups data by trait and trial, offering a broader perspective on the data collected.

  • Customizable pivot report: Users can create customized pivot reports by selecting a crop and up to five specific traits, along with optional parameters. The report displays the selected traits side-by-side, allowing for a comparative view.

5. Report download options: All reports generated through the interface can be easily downloaded in Excel, CSV, HTML or PDF format. This functionality allows users to further analyze or share their data outside the application.
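The interconnected drop-down filters of item 3 can be sketched as follows, assuming each record is a flat dictionary and the drop-downs are evaluated in a fixed order. Each drop-down lists only values that still occur among the records matching the preceding selections, so every offered choice leads to at least one record. Field names and example values are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def available_options(
    records: Sequence[Dict[str, str]],
    field_order: Sequence[str],
    selections: Dict[str, Optional[str]],
) -> Dict[str, List[str]]:
    """Return, for each drop-down in field_order, the values that still lead to records."""
    options: Dict[str, List[str]] = {}
    remaining = list(records)
    for field_name in field_order:
        # Offer only values present among records matching earlier selections
        options[field_name] = sorted({r[field_name] for r in remaining})
        chosen = selections.get(field_name)
        if chosen is not None:
            # Narrow the record set before computing the next drop-down
            remaining = [r for r in remaining if r[field_name] == chosen]
    return options


# Hypothetical observations
observations = [
    {"trial": "Trial 2023", "trait": "Plant height"},
    {"trial": "Trial 2024", "trait": "Heading date"},
    {"trial": "Trial 2024", "trait": "Plant height"},
    {"trial": "Trial 2023", "trait": "Grain yield"},
]
```

Selecting ‘Trial 2024’ with `available_options(observations, ["trial", "trait"], {"trial": "Trial 2024"})` leaves only ‘Heading date’ and ‘Plant height’ in the trait drop-down.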

This combination of search methods and reporting tools provides users with a robust mechanism for data retrieval and analysis, ensuring they can access and manipulate the information in ways that best suit their needs.
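The customizable pivot report (item 4) turns long-format observations, one row per data point, into a wide table with one column per selected trait. The sketch below illustrates that reshaping under assumed field names. It assumes one value per accession, trial and trait; replicated measurements would need to be aggregated first, since a later value would otherwise overwrite an earlier one.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence

MAX_PIVOT_TRAITS = 5  # the interface allows up to five traits per pivot report


def pivot_report(observations: Sequence[Dict[str, Any]], crop: str,
                 traits: Sequence[str]) -> List[Dict[str, Any]]:
    """One row per (accession, trial), one column per selected trait."""
    if not 1 <= len(traits) <= MAX_PIVOT_TRAITS:
        raise ValueError(f"select between 1 and {MAX_PIVOT_TRAITS} traits")
    rows: Dict[tuple, Dict[str, Any]] = defaultdict(dict)
    for obs in observations:
        if obs["crop"] == crop and obs["trait"] in traits:
            rows[(obs["accession"], obs["trial"])][obs["trait"]] = obs["value"]
    # Missing trait values appear as None so the traits line up side by side
    return [
        {"accession": acc, "trial": trial, **{t: values.get(t) for t in traits}}
        for (acc, trial), values in sorted(rows.items())
    ]


# Hypothetical long-format observations
observations = [
    {"crop": "Wheat", "accession": "ACC-1", "trial": "T1", "trait": "PH", "value": 90},
    {"crop": "Wheat", "accession": "ACC-1", "trial": "T1", "trait": "HD", "value": 150},
    {"crop": "Wheat", "accession": "ACC-2", "trial": "T1", "trait": "PH", "value": 105},
    {"crop": "Barley", "accession": "ACC-3", "trial": "T2", "trait": "PH", "value": 80},
]
```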

Figure 3: Examples of data visualization in the EURISCO-EVA user interface. a) Comparison between distribution of two trait scores using box plot, b) Histogram showing the frequency of scores of a specific trait.

The chart functionality of the web interface offers a flexible and interactive way to explore the data (Figure 3). Users can customize charts by selecting data through interconnected drop-down menus, ensuring precise data selection. The charts link to detail pages that provide in-depth views of the selected data, such as the distribution of values, the frequency of observed values across all (or selected) experiments, or comparisons of data collected at different timepoints. The data underlying a chart can also be downloaded, as described above, for further analysis. Users can explore detailed reports presenting the total data points per experiment, giving a comprehensive overview of the dataset. Charts also display statistical measures, including mean, variance, median and mode. Together, these features let users create personalized visualizations and explore the data through detailed information pages and statistical summaries, as shown in Figure 3.
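The summary measures listed above, and the frequencies behind a histogram such as Figure 3b, can be computed with Python’s standard `statistics` and `collections` modules, as in the sketch below. Whether EURISCO-EVA reports the sample or the population variance is not stated; the sketch assumes the sample variance (`statistics.variance`).

```python
import statistics
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def summarize(scores: Sequence[float]) -> Dict[str, float]:
    """Mean, variance, median and mode of one trait's scores."""
    return {
        "n": len(scores),
        "mean": statistics.mean(scores),
        "variance": statistics.variance(scores),  # assumed: sample variance, needs >= 2 values
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),  # Python >= 3.8 returns the first mode if several
    }


def histogram(scores: Sequence[float]) -> List[Tuple[float, int]]:
    """Frequency of each observed score, sorted by score."""
    return sorted(Counter(scores).items())
```

For the hypothetical rating scores `[3, 5, 5, 7, 9]`, the mean is 5.8, the sample variance 5.2, and both median and mode are 5.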

Data standardization and upload

In EVA, considerable effort was put into standardizing data collection. The standardization process involves the central maintenance of metadata, i.e. trait definitions, experimental design and passport data. The partners collecting phenotypic data reference these metadata, and the uniform methods and scales of the observed phenotypic data make the data easy to understand, analyze and exchange among providers and users.

Generic data collection templates that include all important information for collecting multilocation trial data were developed to ease data upload. To reduce the load on the background programmes that read and validate data, and to speed up processing, a data collection template consisting of two Excel files was created; it is available in the documents section of the EVA website (see https://www.ecpgr.org/eva/documents-and-links/evaluation-protocols-and-templates; the current version, v1, is included as Supplemental data 1 and 2). One template (Supplemental data 2) is used by data providers to collect and update observed evaluation data with minimal trial details. The second template (Supplemental data 1) is used to create and update metadata, including trial details, trait definitions, accession passport data and partner details. This template is completed centrally by the EVA coordinator with input from partners, which makes the overall data collection simpler, faster and less error-prone. The EURISCO-EVA data templates aim to comply with the MIAPPE standard as closely as possible while remaining simple for diverse stakeholders to use. Moreover, most of the accessions included in EVA are already documented in EURISCO. To keep the passport data in the two systems consistent, a process was implemented that automatically synchronizes passport data from EURISCO to the EURISCO-EVA Information System.
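The passport synchronization from EURISCO can be pictured as a one-way merge keyed on the Multi-Crop Passport Descriptors INSTCODE (holding institute code) and ACCENUMB (accession number). This sketch assumes that these two fields identify an accession in both systems; the production process runs in the EURISCO and EURISCO-EVA back end and may differ. EVA-specific fields, such as the hypothetical `EVA_ID` below, are left untouched.

```python
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (INSTCODE, ACCENUMB)


def sync_passport(eva_accessions: Iterable[dict],
                  eurisco_accessions: Iterable[dict]) -> Tuple[List[dict], int]:
    """One-way sync: EURISCO passport values overwrite those of matching EVA accessions.

    Returns the synchronized EVA records and the number of records that changed.
    """
    index: Dict[Key, dict] = {(a["INSTCODE"], a["ACCENUMB"]): a for a in eurisco_accessions}
    synced, updated = [], 0
    for acc in eva_accessions:
        source = index.get((acc["INSTCODE"], acc["ACCENUMB"]))
        if source is None:
            synced.append(dict(acc))  # not documented in EURISCO: keep as is
            continue
        merged = {**acc, **source}  # EURISCO wins on shared fields; EVA-only fields survive
        if merged != acc:
            updated += 1
        synced.append(merged)
    return synced, updated
```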

The overall phenotypic data upload and management process consists of three parts (Figure 4): 1) data file upload: the website features a streamlined four-step file uploader that allows users to upload phenotypic data using the generic data collection template described above; 2) data processing: the background import programme starts upon file upload, reads the data from the file, validates them and writes them into the designated database tables; 3) data presentation: the data are retrieved from these tables and used on the front end to generate reports and charts in the web application (Figure 2; Figure 3).
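The three stages can be illustrated with a small, self-contained stand-in: CSV text replaces the Excel template (stage 1 is simply receiving that text), SQLite replaces the Oracle tables, and Python replaces the PL/SQL import programme. The table layout and column names are assumptions. The design point shown is that validation runs before anything is written, so a faulty file leaves the database unchanged.

```python
import csv
import io
import sqlite3
from typing import List, Set, Tuple


def init_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the designated database tables (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE observation (trial TEXT, accession TEXT, trait TEXT, value REAL)")
    return conn


def process_upload(conn: sqlite3.Connection, csv_text: str, allowed_traits: Set[str]) -> List[str]:
    """Stage 2: read the uploaded file, validate every row, then write all rows or none."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    errors = []
    for line_no, row in enumerate(rows, start=2):  # line 1 holds the column headers
        if row["trait"] not in allowed_traits:
            errors.append(f"line {line_no}: unknown trait {row['trait']!r}")
        try:
            float(row["value"])
        except (TypeError, ValueError):
            errors.append(f"line {line_no}: value {row['value']!r} is not numeric")
    if errors:
        return errors  # nothing is written if any row fails validation
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT INTO observation (trial, accession, trait, value) VALUES (?, ?, ?, ?)",
            [(r["trial"], r["accession"], r["trait"], float(r["value"])) for r in rows],
        )
    return []


def trait_means(conn: sqlite3.Connection, trial: str) -> List[Tuple[str, float]]:
    """Stage 3: read stored data back for a report or chart (mean per trait in one trial)."""
    return conn.execute(
        "SELECT trait, AVG(value) FROM observation WHERE trial = ? GROUP BY trait ORDER BY trait",
        (trial,),
    ).fetchall()
```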

Users can only upload data for experiments they are responsible for. If errors are detected during upload and validation, a message is logged and shared with the user, describing the error and explaining how to fix it in the data template. Most errors concern formatting or values outside the allowed range, and the error log enables users to identify and fix such issues easily. Once processing has finished successfully, an email with the processing log is sent to the user, confirming the upload.
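The checks described above (upload permission, allowed ranges, and an error message that explains the fix) might look like the following sketch. The responsibility mapping, trait ranges and message wording are hypothetical; EURISCO-EVA’s actual log format is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple


@dataclass
class UploadError:
    row: int
    column: str
    message: str
    hint: str  # explanation of how to fix the entry in the template

    def __str__(self) -> str:
        return f"Row {self.row}, column '{self.column}': {self.message}. How to fix: {self.hint}"


def check_upload(user: str, rows: Sequence[Dict[str, str]],
                 responsible: Dict[str, Set[str]],
                 ranges: Dict[str, Tuple[float, float]]) -> List[UploadError]:
    """Validate rows; `responsible` maps trial ID -> users allowed to upload (hypothetical)."""
    errors: List[UploadError] = []
    for i, row in enumerate(rows, start=1):
        trial, trait, raw = row["trial"], row["trait"], row["value"]
        if user not in responsible.get(trial, set()):
            errors.append(UploadError(i, "trial", f"no upload permission for trial {trial}",
                                      "upload data only for experiments you are responsible for"))
            continue
        if trait not in ranges:
            errors.append(UploadError(i, "trait", f"unknown trait acronym {trait!r}",
                                      "use a trait acronym defined in the metadata"))
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors.append(UploadError(i, "value", f"{raw!r} is not a number",
                                      "remove units or text from the cell"))
            continue
        low, high = ranges[trait]
        if not low <= value <= high:
            errors.append(UploadError(i, "value", f"{value} is outside {low}-{high}",
                                      f"enter a value between {low} and {high}"))
    return errors
```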

Figure 4: EURISCO-EVA dataflow and capabilities overview.

Apart from the evaluation data, up to five images per accession can be uploaded to illustrate its appearance and specific characteristics. The images are associated with the accession and thus enrich the passport data available in EURISCO-EVA. The image uploader also allows users to map images to a specific trait or trial, where applicable, which makes the images searchable. Moreover, users can upload additional files with further trial information to their experiments. Since the contents of these files are not written into the database, the files can be in various formats, providing, for example, graphical representations of field layouts or initial statistical analyses. These files are included in the trial detail report and are available for download by all users.
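A minimal sketch of the image rules, assuming an in-memory store: at most five images per accession, each optionally mapped to a trait or trial so that it can be found by those keys. Class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MAX_IMAGES_PER_ACCESSION = 5


@dataclass(frozen=True)
class AccessionImage:
    accession: str
    filename: str
    trait: Optional[str] = None  # optional mapping that makes the image searchable by trait
    trial: Optional[str] = None  # optional mapping to a trial


class ImageStore:
    """Hypothetical in-memory store enforcing the five-image limit."""

    def __init__(self) -> None:
        self._by_accession: Dict[str, List[AccessionImage]] = {}

    def add(self, image: AccessionImage) -> None:
        images = self._by_accession.setdefault(image.accession, [])
        if len(images) >= MAX_IMAGES_PER_ACCESSION:
            raise ValueError(f"{image.accession} already has {MAX_IMAGES_PER_ACCESSION} images")
        images.append(image)

    def search(self, trait: Optional[str] = None, trial: Optional[str] = None) -> List[AccessionImage]:
        """Images matching the given trait and/or trial (None = no restriction)."""
        return [
            img for imgs in self._by_accession.values() for img in imgs
            if (trait is None or img.trait == trait) and (trial is None or img.trial == trial)
        ]
```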

Database implementation

The EURISCO-EVA Information System is based on an Oracle relational database management system, version 19c. The foundational database schema comprises 49 tables (see Supplemental data 3), and the business logic was implemented in PL/SQL, primarily to ensure data quality, enhance performance, enable user-specific downloads, support reporting, automate data manipulations and uploads, and run scheduled removal of unnecessary data. The system also provides a set of functionalities for administrators to maintain the website content. It scales readily to new crop networks and is currently being expanded to host the data of the EVA Legumes network, which started operations in 2024, as well as a demo network for the interested public.

EVA’s data model is trial-based and designed for collaborative, multi-environment evaluations that incorporate data from both public and private partners. Unlike accession-based models such as EURISCO, which focus on cataloguing genetic resources, EVA emphasizes evaluation data collected continuously throughout active trials. Moreover, EVA standardizes trait definitions and measurement scales across all experiments, so that data from one partner are fully compatible with, and usable by, all other partners. This consistency greatly improves data comparability and usability, which is valuable for advanced analyses such as comparative studies, large-scale data mining and breeding decision support. The combination of continuous data collection and standardized traits enables EVA to deliver more actionable, high-quality insights for breeding and research than accession-based catalogues alone.
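To illustrate what “trial-based” means in practice, the sketch below defines a heavily simplified schema in SQLite in which every observation is tied to a trial, an accession and a centrally defined trait. It is not the 49-table Oracle schema of Supplemental data 3; all table and column names are assumptions. Storing values as text is one way to accommodate both metric and rating scores in a single column.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, heavily simplified trial-based schema (not the production schema)
SCHEMA = """
CREATE TABLE accession (
    accession_id  INTEGER PRIMARY KEY,
    instcode      TEXT NOT NULL,
    accenumb      TEXT NOT NULL,
    material_type TEXT,
    UNIQUE (instcode, accenumb)
);
CREATE TABLE trait (
    trait_id INTEGER PRIMARY KEY,
    acronym  TEXT NOT NULL UNIQUE,
    unit     TEXT
);
CREATE TABLE trial (
    trial_id  INTEGER PRIMARY KEY,
    network   TEXT NOT NULL,
    location  TEXT NOT NULL,
    year      INTEGER NOT NULL,
    latitude  REAL,
    longitude REAL
);
CREATE TABLE observation (
    trial_id     INTEGER NOT NULL REFERENCES trial(trial_id),
    accession_id INTEGER NOT NULL REFERENCES accession(accession_id),
    trait_id     INTEGER NOT NULL REFERENCES trait(trait_id),
    replicate    INTEGER NOT NULL DEFAULT 1,
    value        TEXT NOT NULL,
    PRIMARY KEY (trial_id, accession_id, trait_id, replicate)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)


def trait_across_trials(conn: sqlite3.Connection, acronym: str) -> List[Tuple]:
    """Compare one standardized trait across all trials (multi-environment view)."""
    return conn.execute(
        """
        SELECT t.location, t.year, a.accenumb, o.value
        FROM observation o
        JOIN trial t ON t.trial_id = o.trial_id
        JOIN accession a ON a.accession_id = o.accession_id
        JOIN trait tr ON tr.trait_id = o.trait_id
        WHERE tr.acronym = ?
        ORDER BY t.year, t.location, a.accenumb
        """,
        (acronym,),
    ).fetchall()
```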

Compared with other information systems developed for managing PGRFA characterization and evaluation data, such as Germinate (Shaw et al., 2017) or Grassroots (Bian, Tyrrell, Olvera, & Davey, 2017), EURISCO-EVA is not open-source software that can be easily installed and applied to new projects. Instead, EURISCO-EVA is operated by ECPGR as a long-term service to the PGRFA community and is closely linked to EURISCO, the central European PGR information system.

Outlook

Since its inception, EURISCO-EVA has facilitated data curation and management and enabled the existing EVA networks to analyze complex datasets, resulting in several publications (Balconi et al., 2024; Goritschnig et al., 2023; Tripodi et al., 2023). Although EURISCO-EVA has been built as a platform with restricted data access, it shares its background system with EURISCO, which makes it straightforward to incorporate phenotypic data into the public database once the data embargo periods end. Public availability of the generated evaluation data is a core value of the EVA networks and will be achieved through EURISCO. However, discussions are still ongoing about which data should be made public (e.g. all raw data vs. experiment means), also considering the quality of data from individual trials.
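The two options under discussion (publishing raw data or experiment means) can be illustrated together with an embargo check. In this hypothetical sketch, each trial has an optional release date, trials without one stay private, and raw scores are averaged per trial, accession and trait; the field names and the exact embargo rule are assumptions.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Optional, Sequence, Tuple


def publishable(observations: Sequence[dict], release_dates: Dict[str, date],
                today: date) -> List[dict]:
    """Observations from trials whose (assumed) release date has been reached."""
    result = []
    for o in observations:
        release: Optional[date] = release_dates.get(o["trial"])
        if release is not None and today >= release:  # no release date: stays private
            result.append(o)
    return result


def experiment_means(observations: Sequence[dict]) -> Dict[Tuple[str, str, str], float]:
    """Aggregate raw scores to one mean per (trial, accession, trait)."""
    groups: Dict[Tuple[str, str, str], List[float]] = defaultdict(list)
    for o in observations:
        groups[(o["trial"], o["accession"], o["trait"])].append(o["value"])
    return {key: mean(values) for key, values in groups.items()}
```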

The EURISCO-EVA Information System offers a wide range of features for both beginners and advanced users among the EVA network partners. Beginners, who may find manual searches and filters challenging, can open reports by applying a predefined filter with a single click. More advanced users can apply searches and filters with several drop-down selection lists to customize their reports. The data collection uploader is self-explanatory, intuitive and easy to debug. Besides common features allowing users to view, filter, visualize, email and download data, the web application provides pivot reporting and data visualization features that dynamically create custom pivot reports or intuitive charts from user input. The system is highly scalable: the administrator can easily configure a new network using the existing import programme and partner template. As mentioned, the EVA Legumes network has recently been added.

In the future, error management during data processing could be made more robust and user-friendly. Integrating a data analysis module would also add immediate value to the EURISCO-EVA Information System. Moreover, an open-access demo network will be configured so that potential new partners can get a feel for EURISCO-EVA’s user interface, background programmes, easy-to-use reports and data visualization features without accessing restricted data. The EURISCO-EVA Information System is an ongoing initiative that integrates data on PGRFA evaluations from multiple partners and locations. Considering that crop characterization and evaluation data are often scattered across various data sources and publications and lack standardization (Ćwiek-Kupczyńska, Altmann, & Arend, 2016), the EURISCO-EVA Information System provides a user-friendly and versatile environment that enhances the interoperability and standardization (through uniform traits and methods) of phenotypic data from the evaluation activities of the EVA initiative.

Conclusions

In this paper, we described the development and implementation of EURISCO-EVA, an information system for PGRFA that supports the management of metadata and experimental phenotypic data for EVA and can provide FAIR public access to the data after an embargo through its interoperability with EURISCO. EURISCO-EVA is maintained by ECPGR and can be upgraded with additional components in the future. EURISCO-EVA provides a gateway to important evaluation data describing genetic resources, adding value to genebank collections and enabling users across the globe to access phenotypic data for genebank accessions through EURISCO.

Acknowledgements

The authors wish to thank all partners of the EVA networks for their feedback on earlier versions of the database and suggestions for improvements and new functionalities. This work was supported by the German Federal Ministry of Food and Agriculture through grant GenRes 2019-2 to ECPGR for the implementation of the EVA networks. The authors are grateful to the two reviewers for their useful comments and suggestions on an earlier version of this manuscript.

Supplemental data

Supplemental data 1: Metadata creation templates

Supplemental data 2: Phenotypic data collection templates for users

Supplemental data 3: Foundational database schema of EURISCO-EVA

Authors contribution

Suman Kumar was responsible for the initial conception and design. Suman Kumar also wrote the initial draft of the manuscript, with Filippo Guzzon contributing by enhancing and refining the content. Sandra Goritschnig and Stephan Weise reviewed and edited the manuscript, providing critical input to ensure its relevance. All authors read and approved the final manuscript.

Conflict of interest statement

The authors declare that they have no conflicts of interest.