SDIS Help

NOAA Ocean Acidification Program
Scientific Data Information System

Table of Contents
Introduction
Quick Tips
Main Page
New Submission
Column Identification
Manage Data File
Preview Plots
Metadata Entry
Supplemental Documents
Submit To Archive
Clone a Submission
Supported Browsers

A 10-minute video tutorial of the SDIS is available.

Introduction

The Ocean Acidification Program’s Scientific Data Information System (SDIS, the “Dashboard”) is used to upload new and updated datasets and submit them to NCEI for archival.

For certain observation and file types, the tool can run data sanity checks to identify obvious or potential errors, and generate preview plots to visually check the data.

The workflow generally follows the tool buttons down the left-hand side, starting with New Submission down to Submit To Archive. After creating a new submission and uploading a data file, you can identify the data columns, upload a new or updated data file if necessary, generate preview plots, edit or upload dataset metadata, and upload any additional supplemental documents.

In addition, you can clone an existing submission, copying its metadata to start a new submission for another data file that has the same or very similar metadata.

To submit a dataset to NCEI, you must enter at least the minimum required metadata. Without that, you will not be able to submit.

Quick Tips

A few tips and suggestions from the NCEI archive crew to help expedite the archival process.

Main Page: Dataset List

On the main page of the application, you will see all of the datasets that you have uploaded, and their status in the various stages of preparation for submission. If you have not uploaded any datasets, your list will be empty.

Along the left side of the page are the tools that are available, including starting a new submission, identifying data columns, managing the data file, entering metadata, adding supplemental documents, previewing the data, and submitting the dataset to the archive. Only those tools that are available for the currently selected dataset will be enabled. To select a dataset, click on the checkbox in the first column.


Uploading Data Files

On the upload page, you are first asked to specify the observation type, because different observation types are handled differently. Currently, data checking and preview plots are available only for time series, surface measurements, and profiles.

You may upload data files in any format, but only Excel spreadsheets[1] and delimited text files (e.g., CSV or tab-delimited) in a recognized layout can be checked for errors and previewed. Readable files have the following general format:

The first line with at least five successive columns of non-numeric values is assumed to be the column-header line. All subsequent non-comment lines must have the same number of columns as the header line. Lines before the column headers are ignored, provided they do not contain multiple columns. Any line that begins with the ‘#’ character is treated as a comment and is also ignored.

Common errors seen so far include extra commas at the end of lines, which give rows unequal numbers of columns; whole lines enclosed in quotes; and introductory rows before the column headers that have five or more columns.

The following is the start of an acceptable CSV file with units included in the column-header names. The header and data lines have been truncated to make the format easier to see.


Expocode: 42AZ20020427 
Ship: Lollypop 
PI: Waters, R. 
JD_GMT, DATE_UTC_ddmmyyyy, TIME_UTC_hh:mm:ss, LAT_dec_degree, LONG_dec_degree, ... 
110.79219, 19042012, 19:00:45, 12.638, -59.239, ... 
110.79391, 19042012, 19:03:14, 12.633, -59.233, ... 
110.79564, 19042012, 19:05:43, 12.628, -59.228, ... 
110.79736, 19042012, 19:08:12, 12.622, -59.222, ... 

The following is the start of another acceptable CSV file which has units as a second header line and uses a slightly different format for the lines of metadata.


Season = Winter 
Vessel Name = Lollypop 
Investigator = Waters, R. 
JD_GMT, DATE_UTC, TIME_UTC, LAT, LONG, ... 
  Jan1=1, ddmmyyyy, hh:mm:ss, dec_deg., dec_deg., ... 
110.79219, 19042012, 19:00:45, 12.638, -59.239, ... 
110.79391, 19042012, 19:03:14, 12.633, -59.233, ... 
110.79564, 19042012, 19:05:43, 12.628, -59.228, ... 
110.79736, 19042012, 19:08:12, 12.622, -59.222, ... 

1. Currently, only single-sheet spreadsheets are supported.


Identifying Data Columns

After you upload your data file, provided the observation type is one that can be checked and the system can read the file, you can run the data checks on your dataset. To do this, you must first identify the variable type of each column using the DataType Selector. When you upload a data file, the Dashboard examines it and attempts to identify the column header names and units based on your previous selections. However, the first time you upload a file, you will have to identify all the columns yourself.

Note that this is an optional step. It is not required for data submission, and it is not available for all types of data.

The data columns and their suggested identities (their “data type”) will be shown to you in the “Identify Columns” page to allow you to modify these identities as needed. Non-standard missing values for any data columns can also be specified.

Columns of data that are not recognized within the application at this time can be marked as the “other” data type. This will prevent warnings about columns with unknown data types. Any data column marked as the “other” data type will remain in the original data file but will be ignored for data checking purposes. If a column is left unspecified, you will get a warning that the data type for the column is unknown. Unknown variables will also be ignored for data checking purposes.

Currently, users cannot add new data column types. In addition, only one column may be assigned to any given type; for example, you cannot have two columns of the same variable in different units.
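The column-identification rules described above can be sketched in Python: a column typed “other” is skipped silently, an unspecified column produces a warning and is also skipped, a column may declare its own non-standard missing value, and each type may be used by only one column. The `Column` class, the type labels, and the function names are hypothetical and do not reflect the Dashboard's internal names; treating empty cells as missing is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    header: str
    data_type: Optional[str]             # None means the type was left unspecified
    missing_value: Optional[str] = None  # non-standard missing value, if any


def review_columns(columns: list[Column]) -> tuple[list[Column], list[str], list[str]]:
    """Return (columns to check, warnings, errors) under the rules above."""
    to_check, warnings, errors = [], [], []
    seen: dict[str, str] = {}  # data type -> header of the column that claimed it
    for col in columns:
        if col.data_type is None:
            # Unspecified: warn, then ignore for data checking.
            warnings.append(f"unknown data type for column '{col.header}'")
            continue
        if col.data_type == "other":
            # Kept in the original file but skipped by the checks, with no warning.
            continue
        if col.data_type in seen:
            # Only one column may have any given type.
            errors.append(
                f"'{col.header}' and '{seen[col.data_type]}' share type '{col.data_type}'"
            )
            continue
        seen[col.data_type] = col.header
        to_check.append(col)
    return to_check, warnings, errors


def is_missing(value: str, col: Column) -> bool:
    """Treat empty cells (assumed) and the column's declared missing value as missing."""
    text = value.strip()
    return text == "" or (col.missing_value is not None and text == col.missing_value)
```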

Once all columns are identified to your satisfaction, you can opt to check your data for errors using the “Check Data” button on the “Identify Columns” page.

This will help verify that the columns are correctly identified. You will be notified of unreasonable values in the data, either because of incorrect data type, incorrect data units, or non-standard missing-data values. If errors are due to incorrect identification of the data columns, you can return to the “Identify Columns” page to correct the data column types and formats and recheck the data without having to upload the data file again.

Note that every observational data sample must have a valid longitude, latitude, date, and time of measurement. Furthermore, the data samples must be in ascending time order, from the beginning of the cruise to the end, with no duplicate time values. Datasets with longitude, latitude, or time errors are marked as such in your list of datasets, and little can be done with them other than reviewing the problems the Dashboard found. To continue working with such a dataset, correct the errors in your data file and upload the updated file to the Dashboard.
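The position and time requirements above amount to a simple per-sample check. This Python sketch shows one way such a check could look; it is not the Dashboard's code. The accepted longitude range of -180 to 360 degrees (allowing either the ±180 or the 0 to 360 convention) is an assumption, as is the `check_samples` name.

```python
from datetime import datetime
from typing import Optional


def check_samples(samples: list[tuple[float, float, Optional[datetime]]]) -> list[str]:
    """Each sample is (longitude, latitude, timestamp); return a list of problems."""
    problems = []
    previous: Optional[datetime] = None
    for row, (lon, lat, when) in enumerate(samples, start=1):
        # Range comparisons are False for NaN, so NaN positions are also flagged.
        if not -180.0 <= lon <= 360.0:  # assumed accepted longitude range
            problems.append(f"row {row}: invalid longitude {lon}")
        if not -90.0 <= lat <= 90.0:
            problems.append(f"row {row}: invalid latitude {lat}")
        if when is None:
            problems.append(f"row {row}: missing date/time")
            continue
        # Times must strictly increase: equal means duplicate, smaller means out of order.
        if previous is not None:
            if when == previous:
                problems.append(f"row {row}: duplicate time {when}")
            elif when < previous:
                problems.append(f"row {row}: time out of order {when}")
        previous = when
    return problems
```

For example, the first rows of the sample files earlier on this page (19 April 2012, 19:00:45 onward) pass this check, while repeating the 19:05:43 timestamp would be reported as a duplicate.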

The Dashboard has an extensive list of common column header names and units and their column types. When you identify columns that were not recognized, that were incorrectly identified, or that use a non-standard missing value, the Dashboard saves this information as your personal customization. Future uploads of data files will then use your personalized list of column headers, assigning the types you last gave for those headers. So if you use column names that the Dashboard does not recognize, you only have to identify those columns once, and the Dashboard “learns” them.
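The “learning” behavior described above can be thought of as a lookup in which your own saved identifications take precedence over the Dashboard's built-in list, with your most recent choice winning. The following Python sketch illustrates that idea only; the header names, type labels, class name, and the case-insensitive header matching are all hypothetical assumptions.

```python
from typing import Optional

# Hypothetical built-in list of known header names -> data types.
DEFAULT_TYPES = {
    "lat": "latitude",
    "long": "longitude",
    "date_utc": "date",
}


class HeaderTypes:
    """Remember a user's column identifications and prefer them on later uploads."""

    def __init__(self, defaults: dict[str, str]):
        self.defaults = defaults
        self.personal: dict[str, str] = {}

    @staticmethod
    def _key(header: str) -> str:
        # Assumption: headers are matched case-insensitively, ignoring outer spaces.
        return header.strip().lower()

    def remember(self, header: str, data_type: str) -> None:
        """Save (or overwrite) the type the user assigned to this header."""
        self.personal[self._key(header)] = data_type

    def suggest(self, header: str) -> Optional[str]:
        """Personal choice first, then the built-in list, else None (unknown)."""
        key = self._key(header)
        return self.personal.get(key, self.defaults.get(key))
```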

Common date-format mistakes involve the order of the day, month, and year, and whether the year has two digits or four. For full-date formats without separators between the day, month, and year (such as ddmmyyyy), the date must contain exactly the number of digits the format specifies. For date formats with separators, any standard separator is accepted, not just the one shown.
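The difference between the two kinds of date format can be shown with a small Python sketch. It is illustrative only: the set of “standard” separators (hyphen, slash, period, space), the function names, and accepting one- or two-digit day and month values when separators are present are assumptions, not documented Dashboard behavior.

```python
import re
from datetime import date

# Assumed set of "standard" separators; the Dashboard's real list may differ.
# The backreference \2 requires the same separator in both positions.
SEPARATED = re.compile(r"(\d{1,2})([-/. ])(\d{1,2})\2(\d{4})")


def parse_ddmmyyyy(value: str) -> date:
    """No separators: exactly eight digits, in day-month-year order."""
    if not re.fullmatch(r"\d{8}", value):
        raise ValueError(f"ddmmyyyy needs exactly 8 digits, got {value!r}")
    # date() itself rejects impossible values, such as month 19 when the
    # day and month have been swapped.
    return date(int(value[4:]), int(value[2:4]), int(value[:2]))


def parse_dd_mm_yyyy(value: str) -> date:
    """With separators: one- or two-digit day and month, four-digit year."""
    match = SEPARATED.fullmatch(value)
    if match is None:
        raise ValueError(f"not a day-month-year date with separators: {value!r}")
    day, _sep, month, year = match.groups()
    return date(int(year), int(month), int(day))
```

In this sketch, "19042012" parses as 19 April 2012, while "1942012" (seven digits) and "04192012" (month and day swapped) are rejected.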

Note that the SDIS does not make any changes to your data and you cannot edit your data within the SDIS.

If you need to make changes to your data, edit your copy, and then upload the edited file using the Manage Data File tool.


Data Checker Messages

If the data check found any errors or issues with the data, they are shown on the Data Errors and Warnings page.

The errors and warnings will have a brief description of the problem, as well as the row and column if applicable.

Double-clicking on the error or warning row in the list will take you to the row and column in the data.


Manage Data File

The Manage Data File tool is used to upload a new data file for a submission record. This file will replace the existing file.


Previewing Data

After uploading and checking the data for a dataset, you may be able to examine plots of your data on the “Preview Data” page.

This is an optional step and is only available for certain observation types and files in recognized formats.

These plots let you quickly spot inconsistencies or other errors in the data you provided. If you discover errors that need to be corrected, correct your data file and re-upload the updated dataset. Sometimes the error is simply a misidentified data column, unit, or missing value. In that case, you can correct the column identification in the Dashboard without re-uploading the data file.

Please note that for larger datasets, generating the plots can take a minute or more.


Metadata Entry

Before you submit a dataset for archival, you must provide metadata for the data you have uploaded. This metadata should identify each of the data columns in your data file (including data columns that are marked as “IGNORED” data types) and describe details of how the data were collected. The integrated Metadata Editor assists in this effort.

The Metadata Editor provides editable forms to cover all of the requested OAP dataset metadata corresponding to the NCEI OADS metadata SubmissionForm Excel spreadsheet. If you have a completed Excel SubmissionForm from a prior similar data submission, you can upload that form into the Metadata Editor and then edit only the changed fields. Otherwise, you must fill out the metadata for those fields that are pertinent to your dataset.

The SDIS fills in some fields, such as the Data Submitter information and, if the dataset was checked, the geospatial and temporal bounds. If you are filling out the metadata from scratch, you may find it helpful to fill in the information common to your submissions and then download it for use as a metadata template for future submissions.

Although extensive metadata is requested, and is desirable for better understanding and using the data, only a limited number of fields are absolutely required; without them, you will be unable to submit for archival. The required fields are the Data Submitter name, organization, and contact information; at least one identified PI; and the citation Title, Abstract, and List of Authors. In addition, at least one data variable must be defined, and every defined variable must include at minimum its abbreviation as used in the data file, its full name, and its units if applicable.
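The minimum-metadata rule can be expressed as a simple completeness check. The Python sketch below mirrors the required items listed above; the dictionary keys, the nested structure, and the `has_units` flag (used to mark variables for which units do not apply) are hypothetical and do not correspond to the Metadata Editor's or the OADS SubmissionForm's actual field names.

```python
# Hypothetical field names for the required items listed above.
REQUIRED_SUBMITTER = ("name", "organization", "contact")
REQUIRED_CITATION = ("title", "abstract", "authors")
REQUIRED_VARIABLE = ("abbreviation", "full_name")


def missing_required(meta: dict) -> list[str]:
    """Return the required metadata items that are absent or empty."""
    missing = []
    submitter = meta.get("submitter", {})
    missing += [f"submitter.{k}" for k in REQUIRED_SUBMITTER if not submitter.get(k)]
    if not meta.get("investigators"):
        missing.append("at least one PI")
    citation = meta.get("citation", {})
    missing += [f"citation.{k}" for k in REQUIRED_CITATION if not citation.get(k)]
    variables = meta.get("variables", [])
    if not variables:
        missing.append("at least one variable")
    for i, var in enumerate(variables):
        missing += [f"variables[{i}].{k}" for k in REQUIRED_VARIABLE if not var.get(k)]
        # Units are required only where applicable (e.g., not for a flag column).
        if var.get("has_units", True) and not var.get("units"):
            missing.append(f"variables[{i}].units")
    return missing
```

An empty result would correspond to metadata complete enough to submit; anything else lists what still needs to be filled in.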


Supplemental Documents

Within the SDIS, you can also upload additional content in any format you choose, using the “Supplemental Documents” page. Uploaded files are simply stored and associated with the dataset; their contents are not examined or modified in any way. For this reason, uploaded files should be in standard formats that scientists using or reviewing your data can easily read.


Submit to Archive

After entering or uploading metadata for the dataset, you can submit the data, metadata, and any additional documentation for archival.

If the metadata are incomplete, you will not be able to submit. See the Metadata Entry section for the minimum required metadata.

On the Submit To Archive page, you have the option to add a submission comment. This comment is only for special instructions to the archive staff and should not be used for metadata or other important information about the dataset, because it will not be archived with the dataset. You may also request that a DOI be minted for the dataset. If the automatic data checker found what it considered errors in the data, or the dataset has not been checked, you will be required to acknowledge that you are submitting the dataset in spite of these findings. You will also be required to agree to the NCEI Publication Policy Agreement.
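The conditions on the Submit To Archive page can be summarized as a single gate. This Python sketch restates the rules above under stated assumptions: the `Submission` fields and the `can_submit` name are hypothetical, and the optional comment and DOI request are included only to make clear that they do not affect whether submission is allowed.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    metadata_complete: bool       # minimum required metadata present
    checked: bool                 # the data checker has been run
    checker_found_errors: bool    # the checker reported possible errors
    acknowledged_findings: bool   # user chose to submit despite the findings
    agreed_to_policy: bool        # NCEI Publication Policy Agreement accepted
    request_doi: bool = False     # optional; does not gate submission
    comment: str = ""             # optional note to archive staff; not archived


def can_submit(s: Submission) -> bool:
    """Apply the Submit To Archive rules described above."""
    if not s.metadata_complete or not s.agreed_to_policy:
        return False
    # An unchecked dataset, or one with checker findings, needs acknowledgment.
    needs_ack = (not s.checked) or s.checker_found_errors
    return s.acknowledged_findings or not needs_ack
```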


Clone Submission

Cloning an existing submission record creates a new submission record, without a data file, containing a copy of the metadata from the existing submission.

After cloning a record, you are required to check the metadata to be sure that all the fields are still appropriate for the new data file. If this has not been done, the Submit to Archive page will report that the metadata is incomplete and disallow submission.

Use the Manage Data File tool to upload the data file for the new submission record.


Supported Browsers

This application has been tested on

It is not currently supported on Internet Explorer. We apologize for any inconvenience.