validating named locations
You can (and should!) use named location validation to validate any named locations in your ingest, EVEN IF those locations aren’t the “location of record” in the database. The way this will most often come up is when ingesting data from an external lab. The location on the data is the location where the sample was collected, let’s say plotID. But you’re also ingesting the lab’s name, usually laboratoryName. In entryValidationRulesParser, put [NAMED_LOCATION_TYPE(External lab)]. Don’t use an LOV to validate the external lab name.
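As a sketch, the relevant workbook rows might look like the fragment below. The field names and the plot location type value are illustrative; only the [NAMED_LOCATION_TYPE(External lab)] entry comes from the guidance above.

```
fieldName        entryValidationRulesParser
plotID           [NAMED_LOCATION_TYPE(OS Plot - all)]
laboratoryName   [NAMED_LOCATION_TYPE(External lab)]
```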
new rules for sampleGroups
New guidance from Ross on sampleGroups: we need to provide a value in sampleGroups for any field that goes to the sample management system. We had been populating sampleGroups only for sample IDs, fates, and barcodes, which is fine if that's all you're sending to SMS. But if, say, processedDate is going to SMS as well, populate sampleGroups to indicate which sample in the table it's associated with.
Figuring out which fields should go to SMS is an entirely separate problem, of course. Use your best judgement for now, and in the new system, updating a workbook to populate sampleGroups and smsFieldName for a field where they were NA before shouldn't be a huge deal.
Ingest prep checklist and transposed template have been updated with this change.
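A hypothetical workbook fragment under the new rule (all column values are illustrative): sampleGroups now ties every SMS-bound field to its sample, not just the ID/fate/barcode fields.

```
fieldName      sampleGroups  smsFieldName
sampleID       1             sampleTag
sampleCode     1             barcode
processedDate  1             processedDate
remarks        NA            NA
```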
Test datasets
ATBD authors: Some wisdom on test datasets
The current plan for testing in the new CI pipeline anticipates your test dataset will be ingested into a CI test DB. This means it must look EXACTLY like the data going in via fulcrum and/or the spreadsheet, and the outputs must look EXACTLY as they would coming out.
To this end, you will need to make the namedLocations in your golden input look like real CI named locations (e.g. UKFS_010.mammalGrid.mam, NOT UKFS_010 [this is also needed to join with the spatial data via API, see below]). You will also need to make your dateTimes look as they would coming in via spreadsheet or via fulcrum (YYYY-MM-DD or YYYY-MM-DDTHH:MM). Please bear this in mind when making test sets and ATBDs going forward.
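As a minimal sketch of writing golden-data dateTimes in those two shapes (Python for illustration, though ATBD code is typically R; the function name is made up):

```python
from datetime import datetime

def format_for_ingest(dt, include_time=False):
    """Render a datetime the way it would arrive via spreadsheet/Fulcrum:
    YYYY-MM-DD, or YYYY-MM-DDTHH:MM when a time component is recorded."""
    if include_time:
        return dt.strftime("%Y-%m-%dT%H:%M")
    return dt.strftime("%Y-%m-%d")

collected = datetime(2016, 10, 3, 14, 30)
print(format_for_ingest(collected))                     # 2016-10-03
print(format_for_ingest(collected, include_time=True))  # 2016-10-03T14:30
```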
You'll also need to make your golden datasets contain spatial data that matches the expected output from CI. The only way to ensure this actually happens is to use the same spatial data. With the release of CI's new API, I wrote some code to pull spatial data directly from the CI servers, which should help with the alignment of test vs real outputs in the algorithm testing phase. It is in devTOS -> atbdLibrary -> get_localityInfo. Given a set of named locations (and yes, they must be real named locations, not plotIDs), it will return the lat/long/elev and a bunch of other stuff CI has stored. Please use this going forward rather than faking and/or taking a snapshot of the spatial data. Remember - to use the new functions you'll need to reinstall the library. Note that geodeticDatum doesn't seem to be available via the API, so you may need to 'fake' that one in your test datasets too. If you want some example code/workflow, look in the rpt ATBD; it's working there.
publication "usage" update
Small update to the publication workbook: we’ve dropped “transition” as an option in the “usage” column. It’s redundant with usage=“both” and it’s not terribly useful. “both” and “publication” are the options for usage, and should be applied at the table level, i.e. usage shouldn’t have different values for different fields within a table.
How to update your existing ATBD in 8 ..or maybe 17...easy steps.
If you are starting an ATBD anew, just pull and clone the ATBD library and ignore this.  If you started an ATBD and have noticed changes in the template, here’s what you need to do to be Agile compliant without starting over.  Mostly it’s careful copy and pasting.  
1. Replace your logo.png with the new one, which is at ~devTOS\atbdLibrary\inst\rmarkdown\templates\atbdTemplate\skeleton\logo.png -> the new logo won’t have the trademark in it, if you want to check that you did it right
2. Replace your first section (starting with – fontsize: 11pt THROUGH word_document: default –) with the corresponding section in: ~devTOS\atbdLibrary\inst\rmarkdown\templates\atbdTemplate\skeleton\skeleton.Rmd
3. Replace your section (starting with [//]: TEMPLATE SECTION 1 THROUGH [//]: TEMPLATE SECTION 2) with the corresponding section in: ~devTOS\atbdLibrary\inst\rmarkdown\templates\atbdTemplate\skeleton\skeleton.Rmd
4. Search for ‘Remove the next three lines for ATBDs’, and delete the next 3 lines
5. Replace your section (starting with ## PURPOSE THROUGH ## SCOPE) with the corresponding section in: ~devTOS\atbdLibrary\inst\rmarkdown\templates\atbdTemplate\skeleton\skeleton.Rmd
6. Delete the variable reported table
7. Add to the variables reported section a final sentence: Some variables described in this document may be for NEON internal use only and will not appear in downloaded data. These are indicated with **downloadPkg** = "none" in `r pubName` (`r ADlist["pub","ref"]`). You may need to adjust the reference in the above sentence to whatever reference your pubwb tables use, depending on how you set up that reference.
8. Copy in the new data constraints and validation sections (copy and paste from the skeleton.Rmd to replace your existing text - but look before you paste over, in case you want to retain any notes to your fulcrum buddy in RED). You should now have sentences about ## User Interface Specifications: all forms, not things about webUIs vs MDRs. Your new section should end with ‘1. All date fields can be entered as dates or dateTimes; the parser will interpret whether time is included based on the formatting.’
9. (updated 10/5/2016) smsOnly fields can occur in your example data. You may want to remove them when simulating the parser steps (e.g. before you start implementing your algorithm), since these fields will be ignored in any de-duping, etc., as they will not be available in PDR.
More steps added 10/3/2016
10. Adjust your code so it writes out the namedLocation in the L1 goldenData
11. Reformat dateTime fields as necessary to match the preferred CI formatting
12. Make sure your namedLocations are REAL ones that exist, and that you are using the API to populate things looked up from the spatial data table.
13. Make sure your Equals:type samples [EXIST], if specified by the workflow
14. Samples -> make sure you are passing both the barcode and the id (but not the fate)
15. For any calculations/logic done on sampleIDs, paste the example syntax from the skeleton into your algorithm implementation (’In every instance in the algorithm in which a sample tag (generally corresponding to a fieldName of the form xxxSampleID) is used to look up data records, the lookup should be first attempted via the sample barcode. If the sample barcode is not populated, proceed using the sample tag.’)
16. Add text (’Populate the location description values…’) and code from the skeleton to populate the publication location fields (domainID, plotID, locationID, etc.). Copy and paste the sentence from the template that begins ‘The named location for each’
17. Make sure your de-duping says whether to treat NULL values as different, or resolve, and that the code and language match.
Updates 10/10/2016
18. Delete the section on sample creation rules, which formerly started with ‘## Sample creation rules’
19. It is not necessary to include a list of fields that are NOT passed from L0 to L1 (though if you have one you can keep it; it can be hard to keep up to date)
20. Add transitionID to the golden L0 and L1
21. Make sure column headers on golden_in match entryLabelIfDifferentFromFieldName
22. Specify whether you want fields that are NOT passed L0-> L1 in the dedupe check
23. Put your testing files in CI_files subdirectory and name them correctly and clean out any extra bonus files on there so there’s no confusion
24. If you have taxon fuzzing, copy in the new syntax with namedLocation instead of dXX, and where the redaction is folded in. If you are copying and pasting from the template, the sentence starts with: For each record *p* of `r pTable["id"]` where **targetTaxaPresent** is ‘Yes’
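The barcode-first lookup described in step 15 can be sketched like this (Python for illustration - ATBD implementations are in R, and the field names here are hypothetical):

```python
def find_sample_record(records, sample_tag, barcode=None):
    """Look up a sample record barcode-first: try the sample barcode,
    then fall back to the sample tag (xxxSampleID) when no barcode
    was supplied. Field names are illustrative, not the actual L0 schema."""
    if barcode:
        for rec in records:
            if rec.get("sampleBarcode") == barcode:
                return rec
    for rec in records:
        if rec.get("sampleID") == sample_tag:
            return rec
    return None

records = [
    {"sampleID": "ABC.20161003", "sampleBarcode": "A00000123", "mass": 1.2},
    {"sampleID": "DEF.20161003", "sampleBarcode": None, "mass": 3.4},
]
print(find_sample_record(records, "DEF.20161003")["mass"])               # 3.4
print(find_sample_record(records, "ABC.20161003", "A00000123")["mass"])  # 1.2
```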
UID for Fulcrum tables
Tables ingested via Fulcrum should include a uid field, contra earlier guidance. Its parserToCreate should read [CREATE_UID], just like in tables ingested by spreadsheet.
placeholder quality flag
EVERY ingest table needs to contain a placeholder quality flag field, called dataQF. We'll use this field as a catchall for new/uncategorized external lab quality flags, manual flagging via SOM, and placeholding for uncertainty in what will be available from external labs.
dataQF should be passed through to L1, but with downloadPkg=none.
named location validation
New update to ingest workbook template - see mosquito example:
All named locations must have a [NAMEDLOCATIONTYPE()] validation, even if they are populated via a DERIVE_FROM_SAMPLE_TREE() in the parserToCreate field. For mosquitoes, this means the data associated with field samples gets a location validation of [NAMEDLOCATIONTYPE(OS Plot - mos)], and the data associated with sample mixtures gets a location validation of [NAMEDLOCATIONTYPE(SITE)], because plots are mixed within sites and so the smallest common location for the mixtures is the site. This extra validation means the parser will reject data if the lab attempts to send back data after mixing samples across sites - because then the smallest common location would be domain or realm, neither of which is a valid location type for these data.
If you don't have any sample mixtures, the named location type of the derived location will be the same as the type on the original sample. If you do have sample mixtures, be careful to include all possible location types that will result.
Cleaning up text formatting - will be done on ingest, no longer necessary to include in your ATBD
Hi All,
In the wonderful new world of The Parser, Team Parser has agreed to take on the stripping of empty whitespaces, conversion of double quotes to single quotes, etc during the INGEST process.
What this means for you:
1. Use the new ATBD template so it's documented
2. Remember to include the [ASCII] function in the form or parser validation on free text entry string fields to ensure that your data ENTERs the system without nonascii characters (if desired)
3. DELETE (if you had included it) the algorithm in the ATBD to remove special characters.  
We want the ATBD coding to be as efficient as humanly possible, so no need to put this in two places!
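As an illustrative check of what such a validation would catch (a Python sketch, not the actual parser's [ASCII] function):

```python
def non_ascii_chars(value):
    """Return the characters in a free-text entry that fall outside ASCII,
    mimicking what an [ASCII] entry validation would reject."""
    return [ch for ch in value if ord(ch) > 127]

assert non_ascii_chars("plain text, no problem") == []
print(non_ascii_chars("curly “quotes” sneak in"))  # ['“', '”']
```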
De-duping when one of the keys is not a required field
For all de-duping algorithms where one of the ‘keys’ is not a REQUIRED field, make explicit how CI should deal with null values. Generally we would want to either flag with -1 (not checked) OR treat NULL as one of the unique values, equivalent to any other value.
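A minimal Python sketch of the two options (field names and the duplicateQF flag name are illustrative, and real ATBDs may resolve rather than drop duplicates):

```python
def dedupe(records, keys, null_policy="distinct"):
    """De-dupe on `keys`. null_policy='distinct': NULL (None) is just another
    value, so two records that are both NULL in a key can collide.
    null_policy='flag': any record with a NULL key is skipped by the check
    and flagged -1 ('could not be checked') instead of being compared."""
    seen, out = set(), []
    for rec in records:
        if null_policy == "flag" and any(rec.get(k) is None for k in keys):
            out.append({**rec, "duplicateQF": -1})  # not checked
            continue
        key = tuple(rec.get(k) for k in keys)
        if key not in seen:
            seen.add(key)
            out.append({**rec, "duplicateQF": 0})
    return out

recs = [{"plotID": "UKFS_010", "tag": None},
        {"plotID": "UKFS_010", "tag": None}]
print(len(dedupe(recs, ["plotID", "tag"])))          # 1: NULLs collide
print(len(dedupe(recs, ["plotID", "tag"], "flag")))  # 2: both flagged -1
```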
noDataOutcomeUI vs noDataOutcomePDA - can no longer differ
With the move to fulcrum, we can’t make different rules for PDAs vs webUIs. Thus for your purposes, there should be only one set of rules for NULL behavior. Recommend (as always) to set namedLocation, dates (both required by CI), and very critical elements of the data (critical meaning you’d rather have them just throw away the record than submit a record without those data) to noDataOutcome = fail, and everything else = warn. Anything they might forget to note, but the data are still useful without, should result in a warn.
What's required in each table?
For indexing by CI:
Each table MUST have a date (or 2 dates, e.g. start/end, set/collect)
Each table MUST have a namedLocation (something that exists in the TOS spatial data/aquatic LHDD).
If you have samples, these values can be looked up by CI on ingest, but placeholders for them must be in the DIWB.
For synergy with other NEON data products, it is recommended to have:
personnel fields, including whatever is relevant of: collectedBy, recordedBy, enteredBy, measuredBy.
These are useful for traceability on observer error and tracking down problems.  But they are not *required* by CI.  So if your external lab isn’t returning it, we’ll live with it.
For NEON technicians, these names are anonymized using the proper code in the pubwb. If your external lab is fine with you publishing their real names, feel free to leave them as-is.
Fixes to readme_template
The readme_template in how-to-make-a-data-product -> Publication Workbook OS has been updated to reflect the following changes:
1. Link to documents changed from neoninc to neonscience
2. long dashes (non-utf8 text) replaced with regular dashes
3. Non-utf-8 quotes replaced with regular quotes.
Please use this template going forward. I updated the existing readmes on the portal this AM, and I can manually fix any readmes already made with the old template, if you tell me which ones to fix.
rowNumbers in example datasets
CI wants row numbers in golden datasets to illustrate and track operations performed on records in example tables. This post will be updated when this is added to the atbdLibrary skeleton.