rBatt / trawlData
Collate and clean bottom trawl survey data
@mpinsky @JWMorley @bselden What are some variables that change far more over time than they do over space?
E.g., bottom temperature might change among years, but there's also a fair bit of spatial variability within a region, I'm guessing (!).
I was wondering if ENSO would be a good predictor (or NAO, PDO out west)
We have a few predictors that change over space but not time (lat, lon, depth, rugosity [coming soon]), but not much that varies primarily over time (we can use year as a predictor!).
Just something to think about as a future enhancement.
In 2013, there's a really cold 'stratum' (according to my definition) in SEUS. It's the southernmost stratum, and the region-wide average temperature in 2013 is about 4 degrees cooler than the long-term average.
@JWMorley Have you seen anything like this in your analysis of these data? I haven't dug into the raw data yet.
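For reference, a hedged sketch of how that comparison could be reproduced (clean.sa is the SEUS object, per its use elsewhere in this repo; btemp, stratum, and year follow the package's column names):
# stratum-by-year mean bottom temperature
clean.sa[, .(mean_btemp = mean(btemp, na.rm=TRUE)), by=c("stratum","year")]
# each year's region-wide mean, expressed as an anomaly from the long-term average
clean.sa[, .(ann = mean(btemp, na.rm=TRUE)), by="year"][, .(year, anomaly = ann - mean(ann))]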
One of the slowest steps in my current workflow is aggregation. In particular, aggregating within a species-haul. This is only needed to aggregate among individuals or sexes, or, if taxonomy has changed, to aggregate species previously ID'd as different taxa but which should actually be grouped.
My (informed) guess is that most of the time there is only 1 row of data per species-haulid combination. Thus, it might be a speed-up to only perform the aggregation on subsets with more than 1 row of data.
An example could be to define a new column or external indexing/reference vector like in the following:
# clean.ebs[,nrow(.SD),by=c("haulid","spp")] # IS ACTUALLY SLOW!
multiple_rows <- clean.ebs[,length(wtcpue),by=c("haulid","spp")][,V1 > 1] # snappy :)
Note that it takes a lot longer to count the rows in .SD than to simply take the length of one of the columns. This is probably because the .SD data.table still has to be populated (I think, not sure), and also contains all of the columns (although even clean.ebs[,length(wtcpue),by=c("haulid","spp")][,nrow(.SD),by=c("haulid","spp")] is slow, so the extra columns aren't the issue).
After identifying rows that would form .SD's with more than 1 row, we can probably add an i logical vector at the final aggregation step. But this could be slow, as it would evaluate the i for each combination in by (so it would skip the aggregation function, but would still go through each combination). A faster alternative may be to split the data.table into 2 data.tables: 1 that needs aggregation, and 1 that doesn't; then perform aggregation on the appropriate data.table, and recombine the two.
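A minimal sketch of that split/aggregate/recombine idea, reusing clean.ebs and wtcpue from above (the sum is just a stand-in for whatever aggregation function actually applies):
# flag species-haul combinations with more than 1 row (.N here is fast, like length(wtcpue))
counts <- clean.ebs[, .(n=.N), by=c("haulid","spp")]
multi <- clean.ebs[counts[n > 1], on=c("haulid","spp")]   # rows that need aggregation
single <- clean.ebs[counts[n == 1], on=c("haulid","spp")] # rows that can pass through
agg <- multi[, .(wtcpue=sum(wtcpue), nAgg=.N), by=c("haulid","spp")]
single[, nAgg := 1L]  # as noted below, nAgg still has to be added to the pass-through rows
recombined <- rbindlist(list(agg, single[, .(haulid, spp, wtcpue, nAgg)]), use.names=TRUE)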
Note that this approach might be hard to implement, because it assumes that if there's only 1 row in a .SD, then nothing needs to be modified. For aggregation, this is generally true. However, the nAgg column would still need to be added to the non-aggregated portion (all would have nAgg := 1), and the lu and drop functions alter the value even if there's only 1 row (as might a custom function).
However, something like this could be pretty handy, and checks/warnings could be added alongside an option argument to skip aggregation for .SD's with 1 row. The test could take the form of, say, X[1, j={...}, by=c(byCols)] == X[1], to make sure that nothing changes when functions are applied to only 1 row of data.
It might also be a good idea to add an argument like skip_single_aggs=FALSE.
The key place to make the change would probably be on this line: https://github.com/rBatt/trawlData/blob/development/R/trawlAgg.R#L230
If desired, I wrote code to restrict the SODA bottom temperatures to the 200 m depth contour.
The last two digits of CRUISE6 in raw.neus do NOT appear to encode the correct sampling month.
In the surveys labeled "fall", the month that is calculated using those last two digits implies that sampling occurs every month except January and February.
I checked this against the raw data for cod from the trawl survey available on the OBIS website (http://www.iobis.org/mapper/?dataset=1435), and found that monthcollected for the fall survey only included September-December. This suggests that there is an error in the way that datetime is calculated for neus files in the trawlData package.
Code highlighting the discrepancy between the months sampled in clean.neus and the cod OBIS data file is attached below, as is the cod OBIS data file itself.
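As a rough sketch of the kind of check described (the attached files aren't reproduced here; the SEASON column name and the digit positions are assumptions):
# tabulate the month implied by the last two digits of CRUISE6 for fall cruises,
# then compare against monthcollected in the OBIS cod records
raw.neus[SEASON == "FALL", table(substr(as.character(CRUISE6), 5, 6))]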
@rBatt
#1 For those species with an NA conflict field and a BS-batch flag (from the batch download I did from WoRMS), the spp will show the accepted name. But if this differs from the species it matched in ref, the genus will still be the old genus.
Example:
ref=BARBATIA DOMIGESIS
species that was matched in the database (does not appear in file)=Barbatia domingensis
spp=accepted name=Acar domingensis
species=Acar domingensis
genus=Barbatia
See http://www.marinespecies.org/aphia.php?p=taxdetails&id=582484
Will need to subset the data by the BS-batch flag, create a temporary genus column that is a split of spp, then run something along the lines of ifelse(genus.temp == genus, genus, genus.temp)
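A rough sketch of that fix (the flag column name used to identify the BS-batch rows is hypothetical):
spp.key[flag == "BS-batch", genus.temp := sapply(strsplit(spp, " "), `[`, 1)]  # genus from the accepted spp
spp.key[flag == "BS-batch", genus := ifelse(genus.temp == genus, genus, genus.temp)]
spp.key[, genus.temp := NULL]  # drop the temporary column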
Like what created these: https://github.com/rBatt/trawl/tree/master/Figures/stratTolFigs
That function is already in repo: https://github.com/rBatt/trawlData/blob/development/R/formatStrat.R#L72-L185
Just needs to be dusted off.
Then need to add a function that can use it to help the user decide how to trim the strata.
Function should have defaults, but user can specify. Maybe this should be integrated into clean.trimCols
Looking here: rBatt/trawl@d91331f
Can see that tolerances I had chosen before were c(ai=3, ebs=5, gmex=4, goa=3, neus=5, newf=4, sa=0, sgulf=2, shelf=6, wcann=2, wctri=3)
I think the year can be retrieved from the first four digits of the haulid, unless someone knows differently:
clean.wcann[, year := as.integer(substr(haulid, 1, 4))]
expand.data might be too much of a brute for this; but we'll have to see how unwieldy it gets with a simpler but more memory-hungry approach
aggData(X=mini_data, FUN=mean, bio_lvl="spp",space_lvl="stratum",time_lvl="year", bioCols="wtcpue",envCols=c("stemp","btemp"), metaCols=c("datetime","reg"), meta.action=c("unique1"))
gives
Error in as(NA, class(x)) :
c("no method or default for coercing “logical” to “POSIXct”", "no method or default for coercing “logical” to “POSIXt”")
In addition: Warning messages:
1: In if (class(x) == "integer64") { :
the condition has length > 1 and only the first element will be used
2: In if (is.na(i)) { :
the condition has length > 1 and only the first element will be used
No bottom temperature in the Scotian Shelf in 2011. @mpinsky sorry to keep pinging you on things like this, but any ideas?
pinskylab/OceanAdapt#45 and pinskylab/OceanAdapt#44; related to #6 here.
There are now a few things swirling around related to my confusion on the issue:
- Is the right haul identifier haulid, or is that eventname? Jim says the latter; need to check.
- Verify haulid, and/or make sure that Jim's corrections are working as intended.
- Even though I might be using COLLECTIONNUMBER as haulid instead of EVENTNAME, that still doesn't explain why I'm not getting some rows returned (because I'm still referring to the same columns as Jim; switching the column name won't affect the subsetting).
trimData(clean.ai)
gives:
Error in setcolorder(X, cols4order) : x is not a data.table
In addition: Warning message:
In is.na(c.match) : is.na() applied to non-(list or vector) of type 'NULL'
Most regions need updating.
Can get US updates from the OA repo, generally.
Need new data requests for all of the Canadian data sets.
I tracked this back to NA seasons in this year. In most other years, the region had season defined in some way other than my own getSeason(), so I wrongly assumed it was there for all years.
Sometimes the weight is 0, even though the count is positive.
Could fix with length/weight regressions, particularly for neus (which has length data).
Would need to update spp.key by adding parameter columns, and get the parameters from rfishbase. Then in clean.columns we could add a step that fills in these cases using the regression.
Instead of relying on fishbase, could also just find the average weight per individual, or fit the regression, from the trawl data itself.
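A minimal sketch of the rfishbase route, using the standard length-weight relation weight = a * length^b (the species and the unit handling here are illustrative, not part of the package):
library(rfishbase)
lw <- length_weight("Gadus morhua")  # FishBase length-weight estimates (W in g, L in cm)
a <- mean(lw$a, na.rm=TRUE); b <- mean(lw$b, na.rm=TRUE)
# fill zero weights where a length was recorded; assumes weight is stored in kg
clean.neus[spp=="Gadus morhua" & weight==0 & cnt>0 & !is.na(length), weight := a*length^b/1000]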
# programmatically submit the OceanAdapt download form
library(httr)
POST('http://oceanadapt.rutgers.edu/download/',
     encode = "form",
     body = list('page-action' = 'submit-info',
                 'my-name' = 'Luke+From+R',
                 'my-email' = '[email protected]',
                 'my-institution' = 'inst',
                 'my-purpose' = 'R testing'))
E.g., from clean.wcann:
ref haulid weight Individual.Average.Weight..kg.
1: Abraliopsis felis 200603010029 0.010 NA
2: Abraliopsis felis 200803010013 0.010 NA
3: Abraliopsis felis 201003008060 0.070 NA
4: Acanthephyra curtirostris 200303003074 0.002 NA
5: Acanthephyra curtirostris 200303006160 0.002 NA
---
217510: fish unident. 201403008024 0.010 0.01
217511: fish unident. 201403017132 0.420 0.42
217512: fish unident. 201403020009 0.160 0.16
217513: fish unident. 201403020189 1.250 NA
217514: shark unident. 200303006151 2.300 2.30
Survey year vessel Cruise.Leg Trawl.Performance date
1: Groundfish Slope and Shelf Combination Survey NA Ms. Julie 1 Fisheries Assessment Acceptable 6/2/2006
2: Groundfish Slope and Shelf Combination Survey NA Ms. Julie 1 Fisheries Assessment Acceptable 5/19/2008
3: Groundfish Slope and Shelf Combination Survey NA Excalibur 2 Fisheries Assessment Acceptable 9/7/2010
4: Groundfish Slope and Shelf Combination Survey NA Blue Horizon 3 Fisheries Assessment Acceptable 9/28/2003
5: Groundfish Slope and Shelf Combination Survey NA Captain Jack 5 Fisheries Assessment Acceptable 8/12/2003
---
217510: Groundfish Slope and Shelf Combination Survey NA Excalibur 1 Fisheries Assessment Acceptable 8/28/2014
217511: Groundfish Slope and Shelf Combination Survey NA Noahs Ark 4 Fisheries Assessment Acceptable 7/2/2014
217512: Groundfish Slope and Shelf Combination Survey NA Last Straw 1 Fisheries Assessment Acceptable 5/26/2014
217513: Groundfish Slope and Shelf Combination Survey NA Last Straw 5 Fisheries Assessment Acceptable 7/19/2014
217514: Groundfish Slope and Shelf Combination Survey NA Captain Jack 5 Fisheries Assessment Acceptable 8/10/2003
datetime lat lon Best.Position.Type depth Best.Depth.Type towduration towarea btemp stratum
1: <NA> 47.17305 -124.9230 Gear Track Midpoint 174.1 Bottom Depth 16.90 1.615875 7.46 47.5--150
2: <NA> 47.53331 -125.1849 Gear Track Midpoint 562.5 Bottom Depth 15.80 1.573020 4.71 47.5--150
3: <NA> 43.96620 -124.9836 Gear Track Midpoint 601.4 Bottom Depth 15.68 1.687510 4.75 43.5--150
4: <NA> 41.81069 -124.8967 Gear Track Midpoint 974.6 Bottom Depth 25.10 2.998476 3.57 41.5--150
5: <NA> 32.84721 -117.8039 Gear Track Midpoint 1072.3 Bottom Depth 30.60 2.828254 4.20 32.5--150
---
217510: <NA> 47.23652 -125.0741 Gear Start Haulback 754.1 Bottom Depth 17.68 1.564785 4.22 47.5--150
217511: <NA> 37.01164 -122.6967 Gear Track Midpoint 568.7 Bottom Depth 20.35 1.744080 6.28 37.5--150
217512: <NA> 46.85403 -125.1106 Gear Track Midpoint 689.0 Bottom Depth 24.57 2.202347 4.55 46.5--150
217513: <NA> 33.23764 -117.5581 Gear Track Midpoint 390.1 Bottom Depth 18.00 1.654792 8.30 33.5--150
217514: <NA> 33.41229 -118.1399 Vessel Track Midpoint 640.8 Bottom Depth 23.33 2.485380 6.01 33.5--150
Found for:
Most of the functions are intended to interact with data files. However, they currently expect the data files to be part of the package, and don't typically require that the data object be passed as an argument to the function. This will cause problems when the data files are not installed along with the package itself.
A new scheme might go like this:
d <- get_data_file('ai') # new function not yet implemented
trawlData_operation(d) # any old trawl data function, will now require that the data.table be passed as argument
Need a vignette and documentation that'll respond to ?trawlData
Add this metadata for ETOPO depth (object "depth") in the trawlData package:
From https://www.ngdc.noaa.gov/mgg/global/relief/ETOPO1/docs/ETOPO1.pdf
Table 1: Specifications for ETOPO1.
Versions: Ice Surface, Bedrock
Coverage Area: Global, -180° to 180°; -90° to 90°
Coordinate System: Geographic decimal degrees
Horizontal Datum: World Geodetic System of 1984 (WGS 84)
Vertical Datum: Sea Level
Vertical Units: Meters
Cell Size: 1 arc-minute
Grid Format: Multiple: netCDF, g98, binary float, tiff, xyz
These can probably just be taken from the recent Ocean Adapt update.
> ((X[,unique(depth.min)])[1101:1150])
[1] "207" "275" "240" "101" "222" "59" "118" "111" "127" "0413" "0608" "0631" "0643" "0558" "0551" "310" "190" "217" "407" "315" "271" "317"
[23] "195" "255" "9" "197" "199" "OO12" "0127" "0573" "0568" "0531" "0559" "0607" "0591" "0123" "213" "215" "236" "0398" "0572" "0000" "0565" "0557"
[45] "0465" "0644" "0626" "0634" "0665" "0571"
See any odd ones in there? Maybe a "0012"? Well, there are also "" and "0 16" in there too. Have to handle these better. Occurs for a few columns, and at least for newf.
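A hedged sketch of one way to handle these (treating letter O as zero and stripping internal whitespace are assumptions about what the raw entries mean; anything still non-numeric, e.g. "", becomes NA):
# normalize the messy depth strings, then coerce; failures become NA
X[, depth.min := suppressWarnings(as.numeric(gsub("[[:space:]]", "", gsub("O", "0", depth.min))))]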
Particularly the trawl data sets. URLs where possible. Emails as well.
Found one species, but I don't think the file name is the name for that species in gmex.
Checking on my phone. Need to double-check that we have the right lionfish species and picture file name.
https://github.com/mpinsky/OceanAdapt/tree/testFread/testFread
In contact w/ data.table developers about solution
Given a scientific name, could look up the common name and add it to the plot. It's often cumbersome to provide both.
@JWMorley did a nice job of putting together logic to trim data from SA (or 'seus') for OceanAdapt. Many of those same steps likely need to be taken here, too.
Some of those steps are more a matter of preference than others. Nonetheless, excellent place to start.
This is outdated; see http://www.marinespecies.org/aphia.php?p=taxdetails&id=530071
Not changing now b/c I just finished running the model, but will need to change in spp.key and the picture file name.
I think HadISST is 0.5 deg; I forget the years.
https://github.com/rBatt/trawlData/blob/development/R/clean.trimRow.R
In there I need to add a column that has a key indicating why I am suggesting the row be dropped.
Then I can write a helper function to execute the row trimming, with specific interpretation of the flags. That would be a good place to document what each flag means in each region.
This is a feature that I know @mpinsky needs too. At some point, I might ask for help in describing the reasoning behind dropping some of those rows.
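A rough sketch of what that helper might look like (the function name and flag values are hypothetical; row_flag and keep.row follow the column names listed later in this page):
trim_rows <- function(X, drop_flags=c("zero-effort", "bad-position")) {
    # flag values would be documented per region
    X[, keep.row := !(row_flag %in% drop_flags)]
    X[(keep.row)]
}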
Need to make the package more lightweight by reducing the size of the associated data sets.
One step here is to drop extra columns where possible.
There are 3 basic approaches I'm going to take: (1) drop columns that can be recreated, e.g., date and time arrive separately, and I can drop those 2 after I create datetime; (2) move information that repeats within a haul or cruise into separate tables (see the note at the end of this Issue); and (3) drop some raw columns entirely (e.g., CATCHJOIN or ALTERATIONDESC).
That 3rd category is tricky, because it represents a loss of information relative to what is provided in raw data. That's what I want feedback on in this Issue: which columns from the raw data do I need to keep?
Below I'll make a list of columns, organized under a few categories. I'm open to any feedback as to which columns would be needed; I can add more options if something is suggested that I don't have, but I'll use checking a box as a way of indicating that I intend to keep the column. The goal is to have the package contain only 1 data set per region, and raw data available by download (possibly via a package function). In other words, if a column isn't included here, it won't be easily accessible elsewhere.
Most of the following columns will have the same name in all regions, or there will be a similar equivalent in the regions that have it. If editing this list to add a column that only needs to be included for a particular region (and doesn't need to be included for other regions even if the column exists there), please specify which region.
Time and Location of Sample
reg
year
season
datetime
lon
lat
stratum (the region's definition, not my custom definition)
haulid
Species ID and Characteristics
spp
common
sex
taxLvl
trophicLevel
trophicLevel.se
Additional Method Metadata
station
cruise
vessel
towduration
towarea
gearsize
geartype
comments
survey (e.g., summer groundfish)
Environmental and Sample Data
effort
stratumarea
btemp
stemp
depth
bsalin
ssalin
bdo
sdo
wind
wave
pressure
Biological Measurements
cnt
weight
length
cntcpue
wtcpue
NUMLEN (neus only)
Other
keep.row
row_flag
Many of the columns don't have values that change in every row. In particular, many of the "meta data" columns don't vary within a haul, and the species taxonomy columns don't change at all (across species or regions). Just like we save all the species taxonomy (etc.) information in the spp.key data.table, we could save much of the haul- or cruise-specific information in separate data.tables. In fact, many of the raw data sets arrive in such a format, where environmental, survey, and biological data are separated. While this makes it less convenient to access the data, it means we can provide more information while staying under CRAN size limits. So there is definitely room to compromise.
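A tiny sketch of that normalization, using clean.neus and a handful of the columns above (the exact split is illustrative):
# one row per haul for the haul-level metadata
haul.key <- unique(clean.neus[, .(haulid, vessel, cruise, station, towduration, towarea)])
# the per-observation table keeps only haulid as a key
bio <- clean.neus[, .(haulid, spp, cnt, weight, wtcpue)]
full <- haul.key[bio, on="haulid"]  # re-join on demand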
Suddenly I noticed that a whole bunch of species are missing the trophic level information in spp.key and in the data sets. I have no idea when this happened.
Making these corrections ... I need to write more scripts to check these things. It could even have been my original code that integrated trophicLevel incorrectly.
# re-fill trophicLevel from taxInfo where spp.key has NA (the same pattern repeats below for trophicDiet, trophicOrig, and Picture)
match.tl <- match.tbl(spp.key[,spp], taxInfo[,spp], taxInfo[,trophicLevel], exact=TRUE)
match.tl[,sum(!is.na(val))] #1093
spp.key[,sum(!is.na(trophicLevel))] # 40!! :(
spp.key[is.na(trophicLevel), trophicLevel:=match.tl[spp.key[,is.na(trophicLevel)], val]]
spp.key[,sum(!is.na(trophicLevel))] # 1118!! :)
match.tl <- match.tbl(spp.key[,spp], taxInfo[,spp], taxInfo[,trophicDiet], exact=TRUE)
match.tl[,sum(!is.na(val))] #847
spp.key[,sum(!is.na(trophicDiet))] # 874
spp.key[is.na(trophicDiet), trophicDiet:=match.tl[spp.key[,is.na(trophicDiet)], val]]
spp.key[,sum(!is.na(trophicDiet))] # 878
match.tl <- match.tbl(spp.key[,spp], taxInfo[,spp], taxInfo[,trophicOrig], exact=TRUE)
match.tl[,sum(!is.na(val))] #847
spp.key[,sum(!is.na(trophicOrig))] # 874
spp.key[is.na(trophicOrig), trophicOrig:=match.tl[spp.key[,is.na(trophicOrig)], val]]
spp.key[,sum(!is.na(trophicOrig))] # 878
match.tl <- match.tbl(spp.key[,spp], taxInfo[,spp], taxInfo[,Picture], exact=TRUE)
match.tl[,sum(!is.na(val))] #1613
spp.key[,sum(!is.na(Picture))] # 1684
spp.key[is.na(Picture), Picture:=match.tl[spp.key[,is.na(Picture)], val]]
spp.key[,sum(!is.na(Picture))] # 1686
@bselden @JWMorley do you know anything about this? I might go back to check where it happened, just because I need to know if part of my code is broken. I'm hoping someone just accidentally deleted a couple values (but that the accident was limited to trophicLevel !).
It's hard to think of how to do this. The ultimate goal is to have a visual QA/QC, so we need to represent the data in a way that will make it easy to see weird numbers. It's hard to see that with colors, making the maps kinda pointless, I think.
Could plot 1 stratum at a time, or plot the regional mean/min/max/median ...
Maybe have a 3-panel plot. The top panel has time on the x-axis, the second lon, the third lat. The y-axis is any variable. This would not require any aggregating.
Variables need to be split by different ID columns, though. E.g., it wouldn't make sense to group wtcpue together for all species. But creating a separate plot for each species might be rough. Maybe set an option in the function to only bother plotting levels of the ID column that show up often enough. So you could do plot_raw(X, "wtcpue", by=c("spp"), min_n_obs=20), and that would interactively (or save all?) plot the 3 panels for each species that showed up at least 20 times.
I think all aggregating should be left out of the function. The user could do it via X <- trawlAgg(), or just standard aggregation.
A bell could be to color points that lie outside n standard deviations or something. A whistle could be to use outlier detection, like Noah used for NCEAS ice data.
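A rough sketch of what plot_raw might look like (the function is only proposed above, not implemented; base graphics and a data.table X with datetime, lon, and lat columns are assumed):
plot_raw <- function(X, var, by="spp", min_n_obs=20) {
    keep <- X[, .N, by=by][N >= min_n_obs]  # only levels observed often enough
    X2 <- X[keep, on=by]
    for (lev in unique(X2[[by]])) {
        d <- X2[get(by) == lev]
        par(mfrow=c(3, 1))  # one 3-panel figure per level; paging/saving left to the user
        plot(d[,datetime], d[[var]], xlab="time", ylab=var, main=lev)
        plot(d[,lon], d[[var]], xlab="lon", ylab=var)
        plot(d[,lat], d[[var]], xlab="lat", ylab=var)
    }
}
So plot_raw(clean.neus, "wtcpue", by="spp", min_n_obs=20) would cycle through the qualifying species.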
The rows are per individual due to having length information, but the wtcpue column is for the species.
This is a problem because the intuition only works when you assume that the original species names are all correct.
I checked, and this raises very few problems for NEUS if approached simply (i.e., take the mean of the wtcpue within each unique combination of spp-haulid). However, there are a couple cases in which there was a species name correction. So 2 taxonomic IDs originally had their own (different) wtcpue's in a given haul, and each of those taxa may have had some individuals lengthed, so the wtcpue value is repeated several times for each taxon. But after correcting taxonomy, the 2 taxa are actually the same species. So you can't simply take the average (what you would do if all rows were the same taxon with duplicated wtcpue, as was probably the intended interpretation) or the sum of wtcpue (what you would do if multiple rows for the same species-haul did not have duplicated wtcpue).
I hope this issue does not apply to sex too, but it could (i.e., when sex is listed, is the wtcpue sex-specific, or for the whole spp?).
One approach is to first aggregate while including wtcpue as a factor. This can be done with trawlAgg(), because usually at this stage of data processing both space_lvl and time_lvl are "haulid", so one of those (probably time) can be changed to "wtcpue". However, this might become challenging when there are NA's etc. for wtcpue ... idk how the grouping would work.
Another approach could be to make the bioFun argument something like function(x) sumna(una(x)), where x is "wtcpue" passed to the bioCols argument. This assumes equivalent wtcpue values come from duplicated rows that shouldn't be summed together to get the total wtcpue for a species in a haul. May or may not be true.
Yet another approach could be to aggregate not by "spp", but by the original taxonomic ID column first. In that first aggregation, do bioFun = meanna. Then do the subsequent aggregation by "spp" with bioFun = sumna. This assumes that duplicate rows for a species within a haul should not be summed. It also obscures the potentially problematic scenario of there actually being multiple wtcpue values .... maybe instead of meanna we could do something that lists the unique values, and hopefully throws an error when there's more than 1.
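A hedged sketch of that third approach in plain data.table rather than through trawlAgg() (ref is the original as-entered name, as elsewhere in this repo):
# stage 1: within each haul, average the duplicated wtcpue per original taxon (ref)
stage1 <- clean.neus[, .(spp=spp[1], wtcpue=mean(wtcpue, na.rm=TRUE)), by=c("haulid","ref")]
# stage 2: sum across original taxa that now map to the same corrected spp
stage2 <- stage1[, .(wtcpue=sum(wtcpue, na.rm=TRUE)), by=c("haulid","spp")]
# a stricter stage 1 could error out if a ref-haul really has >1 distinct wtcpue:
# clean.neus[, stopifnot(length(unique(na.omit(wtcpue))) <= 1), by=c("haulid","ref")]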
Should have this R code, at least for reference, if not as an actual exported function.
This is here: rBatt/trawl#104
Some of the files are here: still working on finding the SODA files.
Only in clean.wctri; raw.wctri has finite WEIGHT.
towarea is 0, effort is 0, wtcpue and cntcpue are Inf.
This ultimately results, I think, from 0 towdistances (towarea is calculated from towdistance, towarea is the effort, and effort is in the denominator for wtcpue).
If you look at raw.wctri[DISTANCE_FISHED<=0] in the raw data (DISTANCE_FISHED becomes towdistance after name cleaning), you can see the 67 rows that lead to the problem there. There are also 14 rows with 0 duration.
If I do
clean.wctri[!is.finite(wtcpue) & !is.na(wtcpue), lu(haulid)]
[1] 6
clean.wctri[haulid%in%clean.wctri[!is.finite(wtcpue) & !is.na(wtcpue), haulid], mean(towduration[towduration!=0]), by="haulid"]
haulid V1
1: 83-199201-257 NaN
2: 19-198901-180 0.25
3: 19-198601-298 0.08
4: 73-198901- 16 0.08
5: 37-199202-157 0.01
6: 5-198001-139 0.10
I see that there are 6 hauls that are problematic. But 5 of those 6 actually have the correct information in other rows; 1 of the 6 hauls apparently doesn't have any true information.
Ultimately this indicates that most of these can be fixed by just filling with the mean.
Just to further put my mind at ease:
clean.wctri[haulid%in%clean.wctri[!is.finite(wtcpue) & !is.na(wtcpue), haulid], lu(towduration[towduration!=0], na.rm=TRUE), by="haulid"]
haulid V1
1: 83-199201-257 0
2: 19-198901-180 1
3: 19-198601-298 1
4: 73-198901- 16 1
5: 37-199202-157 1
6: 5-198001-139 1
So there's only 1 unique value anyway ... which makes sense, as these are haul-specific values.
To fix, in clean.columns, something like the following should work:
# treat the sentinel zeros as missing
X[towduration==0, towduration:=NA]
X[towdistance==0, towdistance:=NA]
X[towarea==0, towarea:=NA]
# then fill the NA's with the haul-specific mean (fill.mean presumably fills NA's with the mean of the remaining values)
X[,c("towduration", "towdistance", "towarea"):=lapply(list(towduration, towdistance, towarea), fill.mean), by=c("haulid")]
This repo will become an R package, but it's still in development.
The file spp.key.csv has all of the known "raw" (as-entered) taxonomic identifiers (species names) from all regions. But it needs to be checked.
Most species have had something found. The "raw" column is named "ref", and the "corrected" column is named "spp".
Looking through, some of the "corrected" spp names are clearly wrong, as are some of the common names.
Feel free to make corrections, and commit/ push the changes. But please use Git. You may want to install git lfs before downloading this repo (otherwise, the large file storage might break, or you'll end up with bigger files than you want; I'm not sure what happens).
Note that each value in "ref" is unique, but the "spp" values are not. Make sure you do not create any inconsistencies as you edit the file. E.g., if you see that spp=="zoroaster" does not actually have a common name of "frogfish", don't change the common name to "seastar" on only 1 line ... make sure that the updated file has the same common name for all "zoroaster".
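A quick way to guard against such inconsistencies (a hedged sketch; it lists any spp that currently has more than one distinct common name):
spp.key[, .(n_common = length(unique(common))), by="spp"][n_common > 1]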
I can explain further when you decide to take a look. Just let me know.
Having the gridded data is nice, but it'd be much better to match it up to the survey data in a seamless way (i.e., make it part of the data.table).
There are 23,000 rows in the clean.newf file that have lat and lon that are NA. This matches how many rows had NA for lat.end. In contrast, there are only 16 rows where lat.start was NA.
It looks like lat was calculated from the mean of (lat.start, lat.end). In the cases in which lat.end was NA, can we assign lat to be the starting lat instead of NA?
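A minimal sketch of that fallback (the lon.start column name is assumed to parallel the lat columns):
clean.newf[is.na(lat) & !is.na(lat.start), lat := lat.start]
clean.newf[is.na(lon) & !is.na(lon.start), lon := lon.start]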
See this gist
https://gist.github.com/bselden/cc86c0e9e219b8cd4ffc
Based on this document: http://www.gulfofmaine-census.org/data-mapping/visualizations/national-marine-fisheries-service-data-overview/#stratum_table
Prefix 1 = offshore strata North of Cape Hatteras --> Shore="Offshore_N"
Prefix 3 = inshore strata North of Cape Hatteras --> Shore="Inshore_N"
Prefix 5 = Scotian Shelf --> Shore="Scotian"
Prefix 7 = inshore strata South of Cape Hatteras --> Shore="Inshore_S"
Prefix 8 = offshore strata South of Cape Hatteras --> Shore="Offshore_S"
Stratum area for inshore strata N (prefix 3) were compiled from Table 4 in http://www.nefsc.noaa.gov/publications/tm/pdfs/tmfnec52.pdf
I can add these new strata to the neus-neusStrata.csv in the trawlData/inst/extdata/neus folder of the repo, unless something else special needs to be done to make sure the area in square nautical miles gets converted during the data processing.
Gmex has very little data in 2015
NEUS has NA bottom temperature in 2015
Not sure what's going on. The NEUS issue is not related to cleaning; it's missing in the raw data.
@mpinsky Any ideas about not having bottom temperature for NEUS?
Need to mention that git lfs (git lfs pull, etc.) is needed for the initial repo clone/pull .... can't install the package with the LFS pointer files in place of the data files.
Was trying to play with the remake setup. Keep getting fread errors:
Restoring previous version of data/raw.gmex.RData
Error in fread(...) :
Expected sep (',') but new line or EOF ends field 41 on line 52870 when reading data: 174730,936,88,1401,88004,8,319,28,54.24,N,90,33.08,W,16.2,B192,5/7/14,409,28,55.56,N,90,33.24,W,,BGBCPNNNCASXOX,,,23.1,1016,11.2,102,0.6,4,S,LA,,7,14,,,"Additional gears used: OY
Something up with fread? The failing row ends in "Additional gears used: OY, which looks like a quoted comment field containing an embedded line break, something fread apparently can't parse here.
@JWMorley what's the deal here?
What are the ones I have to watch out for??
It looks like they get Anchoa hepsetus, Anchoa lyolepis, and Anchoa mitchilli every year.
I have Macrocoeloma camptocerum all years from 1989-1995 (except 1993), then never again (going up until 2012).
I see Anchoa cubana in 1989 only, Lobopilumnus agassizii in 1989 and 1993 only, and Engraulis eurystole in 1990 only.
My code:
clean.sa[grepl("Anchovy", common, ignore.case=TRUE), sort(una(spp)), by="year"][,table(year, V1)]
So I just searched for any species that had 'anchovy' in the common name to get that summary. But I know you know better.
List changes that need to be made, with the code to make them.
spp.key[spp=="Homaxinella amphispicula", common:="firm finger sponge"]
spp.key[spp=="Isodictya rigida", common:="soft finger sponge"]
spp.key[spp=="Leptasterias coei", common:="aleutian six-rayed sea star"]
spp.key[spp=="Neoesperiopsis infundibula", common:="rough China hat sponge"]
spp.key[spp=="Neptunea amianta", common:="white neptune"]
spp.key[spp=="Reinhardtius stomias", spp:="Atheresthes stomias"]
spp.key[ref=="Atheresthes stomias", spp:="Atheresthes stomias"]
spp.key[ref=="CROSS PAPPOSUS", spp:="Crossaster papposus"]
check_and_set(wrong="Cross papposus", corrected="Crossaster papposus")