Comments (7)
So I did find some time already. With the development version, things seem to read fine both for current and historical data. https://github.com/brry/rdwd#latest-version
See the examples at
https://github.com/brry/rdwd/blob/master/man/readDWD.stand.Rd#L41-L60
- I kept 1/8 units since that's the normal measure for cloudiness.
- Unit names are updated ("0.1" removed) and conversion automated.
- NA string conversion is automated
- readDWD should be able to handle several links at once
- Q column names are renamed (I hadn't even seen your edits yet, so if we both have the same idea independently, it must be a good idea ^^)
I hadn't seen the data.table idea yet and kept speed as a ToDo for the moment. I hope to find time for that soon. Ideally, it would handle .gz files directly, as read.fwf does.
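The automated unit conversion mentioned in the list above could look roughly like this. It is a minimal base R sketch, not the actual rdwd code: the small `fmt` table, its unit strings and the `raw` values are invented for illustration. Only units carrying a leading "0.1" factor are rescaled; the 1/8 cloudiness unit is left alone.

```r
# Hypothetical format table: one row per data column (values invented)
fmt <- data.frame(Label   = c("TMK", "NM", "SHK"),
                  Einheit = c("0.1 degC", "1/8", "cm"),
                  stringsAsFactors = FALSE)

# Hypothetical raw values as stored in the file (tenths of degC, eighths, cm)
raw <- data.frame(TMK = c(123, -45), NM = c(4, 8), SHK = c(0, 12))

# Columns whose unit starts with the factor "0.1"
tenths <- grepl("^0\\.1", fmt$Einheit)

# Rescale those columns and drop the factor from the unit name
conv <- raw
conv[fmt$Label[tenths]] <- conv[fmt$Label[tenths]] / 10
fmt$Einheit[tenths] <- trimws(sub("^0\\.1", "", fmt$Einheit[tenths]))

conv
fmt
```

Updating the unit name in the same step keeps the data and its description from drifting apart.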
from rdwd.
Hi,
I'm on vacation and will try to have a look at it soon. Just to confirm we have the same errors, can you verify these?
For the first file (subdaily_standard_format_kl_10381_00_akt.txt) I get:
Error in names(x) <- value : 'names' attribute [51] must be the same length as the vector [9]
For the second file (subdaily_standard_format_kl_10381_bis_1999.txt.gz) I get:
Error: $ operator is invalid for atomic vectors
In addition: Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
NAs introduced by coercion
2: In readheader(4, asnum = TRUE) : NAs introduced by coercion
3: In doTryCatch(return(expr), name, parentenv, handler) :
NAs introduced by coercion
4: Error in readBin(confile, what = raw(), n = n, endian = "little") :
invalid 'n' argument
from rdwd.
I had a quick look and saw that the data format is entirely different from all other datasets.
Can you help to develop a reading function? I won't have that amount of time this week nor early next week.
At a quick glance, I understand the section "Considerations for applications" in the description file (1) to say that most of this data (and much more) should be available at ftp://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/daily/kl
Does it make sense for you to have a look at that data? It is relatively well tested within rdwd.
(1) ftp://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/subdaily/standard_format/DESCRIPTION_obsgermany_climate_subdaily_standard_format_en.pdf
from rdwd.
I couldn't resist trying some stuff. Here are the intermediate results. Let me know if this is in the direction you need. You'll need the very recent dev version of rdwd for the recent/historical URL selection.
# towards readDWD.stand
library(rdwd)
link <- selectDWD(id=10381, res="subdaily", var="standard_format", per="r")
file <- dataDWD(link, dir=localtestdir(), read=FALSE)
# sf <- readDWD(file, varnames=FALSE)
# read format description:
format_url <- paste0(dwdbase,"/subdaily/standard_format/formate_kl.html")
format_html <- readLines(format_url, encoding="UTF-8")
format_html <- gsub("&nbsp;", "", format_html)
format_html <- gsub("°", "deg", format_html)
format_html <- format_html[!grepl("Formatbeschreibung", format_html)]
rdwd:::checkSuggestedPackage("XML", "readDWD.stand")
format_info <- XML::readHTMLTable(doc=format_html, header=TRUE, stringsAsFactors=FALSE)[[1]]
# get column widths:
width <- diff(as.numeric(format_info$Pos))
width <- c(width, 200)
# ToDo: consider moving to a function, caching/saving output (updated along with indexes?)
# read fixed width dataset
sf <- read.fwf(file, widths=width, na.strings="-999")#, n=10) # this takes >20secs to read
# ToDo: definitely look at data.table/vroom/... for speedup here!
colnames(sf) <- format_info$Label
sf$Date <- as.Date(paste(sf$JA,sf$MO,sf$TA,sep="-"), "%F")
for(i in c("P1","P2","P3","PM","TXK","TNK","TRK","TGK","T1","T2","T3","TMK",
           "TF1","TF2","TF3","VP1","VP2","VP3","VPM", "FMK","NM","SDK",
           "R1","R2","R3","RSK", "FXK","ASH","WAAS","WASH"))
  sf[,i] <- sf[,i]/10
plot(sf$Date, sf$SHK, type="l") # ToDo: NA corrections with format_info$Fehlk
# Plot all columns:
{
pdf("sf_columns.pdf", width=9)
par(mfrow=c(2,1),mar=c(2,3,2,0.1), mgp=c(3,0.7,0), las=1)
for(i in 1:(ncol(sf)-1)) plot(sf$Date, sf[,i], type="l", main=colnames(sf)[i], ylab="") # all columns except the trailing Date
dev.off()
}
openFile("sf_columns.pdf")
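The step from start positions to column widths is easy to check on a tiny example. The following base R sketch uses invented positions, labels and records (none of them taken from the DWD format table) and writes them to a temporary file so that read.fwf can be run without any download. The last field has no following start position, so its width has to be given explicitly, which is what the 200 does in the code above.

```r
# Hypothetical 1-based start positions and labels of four fields
pos    <- c(1, 6, 10, 12)
labels <- c("ID", "JA", "MO", "TMK")

# Width of each field = distance to the next start position;
# the last field gets an explicit width
width <- c(diff(pos), 4)

# Two invented records laid out to match the positions above
lines <- c("10381201901 123",
           "10381201902-999")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Read the fixed-width file; "-999" is treated as missing
sf_demo <- read.fwf(tmp, widths = width, col.names = labels, na.strings = "-999")
sf_demo
```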
from rdwd.
Pretty nice work and coding, good job!
It works well but keep in mind a few things:
- Some values aren't stored as 1/10 but as 1/8; you can get this from the "Einheit" column. Also, you probably shouldn't convert the units at all without also indicating the resulting units!
- Maybe don't read the file with na.strings="-999", but set the NA values afterwards using the "Fehlk" column.
- read.fwf only works with file[1], so it can't read 2 files yet.
- Wouldn't it make sense to give the quality identifier "Q" a unique name? Maybe the name of the variable it refers to + "-Q"?
Also, feel free to use my data.table approach (with help from here: https://stackoverflow.com/q/24715894/6358363):
library(stringi)
library(data.table)
# start and end position of each column; -1L lets the last column run to the end of the line
pos <- as.numeric(format_info$Pos)
cols <- list(beg = pos, end = c(pos[-1L] - 1L, -1L))
# read each line as one string, then cut it into columns
sf2 <- fread(file[2], sep = "\n", header = FALSE)
sf2 <- sf2[ , lapply(seq_along(cols$beg), function(ii) stri_sub(V1, cols$beg[ii], cols$end[ii]))]
colnames(sf2) <- format_info$Label
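The suggestions about the "Fehlk" column and the "Q" names could be sketched like this in base R. Everything in the example is invented for illustration (the `fmt` table, the missing-value codes and the data). It also uses "_Q" rather than "-Q" as suffix, since a hyphen does not give a syntactic R column name, and it assumes that every "Q" column directly follows the variable it belongs to.

```r
# Hypothetical format table; Fehlk holds the missing-value code per column
fmt <- data.frame(Label = c("TMK", "Q", "RSK", "Q"),
                  Fehlk = c("-999", "", "-99", ""),
                  stringsAsFactors = FALSE)

# Invented data, columns in the same order as the format table
sf_demo <- data.frame(c(123, -999), c(1, 1), c(-99, 20), c(1, 5))

# Name each quality column after the variable that precedes it
is_q   <- fmt$Label == "Q"
labels <- fmt$Label
labels[is_q] <- paste0(fmt$Label[which(is_q) - 1], "_Q")
colnames(sf_demo) <- labels

# Replace each column's own missing-value code with NA
for (i in which(fmt$Fehlk != "")) {
  code <- as.numeric(fmt$Fehlk[i])
  sf_demo[[i]][which(sf_demo[[i]] == code)] <- NA
}
sf_demo
```

Handling the codes per column avoids turning a legitimate value in one column into NA just because it happens to equal the missing-value code of another.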
from rdwd.
As of Version 1.1.25, reading speed is now fine too. I use readr, which is about as fast as data.table, but already returns numerical columns where appropriate.
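For anyone who wants to try the readr route on a small file first, here is a minimal sketch. It is not the code used inside rdwd: the widths, labels and the two records are invented, and the block only runs if readr is installed. It relies on `readr::read_fwf` with `readr::fwf_widths`, which reads gzipped files directly and guesses the column types.

```r
if (requireNamespace("readr", quietly = TRUE)) {
  # Hypothetical field widths and labels; in practice they would
  # be derived from the format description
  width  <- c(5, 4, 2, 4)
  labels <- c("ID", "JA", "MO", "TMK")

  # Two invented records written to a gzipped temporary file
  tmp <- tempfile(fileext = ".txt.gz")
  con <- gzfile(tmp, "w")
  writeLines(c("10381201911 123", "10381201912-999"), con)
  close(con)

  # Read the fixed-width file; col_types = cols() lets readr guess every column
  sf_fast <- readr::read_fwf(tmp,
                             col_positions = readr::fwf_widths(width, labels),
                             col_types = readr::cols(),
                             na = "-999")
  print(sf_fast)
}
```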
Can you help out by testing the whole thing on some more files?
from rdwd.
Works properly and is also fast! Thanks
from rdwd.