stevenwinfo / haskell-soda
Haskell bindings for the Socrata Open Data API
License: MIT License
There are quite a few small things that should be done before opening this up to the public.
Of course, there are the other issues with the "opening up to the public" milestone as well.
Currently, the bindings can only make requests without SODA authentication tokens, which means that those requests will be rate-limited. An application that makes a lot of queries through these bindings will want to send its authentication token, so the bindings need a way to attach one.
I'm not sure if this is necessary for the public release, but it should definitely be done at some point.
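A minimal sketch of how an optional token could be carried in the request settings. Socrata accepts an application token in the X-App-Token HTTP header; the SodaConfig record, its fields, and authHeaders are hypothetical names for illustration and are not part of the current bindings.

```haskell
-- Hypothetical request settings; none of these names come from the library.
data SodaConfig = SodaConfig
  { domain   :: String        -- placeholder such as "data.example.gov"
  , appToken :: Maybe String  -- Nothing means an unauthenticated request
  }

type Header = (String, String)

-- Extra headers to attach to every request built from this config.
authHeaders :: SodaConfig -> [Header]
authHeaders cfg = case appToken cfg of
  Nothing    -> []
  Just token -> [("X-App-Token", token)]
```

Keeping the token as a Maybe means existing unauthenticated callers keep working, and the header pair could presumably be handed to whatever HTTP library builds the request.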
The aggregate SODA functions have some unique things about them that are currently not reflected in these bindings. For example, you can't use them in $where clauses. This means I'll have to separate out those functions from the other SODA functions and use them slightly differently throughout the code. It will probably involve separating them out into another type, as well as probably making some typeclasses.
There are also some more involved things with aggregates, such as requiring columns to be in the $group clause. Those will take more work, though, and I'll probably create another issue for them later.
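One way the separation could look, as a rough sketch: aggregates get their own type, so a $where clause that only accepts plain boolean expressions cannot contain one. The Expr, Agg, Where, and Selected types below are simplified stand-ins invented for this example, not the library's actual definitions.

```haskell
{-# LANGUAGE GADTs #-}

-- Ordinary expressions, usable anywhere, including $where.
data Expr a where
  Col    :: String -> Expr a
  NumLit :: Double -> Expr Double
  Gt     :: Expr Double -> Expr Double -> Expr Bool

-- Aggregates live in their own type, so they cannot appear inside an Expr.
data Agg a where
  Sum :: Expr Double -> Agg Double
  Avg :: Expr Double -> Agg Double

-- $where only accepts plain boolean expressions.
newtype Where = Where (Expr Bool)

-- $select accepts either kind.
data Selected where
  Plain     :: Expr a -> Selected
  Aggregate :: Agg a -> Selected

renderExpr :: Expr a -> String
renderExpr (Col n)    = n
renderExpr (NumLit d) = show d
renderExpr (Gt l r)   = renderExpr l ++ " > " ++ renderExpr r

renderAgg :: Agg a -> String
renderAgg (Sum e) = "sum(" ++ renderExpr e ++ ")"
renderAgg (Avg e) = "avg(" ++ renderExpr e ++ ")"
```

With this shape, Where (Gt (Col "age") (NumLit 30)) typechecks, while wrapping a Sum in Where is rejected by the compiler because Agg and Expr are different types.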
Many of the SodaFunc constructors, which currently have a type something like a -> a -> b, can actually have different types for the first two parameters, like +. They can only allow a subset of SodaType types though, like the numeric ones.
Another related problem is that SodaFunc constructors like within_circle currently allow any SodaType for the first parameter, whereas it should really only be geometric SodaTypes.
These two problems can be solved by creating a bunch of typeclasses that just specify different subsets of SodaType that need to be used. They don't need to have any methods.
There might be another, more terse way of going about this, but I think this is the simplest way that doesn't add any more boilerplate for the user.
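A small sketch of the method-less typeclass idea. SodaNum and Money are mentioned elsewhere in these issues, but their definitions here, along with Point, SodaNumeric, SodaGeo, and this cut-down Expr, are stand-ins made up for the example.

```haskell
{-# LANGUAGE GADTs #-}

-- Stand-ins for the library's SODA value types (definitions are illustrative).
newtype SodaNum = SodaNum Double
newtype Money   = Money Double
data Point      = Point Double Double

-- Marker classes: no methods, they only name a subset of types.
class SodaNumeric a
instance SodaNumeric SodaNum
instance SodaNumeric Money
instance SodaNumeric Double

class SodaGeo a
instance SodaGeo Point

data Expr a where
  Lit :: a -> Expr a
  -- The two operands may differ, but both must be numeric.
  Add :: (SodaNumeric a, SodaNumeric b) => Expr a -> Expr b -> Expr b
  -- Only geometric values are accepted as the first argument.
  WithinCircle :: SodaGeo g
               => Expr g -> Expr Double -> Expr Double -> Expr Double -> Expr Bool

-- Count constructors, only to show that the values can be consumed.
size :: Expr a -> Int
size (Lit _)                = 1
size (Add l r)              = 1 + size l + size r
size (WithinCircle g a b c) = 1 + size g + size a + size b + size c
```

Here Add (Lit (SodaNum 1)) (Lit (Money 2)) is accepted, while Add (Lit (Point 0 0)) (Lit (Money 2)) fails to compile because Point has no SodaNumeric instance. Users never see the classes unless they get that error.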
The metadata/system fields are always the same for all datasets, so I might as well make named constants that people can add to the query to get that information.
(As a side note, I should see what other kinds of metadata that I can get from datasets and that come with all queries).
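As a sketch, the constants could look like the following. The :id, :created_at, and :updated_at names are the system fields SODA documents; the Column type, the constant names, and selectClause are hypothetical, and the value types are simplified to String.

```haskell
import Data.List (intercalate)

-- Hypothetical column type: a name tagged with the Haskell type of its values.
newtype Column a = Column String

-- System fields shared by all datasets. Value types are simplified to
-- String for this sketch.
sysId, sysCreatedAt, sysUpdatedAt :: Column String
sysId        = Column ":id"
sysCreatedAt = Column ":created_at"
sysUpdatedAt = Column ":updated_at"

columnName :: Column a -> String
columnName (Column n) = n

-- Render a $select clause from column names (URL encoding is ignored here).
selectClause :: [String] -> String
selectClause names = "$select=" ++ intercalate "," names
```

A caller would then write something like selectClause (map columnName [sysId, sysUpdatedAt]) instead of remembering the colon-prefixed names.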
I mentioned this when closing #6, but I think that it should be possible to specify the $select parameters with a user-created record type, and we could provide a function that fills out that record type and returns it. This would make some things a little more convenient and consistent compared with the current method.
Users can do this themselves by parsing the string response; however, it would be nice to incorporate it through the whole process.
Currently, only Checkbox-typed columns are considered able to have null values, but a column of any type can actually have null values. This means our model is a bit off. Functionally, it also means we can't currently filter a text field on whether it is (or isn't) null. We need to change all of the types to add that value, possibly by making them Maybe types or by making them sum types with another value. I suppose both would be similar, although the latter might have one less constructor depending on the type.
Subqueries actually have a different representation from regular queries, one which looks a little more like traditional SQL queries. It would require rewriting the representations of many of the types, so I just haven't gotten around to it. It should be pretty simple though.
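A rough sketch of the Maybe approach together with the null filters it would enable. SoQL supports IS NULL and IS NOT NULL in $where; the Row and Filter types and renderFilter are invented for this example.

```haskell
-- Every cell may be missing, so each field of a row is wrapped in Maybe.
data Row = Row
  { rowName :: Maybe String
  , rowAge  :: Maybe Double
  } deriving (Show, Eq)

-- Null checks that apply to a column of any type.
data Filter
  = IsNull String
  | IsNotNull String

-- Render a filter as it would appear in a $where clause.
renderFilter :: Filter -> String
renderFilter (IsNull c)    = c ++ " IS NULL"
renderFilter (IsNotNull c) = c ++ " IS NOT NULL"
```

Using Maybe keeps the existing value types unchanged and pushes nullability into the row type, at the cost of an extra wrapper on every field.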
Once this becomes a little more stable, tested, and many of the problems are hammered out, we can make the package available the way that most Haskell packages are provided, through Hackage. I'm not sure when this will be though. After we do this we could consider adding it to Stackage as well.
Some SodaFuncs have more limited input types than the GADT currently expresses. Typeclasses and instances will need to be made to put additional constraints on the constructors. I think they'll be typeclasses without methods.
Because the subquery parameter represents many parts of a query differently than a basic query does, supporting it would require rewriting how a lot of those parts are represented. Because I've been working on other parts and wanted to get the basic functionality working quickly, I've skipped over this part.
It shouldn't be too difficult, but it will probably require a lot of busywork of going through all of the URL parameter representations and making new representations for those that differ.
There are a handful of tests currently, but they are not very extensive and don't provide a lot of coverage. Write more tests that will make you more confident when something is changed that it still works.
The complex example is pretty contrived, doesn't really display the things I wanted it to, and isn't really clear about what it's doing. The second dataset also wasn't a great choice because it isn't really clear what data it contains. It also isn't very closely related to the first dataset, so it isn't clear how to compare them.
I think that my definition of the Case constructor might be a little restrictive because I think the SODA-level function can return differently typed values. We could make an existential type just to hide those types, but that would be yet another thing to keep track of.
I don't know if getting rid of some or any of the following is possible, but I'd like to try.
SodaVal on all of the SODA values. A Proxy a value or something like that could make it simpler.
Expr on a lot of things.
Some of the literal SODA values inside of functions appear differently when displayed as parameters. Mostly, these are the geometric types in things like within_circle().
Most of the code doesn't have the Haddock documentation comments. Go over the code and add them.
There's a list of things to add or improve about the current README.
Right now, it just gets back a big string as a response from a query. This means that unless the user parses/deserializes that string on their own, these API bindings aren't very useful.
I'm not very familiar with the deserializing libraries like aeson so I'll have to read up before implementing.
I'm not sure if this can, or even should, be implemented without the user specifying the types of what is returned, so I might have to mention that you have to specify the types when querying. I would also like to see if I can have the definition of column variables somehow interact with the specification of response types.
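A minimal sketch of what user-specified response types could look like with aeson's generic decoding. The Inspection record, its fields, and decodeRows are hypothetical; only FromJSON, eitherDecode, and Generic come from real libraries.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON, eitherDecode)
import qualified Data.ByteString.Lazy as BL
import Data.Text (Text)
import GHC.Generics (Generic)

-- Hypothetical row type; field names must match the JSON keys in the response.
data Inspection = Inspection
  { name :: Text
  , city :: Maybe Text  -- a key missing from the JSON decodes to Nothing
  } deriving (Show, Eq, Generic)

-- The generic default derives the parser from the record definition.
instance FromJSON Inspection

-- The caller fixes the result type, which selects the decoder.
decodeRows :: FromJSON a => BL.ByteString -> Either String [a]
decodeRows = eitherDecode
```

In this shape the user does have to name the row type at the call site, for example decodeRows body :: Either String [Inspection], which matches the suspicion above that the types cannot be left unspecified.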
There were a few things I knew that needed to be checked at runtime, but with the addition of the aggregate stuff, the basics of this should really be implemented before being made public.
Some of the binary operators can have different types on the left and right of the operator. This makes it difficult to decide what the resulting type of the operator should be. Right now it just arbitrarily picks the one on the right.
I think that maybe the constructors should be exported with type a -> b -> c, where they can put a type annotation to explicitly say what type they want, and then the infix operator will have type a -> b -> b. Possibly also give a hint in the operator that it is the one on the right, like changing $+ to $+> or something.
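The proposal could be sketched like this, using a cut-down Expr invented for the example (the numeric constraints are left out to keep it short). Only the names $+ and $+> come from the text above.

```haskell
{-# LANGUAGE GADTs #-}

data Expr a where
  Lit :: a -> Expr a
  -- Constructor form: the result type c is free, so the caller picks it
  -- with a type annotation.
  Add :: Expr a -> Expr b -> Expr c

-- Infix form: the result takes the type of the right operand. The '>' in
-- the name points at the side that decides the type.
($+>) :: Expr a -> Expr b -> Expr b
($+>) = Add

infixl 6 $+>

-- Number of Add nodes, only here so the sketch has something observable.
adds :: Expr a -> Int
adds (Lit _)   = 0
adds (Add l r) = 1 + adds l + adds r
```

With these definitions, Lit (1 :: Int) $+> Lit (2.5 :: Double) has type Expr Double, while Add (Lit (1 :: Int)) (Lit (2.5 :: Double)) :: Expr Int lets the caller choose the left type explicitly.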
If people made their own instances of SodaType types, the library could operate in unintended ways. SODA also only recognizes the already defined types, so from a domain perspective, adding external types wouldn't make sense. If we make a superclass for SodaType and don't export it, we can still export toUrlPart, but people can't make instances of the class.
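A sketch of the hidden-superclass trick. SodaType and toUrlPart are the names used above; SodaTypeInternal, the module name in the comment, and the Bool instance are assumptions for illustration.

```haskell
-- In the library this would sit behind an export list such as
--   module Soda.Types (SodaType (..)) where
-- which leaves SodaTypeInternal unexported (the module name is illustrative).

-- Hidden superclass. Code outside the module cannot name it, so it cannot
-- provide the instance that every SodaType instance requires.
class SodaTypeInternal a

-- Public class: its method is usable everywhere, but new instances can
-- only be written where SodaTypeInternal is in scope.
class SodaTypeInternal a => SodaType a where
  toUrlPart :: a -> String

instance SodaTypeInternal Bool

instance SodaType Bool where
  toUrlPart True  = "true"
  toUrlPart False = "false"
```

An outside module that tries instance SodaType Foo gets a missing-superclass-instance error, and it has no way to write that instance because the class name is not exported.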
Because SODA returns SODA Doubles as JSON strings, the FromJSON parsing of those fields into Haskell's Doubles breaks and returns nothing. You also can't override an already declared instance for a specific type, which means we can't write a new FromJSON instance for Double to be like SodaNum's instance. This means that right now, the interpreted responses just don't contain Double values (it's been a while since I tested, so I can't remember if that's exactly what happens).
I actually haven't seen too many datasets with the Double type, but this is still a pretty big issue. The only solution that I can think of right now would be to change the Double type into a newtype like SodaNum and Money. Having to deal with the newtype for those types is already pretty annoying though, and it seems like you should be able to use a basic datatype in an easier way. If there's no other way though, then I guess we'll have to do it to make responses work correctly.
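A sketch of the newtype route, assuming the wrapper should accept both a JSON string holding a number and a plain JSON number. SodaDouble and decodeDoubles are hypothetical names; how SodaNum's real instance is written is not known here.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), Value (..), eitherDecode)
import Data.Aeson.Types (typeMismatch)
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import Text.Read (readMaybe)

-- Hypothetical wrapper, in the spirit of SodaNum and Money.
newtype SodaDouble = SodaDouble Double
  deriving (Show, Eq)

instance FromJSON SodaDouble where
  -- SODA sends the number inside a JSON string, so read it out of the text.
  parseJSON (String t) = case readMaybe (T.unpack t) of
    Just d  -> pure (SodaDouble d)
    Nothing -> fail ("not a numeric string: " ++ T.unpack t)
  -- Also accept a plain JSON number, reusing Double's own instance.
  parseJSON v@(Number _) = SodaDouble <$> parseJSON v
  parseJSON v = typeMismatch "SodaDouble" v

-- Decode a JSON array of such values.
decodeDoubles :: BL.ByteString -> Either String [SodaDouble]
decodeDoubles = eitherDecode
```

Note that readMaybe follows Haskell's own syntax for Double, so a string such as ".5" without a leading digit would be rejected; a more lenient parser may be needed depending on what SODA actually emits.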
Either make the parametric binary operator types simpler, have the operator hint which type it is using, or make it very clear in the documentation which it is. Ideally, a combination of all three.
Right now, I'm pretty sure that req throws exceptions in IO. I think I'd rather the library return an Either type with a custom exception/error type or something cleaner. The Req documentation mentions that you're able to do that, but I need to look into how.
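One generic way to get an Either out of an exception-throwing action is Control.Exception.try, sketched below with base only. SodaError, tryAs, and safeRead are hypothetical names, and the example catches IOException from readFile purely as a stand-in; with req, the exception type would instead be the HttpException it defines.

```haskell
import Control.Exception (Exception, IOException, try)

-- Hypothetical error type for the bindings.
data SodaError
  = ConnectionError String
  deriving (Show, Eq)

-- Run an action, turning exceptions of type e into a SodaError.
tryAs :: Exception e => (e -> SodaError) -> IO a -> IO (Either SodaError a)
tryAs toErr action = either (Left . toErr) Right <$> try action

-- Stand-in example specialised to IOException.
safeRead :: FilePath -> IO (Either SodaError String)
safeRead path =
  tryAs (\e -> ConnectionError (show (e :: IOException))) (readFile path)
```

This keeps the conversion in one place, so the public query functions could return Either SodaError without each one handling exceptions separately.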
Right now, the library is probably exporting a lot more things than it needs to so that should be tightened up to only the things that need to be exported. I also know that a lot of other libraries have a particular file dedicated to specifying what is publicly exported, and other exporting schemes, so possibly consider something like that.