Right now, in order to add a dataset, one needs to: Add the da


Proposal: Dynamically generate dataset functions at compile time (explorer issue, closed)

elixir-explorer commented on August 16, 2024

Proposal: Dynamically generate dataset functions at compile time

from explorer.

Comments (7)

josevalim commented on August 16, 2024

It is fine to repeat code when it is 2 LOC. If that still bothers you, you can do:

def fossil_fuels, do: read!("fossil_fuels.csv")

And the duplication is effectively gone, while it stays clear what the docs are, what the function name is, etc.

For example, what happens if we start removing the duplication and then we want to add specs or doc metadata? Or we want to add some post-processing to one of the datasets?


Benjamin-Philip commented on August 16, 2024

> For example, what happens if we start removing the duplication and then we want to add specs or doc metadata? Or we want to add some post-processing to one of the datasets?

My assumption is that all datasets are similar and need to be managed in the same manner. So, if we want to add specs, they would all have the same spec, which would be hardcoded in the generation. In the case of docs, they would be read from the docs.md file.

Won't a dataset be processed before being added to this repo?

However, I get your point about a special case for one dataset. In that case, I would remove the dataset from the list being iterated over and write its function by hand.
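To make the proposal concrete, here is a minimal sketch of generating one function per dataset at compile time. The module name `DatasetsSketch`, the `@names` list, and the `datasets` directory are assumptions for illustration, and each function returns a file path as a stand-in for actually loading the CSV; it is not the code from the explorer repo.

```elixir
defmodule DatasetsSketch do
  # Hypothetical directory and dataset names, used only for this sketch.
  @datasets_dir "datasets"
  @names [:fossil_fuels, :wine]

  # This comprehension runs while the module is compiled and
  # defines one public function per entry in @names.
  for name <- @names do
    @doc "Returns the path to the `#{name}` dataset."
    def unquote(name)() do
      # Stand-in for reading the CSV into a dataframe.
      Path.join(@datasets_dir, "#{unquote(name)}.csv")
    end
  end

  # A dataset needing special handling would be left out of @names
  # and defined by hand here instead.
end
```

Calling `DatasetsSketch.fossil_fuels()` then behaves like a hand-written function, but the definition itself no longer appears by name in the source.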

> And the duplication is effectively gone

Another solution is to write a private function that takes the dataset name as a parameter and loads the file, and then write each dataset function by hand.


josevalim commented on August 16, 2024

> In the case of docs, it would be read from the docs.md file.

Docs metadata are Elixir code, so we would need to start parsing the .md file and evaluating Elixir code. The point is exactly that if your duplication is 2 LOC that mostly declare docs and specs, without any real logic, then attempting to remove the duplication is most likely just adding indirection. For example, everyone expects docs to be in the @doc annotation, not in a .md file, and so on.
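As a small illustration of why doc metadata is Elixir code rather than markdown, the sketch below attaches a `since` entry to a function's documentation using a keyword list on `@doc`. The module name, the version string, and the function body are placeholders, not taken from explorer.

```elixir
defmodule DocMetadataSketch do
  @doc "Loads the fossil fuels dataset (stand-in body for this sketch)."
  # Metadata is given as a keyword list, i.e. ordinary Elixir terms.
  # The version below is a placeholder.
  @doc since: "0.1.0"
  def fossil_fuels, do: :fossil_fuels
end
```

A plain docs.md file has no natural place for the `since: "0.1.0"` part without inventing a format and evaluating it.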

EDIT: to further add to this, I don't think the plan is to have many datasets as part of this repo either, especially as scidata is meant to cover the larger ones.


Benjamin-Philip commented on August 16, 2024

> Docs metadata are Elixir code, so we would need to start parsing the .md file and evaluating Elixir code. The point is exactly that if your duplication is 2 LOC that mostly declare docs and specs, without any real logic, then attempting to remove the duplication is most likely just adding indirection. For example, everyone expects docs to be in the @doc annotation, not in a .md file, and so on.

I meant reading the docs.md file at compile time and assigning it to the @doc annotation. So:

@doc unquote(read!("datasets/#{name}/docs.md"))
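For reference, a self-contained sketch of that idea might look like the following. It writes a stand-in docs file to the temp directory so it can run on its own; the path, the file content, and the module name are all assumptions made for illustration.

```elixir
# Stand-in docs file so the sketch runs on its own (hypothetical path and text).
docs_path = Path.join(System.tmp_dir!(), "fossil_fuels_docs.md")
File.write!(docs_path, "Fossil fuel emissions by country.\n")

defmodule CompileTimeDocsSketch do
  # Tell the compiler this module depends on the docs file.
  @external_resource docs_path

  # File.read!/1 runs while the module body is compiled,
  # so the file's content becomes the function's @doc.
  @doc File.read!(docs_path)
  def fossil_fuels, do: :fossil_fuels
end
```

This works mechanically, but the documentation text is no longer visible next to the function definition.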

> EDIT: to further add to this, I don't think the plan is to have many datasets as part of this repo either, especially as scidata is meant to cover the larger ones.

What do you think of this:

def fossil_fuels, do: read_dataset("fossil_fuels")

defp read_dataset(name) do
  @datasets_dir
  |> Path.join(name <> ".csv")
  |> DataFrame.read_csv!()
end


josevalim commented on August 16, 2024

> I meant reading the docs.md file at compile time and assigning it to the @doc annotation. So:

I understood. What I mean is that someone reading the source code later on wouldn't expect the docs to be in a markdown file.

> What do you think of this:

Right, that's pretty much what I had in mind for read!. :)


Benjamin-Philip commented on August 16, 2024

> I understood. What I mean is that someone reading the source code later on wouldn't expect the docs to be in a markdown file.

Yes, that's an issue.

> Right, that's pretty much what I had in mind for read!. :)

Shall I submit a PR then?


josevalim commented on August 16, 2024

It is fine by me. I don't mind the current code but extracting it also works.

