Comments (17)

Kabouik avatar Kabouik commented on September 5, 2024

Awesome! Now the json file indeed is much easier to read, and I can simplify the bash script:

print() {
    if [ -z "$1" ]; then
        cod="$(fzf < "$WE_DIR"/setup/codes.txt)"
        jello < "$WE_DIR"/data/"$cod".json
    else
        cod=$(echo "$1" | tr '[:lower:]' '[:upper:]')
        jello < "$WE_DIR"/data/"$cod".json
    fi
}

table() {
    print "$1" | jtbl -n | less -S
}

This is getting exciting! I need to focus on better implementing fzf (handle multi selections, offer only available data and not the full list, possibly prompt for scraping if not already done) and adding human-readable categories, but on the Python side I think my main issue will be when repeating requests multiple times: this appends the new result to the existing json file, but won't check for duplicates. I'll check your repositories, I see that there are a lot of json manipulation tools and maybe there's already something to deal with this issue.
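
One possible way to handle the duplicate problem is to read the existing file, merge in only the unseen records, and rewrite the file instead of appending to it. This is a minimal sketch, not part of jtbl or jello. It assumes the file holds a single JSON array of objects and that the `details` URL uniquely identifies an event; `merge_events` is a hypothetical helper name.

```python
import json
from pathlib import Path


def merge_events(path, new_events, key="details"):
    """Merge new_events into the JSON array stored at path, skipping duplicates.

    Assumption: each event is a dict and event[key] uniquely identifies it.
    Returns the number of events actually added.
    """
    path = Path(path)
    # Start from the records already on disk, or an empty list on first run.
    existing = json.loads(path.read_text()) if path.exists() else []
    seen = {event[key] for event in existing}
    added = 0
    for event in new_events:
        if event[key] not in seen:
            existing.append(event)
            seen.add(event[key])
            added += 1
    # Rewrite the whole file so it stays one valid JSON array.
    path.write_text(json.dumps(existing, indent=2))
    return added
```

Calling `merge_events(wedir + "/data/AAT.json", result)` twice with the same `result` would add the records only once.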

from jtbl.

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

I opened a feature request to investigate how to do this. (#7)

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

Hi there - thank you for your question!

I'd have to see if there is a way to "detect" this type of format and output the table the way you describe. Alternatively, I could add a program flag that supports this structure. In the meantime, it's not too difficult to transform this structure into something that jtbl likes. You can use a tool like jq or jello. Here is how you could do it in jello using Python syntax:

$ cat data.json | jello '\
result = []
for date, location in zip(_.Date, _.Location):
    result.append({"date": date, "location": location})
result' | jtbl
date                 location
-------------------  -----------------------------------------
2021-08-24 21:00:00  Indonesia - TERNATE/BABULLAH
2021-08-24 21:00:00  Bolivia - TRINIDAD
2021-08-24 21:00:00  United States - PENSACOLA, FL
2021-08-24 21:00:00  Russian Federation - IVDEL'

Kabouik avatar Kabouik commented on September 5, 2024

Thanks for the quick answer @kellyjonbrazil! I am no expert at all so I am not even sure that this json format is optimal, but I was told that at least it is correct. In fact, I would be happy to change it if someone more educated about json thinks it can be improved for the type of data I'm dealing with.

I'll monitor this issue closely in case you can implement a flag for this type of data (if I don't change my json structure first), but until then the trick you shared will do just fine! It is not straightforward to integrate it in the wrapper bash script I am working on to scrape and manage the data, though. I would like to avoid having to use an extra script file just for showing tables, so if I could do that just in bash, it would be easier for me.

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

There is no one right way to do it, but if you want the data to work with jtbl natively, it should look like this:

[
  {
    "date": "2021-08-24 21:00:00",
    "location": "Indonesia - TERNATE/BABULLAH"
  },
  {
    "date": "2021-08-24 21:00:00",
    "location": "Bolivia - TRINIDAD"
  },
  {
    "date": "2021-08-24 21:00:00",
    "location": "United States - PENSACOLA, FL"
  },
...
]

The jello JSON processing above is converting the original format into this format. This format is more descriptive than your original format because it pairs the associated data together, so no one needs to guess that the date and locations are related.

You can drop the jello query directly into your current pipeline. There is no need for another file. I was just using cat data.json as a placeholder for your existing pipeline. Simply insert the jello part between your pipeline and jtbl with another pipe:

$ <existing bash commands> | jello '\
result = []
for date, location in zip(_.Date, _.Location):
    result.append({"date": date, "location": location})
result' | jtbl

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

I was taking a look at your Bash script and it looks like you would like to add the table functionality here:

print() {
    cod=$(echo "$1" | tr '[:lower:]' '[:upper:]') 
    jello < "$WE_DIR"/data/"$cod".json
}

table() {
        cod=$(echo "$1" | tr '[:lower:]' '[:upper:]') 
        #cat "$WE_DIR"/data/"$cod".json | jello '\
        #result = []
        #for d, t, x in zip(_.Date, _.Title, _.Details):
        #    result.append({"Date": d, "Title: t, "Details": x})
        #result' | jtbl -n
        printf "Table output not implemented yet. :("
}

If your print function works ok, then it seems you could do something like this:

table() {
        cod=$(echo "$1" | tr '[:lower:]' '[:upper:]') 
        jello '\
result = []
for d, t, x in zip(_.Date, _.Title, _.Details):
    result.append({"Date": d, "Title: t, "Details": x})
result' < "$WE_DIR"/data/"$cod".json | jtbl -n
}

Note, you cannot indent the jello lines because the formatting is important to the Python interpreter.

Kabouik avatar Kabouik commented on September 5, 2024

I don't intend to turn this issue into a support thread for my own script (though I wouldn't mind the help!) but I'm getting an error with the table() function you posted. This is something I tried myself after your first post (I remember I tried without the indentation too at the time), but it failed.

With the above table(), I get this:

$ we -t aat
jello:  Query Exception:  SyntaxError
        invalid syntax (<unknown>, line 4)
            result.append({"Date": d, "Title: t, "Details": x})
        query: \\nresult = []\nfor d, t, x in zip ...  "Title: t, "Details": x})\nresult
        data: {'Date': ['2021-08-25 07:54:16'],  ... .org/eventList/details/111229/0']}

But the more I think about it, the more I think the format that jtbl expects natively would indeed be more appropriate for me. Therefore, the best way might actually be to do the jello conversion directly in my .py script so that it saves the json file in the proper format. Everything that has to do with fiddling with the Python code concerns me, though!

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

No worries! Looks like we are just missing a quotation mark for Title. This should work:

table() {
        cod=$(echo "$1" | tr '[:lower:]' '[:upper:]') 
        jello '\
result = []
for d, t, x in zip(_.Date, _.Title, _.Details):
    result.append({"Date": d, "Title": t, "Details": x})
result' < "$WE_DIR"/data/"$cod".json | jtbl -n
}

Nice script!

Kabouik avatar Kabouik commented on September 5, 2024

Nice, now that works! I integrated it in devel, but I'm still wondering if it wouldn't be better to just alter the Python script and directly save data in a more appropriate json structure.

Thanks for the kind words. This is merely an experiment, I don't even have a real use for it, but maybe one day I will, if it becomes feature complete and I can ascertain the reliability of the website I scrape. I'm currently trying to add some fzf magic into devel to better handle codes and suggest them when none is provided, but I don't see myself progressing much in the actual todo list in the near future since most of it probably depends on the Python part.

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

I started in Bash and then started learning Python a few years ago with similar projects like yours. To me, understanding both is very liberating - Python is so powerful, but not too hard to learn. Keep it up!

You could actually just drop this part into your python script:

result = []
for d, t, x in zip(myvar["Date"], myvar["Title"], myvar["Details"]):
    result.append({"Date": d, "Title": t, "Details": x})

Just change myvar to whatever variable name corresponds to the JSON data (more correctly, the dictionary) in the script. Then just use result instead of the original variable.

Notice I just changed the code to use ["Date"] instead of .Date. This is because jello does some fancy stuff behind the scenes to allow dot notation, but this is not a native Python feature for accessing dictionary attributes.
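
The same pairing can also be written without naming each key. This is only a sketch: `columns_to_records` is a hypothetical helper, and the sample values in `myvar` are taken from the table earlier in this thread. Note that `zip` stops at the shortest list, so lists of unequal length would be silently truncated.

```python
def columns_to_records(columns):
    """Turn {"Date": [...], "Location": [...]} into a list of one dict per row."""
    keys = list(columns)
    # zip(*values) walks the lists in parallel, yielding one tuple per row.
    return [dict(zip(keys, row)) for row in zip(*columns.values())]


# Sample data standing in for the dictionary built by the scraper.
myvar = {
    "Date": ["2021-08-24 21:00:00", "2021-08-24 21:00:00"],
    "Location": ["Indonesia - TERNATE/BABULLAH", "Bolivia - TRINIDAD"],
}

result = columns_to_records(myvar)
```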

Kabouik avatar Kabouik commented on September 5, 2024

I am getting an error with that but I suppose this is just me failing to properly merge your suggestion into the existing Python code:

$ WE_DIR=(pwd) scripts/./AAT.py
Traceback (most recent call last):
  File "/home/mathieu/Projects/worldevents/scripts/./AAT.py", line 24, in <module>
    for d, l, t, x in zip(json_data["Date"], json_data["Location"], json_data["Title"], json_data["Details"]):
TypeError: string indices must be integers

json_data = json.dumps({"Date": AAT, "Location": locs, "Title": titles, "Details": details})
    result = []
    for d, l, t, x in zip(json_data["Date"], json_data["Location"], json_data["Title"], json_data["Details"]):
        result.append({"Date": d, "Location": l, "Title": t, "Details": x})
        print(result, file=open(wedir+'/data/AAT.json', 'a'))
        print(f'Appended {len(AAT)} event(s) to {wedir}/data/AAT.json. \033[32;1m✔\033[0m')

As I'm not sure about the difference between the json variable and a json dictionary, I also tried replacing json_data with AAT, locs, titles and details, respectively, in the for d, l, t, x line, but no dice. I need to read about what those commands do and expect.

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

Ah yes, it is a bit confusing between JSON and a Python Dictionary at first. Basically a Python Dictionary is the data structure and you can load JSON directly into a Dictionary or dump a Dictionary to a JSON string.

In this case it is not working because json_data is a string, not a Dictionary. This is because of the json.dumps function.

To make this work you would do this instead:

json_data = {"Date": AAT, "Location": locs, "Title": titles, "Details": details}

result = []
for d, l, t, x in zip(json_data["Date"], json_data["Location"], json_data["Title"], json_data["Details"]):
    result.append({"Date": d, "Location": l, "Title": t, "Details": x})

result = json.dumps(result)
print(result, file=open(wedir+'/data/AAT.json', 'a'))
print(f'Appended {len(AAT)} event(s) to {wedir}/data/AAT.json. \033[32;1m✔\033[0m')

This way json_data is just a Dictionary and then we finally convert result to a JSON string so it can be printed.
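
The difference between the two types can be checked in isolation. A small sketch with made-up sample values (the real script uses the AAT, locs, titles and details lists instead):

```python
import json

# A dictionary is the Python data structure (sample values, not real data).
data = {"Date": ["2021-08-25 07:54:16"], "Title": ["Example title"]}

# json.dumps serializes the dictionary into a JSON-formatted string.
text = json.dumps(data)

print(type(data).__name__)  # the dictionary type
print(type(text).__name__)  # the string type
print(data["Date"])         # key lookup works on the dictionary

error = None
try:
    text["Date"]            # a string cannot be indexed with a string key
except TypeError as exc:
    error = exc
    print("TypeError:", exc)

# json.loads is the reverse operation: JSON string back to a dictionary.
restored = json.loads(text)
```

Indexing the string raises the same kind of TypeError as in the traceback above, while `restored` compares equal to `data`.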

I didn't rename the variables just to keep things consistent, but in this case it might make sense to rename them because json_data is no longer JSON, it is a Dictionary. So something simple like data might be better. Then you could change the final result name to something like json_data, since it is a JSON string.

Note, even this is probably not the most efficient way to do this since we are creating a dictionary and reformatting it, but it gets the job done. Without getting too deep into it, you could probably just generate the correct data structure in the first place with something like:

result = []
for d, l, t, x in zip(AAT, locs, titles, details):
    result.append({"Date": d, "Location": l, "Title": t, "Details": x})

This way you can get rid of json_data completely. Sorry, sometimes it takes a few iterations for me to see how to make it more efficient! :)

Kabouik avatar Kabouik commented on September 5, 2024

I now have something almost functional but noticed something weird with jtbl -t:

WTR.json
--------
  latitude    longitude  time          title         location      details       source        category      description
----------  -----------  ------------  ------------  ------------  ------------  ------------  ------------  -------------
 -35.015      -55.3376   2021-08-30 1  Uruguay - EV  Uruguay, Sou  https://rsoe  http://www.m  Traffic inci  Container sh
  -4.46859    -74.124    2021-08-30 0  Peru - Over   Peru, South   https://rsoe  https://www.  Traffic inci  More than 20
  38.8853       1.43524  2021-08-29 1  Spain - Fift  Spain, Europ  https://rsoe  https://www.  Traffic inci  Fifteen peop

AAT.json
--------
  latitude    longitude  time          title         location      details       source        category      description
----------  -----------  ------------  ------------  ------------  ------------  ------------  ------------  -------------
   30.2762     -89.7816  2021-08-31 1  United State  United State  https://rsoe  https://eu.u  Biological o  Hurricane Id

Notice how the columns are not ordered in the same way as in the raw json files:

WTR.json
--------
{
  "time": "2021-08-30 14:39:50",
  "title": "Uruguay - EVER container ship accident in Rio de la Plata, Uruguay",
  "location": "Uruguay, South America",
  "details": "https://rsoe-edis.org/eventList/details/113189/0",
  "source": "http://www.maritimebulletin.net/2021/08/30/ever-container-ship-accident-in-rio-de-la-plata-uruguay/",
  "category": "Traffic incident - Water accident",
  "latitude": "-35.015046",
  "longitude": "-55.337593",
  "description": "Blah blah."
}
{
  "time": "2021-08-30 09:22:42",
  "title": "Peru - Over 20 dead, dozens missing after vessel collision in Peru",
  "location": "Peru, South America",
  "details": "https://rsoe-edis.org/eventList/details/113112/0",
  "source": "https://www.bignewsnetwork.com/news/270936218/over-20-dead-dozens-missing-after-vessel-collision-in-peru?utm_source=feeds.bignewsnetwork.com&utm_medium=referral",
  "category": "Traffic incident - Water accident",
  "latitude": "-4.468586",
  "longitude": "-74.12399",
  "description": "Blah blah."
}
{
  "time": "2021-08-29 10:29:21",
  "title": "Spain - Fifteen injured in Ibiza ferry accident",
  "location": "Spain, Europe",
  "details": "https://rsoe-edis.org/eventList/details/112817/0",
  "source": "https://www.majorcadailybulletin.com/news/local/2021/08/29/88771/ibiza-ferry-accident-leaves-fifteen-injured.html",
  "category": "Traffic incident - Water accident",
  "latitude": "38.88534",
  "longitude": "1.435239",
  "description": "Blah blah."
}

AAT.json
--------
{
  "time": "2021-08-31 10:29:55",
  "title": "United States - Man attacked by alligator in flooded Louisiana waters after Hurricane Ida",
  "location": "United States, North America",
  "details": "https://rsoe-edis.org/eventList/details/113530/0",
  "source": "https://eu.usatoday.com/story/news/nation/2021/08/30/hurricane-ida-man-attacked-alligator-flooded-louisiana-waters/5660363001/",
  "category": "Biological origin - Animal attack",
  "latitude": "30.27621",
  "longitude": "-89.78162",
  "description": "Blah blah."
}

Any idea what may cause this? The tables are generated this way (${typ} is an array of files selected in fzf):

            for i in "${typ[@]}"
                do printf '\n%s\n--------\n' "$i"
                jtbl -t < "$i"
            done

I also noticed that using cat {} | jtbl -t as the fzf preview command does not give a consistent column order either, even though the raw json files are all structured the same way. For some files, the columns follow the order of the keys in the json file; for others, they are mixed like above. Printing those same files with the loop above yields the mixed columns in all cases, so there must be some difference between cat {} | jtbl -t and jtbl -t < "$i" (where i is looped through the array), although both can order columns differently than the json file.

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

Yes, it is true that jtbl doesn't attempt to keep the ordering of columns while it tries to truncate or resize them to fit. I believe (but am not sure) that the column order is preserved when using the -n option, which skips the column resizing logic.

I'd have to dig in a little further to see if there is a way to preserve ordering, but it would take me a while to understand the resizing code as I did that a while back and it was a bit complex. :)

Also, note that field ordering has no intrinsic importance in JSON. An API may change the order of fields at any time and even fields between records may not be in the same order, so it is hard to define what the behavior should be. I suppose jtbl could see the ordering of the first object and then keep everything the same as that.
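
The "keep the order of the first object" idea could look roughly like the sketch below. It is an illustration only, not jtbl's code: `column_order` is a hypothetical helper, the sample records are shortened, and it relies on Python dictionaries preserving insertion order (guaranteed since Python 3.7). Keys that first appear in later records are appended at the end.

```python
def column_order(records):
    """Return every key in first-seen order, starting with the first record."""
    order = {}
    for record in records:
        for key in record:
            # setdefault keeps the position of a key the first time it is seen.
            order.setdefault(key, None)
    return list(order)


# Shortened sample records; the second one has its fields in a different order.
records = [
    {"time": "2021-08-30 14:39:50", "title": "A", "latitude": "-35.015046"},
    {"latitude": "-4.468586", "time": "2021-08-30 09:22:42", "category": "B"},
]

columns = column_order(records)
# Build rows in that fixed column order, filling missing fields with "".
rows = [[record.get(column, "") for column in columns] for record in records]
```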

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

Looking at the data, it looks like the general order of columns is shortest to longest.

That is, jtbl checks all of the rows (including the header), finds the largest cell in each column, truncates or resizes the columns, and prints them in order from the column whose largest cell is shortest to the column whose largest cell is longest.

Hope that makes sense! :)
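
As a toy illustration of the ordering described in this comment (not jtbl's actual resizing code, which also truncates cells), columns can be sorted by the width of their widest cell, header included. `order_by_widest_cell` is a hypothetical name and the record is a shortened sample.

```python
def order_by_widest_cell(records):
    """Sort column names by the length of their widest cell (header included)."""
    columns = list(records[0])

    def widest(column):
        cell_widths = (len(str(record.get(column, ""))) for record in records)
        return max(len(column), *cell_widths)

    # sorted() is stable, so columns of equal width keep their original order.
    return sorted(columns, key=widest)


records = [
    {
        "time": "2021-08-30 14:39:50",
        "latitude": "-35.015",
        "category": "Traffic incident - Water accident",
    }
]

ordered = order_by_widest_cell(records)
```

Here `ordered` places latitude before time and category, even though time comes first in the record.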

Kabouik avatar Kabouik commented on September 5, 2024

kellyjonbrazil avatar kellyjonbrazil commented on September 5, 2024

Looks like I have the column reordering issue fixed in the dev branch. Continuing to test, but I should be able to release this fix in the next version shortly.

% echo '{
  "time": "2021-08-30 14:39:50",
  "title": "Uruguay - EVER container ship accident in Rio de la Plata, Uruguay",
  "location": "Uruguay, South America",
  "details": "https://rsoe-edis.org/eventList/details/113189/0",
  "source": "http://www.maritimebulletin.net/2021/08/30/ever-container-ship-accident-in-rio-de-la-plata-uruguay/",
  "category": "Traffic incident - Water accident",
  "latitude": "-35.015046",
  "longitude": "-55.337593",
  "description": "Blah blah."
}
{
  "time": "2021-08-30 09:22:42",
  "title": "Peru - Over 20 dead, dozens missing after vessel collision in Peru",
  "location": "Peru, South America",
  "details": "https://rsoe-edis.org/eventList/details/113112/0",
  "source": "https://www.bignewsnetwork.com/news/270936218/over-20-dead-dozens-missing-after-vessel-collision-in-peru?utm_source=feeds.bignewsnetwork.com&utm_medium=referral",
  "category": "Traffic incident - Water accident",
  "latitude": "-4.468586",
  "longitude": "-74.12399",
  "description": "Blah blah."
}
{
  "time": "2021-08-29 10:29:21",
  "title": "Spain - Fifteen injured in Ibiza ferry accident",
  "location": "Spain, Europe",
  "details": "https://rsoe-edis.org/eventList/details/112817/0",
  "source": "https://www.majorcadailybulletin.com/news/local/2021/08/29/88771/ibiza-ferry-accident-leaves-fifteen-injured.html",
  "category": "Traffic incident - Water accident",
  "latitude": "38.88534",
  "longitude": "1.435239",
  "description": "Blah blah."}' | jq -c | jtbl
╒══════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤══════════════════╤════════════╤═════════════╤═══════════════╕
│ time             │ title            │ location         │ details          │ source           │ category         │   latitude │   longitude │ description   │
╞══════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪══════════════════╪════════════╪═════════════╪═══════════════╡
│ 2021-08-30 14:39 │ Uruguay - EVER c │ Uruguay, South A │ https://rsoe-edi │ http://www.marit │ Traffic incident │  -35.015   │   -55.3376  │ Blah blah.    │
│ :50              │ ontainer ship ac │ merica           │ s.org/eventList/ │ imebulletin.net/ │  - Water acciden │            │             │               │
│                  │ cident in Rio de │                  │ details/113189/0 │ 2021/08/30/ever- │ t                │            │             │               │
│                  │  la Plata, Urugu │                  │                  │ container-ship-a │                  │            │             │               │
│                  │ ay               │                  │                  │ ccident-in-rio-d │                  │            │             │               │
│                  │                  │                  │                  │ e-la-plata-urugu │                  │            │             │               │
│                  │                  │                  │                  │ ay/              │                  │            │             │               │
├──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────┼─────────────┼───────────────┤
│ 2021-08-30 09:22 │ Peru - Over 20 d │ Peru, South Amer │ https://rsoe-edi │ https://www.bign │ Traffic incident │   -4.46859 │   -74.124   │ Blah blah.    │
│ :42              │ ead, dozens miss │ ica              │ s.org/eventList/ │ ewsnetwork.com/n │  - Water acciden │            │             │               │
│                  │ ing after vessel │                  │ details/113112/0 │ ews/270936218/ov │ t                │            │             │               │
│                  │  collision in Pe │                  │                  │ er-20-dead-dozen │                  │            │             │               │
│                  │ ru               │                  │                  │ s-missing-after- │                  │            │             │               │
│                  │                  │                  │                  │ vessel-collision │                  │            │             │               │
│                  │                  │                  │                  │ -in-peru?utm_sou │                  │            │             │               │
│                  │                  │                  │                  │ rce=feeds.bignew │                  │            │             │               │
│                  │                  │                  │                  │ snetwork.com&utm │                  │            │             │               │
│                  │                  │                  │                  │ _medium=referral │                  │            │             │               │
├──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼──────────────────┼────────────┼─────────────┼───────────────┤
│ 2021-08-29 10:29 │ Spain - Fifteen  │ Spain, Europe    │ https://rsoe-edi │ https://www.majo │ Traffic incident │   38.8853  │     1.43524 │ Blah blah.    │
│ :21              │ injured in Ibiza │                  │ s.org/eventList/ │ rcadailybulletin │  - Water acciden │            │             │               │
│                  │  ferry accident  │                  │ details/112817/0 │ .com/news/local/ │ t                │            │             │               │
│                  │                  │                  │                  │ 2021/08/29/88771 │                  │            │             │               │
│                  │                  │                  │                  │ /ibiza-ferry-acc │                  │            │             │               │
│                  │                  │                  │                  │ ident-leaves-fif │                  │            │             │               │
│                  │                  │                  │                  │ teen-injured.htm │                  │            │             │               │
│                  │                  │                  │                  │ l                │                  │            │             │               │
╘══════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧══════════════════╧════════════╧═════════════╧═══════════════╛
