
yaronkoren / miga


A JavaScript application (with some PHP) for viewing and browsing arbitrary structured data

License: GNU General Public License v3.0

JavaScript 70.22% PHP 26.63% CSS 3.15%

miga's People

Contributors

jqnatividad, yaronkoren


miga's Issues

SemanticMediaWikiImporter.php can use bigger chunks

Currently this importer uses limit=100. A limit of 500 works very well and cuts the number of queries to one-fifth, making imports much faster for large data sets.

-$askURL .= "&p%5Bformat%5D=csv&p%5Bheaders%5D=hide&p%5Blimit%5D=100";
+$askURL .= "&p%5Bformat%5D=csv&p%5Bheaders%5D=hide&p%5Blimit%5D=500";

Loading interrupted by request for more storage

I'm not sure if this is something that can be coded around, but on the iPhone, if Safari prompts the user that the site they are visiting needs more storage, the loading process freezes and requires a refresh of the page.

For Wikinosh, if you delete the local storage data and reload, the prompt triggers at 5 MB, freezing the loading. A refresh starts it again, at which point the prompt triggers at 10 MB, stopping the load again. A final refresh then loads the complete data set.

Number of search results per page should be dynamic

I am not sure if this is possible now, but there should be a way to make the number of search results per page dynamic. If there are 651 results, we shouldn't show 500 on the first page and only 151 on the second; we should spread them out a little.

651 is a relatively small result set, and 500 per page is too many for it. I would have preferred a display of at most 100 per page for a result set of this size.
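
A minimal sketch of the idea in plain JavaScript (the function name and the cap are hypothetical, not existing Miga code):

function pageSizeFor(totalResults, maxPerPage) {
  // spread results evenly across pages instead of filling each page to the cap
  if (totalResults <= maxPerPage) {
    return totalResults;
  }
  var numPages = Math.ceil(totalResults / maxPerPage);
  return Math.ceil(totalResults / numPages);
}

pageSizeFor(651, 500); // => 326: two pages of 326 and 325, not 500 and 151
pageSizeFor(651, 100); // => 93: seven even pages of 93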

Search autocompletion

The search interface would be so much better if it had autocompletion, or at least showed a real-time list of matches in some way.
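
A rough sketch of the idea, assuming the item names are already loaded in memory (allNames and showSuggestions are hypothetical stand-ins, not actual Miga functions):

var searchBox = document.getElementById('searchInput'); // element id assumed
searchBox.addEventListener('input', function () {
  var query = this.value.toLowerCase();
  if (query.length < 2) { return; }
  // naive substring match over the in-memory list of item names
  var matches = allNames.filter(function (name) {
    return name.toLowerCase().indexOf(query) !== -1;
  }).slice(0, 10);
  showSuggestions(matches); // render the real-time match list
});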

Get unformatted values in SemanticMediaWikiImporter.php

Currently, when importing numeric data, the CSV file gets numbers with comma separators. See the last column in this line:

"40% Bran Flakes Cereal, Kellogg's",93,5,0.54,0,0,220,22.15,3.976,5.112,3.58,,"1,599"

I tried simply adding "#-" to the Ask query like so:

-       $askURL .= urlencode( '?' . $propertyName . "\n" );
+       $askURL .= urlencode( '?' . $propertyName . "#-\n" );

However, the commas persist.
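
If the Ask output can't be fixed on the wiki side, one fallback is to strip the thousands separators on the client before treating a field as numeric. A sketch, not existing Miga code:

function parseNumericField(value) {
  // "1,599" -> 1599; anything non-numeric falls through as null
  var cleaned = String(value).replace(/,/g, '');
  return cleaned !== '' && !isNaN(cleaned) ? Number(cleaned) : null;
}

parseNumericField('1,599'); // => 1599
parseNumericField('3.976'); // => 3.976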

Revisit IndexedDB?

It now appears that IndexedDB is supported across all major browsers (http://caniuse.com/#search=indexeddb).

For the most part, one installation we had was very happy with MigaDV. Except, as you well know, most enterprise environments are still on Windows with Internet Explorer, which has no WebSQL support. We created workarounds (detecting the browser and redirecting to a static page; creating a user-friendly way to install Chrome with an extended installer; etc.), but it was the main thing they had an issue with.

I realize it will be non-trivial, though, as WebSQL is relational and IndexedDB is a key-value store, so all the dynamic SQL generation for faceted searching will have to be re-implemented.
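
For a sense of the shape of that rewrite, here is what one faceted lookup might become (the database, store, and index names are invented for this sketch):

var open = indexedDB.open('MigaDB', 1);
open.onupgradeneeded = function (e) {
  var db = e.target.result;
  // one object store for the data set, with an index per facet
  var store = db.createObjectStore('entities', { keyPath: 'id' });
  store.createIndex('byCategory', 'category', { unique: false });
};
open.onsuccess = function (e) {
  var db = e.target.result;
  var index = db.transaction('entities')
    .objectStore('entities').index('byCategory');
  // the IndexedDB equivalent of: SELECT * FROM entities WHERE category = ?
  index.getAll('Dessert').onsuccess = function (ev) {
    console.log(ev.target.result.length + ' matches');
  };
};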

Consider bypassing DataFileReader.php to allow much faster loading of large files

So when I was working on #1 and #4, I started to get a CSV file that is fairly large: 22,000 rows at 1.2 MB of data. This actually wouldn't be much of a problem at all if nginx gzipped it in transit.

% curl -I http://wikinosh.com/miga/apps/wikinosh/Food.csv
HTTP/1.1 200 OK
Server: nginx/1.4.1
Date: Fri, 26 Jul 2013 03:51:44 GMT
Content-Type: application/octet-stream
Content-Length: 1219451
Last-Modified: Fri, 26 Jul 2013 03:46:21 GMT
Connection: keep-alive
ETag: "51f1f10d-129b7b"
Accept-Ranges: bytes

That's big, and probably a problem. However, if I pull it directly with gzip via nginx, it is only 99 KB! Note the Content-Length header:

% curl -I -H 'Accept-Encoding: gzip,deflate' http://wikinosh.com/miga/apps/wikinosh/Food.csv
HTTP/1.1 200 OK
Server: nginx/1.4.1
Date: Fri, 26 Jul 2013 03:54:01 GMT
Content-Type: application/octet-stream
Content-Length: 99558
Last-Modified: Fri, 26 Jul 2013 03:53:03 GMT
Connection: keep-alive
Vary: Accept-Encoding
ETag: "51f1f29f-184e6"
Content-Encoding: gzip

If you configure your web server to serve the gzipped file like this, it is super slick.

However, having DataFileReader.php and PHP in the middle kills this approach.

Suggestion: allow the front end to pull the CSV directly. jQuery will then use compression when it's available, and very large data sets will no longer present a challenge in transmission.
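
Sketched out, the front end would request the static file itself and hand the text to the existing parsing code (parseCSV is a stand-in for whatever Miga's real entry point is):

jQuery.ajax({
  url: 'apps/wikinosh/Food.csv', // served directly by nginx, gzipped in transit
  dataType: 'text'
}).done(function (csvText) {
  parseCSV(csvText); // stand-in for Miga's existing CSV-parsing code
});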

Duplication of data in local storage?

I'm not sure how to debug this, but after reloading the data set for Wikinosh many times, the local storage in use was over 34 MB. When I delete the data and start fresh, it is only around 18 MB. It seems that the more times I completely reload the data, the larger it gets. Perhaps some records are not being purged on a reload?
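
If stale rows are the cause, one defensive fix is to drop and recreate each table at the start of a reload instead of relying on row-by-row cleanup. A sketch only; the database and table names are invented:

var db = openDatabase('MigaDB', '1.0', 'Miga data', 34 * 1024 * 1024);
db.transaction(function (tx) {
  // start from a clean slate so old rows can't accumulate across reloads
  tx.executeSql('DROP TABLE IF EXISTS entities');
  tx.executeSql('CREATE TABLE entities (id, name, category)');
});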

Pure JavaScript

This is awesome, and just what I was looking for, but the need for a PHP server broke my legs =(, so I can't use it. Would it be possible to have a pure JavaScript/HTML version, with the data stored anywhere: Dropbox, GitHub Pages, locally, etc.?

SemanticMediaWikiImporter.php not ending when all data is exported

Now that issue #1 is closed, I reran the importer to generate data. It should have stopped at 21,516 rows (http://wikinosh.com/wiki/Category:Food), but the file had reached 22,900 rows by the time I Ctrl-C'd the task.

http://wikinosh.com/miga/apps/wikinosh/Food.csv

Looking at the contents of that file, it seems not to be entirely sequential. In fact, doing a

grep "Agar Seaweed" Food.csv

on that CSV shows the same data over and over. Ugh. I'm not sure what is causing this; it's likely some SMW issue. The net result is that the importer never stops and runs forever.

If you want to test it yourself, these are my import settings:

<?php
$gImportFileName = "Food.csv";
$gImportSpecialAskURL = "http://wikinosh.com/wiki/Special:Ask";
$gImportCategoryName = "Category:Food";
$gImportFields = array(
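        // each entry maps a CSV column header to an SMW property name;
        // '_name' presumably designates the page name rather than a property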
        'Name' => '_name',
        'Calories' => 'Has calories',
        'Fat' => 'Has fat',
        'Carbohydrates' => 'Has carbohydrates',
        'Protein' => 'Has protein'
);

Make MigaDV more SEO-friendly

MigaDV aims to make data publishing easier. And since people find content through search engines, perhaps MigaDV should also make finding the published data easier.

However, JavaScript sites are not normally indexable by search engines. Existing workarounds include creating sitemaps and HTML snapshots.

Maybe, during "compilation", a simpler, SEO-friendly static version of the site could be generated. Each simple page could then redirect to the "real" page, and robots.txt could even be told to point crawlers at the static site.
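
As a very rough sketch of that "compilation" step (Node-style JavaScript; the paths and URL scheme are invented, and the naive comma split would need a real CSV parser):

var fs = require('fs');
if (!fs.existsSync('static')) { fs.mkdirSync('static'); }
var lines = fs.readFileSync('apps/wikinosh/Food.csv', 'utf8').split('\n');
lines.slice(1).forEach(function (line, i) { // skip the header row
  if (!line.trim()) { return; }
  var name = line.split(',')[0].replace(/"/g, ''); // naive field extraction
  fs.writeFileSync('static/item-' + i + '.html',
    '<html><head><title>' + name + '</title>' +
    '<link rel="canonical" href="/#item-' + i + '"></head>' +
    '<body><h1>' + name + '</h1></body></html>');
});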

Be more careful when building a new CSV

While running the importer, Miga should take more care with the existing CSV. Ideally:

  1. Leave the existing CSV alone.
  2. Create the new file under a temporary filename.
  3. If the build fails, leave the temp file for debugging and report an error.
  4. If it succeeds, move the original CSV to an archive name and swap in the new CSV.

Currently, while building a large data set, users could get partial results. Also, a failed rebuild destroys your currently working data.
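
The pattern, sketched in Node-style JavaScript for brevity (the importer itself is PHP, so this only illustrates the build-then-swap idea):

var fs = require('fs');

function rebuildCsv(finalPath, writeData) {
  var tmpPath = finalPath + '.tmp';
  try {
    writeData(tmpPath); // build the new file off to the side
  } catch (err) {
    // leave the temp file in place for debugging and report the failure
    console.error('Rebuild failed; temp file kept at ' + tmpPath, err);
    return;
  }
  if (fs.existsSync(finalPath)) {
    fs.renameSync(finalPath, finalPath + '.bak'); // archive the old CSV
  }
  fs.renameSync(tmpPath, finalPath); // swap in the new one
}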
