johari / minicell

(wip) A rich visicalc dialect with new datatypes inside cells. Recalc or die. 🏴‍☠️

Haskell 50.44% Shell 0.54% Elm 33.68% CSS 1.94% HTML 1.64% Makefile 6.95% Python 4.81%
build-system elm haskell shake visicalc-dialect

minicell's Issues

Data Extraction (engine)

Populate and extract V (vertices) and E (edges) from semi-structured data.

=X(·)

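The most basic way to extract a graph from a table is to read it from its adjacency matrix: vertex labels sit in the header row and header column, and each integer cell in the interior is an edge weight. We implemented this particular feature in b5d538c. These are the lines that are responsible for it: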

EApp "X" [ECellRange (rhoL, kappaL) (rhoR, kappaR)] -> do
  let headerRow = [ (rhoL, kappa) | kappa <- [ (kappaL + 1) .. kappaR] ]
      headerColumn = [ (rho, kappaL) | rho <- [ (rhoL+1) .. rhoR ] ]
      matrix = [ (rho, kappa) | rho <- [rhoL+1 .. rhoR], kappa <- [kappaL+1 .. kappaR] ]
  verticesWithAddr <- sequence [ do val <- eval model (ECellRef addr); return (addr, val) | addr <- (headerRow ++ headerColumn) ]
  edgesWithAddr <- sequence [ do val <- eval model (ECellRef addr); return (addr, val) | addr <- matrix ]
  let newVertices = (catMaybes $ (maybeVertex <$> verticesWithAddr))
  let newEdges = catMaybes $
        [
          do
            ((rho, kappa), i) <- maybeEdge s
            v1 <- lookup (rho, kappaL) newVertices
            v2 <- lookup (rhoL, kappa) newVertices
            return $ (v1, v2, i)
        | s <- edgesWithAddr
        ]
  let (newNodes, nm) = mkNodes new (snd <$> newVertices)
  return $ EGraphFGL $ mkGraph newNodes (fromMaybe [] $ mkEdges nm newEdges)
  where
    maybeVertex (addr, (ESLit s)) = Just (addr, s)
    maybeVertex _ = Nothing
    maybeEdge (addr, (EILit i)) = Just (addr, i)
    maybeEdge _ = Nothing
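
To make the table layout easier to see, here is a minimal pure sketch of the same idea, without eval or fgl. It assumes the sheet is a plain Map from (row, column) to a simplified cell type; Cell, Addr and extractGraph are hypothetical names for illustration and are not part of Minicell. Unlike the handler above, it de-duplicates vertex names that appear in both headers.

```haskell
import Data.List (nub)
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)

-- Hypothetical stand-in for Minicell's cell values.
data Cell = S String | I Int | Blank
  deriving (Eq, Show)

type Addr = (Int, Int) -- (row, column)

-- | Read a labelled matrix whose top-left corner is (r0, c0) and whose
-- bottom-right corner is (r1, c1). Row r0 and column c0 hold vertex names;
-- every integer cell in the interior becomes a weighted edge from its
-- row label to its column label.
extractGraph :: Map.Map Addr Cell -> Addr -> Addr -> ([String], [(String, String, Int)])
extractGraph sheet (r0, c0) (r1, c1) = (vertices, edges)
  where
    -- A cell counts as a vertex label only if it holds a string.
    label addr = case Map.lookup addr sheet of
      Just (S s) -> Just s
      _          -> Nothing
    headerAddrs =
      [ (r0, c) | c <- [c0 + 1 .. c1] ] ++ [ (r, c0) | r <- [r0 + 1 .. r1] ]
    vertices = nub (mapMaybe label headerAddrs)
    -- Cells that are not integers, or whose headers are not strings,
    -- are skipped by the failing pattern matches.
    edges =
      [ (from, to, w)
      | r <- [r0 + 1 .. r1]
      , c <- [c0 + 1 .. c1]
      , Just (I w) <- [Map.lookup (r, c) sheet]
      , Just from  <- [label (r, c0)]
      , Just to    <- [label (r0, c)]
      ]
```

The result is a plain vertex list and edge list, which is roughly the shape that mkNodes and mkEdges consume in the real handler.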

Multiline raw string, Markup, Template engines and Transclusion

Features

  • Including other documents (e.g. header, footer)
  • Reading values from cells and converting it to static html
    • A EYouTube value could be rendered as an embedded iframe
    • An image cell could be rendered as an <img> tag
    • A video cell could be rendered via <video> tag
  • erb
  • mustache
  • markdown

Related work

Papers

Graph modeling inside spreadsheets

  • NodeXL

Extraction (data wrangling)

  • FlashFill (string transformations)
  • CheckCell (data debugging)
  • ExceLint (Formula errors)

Products

Microsoft Excel

I'm not saying that Excel is *the best* spreadsheet program out there. But it is relatively simple, quick, and powerful, and the combination of those three qualities, together with its prevalence, makes it the go-to choice for most people and industries. That combination will probably continue to drive its popularity, despite all its quirks[1].

[1] Wikipedia Entry: Microsoft Excel Quirks (http://en.wikipedia.org/wiki/Microsoft_excel#Quirks)

COMET (backend)

This is a meta-issue tracking everything we need from backend to support Full COMET.

Roadmap

  • (0.0.3)
    • Receiving pings from WebSocket server to fetch /minicell/all.json
  • (0.0.4)
    • Receiving payload through websocket, without a need to call /minicell/all.json
    • (Multi-user) A websocket server will broadcast updates to each client

Backend

  • Serving multiple clients via WebSocket
  • Needs to create a dependency tree (#39)
  • Incrementally re-evaluate values
    • The cheap way is to just re-compute everything.
    • The expensive way is to think harder about incremental computation, especially for expensive graph computations.
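
A middle ground between the two is to re-evaluate only the cells that transitively depend on the cell that was written. The sketch below assumes we already know, for each cell, which cells its formula reads from; Deps, dependents and affected are hypothetical names, and cell addresses are simplified to strings.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

type Addr = String -- e.g. "A1"; a simplification for this sketch

-- | For each cell, the cells its formula reads from.
type Deps = Map.Map Addr [Addr]

-- | Invert "reads from" into "is read by".
dependents :: Deps -> Map.Map Addr [Addr]
dependents deps =
  Map.fromListWith (++)
    [ (src, [cell]) | (cell, srcs) <- Map.toList deps, src <- srcs ]

-- | Every cell that needs re-evaluation after `changed` is written:
-- the transitive closure of its dependents. The `seen` set also keeps
-- the traversal from looping on cyclic references.
affected :: Deps -> Addr -> Set.Set Addr
affected deps changed = go Set.empty (direct changed)
  where
    rev = dependents deps
    direct a = Map.findWithDefault [] a rev
    go seen [] = seen
    go seen (a : rest)
      | a `Set.member` seen = go seen rest
      | otherwise           = go (Set.insert a seen) (direct a ++ rest)
```

This only answers which cells are affected. It does not give an evaluation order, and it says nothing about making the graph computations themselves incremental.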

A more challenging example: Seconds since epoch

One extreme example is =UNIXTIME(), which changes value every second.

  • Pure pull: (cheapest solution) to have the frontend fetch values periodically.
  • True comet: Initial pull and then push from server

Implementing this issue means we get much closer to a prototype of a collaborative graphsheet environment! :)

Frontend

  • If the backend immediately returns the list of affected cells after an update (along with their values), then everything is easy. It's just a matter of updating our comet registry inside the client in one go.

See also

  • #28, which aims to address this in the frontend... (#28 might give some easier answers to this problem)

Opening and saving .xls{,x} files

It is more important for us to

  • Read .xlsx files

at this stage in the process.

However, it'd be nice if we could

  • write .xlsx files as well.

Pinned sideviews

Often I want to keep looking at the graphical rendering of a graph that is stored in one cell while I edit another cell.

For example, suppose A1 holds a graph of cities, A2 and A3 hold string literals that specify a source and a sink, and A4=SP(A1, A2,A3).

It is preferable that I can pin the side-view of A1 while I'm editing A2 and A3.

Later on, we can extend this issue to pin side-view of multiple cells, not just one!

I think this would be a super-useful usability improvement!

Diagrams

I'm doing little experiments so I can bring Haskell's diagrams package (1, 2 and 3) into Minicell. It only took a few lines of code, and I got an SVG in sideview after a few minutes! (via diagrams-svg)

This past week I've been writing a lot in my notebooks about how Minicell can benefit from the magic of Diagrams, even more so when we implement = UNIXTIME() and let cell values depend on time. (See #33).

I've already been thinking about simple animations that I can make by superimposing multiple cells containing basic shapes (We are not too far from an actual interactive demo for this issue, I think!)

The following image is taken from 2:

This issue brings us:

  • Basic 2D animation support (especially parametric animations)

Formula bar

We don't have a formula bar.

A formula bar is an essential part of the spreadsheet interface.

How routes work in backend

anyRoute modelTVar req res =
  case pathInfo req of
    [ "minicell", "all.json" ] -> do
      endpointShowAll modelTVar req res
    [ "minicell", cometKey, "show.json" ] -> do
      endpointShow modelTVar cometKey req res
    [ "minicell", cometKey, "write.json" ] -> do
      let cometAddress = (cometKeyToAddr $ T.unpack $ cometKey)
      (params, files) <- parseRequestBody lbsBackEnd req
      -- Upload files and get urls
      fileUrls <- storeFiles files
      print (params, fileUrls)
      case files of
        [] -> do
          let ((_,formula):_) = params -- FIXME: lookup the parameter by name
          case parse cellContent "REPL" (BU.toString formula) of
            Left err -> do
              let val = CometSLit cometAddress (show err)
              res $ responseLBS status200
                [(hContentType, "application/json")]
                (encode val)
            Right ast -> do
              -- TODO: update the global database
              -- TODO: delegate CometSLit transformation to a separate function
              atomically $ do
                modifyTVar modelTVar (Mini.modifyModelWithNewCellValue cometAddress ast)
              endpointShow modelTVar cometKey req res
        _ -> do
          -- TODO: what if multiple files were dropped on one cell?
          atomically $ do
            modifyTVar modelTVar (Mini.modifyModelWithNewCellValue cometAddress (EImage (fileUrls !! 0)))
          endpointShow modelTVar cometKey req res
    url ->
      res $ responseLBS status404 [(hContentType, "application/json")] (encode $ "Invalid URL " ++ show url)
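
The FIXME above (taking the first parameter, whatever its name) and the fileUrls !! 0 lookup are both partial: an empty request body or an empty upload list would crash the handler. A minimal sketch of total replacements follows. requireParam and firstUpload are hypothetical helper names; the key and value types are left generic so the same function would work on the (ByteString, ByteString) pairs that parseRequestBody returns.

```haskell
import Data.Maybe (listToMaybe)

-- | Look a request parameter up by name instead of by position.
-- Returns an error message when the parameter is missing.
requireParam :: Eq k => k -> [(k, v)] -> Either String v
requireParam name params =
  maybe (Left "missing parameter") Right (lookup name params)

-- | First uploaded file URL, if any; a total replacement for (!! 0).
firstUpload :: [url] -> Maybe url
firstUpload = listToMaybe
```

With these, the write.json branch could answer with an error value when the formula parameter is absent, rather than failing on an incomplete pattern match.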

See also

  • HTTP Servlets #23

GitHub interop

This is a large issue that should be broken into several smaller ones.

One issue is combining different namespaces with each other. These are examples of namespaces:

  • GitHub issues (e.g. #42, #19, etc.)
  • Labels (e.g. Frontend, Haskell, P0, ...)
  • Fragments of source files at a particular SHA1 snapshot

Now if you think about it, there are multiple ways these namespaces are linked internally and also among each other:

  • Issues routinely reference each other (for example, this issue so far is linking itself to #42 and #19)
  • Issues routinely include fragments of code. Example:
    case cellValue of
      EGraphFGL g -> do
        let dot = showDot (fglToDot g)
        let dotPath = "../build/minicell-cache/" ++ (addrToExcelStyle cometAddress) ++ ".dot"
        let pngPath = "../build/minicell-cache/" ++ (addrToExcelStyle cometAddress) ++ ".png"
        writeFile dotPath dot
        system ("dot -Tpng -o" ++ pngPath ++ " " ++ dotPath)
        return $ CometImage cometAddress ("/minicell-cache/" ++ (addrToExcelStyle cometAddress) ++ ".png")
      ESLit s -> return $ CometSLit cometAddress s
      EILit i -> return $ CometILit cometAddress i
      EImage src -> return $ CometImage cometAddress src
      _ -> return $ CometSLit cometAddress (show cellValue)
  • Issues link to external urls (e.g. https://minicell.info)
  • Commits link themselves to issues (e.g. ??)
  • Issues link to commits (e.g. 48a2cc3)

Running example: Adding dependOn to GitHub issues

In a software project, issues in the bug tracker certainly do depend on each other; however, GitHub doesn't provide any means to explicitly declare a dependency between issues.

What's missing from GitHub issues?

Ticket dependency: Bugzilla has a feature that lets you describe relationships between tickets. Each ticket can dependOn another ticket, and one ticket could have many other tickets that depend on it.

Is this a good fit for Minicell?

This information structure (where one issue depends on completion of another issue) is an example of a graph (see note 1), and I think Minicell can do a great job of providing interoperability with GitHub issues.

  • Import from GitHub to Minicell
    • An =IMPORT_JSON function will load data from json
    • Display one issue per row
      • tags
      • issue title
      • open or closed?
      • various timestamps
      • assigned people
  • Write to GitHub issues from inside Minicell
    • Change the title of an issue
    • Add tags
    • Close/Open issue

1: it's an example of a lattice

What would it look like?

After we read GitHub data, we can wrangle graphs. To begin with, we can grep the issue description and mine links (e.g. #14 in the markdown) and use Minicell graph primitives to express interconnections among issues in our bug tracker.
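
As a rough illustration of the mining step, the sketch below pulls #N references out of issue text and turns them into edges. It is deliberately naive (any # followed by digits counts, so text like "#123abc" would also match), and issueRefs and referenceEdges are hypothetical names, not existing Minicell functions.

```haskell
import Data.Char (isDigit)
import Data.List (nub)

-- | Issue numbers referenced as "#N" in a piece of text, in order of
-- first appearance and without duplicates.
issueRefs :: String -> [Int]
issueRefs = nub . go
  where
    go [] = []
    go ('#' : rest) =
      case span isDigit rest of
        ([], _)         -> go rest          -- a '#' not followed by digits
        (digits, rest') -> read digits : go rest'
    go (_ : rest) = go rest

-- | Edges (issue, referenced issue) for a list of (number, body) pairs.
-- Self-references are dropped.
referenceEdges :: [(Int, String)] -> [(Int, Int)]
referenceEdges issues =
  [ (n, m) | (n, body) <- issues, m <- issueRefs body, m /= n ]
```

The resulting edge list is the kind of input a graph-building formula could consume once the issue bodies are available in cells.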

Pulling GitHub issue data inside Minicell = First step towards bootstrapping

Would it be possible to someday use Minicell itself to track all the bugs and tickets related to implementation of Minicell?

Video and Audio

I've been intending to bring a basic support for video and audio through ffmpeg for a while. It's wise to offload the idea into a GitHub issue.

This issue aims to simplify the following tasks:

  • How can we slice (or trim) a video, with millisecond precision?
  • How can we convert a video cell into an audio cell?
  • How can we sequence 2 videos one after another?

The following ffmpeg one-liner performs some of these:

$ ffmpeg -i z.mkv -ss 00:02:54 -t 49.1 -vn -acodec copy sijal.mp3

Apart from this, ffmpeg can apply various filters on a video stream.

Furthermore, it can combine two or many video streams into one.

List of concrete actions

  • Add a video type to Minicell's backend (EExpr)
  • Add a video player to Minicell's frontend
  • Implement a handful of video editing formulas
    • the eval function will use ffmpeg in the background
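
One way the eval function could drive ffmpeg is to build the argument list as data and hand it to the process library. The sketch below only builds the arguments for the "extract a slice of audio" case, using the same flags as the one-liner in this issue (with the audio codec option spelled -acodec); trimAudioArgs is a hypothetical name.

```haskell
-- | Arguments for extracting the audio track of a slice of a video:
-- seek to `start`, keep `seconds` of input, drop the video stream (-vn)
-- and copy the audio stream without re-encoding (-acodec copy).
trimAudioArgs :: FilePath -> String -> Double -> FilePath -> [String]
trimAudioArgs input start seconds output =
  [ "-i", input
  , "-ss", start
  , "-t", show seconds
  , "-vn"
  , "-acodec", "copy"
  , output
  ]
```

Something like callProcess "ffmpeg" (trimAudioArgs "z.mkv" "00:02:54" 49.1 "sijal.mp3") from System.Process would then run it. Keeping the arguments as a list avoids shell quoting problems with file names.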

Audio specific

  • Visualizing sound (as in this and this)

Video specific

YouTube

There's a wealth of content on YouTube, but there's no granular access to YouTube videos. Often I'm only interested in a portion of a longer video. Although YouTube supports links to a specific point in a video (and also an end point, if you use the embedded player), it still isn't convenient to access, retrieve and mix portions of YouTube videos with each other.

COMET (frontend)

Motivation

Many values change in the backend, but the frontend is not notified about these updates.

Roadmap

  • (0.0.2)
    • (Single-user) Frontend calls /minicell/all.json after every write operation
  • (0.0.3)
    • Receiving pings from WebSocket server to fetch /minicell/all.json
  • (0.0.4)
    • Receiving payload through websocket, without a need to call /minicell/all.json

What we currently do (Only pull)

Right now we re-fetch everything after each individual write to a cell. (Pull)

See ( ... , cometUpdateAll) in line 396:

CometUpdate cometKey res ->
    case res of
        Ok payload ->
            let
                val = cometValueTOEExpr payload
            in
            --( { model | cometStorage = Dict.insert cometKey (Debug.log (Debug.toString cometKey) val) model.cometStorage}, Cmd.none )
            ( { model | cometStorage = Dict.insert cometKey val model.cometStorage}, cometUpdateAll )

See also

  • #33, which is a meta-issue for COMET support in backend

haskell/fgl ⟶ elm-community/graph

fgl related tasks

  • Manipulate graph values by reading from other cells (see #13)
  • Convert to JSON and send over wire

elm-graph related tasks

  • Inside elm, read from JSON and convert to a graph type

Future work

We plan to provide means to manipulate the graph inside the client. I estimate that task to be as big as this issue.

Dot and Graphviz

I love webgraphviz.com. It's probably my favorite tool for thought. I use it every day, and I've recommended it to my friends. It's not your typical writing tool, of course. But if you're open-minded, you can get a lot out of it.

What is cool about webgraphviz.com?

  • (Suitable for fast comprehension) It gives you a visual output.
  • (Suitable for fast modification) You can modify the visual output with an easy-to-type textual syntax.
  • (Approachable) It's as permissive as HTML: even if your document has errors in it, it does its best to generate a graph for you.
  • It's dead simple.

Parsing graphviz files

Consider this example:

digraph G {
  post -> {title body};
  title -> "making websites for your friends";
  body -> poppet;
  glitch;
  poppet -> "poppet.us";
  nima;
  "poppet.us" -> before_nima -> {wordpress} -> {database admin_interface};
  "poppet.us" -> nima -> {jekyll digital_ocean glitch};
  digital_ocean -> service -> {google_spreadsheets custom_haskell};
  custom_haskell -> {no_database no_admin_interface};
  google_spreadsheets -> modifiable_by_molly;
  admin_interface -> modifiable_by_molly;
  no_admin_interface -> google_spreadsheets;
  no_database -> google_spreadsheets;
  fedwiki -> eric -> glitch;
  glitch -> {RCTE help_friends javascript ask_for_help}
  glitch -> modifiable_by_molly;
}

Right now, we provide basic support for parsing this file, although we are not yet converting it to an EGraphFGL value. I don't know why the implementation of dotToGraph in the graphviz package does not pass on node and edge labels. Nevertheless, we can implement the conversion ourselves using graphNodes and graphEdges.

EApp "DOT" [expr] -> do
  ESLit dot <- eval model expr
  let dotGraph = parseDotGraph $ fromString dot :: DotGraph String
  print $ graphNodes dotGraph
  print $ graphEdges dotGraph
  let okayGraph = mapDotGraph (const 0) dotGraph :: DotGraph Node
  print $ (dotToGraph (okayGraph) :: Gr Data.GraphViz.Attributes.Complete.Attributes Data.GraphViz.Attributes.Complete.Attributes)
  return $ ESLit (show dotGraph)
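
Doing the conversion ourselves mostly comes down to numbering the node names while keeping the names as labels. The sketch below assumes the names and the (from, to) pairs have already been pulled out of the results of graphNodes and graphEdges; numberGraph is a hypothetical helper and deliberately does not touch the graphviz or fgl APIs.

```haskell
import qualified Data.Map as Map
import Data.Maybe (mapMaybe)
import qualified Data.Set as Set

-- | Give each distinct node name a number and rewrite the edges in terms
-- of those numbers. The names are kept as node labels.
numberGraph :: Ord a => [a] -> [(a, a)] -> ([(Int, a)], [(Int, Int)])
numberGraph names edges = (labelled, mapMaybe toIds edges)
  where
    -- Edge endpoints are included too, in case a node is only mentioned
    -- inside an edge statement.
    allNames = Set.toAscList (Set.fromList (names ++ concat [ [a, b] | (a, b) <- edges ]))
    labelled = zip [0 ..] allNames
    ids      = Map.fromList (zip allNames [0 ..])
    -- Every endpoint is in `ids` by construction, so no edge is dropped.
    toIds (a, b) = (,) <$> Map.lookup a ids <*> Map.lookup b ids
```

The first component has the shape of a list of labelled nodes, and the second becomes a list of labelled edges once each pair is given a label, which is what fgl's mkGraph takes.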

What we have in Minicell

  • render fgl graphs via dot and display as png inside the side-view

What could be done in Minicell

  • Make it so that we can type in a graph (using graphviz syntax) inside a textarea, and have Minicell backend parse and render the desired graph. (See #41)

GraphQL

It would be nice to explore how graphsheets can provide graphql endpoints.

I think our datatypes and querying capabilities are rich enough to handle some interesting basic examples.

I don't have time to implement this myself as of late 2018. Getting the core UI working is more urgent at this point.

GraphQL

  • Export sheets to GraphQL
    • Serve GraphQL queries
  • Import from GraphQL

Replace formula value via COMET

APART FROM  HANDLING ERRONEOUS FORMULAS
THIS ISSUE IS DONE

Elm updating Haskell via POST requests

The SimpleServer.hs currently stores the model inside a TVar.

[ "minicell", cometKey, "write.json" ] -> do
  let cometAddress = (cometKeyToAddr $ T.unpack $ cometKey)
  (params, files) <- parseRequestBody lbsBackEnd req
  print (params, length files)
  -- TODO: do something with files!
  let ((_,formula):_) = params -- FIXME: lookup the parameter by name
  case parse cellContent "REPL" (BU.toString formula) of
    Left err -> do
      let val = CometString cometAddress (show err)
      res $ responseLBS status200
        [(hContentType, "application/json")]
        (encode val)
    Right ast -> do
      -- TODO: update the global database
      -- TODO: delegate CometString transformation to a separate function
      -- res <- eval model ast
      atomically $ do
        modifyTVar modelTVar (Mini.modifyModelWithNewCellValue cometAddress ast)
      let val = CometString (cometKeyToAddr $ T.unpack $ cometKey) (show ast)
      res $ responseLBS status200
        [(hContentType, "application/json")]
        (encode val)

This means we can use a command like this to write a new value to a cell:

$ curl -X post  http://localhost:3000/minicell/A2/write.json -d 'formula=''=SP(A1)'

(taken from 33df1f9#diff-7d442b7eb49f5fc377f51e74b291cfc1R40)

What needs to be done

  • We need to call this endpoint when we switch from Edit mode to Idle mode. That transition is handled by the Save message.
    • Additionally, we need to give feedback to the user when a formula is invalid. The backend would reject invalid formulas.
  • Then we need to make the backend return the same value that the /show.json endpoint returns.
  • We must make the interface use the JSON result of /write.json to update the cell.
    • We must remove the functionality that routes the string to the Elm parser, and instead incorporate the JSON result of write.json in Elm.
    • For now we handle this with a page refresh, but dependent cells must be updated too.

Save addr ->
    let (rho, kappa) = addr in
    ({ model | database = updateCellValue model.database addr (currentBuffer model.database addr |> parseBufferToEExpr model)
             , mode = IdleMode (rho+1, kappa)
     }
    , Cmd.none)

  • Then we need to update COMET parsing in Elm

CometUpdate cometKey res ->
    case res of
        Ok payload ->
            let
                valueType = D.decodeValue (D.field "valueType" D.string) payload
                val = case valueType of
                    Ok "EILit" ->
                        case D.decodeValue (D.field "value" D.int) payload of
                            Ok i -> EILit i
                            Err err -> EError (Debug.toString err)
                    Ok "ESLit" ->
                        case D.decodeValue (D.field "value" D.string) payload of
                            Ok i -> ESLit i
                            Err err -> EError (Debug.toString err)
                    Ok "EImage" ->
                        case D.decodeValue (D.field "value" D.string) payload of
                            Ok i -> EImage i
                            Err err -> EError (Debug.toString err)
                    _ -> EError ("COMET value not implemented" ++ (Debug.toString valueType))
            in
            ( { model | cometStorage = Dict.insert cometKey (Debug.log (Debug.toString cometKey) val) model.cometStorage}, Cmd.none )
        Err err ->
            ( { model | cometStorage = Dict.insert cometKey (Debug.toString err |> EError) model.cometStorage}, Cmd.none )

  • Then we should re-retrieve all cells that are modified due to this change in value.
    • Perhaps as a result of a successful update, the backend could respond with cells that need update, along with the new updated value. This way we can update the frontend without extra complexity.

Gunrock

Gunrock is a high-performance graph processing library that runs on GPU.

Setting up Gunrock and working with it directly requires a lot of work, and is intimidating for non-programmers as well as casual programmers (say a social network expert).

I propose we implement a Gunrock backend for Minicell. This way, non-programmers and casual programmers can run high-performance computations without investing time in setting up Gunrock and implementing code in C++ or Gunrock's Python API.

It seems like libgunrock.so is pretty solid. Here's an example usage with Python's FFI:

https://github.com/gunrock/gunrock/blob/ea18455ad9a760d12a2535dd4eea655ecbc69c78/python/breadth_first_search.py#L5-L26

I think it will be nice to try using libgunrock.so in Haskell.

The evaluator (#13) could have new graph types for Gunrock, and the table of operations can use Gunrock to load large graphs into memory. Then we can use Gunrock to perform high-performance computations on graphs inside cells.

evalIO

We implemented a mock evaluator inside Elm, but it's time to step up and plug a real parser and evaluator into the system.

This issue intends to

  • finalize the operations table
  • implement the parser
  • implement the evaluator

Finalize operation table

From high-priority to low priority:

  • Graph primitives as mentioned in the manuscript
  • Importing from flat JSON or CSV (see #25)
  • Hypermedia primitives (see #22)
  • HTTP Servlets (#23)
  • Time and date
  • Current position of mouse

Margin notes

I experimented with a toy evaluator in Haskell. The heavy lifting is done inside the evalIO function. I will post a link once it is added to the repo.

I plan to implement these operations as part of the operations provided by Minicell

  • Referencing other cells
  • Graph primitives
    • max flow of a given graph
    • shortest path in a graph
    • neighborhood of a given vertex
  • Arithmetic primitives

Future work

  • JSON primitives
  • Import from HTTP endpoints

Images, drop zone and file upload

I made a demo of a simple drag and drop mechanism last week (before December 9th) and I really like it. This week (December 16th) I implemented a basic feature in backend to accept file uploads for each cell.

Describing a recipe in Minicell

I can upload files via $ curl now.. Here's an example:

minicell/Makefile

Lines 26 to 28 in 9bba94b

poppet:
curl -X post http://localhost:3000/minicell/A1/write.json -d 'formula=''https://www.youtube.com/watch?v=nLQRtCEX-E0'
curl -X post http://localhost:3000/minicell/A2/write.json -F 'file1=''@/Users/nima/code/2017/poppet-hs/assets/og-image.jpg'

And as demonstrated in the video, the frontend supports

  • basic drag/drop and also
  • support for cells that contain an image

However, I need to

  • Implement (in Elm) the functionality to send a POST request (similar to what $curl does)

Other notes

  • This will bring new kinds of media as possible values for cells
    • Images
    • Audio
    • Videos (.mp4, .mov, .avi) and YouTube videos (with optional start and end)

What can you do with it?

  • Aggregate a region into a gif animation! =GIF(A1:A5) or =ANIMATE(A1:A5)
    • Manipulate the gif image
      • =GIFSPEED(G1, 0.5): Change playback speed
      • =GIFREV(G1): Reverse the frames of a gif

File upload

  • This is similar to #21, if we simply want to store a base64 encoding of the uploaded file in memory

What's next?

  • Upload sound clips by dragging and dropping files, then implement audio playback in the frontend (this needs a new issue)
  • Drag/drop .mat files (related to #16)

COMET fetches everything as ESLit

What needs to be done

  • The backend needs to perform IO to generate a png corresponding to the fgl graph that is stored in the cell
  • Furthermore, the backend needs to send a URL so that Elm frontend can use img tag to display it in the side view
  • We need more CometValues in Haskell
    • not just CometSLit, but also CometFormula and so on
  • We need to be able to parse CometValues into Elm values
    • The decoding seems to be very mechanical

Elm files

CometUpdate cometKey res ->
    case res of
        Ok payload ->
            let
                valueType = D.decodeValue (D.field "valueType" D.string) payload
                val = case valueType of
                    Ok "EILit" ->
                        case D.decodeValue (D.field "value" D.int) payload of
                            Ok i -> EILit i
                            Err err -> EError (Debug.toString err)
                    Ok "ESLit" ->
                        case D.decodeValue (D.field "value" D.string) payload of
                            Ok i -> ESLit i
                            Err err -> EError (Debug.toString err)
                    _ -> EError "COMET value not implemented"
            in
            ( { model | cometStorage = Dict.insert cometKey (Debug.log (Debug.toString cometKey) val) model.cometStorage}, Cmd.none )
        Err err ->
            ( { model | cometStorage = Dict.insert cometKey (Debug.toString err |> EError) model.cometStorage}, Cmd.none )

Haskell files

-- readTVar modelTVar
val <- do
  model <- readTVarIO modelTVar
  res <- eval model (ECellRef cometAddress)
  return $ CometString cometAddress (show res)

Also, adding data to these lines:

data CometValue = CometAddr CellAddress
                | CometString CellAddress String

instance ToJSON CometValue where
  toJSON (CometAddr addr) =
    object
      [ (T.pack "value") .= (show addr :: String)
      , (T.pack "valueType") .= (T.pack "ESLit")
      , (T.pack "cometKey") .= (T.pack $ addrToExcelStyle addr)
      ]
  toJSON (CometString addr str) =
    object
      [ (T.pack "value") .= str
      , (T.pack "valueType") .= (T.pack "ESLit")
      , (T.pack "cometKey") .= (T.pack $ addrToExcelStyle addr)
      ]

HTTP servlets

Example of servlets

  • Static file server (A_: endpoint, and B_: EExpr values)
    • Key/value file-server
  • Simple hit counter
  • erb, mustache, and other template engines (see also #36)

What if you could spin up HTTP servers from inside cells?

By that I mean:

What if you could store HTTP servers inside a cell?

I described this in my "Zine" notebook.

We are not trying to be a solution for persistence (Ingres, MySQL, etc. are better at this). However, we aim to pull data from various sources, and we aim to expose the contents of a sheet in as many ways as we can, including but not limited to

  • FUSE filesystem
  • Read-only HTTP endpoint that serves the data of a region inside the spreadsheet
  • An endpoint that responds to Neo4j queries (#14)

This issue is mostly about HTTP servlets.

Suggested formula syntax

=RENDER("layout.html", "content", "Hello World!")

=RENDER("main.html")

Incremental computing, cache for IO and Haxl

Incremental Computation and Minicell

In Minicell, these are our most computationally-intense domains:

  • Graph Processing (#11)
  • DSP and Audio processing (right now all our audio/video processing is handled by sox and ffmpeg, but imagining "Digital Signal Processing" primitives inside Minicell is realistic) (#50)
  • Any sort of external IO that only depends on content of input (e.g. making an image grayscale), not the state of the system (e.g. time, or random number generator) (#22)
  • It is also desirable to cache and mock certain remote IO

With respect to Incremental Computation, the main questions are:

  • How much computation can we save in Minicell if we have incremental graph algorithms? (perhaps implemented in an external library like Gunrock)
  • Does existing research in Incremental Computation promise effectiveness in the domain of graph processing? What related work exists on this?

Background

Spreadsheets sit at the intersection of

  • Working with streams of values
    • More like the notion of "temporal values" discussed in ICFP '97
    • (as in unix pipes), even though common spreadsheet environments don't exploit this enough
  • Immediate feedback (immediate recalculation and display of values)
  • Incremental Computation (faster recalculation of values)

As we support more streaming types (like audio, video, dynamic graphs, time and other temporal values), the codebase of Minicell becomes more complicated. On top of that, some of the IO that we do on large image, audio or video files can be costly. Of course we want to avoid unnecessary computation as much as we can.

Our attempts so far

As I was working on basic formulas for processing audio (#50, 19e4e83), I attempted to write a few lines of code to avoid unnecessary IO.

Instead of handing the inputs straight to ffmpeg, we first calculate an md5 sum of each input file, and we name the output of each audio computation after the hashes of its inputs.

Lines 359–363 check whether we have previously calculated the result of =ACONCAT(A1,A2):

let fullSrc1 = mconcat [ audioPath, src1 ]
    fullSrc2 = mconcat [ audioPath, src2 ]
fileContent1 <- LB.readFile fullSrc1
fileContent2 <- LB.readFile fullSrc2
let cacheKey = mconcat ["ACONCAT", show $ md5 fileContent1, show $ md5 fileContent2 ]
let targetPath = mconcat [ audioPath, cacheKey, ".mp3" ]
exists <- doesFileExist targetPath
if exists then return () else
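
The "skip the work if the target file already exists" step could be factored out so that other formulas can reuse it. The sketch below is one possible shape, assuming the caller has already derived the target path from a content hash as in the fragment above; cachedFile and invalidate are hypothetical names.

```haskell
import Control.Monad (unless, when)
import System.Directory (doesFileExist, removeFile)

-- | Run `produce` only when `target` does not exist yet, then return the
-- path. `produce` is expected to create the file at the path it is given.
cachedFile :: FilePath -> (FilePath -> IO ()) -> IO FilePath
cachedFile target produce = do
  exists <- doesFileExist target
  unless exists (produce target)
  return target

-- | Drop a cached result so the next call to cachedFile recomputes it.
invalidate :: FilePath -> IO ()
invalidate target = do
  exists <- doesFileExist target
  when exists (removeFile target)
```

Because the cache key is derived from the input contents, a stale entry can only arise when the computation itself changes, which is the case invalidate is meant for.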

What's needed?

I'm not quite sure at this point. But I sense that we need to approach things in a more principled way. We need to be more organized about:

  • multiple graph types that we have
  • remote fetching, storage and IO
  • re-computation of values

(This issue addresses only the third one. We need separate issues for the first two.)

图 (Graph)

This issue keeps track of development of the formula set.

  • (🌟) G3=UNION(G1, G2)
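
As a starting point for discussing the semantics of UNION, here is a minimal sketch on a deliberately simple graph type in which vertices are identified by their labels. Graph and unionG are hypothetical and independent of the fgl representation Minicell uses internally; edge labels are left out.

```haskell
import qualified Data.Set as Set

-- | A simple graph for this sketch: labelled vertices and directed edges.
data Graph v = Graph
  { vertices :: Set.Set v
  , edges    :: Set.Set (v, v)
  } deriving (Eq, Show)

-- | Union of two graphs. Vertices that carry the same label in both
-- graphs are treated as the same vertex.
unionG :: Ord v => Graph v -> Graph v -> Graph v
unionG g1 g2 =
  Graph (vertices g1 `Set.union` vertices g2)
        (edges g1 `Set.union` edges g2)
```

Identifying vertices by label is only one choice. A disjoint union, where the two vertex sets are kept apart, would be a different formula.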

Naming

I don't know what title to pick for this issue:

  • Graph combinators
  • A graph processing DSL
  • The operation table

Misc

  • graph comprehension (e.g. G2 = [ <VV, EE> | <V, E> <- G1, VV <- V, in_degree(VV) > 5, EE@(V1, V2) <- E, {V1, V2} ⊆ VV ])

Persistency and serialization

For now, a generic serializer would be enough. Ideally, we write the output of show to a file and read it back into our own types.

We won't be able to retain the following across saves (however, we might be able to retain their "computed values (#32)" in style of Excel)

  • Gunrock pointers (see #16)

Need to separate "computed value" from "cell content"

It seems like the main part of the work must be done in cometStorage in Elm. (https://github.com/johari/minicell/search?l=Elm&q=cometStorage)

COMET values mask the underlying formula

We treated COMET values very specially when we implemented the frontend in Elm. But as we move along, the way we treat COMET, computed values and literals is changing.

  • Right now, the only thing that is passed to Elm is the computed value.
  • We need to make it so that the formula (before evaluation) is also passed, so that when the user wants to edit a formula cell, they have something to work with.
  • At this point we mask the underlying formula in the client, and only show the value that the evaluator gives to us

Description

We have addressed saving values via COMET (see #21) but something is fundamentally lacking from the data that we send from Haskell to Elm.

  • We need to extend COMET transmission so that both computed value AND original expression are sent from Haskell to Elm.

Parser

The parser and interpreter (#13) are related, but separate.

The parser is implemented with parsec. The code is mainly here:

cellContent :: Parser EExpr
cellContent = do
  s <- choice $ [ formulaWithEqSign, numberLiteral, stringLiteral ]
  return s

numberLiteral :: Parser EExpr
numberLiteral = do
  num <- many1 digit
  return $ EILit (read num :: Int)

formulaWithEqSign = do
  char '='
  formula

formula = do
  s <- choice [ formulaCellRef, formulaWithOperands ]
  return s

cometKeyToAddr cometKey =
  case parse excelStyleAddr "" cometKey of
    Right addr -> addr
    Left err -> (-1, -1)

excelStyleAddr :: Parser CellAddress
excelStyleAddr =
  do
    column <- letter
    row <- many1 digit
    return $ (((read row) - 1), ((ord $ toLower $ column) - (ord 'a'))) -- This is ultra buggy (works only for A-F)

formulaCellRef = do
  -- column <- try $ many1 letter
  -- row <- try $ many1 digit
  parsedAddress <- try $ excelStyleAddr
  return $ ECellRef parsedAddress

formulaWithOperands = do
  operation <- try $ many1 letter
  char '('
  args <- sepBy1 formula (char ',' >> many spaces)
  char ')'
  return $ EApp operation args

Right now, we support parsing these expressions:

  • 42
  • =42
  • =A1
  • =SP(A1)
  • Hello world!
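
The excelStyleAddr parser above reads a single column letter, so an address such as AA1 cannot be expressed. A minimal sketch of the address arithmetic for multi-letter columns follows, written against plain strings rather than parsec so it stands on its own; parseAddr is a hypothetical name, and it keeps the (row, column) order and zero-based indices used by excelStyleAddr.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit, ord, toUpper)

-- | Parse an Excel-style address such as "A1" or "AB12" into a zero-based
-- (row, column) pair. Column names are read as bijective base 26:
-- A..Z, then AA..AZ, BA.. and so on.
parseAddr :: String -> Maybe (Int, Int)
parseAddr input =
  case span isColumnLetter input of
    (letters@(_ : _), digits@(_ : _))
      | all isDigit digits && row >= 1 -> Just (row - 1, column - 1)
      where
        row    = read digits
        column = foldl step 0 letters
        -- 'A' counts as 1, so "AA" is 1 * 26 + 1 = 27 (zero-based 26).
        step acc c = acc * 26 + (ord (toUpper c) - ord 'A' + 1)
    _ -> Nothing
  where
    isColumnLetter c = isAsciiUpper c || isAsciiLower c
```

For example, parseAddr "AA10" gives Just (9, 26), and inputs such as "1A" or "A0" give Nothing. The same fold could be used inside excelStyleAddr after switching letter to many1 letter.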

Adding pair types (a,b)

In commonplace spreadsheets, each cell represents one value. Like a string literal, or a number.

It's unconventional for spreadsheets to store a list of numbers inside one cell (one post-2010 system from MIT explores having lists as cell types).

I'm impartial about supporting lists as cell types for now. But I think, because of our emphasis on graphs, we need to support tuples. This way, we may have cells that capture a (from, to) relation (or a (from, to, label)).

Initial page load via COMET

Right now this is handled in a very ad-hoc way:

init : () -> ( Model, Cmd Msg )
init _ =
    ( exampleSpreadsheetRemote
    , Cmd.batch [ cometUpdate "A1"
                , cometUpdate "A2"
                , cometUpdate "A3"
                , cometUpdate "B1"
                , cometUpdate "B2"
                , cometUpdate "B3"
                , cometUpdate "C1"
                , cometUpdate "C2"
                , cometUpdate "C3"
                ]
    )

Removing surprises from the interface

Although the implementation of the interface in Elm is in good shape, there are missing bits here and there that make the interface unsuitable for a serious demo. For example

  • rendering is not implemented for some critical data types
  • the parser doesn't work
  • it's difficult to create new examples by interacting with the interface (examples are mostly hard-coded now)

For example, certain graph types are implemented in this file:

    | ECellGraph (Graph Cell ())
    | ESuperFancyGraph G
        -- ^^^ A super fancy type
        -- that allows you to jump from one cell to another
        -- if cells are linked with respect to a graph (normally graph inside the cellUnderView)
    | EGraph (Graph String ()) -- Good old "CellGraph" (<3<3<3)

but these functionalities are not implemented:

  • rendering of cells of this kind
  • combining graph values using combinators

This is a medium-size task. The scale of this task is bigger than "polishing", but smaller than implementing things from scratch.

Nix

nix-shell -p "haskellPackages.ghcWithPackages (pkgs: with pkgs; [fgl generic-random QuickCheck brick fgl-arbitrary hspec diagrams palette mysql-simple  hslogger wai warp aeson wai-websockets wai-extra wai-cors fgl-visualize graphviz wreq stache temporary pureMD5 time hxt tagsoup hoauth2 pandoc probability cborg serialise haxl fb http-conduit http-client-tls async hashable resourcet cabal2nix extra dhall heterocephalus csound-expression hint gitlib libgit2 hlibgit2 gitlib-libgit2 hlint])"

I use Nix for development, mainly because a one-liner in Nix properly drops me in a shell that just works. I add packages to the one-liner once in a while, but as of now, it looks like this:

nix-shell -p "haskellPackages.ghcWithPackages (pkgs: with pkgs; [fgl generic-random QuickCheck brick fgl-arbitrary hspec diagrams palette z3 mysql-simple logict HFuse hslogger aeson scotty])"

Since scotty doesn't work with Nix on Mac, I wrote SimpleServer.hs, which uses wai and warp. My one-liner looks like this on Mac:

nix-shell -p "haskellPackages.ghcWithPackages (pkgs: with pkgs; [fgl generic-random QuickCheck brick fgl-arbitrary hspec diagrams palette z3 mysql-simple logict hslogger wai warp aeson wai-websockets wai-extra])"

I tried to set up stack, mainly because Haskero [1] and Intero [2] depend on it. I had success on Linux, but Mac failed me with esoteric link errors.

[1]: https://marketplace.visualstudio.com/items?itemName=Vans.haskero
[2]: https://github.com/commercialhaskell/intero

Formula error handling

Mark the cell with an appropriate error message in the following cases:

  • Division by zero (/0)
  • Invalid formula
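One way to mark a cell, sketched under assumptions: errors become ordinary cell values, so a failing formula shows a message instead of aborting recalculation. EILit and EError exist in the current code; the functions divide and fromParse, and the message strings, are hypothetical.

```haskell
-- A cut-down, hypothetical expression type with only the cases needed here.
data Expr
  = EILit Int
  | EError String
  deriving (Eq, Show)

-- Integer division that reports division by zero as a cell-level error
-- and passes through an error found in either operand.
divide :: Expr -> Expr -> Expr
divide (EError msg) _ = EError msg
divide _ (EError msg) = EError msg
divide (EILit _) (EILit 0) = EError "Division by zero (/0)"
divide (EILit a) (EILit b) = EILit (a `div` b)

-- Turn a parse result into a cell value, so an invalid formula marks the
-- cell with a message. The Left payload is whatever the parser reported.
fromParse :: Either String Expr -> Expr
fromParse (Left reason) = EError ("Invalid formula: " ++ reason)
fromParse (Right expr)  = expr
```

Propagating the first error found keeps the original cause visible in dependent cells.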

TensorFlow models inside a cell

I've mentioned this to a labmate once. Minicell would be a nice playground to experiment with ideas like this:

[...]
Maybe you could work on a deep learning toolkit inside the spreadsheet environment..
One that is natural and effective for a non-programmer to work with,
and leverages everything that the spreadsheet interface provides

I don't know what the related work on this is already covering..
I'm sure others have done a couple of works in this spirit.
Perhaps nothing that leverages spreadsheets though.

Criticism

On the other hand, I don't necessarily think making models like that commonplace is a good idea. For example, disappointingly, there are more than 3-4 repositories on GitHub that are aiming to provide models for "ethnicity detection", "race detection", "gender detection" and other things of this nature.. The last thing you'd want is for some loan or insurance company to plug an "ethnicity detection" model into their budgeting spreadsheet.

My criticism is that models like this provide little explanation about the answer that they come up with. It's still an open challenge to make these systems describe their answers in a way that humans can understand. Any piece of work that tried to address this open challenge has been out of my realm of comprehension so far.

I think it would be "cool" to have TF models in a cell. It would probably keep a lot more GPUs busy, but cool and computationally intensive doesn't imply applicability in real-world phenomena. And I'm worried that having them in Minicell will reverberate something unholy. Models like this tend to encode, calcify and amplify hard-to-trace biases in the original training data. Let alone their wide applicability in extremely brutal surveillance.

The bright side

There are some other applications though. For example, I wish you could implement a clone of Dynamicland's object detection inside spreadsheets. (think of what https://paperprograms.org/ does, but entirely within Minicell instead of Node)

Piping a stream of images and extracting a stream of structured information and geometrical attributes from them is a great fit for the capabilities that Minicell (or any spreadsheet environment) is planning to implement. But I need more concrete use-cases that sound healthy and interesting enough before starting to bridge between Minicell and TF.

Namespace of people

The key difference between SourceForge and GitHub was that GitHub adopted <username>/<repo> addressing (the bazaar), mixing social networks and software projects, whereas SF.net encouraged a project-centric view of the open-source world (the cathedral). (This was facilitated, of course, by advances in distributed version control systems, DVCS, but GitHub had a particular emphasis on people from the get go.)

As #23 matures, Minicell spreadsheets become more than just spreadsheets. After #23, Minicell documents will resemble glitch.com applications (serving hypertext and dynamic webpages).

If we look at each minicell document as an "app", it will make sense for these "apps" to be forked and remixed as well. This is where this issue comes in.

One level of bringing collaboration inside spreadsheet environments is doing it the way Google does it: realtime, collaborative editors, both for word processing and for basic data processing and computation (spreadsheets and forms). Another is the glitch model, which puts trust in the node ecosystem and containers.

I think there's a mid-point between the glitch model and the Google spreadsheet model, and that's Minicell. Collaborative, Approachable and Programmatic.

Multiline text editor

Problem

In the interface, we don't have a nice way to edit mustache strings.
Or a few lines of a function definition. (in JS, or Python, or Lisp)

Background

I added basic support for Mustache templates here:

EApp "MUSTACHE" args -> do
    let mustacheText = "Hello {{A1}} <a href=\".{{A2}}\">{{A2}}</a>"
    let compiledTemplate = compileMustacheText "foo" mustacheText
    a1InHtml <- eval model (ECellRef (0, 0)) >>= eexprToHtml
    a2InHtml <- eval model (ECellRef (1, 0)) >>= eexprToHtml
    case compiledTemplate of
        Left _ -> return $ EError "Mustache compile failed"
        Right template ->
            return $ ESLit (L.unpack $ renderMustache template $ object [ "A1" .= (T.pack $ a1InHtml)
                                                                        , "A2" .= (T.pack $ a2InHtml)
                                                                        ])

Even if we implement the formula bar (see #38), I think we need a <textarea>-like editor so that we can edit mustache templates with ease.

The interesting thing is that as of now (Dec 19th), we provide basic support for HTML rendering. The main issue that addresses this is #23, but to summarize, we are able to browse to http://localhost:3000/minicell/B2/HTTP/ and view an HTML document. Basic support for routing and serving remote images is included as well. Please refer to these snippets:

("minicell":cometKey:"HTTP":tail) -> do
    -- TODO: first we must check the content of the cell
    -- We should serve this endpoint only if
    -- the content of the cell is a `=HTTP(arg1)` formula
    -- where `arg1` represents a 2-column region.
    -- We treat this 2-column region as a key-value hash.
    -- where keys map to `tail`, and
    -- values map to the content served at that URL
    model <- readTVarIO modelTVar
    case tail of
        t -> do
            let needle = case t of
                            x | x == [] || x == [""] -> "/"
                            [x] -> "/" ++ (T.unpack x)
                            _ -> ""
            print needle
            columnA <- sequence $ (\x -> do
                                        s <- eval model (value x)
                                        return $ (x, s)) <$> [ cell | cell <- database model, snd (addr cell) == 0 ]
            let rowNumberOfslashInColumnA = find (\(_, y) -> y == ESLit needle) columnA
            print columnA
            print rowNumberOfslashInColumnA
            case rowNumberOfslashInColumnA of
                Nothing -> res $ responseLBS status404 [] ("No default index set for /")
                Just (indexCell, _) -> do
                    let (rho, kappa) = addr indexCell
                    indexVal <- eval model (ECellRef (rho, kappa+1))
                    httpResponse <- eexprToHttpResponse indexVal
                    res $ httpResponse
            -- TODO:if "/" doesn't exist, it's 404, or 403
        _ -> do
            res $ responseLBS status200
                [(hContentType, "application/json")]
                (encode $ mconcat $ ["Hello HTTP server at cell ", cometKey, " !", T.pack $ show tail])

eexprToHttpResponse cellValue = do
    case cellValue of
        ESLit s -> return $
            responseLBS status200
                [(hContentType, "text/html")]
                (fromString $ s)
        EImage src -> do
            -- Download the image
            -- Show the image
            print $ mconcat ["fetching ", src]
            -- response <- (get "http://poppet.us/favicon.ico")
            response <- (get src)
            return $ responseLBS status200 [] (response ^. responseBody)
        _ -> return $
            responseLBS status503
                [(hContentType, "text/plain")]
                (fromString $ "HTML output not implemented for " ++ (show cellValue))
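The routing step in the handler treats a two-column region as a key-value table: the path after /HTTP/ becomes a key, and the neighbouring cell holds the content to serve. A pure sketch of just that lookup is below. The names needleFor and route are hypothetical, and plain strings stand in for cell values.

```haskell
-- Hypothetical pure version of the routing step: turn the path segments
-- after /HTTP/ into a lookup key, mirroring the `needle` logic in the handler.
needleFor :: [String] -> String
needleFor []    = "/"
needleFor [""]  = "/"
needleFor [seg] = "/" ++ seg
needleFor _     = ""   -- nested paths produce a key that matches no route

-- Look the key up in a (key, content) table, standing in for the
-- two-column region. Nothing corresponds to the 404 branch.
route :: [(String, String)] -> [String] -> Maybe String
route table segments = lookup (needleFor segments) table
```

Keeping the lookup pure would make the 404 case easy to check without starting the server.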

Could we create a wiki with minicell?

With mustache and HTTP support (#36 and #23), and given that we provide basic support for including images inside cells (#22), Minicell is at a stage where it can render simple HTML pages.

The pages, in increasing order of difficulty, could be

  • A one-page webpage (rendering of mixed html-mustache, with occasional [ ] "transclusion" of cells, or even [ ] evaluating EExpr expressions)
  • A static website (similar to Jekyll and other static site generators)
  • A simple wiki
  • Google-Forms style of form-filling

Work that facilitates this issue

We definitely want to be done with the following issues. They will make the current one much easier:

  • Store and serialization (persistency) #18
    • Rails and Django: Model
  • Serving portion of a sheet via HTTP (similar to Show website in Glitch) #23
    • Rails: Controller
    • Django: View
  • String interpolation, list iteration, extending templates, etc. #36
    • Rails: View
    • Django: Template
    • Express: Render
        To some extent
                 this issue reminds me of
       glitch.com
                     where they chose to bias towards javascript
                              (for practical reasons)
                                    we biased towards spreadsheets
           (for a different set of practical reasons)
  
   Let #36 populate cells,
   Let #23 materialize!
   Create dynamic hypertext.
   Share dynamic hypertext.
   Fork dynamic hypertext.

See also

A syntax for lists

We have (used to have?) type-level support for lists, but so far (as of v0.0.2) we haven't been utilizing lists.

To be more precise, I mean using a list as the value of a single cell.

I think I have a use-case for lists.

As I was thinking through GitHub interoperability (#42), I thought it would be nice if we could establish a dependsOn relationship among individual issues.

[TO BE CONTINUED]

EGraphFGLE

This type extends EGraphFGL (Gr String Int) to EGraphFGLE (Gr EExpr EExpr)

This type came up in context of #39.

Are we a spreadsheet interface yet?

  • Cells that contain numbers, strings, dates, names, amounts of money, etc.
  • Cells that contain formula that take other cells as an argument
  • In-place re-computation
  • Formatted cells and content
  • Mixed-media (mainly plots, graphs, etc.)

Good examples

What's needed

  • More examples in Haskell so we can load them up via
    • =LOAD("cities")
    • =LOAD("org-tree")
  • More primitives (see also #11)
    • =MF
    • =SP

I have discussed the following examples with these folks:

  • Zhendong
    • Something with numbers
      • max-flow
      • =SHORTEST_PATH(G1, "davis", "berkeley")
  • Vu
    • Videos from YouTube (transhipment problems)
  • Serban and John
    • Load a .mat file via Gunrock API
    • Some application that Gunrock currently implements (e.g. Geolocation)
  • Ward
    • Mehrdad had a particular name for representing a graph in this style: FIXME
    • Wrangle graph from JSON: iterative modeling of corporate resources in a rapidly growing company
      • Load JSON from file into table: Import JSON inside the Haskell backend
      • Extract a few graphs from the data
        - [ ] For each manager, find the subtree of the organization that is under them and store it in a new column
      • Test out a few tasks that Ward demos (https://www.youtube.com/watch?v=E0N138ThyMI)
  • Duke and Mehrdad
  • Older ones
    • The cities (shortest path)
      • Wrangle graph from incidence matrix
    • The original Pinboard urls and tags (https://youtu.be/vxjYErBWWM8?t=701)
      • Tag cloud via in-degrees
      • Make various jumps (forward and backward) between namespace of tags and namespace of URLs
      • Filter bookmarks based on hostname
      • Filter bookmarks based on tag (Either tag1 tag2, Both tag1 tag2, Just tag1)
      • (some people think this is just an enhancement)

This is what we have so far

Right now, there are a lot of examples in Elm and Haskell.

For the purposes of this issue, the examples mainly need to be in Haskell.

-- emptySpreadsheet = Spreadsheet [] (IdleMode (0, 0)) [] [] (millisToPosix 0)
emptySpreadsheet = Spreadsheet [] (IdleMode (0, 0)) [] []
spreadsheetWithOneGraph = Spreadsheet [ graphCell (0, 0) vor ] (IdleMode (0, 0)) [] []

These are examples in other languages:

Neo4j

  • Serve Gremlin queries
  • Import graphs into cells from Neo4j graphs

Initial basic prototypes for the reactive interface

I'm trying to see how I can map reflex-frp abstractions to spreadsheet abstractions.

My goal is to have a basic example consisting of three cells:

  • Two cells, each containing an integer
  • One cell that multiplies the integers together

From the interface side, this is what I need to implement

  • I want to have widgets on the screen that allow me to edit contents of both the data cells and the formula cell.
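Setting reflex-frp aside, the three-cell example can be sketched with plain functions to pin down the behaviour the reactive version should reproduce. This does not use reflex-frp, and Cell, Sheet, evalCell and setCell are hypothetical names for illustration.

```haskell
-- Hypothetical cell contents for the three-cell example.
data Cell
  = Number Int
  | Multiply String String   -- names of the two cells to multiply
  deriving (Eq, Show)

type Sheet = [(String, Cell)]

-- Evaluate one cell by name; a formula cell looks up its operands.
-- Note: this sketch does not detect cyclic references.
evalCell :: Sheet -> String -> Maybe Int
evalCell sheet name =
  case lookup name sheet of
    Just (Number n)     -> Just n
    Just (Multiply a b) -> (*) <$> evalCell sheet a <*> evalCell sheet b
    Nothing             -> Nothing

-- Editing a cell replaces its entry; dependants pick up the change the
-- next time they are evaluated.
setCell :: String -> Cell -> Sheet -> Sheet
setCell name cell sheet = (name, cell) : filter ((/= name) . fst) sheet

example :: Sheet
example = [("A1", Number 6), ("A2", Number 7), ("A3", Multiply "A1" "A2")]
```

With this sheet, evalCell example "A3" gives Just 42, and after setCell "A1" (Number 2) example the same query gives Just 14. In a reactive version the two data cells would be inputs and the formula cell a derived value.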

UI Enhancements

  • 0. You immediately see the visible tabular structure, even if the spreadsheet is empty
  • 1. Typing up a table is fast. In particular, navigation back and forth between cells, rows and columns is simple (and not necessarily easy)
  • 2. Pasting things into the spreadsheet software (from web, tsv, csv) is really easy
  • 2. Navigate directly to tables
  • 3. Tables provide special shortcuts
  • 4. Painless drag and drop
  • 5. Table headers stay visible
  • 6. Tables expand automatically
  • 7. Totals without formulas
  • 8. Rename a table anytime
  • 9. Fill formulas automatically
  • 10. Change formulas automatically
  • 11. Human-readable formulas
  • 12. Easy dynamic ranges
  • 13. Enter structured references with the mouse
  • 14. Enter structured references by typing
  • 15. Check structured references with a formula
  • 16. Change table formatting with one click
  • 17. Remove all formatting
  • 18. Override local formatting
  • 19. Set a default table style
  • 20. Use a Table with a pivot table
  • 21. Use a table to create a dynamic chart
  • 22. Add a slicer to a table
  • 23. Get rid of a table

See also

  • Pinboard: 1 and 2

Profiling and Measuring time

How long does it take to eval a minicell query?

For example

  • When we compute something via Gunrock
  • When we generate GraphViz of something

Additionally, if we store the start timestamp of our computation, we can plot this information on a timeline!
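A minimal sketch of how an evaluation could be timed, assuming the time package (already in the Nix one-liner). The wrapper timed is a hypothetical name; it returns the start timestamp alongside the elapsed time so both can later be placed on a timeline.

```haskell
import Control.Exception (evaluate)
import Data.Time.Clock (NominalDiffTime, UTCTime, diffUTCTime, getCurrentTime)

-- Run an action and return its result together with the wall-clock start
-- timestamp and the elapsed time.
timed :: IO a -> IO (a, UTCTime, NominalDiffTime)
timed action = do
  start <- getCurrentTime
  -- Force the result to weak head normal form so that laziness does not
  -- push the work outside the measured interval.
  result <- action >>= evaluate
  end <- getCurrentTime
  return (result, start, diffUTCTime end start)
```

Wrapping a call such as eval model expr in timed would be one place to hook this in. Forcing only to weak head normal form means large lazy results, such as a graph, may still be partly unevaluated when the clock stops.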

See also
