kneasle / sapling

A highly experimental vi-inspired editor where you edit code, not text.

License: MIT License

Rust 100.00%

code-editor editor experimental rust structured-editing text-editor vim

sapling's Introduction

Hello, I'm Ben White-Horne

I'm a software developer based in Cambridge and I currently work for MathWorks writing C++. I generally focus on writing low-level code with a focus on reliability and performance.

Outside of work I often work on side projects which can be found either here on GitHub or on my website. I also play trombone and drums, ring and make compositions for church bells, make lots of geometric modular origami, and play pool and snooker.


sapling's People

Contributors

Stargazers

Watchers

Forkers

zmilan kixunil wardtoulet protowalker chengchingwen stokhos gopherj jaywalker76 ayourtch nrdxp plecra sjoshid kuator xa888s icodein fvdveen iq-scm ndrezn nyngwang

sapling's Issues

Add integration tests for DAG

Currently there's no automated testing of whether or not the commands do what they are intended to do.

I'm not sure exactly how straightforward this is going to be; if you want to have a crack at it but are stuck with what to do, comment here and I'll give pointers!

Replace the current logs with a printout of what the user is typing

This'd be really useful for the audience of my livestreams, but we do need to redirect the logging elsewhere.

Add a way to print sequences of `Key`s to strings

In a yet-to-be-published PR, I've made normal_mode receive tuikit::Keys instead of chars. However, there isn't a function to convert multiple Keys into one string for easy displaying. We'd probably want a function in core to do that.
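Here is a minimal sketch of such a function. It uses a simplified stand-in for tuikit's Key, and the variants below are hypothetical rather than tuikit's actual enum. Printable characters are shown as-is, and special keys use Vim-style angle-bracket names:

```rust
/// Hypothetical, simplified stand-in for tuikit's `Key`; the real enum has many more variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Char(char),
    Ctrl(char),
    Enter,
    Esc,
    Backspace,
}

/// Renders a single key, using Vim-style `<...>` names for non-printable keys.
fn key_to_string(key: &Key) -> String {
    match key {
        Key::Char(c) => c.to_string(),
        Key::Ctrl(c) => format!("<C-{}>", c),
        Key::Enter => "<CR>".to_string(),
        Key::Esc => "<Esc>".to_string(),
        Key::Backspace => "<BS>".to_string(),
    }
}

/// Concatenates a whole key sequence into one displayable string.
pub fn keys_to_string(keys: &[Key]) -> String {
    keys.iter().map(key_to_string).collect()
}
```

A sequence like `4j` would therefore render as the string "4j", and an escape followed by Ctrl-r as "<Esc><C-r>".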

Incorrect cursor position after undo

After undo, the cursor should be returned to where it was just before that change was made. However, this is not the case - the cursor is moved to its location after the previous edit.

I think the cause of this is that there's some confusion over what the Paths in Dag::root_history are supposed to signify - in a snapshot (&'arena Node, Path), the Path should be what Dag::current_cursor_path should be set to when that snapshot is loaded in the undo history, but it's too late at night for me to puzzle this out.

Ideally, we'd store the history as an alternating list of Paths and Roots (always starting and ending with a Path) and then there'd be no ambiguity over what the Paths mean. Such a datatype would be an interesting but hard project with a fair amount of unsafe.

This is related to #27.
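You can get that alternation without a hand-rolled interleaved datatype, and without unsafe code, by keeping two vectors under the invariant paths.len() == roots.len() + 1. Read them as path, root, path, root, ..., path. The sketch below is hypothetical. It assumes Path is a list of child indices and leaves Root generic, so it is not a drop-in replacement for Dag::root_history. One consequence of this layout is that each path between two roots ends up holding the cursor position just before the later edit.

```rust
/// Assumed model of a cursor path: child indices from the root.
pub type Path = Vec<usize>;

/// Undo history read as `paths[0], roots[0], paths[1], roots[1], ..., paths[n]`.
/// Invariant: `paths.len() == roots.len() + 1`, so it always starts and ends with a path.
pub struct History<Root> {
    paths: Vec<Path>,
    roots: Vec<Root>,
}

impl<Root> History<Root> {
    pub fn new(initial_cursor: Path) -> Self {
        History { paths: vec![initial_cursor], roots: Vec::new() }
    }

    /// Records a new edit: the trailing path becomes "cursor just before this edit",
    /// then the new root and the cursor just after the edit are appended.
    pub fn record_edit(&mut self, cursor_before: Path, new_root: Root, cursor_after: Path) {
        *self.paths.last_mut().expect("invariant: paths is never empty") = cursor_before;
        self.roots.push(new_root);
        self.paths.push(cursor_after);
    }

    /// Where undoing edit `i` should put the cursor.
    pub fn cursor_before(&self, i: usize) -> &Path {
        &self.paths[i]
    }

    /// Where redoing edit `i` should put the cursor (shares a slot with `cursor_before(i + 1)`).
    pub fn cursor_after(&self, i: usize) -> &Path {
        &self.paths[i + 1]
    }

    pub fn root(&self, i: usize) -> &Root {
        &self.roots[i]
    }
}
```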

Remap `d` to `x`

Since d currently deletes the node under the cursor, it would make more sense to be consistent with Vim and map x to this instead.

Allow nodes to be RCed and deallocated from the arena

Currently, Sapling will never free up space in the arena even if history is rewritten and some trees will never be reached. This is essentially a memory leak, and will cause Sapling to just accumulate memory over time.

Some kind of garbage collection has to happen, because Sapling's arena allocator only produces immutable references to the nodes. Fortunately, because of the tree structure of the AST, AST nodes can't form reference cycles, so standard reference counting should suffice to prevent memory leaks. However, some custom drop code will be required (because we can't simply deallocate small parts of memory owned by the Arena - we need to notify the arena that this cell isn't being used).

So I think the solution is going to have to involve some kind of smart pointer that does reference counting but, rather than deallocating the memory it points to, notifies the Arena whenever the reference count hits 0. Also, AST nodes in the arena currently sit inside Item structures (which should probably be renamed to Cell anyway), which allows us to add extra fields for things like RC and memoisation of expensive functions (string conversions, etc.).

My proposed solution would be something like having a Ref<'arena, Node> smart pointer, which contains an RCed reference to an arena::Item<Node>, which can then be deref-ed into just a plain &'arena Node.

Node memory management is definitely not something that I want to screw up, and I've never had to deal with smart pointers or RCing before so if someone who knows more is happy to implement this then feel free :).
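Below is a simplified sketch of the proposed Ref<'arena, Node>. It lets std::rc::Rc handle the counting and puts the notification in a Drop impl on the item. It differs from the real proposal in two ways. First, this hypothetical Arena only records the IDs of freed cells and does not own the node memory. Second, Ref derefs to a &T tied to the handle rather than to a plain &'arena Node. A real arena would also reuse the freed cells.

```rust
use std::cell::{Cell, RefCell};
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical arena that hands out IDs and records which cells are no longer in use.
#[derive(Default)]
pub struct Arena {
    next_id: Cell<usize>,
    freed: RefCell<Vec<usize>>,
}

/// Plays the role of `arena::Item`: the node plus its bookkeeping.
pub struct Item<'arena, T> {
    arena: &'arena Arena,
    id: usize,
    value: T,
}

impl<T> Drop for Item<'_, T> {
    // Runs exactly once, when the last `Ref` pointing at this item is dropped
    fn drop(&mut self) {
        self.arena.freed.borrow_mut().push(self.id);
    }
}

/// Reference-counted handle: cloning bumps the count, dropping decrements it.
pub struct Ref<'arena, T>(Rc<Item<'arena, T>>);

impl<T> Clone for Ref<'_, T> {
    fn clone(&self) -> Self {
        Ref(Rc::clone(&self.0))
    }
}

impl<T> Deref for Ref<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0.value
    }
}

impl Arena {
    pub fn alloc<T>(&self, value: T) -> Ref<'_, T> {
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        Ref(Rc::new(Item { arena: self, id, value }))
    }

    /// IDs of cells whose reference count has reached zero.
    pub fn freed_ids(&self) -> Vec<usize> {
        self.freed.borrow().clone()
    }
}
```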

Macro to generate `AstClass` enums

On the back of #71, there will be a lot of easily macro-able code, which we should make a standard macro for generating.

Quit on Enter

I noticed that when I hit enter in normal mode, sapling quits

Implement command mode

To make implementation easier, the different modes in Sapling are implemented as a state machine (technically a DFA). So each 'mode' should have a struct which implements editor::state::State (like editor::normal_mode::State).

The most useful part of this trait is the 'transition function', which is run every time the user presses a key. It returns Box<dyn state::State>, which allows for state transitions on key presses (e.g. if the user presses : in normal mode, it should return the default command mode state to enter command mode).

So implementing command mode basically boils down to the following:

Create a new State for command mode (would make sense to put it in editor/command_mode.rs).
Add an extra keybinding for : in normal mode, which switches the editor into command mode (see quitting the editor with q for how to do this).
Move w and q from normal mode into command mode, and if you want you can also add a command to write the .dot code for the Dag to either the log or a file.

If you want more pointers, then just ask 😁 - the state transition code is quite unintuitive.
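Here is a minimal sketch of that DFA. It assumes a simplified version of the trait in which the transition function consumes the boxed state and returns the next one. The State trait, its name method (added only so the result is observable) and the mode structs are all hypothetical, not Sapling's actual editor::state API.

```rust
/// Hypothetical, simplified stand-in for `editor::state::State`.
trait State {
    /// The transition function: run on every key press, returns the next state.
    fn transition(self: Box<Self>, key: char) -> Box<dyn State>;
    /// Name of the mode (only here so the sketch is observable).
    fn name(&self) -> &'static str;
}

struct NormalMode;

/// Command mode accumulates the typed command until Enter is pressed.
struct CommandMode {
    buffer: String,
}

struct Quit;

impl State for NormalMode {
    fn transition(self: Box<Self>, key: char) -> Box<dyn State> {
        match key {
            ':' => Box::new(CommandMode { buffer: String::new() }),
            _ => self, // other normal-mode keys are elided in this sketch
        }
    }
    fn name(&self) -> &'static str {
        "normal"
    }
}

impl State for CommandMode {
    fn transition(mut self: Box<Self>, key: char) -> Box<dyn State> {
        match key {
            // Enter: run the command, then leave command mode
            '\n' => match self.buffer.as_str() {
                "q" => Box::new(Quit),
                _ => Box::new(NormalMode), // unknown commands are ignored here
            },
            // Escape: abandon the command
            '\u{1b}' => Box::new(NormalMode),
            c => {
                self.buffer.push(c);
                self
            }
        }
    }
    fn name(&self) -> &'static str {
        "command"
    }
}

impl State for Quit {
    fn transition(self: Box<Self>, _key: char) -> Box<dyn State> {
        self
    }
    fn name(&self) -> &'static str {
        "quit"
    }
}

/// Feeds a sequence of keys through the DFA and reports the final mode.
fn run(keys: &str) -> &'static str {
    let mut state: Box<dyn State> = Box::new(NormalMode);
    for key in keys.chars() {
        state = state.transition(key);
    }
    state.name()
}
```

Taking `self: Box<Self>` lets a state either return itself unchanged or hand back a completely different state, without any enum of all modes.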

Wanted to let you know about my blog post

https://mightyiam.medium.com/edit-code-as-code-e51ead8522b5

Add type-safe intermediate step for node names

Currently, when a command that requires a type argument (e.g. insertions) is run, the char is not resolved until many of the checks have already been performed. This ends up with a whole load of craziness, with the error messages not being correct (e.g. EditErr::CharNotANode sometimes gets returned at the wrong moment).

@stokhos, if you want to do this then comment here and I'll give you more info - this might be a bit fiddly but you've been doing very well with the other fiddly things so I'm sure you'll be fine.
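Here is a sketch of what the intermediate step could look like. The typed char is resolved into a dedicated enum first, so CharNotANode can only come from that one place, and every later check works on a value already known to be a valid node name. The key-to-node mapping (t, f, n, s, a, o) and all names below are assumptions for illustration.

```rust
/// Hypothetical resolved node name for JSON.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JsonNodeName {
    True,
    False,
    Null,
    Str,
    Array,
    Object,
}

#[derive(Debug, PartialEq, Eq)]
pub enum EditErr {
    CharNotANode(char),
    CannotBeChild { name: JsonNodeName, parent: &'static str },
}

impl JsonNodeName {
    /// Resolve the typed char once, before any tree checks run.
    pub fn from_char(c: char) -> Result<Self, EditErr> {
        match c {
            't' => Ok(Self::True),
            'f' => Ok(Self::False),
            'n' => Ok(Self::Null),
            's' => Ok(Self::Str),
            'a' => Ok(Self::Array),
            'o' => Ok(Self::Object),
            _ => Err(EditErr::CharNotANode(c)),
        }
    }
}

/// Later checks receive an already-valid name, so `CharNotANode` cannot appear here.
fn check_object_key(name: JsonNodeName) -> Result<(), EditErr> {
    match name {
        JsonNodeName::Str => Ok(()),
        other => Err(EditErr::CannotBeChild { name: other, parent: "object key" }),
    }
}

pub fn insert_as_object_key(c: char) -> Result<JsonNodeName, EditErr> {
    let name = JsonNodeName::from_char(c)?; // step 1: char -> name
    check_object_key(name)?; // step 2: structural checks on a typed value
    Ok(name)
}
```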

Combine Ast::is_insert_char and Ast::is_replace_char

Both Ast::is_insert_char and Ast::is_replace_char serve the same purpose: performing tree validation and preventing illegal nodes from being added to the tree. Therefore, it makes more sense to have one method that also takes a child index, like Ast::is_valid_child(&self, index: usize, c: char) -> bool.

In fact, Ast::is_replace_char is not correct - for example, a JSON string could be either the key of an object (as in {"string": true}) or a first-class JSON value in its own right (like ["string"]). In the latter case the string may be replaced, but in the former it can't. We need to know what the parent is (in fact, knowing only the parent is enough if the grammar is context-free).

This shouldn't be too hard; maybe not a first issue but it doesn't require lots of code-base knowledge - the compiler errors and unit tests should provide guidance.
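Here is a sketch of the combined check. It uses a hypothetical model in which each field (the implicit nodes that objects create around their children) has the key at index 0 and the value at index 1. The node-name chars and the Parent enum are assumptions.

```rust
/// Hypothetical parent kinds for the sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Parent {
    Array,
    Field,
}

/// Chars that name a JSON value (assumed keys: t, f, n, s, a, o).
fn is_value_char(c: char) -> bool {
    matches!(c, 't' | 'f' | 'n' | 's' | 'a' | 'o')
}

/// One check for both insertion and replacement: can `c` be the child at `index` of `parent`?
pub fn is_valid_child(parent: Parent, index: usize, c: char) -> bool {
    match parent {
        // Any JSON value may appear in an array
        Parent::Array => is_value_char(c),
        Parent::Field => match index {
            0 => c == 's', // keys must be strings
            1 => is_value_char(c),
            _ => false, // a field only has two children
        },
    }
}
```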

Rename current 'Command' to make way for actual commands

Currently all the code that refers to a 'command' is actually referring to a normal-mode keystroke command, as opposed to proper commands which start with : (like :quit in Vim).

I'm not sure what to rename current commands to... NormalModeCommand or KeyStrokeCommand? Both feel quite long, so perhaps we could call it something like KeyCmd.

We should also probably refactor the whole normal-mode command system into its own module, but that's some refactoring for another day.

This issue is probably quite easy to fix - all the changes are confined to src/editor.rs, and it should be a fairly simple refactoring job. Not the most exciting first issue, but straightforward nonetheless.

Go to previous cursor location

Do you think it is necessary to have a key that moves the cursor to its previous location?

For example, if the cursor is inside the array below and I accidentally press p, the cursor moves to the parent, and I have to press c and 4j to move back to the previous cursor location. This is very inconvenient.

[
  true,
  false,
  false,
  { "key-1" : true, "key-2" : false },
]

Replacing fields results in an invalid tree

Suppose that the cursor is over a field in an object (cursor in <>):

{
     <"key": null>
}

Typing rt at this point results in the following invalid object:

{
    <true>
}

AFAIK, this is not an easy fix (at least, no easy fix that actually solves the cause of the problem). The main issue is that objects implicitly create fields to contain their children, a very nice UX feature that causes some jank with tree validation.

Wrap operation

One editing operation that often comes up is wrapping a subtree in another node.
An example that happened to me recently in Rust is changing Type to Option<Type> in several places.
It'd be really great if we could have such a wrapping operation built in, as that'd help with these. However, there's an interesting case - what if I wanted to wrap into Result? Which type parameter should be picked?

I envision something like this:

wo - wrap as Option<T>; no index is needed as Option has only one type parameter
w1r - wrap as Result<T, _>
w2r - wrap as Result<_, T>
w1o - same as wo
wr - error because of ambiguity (but equivalent to w1r would also make sense as this operation is more common in case of Result)

Dot command should work with this too, also multi-cursor operations. (Select all occurrences of Type within a subtree and wrap them.)
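Here is a small sketch of the index semantics proposed above, on a toy type AST rather than Sapling's trees. Ty, wrap and WrapError are hypothetical names. The 1-based index mirrors w1r/w2r, and a missing index is only accepted for single-parameter wrappers, so this sketch picks the error-on-ambiguity option for wr.

```rust
use std::fmt;

/// Toy type AST for the sketch (hypothetical, not Sapling's).
#[derive(Debug, Clone, PartialEq)]
pub enum Ty {
    Named(String),
    Generic(String, Vec<Ty>),
    Hole, // rendered as `_`
}

impl fmt::Display for Ty {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Ty::Named(n) => write!(f, "{}", n),
            Ty::Hole => write!(f, "_"),
            Ty::Generic(n, args) => {
                write!(f, "{}<", n)?;
                for (i, arg) in args.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{}", arg)?;
                }
                write!(f, ">")
            }
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum WrapError {
    Ambiguous,
    IndexOutOfRange,
}

/// Wraps `inner` as parameter `index` (1-based) of `wrapper`, which has `arity` parameters.
/// The other parameters are left as holes (`_`).
pub fn wrap(inner: Ty, wrapper: &str, arity: usize, index: Option<usize>) -> Result<Ty, WrapError> {
    let index = match index {
        Some(i) => i,
        None if arity == 1 => 1, // `wo`: the only parameter
        None => return Err(WrapError::Ambiguous), // `wr`
    };
    if index == 0 || index > arity {
        return Err(WrapError::IndexOutOfRange);
    }
    let mut args = vec![Ty::Hole; arity];
    args[index - 1] = inner;
    Ok(Ty::Generic(wrapper.to_string(), args))
}
```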

Idea: have editing operations specific to source editing types

So, there are a few things I can do if I’m on an AST node (e.g. a variable). I could edit the string that’s the current leaf, e.g.

“Emacs -> emacs”.

I could also try and replace this concrete instance of a variable with a variable of the same kind,

“emacs -> kakoune”

But in the general case, it might make sense to rename all instances of the same string,

“emacs -> editor”

Because you’ve chosen modal editing as a paradigm, it may be useful to replace the c*, a* and i* commands.

Any other things that I can work on?

Crash when inserting an invalid char into an array or object

Inserting an invalid char c into an array or object causes a crash.

This problem is related to cursor location.

sapling/src/editor/dag.rs

Lines 410 to 415 in 9fed519

    if !parent.is_valid_child(cursor_index, c) {
        // Short circuit if `c` couldn't be a valid child of the cursor
        return Err(EditErr::CannotBeChild {
            c,
            parent_name: cursor.display_name(),
        });

String editing

Sooner or later we need to figure out how to do string editing - e.g. "foo" in Json or <foo> in XML.
I only have a rough idea now, but some things that make sense to me:

Have a "stringly" node (defined below) that can be edited like a string.
Allow languages to define escaping and unescaping functions - then the user can write unescaped text and it will become escaped when leaving string editing mode. This can be incredibly helpful, as people often forget the escaping rules of different languages. This feature may be optional, but it is probably easier to write it as mandatory for now.
I don't believe it's a good idea to represent strings as lists of chars - likely not useful and a waste of memory

The API of stringly node can be described by this trait (which may or may not be an actual trait)

use std::borrow::Cow;
use std::ops::Deref;

// Placeholder: the proposal doesn't define this error type yet
pub struct SetStringError;

trait StringlyNode {
    // Cow can save some allocations however we should inform the implementors which representation is more efficient.
    // My guess for now is that it'll be more efficient to store escaped version as unescaped one will be only used in string edit mode
    // Or maybe pass fmt::Write?
    fn get_string_unescaped(&self) -> Cow<'_, str>;
    // The user has to escape manually
    fn get_string_escaped(&self) -> Cow<'_, str>;
    // this generic can save some allocations while not caring about `&str` vs `String`
    // returns Err if the string contains banned chars
    // Alternatively we could use Cow to make this object safe
    fn set_string_unescaped<S: Deref<Target=str> + Into<String>>(&mut self, string: S) -> Result<(), SetStringError>;
    fn set_string_escaped<S: Deref<Target=str> + Into<String>>(&mut self, string: S) -> Result<(), SetStringError>;
    // Validates the string; may be called after the user types each char, reverting the change if this fn returns false
    fn is_string_valid(&self, string: &str) -> bool;
}

I imagine this flow (pseudo code):

// when string edit mode is entered:
// the returned value implements StringlyNode
// returns Err if the node is not stringly
let stringly_node = tree.get_stringly_node(cursor)?;
let edited_string = String::from(stringly_node.get_string_unescaped());
self.mode = Mode::StringEdit { stringly_node, edited_string };

// when leaving string edit mode:
match stringly_node.set_string_unescaped(edited_string) {
    Ok(()) => self.mode = Mode::Normal,
    Err(error) => log_error!(error),
}
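Here is a sketch of the per-language escaping and unescaping functions for JSON. A stringly JSON string node could use them to back get_string_escaped and set_string_unescaped. The function names are hypothetical. The escape rules follow the JSON string grammar, with one simplification: \uXXXX escapes for UTF-16 surrogates are rejected rather than combined into pairs.

```rust
/// Escapes text for inclusion in a JSON string literal (without the surrounding quotes).
pub fn escape_json(unescaped: &str) -> String {
    let mut out = String::with_capacity(unescaped.len());
    for c in unescaped.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Other control characters must use the \u00XX form
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Reverses `escape_json`; returns `None` for malformed escapes.
pub fn unescape_json(escaped: &str) -> Option<String> {
    let mut out = String::with_capacity(escaped.len());
    let mut chars = escaped.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next()? {
            '"' => out.push('"'),
            '\\' => out.push('\\'),
            '/' => out.push('/'),
            'b' => out.push('\u{8}'),
            'f' => out.push('\u{c}'),
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            'u' => {
                let hex: String = chars.by_ref().take(4).collect();
                if hex.len() != 4 {
                    return None;
                }
                let code = u32::from_str_radix(&hex, 16).ok()?;
                // `from_u32` returns None for surrogates (the simplification noted above)
                out.push(char::from_u32(code)?);
            }
            _ => return None,
        }
    }
    Some(out)
}
```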

OT: so far I haven't had as much time to look at XML as I'd hoped. I'm doing it now and maybe a bit tomorrow, but I don't feel that great so I may be unable to finish it.

No stream today (5th December)

I've just got home from uni, and I can't get my laptop up and running in time.

It will be happening next week - I just can't find a better way to let people know than making github issues 😆.

Create a macro for generating TestJSON trees.

Currently, Sapling has two datatypes for JSON trees (JSON is analogous to &str whereas TestJSON is analogous to String). As the name suggests, TestJSON is used primarily for easily creating test cases, but it is usually more verbose than JSON because of its long name. If we could create a macro for this then it would be very epic.

Something like:

test_json!([true, false, {"key" => "value"}])

should expand to:

TestJSON::Array(vec![
  TestJSON::True,
  TestJSON::False,
  TestJSON::Object(vec![
    (
      TestJSON::String("key".to_string()),
      TestJSON::String("value".to_string())
    )
  ])
]);

anyone who wants to have some fun with macros and knows what they're doing is welcome to do this 😁.
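Here is one possible implementation using macro_rules!. It assumes TestJSON has the variants shown above plus a Null variant; that variant and the exact enum definition below are assumptions. Each array element, key and value must be a single token tree, which holds for string literals, true/false/null and nested [...] or {...}.

```rust
/// Assumed definition of `TestJSON` (the `Null` variant is a guess).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TestJSON {
    True,
    False,
    Null,
    String(String),
    Array(Vec<TestJSON>),
    Object(Vec<(TestJSON, TestJSON)>),
}

macro_rules! test_json {
    (true) => { TestJSON::True };
    (false) => { TestJSON::False };
    (null) => { TestJSON::Null };
    // Arrays: each element is one token tree, expanded recursively
    ([ $($elem:tt),* $(,)? ]) => {
        TestJSON::Array(vec![ $( test_json!($elem) ),* ])
    };
    // Objects: `key => value` pairs, matching the syntax proposed above
    ({ $($key:tt => $value:tt),* $(,)? }) => {
        TestJSON::Object(vec![ $( (test_json!($key), test_json!($value)) ),* ])
    };
    // Anything else must be a string literal
    ($s:literal) => { TestJSON::String($s.to_string()) };
}
```

The keyword rules come first so that true, false and null are matched before the catch-all literal rule.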

Refactor the command system

Currently we have two structures sapling::editor::Action and sapling::editor::Command, which have a lot of mutual duplication. It would be really nice to have a way to combine these two into one unified system, since this would make adding extra commands much easier.

Add syntax highlighting

As stated in the README, syntax highlighting is pretty straightforward to implement, since the AST is already parsed into text tokens and whitespace. So we just need to assign each text token to a syntax category when rendering, and then we get syntax highlighting.
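Here is a sketch of the categorisation step. It assumes the renderer hands each text token over as a &str. The Category enum, the JSON-specific rules and the ANSI colour choices are all hypothetical.

```rust
/// Hypothetical syntax categories for JSON tokens.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Category {
    Keyword,
    String,
    Number,
    Punctuation,
    Other,
}

/// Assigns a category to a single text token.
pub fn categorise(token: &str) -> Category {
    match token {
        "true" | "false" | "null" => Category::Keyword,
        "{" | "}" | "[" | "]" | ":" | "," => Category::Punctuation,
        t if t.starts_with('"') => Category::String,
        t if t.parse::<f64>().is_ok() => Category::Number,
        _ => Category::Other,
    }
}

/// Wraps a token in the ANSI colour escape for its category.
pub fn highlight(token: &str) -> String {
    let colour = match categorise(token) {
        Category::Keyword => "35",     // magenta
        Category::String => "32",      // green
        Category::Number => "33",      // yellow
        Category::Punctuation => "90", // bright black (grey)
        Category::Other => return token.to_string(),
    };
    format!("\x1b[{}m{}\x1b[0m", colour, token)
}
```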

Allow `h`/`l` to move to previous and next sibling

This would mean that h and k both map to Direction::Prev while j and l both map to Direction::Next. This is because the current keybindings are very unintuitive if the sibling nodes are arranged horizontally, as is fairly common (e.g. function parameters).

This would make a good first issue; the only changes required are in default_keymap in src/editor.rs. Remember to update the README.
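The mapping itself is small. Here is a sketch as a plain match; the real default_keymap may well be a table rather than a function, and sibling_direction is a hypothetical name.

```rust
/// Mirrors `Direction::Prev`/`Direction::Next` from the issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Direction {
    Prev,
    Next,
}

/// Sibling movement: `h`/`k` go to the previous sibling, `j`/`l` to the next one.
pub fn sibling_direction(key: char) -> Option<Direction> {
    match key {
        'h' | 'k' => Some(Direction::Prev),
        'j' | 'l' => Some(Direction::Next),
        _ => None, // not a sibling-movement key
    }
}
```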

Add keybindings for deleting nodes

For the time being, I think I should initially implement just d to delete the node under the cursor.

Add more logging messages

Ideally, we'd have log::trace! in every function, and log::debug! when debug output might be useful.

Incorrect cursor location after undo

After undo, the cursor should be returned to where it was just before that change was made. However, this is not the case - the cursor is moved to its location after the previous edit.

Alternative to "go to parent/child/sibling" navigation

I saw there was an issue about Leap Technology (#44). I admit I haven't read the whole discussion, but I agree with @kneasle that it seems just like / and ? in vi. I don't use / for navigation because identical words often repeat throughout the code. On the other hand, I also agree with the author of the issue that navigating the tree is much more tedious than using a mouse or just moving a cursor.

I propose an alternative I fell in love with: the jump mode in amp. In this mode, amp overlays short ids on top of the code: each item you can jump to (basically every word) is assigned a short id (2/3 letters). When you type the id, amp instantly moves the cursor to the beginning of that item. This solves the problem of identical words.

This solution is also used in browser extensions which enable "vim mode".
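Here is a sketch of how such labels could be generated. Every target gets a label of the same length, the shortest length that gives enough combinations, so no label is a prefix of another and typing one is never ambiguous. jump_labels is a hypothetical name, and fixed-length labels are a simplification; amp may assign labels differently.

```rust
/// Assigns each of `count` jump targets a label drawn from `alphabet`.
pub fn jump_labels(count: usize, alphabet: &[char]) -> Vec<String> {
    assert!(alphabet.len() >= 2, "need at least two label characters");
    let base = alphabet.len();
    // Smallest label length such that base^len >= count
    let mut len = 1;
    let mut capacity = base;
    while capacity < count {
        len += 1;
        capacity *= base;
    }
    (0..count)
        .map(|mut i| {
            // Write `i` in base `base`, most significant digit first, padded to `len` digits
            let mut digits = vec![alphabet[0]; len];
            for slot in digits.iter_mut().rev() {
                *slot = alphabet[i % base];
                i /= base;
            }
            digits.into_iter().collect::<String>()
        })
        .collect()
}
```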

Avoid cloning midstep

Currently, when editing a node, the code clones the node, edits it mutably and then stores the result. This is slower than an alternative approach: as a collection is being cloned one item at a time, the node to be edited (replaced or deleted) is compared against the item being cloned and if they match, the new one is inserted instead of the old one (or skipped in case of deletion).

I imagine signatures roughly like this:

fn delete_child(&self, child_index_to_delete: usize) -> Result<Self, Self::Error>;
fn replace_child(&self, child_index_to_replace: usize, new_child: Self::Child) -> Result<Self, Self::Error>;
fn insert_child(&self, index: usize, side: Side, new_child: Self::Child) -> Result<Self, Self::Error>;

One interesting property of this is that it also statically guarantees that the editor won't ignore the error and store a malformed node somewhere.
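Here is a sketch of the one-pass approach on a hypothetical array node. The new children vector is built by copying the untouched slices around the edited index, so the node being replaced or deleted is never cloned and the original node is left intact. These are inherent methods rather than trait methods with Self::Child and Self::Error as above, and the Side variants (Prev, Next) are assumptions.

```rust
/// Which side of the child at `index` to insert on (variant names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Prev,
    Next,
}

#[derive(Debug, PartialEq, Eq)]
pub enum EditError {
    IndexOutOfRange(usize),
}

/// Hypothetical node whose children are cheap to clone (e.g. arena references).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ArrayNode<C> {
    pub children: Vec<C>,
}

impl<C: Clone> ArrayNode<C> {
    fn check(&self, index: usize) -> Result<(), EditError> {
        if index < self.children.len() {
            Ok(())
        } else {
            Err(EditError::IndexOutOfRange(index))
        }
    }

    /// Copies everything except the child at `index`.
    pub fn delete_child(&self, index: usize) -> Result<Self, EditError> {
        self.check(index)?;
        let mut children = Vec::with_capacity(self.children.len() - 1);
        children.extend_from_slice(&self.children[..index]);
        children.extend_from_slice(&self.children[index + 1..]);
        Ok(ArrayNode { children })
    }

    /// Copies the children, putting `new_child` at `index` instead of cloning the old one.
    pub fn replace_child(&self, index: usize, new_child: C) -> Result<Self, EditError> {
        self.check(index)?;
        let mut children = Vec::with_capacity(self.children.len());
        children.extend_from_slice(&self.children[..index]);
        children.push(new_child);
        children.extend_from_slice(&self.children[index + 1..]);
        Ok(ArrayNode { children })
    }

    /// Copies the children, adding `new_child` before or after the one at `index`.
    pub fn insert_child(&self, index: usize, side: Side, new_child: C) -> Result<Self, EditError> {
        self.check(index)?;
        let split = match side {
            Side::Prev => index,
            Side::Next => index + 1,
        };
        let mut children = Vec::with_capacity(self.children.len() + 1);
        children.extend_from_slice(&self.children[..split]);
        children.push(new_child);
        children.extend_from_slice(&self.children[split..]);
        Ok(ArrayNode { children })
    }
}
```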

(I wanted to attend the stream but mismanaged my schedule. :( )

Have undo and redo keep the cursor in the same/similar position in the tree

replace crashes sapling

For an array of size 3, with the cursor on the 1st node, typing 4rt crashes sapling

Add `i` and `a` as keybindings for insert before/after cursor

Rename `CursorPath` to simply `Path`

I think this is fine, despite the potential naming collision with std::path::Path on the basis that we are unlikely to use both in the same code. @Kixunil - thoughts?

This'd be a fairly straightforward refactoring job - there might be some rogue pieces of documentation left to patch up but it should be nice and easy.

Make a uniform logging API

The current logging API has many issues that could do with being resolved:

It's entirely defined in editor.rs, even though it's unrelated to the editor functionality. It therefore feels illogical to use it elsewhere in the code.
It puts all the log messages in a (non-cell) field of editor::Editor, which requires &mut self to do any logging, and also ties the logging functionality to the editor::Editor struct.

So ideally, we'd have a separate logging module that by default just forwards to stderr/eprintln, preferably with ANSI colouring for the log levels. We can sort out a better logging system later.

Thinking about this, there are almost certainly proper logging APIs out there that would stop us from reinventing the wheel.
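Here is a std-only sketch of such a module, assuming five levels and ANSI SGR colour codes. Because write_log is generic over std::io::Write, it can be tested against a Vec<u8>, and log needs no access to editor::Editor. All names are hypothetical. In practice, the log crate's facade (log::trace!, log::debug! and so on) plus a backend would avoid reinventing this.

```rust
use std::io::{self, Write};

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

impl Level {
    /// ANSI SGR colour code used for the level tag.
    fn colour(self) -> &'static str {
        match self {
            Level::Trace => "90",
            Level::Debug => "36",
            Level::Info => "32",
            Level::Warn => "33",
            Level::Error => "31",
        }
    }
}

/// Writes one coloured log line to any writer.
pub fn write_log<W: Write>(out: &mut W, level: Level, message: &str) -> io::Result<()> {
    writeln!(out, "\x1b[{}m[{:?}]\x1b[0m {}", level.colour(), level, message)
}

/// Default sink: stderr, with no dependency on the editor.
pub fn log(level: Level, message: &str) {
    // Logging must never crash the editor, so write errors are ignored
    let _ = write_log(&mut io::stderr(), level, message);
}
```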

Crash bug

Opening the default JSON tree, and then typing cxxxuR will cause Sapling to crash because it tries to move the cursor to the child of the root, which no longer exists.

I think this is caused by the requirement to save cursor locations after calling DAG::cursor_location. Therefore, if an edit moves the cursor then the new location won't be saved to the history. In most cases, this doesn't cause a crash but in this case the cursor movement was to avoid referencing nodes that don't exist and so this causes a crash.

Add better documentation for `Path`

Currently there is no good documentation for how Paths are used to traverse trees. This is fairly non-trivial and for the longevity of Sapling it would be good to document any non-trivial parts of the code. ~~This could probably be done in parallel with #45, which itself can't easily be done in parallel with #36, because it will generate merge conflicts.~~
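Here is a minimal model that such documentation could start from. It assumes a Path is the sequence of child indices to follow from the root, with the empty path meaning the root itself. This is an assumption about Sapling's Path, and Node is a toy tree, not Sapling's AST.

```rust
/// Toy tree used to illustrate path traversal.
#[derive(Debug)]
pub struct Node {
    pub name: &'static str,
    pub children: Vec<Node>,
}

/// Follows `path` (child indices, root first) down from `root`.
pub fn follow<'a>(root: &'a Node, path: &[usize]) -> Option<&'a Node> {
    let mut node = root;
    for &index in path {
        // None if the path points past the last child at this level
        node = node.children.get(index)?;
    }
    Some(node)
}
```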

Cursor location after insert and quit

I noticed that after an insert in the DAG, the cursor stays where it was. It would be convenient to move the cursor to the newly added node.

After quitting, the DAG still exists until it goes out of scope. Is this expected behaviour?

Allow specification of arbitrary syntax trees from grammars

As said in the README, this is a very difficult challenge. I believe that the closest project to this is tree-sitter, and to be pragmatic I think that (at least initially) Sapling should try to use tree-sitter as much as possible for parsing (even though it is not ideal for our use case). This way Sapling can actually function as an editor whilst we potentially build a more niche parsing system.

Leap technology instead of tens/hundreds of commands/operations/...

After seeing your videos and reading the rant on "text" (programming code) editors, I can see that the AST editing experience tailored for the underlying syntax/language/... leads to an excessive number of commands/operations/... that the user needs to learn first. It feels even more difficult than plain vi bindings.

Therefore I dare to propose adopting the "Leap technology" as the way to lessen the amount of things to learn.

Thoughts?

No stream 2020-12-26

Hey guys; there'll be no stream today - it's too close to Christmas. I will be back to streaming next week, though.

Rename `DAG` and `JSON` to pascal case

In the Rust API Guidelines, names which are acronyms should still be written as one word (like Ast is now). However, Sapling's code currently has DAG, JSON and TestJSON, which should be Dag, Json and TestJson respectively.

This will likely break any in-progress PRs. I have one in progress (but not published yet), so please don't do this until I remove this message. Thanks!

Allow exporting of the DAG datastructure to some graphing program

This would allow for better visualisation and debugging of the inner workings of the editable_tree::DAG data structure.

It would also be good to help new developers understand how Sapling stores nodes.
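Graphviz's DOT format is one natural target, and it matches the .dot output mentioned in the command mode issue. Here is a sketch that numbers nodes in pre-order and emits one edge per parent-child link, using a hypothetical Node type. A real DAG shares subtrees between history snapshots, so an exporter for it would need to key IDs by node identity (e.g. address) to avoid emitting shared nodes more than once.

```rust
use std::fmt::Write;

/// Minimal stand-in for a node (hypothetical, not Sapling's type).
pub struct Node {
    pub label: String,
    pub children: Vec<Node>,
}

/// Renders the tree as Graphviz DOT, numbering nodes in pre-order.
pub fn to_dot(root: &Node) -> String {
    let mut out = String::from("digraph {\n");
    let mut next_id = 0;
    write_node(root, &mut next_id, &mut out);
    out.push_str("}\n");
    out
}

/// Emits `node` and its subtree; returns the ID assigned to `node`.
fn write_node(node: &Node, next_id: &mut usize, out: &mut String) -> usize {
    let id = *next_id;
    *next_id += 1;
    // `{:?}` wraps the label in quotes and escapes inner quotes
    writeln!(out, "    n{} [label={:?}];", id, node.label).unwrap();
    for child in &node.children {
        let child_id = write_node(child, next_id, out);
        writeln!(out, "    n{} -> n{};", id, child_id).unwrap();
    }
    id
}
```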

is_insert_char seems to be redundant

I wanted to take a look at cleaning up that nesting in editor.rs and it seems to me that is_insert_char is not really needed as the following two functions essentially do the same check (or if they don't, they should).

But I'm not entirely sure, maybe I'm missing something. So I thought you may know the answer quicker than I can analyze it completely.

Code reorganisation

I think that all 'core' datatypes (editable_tree::{Direction, Side, cursor::Path}, ast::size::Size) should probably live in their own module. I'm not sure if this is a good/better way to group these datatypes, but I reckon we'll probably need a place for really basic datatypes at some point.

We could re-import them in main.rs, so all other modules can import them as simply crate::Path or crate::Size, but I'm not entirely convinced that this is useful, particularly with the easy name collisions with std::path::Path.

Once we've moved editable_tree::cursor::Path, the whole editable_tree module doesn't make any sense. So I think we should move editable_tree/mod.rs to editor/dag.rs, so that DAG is now editor::dag::DAG.

So pretty much the changes would be as follows:

editable_tree::{Direction, Side} -> core/mod.rs
ast::size::Size -> core/mod.rs
editable_tree::cursor::Path -> core/path.rs (now core::path::Path)
editable_tree/mod.rs -> editor/dag.rs (so DAG moves from editable_tree::DAG to editor::dag::DAG)
We should probably re-import core::path::Path as core::Path, so we'd add pub use path::Path to the top of core/mod.rs
Delete any now-empty files/directories

