1j01 / tracky-mouse
Mouse control via head tracking, as a cross-platform desktop app and JS library. eViacam alternative.
Home Page: https://trackymouse.js.org/
License: MIT License
Could make it feel more magical, and less technical.
Currently the app supports dwell clicking (hovering in one spot long enough to trigger a click), but a gesture would be faster.
There is already the facemesh library included for detecting the pose of the user's face, and whilst it doesn't look like it handles eye or eyebrow movements well enough, I think detecting a smile or open mouth should work fine.
It might cause accidental clicks sometimes though when it loses proper tracking of the face, we'll have to see if that's a problem.
You should still have the option of using dwell clicking.
See #1 for support for clicking by blinking.
Right now the dwell clicker can start before the camera stream has even started.
Now, it may be desirable to use the dwell clicker separately from the head tracking, with an eye tracker, especially once I make the dwell clicker more useful with knowledge of system controls (#40) and controls on web pages (#27). But if the head tracker is enabled, the dwell clicker shouldn't start until the head is tracked.
An idea seen in PolyMouse (research software that looks hard to install, but is also intriguing).
Similar to facial gestures (#25), vocalizations would be a lot faster to execute, and more relaxed than dwell clicking, and can allow natural extending of the click by extending the sound (for dragging things, or painting, etc.)
Actually a hybrid with visual detection of facial gesture (#25) might be good. Make a popping sound with your lips to start a click, and close your mouth to stop it.
That way, you can yawn without worrying about clicking (as opposed to an open mouth gesture), in theory.
And it could require mouth movement combined with sound so that it's less likely to interpret some background noise as a click.
It would also be possible to use pitch and volume to vary brush parameters like thickness and opacity, if the program acted as a tablet driver, or if software wasn't so stupid, at a fundamental level. (I don't know how to make a driver! And software would have to be configured with an understanding that "pen pressure" means "pitch" or what have you. Stupid. Not stupid enough that it's not a good goal, but I hate the fact that that's the best we could hope for, without custom software for every application, y'know?)
Or perhaps pitch could control speed of mouse movement with the head tracking... (Controlling something within the software like that would be easier to implement.)
There is so far a global shortcut, F9, which toggles both features on/off at once in the desktop app, with an overlay on the screen to tell you about the shortcut and the current state.
The same shortcut is listened for in the web library, although without an overlay message, and it doesn't toggle the dwell clicker.
It toggles the dwell clicker in the Electron app because I have it set up to toggle the dwell clicker when the head tracker is toggled.
There should be a way to toggle each feature on/off, visible in the UI. Perhaps just two checkboxes, or perhaps big toggle buttons in order to be more prominent.
Currently facial pose affects the cursor position. This will be a problem for facial gestures (#25) if every time you try to click it moves the cursor.
First of all, this could be solved by using the face pose directly, if it were fast and accurate enough. This would also allow rotation to move the cursor more naturally. Right now, the points being tracked lie on the surface of your face as projected into the 2D camera image, so movement slows and stops as a point reaches the edge of your projected face, that is, as the tangent of your face's surface at the tracked point becomes parallel to the camera. Put another way, as you rotate your head toward 90 degrees, the movement of the theoretical 3D point being tracked becomes parallel to the camera axis, and this parallel movement (depth) is not tracked.
Secondly, this could be solved by tracking a point on the head unaffected by facial gestures. Faces are, however, incredibly expressive! I see only a few candidates:
Oh, I just thought of a third option. Like the first, use the head rotation as tracked by facemesh, but since it's slow, use it only for movement not otherwise handled by the tracking points. Calculate the movement to be done by rotation, and subtract the movement theoretically already done by translation, by looking at either the tracking points' movement since the last facemesh update, or some projected points (based on the last facemesh result and the new one); not sure which. This is probably a terrible idea, though. It technically provides the best of both worlds, but in a likely confusing, unpleasant mixture.
It will have to work a little differently because on the web I have to use events (not query the current mouse position), whereas on the desktop I have to query the current mouse position (and don't have any mouse movement event available.)
I should at least share the timing constants, which may become configurable later.
This isn't as important as in the desktop app, but without this, in a drawing program like JS Paint, it jitters back and forth between the system mouse position and the head tracking mouse position, if you try to draw something with the mouse.
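The constants themselves could live in one tiny module shared by the web library and the desktop app. A minimal sketch, assuming a CommonJS module; the names and values here are made up, not the actual ones:

```js
// shared-timing.js: hypothetical shared module; names and values are illustrative.
module.exports = {
	takeOverMovementThresholdPx: 15, // manual movement needed to pause head tracking
	resumeAfterIdleMs: 2000,         // how long before head tracking takes over again
};
```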
It gets an error in the main process because the window is destroyed.
It should either unregister the shortcut when closing the app window, or the global shortcut handler should handle reopening the window when needed.
```
Uncaught Exception:
TypeError: Object has been destroyed
    at Function.<anonymous> (/Users/io/Projects/tracky-mouse/tracky-mouse-electron/src/electron-main.js:186:14)
```
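Either fix might look roughly like this. A sketch, not the actual code; the IPC channel name is made up:

```js
// Sketch: guard the handler against a destroyed window, and unregister the
// global shortcut when the window closes.
const { globalShortcut } = require("electron");

function registerToggleShortcut(mainWindow) {
	globalShortcut.register("F9", () => {
		// Guard against the window having been destroyed in the meantime.
		if (!mainWindow.isDestroyed()) {
			mainWindow.webContents.send("shortcut", "toggle-tracking");
		}
	});
	mainWindow.on("closed", () => {
		globalShortcut.unregister("F9");
	});
}
```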
Facemesh is run asynchronously in a Web Worker. It's sent camera frames, does its processing, then returns the face location predictions, but by the time it's processed it, the world has moved on: there have been multiple camera frames, and points tracked on the image have moved, moving the pointer (it doesn't wait for facemesh; it uses point tracking which is faster).
The goal is to place (and remove) tracking points on the image based on the face location, but the face location information we get is for an outdated camera frame.
Now, this generally isn't that much of a problem. It causes some jitter due to points being added and removed stupidly, but the software still generally tracks your head movements.
It can be a huge issue, though, if hardware acceleration is disabled, if I recall correctly.
Make facemesh fast enough to run it synchronously.
First I tried moving the facemesh result, translating it by the averaged movement of the tracking points during the time it took facemesh to return results. That kind of... almost works? But it's sort of backwards.
In order to compensate for the latency of results from the Worker, I made a "time travel" system:
...but it was too slow to actually do the replaying. Tracking the points is fast enough to do in realtime, but when time is bunched up like this, it causes a lot of lag, and makes it unusable.
To make this work I would need to optimize it.
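For reference, the replay idea was roughly this. A sketch with hypothetical helper names (`trackPoints`, `snapshotPoints`, `restorePoints`, `updatePointsFromFace`), not the actual code:

```js
// Sketch of the "time travel" idea: keep a history of recent frames, and when
// a stale facemesh result arrives, rewind to the frame it describes and replay.
const frameHistory = []; // ring buffer of { time, imageData, trackedPoints }

function onCameraFrame(imageData, time) {
	trackPoints(imageData); // fast optical-flow update, runs every frame
	frameHistory.push({ time, imageData, trackedPoints: snapshotPoints() });
	if (frameHistory.length > 30) frameHistory.shift();
}

function onFacemeshResult(faceLandmarks, frameTime) {
	// The result describes an old frame; apply the landmarks there, then
	// replay tracking over every frame since, to catch up to the present.
	const startIndex = frameHistory.findIndex((frame) => frame.time >= frameTime);
	if (startIndex === -1) return;
	restorePoints(frameHistory[startIndex].trackedPoints);
	updatePointsFromFace(faceLandmarks);
	for (let i = startIndex + 1; i < frameHistory.length; i++) {
		trackPoints(frameHistory[i].imageData); // this replay is what proved too slow
	}
}
```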
To be honest, I don't remember whether I implemented this in order to address that, but I feel like I made the adding and removing of points stricter, more constrained to the facemesh result, and even though the result is outdated, this actually helped a lot?
This isn't a huge issue at this point, as long as the hardware is good enough? Look, I worked on this a long time ago; I don't remember the situation.
Even switching between windows often causes the mouse to jump, because the head tracking gets lost, because the lighting changes, because the overall luminosity is different between the windows.
This happens when lighting isn't great, i.e. you don't have a lamp facing you, and the light of the screen is a significant factor in the overall amount of light reflecting off you.
I plan to include some guidance about lighting in the app, but I wonder if this could be improved by normalizing the luminosity histogram, perhaps selectively, using a vignette as a multiplier for how much each pixel matters, approximating where the user's face likely is, without being too specific or relying on facemesh too much. My reasoning for focusing the lighting normalization on the user is that the user's face is what's important to normalize to keep tracking working; the background scenery may not change nearly so much, since it's further behind, further away from the screen, and more affected by other light sources. If the background were counted equally to the face, the image of the face might not be normalized enough.
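A gain-based variant of that idea, as a sketch. It assumes we can process each camera frame's `ImageData`, and it only normalizes the weighted mean rather than the full histogram:

```js
// Sketch (an assumption, not the current implementation): normalize frame
// brightness with a center-weighted "vignette" so the face region dominates.
function normalizeLuminosity(imageData) {
	const { data, width, height } = imageData;
	const cx = width / 2, cy = height / 2;
	const maxDist = Math.hypot(cx, cy);
	let weightedSum = 0, weightTotal = 0;
	for (let y = 0; y < height; y++) {
		for (let x = 0; x < width; x++) {
			const i = (y * width + x) * 4;
			const luma = 0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2];
			// Vignette: pixels near the center (where the face likely is) matter more.
			const weight = 1 - Math.hypot(x - cx, y - cy) / maxDist;
			weightedSum += luma * weight;
			weightTotal += weight;
		}
	}
	const gain = 128 / (weightedSum / weightTotal); // keep weighted mean luma constant
	for (let i = 0; i < data.length; i += 4) {
		data[i] = Math.min(255, data[i] * gain);
		data[i + 1] = Math.min(255, data[i + 1] * gain);
		data[i + 2] = Math.min(255, data[i + 2] * gain);
	}
}
```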
Add options:
There is a feature for regaining mouse control (pausing the head tracker temporarily) by just moving the mouse normally.
You don't want to interrupt head tracking by accident, so there's a threshold of movement before the mouse will take control from the head tracker.
Continuing to move the mouse continues manual control, but it doesn't continue as easily as it could.
I think a smaller movement threshold for maintaining manual control vs taking over (interrupting) would be good.
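A sketch of the proposed hysteresis; the thresholds and helper names are made up:

```js
// Sketch: a lower threshold for keeping manual control than for taking it.
let manualControl = false;
const TAKE_OVER_THRESHOLD = 15; // px of movement needed to interrupt head tracking
const MAINTAIN_THRESHOLD = 3;   // px needed to keep manual control once you have it

function onPhysicalMouseMove(distanceMoved) {
	const threshold = manualControl ? MAINTAIN_THRESHOLD : TAKE_OVER_THRESHOLD;
	if (distanceMoved > threshold) {
		manualControl = true;
		resetManualControlTimeout(); // hypothetical: head tracking resumes after inactivity
	}
}
```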
Currently it always starts head tracking and dwell clicking when the app starts up.
This first needs #5.
The UI layout uses `vw`/`vh` units, which are percentages of the viewport size of the whole web page. This doesn't work for the embedding use case of the JS library.
The layout also relies on the current sizes of things outside the webcam view to define how the webcam view shrinks, which is fragile.
`ResizeObserver` could be used to make the layout more robust.
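A sketch of what that might look like; the selector and aspect ratio here are assumptions:

```js
// Sketch: size the webcam view from its container instead of the page viewport.
const container = document.querySelector(".tracky-mouse-webcam-view"); // assumed selector
const observer = new ResizeObserver((entries) => {
	for (const entry of entries) {
		const { width } = entry.contentRect;
		// Keep the webcam canvas at a 4:3 aspect ratio within its container.
		const canvas = container.querySelector("canvas");
		canvas.style.width = `${width}px`;
		canvas.style.height = `${(width * 3) / 4}px`;
	}
});
observer.observe(container);
```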
I just need to find a good storage option that works for both Electron and normal browsers.
And I should include a format version number for upgrading.
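One option, as a sketch: `localStorage` works in both Electron renderers and normal browsers. The key name, schema, and `upgradeSettings` helper are illustrative, not decided:

```js
// Sketch: versioned settings persistence that works in Electron and browsers.
const SETTINGS_KEY = "tracky-mouse-settings";
const FORMAT_VERSION = 1;

function saveSettings(settings) {
	localStorage.setItem(SETTINGS_KEY, JSON.stringify({ formatVersion: FORMAT_VERSION, ...settings }));
}

function loadSettings() {
	const json = localStorage.getItem(SETTINGS_KEY);
	if (!json) return null;
	const stored = JSON.parse(json);
	if (stored.formatVersion !== FORMAT_VERSION) {
		return upgradeSettings(stored); // hypothetical migration function
	}
	return stored;
}
```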
Similar to when you change your screen resolution, changing settings like sensitivity can make it hard to interact with your system if too extreme (e.g. mouse barely moves, or mouse zips from edge to edge).
So I think a "Do you want to keep these changes?" dialog would be helpful here.
I want a tutorial/guide/setup/help system built into the UI, with explanatory graphics, maybe even some animations.
Coach user on:
`guvcview` can magically fix a webcam not showing up (it worked for my Logitech C920 when it wouldn't show up in applications even after a restart, but was listed in `lsusb`) (source).
I should make the sliders bigger. Maybe add some tick marks, but that's just a bonus.
It's meant to be hidden from the taskbar, but it's showing up in Windows 11.
I've had nothing but problems with the Windows 11 taskbar, so I don't know if this is even my fault: icons missing, freezing, taskbar buttons overlapping, windows not minimizing, etc. Not to mention that they've removed the ability to choose which side of the screen the taskbar goes on! I would install Linux in a heartbeat, but I can't access my BIOS...
But hopefully it's a simple configuration issue.
I want to make a browser extension, which would:
There are a few projects that aim to provide Electron-like functionality without bundling Chromium.
Realistically, I'll probably stay with Electron for this project. I already have a lot of IPC set up between the three different processes (main, main window renderer, and screen overlay renderer) at this point. But I'm moving all my todos to issues, and including this for thoroughness. And I'd like to at least check out Apptron. It probably doesn't support screen-sized, always-on-top, click-through, transparent, frameless windows, being a new framework and all, as that's quite an edge case!
When you go to the edge of the screen, the pointer stops, and then if you move at all in the other direction (or if head tracking jitter is detected as movement in the other direction), it will move the pointer away from the edge of the screen.
This is useful for a simple form of calibration — move your head further past the edge of the screen so that when you come back it'll be offset by that distance, and you can reach further away from that edge.
But the hard stop at the edge of the screen makes it hard to click at the very edge, which is needed for some operating system features.
Adding a small margin outside of the screen, where it keeps track of the position, past where it can actually move the mouse, should make it easier to perform clicks on the edges and corners of the screen.
It shouldn't be too large as to affect your ability to calibrate as I described above.
To be clear, it's not that hard to click at the edge right now; you just have to keep moving towards the edge during the dwell time, so it doesn't detect anything as moving backwards.
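A sketch of the margin idea; `EDGE_MARGIN`, `screenWidth`, and `setMouseX` are illustrative names, not the actual ones:

```js
// Sketch: keep a virtual position that can overshoot the screen edge a little.
const EDGE_MARGIN = 50; // px of off-screen "give"; small, so calibration still works

let virtualX = 0; // unclamped x position maintained by the head tracker

function moveHorizontally(deltaX) {
	virtualX += deltaX;
	virtualX = Math.max(-EDGE_MARGIN, Math.min(virtualX, screenWidth - 1 + EDGE_MARGIN));
	// The real pointer stays on screen, but virtualX remembers the overshoot,
	// so jitter near the edge no longer pulls the pointer back inward immediately.
	setMouseX(Math.max(0, Math.min(virtualX, screenWidth - 1)));
}
```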
I want to start producing builds early. I can copy a lot of boilerplate/configuration from jspaint.
Electron and Electron Forge both pretend to be way simpler than they are in practice, but what are you gonna do?
There is a feature for regaining mouse control (pausing the head tracker temporarily) by just moving the mouse normally.
It returns to head tracking after a time. When it does, it jumps from wherever you moved the mouse to the location it would have been had it not switched to manual control temporarily.
This is confusing, because you can't see where the pointer is going to return to (often you can't find the cursor at all, because it's suddenly somewhere you're not expecting), and hazardous, because it starts dwell clicking right away.
I see two solutions to this:
To enhance dwell clicking, showing halos, centering the dwell click indicator, and using a UI element's bounding box as the region to dwell within to click it, we need information about UI elements under the cursor. For web pages, this may be done through a browser extension (#27), but for native apps we'll need to use system-specific accessibility APIs (or a cross-platform one, but that would probably bring too much cruft, and cause friction).
A good strategy might be to use a high level API for a proof of concept and then, to reduce unnecessary dependencies (to make it easier to install), try copying only the parts of code that are needed.
But it may be easier to just use the native system APIs directly from the start. Have to vibe it out.
Windows looks like it has a good API for this. I think this covers everything I need:
There is also Cobra WinLDTP, which shares a cross-platform API.
```lua
local position = currentElement:attributeValue("AXPosition")
```
There is also ATOMac - Automated Testing on Mac, the macOS version of LDTP.
I could use the LDTP API, or the underlying AT-SPI API.
In particular, `getobjectnameatcoords`, which takes some code from Accerciser.
If the user brushes their hair out of their eyes, the cursor is moved before tracking quickly corrects.
I like the idea of being perfectionistic about this: detecting if any object (such as a hand) occludes the face, and ignoring its movement, e.g. by removing tracking points.
Could detect objects by their differing optical flow... theoretically. Unless the hand comes to rest... face palm.
This is not worth it. Seriously, it's good already. It hardly interrupts for a second.
When the phone changes from portrait to landscape or vice versa, the aspect ratio isn't updated, and the picture becomes squashed.
The head tracking still functions, but it looks ugly.
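A sketch of a possible fix; the selectors are assumptions, not the real ones:

```js
// Sketch: re-read the camera's frame size when the viewport changes, since
// rotating the phone swaps the video dimensions.
window.addEventListener("resize", () => {
	const video = document.querySelector(".tracky-mouse-camera"); // assumed selector
	const canvas = document.querySelector(".tracky-mouse-canvas"); // assumed selector
	canvas.width = video.videoWidth;   // videoWidth/videoHeight reflect the
	canvas.height = video.videoHeight; // camera's actual frame size
});
```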
Sometimes I get up and pace around without thinking to turn off the dwell clicker, and end up closing windows while I'm not looking.
I'd rather it ignore my face and stop dwell clicking if I'm far away, and ignore any other faces in the background.
A setting to ignore faces smaller than some size threshold should suffice.
That said, it's an approximation of attention, which is an approximation of intention.
More explicit ways of triggering clicks may be much more important.
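Still, the size threshold itself would be tiny. A minimal sketch, assuming the face detection result exposes a bounding box; the threshold is made up:

```js
// Sketch: ignore faces that appear too small, i.e. too far away.
const MIN_FACE_WIDTH_RATIO = 0.15; // face must span at least 15% of the frame width

function shouldTrackFace(faceBox, frameWidth) {
	return faceBox.width / frameWidth >= MIN_FACE_WIDTH_RATIO;
}
```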
`mainWindow.webContents.send("shortcut-register-result", success);` only happens once in the app's lifetime, and in the renderer process, it looks for `window.shortcutRegisterSuccess` before attempting to listen for the global shortcut:

```js
if (window.onShortcut && window.shortcutRegisterSuccess) {
	window.onShortcut(handleShortcut);
} else {
	addEventListener("keydown", (event) => {
		// Same shortcut as the global shortcut in the electron app (is that gonna be a problem?)
		if (!event.ctrlKey && !event.metaKey && !event.altKey && !event.shiftKey && event.key === "F9") {
			handleShortcut("toggle-tracking");
		}
	});
}
```
If the page is reloaded, it doesn't get `shortcutRegisterSuccess`, and so the shortcut no longer works, not even within the app with the `keydown` listener, because it's intercepted globally.
This is only a problem during development.
It should show the webcam view temporarily, so you can see what it's talking about, and see what's going on.
Either:
1. The app window should be brought to the front (but not focused), and hidden (after a second) once the issue is resolved, unless the window got focused by the user, in which case it should stay visible; or
2. The message and webcam view should be shown on the overlay window, and should avoid the mouse, either mostly hiding and blurring, or moving out of the way, so that you don't think you can click on them. (Or you should be able to click on them, but that's more complicated.)
(Showing the webcam on the overlay screen would mean... using WebRTC?)
These warnings should not show up immediately, only if the condition persists for a few seconds. Although jumps in lighting can cause problems (#22) (and automatic brightness may bring it within normal levels before a reasonable period of laxity), 1. it would probably be too annoying if these warnings showed fast enough to catch that happening, and 2. it's more the change that causes that issue, not the lack of absolute luminosity; that is, it could cause a bad mouse jump while brightening too.
This should be tested with regression testing, i.e. using recorded footage, not just tested live based on the present lighting conditions.
I haven't touched the menus yet.
On macOS there's a default About screen in an "Electron" menu that doesn't exist on Ubuntu (describing Electron, of course).
Every app needs a good About screen.
Currently, the user's face is detected in order to place tracking points, optical flow is then used to track these points on the face, and only the optical flow influences the final cursor movement.
In this existing scheme:
What if we auto-calibrated based on the head tracker's face orientation, perhaps making adjustments only during otherwise-detected movement?
Like a perpetual motion machine's secret magnetic "kick", or a magician's sleight of hand, it would subtly adjust the mouse position so that it ends up at the edges of the screen when tilted a certain amount, and centered when facing forward.
Drawbacks:
Formula:
Assuming the tilt can be normalized, or assuming the camera, screen, and face are directly in line with each other, a formula for the other part might be fairly simple, something like:

```
adjusted_x = x + abs(delta_x) * auto_calibrate_strength * (x_implied_by_tilt - x)
```

where:
- `x_implied_by_tilt` is the x position on the screen that would be mapped purely from the head tilt
- `delta_x = x - x_previous`
- `x` is the latest x position from optical flow tracking

It's probably a little more complicated; maybe the `delta_x` factor should be raised to some power, or clamped, etc.
On the plus side, it could be made so that with `auto_calibrate_strength = 1`, it purely uses the head tilt, so a separate head tilt mode wouldn't be needed. (Again, I'm not sure about the `delta_x` part in regard to this.)
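The formula above, as code. A sketch: the names come from the formula, but the clamping of the `delta_x` factor is my own guess:

```js
// Sketch of the auto-calibration nudge described above.
function autoCalibrateX(x, xPrevious, xImpliedByTilt, autoCalibrateStrength) {
	const deltaX = x - xPrevious;
	// Only nudge toward the tilt-implied position while the user is moving,
	// so the "kick" hides inside otherwise-detected movement.
	const nudge = Math.min(Math.abs(deltaX), 1) * autoCalibrateStrength * (xImpliedByTilt - x);
	return x + nudge;
}
```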
The name "Tracky Mouse" sounds like it could be software that tracks your mouse movements rather than software that uses (head) tracking to move your mouse.
Please, passing visitor, help me brainstorm!
Ideally the name should be open to eye tracking as a future feature, although several of my ideas are head-tracking specific.
For serious usage, people may want to automatically start the program.
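In the Electron app, this could use the built-in login item API. A sketch; note this covers Windows and macOS, while Linux would need a `.desktop` file in `~/.config/autostart` instead:

```js
// Sketch: toggle starting the app at login via Electron's login item settings.
const { app } = require("electron");

function setAutoStart(enabled) {
	app.setLoginItemSettings({ openAtLogin: enabled });
}
```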
Currently a red rectangle is shown around the border of the screen if you turn the dwell clicker off while a dwell is in progress.
It should simply show the dwell as canceled (ideally with an animation: #12).
The red rectangle represents an occluding element. There's no occluder in this case, but I see I do show_occluder_indicator(occluder || document.body);
Perhaps this is just a stopgap in lieu of a dwell click cancel animation? Or perhaps there's some case I'm not considering. I should take a peek at the git blame before changing this, but it's probably fine.
(The red dotted outline looks almost like it's starting to record the screen or something, when it covers the screen.)
May also be causing #3 — I'm getting this terrible lag on a MacBook Air (Early 2014), and it could be stopping the global toggle shortcut from working as well. May be clogged IPC tubes.
I could try to measure the latency, and reduce processing accordingly.
Even mediocre webcam-based eye tracking could be good, when combined with head tracking, in a hybrid approach as seen in Precision Gaze Mouse and PolyMouse. Eye tracking is used for quick movements to any place on the screen, and head tracking is used for fine adjustment.
Hello @1j01 !
I was playing with your demo, available at: https://1j01.github.io/tracky-mouse/
Congratulations on your studies in this topic, and for gathering this knowledge together.
Is there any way to click with eyes or head movements?
I've found that moving diagonally requires too much head movement compared to horizontal/vertical movement.
Acceleration curves may be playing a role in this. Currently the acceleration curve is applied to `deltaX` and `deltaY` independently, with the `distance` parameter being ignored here:

```js
// Acceleration curves add a lot of stability,
// letting you focus on a specific point without jitter, but still move quickly.
// var accelerate = (delta, distance) => (delta / 10) * (distance ** 0.8);
// var accelerate = (delta, distance) => (delta / 1) * (Math.abs(delta) ** 0.8);
var accelerate = (delta, distance) => (delta / 1) * (Math.abs(delta * 5) ** acceleration);
var distance = Math.hypot(movementX, movementY);
var deltaX = accelerate(movementX * sensitivityX, distance);
var deltaY = accelerate(movementY * sensitivityY, distance);
```
If you picture the head as a sphere, it makes sense that diagonal movements are weakened, due to the projection, in combination with the acceleration curves.
Tilting up, down, left, or right, the projected point is moved in a single axis, whereas tilting diagonally moves sqrt(2)/2 in each axis.
When spread across two axes, with the acceleration curves applying separately, the exponentiation isn't as high.
That said, there may be a reason why I didn't use the `distance` parameter here; maybe it even makes it worse somehow.
I might need a separate sort of filter to compensate for diagonal movement feeling subdued, reminiscent of the pin-cushion adjustment on old CRT monitors.
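One possible fix, as a sketch (an assumption, not something I've tested): apply the curve to the movement vector's magnitude and preserve its direction, so diagonal movement is amplified the same as axis-aligned movement. Whether sensitivity should feed into the curve, as it does in the current code, is a detail glossed over here:

```js
// Sketch: drive the same curve with the total distance, applied to both axes.
var accelerateRadial = (movementX, movementY) => {
	var distance = Math.hypot(movementX, movementY);
	var gain = (distance * 5) ** acceleration; // same curve shape as the current code
	return {
		deltaX: movementX * sensitivityX * gain,
		deltaY: movementY * sensitivityY * gain,
	};
};
```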
Related:
Since moving your head around is not an exact science, it's useful to be able to limit mouse movement to configurable boundaries.
It should be visually obvious whether a click takes place or not.
I could animate canceling a dwell click, with a blur, maybe shape it like an X.
And I could flash a circle outline or target symbol when it actually clicks. Maybe an MLG hitmarker.
I should make sure not to show a click indicator for a drag release. Maybe show an octagon outline or an open hand (letting go / saying stop, but not facing the user, because it's meant to be analogous to the user.)
It should be visually obvious whether a dwell will click or release from a drag.
I show a stop sign (octagon) when releasing from a drag, but it's the same color as the normal indicator (red), and at a glance, an octagon is not that different from a circle!
I could change the normal indicator color.
(I also hope to improve the visibility of the indicators generally.)
Currently the website lives at https://1j01.github.io/tracky-mouse/ and includes a short, yet very technical, blurb, and embeds the Tracky Mouse UI as a demo, but doesn't feature dwell clicking.
This demo was created when Tracky Mouse was still in the proof-of-concept stage, and it's now at least a minimum viable product.
I want a professional-looking website that outlines what the software is, who it's for, shows off the capabilities as much as possible, and includes a download for the desktop app, as well as some instructions for developers wanting to integrate Tracky Mouse into their products.
A new domain would be nice too.
Oh, and I've designed a tessellating cursors background that would be nice to include. (It just occurred to me one day that cursors could probably tessellate, and they can! I don't know if this was even inspired by work on Tracky Mouse... I think I was thinking of building a library to customize cursors, since I came up with a technique to override cursors while still supporting `cursor: auto` in CSS; I was gonna call it cursormize.js; might do that at some point, if I come across a website hurting accessibility by including custom cursors.)
On Ubuntu 22, the screen overlay, and thus the dwell clicking indicators, go behind the launcher's window picker (when switching between two or more windows), as well as the Activities screen and any context menus.
It can be quite jarring when it clicks suddenly without any indication that it's going to (whereas normally it gives you a fair warning.)
I'm already setting the window to always on top and specifying the highest level available, "screensaver", but I could try periodically bringing the window to the front (without taking focus), or research system APIs to do this.
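The periodic approach might look like this. A sketch: `moveTop()` and `setAlwaysOnTop()` are real `BrowserWindow` methods, but the interval is a guess at a reasonable tradeoff:

```js
// Sketch: periodically re-assert the overlay's z-order, without stealing focus.
setInterval(() => {
	if (!overlayWindow.isDestroyed()) {
		overlayWindow.setAlwaysOnTop(true, "screen-saver");
		overlayWindow.moveTop();
	}
}, 1000);
```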
The screen overlay could show icons representing mouse control and head control, near the mouse (or centered?), when control switches — when moving the mouse manually to regain manual control, or when it switches back after a timeout.
I have made these icons, although I'm not happy with them yet:
It would also be good to signal whether it is paused temporarily or stopped.
The app is supposed to pause/resume when you press F9, globally.
F9 works as a global shortcut in Serenade (another Electron app), if I configure it, so it should be able to work.
And it doesn't get an error when registering the shortcut...
There is a feature for regaining mouse control (pausing the head tracker temporarily) by just moving the mouse normally.
You don't want to interrupt head tracking by accident, so there's a threshold of movement before the mouse will take control from the head tracker.
However, I'm noticing it pausing randomly, perhaps especially when moving the cursor significantly with my head, on Windows 11.
Possible solutions:

- Wait until the `setMouseLocation` promise is resolved?
- Trust the position only once `setMouseLocation` resolves, if and only if it's guaranteed that `getMouseLocation` will return the new position at that point.

Also, this could help:

- Comparing against a set of recent `getMouseLocation` results, rather than a queue of `setMouseLocation` requests.

Related:
The virtual cursor from the head tracking system (currently a red dot, all too similar to the dwell clicking indicators) is visible before starting movement. It should be hidden.
Main call-to-action sections should be at the top: Add to your project, Install desktop app.
License goes with Add to your project, as that's when you generally care about the license.
Install Desktop App currently points to Development Setup because there's no release yet... doesn't need to be contiguous but it is right now.
Libraries Used goes with Software Architecture, and could be merged.
Todo could be removed, if I record everything in Issues.
Restore window position when restarting the app. There are prebuilt solutions for this.