samizdapp / herakles

slaying the hydra of IP/NAT/DNS/TLS/AppStore

Shell 56.17% Dockerfile 13.93% Elixir 14.44% Mustache 15.46%

herakles's People

Contributors

Stargazers

Watchers

Forkers

joshuacwebdeveloper

herakles's Issues

Document setup of Android and iOS clients using Chrome/Brave and Safari (respectively)

There is a pathway of using the WiFi hotspot that doesn't require the installation of the app for Android.

Can't connect via Relays

For the past several days, connections from client to box over relays have generally not been working.

#59 complicates understanding this issue fully.

Service worker doesn't update if registration URL not accessible

Problem

The browser attempts to update the service worker by fetching the script url. This url will be the original url of the script at the time of registration.

The fetch request will not be handled by the service worker:

Set request’s service-workers mode to "none".

https://w3c.github.io/ServiceWorker/#update-algorithm

This means that unless the browser can directly access the script via the original URL, no update will happen.

Solution

Create a bootloader service worker that gets updated rarely. It fetches the full worker script and loads and executes it.

Design Criteria

Should work over relays if necessary
Come up with fallback for no libp2p connection (i.e. do relative url)
Make sure both install and update flows are accounted for
Come up with some sort of caching solution
- Maybe stale while revalidate, 24 hour limit
Methods for eval:
- eval() probably won't work with such a large bundle
- Can explore Function constructor
- Explore creating a Web Worker thread
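
A minimal sketch of the caching criterion above, assuming a stale-while-revalidate policy with the 24 hour limit. The names (CachedScript, decideCacheAction) are hypothetical and the real bootloader may need different rules for the install flow:

```typescript
// Hypothetical names; only the 24 hour limit comes from the design criteria.
type CacheDecision = "fetch-and-wait" | "serve-cached-and-revalidate";

interface CachedScript {
    source: string;    // full worker bundle text
    fetchedAt: number; // epoch milliseconds when the bundle was stored
}

const MAX_AGE_MS = 24 * 60 * 60 * 1000;

function decideCacheAction(
    cached: CachedScript | undefined,
    now: number,
    maxAgeMs: number = MAX_AGE_MS,
): CacheDecision {
    // Nothing cached yet (install flow): the bootloader has to wait for the network.
    if (cached === undefined) return "fetch-and-wait";
    // The cached copy is past the limit: do not run it without refetching.
    if (now - cached.fetchedAt > maxAgeMs) return "fetch-and-wait";
    // Fresh enough: run the cached bundle now and refresh it in the background.
    return "serve-cached-and-revalidate";
}
```

Keeping the decision in a pure function means the install and update flows can share it and it can be tested without a browser.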

Some of the work in the planned status update concerning worker versions should be implemented as a part of this solution (assuming this gets done before the status update).

How to (Currently) Update the Service Worker

The steps for updating the service worker are more or less the same as the steps for installing the client.

These steps will likely still apply assuming we go with the above solution. They will also have to be done in order to migrate over to the above solution.

Chrome/Brave on Desktop

The worker should automatically update following normal worker update rules.

Chrome/Brave on Android

For Android versions less than 12

1. Ensure your box is connected via ethernet (required).
2. Restart your box by powering it off and on again.
3. Connect your device to your box's WiFi network:
   i. In your device's list of available WiFi networks, select the network named "SamizdApp".
   ii. Enter the security key: samizdapp.
4. If you unset the #unsafely-treat-insecure-origin-as-secure flag, you'll need to set it again.
5. Navigate to http://samizdapp.local/smz/pwa. Refresh the page.
6. Wait for the message saying "Install the PWA to continue."

For all other Android versions

The worker should automatically update following normal worker update rules.

Safari on iOS

Use the Shortcut again?

SAM-59 Balena's dashboard is not an effective logging solution

Metadata:

Original Estimate: 
Priority: Low
Epic: SAM-3 Improve logging

Don't attempt to cache 206 responses in static asset cache

Uncaught (in promise) TypeError: Failed to execute 'put' on 'Cache': Partial response (status code 206) is unsupported
    at eval (worker-app.js:2:931542)
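
One possible guard, sketched with hypothetical names: check the status before calling put on the static asset cache. Skipping non-ok responses as well is an assumption, not something the error above requires:

```typescript
// Minimal shape of the response fields this check needs.
interface ResponseLike {
    status: number;
    ok: boolean;
}

// Cache.put() rejects partial (206) responses, so filter them out up front.
function isCacheableStaticResponse(response: ResponseLike): boolean {
    return response.ok && response.status !== 206;
}
```

A real Response object satisfies ResponseLike, so the worker could call this right before caching.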

Nodes announce local network addresses

#59 reports 4 big inefficiencies with our libp2p network. Problem number 1) is that libp2p attempts to connect to a bunch of addresses that don't exist rather than attempting to connect to the addresses in the bootstrap list that it is given.

This problem seems sporadic, but I think I have tracked it down to each libp2p node announcing these no-op local addresses to the network. The client receives these announced addresses from each node in the network and attempts to use them to connect to the node.

Nodes should not announce addresses that they can't be reached by. I believe the way to prevent this is to specify an announceFilter: libp2p/js-libp2p#769 (specifying the announce config only adds addresses, not subtracts them).
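
A rough sketch of the filtering logic, working on multiaddr strings so it stays self-contained (a real announceFilter would presumably receive multiaddr objects rather than strings). Which ranges count as unreachable is an assumption; this version drops only IPv4 loopback, unspecified, and link-local addresses:

```typescript
// Returns true for IPv4 multiaddrs that no remote peer could dial.
// The set of ranges here is an assumption, not taken from the issue.
function isUnreachableAddress(addr: string): boolean {
    const match = /^\/ip4\/(\d+)\.(\d+)\.\d+\.\d+(\/|$)/.exec(addr);
    if (match === null) return false; // not an IPv4 multiaddr: leave it alone
    const first = Number(match[1]);
    const second = Number(match[2]);
    // 127.0.0.0/8 loopback, 0.0.0.0/8 unspecified, 169.254.0.0/16 link-local
    return first === 127 || first === 0 || (first === 169 && second === 254);
}

// Hypothetical helper with the same shape as an announce filter: list in, list out.
function filterAnnounceAddresses(addrs: string[]): string[] {
    return addrs.filter((addr) => !isUnreachableAddress(addr));
}
```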

Verify that welcome wagon URL exists before redirecting

Currently, if the URL doesn't exist, Pleroma will redirect to a 404.

Uploading files sometimes severs the client's connection to the box

Sometimes, when attempting to upload a file to Pleroma, the upload will not happen (stays pending with no progress).

At this point, the client's connection to the box will also fail (home page will show offline). One time, it stayed offline for a while; the other time, it recovered quickly.

This has not been debugged yet.

Service worker silently fails with no error if origin is not trusted

If http://samizdapp.local is not added to the insecure origin browser flag (or if the flag is disabled), then the service worker will fail to install with no error either in the UI or in the browser console (the loading gif just keeps spinning).

Ideally, we'd be able to detect when the origin isn't being treated as secure and broadcast a message in the UI.

At the very least, we should log the status of the service worker (whether it is installed or not) and log an error if it doesn't get installed by the browser.
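
A sketch of a preflight check that could run before registration. It assumes the page can read window.isSecureContext and test for "serviceWorker" in navigator; the type and function names are hypothetical:

```typescript
interface WorkerEnvironment {
    isSecureContext: boolean;     // e.g. window.isSecureContext
    hasServiceWorkerApi: boolean; // e.g. "serviceWorker" in navigator
}

type WorkerPreflight = { ok: true } | { ok: false; message: string };

function checkWorkerPreflight(env: WorkerEnvironment): WorkerPreflight {
    if (!env.isSecureContext) {
        return {
            ok: false,
            message: "This origin is not being treated as secure, so the service worker cannot be installed.",
        };
    }
    if (!env.hasServiceWorkerApi) {
        return { ok: false, message: "This browser does not expose the service worker API." };
    }
    return { ok: true };
}
```

The message can be shown in the UI and logged, which covers both the ideal case and the minimum logging described above.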

Set libp2p minimum connections to `2`

A minimum of 0 disables autodialing.

Client does not retry failed connection attempts to the box

Currently, if something is wrong with the box and the client can't connect, it won't keep retrying. That is, once the box has recovered, the worker will need to be restarted.

The libp2p client should retry failed connections on an interval until a connection can be reestablished.
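
One way to pick the interval, sketched as capped exponential backoff rather than a fixed delay (a swap-in choice, not something this issue specifies). The function name and default values are hypothetical:

```typescript
// Delay before retry number `attempt` (0-based), doubling each time up to maxMs.
function retryDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
    const exponent = Math.max(0, Math.floor(attempt));
    return Math.min(maxMs, baseMs * Math.pow(2, exponent));
}
```

Because the delay is capped, the client keeps retrying at a steady pace for as long as the box is down instead of giving up.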

Link previews don't work

In the soapbox UI, when a link is posted, an attempt is made to show a preview of it, but the loading skeleton ultimately goes away and no preview is shown.

Handle UPnP Error 402 (Full routing table)

For users who run a lot of UPnP applications (e.g. video games), it will be common for the UPnP table in their router to be full.

When this happens, a 402 will often be returned.

Handle 402 errors by checking for many UPnP entries (past some threshold?) and removing either some or all of them.

This Perl script deletes multiple entries by iterating over the list and deleting each one individually: https://www.howtoforge.com/administrating-your-gateway-device-via-upnp#igdctlpl

I couldn't find a best practice on this; it is possible that clearing the entire table is the preferred method so that any applications that lose their entries are more likely to recover.
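
A sketch of the threshold idea, under the assumption that we can list the router's mappings and recognize our own by description. The PortMapping shape, the threshold, and the keep-our-own policy are all hypothetical:

```typescript
interface PortMapping {
    externalPort: number;
    protocol: "TCP" | "UDP";
    description: string;
}

// Returns the mappings to delete once the table holds at least `threshold` entries.
// Policy assumed here: clear everything except entries carrying our own description.
function selectMappingsToRemove(
    mappings: PortMapping[],
    threshold: number,
    ownDescription: string,
): PortMapping[] {
    if (mappings.length < threshold) return [];
    return mappings.filter((mapping) => mapping.description !== ownDescription);
}
```

Each returned entry would then be deleted individually, the same way the Perl script does it.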

SAM-61 Configure Github environments and combine build and deploy into single workflow

Metadata:

Original Estimate: 
Priority: Medium
Epic: SAM-5 Setup CI/CD, Deployment, and Automated testing

There should be 3 environments:

feature - dev

dev - dev

prod - alpha

Feature and prod will require approval with environment protections. Dev will have no protections.

The new build-deploy workflow will use the configured environments to run deploy.

SAM-97 Develop a working POC of connecting to a libp2p-webrtc server from the service worker

Metadata:

Original Estimate: 
Priority: High
Epic: SAM-98 WebRTC Transport & Secure Bootstrapping

SAM-99 UI App abstraction backbone

Metadata:

Original Estimate: 
Priority: High
Epic: SAM-98 WebRTC Transport & Secure Bootstrapping

Create a data structure in the PWA to hold new app manifests. App manifests should have a list of components, each with a type and a specification. The existing manifest.manifest property would be converted into a component where manifest.components[].type === "docker-compose" && manifest.components[].specification === '{version: "1.0".... }'.

The PWA data structure wouldn’t need to include types for the docker-compose component, but would need to type a new ui component. The ui component specification property should be some HTML that gets rendered.

A new /pwa/apps path should render the app launcher (TODO - not a part of this ticket), and a parameterized /pwa/apps/:id path should just render the app with the given id (the app launcher will launch an app by just redirecting to this path).

Hardcode the manifest for pleroma in a sensible place. It should render an iframe that points to /timelines/fediverse (we can use this method: https://reactjs.org/docs/react-dom-server.html#rendertostaticmarkup to specify the hardcoded iframe using JSX).
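
A rough sketch of what the data structure might look like. The id and name fields and the findUiComponent helper are hypothetical, and the iframe is written as a plain string here instead of being produced from JSX:

```typescript
interface DockerComposeComponent {
    type: "docker-compose";
    specification: string; // left untyped on the PWA side
}

interface UiComponent {
    type: "ui";
    specification: string; // HTML that gets rendered
}

type AppComponent = DockerComposeComponent | UiComponent;

interface AppManifest {
    id: string;   // hypothetical: matches the :id in /pwa/apps/:id
    name: string; // hypothetical display name
    components: AppComponent[];
}

// Hardcoded manifest for pleroma, pointing the frame at /timelines/fediverse.
const pleromaManifest: AppManifest = {
    id: "pleroma",
    name: "Pleroma",
    components: [
        { type: "ui", specification: '<iframe src="/timelines/fediverse"></iframe>' },
    ],
};

// Returns the first ui component, which is what /pwa/apps/:id would render.
function findUiComponent(manifest: AppManifest): UiComponent | undefined {
    for (const component of manifest.components) {
        if (component.type === "ui") return component;
    }
    return undefined;
}
```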

SAM-49 Improve status from status service

Metadata:

Original Estimate: 2h
Priority: Medium
Epic: SAM-88 Status Overhaul

Display a waiting status, set from the frontend, until we are able to connect to the status service:

If we are unable to connect, set an offline status

Send an online status from the service in a 4 minute loop even if there are no incoming statuses
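
A sketch of how the frontend could derive the displayed status from these rules. Treating a missed heartbeat as offline is an assumption about why the 4 minute loop exists, and the grace period and all names are hypothetical:

```typescript
type BoxStatus = "waiting" | "online" | "offline";

const HEARTBEAT_MS = 4 * 60 * 1000; // the 4 minute loop from this ticket
const GRACE_MS = 30 * 1000;         // hypothetical allowance for a late heartbeat

function deriveStatus(
    connectFailed: boolean,
    lastStatusAt: number | undefined, // epoch ms of the last status received, if any
    now: number,
): BoxStatus {
    if (connectFailed) return "offline";
    // Still connecting and nothing heard yet: show the waiting status.
    if (lastStatusAt === undefined) return "waiting";
    // Assumption: no heartbeat within the loop interval means the service is gone.
    return now - lastStatusAt <= HEARTBEAT_MS + GRACE_MS ? "online" : "offline";
}
```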

libp2p.relays never pruned

New relays are added to libp2p.relays when received from the p2p_proxy; however, old addresses are never removed from the list.
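
One possible approach, sketched with hypothetical names: record when each relay address was last received and drop the ones that have not been seen for some maximum age. How libp2p.relays is actually stored is not shown here, so the RelayEntry shape is an assumption:

```typescript
interface RelayEntry {
    address: string;
    lastSeenAt: number; // epoch ms when the p2p_proxy last reported this address
}

// Adds the address, or refreshes its timestamp if it is already known.
function recordRelay(relays: RelayEntry[], address: string, now: number): RelayEntry[] {
    const others = relays.filter((relay) => relay.address !== address);
    return others.concat([{ address: address, lastSeenAt: now }]);
}

// Drops every relay that has not been reported within maxAgeMs.
function pruneRelays(relays: RelayEntry[], now: number, maxAgeMs: number): RelayEntry[] {
    return relays.filter((relay) => now - relay.lastSeenAt <= maxAgeMs);
}
```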

Bad peer gets added to config by networking-service - causes yggdrasil service to segfault

Somehow, on my dev box, the networking service added a bad yggdrasil config peer:

    "Peers": [
        "tls://51.38.64.12:28395",
        "tcp://172.114.149.250:5008",
        "tcp://98.53.136.200:5000",
        "tcp://98.53.136.200:5001",
        "tcp://24.212.191.150:5000",
        "tcp://100.64.153.14:5000",
        "tcp://73.181.212.115:5000",
        "tcp://67.161.135.149:5000",
        "tcp://45.18.74.227:5000",
        "tcp://118.208.141.64:5000",
        "tcp://;; UDP setup with 1.0.0.1#53(1.0.0.1) for whoami.cloudflare failed: network unreachable.\n;; UDP setup with 1.0.0.1#53(1.0.0.1) for whoami.cloudflare failed: network unreachable.\n;; UDP setup with 1.0.0.1#53(1.0.0.1) for whoami.cloudflare failed: network unreachable.:5005"
    ],

Manually editing the yggdrasil config file would not resolve the issue as the networking service would simply put the bad value back in.

It isn't clear whether this was a bad value stuck in memory, or whether the networking service was making the fetch node info call every time and getting the same error. Regardless, I don't know why this bad value wasn't sanitized; my only guess is that some sort of race condition was happening, but I don't see a place in the code where a race condition could occur. I couldn't reproduce this locally by manually inserting the error into the yggdrasil config (it would correctly get sanitized).

The network error itself happens here: https://github.com/samizdapp/athena/blob/2a10367cc7124e3983c908a3384432450089e942/packages/networking-service/src/yggdrasil/manager.ts#L36
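
A defensive check along these lines might catch the bad value before it is written to the config. This is a sketch, not the networking service's existing sanitizer: the function name is hypothetical and it assumes peers are always tcp:// or tls:// with an IPv4 host, as in the list above:

```typescript
// Accepts only tcp:// or tls:// URIs with a dotted IPv4 host and a valid port.
function isValidPeerUri(peer: string): boolean {
    const match = /^(tcp|tls):\/\/(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}):(\d{1,5})$/.exec(peer);
    if (match === null) return false;
    const octets = match.slice(2, 6).map(Number);
    const port = Number(match[6]);
    return octets.every((octet) => octet <= 255) && port >= 1 && port <= 65535;
}
```

Rejecting the peer at write time would also stop the bad value from being put back after a manual edit.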

repair pleroma fork [tech debt]

To make for a better relay experience, we currently have a fork of pleroma that fetches some history every time a box follows another box. The changes are pretty minimal, but due to the fact that we were originally forking the repo wholesale, we've diverged from main substantially.

As we'd like to keep parity with the main pleroma project, we should do some housekeeping here.

Two options:

1. (preferred) Submit our changes to pleroma as a PR and get them accepted. The substantive change is something that used to be in pleroma but was removed after running into infinite recursion. Our implementation is constrained in scope so as to remove this problem.

2. Rebase against pleroma master branch, keeping only the 20ish lines of code we need, and periodically merge upstream changes.

Proxy websockets

Currently, we are unable to open websocket connections to the box because the proxy only supports HTTP. This causes many features in Pleroma to fail.

Reduce size of gateway_client image

The gateway_client image is over a gigabyte, but it should be much smaller (under 100 MB). Try deleting node_modules/ at the end of the Dockerfile to see if that takes away most of the size.

Unable to run NextJS locally

I've tried all combinations of the 4 NPM scripts, and I always get this error when opening the app locally:

A bad HTTP response code (404) was received when fetching the script.
localhost/:1 Uncaught (in promise) TypeError: Failed to register a ServiceWorker for scope ('http://localhost:8001/pwa/') with script ('http://localhost:8001/pwa/sw.js'): A bad HTTP response code (404) was received when fetching the script.

SAM-101 Navigating to Pleroma page in frame reloads entire browser page

Metadata:

Original Estimate: 
Priority: Medium
Epic: SAM-98 WebRTC Transport & Secure Bootstrapping

I forget exactly how that feature is implemented, but we only want to navigate in the frame, not the entire page. Figure out how to implement it so that only the frame navigation is affected, with an eye towards modularizing pleroma stuff out of the worker.

SAM-102 Get rid of debug mode gesture

Metadata:

Original Estimate: 
Priority: Medium
Epic: SAM-98 WebRTC Transport & Secure Bootstrapping

There is no longer a need for the debug mode gesture because there will be a toolbar with a status link in it that can take the user to the status dashboard at any time.

The gesture is now unwanted, because if it does get triggered, it will redirect the pleroma iframe to the status dashboard page.

Pleroma links inside pleroma are broken

Inside Pleroma, links to Pleroma posts are broken. They are absolute links that use the domain name of the pleroma server instead of pointing to the current domain (samizdapp.local) or being relative.

If it is easy to configure this, that could be a fix, but the more robust solution is probably to implement service worker logic that handles these urls.
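
A sketch of the rewriting step such worker logic might use. The function name is hypothetical, and it assumes the worker knows the pleroma server's host name:

```typescript
// Rewrites an absolute link on the pleroma server's host to the local origin.
// Links to any other host (and relative links) are returned unchanged.
function toLocalUrl(link: string, pleromaHost: string, localOrigin: string): string {
    const url = new URL(link, localOrigin);
    if (url.host !== pleromaHost) return link;
    return new URL(url.pathname + url.search + url.hash, localOrigin).toString();
}
```

For example, with localOrigin set to http://samizdapp.local, a post link on the pleroma host keeps its path and query but is served through the box.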

`/pwa` not redirecting to `/` on start of existing installation of PWA

When the PWA is installed via Chrome or Brave on Windows, it is opened automatically, and it is redirected to /.

However, if the PWA is then closed and reopened, it will stay on /pwa and not be redirected.

/pwa should always redirect to / when the PWA is started.

Scheduled posts don't work

Scheduled posts in Pleroma never send.

After the scheduled time, they'll still show up under "Scheduled Posts" perpetually with a time left of "Moments remaining".

Box becomes defederated

It is possible for the box to be in a state where it becomes defederated from the SamizdApp network.

This is most evident when photos and videos that other people post fail to load (Pleroma returns 404s) - also when people's cover photos and avatars fail to load. Presumably, posts and likes from you won't be seen by others? But this wasn't tested. I'm assuming that the reason a post with an image will appear in the timeline, but the image fails to load is that Pleroma saved the post to the box but doesn't attempt to load the image until requested. Or maybe posts get copied to the box, but media never gets copied (is always loaded from the source box).

Restarting the box fixed this, but because I had to wait for the Yggdrasil crawl to happen (about 10-20 minutes), I'm not sure if only restarting Yggdrasil or Pleroma would have also fixed it (I didn't wait long enough after restarting the services before restarting the box).

UPnP port goes stale

It has been observed that even if there is a UPnP port open on the router, traffic will not be able to go through it. While restarting the router does fix this problem, restarting the proxy service also fixes the problem, suggesting that there is something that can be done on our side.

Come up with a way to detect stale UPnP ports and recover from them.

Enable notifications from the PWA

When I grant permission to the PWA to show me operating system notifications, none are generated.

This is because we need to implement service worker logic to handle sending out notifications for the PWA.

SAM-100 Refreshing the browser refreshes the whole page, not just the iframe

Metadata:

Original Estimate: 
Priority: High
Epic: SAM-98 WebRTC Transport & Secure Bootstrapping

When I refresh the browser on my laptop, the entire page refreshes, not just the frame.

In most cases, this is likely what we want to have happen. However, this presumably also happens when the user swipes down on mobile. Assuming this is the case, we probably want to override that gesture somehow and allow apps to register an onRefreshGesture handler. In the iframe case, it should just reload the iframe.

"the origin is trusted!" message not shown upon successful service worker registration

When setting up the client by navigating to http://samizdapp.local/pwa, the docs instruct the user to look for a message saying "the origin is trusted!". After seeing that message, they can click the install button to install the PWA.

However, no such message is shown; instead, a spinner is shown under the message: "please install PWA to continue reloading false: no error"

SAM-74 Replace references to "relays" with "box addresses" in status UI

Metadata:

Original Estimate: 0.5h
Priority: Medium
Epic: SAM-88 Status Overhaul

SAM-34 Report status for wifi-connect_service

Metadata:

Original Estimate: 3h
Priority: Low
Epic: SAM-88 Status Overhaul

Should give status of both ethernet and wifi. Should probably use metadata for this (https://joshuacwebdeveloper.atlassian.net/browse/SAM-30).

Default Pleroma UI is being used

The default Pleroma UI is currently being shown on a fresh install in Chrome:

Something is wrong with the installation of soapbox and it isn't correctly overriding the default Pleroma UI.

PWA won't redirect after first being installed if the browser is currently at /smz/ with a trailing slash

SAM-96 Don't show "Install PWA" message until manifest.json has downloaded

Metadata:

Original Estimate: 
Priority: Medium
Epic: SAM-88 Status Overhaul

Sometimes, it takes a while for my browser to give me the option (or popup message) of installing the PWA (i.e. it allows me to add a shortcut for my desktop instead of a PWA, but will then show me the notification and PWA install option later).

I believe what is happening is that the manifest.json file is being downloaded via the service worker, which may take a while if the service worker hasn’t connected yet.

Show the Installing… message until the manifest.json has been fully loaded.

Update PWA title to "SamizdApp"

Currently the title is "Next.JS Progressive Web App"

SAM-50 Indicate that a connection is still being attempted when the offline message is shown in the homepage.

Metadata:

Original Estimate: 3h
Priority: Medium
Epic: SAM-88 Status Overhaul

Maybe animate the graphic in a loop.

Additionally, we previously wanted to show which relay connection is currently being attempted in the worker status box. However, with the recent worker refactor, that doesn’t make as much sense because p2p will attempt connections more or less simultaneously (i.e. if a connection fails or retries, all addresses will have already been tried).

Think of some other useful connection attempt status that can be shown and include it.

Caddy often unresponsive

Problem:

Quite often, attempts to load Pleroma will time out due to requests from the browser receiving no responses from Caddy.

Sometimes, this means that requests to http://setup.local time out. When this happens, no errors (or logs from anywhere) are generally observed.

Other times, requests to http://setup.local will respond successfully, but requests to the Pleroma API time out.

Most recently, requests to the Pleroma API timing out were accompanied by these errors:

 yggdrasil  Traceback (most recent call last):
 yggdrasil    File "/crawler/example.py", line 12, in <module>
 yggdrasil      ygg = YggdrasilConnection.fromServer()
 yggdrasil    File "/crawler/yggdrasil_iface.py", line 56, in fromServer
 yggdrasil      return cls(s.AF_INET, (host, port))
 yggdrasil    File "/crawler/yggdrasil_iface.py", line 40, in __init__
 yggdrasil      self.socket.connect(address)
 yggdrasil  ConnectionRefusedError: [Errno 111] Connection refused

 daemon_proxy  caught error TypeError: Cannot read properties of null (reading 'arrayBuffer')
 daemon_proxy      at PocketProxy.handleEvent (/proxy/dist/pocket_proxy.js:168:29)
 daemon_proxy      at runMicrotasks (<anonymous>)
 daemon_proxy      at processTicksAndRejections (node:internal/process/task_queues:96:5)

Usually, this issue will go away after a while and requests will start responding again. A restart of the box/services doesn't necessarily help.

The overall result of all of this is that the box is not highly available and access to Pleroma is not reliable.

Expectation

Requests made to the box should result in a response. Both Caddy and the Proxy should be able to handle the failure of services they forward requests to.

Uploads severely degrade p2p connection (large files can't be uploaded)

When uploading a file, the p2p connection is severely degraded for the duration of the upload. This results in requests taking 10x - 20x longer to complete. It also causes operations with timeouts (such as ping) to fail. A failed ping causes the client to reconnect, thereby killing the upload. This effectively means that any file that takes longer than 20 seconds (2 rounds of ping with a 10 second timeout) to upload can never be successfully uploaded.

Inefficiencies in client connection to box (can take minutes to connect)

Even if the client is able to connect to the box through the LAN, it can sometimes take over a minute for that connection to actually happen (during which, the user will be shown the online message).

This is due to a few observed inefficiencies:

1) libp2p attempts to connect to a bunch of addresses that don't exist rather than attempting to connect to the addresses in the bootstrap list that it is given.
2) libp2p connects to the bootstrap addresses in a non-intuitive manner. Connections are attempted on some addresses much more frequently than others, and it can take many minutes for some addresses to even have one connection attempt.
3) Old relay addresses are never pruned (#15), which causes libp2p to spend a bunch of time connecting to them (worsening problem 2).
4) Relay addresses are prioritized over local addresses. So even if the client is on LAN, it will spend a bunch of time trying to connect to relays before attempting to connect directly to the box.
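
For the last inefficiency, a sketch of ordering dial candidates so that LAN addresses come first and relayed addresses last. It works on multiaddr strings, treats any address containing /p2p-circuit as relayed, and only recognizes IPv4 private ranges as LAN; the function names are hypothetical:

```typescript
// True for IPv4 multiaddrs in the 10/8, 172.16/12, or 192.168/16 private ranges.
function isLanAddress(addr: string): boolean {
    return /^\/ip4\/(10\.|192\.168\.|172\.(1[6-9]|2\d|3[01])\.)/.test(addr);
}

// Lower number means dial earlier: LAN, then other direct addresses, then relays.
function addressPriority(addr: string): number {
    if (addr.indexOf("/p2p-circuit") !== -1) return 2;
    return isLanAddress(addr) ? 0 : 1;
}

// Orders by priority, keeping the original order within each priority group.
function orderDialAddresses(addrs: string[]): string[] {
    return addrs
        .map((addr, index) => ({ addr: addr, index: index }))
        .sort((a, b) => addressPriority(a.addr) - addressPriority(b.addr) || a.index - b.index)
        .map((entry) => entry.addr);
}
```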

Don't restrict bootstrap app using user-agent checking

When run in a chromium browser that isn't Chrome, this error is displayed:

Use feature detection instead of user-agent checking so that the app is future proof and so that anyone with a supported browser can access the page.
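
A sketch of what feature detection could look like. The list of required features is an assumption (the issue does not say which ones the bootstrap app needs), and the names are hypothetical:

```typescript
interface BrowserFeatures {
    serviceWorker: boolean;  // e.g. "serviceWorker" in navigator
    readableStream: boolean; // e.g. typeof ReadableStream !== "undefined"
    webSocket: boolean;      // e.g. typeof WebSocket !== "undefined"
}

// Returns human-readable names of the missing features; empty means supported.
function missingFeatures(features: BrowserFeatures): string[] {
    const missing: string[] = [];
    if (!features.serviceWorker) missing.push("service workers");
    if (!features.readableStream) missing.push("readable streams");
    if (!features.webSocket) missing.push("WebSockets");
    return missing;
}
```

Any browser that reports an empty list gets through, regardless of its user-agent string, and the error message can name exactly what is missing.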

Poor stream controller management

There are a couple of issues that come up frequently with the new ReadableStreamDefaultController that the NativeRequestStream uses.

The first is that release() attempts to call close() on a stream that sometimes has already errored out:

native-request.ts:269 Uncaught (in promise) TypeError: Failed to execute 'close' on 'ReadableStreamDefaultController': Cannot close an errored readable stream
    at NativeRequestStream.release (native-request.ts:269:38)
    at NativeRequestStream.receiveResponseEnd (native-request.ts:260:14)
    at NativeRequestStream.receiveChunk (native-request.ts:175:22)
    at NativeRequestStream.initInbox (native-request.ts:156:18)

The second is that a response is attempted to be read into a stream after the stream has closed:

index.ts:54 ERROR [worker/p2p/streams/native-request] receiveResponseBody TypeError: Failed to execute 'enqueue' on 'ReadableStreamDefaultController': Cannot enqueue a chunk into a closed readable stream
    at NativeRequestStream.receiveResponseBody (native-request.ts:251:41)
    at NativeRequestStream.receiveChunk (native-request.ts:171:22)
    at NativeRequestStream.initInbox (native-request.ts:156:18)
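
One way to avoid both errors is to track the controller's state and make close and enqueue no-ops once the stream is no longer open. This is a sketch with hypothetical names, not the NativeRequestStream code; it only sees state changes made through the wrapper, so a cancel from the reader side would still need to be reported to it:

```typescript
// The subset of ReadableStreamDefaultController that the wrapper needs.
interface ControllerLike<T> {
    enqueue(chunk: T): void;
    close(): void;
    error(reason?: unknown): void;
}

class GuardedController<T> {
    private state: "open" | "closed" | "errored" = "open";
    private readonly inner: ControllerLike<T>;

    constructor(inner: ControllerLike<T>) {
        this.inner = inner;
    }

    // Returns false instead of throwing when the stream is already finished.
    enqueue(chunk: T): boolean {
        if (this.state !== "open") return false;
        this.inner.enqueue(chunk);
        return true;
    }

    // Closing twice, or closing after an error, is silently ignored.
    close(): void {
        if (this.state !== "open") return;
        this.state = "closed";
        this.inner.close();
    }

    error(reason?: unknown): void {
        if (this.state !== "open") return;
        this.state = "errored";
        this.inner.error(reason);
    }
}
```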

@rynomad

SAM-70 Investigate possibility of connection timeout

Metadata:

Original Estimate: 
Priority: Medium
Epic: SAM-62 Refactor service worker

As a part of the Connection Manager Overhaul, libp2p’s tendency to have connections time out and fail to deliver messages is discussed.

This could be what is causing some of the issues on iOS. Attempt to reproduce this, and if successful, patch it.

SAM-103 Review existing injectors and clients code

Metadata:

Original Estimate: 
Priority: High
Epic: SAM-98 WebRTC Transport & Secure Bootstrapping

See if being loaded in a frame or being injected into multiple active pages will cause any issues with any of our injectors.

Also, review how we’re using the worker clients to see if controlling multiple clients at the same time will cause any issues. Also, quickly just check and see if there are any unexpected caveats with controlling multiple clients.

SAM-85 Improve yggdrasil logs

Metadata:

Original Estimate: 2h
Priority: Medium
Epic: SAM-88 Status Overhaul

Currently, the logic for limiting the number of logs doesn’t work well enough; eventually yggdrasil will start logging dozens of logs.

Don't show success message in PWA until after install button is available

For example, if the service worker fails to load, the success message is still shown. Don't show it until we're sure the user has the install button available.

Can't connect to Wifi hotspot

When I attempt to connect to the wifi hotspot by entering the ssid and passphrase into the UI, the server crashes with the following error:

[Logs]    [11/1/2022, 8:48:59 AM] [service_wifi-connect] Deleting existing WiFi connection to the same network: "SamizdApp"
[Logs]    [11/1/2022, 8:48:59 AM] [service_wifi-connect] Stopping access point 'SamizdApp'...
[Logs]    [11/1/2022, 8:48:59 AM] [service_wifi-connect] [network_manager::dbus_api:ERROR] org.freedesktop.NetworkManager.Settings.Connection::Delete method call failed on /org/freedesktop/NetworkManager/Settings/15
[Logs]    [11/1/2022, 8:48:59 AM] [service_wifi-connect] Stopping access point 'SamizdApp'...
[Logs]    [11/1/2022, 8:48:59 AM] [service_wifi-connect] [network_manager::dbus_api:ERROR] org.freedesktop.NetworkManager.Settings.Connection::Delete method call failed on /org/freedesktop/NetworkManager/Settings/15
[Logs]    [11/1/2022, 8:48:59 AM] [service_wifi-connect] Error: Stopping the access point failed
[Logs]    [11/1/2022, 8:48:59 AM] [service_wifi-connect]   caused by: D-Bus failure: org.freedesktop.NetworkManager.Settings.Connection::Delete method call failed on /org/freedesktop/NetworkManager/Settings/15
[Logs]    [11/1/2022, 8:48:59 AM] [service_wifi-connect]   caused by: "No such interface “org.freedesktop.NetworkManager.Settings.Connection” on object at path /org/freedesktop/NetworkManager/Settings/15"

All subsequent attempts to access the server fail, but the container stays running.
