openfun / ralph
:gear: Ralph, the ultimate Learning Record Store (and more!) for your learning analytics
Home Page: https://openfun.github.io/ralph/
License: MIT License
Expected behavior/code
Using the --es-client-options option to set a new CA certificate for the Elasticsearch backend should not raise an error.
Actual Behavior
Fatal error :
2023-01-06 18:32:04,775 INFO ralph.cli Running API server on 0.0.0.0:8100 with es backend
2023-01-06 18:32:04,776 INFO ralph.cli Do not use runserver in production - start production servers through a process manager such as gunicorn/supervisor/circus.
INFO: Will watch for changes in these directories: ['/app']
INFO: Loading environment from '/tmp/tmp6uie9pai'
INFO: Uvicorn running on http://0.0.0.0:8100 (Press CTRL+C to quit)
INFO: Started reloader process [1] using WatchFiles
Traceback (most recent call last):
File "/usr/local/bin/ralph", line 33, in <module>
sys.exit(load_entry_point('ralph-malph', 'console_scripts', 'ralph')())
File "/usr/local/bin/ralph", line 25, in importlib_load_entry_point
return next(matches).load()
File "/usr/local/lib/python3.9/importlib/metadata.py", line 86, in load
module = import_module(match.group('module'))
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/app/src/ralph/__main__.py", line 23, in <module>
cli.cli()
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/app/src/ralph/cli.py", line 568, in runserver
uvicorn.run(
File "/usr/local/lib/python3.9/site-packages/uvicorn/main.py", line 564, in run
ChangeReload(config, target=server.run, sockets=[sock]).run()
File "/usr/local/lib/python3.9/site-packages/uvicorn/supervisors/basereload.py", line 45, in run
for changes in self:
File "/usr/local/lib/python3.9/site-packages/uvicorn/supervisors/basereload.py", line 64, in __next__
return self.should_restart()
File "/usr/local/lib/python3.9/site-packages/uvicorn/supervisors/watchfilesreload.py", line 85, in should_restart
changes = next(self.watcher)
File "/usr/local/lib/python3.9/site-packages/watchfiles/main.py", line 119, in watch
with RustNotify([str(p) for p in paths], debug, force_polling, poll_delay_ms, recursive) as watcher:
FileNotFoundError: Permission denied (os error 13) about ["/app/k3d-storage/2d37968c-2b27-4271-a05d-53e62419dfba"]
Process SpawnProcess-1:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.9/site-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
target(sockets=sockets)
File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 60, in run
return asyncio.run(self.serve(sockets=sockets))
File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 67, in serve
config.load()
File "/usr/local/lib/python3.9/site-packages/uvicorn/config.py", line 474, in load
self.loaded_app = import_from_string(self.app)
File "/usr/local/lib/python3.9/site-packages/uvicorn/importer.py", line 21, in import_from_string
module = importlib.import_module(module_str)
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/app/src/ralph/api/__init__.py", line 5, in <module>
from ralph.conf import settings
File "/app/src/ralph/conf.py", line 302, in <module>
settings = Settings()
File "pydantic/env_settings.py", line 39, in pydantic.env_settings.BaseSettings.__init__
File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Settings
BACKENDS -> DATABASE -> ES -> CLIENT_OPTIONS
value is not a valid dict (type=type_error.dict)
Steps to Reproduce
bin/ralph runserver -b es --es-client-options ca_certs=toto
Environment
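For illustration, here is a minimal sketch of how a key=value option string such as the one in the reproduction step could be turned into a dict before it reaches settings validation. The function name `parse_client_options` and the comma-separated syntax are assumptions made for this example, not Ralph's actual implementation.

```python
def parse_client_options(raw):
    """Turn 'key1=value1,key2=value2' into a dict of strings (assumed syntax)."""
    options = {}
    # Split on commas and ignore empty chunks such as a trailing comma.
    for pair in filter(None, (item.strip() for item in raw.split(","))):
        key, separator, value = pair.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"Expected key=value, got: {pair!r}")
        options[key.strip()] = value.strip()
    return options


print(parse_client_options("ca_certs=toto"))
```

With such a conversion in place, the value handed to the settings model would be a dict rather than a raw string.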
The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.
Pandas was introduced as a Ralph dependency for preliminary performance assessment and is now only used in the GELF parser. We think it is not worth keeping such a dependency for a small use case.
Use json.loads from the standard library instead in the GELF parser.
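A rough sketch of what a GELF parser based only on the standard library could look like. It assumes one JSON-encoded GELF record per line and that the wrapped event sits in the `short_message` field; the function name `parse_gelf_lines` is hypothetical.

```python
import json


def parse_gelf_lines(lines):
    """Yield the short_message of each GELF record, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # json.loads replaces the pandas-based parsing of each record.
        record = json.loads(line)
        yield record["short_message"]
```

Because it is a generator, records are handled one at a time, which suits streaming from standard input.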
xAPI forum models used in ashley have to be described in ralph, for validation purposes and for usage of ralph as a library.
Expected behavior/code
In Ralph documentation, command and model description sentences should end with a period.
Actual Behavior
Some of them are currently missing their final period.
Steps to Reproduce
command page from Ralph documentation
models page from the upper documentation
command and models are missing final periods
Environment
Possible Solution
Recheck all docstrings and add the final period where it is missing.
Is your feature request related to a problem or unsupported use case? Please describe.
With the addition of OpenID Connect authentication to Ralph, we should implement scopes, as mentioned by the xAPI spec.
Describe the solution you'd like
In a first implementation, we could implement the following scopes:
all/read
all
statements/write
statements/read
statements/read/mine
Discovery, Documentation, Adoption, Migration Strategy
We will probably have to add an Authority mechanism, which is necessary for the statements/read/mine scope.
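The scope list above could be checked with something like the following sketch. The inclusion hierarchy (`all` implying every other scope, `statements/read` implying `statements/read/mine`) is an assumption based on how the xAPI spec describes these scopes, and the names `SCOPE_TREE`, `expand_scopes` and `is_allowed` are hypothetical.

```python
# Assumed hierarchy: each scope maps to the narrower scopes it implies.
SCOPE_TREE = {
    "all": {"all/read", "statements/write", "statements/read"},
    "all/read": {"statements/read"},
    "statements/read": {"statements/read/mine"},
}


def expand_scopes(granted):
    """Return the granted scopes plus every scope they imply."""
    expanded = set()
    stack = list(granted)
    while stack:
        scope = stack.pop()
        if scope in expanded:
            continue
        expanded.add(scope)
        stack.extend(SCOPE_TREE.get(scope, ()))
    return expanded


def is_allowed(granted, required):
    """Tell whether the granted scopes cover the required one."""
    return required in expand_scopes(granted)
```

A route handler would then call `is_allowed(user_scopes, "statements/write")` before accepting a POST, for instance.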
As Ralph's Helm chart starts to be tested in various environments, we have collected feedback from our early adopters. This issue attempts to list required improvements for future releases.
vault.yaml values file for secrets, we better document how to generate a Secret object for Ralph and use values from this secret in other Ralph secrets.
ralph.elasticsearch.enabled)
namespace metadata for hpa and ingress objects
Expected behavior/code
When executing the validate command on a (known-to-be-correct) xAPI statement, Ralph should return
Validating 1 events (ignore_errors=0 | fail-on-unknown=0)
Actual Behavior
When executing the validate command on a (known-to-be-correct) xAPI statement containing result extensions, Ralph returns a BadFormatException:
2023-01-17 14:25:58,440 INFO ralph.cli Validating xapi events (ignore_errors=False | fail-on-unknown=False)
2023-01-17 14:25:58,650 ERROR ralph.models.validator Input event is not a valid VideoSeeked event.
Traceback (most recent call last):
File "/app/src/ralph/models/validator.py", line 30, in validate
yield self._validate_event(event_str)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/src/ralph/models/validator.py", line 77, in _validate_event
return self.get_first_valid_model(event).json()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/src/ralph/models/validator.py", line 62, in get_first_valid_model
raise error
File "/app/src/ralph/models/validator.py", line 58, in get_first_valid_model
return model(**event)
^^^^^^^^^^^^^^
File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 4 validation errors for VideoSeeked
result -> extensions -> https://w3id.org/xapi/video/extensions/length
extra fields not permitted (type=value_error.extra)
result -> extensions -> https://w3id.org/xapi/video/extensions/played-segments
extra fields not permitted (type=value_error.extra)
result -> extensions -> https://w3id.org/xapi/video/extensions/progress
extra fields not permitted (type=value_error.extra)
@timestamp
extra fields not permitted (type=value_error.extra)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/ralph", line 33, in <module>
sys.exit(load_entry_point('ralph-malph', 'console_scripts', 'ralph')())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/bin/ralph", line 25, in importlib_load_entry_point
return next(matches).load()
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/importlib/metadata/__init__.py", line 202, in load
module = import_module(match.group('module'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 940, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/app/src/ralph/__main__.py", line 23, in <module>
cli.cli()
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/src/ralph/cli.py", line 347, in validate
for event in validator.validate(sys.stdin, ignore_errors, fail_on_unknown):
File "/app/src/ralph/models/validator.py", line 45, in validate
raise BadFormatException(message) from err
ralph.exceptions.BadFormatException: Input event is not a valid VideoSeeked event.
Steps to Reproduce
curl -sL https://github.com/openfun/potsie/raw/main/fixtures/elasticsearch/lrs.json.gz | \
gunzip | \
head -n 1 | \
bin/ralph validate -f xapi
Environment
Is your feature request related to a problem or unsupported use case? Please describe.
The StatementParameters class was defined as a dataclass to avoid double validation, as it was intended to be populated with fields already validated by FastAPI.
However, if it is used in a context outside of the LRS (e.g. library usage), no validation is applied, which is undesirable.
Describe the solution you'd like
We would like to replace the StatementsParameters dataclass with a StatementsParameters pydantic model to ensure field validation.
We could use the pydantic construct() method in our LRS API to avoid double validation.
Describe alternatives you've considered
We could support both (dataclass and pydantic model), however, this would be redundant as in both contexts (LRS/library usage) a pydantic model could achieve the objective.
Discovery, Documentation, Adoption, Migration Strategy
The change should be backward compatible.
Do you want to work on it through a Pull Request?
This change is included in the backends unification pull request (#228).
(Thanks to @jmaupetit for spotting this issue and providing the solution.)
Ralph should be able to read and write xAPI statements to convert them from/to various standards.
Define Pydantic models for known/documented xAPI event types.
Hi all, I'm excited to get started developing with Ralph. While getting set up, however, I ran into an issue: on the master branch, make bootstrap errors out when pip installs psutil because gcc is not available.
Expected behavior/code
make bootstrap builds the docker images cleanly.
Actual Behavior
An error occurs:
#0 22.22 creating build/temp.linux-aarch64-cpython-39/psutil
#0 22.22 gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPSUTIL_POSIX=1 -DPSUTIL_SIZEOF_PID_T=4 -DPSUTIL_VERSION=594 -DPy_LIMITED_API=0x03060000 -DPSUTIL_LINUX=1 -DPSUTIL_ETHTOOL_MISSING_TYPES=1 -I/usr/local/include/python3.9 -c psutil/_psutil_common.c -o build/temp.linux-aarch64-cpython-39/psutil/_psutil_common.o
#0 22.22 C compiler or Python headers are not installed on this system. Try to run:
#0 22.22 sudo apt-get install gcc python3-dev
#0 22.22 error: command 'gcc' failed: No such file or directory
Steps to Reproduce
make bootstrap
Environment
Possible Solution
I was able to get a successful build by adding the required development tools, see the PR here: bmtcril#1
We need a way to convert learning analytics logs from various formats to various other formats.
Add base support for standards from the learning community:
Example usage:
$ ralph fetch -b ldp e8ecbb69-ec1a-4d20-b597-2cfd75a8f12b | \
ralph extract -p gelf | \
ralph convert --from edx --to xapi \
> e8ecbb69-ec1a-4d20-b597-2cfd75a8f12b.json
It has been decided that we use typing throughout the ralph project. We have to integrate a static type checker in our linting toolbox.
Getting started with Ralph would be easier with a tutorial based on working sample data. Such a tutorial would answer many of the questions that are usually put to the FUN team.
Provide simple test data to test all of ralph's commands and write a first-use tutorial based on this data.
Create a bunch of archives covering all use cases:
Write a workflow that covers all of ralph's commands (to test each one of them and to understand their utility):
Ralph logs require careful attention. We need a way to configure loggers with custom handlers and formatters like logging_ldp.
This is more or less related to #14. Maybe a plugin architecture is the way to go for this kind of integration.
Is your feature request related to a problem or unsupported use case? Please describe.
Many classes in the code base are named with the suffix "Field" (LaxObjectField, MboxSha1SumActorField(BaseActorField), etc.). I would tend to not include this suffix for two reasons:
Describe the solution you'd like
I suggest removing these suffixes if possible.
Is your feature request related to a problem or unsupported use case? Please describe.
Ralph's CLI outputs need more love. We are in 2023.
Describe the solution you'd like
We would like shiny outputs that emphasize important information with a clear display.
Describe alternatives you've considered
None for now.
Discovery, Documentation, Adoption, Migration Strategy
The project's documentation: https://github.com/Textualize/rich
Do you want to work on it through a Pull Request?
Oh yeah!
Add support to read the tracking logs from any S3 compatible service.
We at NAU use Ceph S3. The default Tutor installation uses MinIO to store files. Some installations could, for example, use that MinIO installation to store the tracking logs, or use another service that provides the same interface.
Is your feature request related to a problem or unsupported use case? Please describe.
Add a configuration option that allows changing the default AWS S3 endpoint URL to any endpoint URL.
Describe the solution you'd like
Extend the S3 backend code with a new optional configuration option that allows changing the boto3 endpoint_url.
Describe alternatives you've considered
Any.
Discovery, Documentation, Adoption, Migration Strategy
Add a sub-section in the docs to reference that you can use any S3 compatible service, like Ceph S3 or MinIO.
Do you want to work on it through a Pull Request?
Yes, me directly, or someone from NAU team.
Is your feature request related to a problem or unsupported use case? Please describe.
A permission/authority mechanism is currently being implemented in this PR in connection with #288.
In the future, Ralph LRS could be deployed on a large scale to serve multiple organizations (e.g. universities and platforms). For practical reasons, as well as GDPR compliance, it will be necessary to include a mechanism in which:
Describe the solution you'd like
This seems like a job for OAuth (a user authorizes a client to write data on their behalf). The proposed solution would be to write all statements with Authority as a group, containing both the "client" and "user". Authority would resemble:
"authority": {
    "objectType": "Group",
    "member": [
        {
            "account": {
                "homePage": "http://example.com/xAPI/OAuth/Token",
                "name": "oauth_consumer_x75db"
            }
        },
        {
            "mbox": "mailto:[email protected]"
        }
    ]
}
OAuth seems to be a necessity as the spec only allows grouped Authority in this situation.
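As a sketch of the proposal above, a grouped Authority could be assembled from the OAuth client and the authenticated user as follows. The helper name `build_group_authority` and the user address `mailto:user@example.com` are hypothetical placeholders for this example.

```python
import json


def build_group_authority(client_home_page, client_name, user_mbox):
    """Build a two-member Group authority: the OAuth client and the user."""
    return {
        "objectType": "Group",
        "member": [
            {"account": {"homePage": client_home_page, "name": client_name}},
            {"mbox": user_mbox},
        ],
    }


authority = build_group_authority(
    "http://example.com/xAPI/OAuth/Token",
    "oauth_consumer_x75db",
    "mailto:user@example.com",  # hypothetical user identifier
)
print(json.dumps({"authority": authority}, indent=2))
```

The LRS would overwrite any client-provided authority with this value when storing a statement.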
This issue provides visibility into Renovate updates and their statuses.
These updates have all been created already. Click a checkbox below to force a retry/rebase of any.
Faker, hypothesis, ipython, mkdocs-material, pyfakefs, pytest)
Arnold will soon support an experimental way to package deployable applications, similar to Helm charts for k8s. It's a great opportunity to add a tested tray for Ralph.
Is your feature request related to a problem or unsupported use case? Please describe.
Currently, there is no obvious way to know which parts of the xAPI specification are covered by Ralph. There is no flagging of tests that cover mandatory specifications. Having this knowledge could help guide future developments or pinpoint limits to the current tool.
Describe the solution you'd like
It would be interesting to have a tool to evaluate which parts of the specifications are covered. Perhaps the LRS Test Suite could be used. Another solution would be to create a dedicated battery of tests.
Reference: @jmaupetit's issue #219
Inspiration from openfun/warren/pull/7
Each backend is a plugin
Each plugin has its own pyproject.toml
Core directory with pyproject.toml
Find a way to manage updates of the core along with updates of plugins
Use of importlib (importlib_metadata or importlib.metadata depending on the python version)
Install plugins through pip
New version pattern for plugins
Which entry points?
fs => builtin by default?
When registering plugins => check performance
Use of Cookiecutter to bootstrap projects efficiently?
Still editing...
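To make the importlib idea from the notes above more concrete, here is a minimal sketch of plugin discovery through entry points. The group name `ralph.backends` is a hypothetical choice, and the branch on `select` only exists to cope with the different return types of `entry_points()` across python versions.

```python
from importlib import metadata


def discover_plugins(group):
    """Map entry point names to entry points registered under a group."""
    entry_points = metadata.entry_points()
    if hasattr(entry_points, "select"):
        # Recent python versions expose a selection API.
        selected = entry_points.select(group=group)
    else:
        # Older versions return a plain dict keyed by group name.
        selected = entry_points.get(group, [])
    return {entry_point.name: entry_point for entry_point in selected}


# Hypothetical group name; a plugin would declare it in its pyproject.toml.
backends = discover_plugins("ralph.backends")
```

Calling `backends[name].load()` would then import the plugin lazily, which keeps the registration cost low at startup.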
We need to keep project dependencies up-to-date to ease its maintenance.
As Ralph will manipulate sensitive data, we should carefully watch job execution. Sentry will be a great help to detect failures.
We should think of a plugin architecture for Ralph for this kind of integration.
DockerHub recently changed its bandwidth usage policy and now restricts anonymous requests to pull images. As we extensively rely on it in the CI, we need to use our DockerHub account for all DockerHub requests.
Expected behavior/code
Using the ws backend with OVH LDP's websocket as the ws_uri parameter should output live logs.
Actual Behavior
Fatal error :
Traceback (most recent call last):
File "/usr/local/bin/ralph", line 33, in <module>
sys.exit(load_entry_point('ralph-malph', 'console_scripts', 'ralph')())
File "/usr/local/bin/ralph", line 25, in importlib_load_entry_point
return next(matches).load()
File "/usr/local/lib/python3.9/importlib/metadata.py", line 77, in load
module = import_module(match.group('module'))
File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/app/src/ralph/__main__.py", line 24, in <module>
cli.cli()
File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/app/src/ralph/cli.py", line 314, in fetch
backend.stream()
File "/app/src/ralph/backends/stream/ws.py", line 38, in stream
asyncio.get_event_loop().run_until_complete(_stream())
File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/app/src/ralph/backends/stream/ws.py", line 34, in _stream
async with websockets.connect(self.uri) as websocket:
File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 632, in __aenter__
return await self
File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 649, in __await_impl_timeout__
return await asyncio.wait_for(self.__await_impl__(), self.open_timeout)
File "/usr/local/lib/python3.9/asyncio/tasks.py", line 481, in wait_for
return fut.result()
File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 660, in __await_impl__
await protocol.handshake(
File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 331, in handshake
self.extensions = self.process_extensions(
File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 220, in process_extensions
raise NegotiationError(
websockets.exceptions.NegotiationError: Unsupported extension: name = permessage-deflate, params = []
ERROR: 1
Steps to Reproduce
./bin/ralph fetch -b ws --ws-uri "wss://gra3.logs.ovh.com/tail/?tk=xxxxxxx" -c 1
Environment
Linux 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 GNU/Linux
Possible Solution
# src/ralph/backends/stream/ws.py:27
def stream(self):
    """Stream websocket content to stdout."""
    # pylint: disable=no-member
    logger.debug("Streaming from websocket uri: %s", self.uri)

    async def _stream():
        async with websockets.connect(self.uri, compression=None) as websocket:
            # ^ Add compression=None
            while event := await websocket.recv():
                sys.stdout.buffer.write(bytes(f"{event}" + "\n", encoding="utf-8"))

    asyncio.get_event_loop().run_until_complete(_stream())
We should pass a client_options parameter, like we do in the database backend, to be able to send compression=None to the connect method.
Should compression=None be a default option?
Expected behavior/code
When making a GET request to the "/xAPI/statements" endpoint, a new endpoint for the next "page" of statements is received via the "more" key on the response.
Making a second GET request to this new endpoint should return a new response with another valid endpoint in the "more" key.
Actual Behavior
An invalid endpoint is returned in the "more" key: it seems to be missing the ? separator.
Steps to Reproduce
curl -s \
--user ralph:secret \
-H "Content-Type: application/json" \
http://localhost:8100/xAPI/statements/ | jq
"more": "/xAPI/statements?pit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1682373022000|150"
curl -s \
--user ralph:secret \
-H "Content-Type: application/json" \
"http://localhost:8100/xAPI/statements?pit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1682373022000|150" | jq
"more": "/xAPI/statementspit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1682373022000|150&pit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1637921586498|32"
{
"detail": "Not Found"
}
Environment
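One way to avoid both symptoms seen above (the missing ? and the duplicated pit_id/search_after parameters) would be to rebuild the query string from parsed parameters instead of concatenating strings. This is only a sketch; the function name `build_more_url` is hypothetical and does not reflect how Ralph currently builds the link.

```python
from urllib.parse import parse_qsl, urlencode


def build_more_url(path, query_string, pit_id, search_after):
    """Return the next page URL, replacing any previous pagination values."""
    # Parse the current query so existing pagination keys get overwritten.
    params = dict(parse_qsl(query_string))
    params.update({"pit_id": pit_id, "search_after": search_after})
    return f"{path}?{urlencode(params)}"


print(build_more_url(
    "/xAPI/statements",
    "pit_id=old&search_after=1682373022000|150",
    "new",
    "1637921586498|32",
))
```

Note that urlencode percent-encodes characters such as | and =, so the receiving endpoint has to decode the values, which standard query parsers do.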
Ralph should be able to write to storage backends.
Implement a generic push command that takes a storage backend (and its configuration) as arguments and streams the standard input to this backend.
Is your feature request related to a problem or unsupported use case? Please describe.
An LRS has a specified API, thus, it can be considered as a standard backend for Ralph to pull/push learning events.
Describe the solution you'd like
Implement the lrs backend that should be compatible with i. the pull and push commands, and ii. the standard Ralph backend API.
Describe alternatives you've considered
Using curl or an equivalent headless tool to send HTTP requests (httpie, etc.), which requires prior knowledge of what an LRS is and of its specification.
Discovery, Documentation, Adoption, Migration Strategy
Pushing learning events from foo apps to an LRS:
tail -f /var/log/syslog | \
grep foo | \
ralph extract --parser syslog | \
ralph convert --from foo --to xapi | \
ralph push --backend lrs --backend-root-url=https://lrs.example.org
Note that this example is a brain dump that requires parsers and models that do not exist... yet!
Do you want to work on it through a Pull Request?
Yes! 💪
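For reference, here is a sketch of the kind of HTTP request the lrs backend would have to build when pushing statements. It only uses the standard library and does not send anything; the helper name `build_statements_request` and the use of Basic authentication are assumptions for this example.

```python
import base64
import json
from urllib.request import Request


def build_statements_request(root_url, statements, username, password):
    """Prepare a POST request sending a batch of statements to an LRS."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return Request(
        url=f"{root_url.rstrip('/')}/xAPI/statements",
        data=json.dumps(statements).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Basic {token.decode('ascii')}",
            "Content-Type": "application/json",
            # Version header expected by xAPI 1.0.3 compliant LRSs.
            "X-Experience-API-Version": "1.0.3",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual push.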
Managing edx events requires associating an identifier with each of them. Indeed, when reimporting events into elasticsearch, they appear as duplicates because they are considered as different events. Adding an identifier allows events to be overwritten with the modifications brought by the reimport.
The objective is to implement a UUID calculation that remains constant for each event, i.e., the identifier must allow the event to be found when a new version of it is imported.
The UUID must be native to edx to remain independent of the implementation of the validation models.
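A possible shape for such a stable identifier is a name-based UUID (version 5) computed from a few fields that are not expected to change between imports. The choice of `username`, `time` and `event_type` as key fields, the namespace and the function name `edx_event_uuid` are all assumptions of this sketch.

```python
import json
import uuid

# Assumed set of fields that identify an event across reimports.
KEY_FIELDS = ("username", "time", "event_type")


def edx_event_uuid(event, fields=KEY_FIELDS):
    """Compute a deterministic UUID from selected fields of an edx event."""
    key = {field: event.get(field) for field in fields}
    # Sorted keys and fixed separators give a canonical serialization.
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return str(uuid.uuid5(uuid.NAMESPACE_URL, canonical))
```

Since the result only depends on the raw event content, it stays independent of the validation models, and it can be used as the document identifier when writing to elasticsearch.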
Is your feature request related to a problem or unsupported use case? Please describe.
There have been several occasions where the first setup of Ralph was painful because environment variables were not properly set.
At the moment, when starting Ralph LRS through uvicorn, it is hard to know whether it is using the default settings or whether it has correctly taken the specified settings into account.
Describe the solution you'd like
Add the possibility to log debug information when starting Ralph through uvicorn, similarly to what is done when starting Ralph LRS through the ralph runserver command.
Ralph should be able to read edx tracking logs to convert them to various standards.
Define Marshmallow Schemas for known/documented edx tracking log event types.
Ralph should be able to read/write from ES indexes.
Implement a new storage backend with read and write features to ES clusters.
Since the 3.0 release, Ralph has many optional dependencies depending on its usage: library (with backends support), CLI or LRS. Making dependency management straightforward for both project developers and end users is not obvious.
To simplify the various flavors of the project, as suggested by @sampaccoud in #218, we propose to implement a plugin management system that will simplify project maintenance and extensibility!
References:
Expected behavior/code
As specified here, when the LRS contains no statements, the LRS should still return 200 OK to a GET Statements request (with an empty array of statements).
Actual Behavior
LRS returns a 500 Internal Server Error.
Steps to Reproduce
bin/ralph runserver -b es
http -a janedoe:supersecret :8100/xAPI/statements
Environment
For now, we do not lint our docstrings, and Black is not opinionated about this. I think we should enforce our numpy docstring style.
Integrate pydocstyle using the numpy convention.
Is your feature request related to a problem or unsupported use case? Please describe.
When tracking Ralph LRS performance using Sentry transactions, reports are polluted with health check transactions.
Describe the solution you'd like
We might consider ignoring health check routes in transaction reports (should be configurable via a feature flag in Ralph's configuration).
Discovery, Documentation, Adoption, Migration Strategy
Transaction filtering for particular routes is described in Sentry's documentation.
Do you want to work on it through a Pull Request?
Of course!
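A sketch of what such a filter could look like with a sampling callback. It assumes that the health check routes are `/__heartbeat__` and `/__lbheartbeat__`, that Sentry's ASGI integration exposes the request under the `asgi_scope` key of the sampling context, and that the resulting function is passed as the `traces_sampler` option of `sentry_sdk.init`; `make_traces_sampler` is a hypothetical helper.

```python
# Assumed health check routes.
HEALTH_CHECK_PATHS = {"/__heartbeat__", "/__lbheartbeat__"}


def make_traces_sampler(default_rate, ignore_health_checks=True):
    """Build a sampler dropping health check transactions when asked to."""

    def traces_sampler(sampling_context):
        scope = sampling_context.get("asgi_scope") or {}
        if ignore_health_checks and scope.get("path") in HEALTH_CHECK_PATHS:
            return 0.0  # never sample health checks
        return default_rate

    return traces_sampler
```

The `ignore_health_checks` argument plays the role of the feature flag mentioned above and would be read from Ralph's configuration.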
Backends have been developed from a CLI perspective with data I/O inherited from UNIX standard streams. Now that Ralph embeds an LRS server, we need to make them more generic/usable as a library.
The history file identifiers are not tied to a container or bucket, hence the same file name from different buckets can be considered as a previously processed record.
Namespacing the file identifier by the bucket (container) name or ID in the history file seems a relevant approach.
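The namespacing idea could be sketched as follows, assuming history entries are dicts carrying `backend`, `container` and `id` fields. These field names and both helper names are hypothetical and may not match the actual history file layout.

```python
def history_key(backend, container, name):
    """Identify a record by backend, container (bucket) and file name."""
    return (backend, container, name)


def already_processed(history, backend, container, name):
    """Check the history for this exact backend/container/name triple."""
    known = {
        history_key(entry["backend"], entry["container"], entry["id"])
        for entry in history
    }
    return history_key(backend, container, name) in known
```

With this lookup, the same file name stored in two different buckets is no longer mistaken for an already processed record.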
https://github.com/adlnet/xAPI-Spec/blob/1.0.3/xAPI-Communication.md#13-alternate-request-syntax
The xAPI spec describes an alternate request syntax, where all requests are passed as POST with fields "method" and "content". This can be used to circumvent query string length limitations for GET as well as some cases where PUT is unavailable. Perhaps this should be implemented?
Describe the solution you'd like
Check for "method" in the json provided to all POST requests, and decide whether or not the "alternate" syntax is being used.
As it stands, if a batch of statements is POSTed to Ralph and one of them has an id that is already stored, the entire batch will be rejected. This complicates use cases such as backfilling lost or historical data, or retrying when part of a batch was already processed. The specification is a little unclear on how to handle this case, but I believe we have the flexibility to simply remove duplicate ids from the batch before processing as long as they are not saved: https://github.com/adlnet/xAPI-Spec/blob/1.0.3/xAPI-Communication.md#212-post-statements
Describe the solution you'd like
Instead of throwing a 409, simply remove the duplicate statements from the batch and continue processing. Perhaps an ideal solution would be to include any found duplicates in the response detail.
Describe alternatives you've considered
Discovery, Documentation, Adoption, Migration Strategy
I believe this is a valid interpretation of the xAPI specification and shouldn't require any changes from consumers of the API unless they already have a custom implementation to handle the current issues.
Do you want to work on it through a Pull Request?
It should be a fairly small change, I'm happy to write a PR for it if this change is desired.
A few cloud providers offer an Amazon Glacier-compatible cold storage service. Ralph should support this kind of backend.
Implement a new glacier backend using the boto library. In a first implementation, we can consider this backend as write-only.
The settings management in our project is mainly homemade and, in some cases, it can be messy. We should find a cleaner solution.
As we are using pydantic in our projects, why not use it for settings management? It is one of its main applications, and it would make sense for us to extend its use to this scope.
We need to add a headless HTTP API server to Ralph that implements the LRS specification. This server should be fully LRS-compliant to ensure interoperability over a collection of trusted sovereign LRSs.
The LRS server should be implemented using the FastAPI framework for its performance and its integration with Pydantic, a library extensively used in this project.
The LRS server should be started using a new serve command. This command will be a wrapper for the gunicorn ad hoc command (see: https://www.uvicorn.org/deployment/#gunicorn).
One should ensure the LRS compliance of this implementation using the LRS conformance requirements reference.
💡 A good start might be to write a reference LRS spec using OpenAPI 3.x format and use this document in a TDD perspective using Dredd (or any other open source equivalent).
1️⃣ Once deployed and publicly accessible, implemented LRS compliance can be tested using the official LRS test server: https://lrstest.adlnet.gov/
2️⃣ Should we handle this feature in a dedicated project using Ralph as a library (dependency)?
Is your feature request related to a problem or unsupported use case? Please describe.
We are currently having poor performance when making many concurrent requests to Ralph LRS.
As seen here, when load testing Ralph LRS with 1000 concurrent users (each sending one request containing one xAPI statement) the total average response time skyrockets.
After adding a timing middleware, we found that a dummy request to Ralph LRS takes ~200ms, most of it (~180ms) spent hashing the HTTP Basic Auth password to check user credentials.
When building Ralph LRS, we chose to go with bcrypt for hashing and salting passwords. bcrypt seems to be the standard for HTTP Basic Auth. It is slow by design to prevent brute force attacks, but induces a large overhead for each request.
Describe the solution you'd like
An OpenID Connect authentication method is currently under development (#262), and it should greatly speed up each request, as it does not require hashing a password to check credentials.
Describe alternatives you've considered
Another solution, still being discussed on our side, would be to offer different HTTP Basic Auth backends with different hashing methods, so that developers can choose their own performance cost/security level ratio. It would also allow us to compare Ralph LRS to other open source LRSs in a fair way.
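One way to picture pluggable hashing backends is a registry of verifier functions keyed by scheme name. The sketch below uses only the Python standard library. The scheme names, the `VERIFIERS` registry, `check_credentials`, and the stored-hash format for PBKDF2 are all assumptions made for illustration. The unsalted SHA-256 verifier stands for the fast, weak end of the trade-off and is only meant for benchmarking comparisons; a bcrypt verifier could be registered in the same way.

```python
import hashlib
import hmac
from typing import Callable, Dict

Verifier = Callable[[str, str], bool]


def verify_sha256(password: str, stored: str) -> bool:
    """Fast, unsalted check (weak; for benchmarking comparisons only)."""
    digest = hashlib.sha256(password.encode()).hexdigest()
    return hmac.compare_digest(digest, stored)


def verify_pbkdf2(password: str, stored: str) -> bool:
    """Slower, salted check. Assumed format: '<iterations>$<salt hex>$<hash hex>'."""
    iterations, salt_hex, hash_hex = stored.split("$")
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(derived.hex(), hash_hex)


# Hypothetical registry: the scheme would be chosen per deployment or per user.
VERIFIERS: Dict[str, Verifier] = {
    "sha256": verify_sha256,
    "pbkdf2": verify_pbkdf2,
}


def check_credentials(scheme: str, password: str, stored: str) -> bool:
    """Dispatch to the verifier registered for the given scheme."""
    return VERIFIERS[scheme](password, stored)
```

The iteration count in the PBKDF2 format is where the cost/security ratio would be tuned.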
Is your feature request related to a problem or unsupported use case? Please describe.
The specification for GET statements states that the agent parameter should behave as follows:
Filter, only return Statements for which the specified Agent or Group is the Actor or Object of the Statement.
For the purposes of this filter, Groups that have members which match the specified Agent based on their Inverse Functional Identifier as described above are considered a match
Currently, the agent filter parameter only acts as a filter on the actor field of a statement. Furthermore, the current implementation does not work with groups.
NB: the implementation of the agent parameter is marked as optional in the spec.
Describe the solution you'd like
Perhaps we should implement this part of the spec? There is a little refactoring to be done, as all filters are currently combined with AND, whereas this would require an OR mechanism.
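The OR behavior quoted from the spec can be sketched in memory as follows. This is an illustration of the matching rule only, not of how Ralph builds its backend queries. The helper names are hypothetical, and statements are assumed to be plain dictionaries using the xAPI Inverse Functional Identifier fields (mbox, mbox_sha1sum, openid, account).

```python
from typing import Any, Dict, Optional, Tuple

IFI_KEYS = ("mbox", "mbox_sha1sum", "openid", "account")


def ifi(entity: Dict[str, Any]) -> Optional[Tuple[str, Any]]:
    """Return the (key, value) Inverse Functional Identifier of an entity, if any."""
    for key in IFI_KEYS:
        if key in entity:
            value = entity[key]
            if key == "account":
                # Accounts are identified by their homePage/name pair.
                value = (value.get("homePage"), value.get("name"))
            return key, value
    return None


def entity_matches(entity: Dict[str, Any], agent: Dict[str, Any]) -> bool:
    """True if the entity is the agent, or a group with the agent as a member."""
    target = ifi(agent)
    if target is None:
        return False
    if ifi(entity) == target:
        return True
    return any(ifi(member) == target for member in entity.get("member", []))


def statement_matches_agent(statement: Dict[str, Any], agent: Dict[str, Any]) -> bool:
    """OR semantics: match on the actor OR on the object of the statement."""
    return any(
        entity_matches(statement.get(field, {}), agent)
        for field in ("actor", "object")
    )
```

An activity object has no identifier among these fields, so it never matches and only the actor side of the OR applies to it.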
OVH Swift object storage is widely used. It's crucial for us to support it.
Most cloud providers offer an Amazon S3-compatible object storage service. Ralph should support this kind of backend.
Implement a new s3 backend using the boto library.