:gear: Ralph, the ultimate Learning Record Store (and more!) for your learning analytics

Home Page: https://openfun.github.io/ralph/

License: MIT License


Documentation: https://openfun.github.io/ralph

Source Code: https://github.com/openfun/ralph

Ralph is a toolbox for your learning analytics. It can be used as a:

LRS, an HTTP API server to collect xAPI statements (learning events), following the ADL LRS standard,
command-line interface (CLI), to build data pipelines the UNIX-way™️,
library, to fetch learning events from various backends, and to (de)serialize or convert them from and to various standard formats such as xAPI, or openedx html

⚡️ Quick start guide: Run the LRS server

Preliminary notes:

curl, jq and docker compose are required to run some commands of this tutorial. Make sure they are installed first.

To run the Elasticsearch backend locally on GNU/Linux operating systems, ensure that your virtual memory limits are not too low, and increase them (temporarily) if needed by typing this command from your terminal (as root or using sudo): sysctl -w vm.max_map_count=262144

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/master/vm-max-map-count.html

To bootstrap a test environment on your machine, clone this project first and run the bootstrap Makefile target:

make bootstrap

This command will create the required .env file (you may want to edit it for your test environment), build Ralph's Docker image and start a single-node Elasticsearch cluster via Docker Compose.

You can check the elasticsearch service status using the status helper:

make status # This is an alias for: docker compose ps

You may now start the LRS server using:

make run

The server should be up and running at http://localhost:8100. You can check its status using the heartbeat probe:

curl http://localhost:8100/__heartbeat__

The expected answer should be:

{"database":"ok"}

If the database status is satisfactory, you are now ready to send xAPI statements to the LRS:

gunzip -c data/statements.json.gz | \
head -n 100 | \
jq -s . | \
curl -Lk \
    --user ralph:secret \
    -X POST \
    -H "Content-Type: application/json" \
    -d @- \
    http://localhost:8100/xAPI/statements/

The command above fetches one hundred (100) example xAPI statements from our Potsie project and sends them to the LRS using curl.
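
For library users, the same request can be prepared from Python. The following is a minimal sketch using only the standard library; the helper name build_statements_request and the placeholder payload are assumptions made for illustration and are not part of Ralph. It builds the request without sending it, so it can be inspected first.

```python
import base64
import json
import urllib.request


def build_statements_request(statements, url, username, password):
    """Build (but do not send) a POST request carrying a list of xAPI statements."""
    credentials = f"{username}:{password}".encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        # HTTP Basic authentication, equivalent to curl's --user option.
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
    }
    body = json.dumps(statements).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


request = build_statements_request(
    [{"id": "hypothetical-statement"}],  # placeholder payload, not a valid statement
    "http://localhost:8100/xAPI/statements/",
    "ralph",
    "secret",
)
# To actually send it to a running LRS: urllib.request.urlopen(request)
```

Keeping the construction separate from the network call makes it easy to check the headers and the payload before anything reaches the LRS.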

You can get them back from the LRS using curl to query the /xAPI/statements/ endpoint:

curl -s \
    --user ralph:secret \
    -H "Content-Type: application/json" \
    http://localhost:8100/xAPI/statements/ |
jq

Note that jq is optional here: it only makes the response easier to read, and you can omit the trailing | jq if jq is not installed.

⚡️ Quick start guide: Manipulate data with the CLI

With the Docker image

Ralph is distributed as a Docker image. If Docker is installed on your machine, it can be pulled from DockerHub:

docker run --pull always --rm fundocker/ralph:latest ralph --help

With the Python package

Ralph is distributed as a standard Python package; it can be installed via pip or any other Python package manager (e.g. Poetry, Pipenv, etc.):

# Install the full package
pip install \
    ralph-malph[full]

# Install only the core package (library usage without backends, CLI and LRS)
pip install ralph-malph

If you installed the full package (including the CLI, LRS and supported backends), the ralph command should be available in your PATH. Try displaying the program usage with the --help flag:

ralph --help

You should see a list of available commands and global flags for ralph. Note that each command has its own usage that can be invoked via:

ralph COMMAND --help

Replace COMMAND with the target command, e.g. list, to see its usage.

Migrating

Some major version changes require updating persistence layers. Check out the migration guide for more information.

Contributing

This project is intended to be community-driven, so please do not hesitate to get in touch if you have any questions related to our implementation or design decisions.

We try to raise our code quality standards and expect contributors to follow the recommendations from our handbook.

Useful commands

You can explore all available rules using:

make help

but here are some of them:

Bootstrap the project: make bootstrap
Run tests: make test
Run all linters: make lint
If you add new dependencies to the project, you will have to rebuild the Docker image (and the development environment): make down && make bootstrap

License

This work is released under the MIT License (see LICENSE).


ralph's Issues

Allow custom logging configuration

Purpose

Ralph logs require careful attention. We need a way to configure loggers with custom handlers and formatters like logging_ldp.

Proposal

This is more or less related to #14. Maybe a plugin architecture is the way to go for this kind of integration.

Purpose

OVH Swift object storage is widely used. It's crucial for us to support it.

Proposal

add a new swift storage backend

Purpose

We need a way to convert learning analytics logs from various formats to various other formats.

Proposal

Add base support for standards from the learning community:

xAPI
EdX tracking logs

Example usage:

$ ralph fetch -b ldp e8ecbb69-ec1a-4d20-b597-2cfd75a8f12b | \
  ralph extract -p gelf | \
  ralph convert --from edx --to xapi \
    > e8ecbb69-ec1a-4d20-b597-2cfd75a8f12b.json

Purpose

We need to add a headless HTTP API server to Ralph that implements the LRS specification. This server should be fully LRS-compliant to ensure interoperability over a collection of trusted sovereign LRSs.

Proposal

✅ The LRS server should be implemented using the FastAPI framework for its performance and its integration with Pydantic, a library extensively used in this project.

✅ The LRS server should be started using a new serve command. This command will be a wrapper for the gunicorn ad hoc command (see: https://www.uvicorn.org/deployment/#gunicorn).

✅ The compliance of this implementation should be checked against the LRS conformance requirements reference.

💡 A good start might be to write a reference LRS spec using OpenAPI 3.x format and use this document in a TDD perspective using Dredd (or any other open source equivalent).

Alternatives

1️⃣ Once deployed and publicly accessible, implemented LRS compliance can be tested using the official LRS test server: https://lrstest.adlnet.gov/

2️⃣ Should we handle this feature in a dedicated project using Ralph as a library (dependency)?

Wrong "more" url on the second request or when using query parameters

Bug Report

Expected behavior/code
When making a GET request to the "/xAPI/statements" endpoint, a new endpoint for the next "page" of statements is received via the "more" key on the response.

Making a second GET request to this new endpoint should return a new response with another valid endpoint in the "more" key.

Actual Behavior
An invalid endpoint is returned in the "more" key: it is missing the ? that separates the path from the query string.

Steps to Reproduce

Make a simple request to the /xAPI/statements endpoint:

curl -s \
    --user ralph:secret \
    -H "Content-Type: application/json" \
    http://localhost:8100/xAPI/statements/ | jq

The response contains a "more" key:

"more": "/xAPI/statements?pit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1682373022000|150"

Make a new request to the previous returned endpoint:

curl -s \
    --user ralph:secret \
    -H "Content-Type: application/json" \
    "http://localhost:8100/xAPI/statements?pit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1682373022000|150" | jq

The "more" key in the response is as follows (notice the missing "?" character in "statementspit_id"):

"more": "/xAPI/statementspit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1682373022000|150&pit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1637921586498|32"

Making a request to this url without adding the "?" character returns:

{
  "detail": "Not Found"
}

Environment

Ralph version: 3.5.1
Platform: kubernetes
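
One way to avoid both symptoms seen above (the missing "?" and the duplicated pit_id and search_after parameters) is to rebuild the query string from parsed parameters instead of concatenating strings. The sketch below uses only the standard library; build_more_url is a hypothetical helper, not Ralph's actual implementation, and the real fix may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PAGINATION_KEYS = ("pit_id", "search_after")


def build_more_url(request_url, pit_id, search_after):
    """Return a relative "more" URL for the next page of statements."""
    parts = urlsplit(request_url)
    # Keep the caller's filters but drop pagination values from the previous page.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in PAGINATION_KEYS
    ]
    query += [("pit_id", pit_id), ("search_after", search_after)]
    # urlunsplit inserts the "?" separator whenever the query is not empty.
    return urlunsplit(("", "", parts.path, urlencode(query), ""))


first_page = build_more_url("/xAPI/statements", "abc", "1|2")
second_page = build_more_url(first_page, "abc", "3|4")
```

Because the previous pagination values are filtered out before the new ones are appended, following the "more" link any number of times yields a URL of the same shape.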


Add xAPI statements (de)serialisers for Marsha and Ashley events

Purpose

Ralph should be able to read and write xAPI statements in order to convert events from/to various standards.

Proposal

Define Pydantic models for known/documented xAPI event types.

Plugin management system for Ralph

Purpose

Reference on @jmaupetit issue #219
Inspiration from commit openfun/warren/pull/7

Proposal and Questions

Each backend is a plugin
Each plugin has its own pyproject.toml
Core directory with pyproject.toml
Find a way to manage update of core with update of plugins
Use of importlib (importlib_metadata or importlib.metadata depending on python version)
Install plugin through pip
New version pattern for plugins
Which entry points ?
fs => builtin per default ?
When registering plugins => check performance
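
As an illustration of the importlib-based approach listed above, here is a minimal sketch of entry point discovery. The group name ralph.backends and the helper discover_plugins are assumptions made for this example; the proposal does not settle on either. Entry points are returned without being loaded, which keeps registration cheap and defers the import cost to first use.

```python
import sys
from importlib import metadata


def discover_plugins(group):
    """Map plugin names to their (not yet loaded) entry points for a given group."""
    if sys.version_info >= (3, 10):
        entry_points = metadata.entry_points(group=group)
    else:
        # Older Python versions return a dict keyed by group name.
        entry_points = metadata.entry_points().get(group, [])
    return {entry_point.name: entry_point for entry_point in entry_points}


# "ralph.backends" is a hypothetical group name used for illustration only.
plugins = discover_plugins("ralph.backends")
# A plugin would be imported on demand with: plugins["some_name"].load()
```

A plugin package installed through pip would declare its backend under the chosen group in its own pyproject.toml, so that it appears in the mapping without any change to the core.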

Next Version

Use of Cookiecutter to bootstrap project efficiently ?

Still editing...

Improve Sentry integration for transactions

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
When tracking Ralph LRS performance using Sentry transactions, reports are polluted with health check transactions.

Describe the solution you'd like
We might consider ignoring health check routes in transaction reports (should be configurable via a feature flag in Ralph's configuration).

Discovery, Documentation, Adoption, Migration Strategy
Transaction filtering for particular routes is described in Sentry's documentation.

Do you want to work on it through a Pull Request?
Of course!
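
A possible shape for this filter is a sampling function passed to the Sentry SDK through its traces_sampler option. The sketch below is an assumption about how it could look, not Ralph's implementation: the factory name, the ignore_health_checks flag and the set of ignored paths are hypothetical, with /__heartbeat__ taken from the quick start guide. It relies on the ASGI integration exposing the request scope under the asgi_scope key of the sampling context.

```python
def make_traces_sampler(default_rate, ignored_paths, ignore_health_checks=True):
    """Build a Sentry traces_sampler that drops transactions for given routes."""

    def traces_sampler(sampling_context):
        scope = sampling_context.get("asgi_scope") or {}
        if ignore_health_checks and scope.get("path") in ignored_paths:
            return 0.0  # never sample health check transactions
        return default_rate

    return traces_sampler


# Hypothetical configuration values; the flag would come from Ralph's settings.
sampler = make_traces_sampler(1.0, {"/__heartbeat__"}, ignore_health_checks=True)
# Wiring it up (requires sentry_sdk): sentry_sdk.init(dsn=..., traces_sampler=sampler)
```

Turning the flag off restores the default sampling rate for every route, which keeps the behavior configurable as requested.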

Add push command

Purpose

Ralph should be able to write to storage backends.

Proposal

Implement a generic push command that takes a storage backend (and its configuration) as arguments and streams the standard input to this backend.

Make backends less CLI-driven

Purpose

Backends have been developed from a CLI perspective, with data I/O inherited from UNIX standard streams. Now that Ralph embeds an LRS server, we need to make them more generic and usable as a library.

Proposal

use file-like objects instead of hard-coded streams for I/O
add a query method for database backends
ease database client instantiation / configuration
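
The first item above can be sketched as follows. InMemoryBackend is a hypothetical class written for this example (it is not one of Ralph's backends); the point is only that the write method receives any binary file-like object, so the same code serves the CLI and library callers.

```python
import io
import sys
from typing import BinaryIO


class InMemoryBackend:
    """Hypothetical backend that stores one record per non-empty input line."""

    def __init__(self):
        self.records = []

    def write(self, stream: BinaryIO) -> int:
        """Read records from any binary file-like object and return their count."""
        count = 0
        for line in stream:
            line = line.strip()
            if line:
                self.records.append(line)
                count += 1
        return count


backend = InMemoryBackend()
# Library usage: any file-like object works, here an in-memory buffer.
written = backend.write(io.BytesIO(b'{"id": 1}\n\n{"id": 2}\n'))
# CLI usage would pass the standard input instead: backend.write(sys.stdin.buffer)
```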

Need to disable compression to use OVH LDP's WebSocket

Bug Report

Expected behavior/code
Using the ws backend with OVH LDP's websocket as ws_uri parameter should output live logs

Actual Behavior
Fatal error :

Traceback (most recent call last):
  File "/usr/local/bin/ralph", line 33, in <module>
    sys.exit(load_entry_point('ralph-malph', 'console_scripts', 'ralph')())
  File "/usr/local/bin/ralph", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/local/lib/python3.9/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/app/src/ralph/__main__.py", line 24, in <module>
    cli.cli()
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/src/ralph/cli.py", line 314, in fetch
    backend.stream()
  File "/app/src/ralph/backends/stream/ws.py", line 38, in stream
    asyncio.get_event_loop().run_until_complete(_stream())
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/app/src/ralph/backends/stream/ws.py", line 34, in _stream
    async with websockets.connect(self.uri) as websocket:
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 632, in __aenter__
    return await self
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 649, in __await_impl_timeout__
    return await asyncio.wait_for(self.__await_impl__(), self.open_timeout)
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 481, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 660, in __await_impl__
    await protocol.handshake(
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 331, in handshake
    self.extensions = self.process_extensions(
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 220, in process_extensions
    raise NegotiationError(
websockets.exceptions.NegotiationError: Unsupported extension: name = permessage-deflate, params = []
ERROR: 1

Steps to Reproduce

./bin/ralph fetch -b ws --ws-uri "wss://gra3.logs.ovh.com/tail/?tk=xxxxxxx" -c 1

Environment

Ralph version: master
Platform: Linux 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 GNU/Linux

Possible Solution

# src/ralph/backends/stream/ws.py:27
    def stream(self):
        """Stream websocket content to stdout."""
        # pylint: disable=no-member

        logger.debug("Streaming from websocket uri: %s", self.uri)

        async def _stream():
            async with websockets.connect(self.uri, compression=None) as websocket:
#                                                      ^ Add compression=None
                while event := await websocket.recv():
                    sys.stdout.buffer.write(bytes(f"{event}" + "\n", encoding="utf-8"))

        asyncio.get_event_loop().run_until_complete(_stream())

We should pass a client_options parameter like we do in database backend to be able to send compression=None to the connect method.

Should compression=None be a default option ?
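
The client_options idea could look like the sketch below. It assumes, for illustration only, that compression=None becomes the default while callers remain free to override it; whether that should be the default is exactly the open question above. The helper names are hypothetical, and the websockets import is deferred so that the option-merging logic can be used without the library installed.

```python
import asyncio
import sys

# Assumed default: disable the permessage-deflate extension.
DEFAULT_CLIENT_OPTIONS = {"compression": None}


def build_client_options(client_options=None):
    """Merge user-supplied options over the defaults."""
    options = dict(DEFAULT_CLIENT_OPTIONS)
    options.update(client_options or {})
    return options


async def stream(uri, client_options=None):
    """Stream websocket content to stdout."""
    import websockets  # third-party dependency, imported lazily

    options = build_client_options(client_options)
    async with websockets.connect(uri, **options) as websocket:
        async for event in websocket:
            sys.stdout.buffer.write(f"{event}\n".encode("utf-8"))


# Usage (requires a reachable websocket server): asyncio.run(stream("wss://..."))
```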

Evaluate xAPI compliance coverage

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
Currently, there is no obvious way to know which parts of the xAPI specification are covered by Ralph. There is no flagging of tests that cover mandatory specifications. Having this knowledge could help guide future developments or pinpoint limits to the current tool.

Describe the solution you'd like
It would be interesting to have a tool to evaluate which parts of the specifications are covered. Perhaps the LRS Test Suite could be used. Another solution would be to create a dedicated battery of tests.

Add Arnold tray

Purpose

Arnold will soon support an experimental way to package deployable applications, similar to Helm for k8s. It's a great opportunity to add a tested tray for Ralph.

Proposal

add tray's openshift objects (customizable cronjob + secret)
update the CI to test tray's deployment

xAPI Validation error when `result.extensions` is specified

Bug Report

Expected behavior/code
When executing command validate on a (known-to-be-correct) xAPI statement, Ralph should return
Validating 1 events (ignore_errors=0 | fail-on-unknown=0

Actual Behavior
When executing command validate on a (known-to-be-correct) xAPI statement containing result extensions, Ralph returns a BadFormatException:

2023-01-17 14:25:58,440 INFO     ralph.cli Validating xapi events (ignore_errors=False | fail-on-unknown=False)
2023-01-17 14:25:58,650 ERROR    ralph.models.validator Input event is not a valid VideoSeeked event.
Traceback (most recent call last):
  File "/app/src/ralph/models/validator.py", line 30, in validate
    yield self._validate_event(event_str)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/ralph/models/validator.py", line 77, in _validate_event
    return self.get_first_valid_model(event).json()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/ralph/models/validator.py", line 62, in get_first_valid_model
    raise error
  File "/app/src/ralph/models/validator.py", line 58, in get_first_valid_model
    return model(**event)
           ^^^^^^^^^^^^^^
  File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 4 validation errors for VideoSeeked
result -> extensions -> https://w3id.org/xapi/video/extensions/length
  extra fields not permitted (type=value_error.extra)
result -> extensions -> https://w3id.org/xapi/video/extensions/played-segments
  extra fields not permitted (type=value_error.extra)
result -> extensions -> https://w3id.org/xapi/video/extensions/progress
  extra fields not permitted (type=value_error.extra)
@timestamp
  extra fields not permitted (type=value_error.extra)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/ralph", line 33, in <module>
    sys.exit(load_entry_point('ralph-malph', 'console_scripts', 'ralph')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/bin/ralph", line 25, in importlib_load_entry_point
    return next(matches).load()
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/metadata/__init__.py", line 202, in load
    module = import_module(match.group('module'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/src/ralph/__main__.py", line 23, in <module>
    cli.cli()
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/ralph/cli.py", line 347, in validate
    for event in validator.validate(sys.stdin, ignore_errors, fail_on_unknown):
  File "/app/src/ralph/models/validator.py", line 45, in validate
    raise BadFormatException(message) from err
ralph.exceptions.BadFormatException: Input event is not a valid VideoSeeked event.

Steps to Reproduce

curl -sL https://github.com/openfun/potsie/raw/main/fixtures/elasticsearch/lrs.json.gz | \
gunzip | \
head -n 1 | \
bin/ralph validate -f xapi

Environment

Ralph version: 3.1.0
Platform: Linux

Implement UUID edx events

Purpose

Managing edx events requires associating an identifier with each of them. Indeed, when reimporting events into elasticsearch, they appear as duplicates because they are considered as different events. Adding an identifier makes it possible to overwrite existing events with the modifications from the reimport.

Proposal

The objective is to implement a UUID calculation that remains constant for each event, i.e., the identifier must allow the event to be found when a new version of it is imported.

The UUID must be native to edx to remain independent of the implementation of the validation models
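
One way to obtain such a stable identifier is a name-based UUID (version 5) computed from a canonical serialization of the raw event. This is a sketch under assumptions, not the project's chosen method: the namespace URL is a made-up placeholder, and hashing the whole event is only one option (a subset of fields could be used if some fields change between imports).

```python
import json
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
EDX_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/edx-events")


def event_uuid(event: dict) -> uuid.UUID:
    """Compute a deterministic UUID from the raw event content."""
    # Sorting keys and fixing separators makes the serialization canonical.
    canonical = json.dumps(
        event, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return uuid.uuid5(EDX_NAMESPACE, canonical)


first = event_uuid({"username": "jane", "event_type": "play_video"})
second = event_uuid({"event_type": "play_video", "username": "jane"})
```

Since the identifier depends only on the raw event and not on any validation model, reimporting the same event produces the same UUID regardless of key order.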

Add support for the AWS Glacier backend

Purpose

A few cloud providers offer an Amazon Glacier-compatible cold storage service. It seems necessary to add support for this backend in Ralph.

Proposal

Implement a new glacier backend using the boto library. In a first implementation, we can consider this backend as write-only.

Remove pandas dependency

Purpose

Pandas was introduced as a Ralph dependency for preliminary performance assessment and is now only used in the GELF parser. We think it's not worth keeping such a dependency for a small use case.

Proposal

remove pandas dependency
use the json.loads from the standard library instead in the GELF parser
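
A standard-library replacement could be as small as the following sketch. It assumes the parser's job is to extract the short_message field (a standard GELF field) from each input line; parse_gelf is a hypothetical name, and silently skipping malformed lines is a choice made for this example rather than Ralph's documented behavior.

```python
import json


def parse_gelf(lines):
    """Yield the short_message field of each valid GELF line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)["short_message"]
        except (json.JSONDecodeError, KeyError, TypeError):
            # Malformed JSON, missing field or non-object payload: skip the line.
            continue


sample = json.dumps({"version": "1.1", "short_message": "hello"})
messages = list(parse_gelf([sample, "not json", "", "[]"]))
```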

Write tutorial documentation for Ralph open-source usage

Purpose

First-time use of Ralph is easier with a tutorial based on working sample data. Such a tutorial would provide much of the information that users would otherwise have to ask the FUN team for.

Proposal

Provide simple test data to try all of ralph's commands and write a first-use tutorial based on this data.

Create a set of archives covering all use cases:

Write a workflow that covers all of ralph's commands (to test each one of them and to understand their purpose):

Implement Authority as a group (user | client)

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
A permission/authority mechanism is currently being implemented in this PR in connection with #288.

In the future, Ralph LRS could be deployed on a large scale to serve multiple organizations (e.g. universities and platforms). For practical reasons, as well as GDPR compliance, it will be necessary to include a mechanism in which:

an LRS admin can access all the data
an admin for a platform (e.g. Moodle) can access all the data written by this platform
a user (e.g. a student) can access all statements they have produced across all platforms

Describe the solution you'd like

This seems like a job for OAuth (a user authorizes a client to write data on their behalf). The proposed solution would be to write all statements with Authority as a group, containing both the "client" and "user". Authority would resemble:

"authority": {
	"objectType" : "Group",
	"member": [
		{
			"account": {
				"homePage":"http://example.com/xAPI/OAuth/Token",
				"name":"oauth_consumer_x75db"
			}
		},
		{ 
			"mbox":"mailto:[email protected]" 
		}
	]
}

OAuth seems to be a necessity as the spec only allows grouped Authority in this situation.
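
To make the structure concrete, the following sketch builds such a grouped Authority and checks whether a given user belongs to it. Both helper names are hypothetical, and the mbox value is a placeholder address chosen for this example.

```python
def build_group_authority(client_home_page, client_name, user_mbox):
    """Build a grouped Authority holding the OAuth client and the user."""
    return {
        "objectType": "Group",
        "member": [
            {"account": {"homePage": client_home_page, "name": client_name}},
            {"mbox": user_mbox},
        ],
    }


def authority_includes_user(authority, user_mbox):
    """Return True when one member of the Authority group is the given user."""
    return any(
        member.get("mbox") == user_mbox for member in authority.get("member", [])
    )


authority = build_group_authority(
    "http://example.com/xAPI/OAuth/Token",
    "oauth_consumer_x75db",
    "mailto:user@example.com",  # placeholder address
)
```

A check of this kind is what a per-user read filter would rely on to return only the statements a user has produced.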

Keep project dependencies up-to-date

Purpose

We need to keep project dependencies up-to-date to ease its maintenance.

Proposal

configure pyup

LRS should return `200 OK` to the GET statements when no statement present

Bug Report

Expected behavior/code
As specified here, when the LRS contains no statements, the LRS should still return 200 OK to a GET Statements (with an empty array of statements).

Actual Behavior
LRS returns a 500 Internal Server Error.

Steps to Reproduce

bin/ralph runserver -b es
http -a janedoe:supersecret :8100/xAPI/statements

Environment

Ralph version: 3.5.0
Platform: -

Correct docstring typos with endpoint for ralph documentation

Bug Report

Expected behavior/code
In Ralph documentation, command and model description sentences should end with a period.

Actual Behavior
Some of them are currently missing the final period.

Steps to Reproduce

Read command page from Ralph documentation
Read models page from the upper documentation
Notice that some command and model descriptions are missing the final period

Environment

Ralph version: 1.1.0

Possible Solution
Recheck all docstrings and add the final period where it is missing

`make bootstrap` is failing due to missing build dependencies

Bug Report

Hi all, I'm excited to get started developing with Ralph. Getting set up I ran into an issue, however. On the master branch make bootstrap errors out pip installing psutil due to gcc not being available.

Expected behavior/code
make bootstrap builds the docker images cleanly.

Actual Behavior
An error occurs:

#0 22.22       creating build/temp.linux-aarch64-cpython-39/psutil
#0 22.22       gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPSUTIL_POSIX=1 -DPSUTIL_SIZEOF_PID_T=4 -DPSUTIL_VERSION=594 -DPy_LIMITED_API=0x03060000 -DPSUTIL_LINUX=1 -DPSUTIL_ETHTOOL_MISSING_TYPES=1 -I/usr/local/include/python3.9 -c psutil/_psutil_common.c -o build/temp.linux-aarch64-cpython-39/psutil/_psutil_common.o
#0 22.22       C compiler or Python headers are not installed on this system. Try to run:
#0 22.22       sudo apt-get install gcc python3-dev
#0 22.22       error: command 'gcc' failed: No such file or directory

Steps to Reproduce

Sync the Ralph master branch
run make bootstrap
The above error occurs

Environment

Ralph version: master branch
Platform: Docker Desktop on Mac OS 13.0.1

Possible Solution
I was able to get a successful build by adding the required development tools, see the PR here: bmtcril#1

Use pydantic for settings management

Purpose

The settings management in our project is mainly homemade and, in some cases, it can be messy. We should find a cleaner solution.

Proposal

As we are using pydantic in our projects, why not use it for settings management? It is one of its main applications and it would make sense for us to extend its use to this area.

Validate StatementParameters

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
The StatementParameters class was defined as a dataclass to avoid double validation as it was intended to be populated with fields already validated by FastAPI.
However, if it is used outside of the LRS (e.g. library usage), no validation is applied, which is undesirable.

Describe the solution you'd like
We would like to replace the StatementsParameters dataclass with a StatementsParameters pydantic model to ensure field validation.
We could use the pydantic construct() method in our LRS API to avoid double validation.

Describe alternatives you've considered
We could support both (dataclass and pydantic model), however, this would be redundant as in both contexts (LRS/library usage) a pydantic model could achieve the objective.

Discovery, Documentation, Adoption, Migration Strategy
The change should be backward compatible.

Do you want to work on it through a Pull Request?
This change is included in the backends unification pull request. #228

Thanks to @jmaupetit for spotting this issue and providing the solution.

Add authorization scope mechanism

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
With the addition of OpenID Connect authentication to Ralph, we should, as mentioned by the xAPI spec, implement scopes.

Describe the solution you'd like
In a first implementation, we could implement the following scopes:

all/read
all
statements/write
statements/read
statements/read/mine

Discovery, Documentation, Adoption, Migration Strategy
We will probably have to add an Authority mechanism, necessary for the statements/read/mine scope.
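
A scope check for the list above could be sketched as follows. The inclusion rules encoded here (all covers everything, all/read covers the read scopes, statements/read covers statements/read/mine) reflect one reading of the xAPI specification and should be treated as an assumption to verify; the names SCOPE_IMPLIES and is_authorized are hypothetical.

```python
# Assumed hierarchy: each scope maps to the narrower scopes it also grants.
SCOPE_IMPLIES = {
    "all": {"all/read", "statements/write", "statements/read", "statements/read/mine"},
    "all/read": {"statements/read", "statements/read/mine"},
    "statements/read": {"statements/read/mine"},
}


def is_authorized(granted_scopes, required_scope):
    """Return True when one of the granted scopes covers the required scope."""
    for scope in granted_scopes:
        if scope == required_scope:
            return True
        if required_scope in SCOPE_IMPLIES.get(scope, set()):
            return True
    return False


can_read = is_authorized(["all/read"], "statements/read")
can_write = is_authorized(["all/read"], "statements/write")
```

An endpoint would declare the scope it requires and reject the request when is_authorized returns False for the scopes attached to the authenticated user.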

Add support for custom S3 endpoint

Feature Request

Add support to read the tracking logs from any S3 compatible service.

We at NAU use Ceph S3. The default Tutor installation uses MinIO to store files. For example, some installations could use that MinIO instance to store the tracking logs, or use another service that provides the same interface.

Is your feature request related to a problem or unsupported use case? Please describe.
Add a configuration option that allows changing the default AWS S3 endpoint URL to any endpoint URL.

Describe the solution you'd like
Change the S3 backend code to add a new optional configuration option that allows changing the boto3 endpoint_url.

Describe alternatives you've considered
Any.

Discovery, Documentation, Adoption, Migration Strategy
Add a sub-section in the docs to reference that you can use any S3 compatible service, like Ceph S3 or MinIO.

Do you want to work on it through a Pull Request?

Yes, me directly, or someone from NAU team.
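
The requested change maps to the endpoint_url argument that boto3 accepts when creating a client. The sketch below shows the idea under assumptions: the helper names and the example MinIO URL are hypothetical, and Ralph's actual setting name is not defined here. The boto3 import is deferred so the option-building logic stands on its own.

```python
def build_s3_client_kwargs(endpoint_url=None, region_name=None):
    """Only pass the options that were explicitly configured."""
    kwargs = {}
    if endpoint_url:
        kwargs["endpoint_url"] = endpoint_url
    if region_name:
        kwargs["region_name"] = region_name
    return kwargs


def get_s3_client(endpoint_url=None, region_name=None):
    """Create an S3 client, optionally targeting an S3-compatible service."""
    import boto3  # third-party dependency, imported lazily

    return boto3.client("s3", **build_s3_client_kwargs(endpoint_url, region_name))


# Leaving endpoint_url unset keeps the default AWS endpoint.
default_kwargs = build_s3_client_kwargs()
# Hypothetical local MinIO endpoint, for illustration only.
minio_kwargs = build_s3_client_kwargs(endpoint_url="http://localhost:9000")
```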

Failing to set client_options for ElasticSearch backend

Bug Report

Expected behavior/code
Using the --es-client-options to set a new CA certificate for ElasticSearch backend should not raise an error.

Actual Behavior
Fatal error :

2023-01-06 18:32:04,775 INFO     ralph.cli Running API server on 0.0.0.0:8100 with es backend
2023-01-06 18:32:04,776 INFO     ralph.cli Do not use runserver in production - start production servers through a process manager such as gunicorn/supervisor/circus.
INFO:     Will watch for changes in these directories: ['/app']
INFO:     Loading environment from '/tmp/tmp6uie9pai'
INFO:     Uvicorn running on http://0.0.0.0:8100 (Press CTRL+C to quit)
INFO:     Started reloader process [1] using WatchFiles
Traceback (most recent call last):
  File "/usr/local/bin/ralph", line 33, in <module>
    sys.exit(load_entry_point('ralph-malph', 'console_scripts', 'ralph')())
  File "/usr/local/bin/ralph", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/local/lib/python3.9/importlib/metadata.py", line 86, in load
    module = import_module(match.group('module'))
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/app/src/ralph/__main__.py", line 23, in <module>
    cli.cli()
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/app/src/ralph/cli.py", line 568, in runserver
    uvicorn.run(
  File "/usr/local/lib/python3.9/site-packages/uvicorn/main.py", line 564, in run
    ChangeReload(config, target=server.run, sockets=[sock]).run()
  File "/usr/local/lib/python3.9/site-packages/uvicorn/supervisors/basereload.py", line 45, in run
    for changes in self:
  File "/usr/local/lib/python3.9/site-packages/uvicorn/supervisors/basereload.py", line 64, in __next__
    return self.should_restart()
  File "/usr/local/lib/python3.9/site-packages/uvicorn/supervisors/watchfilesreload.py", line 85, in should_restart
    changes = next(self.watcher)
  File "/usr/local/lib/python3.9/site-packages/watchfiles/main.py", line 119, in watch
    with RustNotify([str(p) for p in paths], debug, force_polling, poll_delay_ms, recursive) as watcher:
FileNotFoundError: Permission denied (os error 13) about ["/app/k3d-storage/2d37968c-2b27-4271-a05d-53e62419dfba"]
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
    target(sockets=sockets)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 60, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 67, in serve
    config.load()
  File "/usr/local/lib/python3.9/site-packages/uvicorn/config.py", line 474, in load
    self.loaded_app = import_from_string(self.app)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/app/src/ralph/api/__init__.py", line 5, in <module>
    from ralph.conf import settings
  File "/app/src/ralph/conf.py", line 302, in <module>
    settings = Settings()
  File "pydantic/env_settings.py", line 39, in pydantic.env_settings.BaseSettings.__init__
  File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Settings
BACKENDS -> DATABASE -> ES -> CLIENT_OPTIONS
  value is not a valid dict (type=type_error.dict)

Steps to Reproduce
bin/ralph runserver -b es --es-client-options ca_certs=toto
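
The validation error above suggests that the option value reaches the settings as something other than a dictionary, most likely the raw `ca_certs=toto` string. One possible way to handle this, sketched here with a hypothetical `parse_client_options` helper (an assumption, not Ralph's actual fix), is to convert comma-separated `key=value` pairs into a dict before validation:

```python
from typing import Dict


def parse_client_options(raw: str) -> Dict[str, str]:
    """Parse "key=value,key2=value2" into a dict (hypothetical helper)."""
    options: Dict[str, str] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got: {pair!r}")
        options[key.strip()] = value.strip()
    return options
```

With this helper, the string from the reproduction command would become `{"ca_certs": "toto"}`, which is the shape a dict-typed setting expects.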

Environment

Ralph version: master
Platform: Linux 62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Do not throw 409 errors on duplicate statements in POST

Feature Request

As it stands, if a batch of statements is POSTed to Ralph and one of them has an id that is already stored, the entire batch is rejected. This complicates use cases such as backfilling lost or historical data, or retrying a batch that was already partly processed. The specification is a little unclear on how to handle this case, but I believe we have the flexibility to simply remove duplicate ids from the batch before processing as long as they are not saved: https://github.com/adlnet/xAPI-Spec/blob/1.0.3/xAPI-Communication.md#212-post-statements

Describe the solution you'd like
Instead of throwing a 409, simply remove the duplicate statements from the batch and continue processing. Perhaps an ideal solution would be to include any found duplicates in the response detail.
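
The proposed behavior could be sketched as follows. `split_duplicates` is a hypothetical helper, and `existing_ids` stands for whatever lookup the storage backend provides (both are assumptions of this sketch). Statements without an id are kept, since there is nothing to compare them against.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Statement = Dict[str, Any]


def split_duplicates(
    batch: Iterable[Statement], existing_ids: Set[str]
) -> Tuple[List[Statement], List[str]]:
    """Split a batch into statements to store and duplicate ids to report."""
    seen = set(existing_ids)  # copy so the caller's set is left untouched
    to_store: List[Statement] = []
    duplicates: List[str] = []
    for statement in batch:
        statement_id = statement.get("id")
        if statement_id is not None and statement_id in seen:
            duplicates.append(statement_id)
            continue
        if statement_id is not None:
            seen.add(statement_id)
        to_store.append(statement)
    return to_store, duplicates
```

The second return value is what could be included in the response detail. Note that this sketch also drops ids repeated within the same batch, which is a design choice rather than something the issue asks for.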

Describe alternatives you've considered

Changing from batch processing to sending each statement individually so no statements are lost. This is not feasible for large historical loads, however, which may have tens or hundreds of millions of statements.
Having Ralph reject the batch with a list of duplicate IDs so the client can remove them and re-POST. This seems unnecessarily complicated and relies on implementation specific knowledge of the LRS which some clients may not have.

Discovery, Documentation, Adoption, Migration Strategy
I believe this is a valid interpretation of the xAPI specification and shouldn't require any changes from consumers of the API unless they already have a custom implementation to handle the current issues.

Do you want to work on it through a Pull Request?
It should be a fairly small change, I'm happy to write a PR for it if this change is desired.

Make sentry integration optional

Purpose

As Ralph will manipulate sensitive data, we should carefully watch job execution. Sentry will be a great help to detect failures.

Proposal

We should think of a plugin architecture for Ralph for this kind of integration.

Add edx tracking logs (de)serializers

Purpose

Ralph should be able to read edx tracking logs to convert them to various standards.

Proposal

Define Marshmallow Schemas for known/documented edx tracking log event types.

Integrate pydocstyle linter

Purpose

For now, we do not lint our docstrings, and Black is not opinionated about this. I think we should enforce our numpy docstring style.

Proposal

Integrate pydocstyle using the numpy convention.

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Add ElasticSearch backend

Purpose

Ralph should be able to read/write from ES indexes.

Proposal

Implement a new storage backend with read and write features to ES clusters.

Implement alternate request syntax in API

Feature Request

https://github.com/adlnet/xAPI-Spec/blob/1.0.3/xAPI-Communication.md#13-alternate-request-syntax

The xAPI spec describes an alternate request syntax, where all requests are passed as POST with fields "method" and "content". This can be used to circumvent query string length limitations for GET as well as some cases where PUT is unavailable. Perhaps this should be implemented?

Describe the solution you'd like

Check for "method" in the JSON provided to all POST requests, and decide whether or not the "alternate" syntax is being used.
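
A minimal sketch of that decision step. `effective_method` is a hypothetical helper, and `fields` stands for wherever the "method" field is read from in the incoming request (an assumption of this sketch, to be checked against the linked section of the specification).

```python
from typing import Mapping

ALLOWED_METHODS = {"GET", "PUT", "POST", "DELETE"}


def effective_method(http_method: str, fields: Mapping[str, str]) -> str:
    """Return the method a request should be handled as."""
    if http_method.upper() != "POST":
        # The alternate syntax only applies to POST requests.
        return http_method.upper()
    override = fields.get("method")
    if override is None:
        return "POST"  # regular POST, no alternate syntax in use
    override = override.upper()
    if override not in ALLOWED_METHODS:
        raise ValueError(f"Unsupported method override: {override}")
    return override
```

For example, `effective_method("POST", {"method": "GET"})` resolves to `"GET"`, so the request could then be dispatched to the regular GET handler.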

Add support for the AWS S3 backend

Purpose

Most cloud providers offer an Amazon S3-compatible object storage service, so it seems necessary to add support for this backend in Ralph.

Proposal

Implement a new s3 backend using the boto library.

Log debug information when starting Ralph with uvicorn

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
There have been several occasions where the first setup of Ralph was painful because environment variables were not properly set.
At the moment, when starting Ralph LRS through uvicorn, it is hard to know if it is using the default settings or if it has correctly taken into account the specified settings.

Describe the solution you'd like
Add the possibility to log debug information when Ralph starts through uvicorn, similarly to how it's done when starting Ralph LRS through the ralph runserver command.
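
As an illustration, the startup log could look like the sketch below. The logger name, the setting names used in the example and the secret-masking rule are all assumptions, not Ralph's actual behavior.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("ralph.startup")  # hypothetical logger name

# Setting names containing one of these markers are treated as secrets.
SECRET_MARKERS = ("PASSWORD", "SECRET", "TOKEN", "KEY")


def describe_settings(settings: Mapping[str, Any]) -> str:
    """Render settings one per line, masking values that look like secrets."""
    lines = []
    for name in sorted(settings):
        is_secret = any(marker in name.upper() for marker in SECRET_MARKERS)
        value = "********" if is_secret else settings[name]
        lines.append(f"{name}={value!r}")
    return "\n".join(lines)


def log_settings(settings: Mapping[str, Any]) -> None:
    """Log the effective settings at DEBUG level when the app starts."""
    logger.debug("Effective settings:\n%s", describe_settings(settings))
```

Calling `log_settings` once when the application is created would make it visible whether default or overridden values are in effect, without leaking credentials into the logs.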

Remove "field" as suffix from class names?

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
Many classes in the code base are named with the suffix "Field" (LaxObjectField, MboxSha1SumActorField(BaseActorField), etc.). I would tend not to include this suffix for two reasons:

it does not describe the object itself
it is a bit confusing and makes the code harder to read

Describe the solution you'd like
I suggest removing these suffixes if possible.

Follow specification for GET /statements query behavior?

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
The specification for GET statements states that the agent parameter should behave as follows:

Filter, only return Statements for which the specified Agent or Group is the Actor or Object of the Statement.
For the purposes of this filter, Groups that have members which match the specified Agent based on their Inverse Functional Identifier as described above are considered a match

Currently, the agent filter parameter only acts as a filter on the actor field of a statement. Furthermore, the current implementation does not work with groups.

NB: the implementation of agent parameter is marked as optional in the spec.

Describe the solution you'd like
Perhaps we should implement this part of the spec? Some refactoring is needed, as all filters are currently combined with AND, whereas this would require an OR mechanism.
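
The OR semantics quoted from the specification could be sketched in plain Python as follows. All function names are hypothetical, and the matching is deliberately simplified: it compares only the first Inverse Functional Identifier found on each entity and ignores `objectType` checks.

```python
from typing import Any, Dict, Optional, Tuple

IFI_KEYS = ("mbox", "mbox_sha1sum", "openid", "account")


def ifi(agent: Dict[str, Any]) -> Optional[Tuple[str, Any]]:
    """Return the first Inverse Functional Identifier found on an agent."""
    for key in IFI_KEYS:
        if key in agent:
            return key, agent[key]
    return None


def entity_matches(entity: Any, agent: Dict[str, Any]) -> bool:
    """True if entity is the agent itself, or a Group with a matching member."""
    if not isinstance(entity, dict):
        return False
    target = ifi(agent)
    if target is None:
        return False
    if ifi(entity) == target:
        return True
    return any(ifi(member) == target for member in entity.get("member", []))


def statement_matches(statement: Dict[str, Any], agent: Dict[str, Any]) -> bool:
    """OR semantics: the agent may be the actor or the object."""
    return entity_matches(statement.get("actor"), agent) or entity_matches(
        statement.get("object"), agent
    )
```

In a database backend the same logic would have to be expressed as an OR clause in the query, which is where the refactoring mentioned above comes in.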

Namespace identifiers in command history

Purpose

The history file identifiers are not tied to a container or bucket, hence the same file name from different buckets can be considered as a previously processed record.

Proposal

Namespacing the file identifier by the bucket (container) name or ID in the history file seems a relevant approach.
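
A small sketch of the idea, assuming history entries are dictionaries with an "id" key and using a hypothetical "bucket/name" identifier format (neither is confirmed by the issue):

```python
from typing import Dict, List


def history_id(bucket: str, name: str) -> str:
    """Build a namespaced identifier (hypothetical "bucket/name" format)."""
    return f"{bucket}/{name}"


def already_processed(history: List[Dict[str, str]], bucket: str, name: str) -> bool:
    """Check whether this file from this specific bucket was handled before."""
    target = history_id(bucket, name)
    return any(entry.get("id") == target for entry in history)
```

With this scheme, a file named `2023-01-01.gz` fetched from one bucket no longer hides a file with the same name in another bucket.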

Add support for a `lrs` backend to convert and send logs from any app to an LRS

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
An LRS has a specified API, thus, it can be considered as a standard backend for Ralph to pull/push learning events.

Describe the solution you'd like
Implement the lrs backend that should be compatible with i. the pull and push commands, and ii. the standard Ralph backend API.

Describe alternatives you've considered
Using curl or an equivalent headless tool to send HTTP requests (httpie, etc.), which requires prior knowledge of what an LRS is and of its specification.

Discovery, Documentation, Adoption, Migration Strategy
Pushing learning events from foo apps to an LRS:

tail -f /var/log/syslog | \
  grep 'foo' | \
  ralph extract --parser syslog | \
  ralph convert --from foo --to xapi | \
  ralph push --backend lrs --backend-root-url=https://lrs.example.org

Note that this example is a brain dump that requires parsers and models that do not exist... yet!

Do you want to work on it through a Pull Request?
Yes! 💪

[Helm chart] feedback from our early adopters

Purpose

As Ralph's Helm chart starts to be tested in various environments, we have collected feedback from our early adopters. This issue attempts to list required improvements for future releases.

Proposal

instead of using a vault.yaml values file for secrets, we should document how to generate a Secret object for Ralph and use values from this secret in other Ralph secrets
define elasticsearch, mongodb and clickhouse as optional dependencies (they should be enabled using a value, e.g. ralph.elasticsearch.enabled)
make persistence optional (e.g. as in bitnami charts); the operator may want to override it
allow to use an existing PVC
add missing namespace metadata for hpa and ingress objects
make all object names consistent (use the chart full name the most often)
remove ingress prefix in the object name

Integrate mypy for Python type checking

Purpose

It has been decided that we use typing throughout the Ralph project. We have to integrate a static type checker in our linting toolbox.

Proposal

Integrate mypy in the project (local tooling for development + CI)

Poor performance of LRS due to HTTP Basic Auth hashing

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.

We are currently having poor performance when making many concurrent requests to Ralph LRS.
As seen here, when load testing Ralph LRS with 1000 concurrent users (each sending one request containing one xAPI statement) the total average response time skyrockets.

After adding a timing middleware, it shows that making a dummy request to Ralph LRS takes ~200ms, the majority of it (~180ms) spent hashing the HTTP Basic Auth password to check user credentials.

When building Ralph LRS, we chose to go with bcrypt for hashing and salting passwords. bcrypt seems to be the standard for HTTP Basic Auth. It is slow by design to prevent brute force attacks, but induces a large overhead for each request.

Describe the solution you'd like

An OpenID Connect authentication method is currently under development (#262), and it should greatly speed up each request, as it does not require hashing a password to check credentials.

Describe alternatives you've considered

Another solution, still being discussed on our side, would be to offer different HTTP Basic Auth backends with different hashing methods, so that developers can choose their own trade-off between performance cost and security level. It would also allow us to compare Ralph LRS to other open source LRSs in a fair way.
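
To illustrate the idea of selectable hashing backends, here is a sketch using only the standard library. The scheme names and the `HASHERS` registry are hypothetical, and the two algorithms shown merely stand in for a cheap and a costly option; they are not what Ralph uses (the issue mentions bcrypt).

```python
import hashlib
import hmac
from typing import Callable, Dict

# Each hypothetical backend maps (password, salt) to a derived key.
HASHERS: Dict[str, Callable[[bytes, bytes], bytes]] = {
    # Cheap: a single salted SHA-256 round (fast, weak against brute force).
    "sha256": lambda password, salt: hashlib.sha256(salt + password).digest(),
    # Costly: PBKDF2 with many iterations (slower, more resistant).
    "pbkdf2": lambda password, salt: hashlib.pbkdf2_hmac(
        "sha256", password, salt, 100_000
    ),
}


def hash_password(scheme: str, password: str, salt: bytes) -> bytes:
    """Derive a key from the password using the selected scheme."""
    return HASHERS[scheme](password.encode("utf-8"), salt)


def verify_password(scheme: str, password: str, salt: bytes, expected: bytes) -> bool:
    """Compare in constant time to avoid leaking timing information."""
    return hmac.compare_digest(hash_password(scheme, password, salt), expected)
```

The per-request cost is then driven by the chosen scheme, which is the trade-off between performance and security that the alternative describes.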

Integrate Rich library to make Ralph even cooler :sunglasses:

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
Ralph's CLI outputs need more love. We are in 2023.

Describe the solution you'd like
We would like shiny outputs that emphasize important information with a clear display.

Describe alternatives you've considered
None for now.

Discovery, Documentation, Adoption, Migration Strategy
The project's documentation: https://github.com/Textualize/rich

Do you want to work on it through a Pull Request?
Oh yeah!

Implement xAPI forum models in `ralph`

Purpose

The xAPI forum models used in ashley have to be described in ralph, both for validation purposes and for using ralph as a library.

Proposal

Define pydantic models associated with each forum learning statement template
Define a selector for each model
Write model and selector tests

Make Ralph more extensible by implementing plugins support

Purpose

Since the 3.0 release, Ralph has many optional dependencies depending on its usage: library (with backends support), CLI or LRS. Keeping dependency management straightforward for both project developers and end users is not easy.

Proposal

To simplify the various flavors of the project, as suggested by @sampaccoud in #218, we propose to implement a plugin management system that will simplify project maintenance and extensibility!

References:

Authenticate Docker image pulls in the CI

Purpose

DockerHub recently changed its bandwidth usage policy and now restricts anonymous requests to pull images. As we extensively rely on it in the CI, we need to use our DockerHub account for all DockerHub requests.

Proposal

Similarly to what @lunika did on other projects, add login credentials to the docker executor.