
openfun / ralph

33.0 10.0 14.0 13.92 MB

:gear: Ralph, the ultimate Learning Record Store (and more!) for your learning analytics

Home Page: https://openfun.github.io/ralph/

License: MIT License

Dockerfile 0.11% Makefile 0.58% Shell 0.10% Python 98.46% Jinja 0.66% Smarty 0.10%
lrs learning-analytics stream-processing python cli k8s docker xapi elasticsearch gelf

ralph's People

Contributors

ardawo, bmtcril, claudusd, jmaupetit, lebaudantoine, lebrunthibault, leobouloc, lunika, mbenadda, p-bizouard, pyup-bot, quitterie-lcs, renovate-bot, renovate[bot], rprin, sergiosim, sifflex, waammar, wilbrdt


ralph's Issues

Failing to set client_options for ElasticSearch backend

Bug Report

Expected behavior/code
Using the --es-client-options option to set a new CA certificate for the ElasticSearch backend should not raise an error.

Actual Behavior
Fatal error:

2023-01-06 18:32:04,775 INFO     ralph.cli Running API server on 0.0.0.0:8100 with es backend
2023-01-06 18:32:04,776 INFO     ralph.cli Do not use runserver in production - start production servers through a process manager such as gunicorn/supervisor/circus.
INFO:     Will watch for changes in these directories: ['/app']
INFO:     Loading environment from '/tmp/tmp6uie9pai'
INFO:     Uvicorn running on http://0.0.0.0:8100 (Press CTRL+C to quit)
INFO:     Started reloader process [1] using WatchFiles
Traceback (most recent call last):
  File "/usr/local/bin/ralph", line 33, in <module>
    sys.exit(load_entry_point('ralph-malph', 'console_scripts', 'ralph')())
  File "/usr/local/bin/ralph", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/local/lib/python3.9/importlib/metadata.py", line 86, in load
    module = import_module(match.group('module'))
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/app/src/ralph/__main__.py", line 23, in <module>
    cli.cli()
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/app/src/ralph/cli.py", line 568, in runserver
    uvicorn.run(
  File "/usr/local/lib/python3.9/site-packages/uvicorn/main.py", line 564, in run
    ChangeReload(config, target=server.run, sockets=[sock]).run()
  File "/usr/local/lib/python3.9/site-packages/uvicorn/supervisors/basereload.py", line 45, in run
    for changes in self:
  File "/usr/local/lib/python3.9/site-packages/uvicorn/supervisors/basereload.py", line 64, in __next__
    return self.should_restart()
  File "/usr/local/lib/python3.9/site-packages/uvicorn/supervisors/watchfilesreload.py", line 85, in should_restart
    changes = next(self.watcher)
  File "/usr/local/lib/python3.9/site-packages/watchfiles/main.py", line 119, in watch
    with RustNotify([str(p) for p in paths], debug, force_polling, poll_delay_ms, recursive) as watcher:
FileNotFoundError: Permission denied (os error 13) about ["/app/k3d-storage/2d37968c-2b27-4271-a05d-53e62419dfba"]
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/_subprocess.py", line 76, in subprocess_started
    target(sockets=sockets)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 60, in run
    return asyncio.run(self.serve(sockets=sockets))
  File "/usr/local/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.9/site-packages/uvicorn/server.py", line 67, in serve
    config.load()
  File "/usr/local/lib/python3.9/site-packages/uvicorn/config.py", line 474, in load
    self.loaded_app = import_from_string(self.app)
  File "/usr/local/lib/python3.9/site-packages/uvicorn/importer.py", line 21, in import_from_string
    module = importlib.import_module(module_str)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/app/src/ralph/api/__init__.py", line 5, in <module>
    from ralph.conf import settings
  File "/app/src/ralph/conf.py", line 302, in <module>
    settings = Settings()
  File "pydantic/env_settings.py", line 39, in pydantic.env_settings.BaseSettings.__init__
  File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for Settings
BACKENDS -> DATABASE -> ES -> CLIENT_OPTIONS
  value is not a valid dict (type=type_error.dict)

Steps to Reproduce
bin/ralph runserver -b es --es-client-options ca_certs=toto

Environment

  • Ralph version: master
  • Platform: Linux 62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
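The last error says the settings validator received something that is not a dict for `CLIENT_OPTIONS`, which suggests the raw CLI string `ca_certs=toto` reaches pydantic unparsed. A minimal sketch of the missing conversion step, assuming the option takes comma-separated `key=value` pairs as in the reproduction command; `parse_client_options` is a hypothetical name, not an existing Ralph function:

```python
from typing import Dict


def parse_client_options(raw: str) -> Dict[str, str]:
    """Turn "key1=value1,key2=value2" into {"key1": "value1", "key2": "value2"}.

    Hypothetical helper: values stay strings; type coercion is left to the
    settings model that receives the resulting dict.
    """
    options: Dict[str, str] = {}
    if not raw:
        return options
    for pair in raw.split(","):
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        options[key.strip()] = value.strip()
    return options


print(parse_client_options("ca_certs=toto"))  # {'ca_certs': 'toto'}
```

Keeping values as strings leaves validation to the settings model, so a single parser can serve every backend's client options.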

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

Remove pandas dependency

Purpose

Pandas was introduced as a Ralph dependency for preliminary performance assessment and is now only used in the GELF parser. We don't think such a dependency is worth keeping for so small a use case.

Proposal

  • remove pandas dependency
  • use json.loads from the standard library in the GELF parser instead
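A minimal sketch of the proposed change, assuming each input line is one JSON-encoded GELF message and that the parser's job is to yield its `short_message` field (a standard GELF field); the function name `parse_gelf` is illustrative:

```python
import json
from typing import Iterable, Iterator


def parse_gelf(lines: Iterable[str]) -> Iterator[str]:
    """Yield the short_message of each newline-delimited GELF record."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        yield json.loads(line)["short_message"]
```

Because iterating over a text stream yields lines, `parse_gelf(sys.stdin)` would work unchanged in the CLI, and records are processed one at a time instead of being loaded into a DataFrame.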

Implement xAPI forum models in `ralph`

Purpose

The xAPI forum models used in Ashley have to be described in Ralph, for validation purposes and for using Ralph as a library.

Proposal

  • Define pydantic models associated with each forum learning statement template
  • Define a selector for each model
  • Write model and selector tests

Fix docstrings missing a final period in the Ralph documentation

Bug Report

Expected behavior/code
In the Ralph documentation, command and model description sentences should end with a period.

Actual Behavior
Some of them are currently missing the final period.

Steps to Reproduce

  1. Read the commands page of the Ralph documentation
  2. Read the models page of the same documentation
  3. Notice that some command and model descriptions are missing a final period

Environment

  • Ralph version: 1.1.0

Possible Solution
Recheck all docstrings and add the final period where it is missing.

Add authorization scope mechanism

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
With the addition of OpenID Connect authentication to Ralph, we should implement scopes, as described in the xAPI specification.

Describe the solution you'd like
In a first implementation, we could implement the following scopes:

  • all/read
  • all
  • statements/write
  • statements/read
  • statements/read/mine

Discovery, Documentation, Adoption, Migration Strategy
We will probably have to add an Authority mechanism, necessary for the statements/read/mine scope.
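A minimal sketch of how such scopes could be checked, assuming the implications given by the xAPI specification's scope definitions (`all` grants everything, `all/read` grants every read scope, `statements/read` includes `statements/read/mine`). The function names are hypothetical, and actually filtering results for `statements/read/mine` would still require the Authority mechanism mentioned above:

```python
from typing import FrozenSet, Iterable

# Scopes implied by each granted scope (each scope implies itself).
SCOPE_IMPLIES = {
    "all": frozenset(
        {"all", "all/read", "statements/write", "statements/read", "statements/read/mine"}
    ),
    "all/read": frozenset({"all/read", "statements/read", "statements/read/mine"}),
    "statements/write": frozenset({"statements/write"}),
    "statements/read": frozenset({"statements/read", "statements/read/mine"}),
    "statements/read/mine": frozenset({"statements/read/mine"}),
}


def effective_scopes(granted: Iterable[str]) -> FrozenSet[str]:
    """Expand granted scopes; unknown scope names grant nothing."""
    expanded = set()
    for scope in granted:
        expanded |= SCOPE_IMPLIES.get(scope, frozenset())
    return frozenset(expanded)


def is_authorized(granted: Iterable[str], required: str) -> bool:
    """Return True if any granted scope covers the required one."""
    return required in effective_scopes(granted)
```

Expanding scopes once per request keeps each endpoint check a simple set-membership test.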

[Helm chart] feedback from our early adopters

Purpose

As Ralph's Helm chart starts to be tested in various environments, we have collected feedback from our early adopters. This issue lists the improvements required for future releases.

Proposal

  • instead of using a vault.yaml values file for secrets, document how to generate a Secret object for Ralph and use values from this Secret in other Ralph secrets.
  • define elasticsearch, mongodb and clickhouse as optional dependencies (they should be enabled using a value, e.g. ralph.elasticsearch.enabled)
  • make persistence optional (e.g. as in Bitnami charts); the operator may want to override it
  • allow using an existing PVC
  • add the missing namespace metadata to HPA and Ingress objects
  • make all object names consistent (use the chart's full name wherever possible)
  • remove the ingress prefix from object names

xAPI Validation error when `result.extensions` is specified

Bug Report

Expected behavior/code
When running the validate command on a (known-to-be-correct) xAPI statement, Ralph should return:
Validating 1 events (ignore_errors=0 | fail-on-unknown=0)

Actual Behavior
When running the validate command on a (known-to-be-correct) xAPI statement containing result extensions, Ralph raises a BadFormatException:

2023-01-17 14:25:58,440 INFO     ralph.cli Validating xapi events (ignore_errors=False | fail-on-unknown=False)
2023-01-17 14:25:58,650 ERROR    ralph.models.validator Input event is not a valid VideoSeeked event.
Traceback (most recent call last):
  File "/app/src/ralph/models/validator.py", line 30, in validate
    yield self._validate_event(event_str)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/ralph/models/validator.py", line 77, in _validate_event
    return self.get_first_valid_model(event).json()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/ralph/models/validator.py", line 62, in get_first_valid_model
    raise error
  File "/app/src/ralph/models/validator.py", line 58, in get_first_valid_model
    return model(**event)
           ^^^^^^^^^^^^^^
  File "pydantic/main.py", line 342, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 4 validation errors for VideoSeeked
result -> extensions -> https://w3id.org/xapi/video/extensions/length
  extra fields not permitted (type=value_error.extra)
result -> extensions -> https://w3id.org/xapi/video/extensions/played-segments
  extra fields not permitted (type=value_error.extra)
result -> extensions -> https://w3id.org/xapi/video/extensions/progress
  extra fields not permitted (type=value_error.extra)
@timestamp
  extra fields not permitted (type=value_error.extra)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/ralph", line 33, in <module>
    sys.exit(load_entry_point('ralph-malph', 'console_scripts', 'ralph')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/bin/ralph", line 25, in importlib_load_entry_point
    return next(matches).load()
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/metadata/__init__.py", line 202, in load
    module = import_module(match.group('module'))
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1206, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/app/src/ralph/__main__.py", line 23, in <module>
    cli.cli()
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/src/ralph/cli.py", line 347, in validate
    for event in validator.validate(sys.stdin, ignore_errors, fail_on_unknown):
  File "/app/src/ralph/models/validator.py", line 45, in validate
    raise BadFormatException(message) from err
ralph.exceptions.BadFormatException: Input event is not a valid VideoSeeked event.

Steps to Reproduce

curl -sL https://github.com/openfun/potsie/raw/main/fixtures/elasticsearch/lrs.json.gz | \
gunzip | \
head -n 1 | \
bin/ralph validate -f xapi

Environment

  • Ralph version: 3.1.0
  • Platform: Linux
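The last of the four errors concerns `@timestamp`, a top-level key that is probably Elasticsearch indexing metadata from the fixture rather than part of the xAPI statement (an assumption). To check whether the three `result.extensions` errors reproduce on their own, a hypothetical pre-filter could drop that key before validation:

```python
import json
from typing import Iterable, Iterator, Sequence


def drop_keys(lines: Iterable[str], keys: Sequence[str] = ("@timestamp",)) -> Iterator[str]:
    """Re-emit each JSON line without the given top-level keys."""
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        event = json.loads(line)
        for key in keys:
            event.pop(key, None)  # no error if the key is absent
        yield json.dumps(event)
```

Wrapped in a small script that prints each item of `drop_keys(sys.stdin)`, it would sit between `head -n 1` and `bin/ralph validate -f xapi` in the reproduction pipeline.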

Validate StatementParameters

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
The StatementParameters class was defined as a dataclass to avoid double validation, as it was intended to be populated with fields already validated by FastAPI.
However, when it is used outside of the LRS (e.g. as a library), no validation is applied, which is undesirable.

Describe the solution you'd like
We would like to replace the StatementParameters dataclass with a StatementParameters pydantic model to ensure field validation.
We could use the pydantic construct() method in our LRS API to avoid double validation.

Describe alternatives you've considered
We could support both (dataclass and pydantic model); however, this would be redundant, as a pydantic model could achieve the objective in both contexts (LRS and library usage).

Discovery, Documentation, Adoption, Migration Strategy
The change should be backward compatible.

Do you want to work on it through a Pull Request?
This change is included in the backends unification pull request (#228).

Thanks to @jmaupetit for spotting this issue and providing the solution.

`make bootstrap` is failing due to missing build dependencies

Bug Report

Hi all, I'm excited to get started developing with Ralph. While getting set up, however, I ran into an issue: on the master branch, make bootstrap fails while pip-installing psutil because gcc is not available.

Expected behavior/code
make bootstrap builds the docker images cleanly.

Actual Behavior
An error occurs:

#0 22.22       creating build/temp.linux-aarch64-cpython-39/psutil
#0 22.22       gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -DPSUTIL_POSIX=1 -DPSUTIL_SIZEOF_PID_T=4 -DPSUTIL_VERSION=594 -DPy_LIMITED_API=0x03060000 -DPSUTIL_LINUX=1 -DPSUTIL_ETHTOOL_MISSING_TYPES=1 -I/usr/local/include/python3.9 -c psutil/_psutil_common.c -o build/temp.linux-aarch64-cpython-39/psutil/_psutil_common.o
#0 22.22       C compiler or Python headers are not installed on this system. Try to run:
#0 22.22       sudo apt-get install gcc python3-dev
#0 22.22       error: command 'gcc' failed: No such file or directory

Steps to Reproduce

  1. Sync the Ralph master branch
  2. run make bootstrap
  3. The above error occurs

Environment

  • Ralph version: master branch
  • Platform: Docker Desktop on Mac OS 13.0.1

Possible Solution
I was able to get a successful build by adding the required development tools, see the PR here: bmtcril#1

Add convert command

Purpose

We need a way to convert learning analytics logs between various formats.

Proposal

Add base support for standards from the learning community:

  • xAPI
  • EdX tracking logs

Example usage:

$ ralph fetch -b ldp e8ecbb69-ec1a-4d20-b597-2cfd75a8f12b | \
  ralph extract -p gelf | \
  ralph convert --from edx --to xapi \
    > e8ecbb69-ec1a-4d20-b597-2cfd75a8f12b.json

Integrate mypy for Python type checking

Purpose

It has been decided to use typing throughout the Ralph project. We need to integrate a static type checker into our linting toolbox.

Proposal

  • Integrate mypy in the project (local tooling for development + CI)

Write tutorial documentation for Ralph open-source usage

Purpose

Getting started with Ralph is easier with a tutorial based on working sample data. Such a tutorial would answer many of the questions that are usually sent to the FUN team.

Proposal

Provide simple test data to exercise all of Ralph's commands, and write a getting-started tutorial that uses this data.

Create a set of archives covering all use cases:

  • JSON formatted
  • random text information
  • GELF formatted
  • Non GELF formatted
  • Edx formatted
  • xAPI formatted

Write a workflow that covers all of Ralph's commands (to test each one of them and understand what it does):

  • push all archives to the Swift backend
  • list all uploaded archives in the backend
  • fetch archives from a backend
  • extract with GELF parser
  • validate xAPI format
  • validate edX format
  • convert from xAPI to edX
  • convert from edX to xAPI

Allow custom logging configuration

Purpose

Ralph's logs require careful attention. We need a way to configure loggers with custom handlers and formatters, such as logging_ldp.

Proposal

This is loosely related to #14. A plugin architecture may be the way to go for this kind of integration.
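Python's `logging.config.dictConfig` already accepts arbitrary handler and formatter classes by dotted path, so a third-party handler such as logging_ldp's could be plugged in without code changes. A minimal sketch, assuming Ralph would read such a dictionary from its settings; the `LOGGING` name and layout are illustrative, and the stdlib `StreamHandler` stands in for a custom handler class:

```python
import logging
import logging.config

# Illustrative configuration: swap "class" for any importable handler path.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "verbose": {"format": "%(asctime)s %(levelname)-8s %(name)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "verbose",
            "level": "DEBUG",
            "stream": "ext://sys.stderr",
        },
    },
    "loggers": {
        # Child loggers such as "ralph.cli" propagate to this one.
        "ralph": {"handlers": ["console"], "level": "INFO", "propagate": False},
    },
}

logging.config.dictConfig(LOGGING)
logging.getLogger("ralph.cli").info("Logging configured")
```

Because this is the standard schema, operators could adapt it without learning a Ralph-specific format.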

Remove "field" as suffix from class names ?

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
Many classes in the code base are named with the suffix "Field" (LaxObjectField, MboxSha1SumActorField(BaseActorField), etc.). I would rather not include this suffix, for two reasons:

  1. it does not describe the object itself
  2. I find it a bit confusing, and it makes the code harder to read

Describe the solution you'd like
I suggest removing these suffixes if possible.

Integrate Rich library to make Ralph even cooler :sunglasses:

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
Ralph's CLI outputs need more love. We are in 2023.

Describe the solution you'd like
We would like shiny outputs that emphasize important information with a clear display.

Describe alternatives you've considered
None for now.

Discovery, Documentation, Adoption, Migration Strategy
The project's documentation: https://github.com/Textualize/rich

Do you want to work on it through a Pull Request?
Oh yeah!

Add support for custom S3 endpoint

Feature Request

Add support for reading tracking logs from any S3-compatible service.

At NAU, we use Ceph S3. The default Tutor installation uses MinIO to store files, so some installations could use that MinIO instance to store tracking logs, or use another service that provides the same interface.

Is your feature request related to a problem or unsupported use case? Please describe.
Add a configuration option that allows changing the default AWS S3 endpoint URL to any endpoint URL.

Describe the solution you'd like
Update the S3 backend code with a new optional setting that overrides the boto3 endpoint_url.
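boto3 already supports this through the `endpoint_url` argument of `boto3.client()`. A minimal sketch, assuming the backend builds its client from a few settings; the helper and setting names are hypothetical, and leaving `endpoint_url` unset keeps boto3's default AWS endpoint resolution:

```python
from typing import Any, Dict, Optional


def s3_client_kwargs(
    access_key_id: str,
    secret_access_key: str,
    region_name: str,
    endpoint_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Build boto3.client("s3", ...) keyword arguments from backend settings."""
    kwargs: Dict[str, Any] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "region_name": region_name,
    }
    if endpoint_url:
        # Only override when configured, e.g. "http://minio:9000" for MinIO.
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def make_s3_client(**settings: Any) -> Any:
    """Create the client; boto3 is imported lazily to keep the helper dependency-free."""
    import boto3

    return boto3.client("s3", **s3_client_kwargs(**settings))
```

Omitting the key entirely when no endpoint is set, rather than passing `None`, leaves the current AWS behavior untouched for existing deployments.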

Describe alternatives you've considered
None.

Discovery, Documentation, Adoption, Migration Strategy
Add a subsection in the docs explaining that any S3-compatible service, such as Ceph S3 or MinIO, can be used.

Do you want to work on it through a Pull Request?

Yes, me directly, or someone from NAU team.

Implement Authority as a group (user | client)

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
A permission/authority mechanism is currently being implemented in this PR, in connection with #288.

In the future, Ralph LRS could be deployed at a large scale to serve multiple organizations (e.g. universities and platforms). For practical reasons, as well as GDPR compliance, it will be necessary to include a mechanism in which:

  • an LRS admin can access all the data
  • an admin for a platform (e.g. Moodle) can access all the data written by this platform
  • a user (e.g. a student) can access all the statements they have produced across all platforms

Describe the solution you'd like

This seems like a job for OAuth (a user authorizes a client to write data on their behalf). The proposed solution would be to write all statements with Authority as a group, containing both the "client" and "user". Authority would resemble:

"authority": {
	"objectType" : "Group",
	"member": [
		{
			"account": {
				"homePage":"http://example.com/xAPI/OAuth/Token",
				"name":"oauth_consumer_x75db"
			}
		},
		{ 
			"mbox":"mailto:[email protected]" 
		}
	]
}

OAuth seems to be a necessity as the spec only allows grouped Authority in this situation.
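A minimal sketch of building such an authority from the client and user identities, plus a membership check of the kind a "statements/read/mine"-style filter could use to let a user find their statements across platforms. Function names and example values are hypothetical:

```python
from typing import Any, Dict


def oauth_group_authority(
    client_home_page: str, client_name: str, user_mbox: str
) -> Dict[str, Any]:
    """Build a Group authority pairing the OAuth client with the user."""
    return {
        "objectType": "Group",
        "member": [
            {"account": {"homePage": client_home_page, "name": client_name}},
            {"mbox": user_mbox},
        ],
    }


def authority_includes(authority: Dict[str, Any], mbox: str) -> bool:
    """Return True if the given mbox is the authority or one of its members."""
    if authority.get("objectType") == "Group":
        members = authority.get("member", [])
    else:
        members = [authority]
    return any(member.get("mbox") == mbox for member in members)
```

With both identities stored in every statement, the LRS admin, platform admin and user views described above become filters on the same field.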

Add Arnold tray

Purpose

Arnold will soon support an experimental way to package deployable applications, similar to Helm charts for k8s. It's a great opportunity to add a tested tray for Ralph.

Proposal

  • add the tray's OpenShift objects (customizable CronJob + Secret)
  • update the CI to test the tray's deployment

Evaluate xAPI compliance coverage

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
Currently, there is no obvious way to know which parts of the xAPI specification are covered by Ralph, and tests that cover mandatory requirements are not flagged. Having this knowledge could help guide future developments or pinpoint the current tool's limits.

Describe the solution you'd like
It would be interesting to have a tool to evaluate which parts of the specifications are covered. Perhaps the LRS Test Suite could be used. Another solution would be to create a dedicated battery of tests.

Plugin management system for Ralph

Purpose

Follows up on @jmaupetit's issue #219
Inspired by openfun/warren/pull/7

Proposal and Questions

Each backend is a plugin
Each plugin has its own pyproject.toml
Core directory with its own pyproject.toml
Find a way to manage core updates alongside plugin updates
Use importlib (importlib_metadata or importlib.metadata depending on the Python version)
Install plugins through pip
New version pattern for plugins
Which entry points?
fs => built in by default?
When registering plugins => check performance
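A minimal sketch of plugin discovery with `importlib.metadata`, handling the API difference between Python versions (3.10+ exposes `select()`, while 3.8 and 3.9 return a plain dict; the `importlib_metadata` backport offers the newer API on older interpreters). The `ralph.backends` group name is hypothetical. Entry points are returned unloaded, so importing a plugin is deferred until it is actually used, which addresses part of the performance concern:

```python
from importlib.metadata import entry_points
from typing import Any, Dict

# Hypothetical entry point group that backend plugins would register under.
PLUGIN_GROUP = "ralph.backends"


def discover_plugins(group: str = PLUGIN_GROUP) -> Dict[str, Any]:
    """Map plugin names to their (not yet loaded) entry points."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9: a dict of group name -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}
```

A plugin package would declare an entry point in that group in its own pyproject.toml, and the core would call `discover_plugins()[name].load()` only when that backend is requested.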

Next Version

Use Cookiecutter to bootstrap plugin projects efficiently?

Still editing...

Make sentry integration optional

Purpose

As Ralph will handle sensitive data, we should carefully monitor job execution. Sentry would be a great help in detecting failures.

Proposal

We should think of a plugin architecture for Ralph for this kind of integration.
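A minimal sketch of an optional integration, short of a full plugin architecture: Sentry is only initialized when a DSN is configured and the `sentry-sdk` package is installed, so it can become an optional dependency. `init_sentry` is a hypothetical name; `sentry_sdk.init(dsn=..., environment=...)` is the SDK's standard entry point:

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def init_sentry(dsn: Optional[str], environment: str = "production") -> bool:
    """Initialize Sentry if configured and installed; return whether it is active."""
    if not dsn:
        return False  # integration disabled: nothing configured
    try:
        import sentry_sdk  # optional dependency
    except ImportError:
        logger.warning("A Sentry DSN is set but sentry-sdk is not installed; skipping")
        return False
    sentry_sdk.init(dsn=dsn, environment=environment)
    return True
```

Warning instead of failing keeps jobs running when the package is missing, while still making the misconfiguration visible.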

Need to disable compression to use OVH LDP's WebSocket

Bug Report

Expected behavior/code
Using the ws backend with OVH LDP's websocket as the ws_uri parameter should output live logs.

Actual Behavior
Fatal error:

Traceback (most recent call last):
  File "/usr/local/bin/ralph", line 33, in <module>
    sys.exit(load_entry_point('ralph-malph', 'console_scripts', 'ralph')())
  File "/usr/local/bin/ralph", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/local/lib/python3.9/importlib/metadata.py", line 77, in load
    module = import_module(match.group('module'))
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/app/src/ralph/__main__.py", line 24, in <module>
    cli.cli()
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click-8.0.3-py3.9.egg/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/app/src/ralph/cli.py", line 314, in fetch
    backend.stream()
  File "/app/src/ralph/backends/stream/ws.py", line 38, in stream
    asyncio.get_event_loop().run_until_complete(_stream())
  File "/usr/local/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/app/src/ralph/backends/stream/ws.py", line 34, in _stream
    async with websockets.connect(self.uri) as websocket:
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 632, in __aenter__
    return await self
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 649, in __await_impl_timeout__
    return await asyncio.wait_for(self.__await_impl__(), self.open_timeout)
  File "/usr/local/lib/python3.9/asyncio/tasks.py", line 481, in wait_for
    return fut.result()
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 660, in __await_impl__
    await protocol.handshake(
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 331, in handshake
    self.extensions = self.process_extensions(
  File "/usr/local/lib/python3.9/site-packages/websockets-10.0-py3.9-linux-x86_64.egg/websockets/legacy/client.py", line 220, in process_extensions
    raise NegotiationError(
websockets.exceptions.NegotiationError: Unsupported extension: name = permessage-deflate, params = []
ERROR: 1

Steps to Reproduce

./bin/ralph fetch -b ws --ws-uri "wss://gra3.logs.ovh.com/tail/?tk=xxxxxxx" -c 1

Environment

  • Ralph version: master
  • Platform: Linux 4.19.128-microsoft-standard #1 SMP Tue Jun 23 12:58:10 UTC 2020 x86_64 GNU/Linux

Possible Solution

# src/ralph/backends/stream/ws.py:27
    def stream(self):
        """Stream websocket content to stdout."""
        # pylint: disable=no-member

        logger.debug("Streaming from websocket uri: %s", self.uri)

        async def _stream():
            # Add compression=None to disable the permessage-deflate extension
            async with websockets.connect(self.uri, compression=None) as websocket:
                while event := await websocket.recv():
                    sys.stdout.buffer.write(bytes(f"{event}" + "\n", encoding="utf-8"))

        asyncio.get_event_loop().run_until_complete(_stream())

We should pass a client_options parameter, as we do in the database backends, to be able to send compression=None to the connect method.

Should compression=None be a default option?
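A minimal sketch of the `client_options` idea, assuming options are collected in a small dataclass and forwarded as keyword arguments to `websockets.connect()`, whose `compression` parameter accepts `None` to disable permessage-deflate. `WSClientOptions` is a hypothetical name; defaulting `compression` to `None` answers the question above one way, not necessarily the one the maintainers would choose:

```python
import sys
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class WSClientOptions:
    """Hypothetical options forwarded verbatim to websockets.connect()."""

    # None disables permessage-deflate, whose negotiation fails with OVH LDP.
    compression: Optional[str] = None

    def as_kwargs(self) -> Dict[str, Any]:
        return asdict(self)


async def stream(uri: str, options: WSClientOptions) -> None:
    """Write every message received from the websocket to stdout."""
    import websockets  # imported lazily: only needed when actually streaming

    async with websockets.connect(uri, **options.as_kwargs()) as websocket:
        # Iteration stops cleanly when the server closes the connection.
        async for event in websocket:
            sys.stdout.write(f"{event}\n")
```

`asyncio.run(stream(uri, WSClientOptions()))` could then replace the `get_event_loop().run_until_complete(...)` call, and new connect options would only need a new dataclass field.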

Wrong "more" url on the second request or when using query parameters

Bug Report

Expected behavior/code
When making a GET request to the "/xAPI/statements" endpoint, a new endpoint for the next "page" of statements is received via the "more" key on the response.

Making a second GET request to this new endpoint should return a new response with another valid endpoint in the "more" key.

Actual Behavior
An invalid endpoint is returned in the "more" key: it seems to be missing the "?" character.

Steps to Reproduce

  1. Make a simple request to the /xAPI/statements endpoint:
curl -s \
    --user ralph:secret \
    -H "Content-Type: application/json" \
    http://localhost:8100/xAPI/statements/ | jq
  2. The response contains a "more" key:
"more": "/xAPI/statements?pit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1682373022000|150"
  3. Make a new request to the previously returned endpoint:
curl -s \
    --user ralph:secret \
    -H "Content-Type: application/json" \
    "http://localhost:8100/xAPI/statements?pit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1682373022000|150" | jq
  4. The "more" key in the response is as follows (notice the missing "?" character in "statementspit_id"):
"more": "/xAPI/statementspit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1682373022000|150&pit_id=s6vrAwEKc3RhdGVtZW50cxZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRABZ2TFhuaFBLbVNmdS0yVXRSQkdWRy1BAAAAAAAAGtelFnZ1bzM0MERaVGxTSXNDZzZvTGRObUEAARZ3UHZkY25GQlNPeTJfWVdiSkxLc3ZRAAA=&search_after=1637921586498|32"
  5. Making a request to this URL without adding the "?" character returns:
{
  "detail": "Not Found"
}

Environment

  • Ralph version: 3.5.1
  • Platform: kubernetes
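A minimal sketch of a safer way to build the "more" URL, assuming the handler has the request path and its current query parameters at hand: existing pagination parameters are replaced rather than appended, and the query string is always joined with "?". `build_more_url` is a hypothetical helper, not Ralph's code; note that `urlencode` percent-encodes characters such as "|" and "=", which the server decodes transparently:

```python
from typing import Mapping
from urllib.parse import urlencode


def build_more_url(
    path: str, query: Mapping[str, str], pit_id: str, search_after: str
) -> str:
    """Return the next-page URL: original filters plus fresh pagination params."""
    # Drop stale pagination parameters so they are never duplicated.
    params = {k: v for k, v in query.items() if k not in ("pit_id", "search_after")}
    params["pit_id"] = pit_id
    params["search_after"] = search_after
    return f"{path}?{urlencode(params)}"


print(build_more_url("/xAPI/statements", {"verb": "x"}, "abc", "1|2"))
# /xAPI/statements?verb=x&pit_id=abc&search_after=1%7C2
```

Rebuilding the query from a dict, instead of concatenating strings, makes the result independent of whether the incoming request already had a query string.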

Add push command

Purpose

Ralph should be able to write to storage backends.

Proposal

Implement a generic push command that takes a storage backend (and its configuration) as arguments and streams the standard input to this backend.
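A minimal sketch of the proposed command's core, assuming a hypothetical backend interface with a single `write()` method that consumes an iterator of byte chunks; `WritableBackend`, `MemoryBackend` and `push` are illustrative names, and a real command would select and configure the backend from CLI arguments:

```python
import io
from typing import BinaryIO, Iterable, Iterator, Protocol


class WritableBackend(Protocol):
    """Hypothetical interface: consume chunks, return the bytes written."""

    def write(self, chunks: Iterable[bytes]) -> int: ...


def read_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Read the stream lazily so large inputs are never fully buffered."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


class MemoryBackend:
    """Toy backend used for demonstration; it keeps everything in memory."""

    def __init__(self) -> None:
        self.data = bytearray()

    def write(self, chunks: Iterable[bytes]) -> int:
        written = 0
        for chunk in chunks:
            self.data.extend(chunk)
            written += len(chunk)
        return written


def push(stream: BinaryIO, backend: WritableBackend) -> int:
    """Stream everything from `stream` into `backend`."""
    return backend.write(read_chunks(stream))


backend = MemoryBackend()
print(push(io.BytesIO(b'{"event": 1}\n' * 3), backend))  # 39
```

In the CLI, `sys.stdin.buffer` would be passed as the stream, so any storage backend implementing `write()` could receive piped input.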

Add support for a `lrs` backend to convert and send logs from any app to an LRS

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
An LRS has a specified API, thus, it can be considered as a standard backend for Ralph to pull/push learning events.

Describe the solution you'd like
Implement the lrs backend that should be compatible with i. the pull and push commands, and ii. the standard Ralph backend API.

Describe alternatives you've considered
Using curl or an equivalent headless tool to send HTTP requests (httpie, etc.), which requires prior knowledge of what an LRS is and of its specification.

Discovery, Documentation, Adoption, Migration Strategy
Pushing learning events from foo apps to an LRS:

tail -f /var/log/syslog | \
  grep foo | \
  ralph extract --parser syslog | \
  ralph convert --from foo --to xapi | \
  ralph push --backend lrs --backend-root-url=https://lrs.example.org

Note that this example is a brain dump that requires parsers and models that do not exist... yet!
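A minimal sketch of the HTTP request such an lrs backend might send, using only the standard library. It assumes the `/xAPI/statements` route used by Ralph's LRS, HTTP Basic Auth, and the `X-Experience-API-Version` header required by the xAPI specification; `build_statements_request` is a hypothetical helper, not part of Ralph.

```python
import base64
import json
from urllib.request import Request


def build_statements_request(root_url, statements, username, password, version="1.0.3"):
    """Prepare (without sending) a POST of a batch of statements to an LRS."""
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return Request(
        url=f"{root_url.rstrip('/')}/xAPI/statements",
        data=json.dumps(statements).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
            # The xAPI specification requires this header on every request.
            "X-Experience-API-Version": version,
        },
        method="POST",
    )
```

Sending the batch would then be a matter of calling `urllib.request.urlopen(request)`; a real backend would also need retries and error handling.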

Do you want to work on it through a Pull Request?
Yes! 💪

Implement UUID edx events

Purpose

Managing edx events requires associating an identifier with each of them. Without one, events reimported into Elasticsearch appear as duplicates because they are considered as different events. An identifier allows reimported events to overwrite their previous versions with the modifications of the reimport.

Proposal

The objective is to implement a UUID calculation that remains constant for each event, i.e., the identifier must allow the event to be found when a new version of it is imported.

The UUID must be native to edx to remain independent of the implementation of the validation models.
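One way to get an identifier that stays constant across reimports is a name-based UUID (version 5) computed from fields that identify an event but do not change between versions of it. The namespace and the choice of identifying fields below are assumptions made for illustration; picking the right fields for each edx event type is the actual design work.

```python
import json
import uuid

# Hypothetical namespace: any fixed UUID works as long as it never changes.
EDX_EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "edx.example.org")

# Hypothetical identifying fields of an edx tracking log event.
IDENTIFYING_FIELDS = ("username", "event_type", "time", "session")


def edx_event_uuid(event: dict) -> uuid.UUID:
    """Return a deterministic UUID for an edx event.

    Only the identifying fields are hashed, so a reimported event whose
    other fields were modified keeps the same identifier.
    """
    key = json.dumps(
        {field: event.get(field) for field in IDENTIFYING_FIELDS},
        sort_keys=True,
        separators=(",", ":"),
    )
    return uuid.uuid5(EDX_EVENT_NAMESPACE, key)
```

Because `uuid5` is a pure function of the namespace and the serialized key, the same event always maps to the same document id in Elasticsearch, which turns a reimport into an overwrite.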

Log debug information when starting Ralph with uvicorn

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
There have been several occasions where the first setup of Ralph was painful because environment variables were not properly set.
At the moment, when starting Ralph LRS through uvicorn, it is hard to know whether it is using the default settings or has correctly taken the specified settings into account.

Describe the solution you'd like
Add the possibility to log debug information when Ralph starts through uvicorn, similarly to what is done when starting Ralph LRS with the ralph runserver command.
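A minimal sketch of what such startup logging could look like, assuming the settings are available as a plain name/value mapping and that secret-looking values should be masked. `SECRET_MARKERS`, the `ralph.api` logger name and both functions are hypothetical; the call could be placed in an application startup hook.

```python
import logging

# Hypothetical markers used to decide which settings must not be printed.
SECRET_MARKERS = ("PASSWORD", "SECRET", "TOKEN", "KEY")


def describe_settings(settings):
    """Return one "NAME=value" line per setting, masking secret-looking ones."""
    lines = []
    for name in sorted(settings):
        value = settings[name]
        if any(marker in name.upper() for marker in SECRET_MARKERS):
            value = "********"
        lines.append(f"{name}={value!r}")
    return lines


def log_settings(settings, logger=None):
    """Log the effective settings at DEBUG level, one line per setting."""
    logger = logger or logging.getLogger("ralph.api")
    for line in describe_settings(settings):
        logger.debug("%s", line)
```

Printing the effective values (rather than only the environment) shows at a glance whether a default was silently used.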

Add edx tracking logs (de)serializers

Purpose

Ralph should be able to read edx tracking logs to convert them to various standards.

Proposal

Define Marshmallow Schemas for known/documented edx tracking log event types.

Add ElasticSearch backend

Purpose

Ralph should be able to read/write from ES indexes.

Proposal

Implement a new storage backend with read and write features to ES clusters.

Make Ralph more extensible by implementing plugins support

Purpose

Since the 3.0 release, Ralph has many optional dependencies depending on its usage: library (with backends support), CLI or LRS. Making dependency management straightforward for both project developers and end-users is not easy.

Proposal

To simplify the various flavors of the project, as suggested by @sampaccoud in #218, we propose to implement a plugin management system that will simplify project maintenance and extensibility!

References:

  1. https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/
  2. https://setuptools.pypa.io/en/latest/userguide/entry_point.html
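Following the packaging guides referenced above, plugins could be discovered through entry points with `importlib.metadata`. The sketch below handles both the Python 3.10+ `select()` API and the older dict-based return value; the `ralph.backends` group name used in the usage note is hypothetical.

```python
from importlib.metadata import entry_points


def discover_plugins(group):
    """Return a {name: EntryPoint} mapping for every plugin registered in ``group``."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python >= 3.10: the result exposes select().
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, ())
    return {ep.name: ep for ep in selected}
```

A plugin package could then declare, for example, `s3 = "ralph_s3.backend:S3Backend"` under `[project.entry-points."ralph.backends"]` in its `pyproject.toml`, and Ralph would load it lazily with `discover_plugins("ralph.backends")["s3"].load()`. The package and class names here are placeholders.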

LRS should return `200 OK` to the GET statements when no statement present

Bug Report

Expected behavior/code
As specified in the xAPI specification, when the LRS contains no statements, it should still return 200 OK to a GET Statements request (with an empty array of statements).

Actual Behavior
LRS returns a 500 Internal Server Error.

Steps to Reproduce

bin/ralph runserver -b es
http -a janedoe:supersecret :8100/xAPI/statements

Environment

  • Ralph version: 3.5.0
  • Platform: -
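One plausible cause of such a 500 error (an assumption, not a diagnosis) is code that reads the last hit to build the pagination link without checking that there is one. The sketch below turns Elasticsearch-style hits (dicts with `_source` and `sort`) into an xAPI StatementResult and only builds a "more" link when the page is full; the function name and `limit` parameter are hypothetical.

```python
from urllib.parse import urlencode


def build_statement_result(hits, limit):
    """Turn Elasticsearch-style hits into a (status, StatementResult) pair."""
    statements = [hit["_source"] for hit in hits]
    # An empty result is still a valid StatementResult with an empty array.
    result = {"statements": statements}
    # Only compute a "more" link when the page is full: reading hits[-1]
    # on an empty result set would raise an IndexError.
    if hits and len(hits) == limit:
        search_after = "|".join(str(value) for value in hits[-1]["sort"])
        result["more"] = "/xAPI/statements?" + urlencode({"search_after": search_after})
    return 200, result
```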

Integrate pydocstyle linter

Purpose

For now, we do not lint our docstrings, and Black is not opinionated about this. I think we should enforce our numpy docstring style.

Proposal

Integrate pydocstyle using the numpy convention.
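For reference, here is a small module written in the numpy docstring style that a `pydocstyle --convention=numpy` run is expected to accept: a module docstring, an imperative one-line summary ending with a period, and underlined `Parameters`/`Returns` sections. The function itself is a made-up example.

```python
"""Example module documented with the numpy docstring convention."""


def count_statements(statements, verb_id=None):
    """Count xAPI statements, optionally filtered by verb.

    Parameters
    ----------
    statements : list of dict
        The xAPI statements to count.
    verb_id : str, optional
        When given, only statements whose ``verb["id"]`` equals this value
        are counted.

    Returns
    -------
    int
        The number of matching statements.
    """
    if verb_id is None:
        return len(statements)
    return sum(1 for s in statements if s.get("verb", {}).get("id") == verb_id)
```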

Improve Sentry integration for transactions

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
When tracking Ralph LRS performance using Sentry transactions, reports are polluted with health check transactions.

Describe the solution you'd like
We might consider ignoring health check routes in transaction reports (should be configurable via a feature flag in Ralph's configuration).

Discovery, Documentation, Adoption, Migration Strategy
Transaction filtering for particular routes is described in Sentry's documentation.

Do you want to work on it through a Pull Request?
Of course!
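`sentry_sdk.init` accepts a `traces_sampler` callable that returns a sample rate for each transaction, which is one way to drop health check transactions. The sketch assumes the health check routes are `/__heartbeat__` and `/__lbheartbeat__` and that the ASGI integration exposes the request scope under the `asgi_scope` key of the sampling context; the `ignore_health_checks` flag stands in for the proposed (hypothetical) Ralph setting.

```python
from typing import Any, Callable, Dict

# Assumed health check routes; adjust to the routes Ralph actually exposes.
HEALTH_CHECK_PATHS = frozenset({"/__heartbeat__", "/__lbheartbeat__"})


def make_traces_sampler(
    default_rate: float, ignore_health_checks: bool = True
) -> Callable[[Dict[str, Any]], float]:
    """Build a ``traces_sampler`` callable for ``sentry_sdk.init``."""

    def traces_sampler(sampling_context: Dict[str, Any]) -> float:
        # The ASGI integration is assumed to expose the request scope here.
        scope = sampling_context.get("asgi_scope") or {}
        if ignore_health_checks and scope.get("path") in HEALTH_CHECK_PATHS:
            return 0.0  # never sample health check transactions
        return default_rate

    return traces_sampler
```

Usage would look like `sentry_sdk.init(dsn=..., traces_sampler=make_traces_sampler(0.1))`, with the flag read from Ralph's configuration.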

Make backends less CLI-driven

Purpose

Backends have been developed from a CLI perspective, with data I/O inherited from UNIX standard streams. Now that Ralph embeds an LRS server, we need to make them more generic and usable as a library.

Proposal

  • use file-like objects instead of hard-coded streams for I/O
  • add a query method for database backends
  • ease database client instantiation / configuration
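A minimal sketch of the first point: a backend interface whose I/O goes through an injected file-like object instead of `sys.stdin`/`sys.stdout`, so the same class can serve the CLI (pass `sys.stdin.buffer`) or the LRS (pass any buffer). `Backend` and `FileObjectBackend` are hypothetical names, not Ralph's classes.

```python
import io
from abc import ABC, abstractmethod
from typing import IO, Iterable, Iterator


class Backend(ABC):
    """Minimal storage backend interface decoupled from standard streams."""

    @abstractmethod
    def read(self) -> Iterator[bytes]:
        """Yield stored records."""

    @abstractmethod
    def write(self, records: Iterable[bytes]) -> int:
        """Store records and return how many were written."""


class FileObjectBackend(Backend):
    """Hypothetical backend storing one record per line in a binary file-like object."""

    def __init__(self, fileobj: IO[bytes]):
        self.fileobj = fileobj

    def read(self) -> Iterator[bytes]:
        self.fileobj.seek(0)
        for line in self.fileobj:
            yield line.rstrip(b"\n")

    def write(self, records: Iterable[bytes]) -> int:
        count = 0
        for record in records:
            self.fileobj.write(record + b"\n")
            count += 1
        return count


def in_memory_backend() -> FileObjectBackend:
    """Build a backend over an in-memory buffer, handy for tests or library use."""
    return FileObjectBackend(io.BytesIO())
```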

Namespace identifiers in command history

Purpose

The history file identifiers are not tied to a container or bucket, so a file with the same name in a different bucket can be mistaken for a previously processed record.

Proposal

Namespacing the file identifier by the bucket (container) name or ID in the history file seems a relevant approach.
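A minimal sketch of the proposed namespacing, assuming history entries are dicts with an `id` key (a hypothetical format) and that container names never contain "/", which holds for S3 bucket names.

```python
def history_id(container, filename):
    """Namespace a file name with the container (bucket) it was read from."""
    # Assumes container names never contain "/" (S3 bucket names cannot),
    # so the separator keeps identifiers unambiguous.
    return f"{container}/{filename}"


def is_processed(history, container, filename):
    """Tell whether this file from this container is already in the history."""
    target = history_id(container, filename)
    return any(entry.get("id") == target for entry in history)
```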

Implement alternate request syntax in API

Feature Request

https://github.com/adlnet/xAPI-Spec/blob/1.0.3/xAPI-Communication.md#13-alternate-request-syntax

The xAPI spec describes an alternate request syntax, where requests are sent as POST, with the intended HTTP method passed in a "method" query parameter and the request body passed in a "content" form field. This can be used to circumvent query string length limitations for GET, as well as cases where PUT is unavailable. Perhaps this should be implemented?

Describe the solution you'd like

Check for a "method" query parameter on all POST requests to decide whether the alternate syntax is being used.
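A minimal sketch of how such a request could be unwrapped with `urllib.parse`, assuming the method comes from the query string and headers, xAPI parameters and `content` come from an `application/x-www-form-urlencoded` body. The set of header names is taken from section 1.3 of the xAPI 1.0.3 communication spec as best understood and should be checked against it; the function name is hypothetical.

```python
from urllib.parse import parse_qs

# Headers carried as form fields by the alternate syntax (check against
# section 1.3 of the xAPI 1.0.3 communication spec).
HEADER_FIELDS = frozenset(
    {
        "Authorization",
        "X-Experience-API-Version",
        "Content-Type",
        "Content-Length",
        "If-Match",
        "If-None-Match",
    }
)


def unwrap_alternate_request(query_string, form_body):
    """Decode an alternate-syntax request, or return None for a regular POST.

    Returns a (method, headers, params, content) tuple.
    """
    query = parse_qs(query_string)
    if "method" not in query:
        return None
    method = query["method"][0].upper()
    form = {
        key: values[0]
        for key, values in parse_qs(form_body, keep_blank_values=True).items()
    }
    content = form.pop("content", "")
    headers = {k: v for k, v in form.items() if k in HEADER_FIELDS}
    params = {k: v for k, v in form.items() if k not in HEADER_FIELDS}
    return method, headers, params, content
```

The unwrapped tuple could then be dispatched to the existing GET/PUT handlers, so the alternate syntax does not need its own business logic.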

Do not throw 409 errors on duplicate statements in POST

Feature Request

As it stands, if a batch of statements is POSTed to Ralph and one of them has an id that is already stored, the entire batch is rejected. This complicates use cases such as backfilling lost or historical data, or retrying a batch that was already partially processed. The specification is a little unclear on how to handle this case, but I believe we have the flexibility to simply remove duplicate ids from the batch before processing as long as they are not saved: https://github.com/adlnet/xAPI-Spec/blob/1.0.3/xAPI-Communication.md#212-post-statements

Describe the solution you'd like
Instead of throwing a 409, simply remove the duplicate statements from the batch and continue processing. Perhaps an ideal solution would be to include any found duplicates in the response detail.

Describe alternatives you've considered

  1. Changing from batch processing to sending each statement individually so no statements are lost. This is not feasible for large historical loads, however, which may have tens or hundreds of millions of statements.
  2. Having Ralph reject the batch with a list of duplicate IDs so the client can remove them and re-POST. This seems unnecessarily complicated and relies on implementation specific knowledge of the LRS which some clients may not have.

Discovery, Documentation, Adoption, Migration Strategy
I believe this is a valid interpretation of the xAPI specification and shouldn't require any changes from consumers of the API unless they already have a custom implementation to handle the current issues.

Do you want to work on it through a Pull Request?
It should be a fairly small change, I'm happy to write a PR for it if this change is desired.
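A minimal sketch of the proposed behavior, assuming the set of already stored ids has been fetched from the backend beforehand. It also drops ids repeated within the batch itself and returns the duplicates so they can be reported in the response detail. It does not compare the content of duplicates, which a compliant implementation may also need to do. The function name is hypothetical.

```python
def drop_duplicate_statements(statements, existing_ids):
    """Split a batch into statements to store and ids that are already known.

    Ids already stored, or repeated within the batch, are reported as
    duplicates. Statements without an "id" are always kept (the LRS
    assigns one).
    """
    kept, duplicates = [], []
    seen = set(existing_ids)
    for statement in statements:
        statement_id = statement.get("id")
        if statement_id is not None and statement_id in seen:
            duplicates.append(statement_id)
            continue
        if statement_id is not None:
            seen.add(statement_id)
        kept.append(statement)
    return kept, duplicates
```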

Add support for the AWS Glacier backend

Purpose

A few cloud providers offer an Amazon Glacier-compatible cold storage service. It seems necessary to add support for this backend in Ralph.

Proposal

Implement a new glacier backend using the boto library. In a first implementation, we can consider this backend as write-only.

Use pydantic for settings management

Purpose

The settings management in our project is mostly home-made and, in some cases, messy. We should find a cleaner solution.

Proposal

As we are using pydantic in our projects, why not use it for settings management? It is one of its main applications, and it would make sense for us to extend its use to this scope.

Implement a LRS server

Purpose

We need to add a headless HTTP API server to Ralph that implements the LRS specification. This server should be fully LRS-compliant to ensure interoperability over a collection of trusted sovereign LRSs.

Proposal

✅ The LRS server should be implemented using the FastAPI framework for its performance and its integration with Pydantic, a library extensively used in this project.

✅ The LRS server should be started using a new serve command. This command will be a wrapper for the gunicorn ad hoc command (see: https://www.uvicorn.org/deployment/#gunicorn).

✅ One should ensure this implementation's compliance using the LRS conformance requirements reference.

💡 A good start might be to write a reference LRS spec using OpenAPI 3.x format and use this document in a TDD perspective using Dredd (or any other open source equivalent).

Alternatives

1️⃣ Once deployed and publicly accessible, the implemented LRS compliance can be tested using the official LRS test server: https://lrstest.adlnet.gov/

2️⃣ Should we handle this feature in a dedicated project using Ralph as a library (dependency)?

Poor performance of LRS due to HTTP Basic Auth hashing

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.

We are currently having poor performance when making many concurrent requests to Ralph LRS.
As seen here, when load testing Ralph LRS with 1000 concurrent users (each sending one request containing one xAPI statement) the total average response time skyrockets.

After adding a timing middleware, we observed that a dummy request to Ralph LRS takes ~200ms, most of which (~180ms) is spent hashing the HTTP Basic Auth password to check user credentials.

When building Ralph LRS, we chose bcrypt for hashing and salting passwords. bcrypt is a common choice for storing passwords checked via HTTP Basic Auth. It is slow by design to prevent brute force attacks, but this induces a large overhead on each request.

Describe the solution you'd like

An OpenID Connect authentication method is currently under development (#262), and it should greatly speed up each request, as it does not require hashing a password to check credentials.

Describe alternatives you've considered

Another solution, still being discussed on our side, would be to propose different HTTP Basic Auth backends with different hashing methods, so that developers can choose their own performance cost/security level trade-off. It would also allow us to compare Ralph LRS to other open source LRSs fairly.
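To illustrate the tunable cost/security trade-off, here is a sketch of an alternative hashing backend built on the standard library's PBKDF2 with a configurable iteration count. PBKDF2 is swapped in purely for illustration: it is not what Ralph uses, the encoded format is made up, and lowering the iteration count speeds up verification at the expense of brute force resistance.

```python
import hashlib
import hmac
import os
from typing import Optional


def hash_password(password: str, iterations: int, salt: Optional[bytes] = None) -> str:
    """Hash a password with PBKDF2-HMAC-SHA256 and a tunable iteration count."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Hypothetical storage format: algorithm$iterations$salt$digest.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, encoded: str) -> bool:
    """Check a password against a value produced by ``hash_password``."""
    algorithm, iterations, salt_hex, digest_hex = encoded.split("$")
    if algorithm != "pbkdf2_sha256":
        raise ValueError(f"Unsupported algorithm: {algorithm}")
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    # Constant-time comparison to avoid leaking timing information.
    return hmac.compare_digest(candidate.hex(), digest_hex)
```

Because the iteration count is stored alongside each hash, a deployment could raise or lower it over time without invalidating existing credentials.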

Follow specification for GET /statements query behavior?

Feature Request

Is your feature request related to a problem or unsupported use case? Please describe.
The specification for GET statements states that the agent parameter should behave as follows:

  • Filter, only return Statements for which the specified Agent or Group is the Actor or Object of the Statement.

  • For the purposes of this filter, Groups that have members which match the specified Agent based on their Inverse Functional Identifier as described above are considered a match

Currently, the agent filter parameter only acts as a filter on the actor field of statements. Furthermore, the current implementation does not work with groups.

NB: the implementation of the agent parameter is marked as optional in the spec.

Describe the solution you'd like
Perhaps we should implement this part of the spec? It requires a little refactoring, as all filters currently combine with AND, whereas this would require an OR mechanism.
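A minimal in-memory sketch of the matching rule quoted above: a statement matches when the Agent is its actor or object, or a member of a Group in either position, comparing Inverse Functional Identifiers (IFIs). It does not handle SubStatements or the `related_agents` parameter, and the function names are hypothetical; a database backend would express the same OR logic in its query language.

```python
# Inverse Functional Identifier keys defined by the xAPI specification.
IFI_KEYS = ("mbox", "mbox_sha1sum", "openid", "account")


def inverse_functional_identifier(agent):
    """Return a comparable IFI tuple for an Agent/Group, or None if it has none."""
    for key in IFI_KEYS:
        if key in agent:
            value = agent[key]
            if key == "account":
                return (key, value.get("homePage"), value.get("name"))
            return (key, value)
    return None


def _matches(candidate, target):
    """Match an actor/object against the target IFI, including group members."""
    if not isinstance(candidate, dict):
        return False
    if inverse_functional_identifier(candidate) == target:
        return True
    if candidate.get("objectType") == "Group":
        return any(
            inverse_functional_identifier(member) == target
            for member in candidate.get("member", [])
        )
    return False


def statement_matches_agent(statement, agent):
    """Apply the agent filter to the actor OR the object of a statement."""
    target = inverse_functional_identifier(agent)
    if target is None:
        raise ValueError("agent has no Inverse Functional Identifier")
    return _matches(statement.get("actor"), target) or _matches(
        statement.get("object"), target
    )
```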

Add OVH swift storage backend

Purpose

OVH Swift object storage is widely used. It's crucial for us to support it.

Proposal

  • add a new swift storage backend

Add support for the AWS S3 backend

Purpose

Most cloud providers offer an Amazon S3-compatible object storage service. It seems necessary to add support for this backend in Ralph.

Proposal

Implement a new s3 backend using the boto library.
