skyzh / canvas_grab

209.0 2.0 28.0 384 KB

🌐 One-click script to synchronize files from Canvas LMS.

Home Page: https://git.sjtu.edu.cn/iskyzh/canvas_grab

License: MIT License

Python 93.28% PowerShell 1.40% Shell 1.06% QML 4.26%

canvas python python-requests canvas-lms download

canvas_grab's Introduction

canvas-grab

Looking for Maintainers

As I no longer have access to Canvas systems, this project cannot be actively maintained by me. If you are interested in maintaining this project, please email me.

Grab all files on Canvas LMS to local directory.

Less is More. In canvas_grab v2, we focus on stability and ease of use. Now you don't have to tweak dozens of configurations. We have a very simple setup wizard to help you get started!

For the legacy version, refer to the legacy branch.

Getting Started

Install Python
Download canvas_grab source code. There are typically three ways of doing this.
- Go to Release Page and download {version}.zip.
- Or git clone https://github.com/skyzh/canvas_grab.
- Use SJTU GitLab, see Release Page, or visit https://git.sjtu.edu.cn/iskyzh/canvas_grab
Run ./canvas_grab.sh (Linux, macOS) or .\canvas_grab.ps1 (Windows) in Terminal. Please refer to Build and Run from Source for more information.
Get your API key at Canvas profile and you're ready to go!
Please don't modify any file inside the download folder (e.g., by taking notes or adding supplementary items). Files there are overwritten on each run.

You may interrupt the downloading process at any time. The program will automatically resume from where it stopped.

To upgrade, just replace canvas_grab with a more recent version.

If you have any questions, feel free to file an issue here.

Build and Run from Source

First of all, please install Python 3.8+, and download source code.

We have prepared a simple script to automatically install dependencies and run canvas_grab.

For macOS or Linux users, open a Terminal and run:

./canvas_grab.sh

For Windows users:

Right-click the Windows icon on the taskbar, and select "Run Powershell (Administrator)".
Run Set-ExecutionPolicy Unrestricted in PowerShell.
If some courses in Canvas LMS have very long module names that exceed the Windows path length limit (which causes a "No such file" error when downloading), run the following command to enable long path support.
```
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name LongPathsEnabled -Type DWord -Value 1 
```
Open the canvas_grab source folder in the file browser, Shift + Right-click on a blank area, and select Run Powershell here.
Now you can start canvas_grab with a simple command:
```
.\canvas_grab.ps1
```

Configure

The setup wizard will automatically create a configuration for you. You can change config.toml to fit your needs. If you need to re-configure, run ./configure.sh or ./configure.ps1.

Common Issues

Acquire API token: An access token can be obtained at "Account - Settings - New Access Token".
SJTU users: Please generate an access token on this page with the "创建新访问许可证" (Create New Access Token) button.
An error occurred: You'll see "An error occurred when processing this course" if there is no file in a course.
File not available: This file might have been included in an unpublished unit. canvas_grab cannot bypass restrictions.
No module named 'canvasapi': You haven't installed the dependencies. Follow the steps in "Build and Run from Source" or download prebuilt binaries.
Error when checking update: This is normal if you don't have a stable connection to GitHub. You can check for updates periodically by visiting this repo.
Reserved escape sequence used: Please use "/" as the path separator instead of "\".
Duplicated files detected: There are two files with the same name in the same folder. You should download them from Canvas yourself.

Screenshot

Contributors

See the Contributors list. @skyzh and @danyang685 are the two core maintainers.

License

MIT

This means that we do not shoulder any responsibility for, including but not limited to:

API key leaks
Users uploading copyrighted material from the website to the Internet

canvas_grab's Issues

main.py throws exception: "ValueError: time data '2020-03-19T06:33:33Z' does not match format '%Y-%m-%dT%H:%M:%S%z'"

I use the latest release and run from the source code in WSL. The main.py throws

Traceback (most recent call last):
  File "./main.py", line 381, in <module>
    main()
  File "./main.py", line 86, in main
    process_course(course)
  File "./main.py", line 341, in process_course
    file.updated_at, '%Y-%m-%dT%H:%M:%S%z').timestamp()
  File "/usr/lib/python3.6/_strptime.py", line 565, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "/usr/lib/python3.6/_strptime.py", line 362, in _strptime
    (data_string, format))

ValueError: time data '2020-03-19T06:33:33Z' does not match format '%Y-%m-%dT%H:%M:%S%z'
This happens after each file is downloaded. I changed the time format from '%Y-%m-%dT%H:%M:%S%z' to '%Y-%m-%dT%H:%M:%Sz', and everything works. I don't know much about time formats in Python, so I am not sure whether this also happens on other platforms.
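A minimal sketch of one way to parse this timestamp independently of the interpreter version. The traceback shows Python 3.6; to my knowledge, `%z` accepts a literal `Z` only from Python 3.7 onwards. The helper name `parse_canvas_time` is hypothetical and not part of canvas_grab. Note that matching `Z` as a literal yields a naive datetime, which `.timestamp()` would interpret in local time, so the sketch attaches UTC explicitly.

```python
from datetime import datetime, timezone


def parse_canvas_time(value: str) -> float:
    """Convert a UTC string such as '2020-03-19T06:33:33Z' to a POSIX timestamp.

    Hypothetical helper for illustration; assumes the value always ends in 'Z'.
    """
    # Match the trailing 'Z' as a literal character instead of relying on %z.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    # The parsed value is naive; mark it as UTC so the result does not
    # depend on the local time zone of the machine.
    return parsed.replace(tzinfo=timezone.utc).timestamp()
```

With this approach the same code path should work on 3.6 and on later versions.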

A TypeError Occurred

I am using canvas_grab v1.8.1. The crawler worked well last semester, but a TypeError suddenly occurred today when I first used it this semester. I wonder why... I have not changed anything.

File Exists Error when synchronizing

 Traceback (most recent call last):
  File "D:\Google Drive\Year 2 Sem 2\Z_CANVAS_GRAB\main.py", line 6, in <module>
    canvas_grab.__main__.main()
  File "D:\Google Drive\Year 2 Sem 2\Z_CANVAS_GRAB\canvas_grab\__main__.py", line 68, in main
    transfer.transfer(
  File "D:\Google Drive\Year 2 Sem 2\Z_CANVAS_GRAB\canvas_grab\transfer.py", line 42, in transfer
    for _ in self.yield_transfer(base_path, archive_base_path, plans):
  File "D:\Google Drive\Year 2 Sem 2\Z_CANVAS_GRAB\canvas_grab\transfer.py", line 59, in yield_transfer
    file_obj.rename(archive_path)
  File "C:\Users\jbxia\miniconda3\lib\pathlib.py", line 1377, in rename
    self._accessor.rename(self, target)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'files\\BIS 681 01 (SP22)_ Statistical Practice II\\3. Panel discussion\\Bios and contact info statistical practice panel 2022.docx' -> 'files/_canvas_grab_archive/files/BIS 681 01 (SP22)_ Statistical Practice II/3. Panel discussion/Bios and contact info statistical practice panel 2022.docx'

Since the file names are the same, why does this error occur?
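One possible explanation, offered as an assumption rather than a confirmed diagnosis: on Windows, `Path.rename` raises `FileExistsError` when the destination already exists, and here an older copy may already sit in the archive folder. The sketch below uses `Path.replace`, which overwrites the destination. The function name `archive_file` is hypothetical; only `file_obj` and `archive_path` come from the traceback.

```python
from pathlib import Path


def archive_file(file_obj: Path, archive_path: Path) -> None:
    """Move file_obj to archive_path, overwriting an older archived copy.

    Hypothetical helper; overwriting is an assumed policy, not necessarily
    what canvas_grab intends.
    """
    # Make sure the archive directory exists before moving.
    archive_path.parent.mkdir(parents=True, exist_ok=True)
    # Path.replace overwrites an existing destination file on every platform,
    # whereas Path.rename fails on Windows when the target exists.
    file_obj.replace(archive_path)
```

If older archived copies must be kept, an alternative would be to add a numeric suffix to the destination name instead of overwriting.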

Prepare for v2.0

In v2.0, I plan to implement:

use the GraphQL API to batch-fetch information
fine-grained filters
a better terminal-based GUI
maybe use Go instead of Python for better binary packaging and coroutine support

Is there a one-click way to start sync on Windows?

Is there a one-click way to start a sync on Windows, for example by double-clicking a file?

Escape characters in course names

A course name may possibly contain any of the characters

\ / : ? " < > |

Creating a directory for a course named name1 | name2 on Windows results in

Traceback (most recent call last):
  File "main.py", line 6, in <module>
    canvas_grab.__main__.main()
  File "C:\Users\username\canvas_grab\canvas_grab\__main__.py", line 45, in main
    on_disk_snapshot = canvas_grab.snapshot.OnDiskSnapshot(
  File "C:\Users\username\canvas_grab\canvas_grab\snapshot\on_disk_snapshot.py", line 29, in take_snapshot
    for item in base.rglob('*'):
  File "C:\Python38\lib\pathlib.py", line 1130, in rglob
    for p in selector.select_from(self):
  File "C:\Python38\lib\pathlib.py", line 486, in select_from
    if not is_dir(parent_path):
  File "C:\Python38\lib\pathlib.py", line 1385, in is_dir
    return S_ISDIR(self.stat().st_mode)
  File "C:\Python38\lib\pathlib.py", line 1176, in stat
    return self._accessor.stat(self)
OSError: [WinError 123] The filename, directory name, or volume label syntax is incorrect: 'files\\name1 | name2'

Reduce overhead of moved files

I'm using canvas_grab v1.7.7.
It cannot track moved files for now. They should be moved to the new location instead of being downloaded again.

Thanks!

Course name with directory separators

If a course has directory separators such as /, then they will produce nested directories, which is not the desired behavior. For example, if a course has the name CS/Math 101, then it will produce the following directory structure

- CS
  - Math 101
    -  ...

I believe we should replace these special characters (for example, replace all directory separators with -).
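A minimal sketch of the replacement suggested here, which would also cover the Windows-invalid characters from the escape-characters issue. The function name `sanitize_name` and the default `-` replacement are assumptions; `*` is added to the character set because Windows rejects it as well.

```python
import re

# Characters Windows rejects in file names, including both directory separators.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def sanitize_name(name: str, replacement: str = "-") -> str:
    """Return a course name that is safe to use as a single directory name.

    Hypothetical helper for illustration only.
    """
    cleaned = _INVALID_CHARS.sub(replacement, name)
    # Windows also rejects names ending in a space or a dot.
    cleaned = cleaned.rstrip(" .")
    # Never return an empty directory name.
    return cleaned or "_"
```

For instance, `sanitize_name("CS/Math 101")` gives a single directory `CS-Math 101` rather than a nested `CS` / `Math 101` pair. Two different course names can collapse to the same sanitized name, so a real implementation would need to handle collisions.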

Force users to review LICENSE before running this program

The administrator said that:

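Courses are divided into undergraduate courses, graduate courses, and self-built courses, with no fully unified naming rule. A friendly reminder to the three students participating in the project: we do not object to using the APIs provided by the platform to develop and apply auxiliary plugins. But if this leads to a leak of teachers' copyrighted resources (teaching videos, library course materials), you will have to bear the related responsibility.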

But when it comes to copyright issues, it is not the maintainers of this project that should be blamed.

Users should be responsible for their own actions. The LICENSE clearly stated that:

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Therefore, to make sure that every user knows we do not provide any warranty of any kind with this software, we should have users review the LICENSE before running this program.

It's ridiculous, and impossible, for arbitrary users maintaining this project, to be responsible for others who upload SJTU course materials online. I should state that contributors are not maintainers, and maintainers are not necessarily SJTU students. A total of 3 GitHub users contributed to this project in the past. And currently, this project has two active maintainers.

Also, it should be made clear that this project is not solely intended for SJTU open canvas. Personally, I made this piece of script for automating the downloading file process for ANY Canvas LMS website. Only two functionalities are SJTU-specific, and require users to manually enable them:

auto renaming folder with SJTU course naming scheme
resolving (but not downloading) SJTU course video URL

canvas_grab never shares your information and never bypasses the limitations of Canvas. This program DOES NOT intentionally share what you've downloaded online. Downloaded files are just saved in your local folders. Furthermore, canvas_grab can only access resources that the user has access to. This program cannot bypass the limitations of Canvas LMS. Therefore, even if copyrighted material is leaked, it is the users of this software, not the maintainers, who should take responsibility for their actions.

For example, suppose you use Chrome to download some copyrighted material from the MIT intranet and upload it to YouTube, which violates the copyright policy. You cannot blame the Chrome developers for your action: Chrome is just the tool a user uses to access resources, and it can download a resource only when the user provides valid credentials. The same applies to canvas_grab. We just automate the process of downloading with a web browser.

We acknowledge the concerns expressed by the SJTU open canvas administrators. Therefore, in #30 we made it mandatory for users to review the LICENSE before running this program.

Again, thanks to everyone who takes the initiative to test and use this software. We hope it frees you from manually downloading every single file one by one.

Interrupt during downloading will leave a broken file

A potential wrong `Fore.Red`

https://github.com/skyzh/canvas_grab/blob/master/main.py#L175

It should be Fore.RED, right?

Courses with NO `FILES` but `MODULES`

It seems that some teachers don't give the permission to students to view files but publish everything in modules. See below.

Any ideas?

Is whitelist mode of course management possible?

Can we manage courses by selecting the ones we want instead of ignoring the ones we don't? Every semester we have only a few important courses and lots of unimportant courses whose files aren't necessary (e.g. P.E., military theory...). By selecting courses, we could also easily exclude courses from previous semesters. I think it would be a great convenience.
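A minimal sketch of how a whitelist could sit alongside the existing ignore list. The function name `select_courses` and both parameter names are hypothetical and do not reflect canvas_grab's actual configuration keys.

```python
from typing import Iterable, List, Optional


def select_courses(
    course_names: Iterable[str],
    whitelist: Optional[Iterable[str]] = None,
    blacklist: Optional[Iterable[str]] = None,
) -> List[str]:
    """Keep courses that are on the whitelist (if given) and not on the blacklist."""
    allowed = set(whitelist) if whitelist is not None else None
    ignored = set(blacklist or ())
    selected = []
    for name in course_names:
        # With a whitelist, anything not explicitly selected is skipped.
        if allowed is not None and name not in allowed:
            continue
        # The ignore list still applies in both modes.
        if name in ignored:
            continue
        selected.append(name)
    return selected
```

Passing `whitelist=None` keeps the current ignore-only behaviour, so the two modes could coexist.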

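A problem when downloading images

It seems that images inside announcements cannot be downloaded, and once a single download fails, none of the following files are downloaded either (it might be better if failed files could be skipped).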

Release Archive is too big

Packaging inside a virtual environment helps shrink the release archive; you may want to give it a try.
It contains only 623 files now, 599 of which are timezone files.

cd .\canvas_grab\
git checkout master
pip install virtualenv
virtualenv venv
.\venv\Scripts\activate.ps1
pip install -r requirements.windows.txt
pip install pyinstaller
pyinstaller main.py --hidden-import pkg_resources.py2_warn --add-data 'config.example.toml;.' --add-data 'LICENSE;.' --add-data 'README.md;.' -n canvas_grab

Download function for links in module has not been implemented yet

canvas_grab/main.py

Lines 350 to 354 in 5a3df4d

elif item.type in ["Page", "Discussion", "Assignment"]:
    page_url = item.html_url
elif item.type == "ExternalUrl":
    page_url = item.external_url
elif item.type == "SubHeader":

I think the links can be saved as an HTML file with a refresh meta tag.
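A minimal sketch of that idea: build a small HTML document whose refresh meta tag points at the link. The function name `make_redirect_html` is hypothetical, and how the result would be wired into `main.py` is left open.

```python
from html import escape


def make_redirect_html(title: str, url: str) -> str:
    """Return an HTML page that immediately redirects the browser to url."""
    # Escape both values so special characters cannot break the markup.
    safe_title = escape(title)
    safe_url = escape(url, quote=True)
    return (
        "<html>\n<head>\n"
        f"    <title>{safe_title}</title>\n"
        '    <meta charset="UTF-8" />\n'
        f'    <meta http-equiv="refresh" content="0; URL={safe_url}" />\n'
        "</head>\n<body>\n"
        f'    <p>Redirecting you to <a href="{safe_url}">{safe_title}</a></p>\n'
        "</body>\n</html>\n"
    )
```

The returned string could then be written to a `.html` file next to the other module items.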

How to download from multiple canvas lms profiles?

If I have more than one canvas lms account, how do I configure the toml file to download from both?

Download with file attributes

Is it possible to keep the creation date and modification date attributes on the downloaded files?
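A minimal sketch of how the modification time could be preserved, assuming the remote timestamp arrives as a UTC string like the `updated_at` values seen in other issues. The function name `apply_remote_mtime` is hypothetical. `os.utime` sets only access and modification times; the standard library offers no portable way to set a creation date.

```python
import os
from datetime import datetime, timezone


def apply_remote_mtime(path: str, updated_at: str) -> None:
    """Set the local file's access and modification times from a UTC string."""
    # Assumed input format: '2020-03-19T06:33:33Z'.
    parsed = datetime.strptime(updated_at, "%Y-%m-%dT%H:%M:%SZ")
    mtime = parsed.replace(tzinfo=timezone.utc).timestamp()
    # os.utime takes (access_time, modification_time) in seconds since epoch.
    os.utime(path, (mtime, mtime))
```

This would be called once after each file has been downloaded completely.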

Can not download image file

It just says,
Retrying
Retrying
Retrying
Retrying
Retrying
, and a “KeyError: 'content-length'”.
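A minimal sketch of reading the size defensively, on the assumption that the `KeyError` comes from a response that carries no `content-length` header. The function name `expected_size` is hypothetical; `headers` stands for any mapping of response headers with lowercase keys.

```python
from typing import Mapping, Optional


def expected_size(headers: Mapping[str, str]) -> Optional[int]:
    """Return the declared body size in bytes, or None when it is unknown."""
    # .get avoids the KeyError that headers["content-length"] raises
    # when the server omits the header.
    raw = headers.get("content-length")
    if raw is None:
        return None
    try:
        return int(raw)
    except ValueError:
        # A malformed value is treated the same as a missing one.
        return None
```

When the result is `None`, the downloader could skip its completeness check instead of retrying.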

Renewal of Modified Files and Auto-Deletion of Manually Added Files

I find that any modification to a downloaded file is reverted, because the program re-downloads the file and overwrites the modified version. That means we cannot take notes directly on slides, which frustrated me a lot when I suddenly found yesterday that my notes were completely gone.
Besides, files I manually added to the course directory (such as relevant materials) are also removed automatically by the program. I have set "delete_file" to "false" in the configuration, but that does not seem to solve the problem.
Currently, I have to copy files somewhere else and take notes on the copy, which is very inconvenient.
I am using the latest version of canvas_grab on macOS Big Sur 11.2.2.
I would appreciate it if you could try to solve the problem. It used to work fine with the previous version. :)

Remind users to close opened files

The program may fail if the downloaded file is being used by another process. We should remind a user to close it.

Resource Does Not Exist

One of my courses does NOT have files tab. When synchronizing, it pops up this error. I don't know how to fix it.

Traceback (most recent call last):
  File "D:\Google Drive\Year 2 Sem 2\Z_CANVAS_GRAB\canvas_grab\__main__.py", line 53, in main
    canvas_snapshot = canvas_snapshot_obj.take_snapshot()
  File "D:\Google Drive\Year 2 Sem 2\Z_CANVAS_GRAB\canvas_grab\snapshot\canvas_file_snapshot.py", line 51, in take_snapshot
    for _ in self.yield_take_snapshot():
  File "D:\Google Drive\Year 2 Sem 2\Z_CANVAS_GRAB\canvas_grab\snapshot\canvas_file_snapshot.py", line 62, in yield_take_snapshot
    raise ResourceDoesNotExist("File tab is not supported.")
canvasapi.exceptions.ResourceDoesNotExist: File tab is not supported.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Google Drive\Year 2 Sem 2\Z_CANVAS_GRAB\main.py", line 6, in <module>
    canvas_grab.__main__.main()
  File "D:\Google Drive\Year 2 Sem 2\Z_CANVAS_GRAB\canvas_grab\__main__.py", line 56, in main
    colored(f'{mode} not supported, falling back to alternative mode', 'yellow'))
NameError: name 'mode' is not defined

HTML generated but no files (Doc, PDF) received!

Hi there, although the tool works (much appreciated), I could not get it to download the actual files. All I received for the course was HTML files. I hope you can fix this problem, and if possible, please add a function that allows downloading content from a specific course ID!
Thanks!

File with no size attribute

Possible optimization on colorama

colorama supports autoreset, this might help to optimize many prints.

If you find yourself repeatedly sending reset sequences to turn off color changes at the end of every print, then init(autoreset=True) will automate that:

There are also loggers that support colorized terminal output, such as https://github.com/Delgan/loguru#pretty-logging-with-colors

how to open GUI

Nice to meet you. I am a freshman at SJTU :)
I am having trouble switching to GUI mode.

Partial file downloaded

Recently I found that files will be truncated if you're downloading it without logging in (e.g. using canvas_grab). I'm figuring out how to download file with tokens.

May Be Interrupted by an Exception Raised by a Single File

The program can be interrupted immediately if something is wrong with one specific file, and it then fails to download the rest.
For example, suppose something is wrong with the file "SVM_demo.py". When the "File download not complete" exception is raised, the program simply exits instead of continuing to download the remaining undamaged files. This is inconvenient: if that file stays damaged, I can no longer download any other files either.

I guess adding some exception handling here in "main.py" could solve the problem and be more user friendly:)
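A minimal sketch of the kind of exception handling suggested here: each file is attempted independently and failures are collected for a summary. The names `download_all` and `download_one` are hypothetical and do not correspond to functions in `main.py`.

```python
from typing import Callable, Iterable, List, Tuple


def download_all(
    files: Iterable[str],
    download_one: Callable[[str], None],
) -> List[Tuple[str, Exception]]:
    """Try every file and return the (file, error) pairs that failed."""
    failed = []
    for name in files:
        try:
            download_one(name)
        except Exception as exc:
            # Record the failure and move on to the next file.
            # KeyboardInterrupt is not caught, so Ctrl+C still stops the run.
            failed.append((name, exc))
    return failed
```

The returned list could be printed at the end so that the user knows which files to retry.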

Subheader in module is not supported

canvas_grab/main.py

Lines 354 to 355 in 5a3df4d

elif item.type == "SubHeader":
    pass

In some of my courses, a module contains many subheaders to organize its items. I think subheaders need to be supported.

Interrupt during checkpointing will break checkpoint file

Interrupt during checkpointing will leave a broken file
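A minimal sketch of one common remedy: write the checkpoint to a temporary file in the same directory, then swap it into place with `os.replace`, so an interrupt leaves either the old file or the new one. The function name `save_checkpoint` is hypothetical, and JSON is used only for illustration; the real checkpoint format may differ.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, data: dict) -> None:
    """Write data to path so that an interrupt never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same file system as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(data, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, including Ctrl+C, discard the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

The same pattern could apply to downloaded files, which would address the related issue of broken files after an interrupted download.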

Download Course Video

Is it possible to:

~~download files by section~~
download embedded videos linked to vshare.sjtu.edu.cn
~~download the books linked to jcbks.lib.sjtu.edu.cn~~

Download files by module

Is there any plan to make it download files by module now? #4

Incompatible checkpoint file with 1.4.7

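I was previously on version 1.4.7 and occasionally got checkpoint errors. Worried that some course materials had not been downloaded completely, I switched to 1.7.8. After setting the parameters, it stopped working entirely.

Here is the exception stack:
Traceback (most recent call last):
  File ".\main.py", line 422, in <module>
    main()
  File ".\main.py", line 72, in main
    checkpoint.load()
  File "C:\Users\a\Desktop\canvas_grab-master\checkpoint.py", line 54, in load
    self._checkpoint[k] = CheckpointItem(**v)
TypeError: __init__() missing 1 required positional argument: 'id'
(Login had already succeeded, and my name and ID were displayed.)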

A GUI tool needed

Do you have any plan to design a nice python-based or javascript-based GUI tool for better experience?

Bug on two files of the same name in same folder

locked file in module mode cause canvas_grab crash

I'm using canvas-grab v2.0.7 and it worked well at first. But a few days ago it stopped working because of this error.
Thanks!

Handle redirects correctly

When an instructor uploads a .html file, an HTML file redirecting to https://${canvas_domain}/api/v1/courses/${course_id}/module_item_redirect/${file_id} is put under modules in place of the actual file. As a result, the actual content is not downloaded, and you end up with the following file instead (${var} marks redacted values):

<html>
<head>
    <title>${file_title}</title>
    <meta charset="UTF-8" />
    <meta http-equiv="refresh" content="0; URL=https://${canvas_domain}/api/v1/courses/${course_id}/module_item_redirect/${file_id}" />
</head>
<body>
    <p>Redirecting you to <a href="https://${canvas_domain}/api/v1/courses/${course_id}/module_item_redirect/${file_id}">${file_name}</a></p>
</body>
</html>

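A minimal sketch of how such a stub could be detected, using only the standard library: parse the file and pull the target out of the refresh meta tag. The names `_RefreshFinder` and `find_refresh_target` are hypothetical; following the extracted URL with proper credentials is not shown.

```python
from html.parser import HTMLParser
from typing import Optional


class _RefreshFinder(HTMLParser):
    """Collect the URL of the first <meta http-equiv="refresh"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.target: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.target is not None:
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; URL=https://...".
        _, sep, rest = (attr.get("content") or "").partition(";")
        rest = rest.strip()
        if sep and rest.lower().startswith("url="):
            self.target = rest[4:].strip()


def find_refresh_target(html_text: str) -> Optional[str]:
    """Return the redirect URL of a stub page, or None if there is none."""
    finder = _RefreshFinder()
    finder.feed(html_text)
    return finder.target
```

A file for which this returns a URL could then be treated as a pointer to the real content rather than as the content itself.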
Pages not downloaded when there is no Pages tab

Is it possible to download pages with the program? I have Organize by module, download files, links and pages selected, and a course that consists entirely of pages. Nothing is downloaded and no errors are given.
The only message is Updating 0 objects (0 remote objects -> 0 local objects) when the program is trying to download items from the course.

elif item.type in ["Page", "Discussion", "Assignment"]:
    page_url = item.html_url
elif item.type == "ExternalUrl":
    page_url = item.external_url
elif item.type == "SubHeader":
