mytardis / mydata

Desktop application for uploading data to MyTardis


mydata's Introduction

MyTardis


Overview

MyTardis began at Monash University to solve the problem of users needing to store large datasets and share them with collaborators online. Its particular focus is on integration with scientific instruments, instrument facilities, and research storage and computing infrastructure, addressing the challenges of data storage, data access, collaboration and data publication.

Read more...

Key features for users

The MyTardis data management platform is a software solution that manages research data and the associated metadata. MyTardis handles the underlying storage to ensure that data is securely archived and provides access to the data through a web portal. Data hosted in MyTardis can also be accessed via SFTP.

Read more...

Key features for instrument facilities

MyTardis takes care of distributing data to your users. Its instrument integration technology securely and automatically shifts data from instrument to repository and makes it accessible to the right users.

Read more...

Developing for MyTardis

MyTardis is mostly written in the Python programming language and is built on top of the Django web framework. A complete installation of the service also includes an Elasticsearch index, a RabbitMQ-based task queue, an Nginx server, and a PostgreSQL database.

To set up and manage these services we employ the Kubernetes orchestration software and cloud technologies.

Read more...

Find out more

Project homepage http://mytardis.org

The source code is hosted at https://github.com/mytardis/mytardis

Documentation at http://mytardis.readthedocs.org includes

  • User documentation
  • Administrator documentation
  • Developer documentation

The wiki at https://github.com/mytardis/mytardis/wiki includes

  • Links to MyTardis add-ons, including apps and post-save filters
  • Information about MyTardis community events

Known deployments

Related projects and repositories

Releases

The default branch on GitHub is develop. This is the cutting edge development version. Please DO NOT use this in production, as it may have bugs that eat your data.

The master branch is the current stable release with all the latest bug fixes included. It will move to newer versions automatically. Follow this branch if you want to stay up to date in a production environment.

Each version has its own branch named by version number. At the time of writing, the latest release is 4.6.0, tagged from the series-4.6 branch. Follow this branch for your production installation and perform version upgrades manually.

Each bug fix or set of fixes bumps the patch version, and each new release is tagged, e.g. 4.6.1. Use tagged releases if you are paranoid about changes to the code you have not tested yourself.

To follow development, please see the contributing section below.

Reporting Bugs

Bug reports and feature requests can be made via our public issue tracker.

Contributing

New contributors are always welcome; however, all developers should review the pull-request checklist before making pull requests.

For any wishes, comments, praise etc. either open a GitHub issue or contact us.

Active developers are also welcome to join our Slack team.

Contact details can be found on mytardis.org.

mydata's People

Contributors

dependabot[bot], dyakhnov, grischa, jameswettenhall, pyup-bot, wettenhj


mydata's Issues

Improve exception handling for uploader request approved without working key-pair.

If a MyData instance finds that its uploader registration request has been marked as "approved", it proceeds with its SCP and SSH subprocesses without any specific exception handling for the case where SSH key authentication is not working, e.g. because the MyTardis administrator forgot to paste the public key into the ~mydata/.ssh/authorized_keys file.

This can result in "Permission denied" errors which are not very user friendly.

MyData should give the user a more informative message like "Please ask your MyTardis administrator to ensure that MyData's public key has been added to the authorized_keys file."

And we should ensure that the number of failed authentications is kept to a minimum, so that the client IP address doesn't get banned by fail2ban.

Duplicate experiments on first scan and upload in MyData

When using MyData to upload a new Experiment and Dataset it leads to duplicate Experiments in MyTardis.

The first time the MyData client is run in "User Group / Experiment / Dataset" mode with multiple dataset folders under a single experiment folder, a new experiment is created for EACH dataset, rather than a single Experiment with multiple datasets. So, for example, a folder structure of:

  • Group_name
    • Experiment A
      • Dataset 1A
      • Dataset 1B
      • Dataset 1C

results in three experiments in MyTardis, all named Group_name - Experiment A. (Screenshot of the duplicate experiments omitted.)

The second time the MyData client is run (without making any changes, just clicking run again), all three datasets are correctly mapped to just one of the experiments. So, in the above example, one of the three experiments created in the first run will now contain all 3 datasets (i.e. the expected behaviour).

Running a third time correctly detects that there are no new files.

I found the problem running the mac version of the client and was able to replicate similar problems with the windows version on another machine.

It seems that adding a new folder/dataset to an existing experiment that MyData already knows about does not create duplicate experiments - so the problem only seems to happen if the first scan of a new Experiment level folder contains multiple subfolders at dataset level.
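A likely shape for a fix is to serialise the "get or create experiment" step across the dataset-folder threads. The sketch below is hypothetical (ExperimentCache and create_experiment are not MyData names); it just shows a lock-guarded get-or-create so three concurrent dataset threads end up sharing one experiment record:

```python
import threading

class ExperimentCache:
    """Hypothetical sketch: a lock-guarded get-or-create cache so that
    concurrent dataset-folder threads share one experiment record instead
    of each creating its own. `create_experiment` stands in for the real
    MyTardis API call."""

    def __init__(self, create_experiment):
        self._create = create_experiment
        self._lock = threading.Lock()
        self._by_title = {}

    def get_or_create(self, title):
        with self._lock:
            if title not in self._by_title:
                self._by_title[title] = self._create(title)
            return self._by_title[title]

created = []

def fake_create(title):
    # Pretend to POST a new experiment and return its ID.
    created.append(title)
    return len(created)

cache = ExperimentCache(fake_create)
threads = [
    threading.Thread(target=cache.get_or_create,
                     args=("Group_name - Experiment A",))
    for _ in range(3)  # one thread per dataset folder (1A, 1B, 1C)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(created))  # 1 -- only a single experiment is created
```

This mirrors the observed behaviour: the second run "works" because the experiment already exists, so a race on first creation is the likely culprit.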

include and/or exclude glob files are re-parsed for every file added (if enabled)

MyData seems to open and re-parse the includesFile and/or excludesFile (if enabled) for every file it finds while walking FolderModels:

The offending open() / readlines() call is in MatchesPatterns(), which is invoked from the include/exclude checks called in the os.walk() file loop during FolderModel initialisation.

Worst case: both glob files are re-parsed for every os.walk()-ed file if both useIncludes and useExcludes are true.

Suggestion: check and cache the {in,ex}clude glob files' modification times before the os.walk loop and re-parse them if there's a change.
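The suggested mtime-based caching could look something like this (a stdlib-only sketch; PatternCache is a hypothetical name, not MyData's actual class):

```python
import fnmatch
import os

class PatternCache:
    """Sketch: re-read an includes/excludes glob file only when its
    modification time changes, instead of on every file visited."""

    def __init__(self, path):
        self.path = path
        self._mtime = None
        self._patterns = []

    def patterns(self):
        mtime = os.path.getmtime(self.path)
        if mtime != self._mtime:  # re-parse only when the file changed
            with open(self.path) as glob_file:
                self._patterns = [line.strip() for line in glob_file
                                  if line.strip()]
            self._mtime = mtime
        return self._patterns

    def matches(self, filename):
        return any(fnmatch.fnmatch(filename, p) for p in self.patterns())

# Demo with a throwaway glob file:
import tempfile
with tempfile.NamedTemporaryFile("w", suffix=".globs", delete=False) as tmp:
    tmp.write("*.txt\n*.csv\n")
cache = PatternCache(tmp.name)
print(cache.matches("results.csv"))  # True, parsed once and cached
```

In the worst case described above, this turns two file reads per os.walk()-ed file into two stat() calls per file, which is far cheaper.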

"Username / “MyTardis” / Experiment / Dataset" folder configuration not working properly

Attempting to use this configuration setting in MyData seems to require that the "MyTardis" folder under the username folder is the only subfolder at that level. Adding another subfolder throws the error:
"A folder name of '[FolderName]' was found where a 'MyTardis' folder was expected."

I'm assuming this is not the expected behaviour. Found this on Mac OS, and replicated on the Windows version of the client.

errors in MyData log (running on Mac)

This occurs when I save new settings in MyData. Please let me know if you need screenshots of settings. Here is the log...

2016-06-08 11:40:16,700 - init.pyc - 454 - SettingsDialogValidation - MainThread - DEBUG - Starting thread SettingsModelValidationThread
2016-06-08 11:40:16,774 - init.pyc - 456 - SettingsDialogValidation - MainThread - DEBUG - Started thread SettingsModelValidationThread
2016-06-08 11:40:16,834 - init.pyc - 361 - Validate - SettingsModelValidationThread - DEBUG - Starting run() method for thread SettingsModelValidationThread
2016-06-08 11:40:16,963 - uploader.pyc - 650 - GetActiveNetworkInterfaces - CheckConnectivityThread - INFO - Determining the active network interface...
2016-06-08 11:40:17,041 - init.pyc - 182 - CheckConnectivityWorker - CheckConnectivityThread - DEBUG - Found at least one active network interface: en0.
2016-06-08 11:40:17,051 - init.pyc - 454 - SettingsDialogValidation - MainThread - DEBUG - Starting thread SettingsModelValidationThread
2016-06-08 11:40:17,072 - init.pyc - 456 - SettingsDialogValidation - MainThread - DEBUG - Started thread SettingsModelValidationThread
2016-06-08 11:40:17,153 - init.pyc - 361 - Validate - SettingsModelValidationThread - DEBUG - Starting run() method for thread SettingsModelValidationThread
2016-06-08 11:40:17,154 - settings.pyc - 853 - Validate - SettingsModelValidationThread - DEBUG - Settings validation - checking folder structure...
2016-06-08 11:40:17,159 - settings.pyc - 1542 - PerformFolderStructureValidation - SettingsModelValidationThread - DEBUG - SettingsModel folder structure validation succeeded!
2016-06-08 11:40:17,159 - settings.pyc - 864 - Validate - SettingsModelValidationThread - DEBUG - Settings validation - checking MyTardis URL...
2016-06-08 11:40:17,345 - settings.pyc - 949 - Validate - SettingsModelValidationThread - DEBUG - Settings validation - checking MyTardis credentials...
2016-06-08 11:40:17,399 - settings.pyc - 977 - Validate - SettingsModelValidationThread - DEBUG - Settings validation - checking MyTardis facility...
2016-06-08 11:40:17,466 - settings.pyc - 1025 - Validate - SettingsModelValidationThread - WARNING - For now, we are assuming that if we find an instrument record with the correct name and facility, then it must be the correct instrument record to use with this MyData instance. However, if the instrument record we find is associated with a different uploader instance (suggesting a different MyData instance), then we really shouldn't reuse the same instrument record.
2016-06-08 11:40:17,511 - instrument.pyc - 154 - GetInstrument - SettingsModelValidationThread - DEBUG - Found instrument record for name "Nikon A1 Microscope" in facility "Microbial Imaging Facility"
2016-06-08 11:40:17,513 - settings.pyc - 1052 - Validate - SettingsModelValidationThread - DEBUG - Validating email address.
2016-06-08 11:40:17,609 - settings.pyc - 1058 - Validate - SettingsModelValidationThread - DEBUG - Done validating email address.
2016-06-08 11:40:17,611 - settings.pyc - 1073 - Validate - SettingsModelValidationThread - DEBUG - Settings validation - checking if MyData is set to start automatically...
2016-06-08 11:40:17,617 - settings.pyc - 1076 - Validate - SettingsModelValidationThread - WARNING - This auto-start on login stuff shouldn't really be in settings validation. I just put it here temporarily to ensure it doesn't run in the main thread.
2016-06-08 11:40:19,703 - settings.pyc - 1228 - Validate - SettingsModelValidationThread - INFO - MyData is already set to start automatically on login.
2016-06-08 11:40:19,733 - settings.pyc - 1279 - Validate - SettingsModelValidationThread - DEBUG - Settings validation - succeeded!
2016-06-08 11:40:19,765 - init.pyc - 447 - Validate - SettingsModelValidationThread - DEBUG - Finished running settingsModel.Validate() 4
2016-06-08 11:40:19,765 - init.pyc - 449 - Validate - SettingsModelValidationThread - DEBUG - Finishing run() method for thread SettingsModelValidationThread
2016-06-08 11:40:24,975 - init.pyc - 571 - ProvideSettingsValidationResults - MainThread - DEBUG - Settings were valid, so we'll save the settings to disk and close the Settings dialog.
2016-06-08 11:40:24,985 - uploader.pyc - 650 - GetActiveNetworkInterfaces - MainThread - INFO - Determining the active network interface...
2016-06-08 11:40:25,061 - uploader.pyc - 259 - init - MainThread - INFO - The active network interface is: en0
2016-06-08 11:40:25,112 - settings.pyc - 726 - SaveToDisk - MainThread - INFO - Saved settings to /Users/108752/Library/Application Support/MyData/MyData.cfg
2016-06-08 11:40:25,881 - init.pyc - 594 - ProvideSettingsValidationResults - MainThread - DEBUG - Traceback (most recent call last):
File "mydata/events/init.pyc", line 589, in ProvideSettingsValidationResults
File "wx/_windows.pyc", line 809, in EndModal
PyAssertionError: C++ assertion "Assert failure" failed at /BUILD/wxPython-src-2.9.5.0/src/osx/cocoa/dataview.mm(2806) in MacRender(): Text renderer cannot render value because of wrong value type; value type: PyObject

2016-06-08 11:40:26,145 - MyData.pyc - 1063 - OnSettings - MainThread - DEBUG - settingsDialog.ShowModal() returned wx.ID_OK
2016-06-08 11:40:26,152 - schedule.pyc - 31 - ApplySchedule - MainThread - DEBUG - Getting schedule type from settings dialog.
2016-06-08 11:40:26,159 - schedule.pyc - 42 - ApplySchedule - MainThread - DEBUG - Schedule type is Manually.
2016-06-08 11:40:26,166 - schedule.pyc - 45 - ApplySchedule - MainThread - DEBUG - Finished processing schedule type.

Large files with filenames containing '$' fail to upload when using SCP via Staging upload method.

When using its SCP via Staging upload method, MyData breaks up large files into chunks.
For a file named "file1.dat", MyData does the following:

  1. Extract chunk
  2. Upload chunk to staging area, naming it ".file1.dat.chunk"
  3. Append chunk to partial upload: ssh staging_host "cat .file1.dat.chunk >> "file1.dat""

This can fail for filenames containing '$' symbols, because the remote shell will try to evaluate the '$' in the string.

For small files, MyData simply SCPs the entire file in one step (without incremental progress updates or the ability to resume partial uploads). Recent testing shows that MyData's small file uploads are not affected by this issue.
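Since the append step builds a remote shell command line, the fix is to shell-quote the remote filenames before embedding them. A minimal sketch using the stdlib (remote_append_command is a hypothetical helper, not MyData's actual function):

```python
import shlex

def remote_append_command(chunk_name, target_name):
    """Build the 'append chunk to partial upload' command with the remote
    filenames single-quoted, so '$' (and spaces, quotes, etc.) are taken
    literally by the remote shell."""
    return "cat {} >> {}".format(shlex.quote(chunk_name),
                                 shlex.quote(target_name))

cmd = remote_append_command(".file$1.dat.chunk", "file$1.dat")
print(cmd)  # cat '.file$1.dat.chunk' >> 'file$1.dat'
```

(On Python 2, pipes.quote provides the same behaviour as shlex.quote.)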

Permission denied on version 0.70 of the client

Uploading multiple files via SSH on version 0.7 of the client results in a permission denied error. Files start to upload but fail at a seemingly random point.


The error is:
scp: /research/mytardis/receiving/Richard_test-ssh9-50/0801/0800/SPECTRO/1AVE: Permission denied

Checking the permissions on the receiving share shows that the directory the client is trying to upload to has the following permissions:

drw-rwS---. 3 mydatauser mydatagrp 3 Dec 1 14:35 SPECTRO

PrivilegesRequired=lowest in Windows installer could prevent MyData from appearing in user's Start Menu

"PrivilegesRequired=lowest" was added to MyData's Windows InnoSetup installer to allow non-privileged users to install MyData on Windows (in a non-standard location). See: http://www.jrsoftware.org/ishelp/topic_setup_privilegesrequired.htm

However, for users who do want to install MyData in a standard location (C:\Program Files), running the installer with minimal permissions will fail, and explicitly running it as an administrator can result in MyData's shortcut icon appearing in the administrator's Start Menu instead of the current user's.

When running as an administrator, we could use:

[Icons]
Name: "{commonprograms}\{#MyDataAppName}"; Filename: "{app}\{#MyDataAppExeName}"

instead of:

[Icons]
Name: "{group}\{#MyDataAppName}"; Filename: "{app}\{#MyDataAppExeName}"

in https://github.com/mytardis/mydata/blob/develop/setup.py#L251

However, that could break the installer for non-privileged users.

So the simplest way to avoid confusion for now seems to be to remove "PrivilegesRequired=lowest" and drop support for installing MyData on Windows without Admin privileges.
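Alternatively, assuming an upgrade to Inno Setup 6 is an option, its PrivilegesRequiredOverridesAllowed directive and auto* constants were designed for exactly this situation; a hypothetical sketch:

```ini
[Setup]
PrivilegesRequired=lowest
; Let the user choose admin vs. per-user install at run time.
PrivilegesRequiredOverridesAllowed=dialog

[Icons]
; {autoprograms} resolves to {commonprograms} for an administrative
; install and {userprograms} for a non-administrative one, so the
; shortcut lands in the right Start Menu either way.
Name: "{autoprograms}\{#MyDataAppName}"; Filename: "{app}\{#MyDataAppExeName}"
```

This would keep non-privileged installs working while avoiding the misplaced Start Menu shortcut for administrative installs.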

MyData's SingleInstanceChecker not cleaning up after itself on Mac OS X

MyData's failing to clean up its SingleInstanceChecker lock file on Mac OS X appears to have caused a failure-to-launch issue for Mac OS X users upgrading to Mac OS X El Capitan who have an existing ~/Library/Application Support/MyData/MyData lock file.

https://mytardis.slack.com/archives/mydata/p1447130581000013

See: http://osdir.com/ml/wxpython-users/2009-12/msg00363.html

"The real issue though is that you need to cleanup your single instance
checker when your app exits so this condition does not happen at all."

i.e.

def OnInit(self):
    self.checker = wx.SingleInstanceChecker(...)

def OnExit(self):
    del self.checker

Uploads can fail to start if cache is unreadable

Uploads can fail to start if the verified files cache is unreadable.

"The uploads failed to start.
It just sat there saying scanning... in the bottom left corner, it never found the folder to upload."

2019-06-05 14:04:16,474 - models/settings/model.py - 283 - InitializeVerifiedDatafilesCache - ScanDataDirectoriesThread - WARNING - Traceback (most recent call last):
  File "mydata/models/settings/model.py", line 278, in InitializeVerifiedDatafilesCache
    self.verifiedDatafilesCache = pickle.load(cacheFile)
  File "/usr/local/python/2.7.12-gcc4/lib/python2.7/pickle.py", line 1384, in load
    return Unpickler(file).load()
  File "/usr/local/python/2.7.12-gcc4/lib/python2.7/pickle.py", line 864, in load
    dispatch[key](self)
  File "/usr/local/python/2.7.12-gcc4/lib/python2.7/pickle.py", line 972, in load_string
    raise ValueError, "insecure string pickle"
ValueError: insecure string pickle
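A defensive loading pattern would treat any unreadable cache as "start with an empty cache" rather than letting the exception stall the scan (a sketch; load_verified_cache is a hypothetical name, not MyData's actual function):

```python
import pickle

def load_verified_cache(path):
    """Return the verified-datafiles cache, or an empty dict if the cache
    file is missing, truncated, or otherwise unreadable."""
    try:
        with open(path, "rb") as cache_file:
            return pickle.load(cache_file)
    except (OSError, EOFError, ValueError, pickle.UnpicklingError):
        # A corrupt cache only costs re-verification of already-uploaded
        # files; it should never block uploads from starting.
        return {}

# Demo with a deliberately corrupt cache file:
import tempfile
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"not a pickle")
cache = load_verified_cache(tmp.name)
print(cache)  # {}
```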

Should be able to save MyData.cfg somewhere accessible to all users.

It is common for facilities to use a shared login, e.g. "imageuser", for all users, in which case MyData's configuration would currently be saved to somewhere like:

C:\Users\imageuser\AppData\Local\Monash University\MyData\MyData.cfg

However some facilities want each user to be able to log in with their individual account but be able to access the same MyData configuration as other users.

MyData should support saving its configuration to a central location like:

C:\ProgramData\Monash University\MyData\MyData.cfg

on Windows. The "C:\ProgramData" location can be determined using the appdirs Python module as appdirs.site_config_dir()
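For reference, the per-platform locations appdirs.site_config_dir() resolves to can be sketched with the stdlib alone (a rough approximation; the real appdirs module handles more cases and versions):

```python
import os
import sys

def site_config_dir(appname, appauthor):
    """Approximate appdirs.site_config_dir() for the three platforms
    MyData targets (stdlib-only sketch, not the real appdirs API)."""
    if sys.platform == "win32":
        # C:\ProgramData on modern Windows
        base = os.environ.get("ALLUSERSPROFILE", r"C:\ProgramData")
        return os.path.join(base, appauthor, appname)
    if sys.platform == "darwin":
        return os.path.join("/Library/Application Support", appname)
    # Linux and other POSIX systems
    return os.path.join("/etc/xdg", appname)

print(site_config_dir("MyData", "Monash University"))
```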

MyData's Windows setup wizard can ensure that all users have permission to access the "C:\ProgramData\Monash University\MyData" folder.

Since we currently don't use a PKG installer on Mac OS X, just a drag-and-drop DMG, we would need to consider how to create the MyData config folder inside appdirs.site_config_dir() which for Mac OS X is:

/Library/Application Support

On Linux, appdirs.site_config_dir() is:

/etc/xdg

The only Linux distro we create packages for currently is RHEL / CentOS. We could add a postinstall scriptlet to the RPM spec to create /etc/xdg/MyData

MyData fails to upload data of a MyTardis user whose name is listed after a non-MyTardis user

From @iiman on November 10, 2015 3:11

Hi,

I will use a scenario to explain the issue.

Given:

  • DATA_DIRECTORY is a path to datasets
  • MyData is configured to have a folder structure of "Username/Dataset". This folder structure resides under $DATA_DIRECTORY
  • The users that are listed under $DATA_DIRECTORY are James, Martha, Peter and Richard.
  • James, Martha and Richard are MyTardis users but Peter is not.

Result:

  • MyData will upload datasets of James and Martha. But it will not upload Richard's datasets. This is because MyData stops uploading as soon as it finds a non-MyTardis user, which is Peter in this scenario.

Suggestion:
MyData should ignore non-MyTardis users and continue uploading datasets of MyTardis users. This is important especially if $DATA_DIRECTORY is a network drive that is accessed by both MyTardis and non-MyTardis users.

Cheers,

Copied from original issue: wettenhj/mytardis-app-mydata#3
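The suggested behaviour amounts to partitioning the user folders rather than aborting on the first unknown one (a sketch; partition_user_folders and lookup_user are hypothetical names):

```python
def partition_user_folders(folder_names, lookup_user):
    """Sketch of the suggested fix: keep scanning past folders whose
    owners aren't MyTardis users instead of aborting the whole run.
    `lookup_user` stands in for the MyTardis user lookup and returns
    the user, or None if no matching MyTardis account exists."""
    uploadable, skipped = [], []
    for name in folder_names:
        if lookup_user(name) is None:
            skipped.append(name)  # log and move on; don't stop the scan
        else:
            uploadable.append(name)
    return uploadable, skipped

known_users = {"James", "Martha", "Richard"}
uploadable, skipped = partition_user_folders(
    ["James", "Martha", "Peter", "Richard"],
    lambda name: name if name in known_users else None)
print(uploadable)  # ['James', 'Martha', 'Richard']
print(skipped)     # ['Peter']
```

With this approach Richard's datasets are uploaded even though Peter's folder sorts before his.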

Make cipher configurable for Paramiko SFTP

GCM ciphers are not available in Paramiko yet, see: paramiko/paramiko#982 and paramiko/paramiko#1394

Previously the configurable cipher string defaulted to aes128-gcm@openssh.com,aes128-ctr, i.e. prefer the GCM cipher, and fall back to aes128-ctr if the preferred GCM cipher is not available.

With the switch from SCP subprocesses to Paramiko SFTP, there's no point in listing aes128-gcm@openssh.com as the preferred cipher, because Paramiko doesn't support it. So we should probably change the default cipher to aes128-ctr.

To make ciphers configurable, we can do something like this:

preferred_ciphers = "aes128-ctr,aes192-ctr" # This will be configurable

Then we can set the preferred ciphers on the transport before initiating the SFTP connection:

transport.get_security_options().ciphers = tuple(preferred_ciphers.split(","))
sftp = paramiko.SFTPClient.from_transport(transport)

The following ciphers are supported by Paramiko:

('aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'aes128-cbc', 'aes192-cbc', 'aes256-cbc', 'blowfish-cbc', '3des-cbc')

Allow use of self-signed certs for MyTardis

There should be an option to allow MyData to connect to a MyTardis server that has SSL enabled but only uses self-signed certificates. Currently it fails with CERTIFICATE_VERIFY_FAILED.

Unicode characters in dataset folder names lead to unexpected behaviour

MyData was scanning dataset folders and came across some folders whose names ended with the Unicode character u'\uf028'.

As a result, urllib.quote(description) in mydata/models/dataset.py failed:

2015-11-05 14:38:28,947 - mydata.controllers.folders - 410 - StartDataUploads - StartDataUploadsThread - ERROR - Traceback (most recent call last):
  File "C:\Users\wettenhj\Desktop\git\mydata\master\build\MyData\out00-PYZ.pyz\mydata.controllers.folders", line 408, in StartDataUploads
  File "C:\Users\wettenhj\Desktop\git\mydata\master\build\MyData\out00-PYZ.pyz\mydata.models.dataset", line 130, in CreateDatasetIfNecessary
  File "C:\Users\wettenhj\Desktop\git\mydata\master\build\MyData\out00-PYZ.pyz\urllib", line 1282, in quote
KeyError: u'\uf028'

Then, MyData.exe somehow ended up using 100% of CPU.

In this case, it appears that the u'\uf028' Unicode character was added to the end of the folder names unintentionally, so it can just be removed before re-running MyData.

MyData doesn't necessarily need to support folder names including Unicode characters, but if we choose not to support Unicode folder names, then MyData should at least be able to handle this exception more gracefully.
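The KeyError arises because Python 2's urllib.quote cannot handle non-ASCII unicode strings directly; encoding to UTF-8 first avoids it. A sketch of a defensive wrapper (quote_description is a hypothetical helper name):

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2

def quote_description(description):
    """URL-quote a dataset description, tolerating arbitrary Unicode
    (including stray characters like u'\uf028').  Encoding to UTF-8
    first is what Python 2's urllib.quote needs to avoid KeyError on
    non-ASCII code points."""
    if isinstance(description, bytes):
        description = description.decode("utf-8", "replace")
    return quote(description.encode("utf-8"))

print(quote_description(u"Dataset 1A\uf028"))  # Dataset%201A%EF%80%A8
```

Handling the exception this way would also avoid the runaway-CPU failure mode, since the upload thread would no longer die mid-loop.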

Improve documentation for "Ignore datasets older than" filter

The "Ignore datasets older than" filter uses the dataset folder's Created date.

This is not sufficiently clear to users. We need to clarify the difference between Created and Modified dates and ensure that users understand that MyData will ignore new files added to old dataset folders when this filter is turned on.

See: http://mydata.readthedocs.io/en/latest/settings.html#filters
and https://github.com/mytardis/mydata/blob/develop/docs/source/settings.rst#filters

Add setting to accept self-signed certificates, for testing/pilot.

Our deployment process includes three MyTardis servers: development (Dev), non-production (NPE), and production (Prod). MyData instances are deployed to instruments and tested against the Dev and NPE servers before Prod; however, only the Prod servers get signed certificates. This makes deployment and testing messy.

Please, add a setting to override the following message when using self-signed certificates:

Please enter a valid MyTardis URL.
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

The current work-around (to add the certificate into the cacert.pem file) is prone to error and too cumbersome when switching instruments between environments (Dev -> NPE -> Prod).
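Such a setting could be implemented as an opt-in relaxation of certificate verification, defaulting to fully verified (a stdlib sketch of the idea; make_ssl_context is a hypothetical helper, and MyData's actual HTTP stack may instead pass a verify flag/CA path to its HTTP library):

```python
import ssl

def make_ssl_context(verify=True, cafile=None):
    """Default to full certificate verification; only relax it when
    explicitly requested (suitable for Dev/NPE servers, never Prod)."""
    context = ssl.create_default_context(cafile=cafile)
    if not verify:
        # Order matters: check_hostname must be disabled before
        # verify_mode can be set to CERT_NONE.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context

dev_context = make_ssl_context(verify=False)
prod_context = make_ssl_context()
print(dev_context.verify_mode == ssl.CERT_NONE)      # True
print(prod_context.verify_mode == ssl.CERT_REQUIRED)  # True
```

An alternative, safer default would be a setting that points at a custom CA bundle (the cafile argument above), so Dev/NPE certificates can be trusted without disabling verification entirely.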

After account migration, MyData can still access the old inactive user account

See also: mytardis/mytardis#2301

MyTardis provides the ability to migrate an account (transfer ownership of data) from an old identity (with an outdated authentication mechanism) to a new identity (using OAuth2).

The MyTardis account migration renames the old account with a suffix indicating the old authentication mechanism, e.g. user1_ldap, and the new OAuth account is expected to use an email address as the username. When MyData finds a user folder, e.g. a folder named using the email address, it tries to find the appropriate MyTardis user to grant access to. However, the MyTardis API's /api/v1/user/ endpoint currently has no way to distinguish between active and inactive user accounts, so it appears possible that MyData could incorrectly grant access to the old (inactive) account instead of the new (migrated) account.

404 Client Error: NOT FOUND during progress updates

The following error has been observed in MyData v0.7.2. It is triggered by MyData attempting to update progress bars. A failure to update a progress bar shouldn't raise a critical error.

Traceback (most recent call last):
  File "/usr/local/python/2.7.14/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/local/python/2.7.14/lib/python2.7/threading.py", line 1073, in run
    self.function(*self.args, **self.kwargs)
  File "/home/ec2-user/mydata/mydata/utils/progress.py", line 41, in MonitorProgress
    uploadModel.dataFileId)
  File "/home/ec2-user/mydata/mydata/models/datafile.py", line 100, in GetDataFileFromId
    response.raise_for_status()
  File "/home/ec2-user/virtualenvs/mydata/lib/python2.7/site-packages/requests/models.py", line 909, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
HTTPError: 404 Client Error: NOT FOUND for url: https://store.erc.monash.edu.au/api/v1/mydata_dataset_file/8747542/?format=json
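The fix is conceptually simple: wrap the progress lookup so a failed request is logged and skipped rather than propagated out of the monitoring thread (a sketch; update_progress_safely and fetch_datafile are hypothetical names, not MyData's actual API):

```python
def update_progress_safely(fetch_datafile, datafile_id, log_lines):
    """A failed progress lookup should log a warning and carry on, not
    kill the monitoring thread.  `fetch_datafile` stands in for the
    GET against /api/v1/mydata_dataset_file/<id>/."""
    try:
        return fetch_datafile(datafile_id)
    except Exception as err:  # e.g. HTTPError: 404 Client Error: NOT FOUND
        log_lines.append("WARNING: progress update skipped: %s" % err)
        return None

log_lines = []

def missing_datafile(datafile_id):
    raise RuntimeError("404 Client Error: NOT FOUND")

result = update_progress_safely(missing_datafile, 8747542, log_lines)
print(result)  # None -- the progress bar is simply not updated this tick
```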

Duplicate uploader records when client SSH version changes

The default format for SSH public key fingerprints has changed in recent versions of OpenSSH.

In OpenSSH v6.7 and earlier, it used to look like this:

$ ssh-keygen -yl -f ~/.ssh/id_rsa
2048 6d:7b:94:49:1d:bf:e9:a9:ef:8f:e9:88:b5:25:6c:c0 comment (RSA)

In OpenSSH v6.8 and later, the default format looks like this:

$ ssh-keygen -yl -f ~/.ssh/id_rsa
2048 SHA256:Vza/1h5wx9quxfZSzsjaS7M7w6zGDZQOn4C2XKwWvdM comment (RSA)

But you can get something which looks similar to the old format as follows:

ssh-keygen -E md5 -yl -f ~/.ssh/id_rsa
2048 MD5:6d:7b:94:49:1d:bf:e9:a9:ef:8f:e9:88:b5:25:6c:c0 comment (RSA)
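Both formats are derived from the same raw public-key blob (the base64-decoded key material in authorized_keys), so the two can in principle be reconciled server-side. A stdlib sketch of the derivation (fingerprints is a hypothetical helper; the demo blob is not a real key):

```python
import base64
import hashlib

def fingerprints(pubkey_blob):
    """Compute the old-style MD5 colon-hex fingerprint and the new-style
    SHA256 base64 fingerprint from the same public-key bytes."""
    md5_hex = hashlib.md5(pubkey_blob).hexdigest()
    old_style = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))
    sha256_b64 = base64.b64encode(
        hashlib.sha256(pubkey_blob).digest()).decode().rstrip("=")
    return "MD5:" + old_style, "SHA256:" + sha256_b64

md5_fp, sha256_fp = fingerprints(b"example key blob")
print(md5_fp)
print(sha256_fp)
```

Storing both fingerprints against an uploader record (or normalising to one format on the server) would let MyTardis match an existing uploader regardless of which format the client's ssh-keygen emits.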

MyData already has some code to deal with this on Windows where we have more control over what SSH version is being used with MyData:

https://github.com/mytardis/mydata/blob/develop/mydata/utils/openssh.py#L250

However on Mac OS X, as people start to use new SSH versions with the new fingerprint format, we need to ensure that MyData can reconcile this new format with what is already on the MyTardis server.

Here's a description of the problem from MyData's perspective:

  1. MyData: I want to be able to upload data with SCP using my private key in ~/.ssh/MyData
  2. MyData: I need to check if uploads to MyTardis using that private key have already been approved, so let's generate the corresponding public key fingerprint (a summarized version of the public key) from the private key in ~/.ssh/MyData (using ssh-keygen -yl -f private_key) and query the MyTardis server to check whether our private key will be able to authenticate us for SCP uploads.
  3. If the client's SSH version has changed, changing the fingerprint format, then the MyTardis server (actually the "mydata" app within the MyTardis server) can respond with "No, uploads haven't been approved for that key, in fact they haven't even been requested yet".
  4. MyData: I need to create a new uploader registration request, because MyTardis told me that it doesn't have a valid uploader request for my SSH key.
  5. Then, when MyData creates a datafile record via the MyTardis API (actually via the mytardis-app-mydata's extensions to the API), the API tries to determine an appropriate storage box for the datafile object, based on the uploader registration request. (When the uploader request is approved, the MyTardis administrator assigns a storage box.) Currently, this code assumes that there is only one uploader registration request for each uploader, which can raise an exception if multiple requests have been created for the same uploader, due to multiple SSH fingerprint formats: https://github.com/wettenhj/mytardis-app-mydata/blob/master/api.py#L414

Here's an example of this exception in tardis.log:

[27/Oct/2015 09:44:38] WARNING api obj_create Traceback (most recent call last):
  File "/home/mytardis/mytardis/tardis/apps/mydata/api.py", line 414, in obj_create
    UploaderRegistrationRequest.objects.get(uploader=uploader)
  File "/home/mytardis/virtualenvs/mytardis/local/lib/python2.7/site-packages/django/db/models/manager.py", line 127, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/home/mytardis/virtualenvs/mytardis/local/lib/python2.7/site-packages/django/db/models/query.py", line 338, in get
    (self.model._meta.object_name, num)
MultipleObjectsReturned: get() returned more than one UploaderRegistrationRequest -- it returned 2!

As a result of the exception raised above, the URI field of the DataFileObject is not set appropriately: it remains at its default value of None, so when MyData attempts to upload via SSH/SCP, it doesn't have a valid remote path to upload to.

MyData's log could show a failed attempt to create a directory on the staging server, due to the missing URI in the DataFileObject:

2015-10-27 09:31:17,953 - openssh.pyc - 608 - UploadFileFromPosixSystem - UploadWorkerThread-2 - DEBUG - "/usr/bin/ssh" -i "/Users/james/.ssh/MyData" -c arcfour128 -oControlPath="/var/folders/fx/4r1f3hv56gsd5280xhfbb0lc0000gp/T/tmp80PQ2a" -oIdentitiesOnly=yes -oPasswordAuthentication=no -oStrictHostKeyChecking=no -l mydata mytardisdemo.erc.monash.edu.au "mkdir -p \"/mnt/MYTARDIS_STAGING\""
2015-10-27 09:31:18,908 - folders.pyc - 1441 - run - UploadWorkerThread-2 - DEBUG - Upload failed for datafile photo1.jpg in folder Photos
2015-10-27 09:31:18,957 - folders.pyc - 1467 - run - UploadWorkerThread-2 - DEBUG - Traceback (most recent call last):
  File "mydata/controllers/folders.pyc", line 1216, in run
  File "mydata/utils/openssh.pyc", line 551, in UploadFile
  File "mydata/utils/openssh.pyc", line 774, in UploadFileFromPosixSystem
SshException: bash: /mnt/MYTARDIS_STAGING/: Is a directory
