Comments (14)
And I am not even sure that I have really "fixed" some of the bad links, as I
sometimes see colliding file names in different comments like this:
User comment 1: Something doesn't work, attaches config.txt
User comment 2: Workaround doesn't work either, attaches config.txt
Original comment by [email protected]
on 18 Mar 2015 at 12:57
from support-tools.
Thank you for the bug report. I will take a look.
Original comment by [email protected]
on 18 Mar 2015 at 3:21
- Changed state: Accepted
from support-tools.
Any update? We're looking to add an "Import from Google Code" feature to GitLab
and want to import attachments as well, but this issue is standing in our way.
Thank you.
Original comment by [email protected]
on 30 Mar 2015 at 9:16
from support-tools.
We have added a wiki page detailing how our issue attachment mirroring works.
You can get issues without the additional token if you go straight to Google
Cloud Storage.
See: https://code.google.com/p/support-tools/wiki/IssueMirror
However our issue mirroring does have some known issues (we won't fix). We only
mirror _public_ issues. So attachments for private (Restrict-View-*) issues
will not be available. Similarly, we don't mirror attachments for deleted
issues. And in the occasional situation where you upload two attachments in
the same comment, both having the same name, we only have one of the files on
GCS.
As for your bad-unfixable.txt file, you have uncovered something afoot in
Google Code. For example:
Bad link:
https://storage.googleapis.com/google-code-attachments/tint2/issue-471/comment-34/fedora-2015-03-15T10-43-31-220064000Z.webm
for issue 471, comment 35, attachment {"mimetype": "application/octet-stream",
"attachmentId": "4710035000", "fileSize": 10148504, "fileName":
"fedora-2015-03-15T10-43-31-220064000Z.webm"}
The attachment file is actually found in our mirror, though as comment #31:
https://storage.googleapis.com/google-code-attachments/tint2/issue-471/comment-31/fedora-2015-03-15T10-43-31-220064000Z.webm
I have no explanation for the discrepancy, other than I probably wrote the
errant code.
Could you tell me more about how your "Import from Google Code" feature of
GitLab works? I take it GitLab supports arbitrary file attachments, so you need
to download the attachments at the time you do the import? If so, that's a
great feature. But note that there is a delay between when issue attachments
are uploaded to Google Code and when they are mirrored onto Google Cloud
Storage.
Original comment by [email protected]
on 31 Mar 2015 at 12:11
from support-tools.
The known restrictions to the IssueMirror are not a problem.
As for migrating public attachments, we have two options, since GitLab does
indeed support arbitrary file attachments:
1. Download all attachments from the IssueMirror and reupload to GitLab when a
user requests a project import.
2. Link directly to the attachment on the IssueMirror from the new GitLab issue.
Both options depend on the IssueMirror URLs actually working, which
[email protected] reports they aren't currently. The first option also
requires that we aren't rate limited or otherwise blocked by Google Cloud
Storage for downloading large numbers of files. The second option is only
viable if there is a guarantee that the IssueMirror will stay up indefinitely.
Downsides to the first option would be the storage and bandwidth requirements
on our side, and the mirror delay you mention (how large is that delay?) The
main downside to the second option would be the ongoing dependency on Google
Storage :)
We are ultimately fine with either option, but we need to be sure the
IssueMirror URLs work.
Original comment by [email protected]
on 31 Mar 2015 at 12:56
from support-tools.
I think you should go with option two, if for no other reason than its
simplicity. As for your concerns about how long the issue mirror will stay
around, it simply is a Google Cloud Storage bucket. So the overhead is
negligible. Read: if I get hit by a bus, the data will still be there. And
unlike other parts of Google Code, issue attachments aren't as problematic of
an abuse vector.
As for the delay in mirroring, at worst it will be a few days. Due to some
technical limitations in how to bridge security from our internal data centers
and the external Google Cloud Storage, I need to run the migration manually.
I will be looking at those `bad-unfixable.txt` attachments today, as it
certainly is a bug somewhere. I'll update this issue when I've hunted it down...
Original comment by [email protected]
on 1 Apr 2015 at 7:45
from support-tools.
All right, thank you Chris. I wasn't sure if this bucket was meant to be
permanent or if the plan was to let it go after some amount of time had passed,
but I guess there's really no point in that, since the amount of data is negligible
in the grander scheme of things.
Good luck with `bad-unfixable.txt`, I hope you figure it out.
Original comment by [email protected]
on 1 Apr 2015 at 9:43
from support-tools.
In case you were curious, the underlying issue has to do with deleted comments.
We were mirroring the issue as comment #31, but displaying the attachment on
the site as comment #35. That's because before we render the web page we go
through _all_ comments. (Including those that have been deleted[1].) Whereas
the issue mirror just goes through "live" comments.
This mismatch is why the attachment was put in the wrong place. There were four
deleted comments before the attachment was put up. So while Google Code says
you are looking at comment #35, in actuality it is only the 31st LIVE comment.
See the following comments and notice that the one after each isn't shown. For
example, there are comments #10 and #12, but no #11.
https://code.google.com/p/tint2/issues/detail?id=471#c10
https://code.google.com/p/tint2/issues/detail?id=471#c19
https://code.google.com/p/tint2/issues/detail?id=471#c27
https://code.google.com/p/tint2/issues/detail?id=471#c30
Fixing it might be a pain, but at least we know what the problem is.
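The arithmetic behind the mismatch can be sketched in a few lines of Python. This is only an illustration, not the mirror's actual code: the function name is hypothetical, and the list of deleted comment numbers (11, 20, 28, 31) is inferred from the four links above, each of which is followed by a missing comment.

```python
def live_comment_number(displayed_number, deleted_numbers):
    """Return a comment's position when deleted comments are skipped.

    displayed_number: the number shown on the Google Code issue page.
    deleted_numbers: displayed numbers of comments that were deleted.
    """
    # Every deleted comment that comes earlier shifts the live position down by one.
    deleted_before = sum(1 for n in deleted_numbers if n < displayed_number)
    return displayed_number - deleted_before


# Deleted comments in tint2 issue 471, inferred from the links above (assumption).
deleted = [11, 20, 28, 31]
print(live_comment_number(35, deleted))
```

With four deleted comments ahead of it, displayed comment #35 lands at position 31, which matches where the attachment was found in the mirror.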
[1] Deleting data in large-scale replicated datastores is actually difficult.
So many times things get deleted by simply clearing a "LIVE" field, and
possibly zeroing out the data; but still leaving the placeholder object in
place. This way you don't also have to move lots of data around since you now
have an XX byte hole inside of a YY Gigabyte file.
Original comment by [email protected]
on 1 Apr 2015 at 10:01
from support-tools.
So I'm assuming you're planning to actually fix the data in the bucket? It
would be easiest to simply change the docs to say the ID is based on how many
live comments come before it, but I guess that will mess up the numbering when
a comment is deleted after an attachment is mirrored.
Original comment by [email protected]
on 1 Apr 2015 at 10:13
from support-tools.
Thanks for the explanation. I can confirm that with the attached script I do
not see any bad URLs in project tint2 :D
IIUC things may still go out of sync if someone deletes a comment after you run
the mirroring script but before the takeout is generated; and they will become
in sync again the next time you run the mirroring script?
Original comment by [email protected]
on 1 Apr 2015 at 10:49
Attachments:
from support-tools.
Looking at how we surface the data (HTML frontend, Google Takeout JSON dump,
and the GitHub exporter) it seems like the thing we need to fix is the issue
mirror's counting schema.
That, unfortunately, might require some major changes because of how that system
works.
#Summary#
Google Code comments can be deleted. In the HTML frontend, we number comments
including these deleted comments. Similarly, in the Google Code takeout we
include deleted comments in the data dump.
The problem is that in the Issue Mirror we ignore deleted comments, so when we
render a link to issue X, comment Y, "Y" refers to the Yth LIVE comment, not
the Yth comment overall.
As a workaround, you will need to filter out non-LIVE comments from your
Google Takeout dump, as @mrovi9000 did in their script.
Similarly, [email protected], you will need to take note of the number of
comments you see when scraping Google Code output; and not look at the comment
numbers we display in the HTML.
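A rough sketch of the Takeout workaround, in Python, might look like the following. It is not the script attached earlier in this thread, and the dump layout is assumed: each comment is taken to be a dict with an integer "id" (the displayed number), an optional "attachments" list whose entries carry "fileName", and a "deletedBy" key on deleted comments. Only "id" and "fileName" are mentioned in this thread; the other names are assumptions and should be checked against a real dump.

```python
def attachments_by_live_number(comments, deleted_key="deletedBy"):
    """List (live_comment_number, file_name) pairs for live comments.

    `deleted_key` is an assumed marker for deleted comments; adjust it
    to whatever the real Takeout dump uses.
    """
    results = []
    deleted_seen = 0
    for comment in sorted(comments, key=lambda c: c["id"]):
        if deleted_key in comment:
            # Deleted comments are skipped by the mirror, so they shift
            # the number of every later comment down by one.
            deleted_seen += 1
            continue
        live_number = comment["id"] - deleted_seen
        for attachment in comment.get("attachments", []):
            results.append((live_number, attachment["fileName"]))
    return results


# Hypothetical three-comment issue: comment 1 was deleted.
sample = [
    {"id": 0},
    {"id": 1, "deletedBy": {}},
    {"id": 2, "attachments": [{"fileName": "config.txt"}]},
]
print(attachments_by_live_number(sample))
```

In the sample, the attachment on displayed comment #2 is reported under live number 1, which is the comment number the mirror would have used.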
Re: "things may still go out of sync if someone deletes a comment after you run
the mirroring script but before the takeout is generated"
Correct. When the mirroring process runs, it will upload an attachment to
something like .../tint2/issue-X/comment-10/... Now if a comment BEFORE the
attachment comment gets deleted, AND you then run Google Takeout to export your
project issues, THEN the comment numbers won't line up.
Re: "will they become in sync again the next time you run the mirroring script?"
No. Currently, the code that does the mirroring exists in a world that doesn't
know anything about deleted comments. So as far as it knows, the comment
numbers it generates are correct.
So it seems like getting the Issue Mirror to be aware of deleted comments will
fix all known problems, so that you can just link to comment X (where X
counts both LIVE and deleted comments).
I'll start looking into how we can fix this. Sorry for the inconvenience! Once
I get the Issue Mirror attachments to include counts from deleted comments, you
should just be able to generate links to Google Cloud Storage as you would
expect. (Either using the comment numbers from our HTML, or directly from the
Google Takeout JSON dump.)
Original comment by [email protected]
on 2 Apr 2015 at 12:01
- Added labels: Priority-High
- Removed labels: Priority-Medium
from support-tools.
Thanks for looking into this! We're planning to include Google Code import in
GitLab 7.10, due to be released April 22nd with code freeze around the 14th.
I'm hopeful this IssueMirror issue will be fixed before then.
As an aside, is there a reason why the attachment URLs aren't included in the
Takeout dump directly? That way the specific way of counting wouldn't have
mattered anyway, as long as the right JSON attachment object had the right URL.
Original comment by [email protected]
on 2 Apr 2015 at 8:03
from support-tools.
As of right now all attachments on Google Code should exist on Google Cloud
Storage with the right comment ID. (This impacted < 5% of all issues mirrored.)
If you see any problems please let me know.
I'll update the IssueMirror FAQ to clarify this, but the issue mirror now
exports issues to match the exact comment number that you see on the site. So
the attachment at comment #X should be on Google Cloud Storage with comment-X.
If any files are attached to an issue when it is initially reported, that is
considered "comment #0".
This also applies to Google Takeout JSON dumps. The "id" property of each
comment should correspond to the Google Cloud Storage bucket number.
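Under this scheme, a link can be generated directly from the comment number. The Python sketch below follows the URL pattern visible in the tint2 links earlier in this thread; the function name is hypothetical, and percent-encoding the file name is an assumption about how the bucket stores names with special characters.

```python
from urllib.parse import quote

# Bucket URL as it appears in the tint2 links in this thread.
MIRROR_BASE = "https://storage.googleapis.com/google-code-attachments"


def attachment_url(project, issue_id, comment_id, file_name):
    """Build a mirror URL for an attachment.

    comment_id is the number shown on the site (0 for files attached
    when the issue was first reported).
    """
    # Percent-encoding is an assumption; plain names pass through unchanged.
    return (
        f"{MIRROR_BASE}/{quote(project)}/issue-{issue_id}"
        f"/comment-{comment_id}/{quote(file_name)}"
    )


print(attachment_url("tint2", 471, 35, "fedora-2015-03-15T10-43-31-220064000Z.webm"))
```

The same function works with the "id" property from a Takeout JSON dump, since that value is described above as matching the comment number used in the bucket.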
Re: "is there a reason why the attachment URLs aren't included in the Takeout
dump directly?"
The issue mirror didn't exist at the time we wrote the Google Takeout support.
As for why not to add direct download links now, it's a known issue. I just
haven't been able to get to it yet.
Re: "We're planning to include Google Code import in GitLab 7.10, due to be
released April 22nd with code freeze around the 14th."
Sounds great. Please contact me ([email protected]) if there is anything I
can do for you. If you send me a link or doc with how the system works, I'd be
happy to mention it in the project's wiki.
Original comment by [email protected]
on 2 Apr 2015 at 9:50
- Changed state: Fixed
from support-tools.
Thanks Chris, I'll let you know if we need anything else.
Original comment by [email protected]
on 3 Apr 2015 at 9:21
from support-tools.