Comments (14)

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
And I am not even sure that I have really "fixed" some of the bad links, as I 
sometimes see colliding file names in different comments like this:

User comment 1: Something doesn't work, attaches config.txt
User comment 2: Workaround doesn't work either, attaches config.txt

Original comment by [email protected] on 18 Mar 2015 at 12:57

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
Thank you for the bug report. I will take a look.

Original comment by [email protected] on 18 Mar 2015 at 3:21

  • Changed state: Accepted

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
Any update? We're looking to add an "Import from Google Code" feature to GitLab 
and want to import attachments as well, but this issue is standing in our way.

Thank you.

Original comment by [email protected] on 30 Mar 2015 at 9:16

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
We have added a wiki page detailing how our issue attachment mirroring works. 
You can get issues without the additional token if you go straight to Google 
Cloud Storage.

See: https://code.google.com/p/support-tools/wiki/IssueMirror

However, our issue mirroring does have some known limitations that we won't 
fix. We only mirror _public_ issues, so attachments for private 
(Restrict-View-*) issues will not be available. Similarly, we don't mirror 
attachments for deleted issues. And in the occasional situation where you 
upload two attachments with the same name in the same comment, we only have 
one of the files on GCS.

As for your bad-unfixable.txt file, you have uncovered something afoot in 
Google Code. For example:

Bad link: 
https://storage.googleapis.com/google-code-attachments/tint2/issue-471/comment-34/fedora-2015-03-15T10-43-31-220064000Z.webm
for issue 471, comment 35, attachment {"mimetype": "application/octet-stream", 
"attachmentId": "4710035000", "fileSize": 10148504, "fileName": 
"fedora-2015-03-15T10-43-31-220064000Z.webm"}

The attachment file is actually found in our mirror, though as comment #31:
https://storage.googleapis.com/google-code-attachments/tint2/issue-471/comment-31/fedora-2015-03-15T10-43-31-220064000Z.webm

I have no explanation for the discrepancy, other than I probably wrote the 
errant code.
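
For readers following along, the URL layout visible in the two links above can be sketched as a small helper. This is only an illustration inferred from those links: the function name `mirror_url` and the percent-encoding of the file name are assumptions, not something the mirror's documentation is quoted as saying here.

```python
from urllib.parse import quote

# Bucket base URL as it appears in the links quoted in this thread.
BUCKET_URL = "https://storage.googleapis.com/google-code-attachments"


def mirror_url(project, issue_id, comment_number, file_name):
    """Build the mirror URL for one attachment (hypothetical helper).

    Percent-encoding the file name is an assumption of this sketch; the
    thread does not say how the mirror treats special characters.
    """
    return (
        f"{BUCKET_URL}/{quote(project)}/issue-{issue_id}"
        f"/comment-{comment_number}/{quote(file_name)}"
    )


# The attachment from the example, at the comment number where it was found.
url = mirror_url("tint2", 471, 31, "fedora-2015-03-15T10-43-31-220064000Z.webm")
print(url)
```

Passing 31 reproduces the working link; which comment number to pass is exactly what the rest of this thread is about.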

Could you tell me more about how your "Import from Google Code" feature of 
GitLab works? I take it GitLab supports arbitrary file attachments, so you need 
to download the attachments at the time you do the import? If so, that's a 
great feature. But note that there is a delay between when issue attachments 
are uploaded to Google Code and when they are mirrored onto Google Cloud 
Storage.

Original comment by [email protected] on 31 Mar 2015 at 12:11

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
The known restrictions to the IssueMirror are not a problem.

As for migrating public attachments, we have two options, since GitLab does 
indeed support arbitrary file attachments:

1. Download all attachments from the IssueMirror and reupload to GitLab when a 
user requests a project import. 
2. Link directly to the attachment on the IssueMirror from the new GitLab issue.

Both options depend on the IssueMirror URLs actually working, which 
[email protected] reports they aren't currently. The first option also 
requires that we aren't rate limited or otherwise blocked by Google Cloud 
Storage for downloading large numbers of files. The second option is only 
viable if there is a guarantee that the IssueMirror will stay up indefinitely. 

Downsides to the first option would be the storage and bandwidth requirements 
on our side, and the mirror delay you mention (how large is that delay?) The 
main downside to the second option would be the ongoing dependency on Google 
Storage :)

We are ultimately fine with either option, but we need to be sure the 
IssueMirror URLs work.
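
Since both options hinge on the URLs resolving, an importer could probe each one before deciding what to do with it. The following is a minimal sketch, assuming a plain HTTP HEAD request against the public object URL is an acceptable check; `mirror_url_works` and its `opener` parameter are hypothetical names, and the `opener` hook exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def mirror_url_works(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request to `url` answers with HTTP 200.

    Any URLError (which includes HTTPError such as 404) is treated as
    "the link does not work".
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.URLError:
        return False
```

A caller could fall back to skipping or flagging the attachment when this returns `False`, whichever of the two options is chosen.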

Original comment by [email protected] on 31 Mar 2015 at 12:56

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
I think you should go with option two, if for no other reason than its 
simplicity. As for your concerns about how long the issue mirror will stay 
around, it simply is a Google Cloud Storage bucket. So the overhead is 
negligible. Read: if I get hit by a bus, the data will still be there. And 
unlike other parts of Google Code, issue attachments aren't as problematic of 
an abuse vector.

As for the delay in mirroring, at worst it will be a few days. Due to some 
technical limitations in how to bridge security from our internal data centers 
and the external Google Cloud Storage, I need to run the migration manually.

I will be looking at those `bad-unfixable.txt` attachments today, as it 
certainly is a bug somewhere. I'll update this issue when I've hunted it down...

Original comment by [email protected] on 1 Apr 2015 at 7:45

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
All right, thank you Chris. I wasn't sure if this bucket was meant to be 
permanent or if the plan was to let it go after some amount of time had passed, 
but I guess there's really no point to since the amount of data is negligible 
in the grander scheme of things.

Good luck with `bad-unfixable.txt`, I hope you figure it out.

Original comment by [email protected] on 1 Apr 2015 at 9:43

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
In case you were curious, the underlying issue has to do with deleted comments.

We were mirroring the attachment under comment #31, but displaying it on the 
site as comment #35. That's because before we render the web page we go 
through _all_ comments. (Including those that have been deleted[1].) Whereas 
the issue mirror just goes through "live" comments.

This mismatch is why the attachment was put in the wrong place. There were four 
deleted comments before the attachment was put up. So while Google Code says 
you are looking at comment #35, in actuality it is only the 31st LIVE comment.


See the following comments, and notice that the one after each of them isn't 
shown. For example, there is a comment #10 and #12, but no #11.

https://code.google.com/p/tint2/issues/detail?id=471#c10
https://code.google.com/p/tint2/issues/detail?id=471#c19
https://code.google.com/p/tint2/issues/detail?id=471#c27
https://code.google.com/p/tint2/issues/detail?id=471#c30
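
The arithmetic behind the 35 versus 31 discrepancy can be checked with a few lines. This is a sketch only: `live_position` is a hypothetical helper, and the deleted comment numbers 11, 20, 28 and 31 are inferred from the four links above (the comment following each linked one is missing), not taken from any official list.

```python
def live_position(displayed_number, deleted_numbers):
    """Return where a comment sits when only LIVE comments are counted."""
    if displayed_number in deleted_numbers:
        raise ValueError("a deleted comment has no live position")
    deleted_before = sum(1 for n in deleted_numbers if n < displayed_number)
    return displayed_number - deleted_before


# Inferred from the links above: the comment after #10, #19, #27 and #30.
deleted = {11, 20, 28, 31}
print(live_position(35, deleted))
```

With four deletions ahead of it, displayed comment #35 lands on live position 31, which matches where the attachment was found in the mirror.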

Fixing it might be a pain, but at least we know what the problem is.

[1] Deleting data in large-scale replicated datastores is actually difficult. 
So many times things get deleted by simply clearing a "LIVE" field, and 
possibly zeroing out the data; but still leaving the placeholder object in 
place. This way you don't also have to move lots of data around since you now 
have an XX byte hole inside of a YY Gigabyte file.
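
As a toy illustration of the footnote, here is what deleting by clearing a "LIVE" field might look like. This is a deliberately simplified sketch and not how Google Code stores comments; `CommentSlot` and `soft_delete` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CommentSlot:
    live: bool
    data: bytes


def soft_delete(slots, index):
    """Clear the LIVE flag and zero the payload, but keep the slot in place."""
    slot = slots[index]
    slot.live = False
    slot.data = b"\x00" * len(slot.data)


slots = [
    CommentSlot(True, b"first"),
    CommentSlot(True, b"second"),
    CommentSlot(True, b"third"),
]
soft_delete(slots, 1)

# The third record has not moved, but only two records are still live.
live_count = sum(1 for slot in slots if slot.live)
```

Nothing after the deleted slot has to move, which is the point of the footnote. It also means a reader that counts every slot and a reader that counts only live slots will number the later records differently.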

Original comment by [email protected] on 1 Apr 2015 at 10:01

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
So I'm assuming you're planning to actually fix the data in the bucket? It 
would be easiest to simply change the docs to say the ID is based on how many 
live comments come before it, but I guess that will mess up the numbering when 
a comment is deleted after an attachment is mirrored.

Original comment by [email protected] on 1 Apr 2015 at 10:13

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
Thanks for the explanation. I can confirm that with the attached script I do 
not see any bad URLs in project tint2 :D

IIUC things may still go out of sync if someone deletes a comment after you run 
the mirroring script but before the takeout is generated; and they will become 
in sync again the next time you run the mirroring script?

Original comment by [email protected] on 1 Apr 2015 at 10:49

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
Looking at how we surface the data (HTML frontend, Google Takeout JSON dump, 
and the GitHub exporter) it seems like the thing we need to fix is the issue 
mirror's counting schema.

That, unfortunately, might require some major changes because of how that 
system works.

#Summary#

Google Code comments can be deleted. In the HTML frontend, we number comments 
including these deleted comments. Similarly, in the Google Code takeout we 
include deleted comments in the data dump.

The problem is that in the Issue Mirror we ignore deleted comments. So when we 
render a link to issue X, comment Y, "Y" refers to the Yth LIVE comment, not 
the Yth comment overall.

As a workaround, you will need to filter out non-LIVE comments from your 
Google Takeout dump, as @mrovi9000 did in their script.

Similarly, [email protected], you will need to take note of the number of 
comments you see when scraping Google Code output; and not look at the comment 
numbers we display in the HTML.
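
A rough sketch of that workaround against a Takeout-style list of comments might look like the following. The shape of the dump is assumed here: each comment is taken to be a dict with an `id` matching the number shown on the site, and the `deleted` key is a hypothetical stand-in for however the dump actually marks a non-LIVE comment. This is not the script attached earlier in the thread.

```python
def live_numbers(comments, deleted_key="deleted"):
    """Map each live comment's displayed id to its number among live comments.

    Assumes ids are consecutive display numbers; `deleted_key` is a
    hypothetical marker for non-LIVE comments.
    """
    mapping = {}
    deleted_so_far = 0
    for comment in sorted(comments, key=lambda c: c["id"]):
        if comment.get(deleted_key):
            deleted_so_far += 1
        else:
            mapping[comment["id"]] = comment["id"] - deleted_so_far
    return mapping


comments = [
    {"id": 1},
    {"id": 2, "deleted": True},
    {"id": 3},
]
numbers = live_numbers(comments)
```

Here displayed comment 3 maps to live number 2, which is the number the mirror would have used at the time of this comment.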

Re: "things may still go out of sync if someone deletes a comment after you run 
the mirroring script but before the takeout is generated"

Correct. When the mirroring process runs, it will upload an attachment to 
something like .../tint2/issue-X/comment-10/... Now if a comment BEFORE the 
attachment comment gets deleted, AND you then run Google Takeout to export your 
project issues, THEN the comment numbers won't line up.

Re: "will they become in sync again the next time you run the mirroring script?"

No. Currently, the code that does the mirroring exists in a world that doesn't 
know anything about deleted comments. So as far as it knows, the comment 
numbers it generates are correct.

So it seems like getting the Issue Mirror to be aware of deleted comments will 
fix all known problems, so that you can just link to comment X (where X 
counts both LIVE and deleted comments).

I'll start looking into how we can fix this. Sorry for the inconvenience! Once 
I get the Issue Mirror attachments to include counts from deleted comments, 
you should just be able to generate links to Google Cloud Storage as you would 
expect. (Either using the comment numbers from our HTML, or directly from the 
Google Takeout JSON dump.)

Original comment by [email protected] on 2 Apr 2015 at 12:01

  • Added labels: Priority-High
  • Removed labels: Priority-Medium

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
Thanks for looking into this! We're planning to include Google Code import in 
GitLab 7.10, due to be released April 22nd with code freeze around the 14th. 
I'm hopeful this IssueMirror issue will be fixed before then.

As an aside, is there a reason why the attachment URLs aren't included in the 
Takeout dump directly? That way the specific way of counting wouldn't have 
mattered anyway, as long as the right JSON attachment object had the right URL.

Original comment by [email protected] on 2 Apr 2015 at 8:03

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
As of right now all attachments on Google Code should exist on Google Cloud 
Storage with the right comment ID. (This impacted < 5% of all issues mirrored.) 
If you see any problems please let me know.

I'll update the IssueMirror FAQ to clarify this, but the issue mirror now 
exports issues to match the exact comment number that you see on the site. So 
the attachment at comment #X should be on Google Cloud Storage with comment-X.

If any files are attached to an issue when it is initially reported, that is 
considered "comment #0".

This also applies to Google Takeout JSON dumps. The "id" property of each 
comment should correspond to the Google Cloud Storage bucket number.
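
With the numbering aligned, generating links from a Takeout dump could be as simple as the sketch below. The `id` and `fileName` properties appear in this thread; the nesting of an `attachments` list inside each comment, the function name `attachment_urls`, and the percent-encoding of file names are assumptions made for illustration.

```python
from urllib.parse import quote

BUCKET_URL = "https://storage.googleapis.com/google-code-attachments"


def attachment_urls(project, issue_id, comments):
    """Return one mirror URL per attachment, using each comment's own id."""
    urls = []
    for comment in comments:
        # "attachments" as a per-comment list is an assumed layout.
        for attachment in comment.get("attachments", []):
            urls.append(
                f"{BUCKET_URL}/{project}/issue-{issue_id}"
                f"/comment-{comment['id']}/{quote(attachment['fileName'])}"
            )
    return urls


comments = [
    # Files attached to the initial report count as comment #0.
    {"id": 0, "attachments": [{"fileName": "config.txt"}]},
    {"id": 35, "attachments": [
        {"fileName": "fedora-2015-03-15T10-43-31-220064000Z.webm"},
    ]},
]
urls = attachment_urls("tint2", 471, comments)
```

No filtering of deleted comments is needed in this version, because the comment `id` is used as-is.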

Re: "is there a reason why the attachment URLs aren't included in the Takeout 
dump directly?"

The issue mirror didn't exist at the time we wrote the Google Takeout support. 
As for why not to add direct download links now, it's a known issue. I just 
haven't been able to get to it yet.

Re: "We're planning to include Google Code import in GitLab 7.10, due to be 
released April 22nd with code freeze around the 14th."

Sounds great. Please contact me ([email protected]) if there is anything I 
can do for you. If you send me a link or doc with how the system works, I'd be 
happy to mention it in the project's wiki.

Original comment by [email protected] on 2 Apr 2015 at 9:50

  • Changed state: Fixed

from support-tools.

GoogleCodeExporter avatar GoogleCodeExporter commented on September 25, 2024
Thanks Chris, I'll let you know if we need anything else.

Original comment by [email protected] on 3 Apr 2015 at 9:21

from support-tools.
