Is it possible to put the entire BIN list into GitHub along with the user contributions? [closed]

binlist commented on June 12, 2024

Is it possible to put the entire BIN list into GitHub along with the user contributions?

from data.

Comments (19)

joscarsson commented on June 12, 2024


joejuzl commented on June 12, 2024


packrat386 commented on June 12, 2024


asturur commented on June 12, 2024

+1
I have been checking the binlist.net website for a long time.
No ads, and no problems running the web service.

I still need to put the data in a file for offline checking, so I dumped it 10,000 rows at a time, but now I struggle with updates (filling holes is OK, updating wrong records is less easy).

I'm actually asking myself what the purpose is of reading online, consuming bandwidth and your CPU power, if the data is public and free for all.

Thanks for the project anyway; I have been searching for something like this for a long time.


abitrolly commented on June 12, 2024

what the purpose is of reading online, consuming bandwidth and your CPU power, if the data is public and free for all

Tracking and analysis. If you put it on GitHub, then only GitHub can do this.


asturur commented on June 12, 2024

For sure, but I'm reading it offline and polluting the statistics with batch sequential reads that someone has to discard later.


keithjjones commented on June 12, 2024

What types of "tracking and analysis" are occurring? I think it is perfectly reasonable to place your data online when you depend on open source data as input. Also, there are many scenarios where you cannot be online in order to run a query, or where the latency of a web connection is intolerable. An offline lookup, like the MaxMind database, makes sense.


keithjjones commented on June 12, 2024

@asturur if you put your results on GitHub, I'm sure that will help others too.


asturur commented on June 12, 2024

With the maintainers' permission, I can post a MySQL dump of what I dumped almost a year ago.


keithjjones commented on June 12, 2024

Of course. I think based upon the project's license (http://www.apache.org/licenses/LICENSE-2.0) you have some restrictions (such as denoting where you got the data) when you choose to post the data. It's obvious we are not the only ones wanting offline access to the data based upon the +1's on this thread. If the data is open source, that would make the most sense so we could write more open source tools to access the data.


abitrolly commented on June 12, 2024

What types of "tracking and analysis" are occurring?

I don't know what exactly is measured, but with an online service you can measure who is using the data and from where: age, sex, language, location, which numbers/banks are requested most, bots and services, etc.


tjconcept commented on June 12, 2024

Unfortunately the data is not open source thus I cannot dump it here. This service is the best way of opening up this data on a free basis.

The Apache license covers all user contributions, but not the rest of the data. You can do lookups and present the data (credit appreciated - and do tell us about your use case with an issue, so we can get a "seen at" section up and running!), but please don't scrape or redistribute as it remains copyrighted.

I think it is perfectly reasonable to place your data online when you depend on open source data as input

That would allow people to look up numbers without stressing the website or needing an internet connection.

I'm actually asking myself what the purpose is of reading online, consuming bandwidth and your CPU power, if the data is public and free for all.

The database currently has > 20 million entries, is several GBs large and is regularly updated.

The current web service is only exposing a partial view but a newer replacement will soon expand on that. Keeping an offline copy up-to-date and performant could quickly eliminate the benefit of "offline" and be a huge burden.

I honestly believe a web service is the better approach.

What types of "tracking and analysis" are occurring?

Basic information: request headers, IP and query parameters. The data is used to give a sense of usage patterns and prevent abuse.


keithjjones commented on June 12, 2024

I think you prematurely closed this issue without discussion. Where do you get the data that you say has a closed license? If you have a restrictive license on the data, shouldn't you state it, and where the data comes from? That is required by the Apache license. The moment you take Apache licensed user data and add it to your data you apply the Apache license standards to your derivative work as well, according to the Apache license. Therefore, all of your data must adhere to the Apache license standards.

Why can't you use a license like the one MaxMind uses for their IP database? Any tool written will have your information in it like theirs does. It's a win-win for all parties. That makes much more sense than making someone go to your website to get the information; since it is served as JSON, users never see any other content, such as ads, when the data is pulled. I don't see any benefit for you, while it severely restricts users.

How is it possible to have >20 million entries for 6 possible numbers? That is only a million numbers. What other information do you have to make up 19 million records?

You may think a web service is a better approach, but you cannot look up information when you are not on the internet or do not have a high-speed connection. Both scenarios apply to my use cases 90% of the time. Your use cases may be different than the rest of us.

For example, a number of credit card searching tools, which I contribute to, hit the same BIN over and over and look it up to see whether it could be valid, in order to alert the user to a possible credit card leak. It is unrealistic to keep a list of all the lookups when you are searching 500GB of data, or across sets of 500GB of data in parallel. On one computer, I have had 5 million lookups. That does not scale for a web-based service. You become an exponential bottleneck for that application at that point. In that scenario it would query your web service many, many times, unnecessarily, for the same information, because the information cannot be saved globally. And in that scenario you are not feeding the user ads, so you do not personally benefit from their access.

Performance would be much better looking it up from an offline file. You could have thousands of lookups per second offline, while the web service may have, what, 1 lookup per 5 seconds? Plus, there are no throttling issues like there are with the website. Therefore, your assessment of "eliminate the benefit of "offline" and be a huge burden" is not correct in all scenarios. It is one simple database dump for you, which would literally take a few minutes per month or so and could be entirely scripted. I'll even help you write the script. For others, it is one simple download from GitHub or some other source and an upload into a database. That is not a burden at all.

You can implement many other types of "tracking and analysis", if you really need that information for some legitimate purpose, using many technologies other than a web server. I am happy to discuss other mechanisms with you.
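As an illustration of the repeated-lookup scenario described above, here is a minimal Python sketch of an in-process cache placed in front of an expensive lookup. Everything in it is an assumption made for the example: `LOCAL_TABLE`, `slow_backend_lookup` and `lookup_bin` are hypothetical names, the records are made-up placeholders, and nothing here reflects binlist's actual API or data.

```python
from functools import lru_cache

# Hypothetical placeholder records; no real BIN data is reproduced here.
LOCAL_TABLE = {
    "123456": {"scheme": "example-scheme-a"},
    "654321": {"scheme": "example-scheme-b"},
}

backend_calls = 0  # counts how often the expensive backend is actually hit


def slow_backend_lookup(bin_prefix):
    """Stand-in for an expensive lookup (a web request or a disk read)."""
    global backend_calls
    backend_calls += 1
    return LOCAL_TABLE.get(bin_prefix)


@lru_cache(maxsize=100_000)
def lookup_bin(bin_prefix):
    """Return the record for a BIN, hitting the backend once per distinct BIN."""
    return slow_backend_lookup(bin_prefix)


def scan(card_numbers):
    """Look up the first six digits of each candidate number."""
    return [lookup_bin(number[:6]) for number in card_numbers]


# Two of the three candidates share a BIN, so the backend is hit only twice.
results = scan(["1234560000000000", "1234569999999999", "6543210000000000"])
```

A cache like this lives inside one process, so it does not help when several machines or processes scan in parallel, which appears to be the limitation meant by "cannot be saved globally".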

I'm happy to give you a "seen at" section for the tools I have written if I can get the data in some more efficient form. Right now, it is unusable for 99% of my use cases. It is almost there, but the web site requirement makes it useless for my tools. If I added your information to my tools and provided them to the public, it would literally cripple your whole website. I don't want to cripple your website.

Please respond here so I do not have to open a new issue for this discussion.


asturur commented on June 12, 2024

I do not agree with @keithjjones.
The data is here, user contributions are available. If he has an agreement for which he cannot dump the data, that is it.

Anyway, some BINs are 10 digits long (China UnionPay, for example), so there may be more than 1 million records. And 99% of cards are in the 3-6 digit range, so way fewer than one million.
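Because BIN lengths vary, an offline lookup cannot simply take the first six digits. A minimal Python sketch of one possible approach, longest-prefix matching against a local table, is shown here. The function name `longest_prefix_match`, the `max_len` default of 11 and the sample table are all assumptions for illustration, and the records are made-up placeholders.

```python
def longest_prefix_match(card_number, table, max_len=11):
    """Return (prefix, record) for the longest prefix found in table, else None."""
    # Try the longest candidate prefix first, then shorten one digit at a time.
    for length in range(min(max_len, len(card_number)), 0, -1):
        prefix = card_number[:length]
        if prefix in table:
            return prefix, table[prefix]
    return None


# Hypothetical placeholder records keyed by prefixes of different lengths.
SAMPLE_TABLE = {
    "62": {"note": "short example prefix"},
    "6212345678": {"note": "ten-digit example prefix"},
}

long_hit = longest_prefix_match("6212345678901234", SAMPLE_TABLE)
short_hit = longest_prefix_match("6299999999999999", SAMPLE_TABLE)
no_hit = longest_prefix_match("1234567890123456", SAMPLE_TABLE)
```

Keys are kept as strings rather than integers so that prefixes with leading zeros stay distinct.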

To my memory this is the only free, big, bin list i can remember. It is available for free i would not mind about a full dump you can still do by yourself.


keithjjones commented on June 12, 2024

@asturur your response did not make sense. What is your point? Specifically:

What does this mean: "If he has an agreement for which he cannot dump the data, that is it."?

What does this mean: "It is available for free i would not mind about a full dump you can still do by yourself."?


asturur commented on June 12, 2024

I mean that I do not think he has to discuss what kind of source he uses, or what kind of license forbids him from publishing an offline database for users.

Every official BIN table I have found in my experience was coupled with a Visa or Mastercard notice that forbids redistributing it.

If they made this web service available, the data is effectively available. If you need to consult it offline, just dump it record by record as I did.


keithjjones commented on June 12, 2024

@asturur you may not think he has to, but the very license he chose says he has to. By combining user data licensed with the Apache license, his whole data set is subject to the Apache license as a derivative work. Please consult the Apache license:

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.
...
Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.
...
You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

The license says that if a separate license is not stated, then everything is the Apache license by default. If a license is stated, it has to abide by the Apache license.


tjconcept commented on June 12, 2024

Your use cases may be different than the rest of us.

That could be true. This service deliberately has a narrow focus to stay simple, and that is where our energy is concentrated.

I'm sorry we do not fit your use case. I know there are sites where you can buy datasets and it sounds like those would be the perfect tool for you.

I think you prematurely closed this issue without discussion.

I'm sorry you feel that way. We cannot post the data and the only alternative is no service.

How is it possible to have >20 million entries for 6 possible numbers? That is only a million numbers. What other information do you have to make up 19 million records?

We currently have ~26 million rows, with IINs anywhere from 1 to 11 digits long. Since 01 is not the same as 1, that gives 10^11 + 10^10 + ... possible combinations, with some appearing multiple times due to multiple sources.

That is why I still believe a web service is superior in most use cases, but that really doesn't matter to the topic of the issue.
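As a rough check on the combination count mentioned above, this small Python sketch sums the number of digit strings of each length from 1 to 11, treating leading zeros as significant. The constant name `MAX_IIN_LENGTH` is chosen here for illustration; the 11-digit bound is the one stated in this comment.

```python
MAX_IIN_LENGTH = 11  # upper bound on IIN length stated in the comment above

# A digit string of a given length has 10 ** length possible values,
# because a leading zero makes "01" distinct from "1".
counts = {length: 10 ** length for length in range(1, MAX_IIN_LENGTH + 1)}
total = sum(counts.values())

print(f"6-digit strings: {counts[6]:,}")
print(f"all lengths 1 to {MAX_IIN_LENGTH}: {total:,}")
```

Six-digit prefixes alone give exactly one million values, which matches the figure in the question, while all lengths together give 111,111,111,110 possible strings.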

By combining user data licensed with the Apache license, his whole data set is subject to the Apache license as a derivative work. Please consult the Apache license

The web service is merely a switch between data sources. It searches this repository (which is open source) as well as other sources.

It simply seemed like a nice addition to add a user-source to the search engine, and of course we don't want ownership over that information.

Have a great weekend everyone!


keithjjones commented on June 12, 2024

@tjconcept so if I understand you correctly, you get your data from somewhere. You aren't willing to tell us where. And there is some kind of restriction that won't allow you to provide the data outside of your control. However, we can use the data all we want through the web interface without restriction. Is that correct?

And you also claim none of the user data is mingled with any other data, correct? Do you have a separate license for your "other sources"?

Web services are only good if a) you have unlimited bandwidth, or b) the data you query is a small subset of the large data set. Neither is the case for most realistic uses of this data.

Commercial BIN lists are not affordable for free and open source tools.

It's too bad. I would have liked to include your data in the tools I wrote with proper attribution like Maxmind does. When and if I release any of my tools publicly it would overwhelm your site and the lookup performance would be so bad no one would use the tools. No one wins.
