dotnet / datalab

This repo is for experimentation and exploring new ideas involving ADO.NET, EF Core, and other areas related to .NET data.

License: MIT License

C# 100.00%

datalab's Introduction

.NET Data Lab

This repo is archived. See Woodstar experiment summary for more information.

Current projects

SqlServer.Core (Project Woodstar)

Microsoft.Data.SqlClient is a fully-featured ADO.NET database provider for SQL Server. It supports a broad range of SQL Server features on both .NET Core and .NET Framework. However, it is also a large and old codebase with many complex interactions between its behaviors. This makes it difficult to investigate the potential gains that could be made using newer .NET Core features. Therefore, we are starting this experiment in collaboration with the community to determine what potential there is for a highly performant SQL Server driver for .NET.

Important! Investment in Microsoft.Data.SqlClient is not changing. It will continue to be the recommended way to connect to SQL Server and SQL Azure, both with and without EF Core. It will continue to support new SQL Server features as they are introduced.

License

This project is licensed under the MIT license.

.NET Foundation

This project is a part of the .NET Foundation.

Other .NET data projects on GitHub

If you're interested in making .NET data better, then consider contributing to one of the many open-source repos hosted on GitHub.

Microsoft repos

Community repos

Feel free to send a pull request to add your .NET data related GitHub repo to this list.

datalab's People

Contributors

ajcvickers, dennisseders, giorgi, j0rgeserran0, phongnguyend, roji

datalab's Issues

Hack simple query scenario to get upper bound for TDS/SQL Server

The idea here is to take the simple scenario implemented in #11 and hack together a raw implementation using low-level .NET constructs to hard-code some TDS. This gives us an upper bound on how much perf is potentially on the table, at least in the very simple case.
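
As a rough illustration of what "hard-coding some TDS" could involve, the sketch below writes the 8-byte TDS packet header into a span. The class and member names are hypothetical and are not taken from the repo; the message payload is deliberately left out, so this is only the framing part of a request, under the assumption that the packet is a single-packet SQL batch.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper, not part of the datalab repo.
public static class TdsPacketHeader
{
    public const int Size = 8;                    // every TDS packet starts with an 8-byte header
    public const byte TypeSqlBatch = 0x01;        // packet type: SQL batch
    public const byte StatusEndOfMessage = 0x01;  // status bit: last packet of the message

    // Writes the header into the first 8 bytes of 'destination'.
    // 'totalLength' is the size of the whole packet, header included.
    public static void Write(Span<byte> destination, byte type, byte status,
                             ushort totalLength, byte packetId)
    {
        if (destination.Length < Size)
            throw new ArgumentException("Need at least 8 bytes.", nameof(destination));

        destination[0] = type;
        destination[1] = status;
        // Length and SPID are big-endian, unlike most values inside the payload.
        BinaryPrimitives.WriteUInt16BigEndian(destination.Slice(2, 2), totalLength);
        BinaryPrimitives.WriteUInt16BigEndian(destination.Slice(4, 2), 0); // SPID: 0 when the client does not supply one
        destination[6] = packetId;
        destination[7] = 0; // window: unused, set to 0
    }
}
```

Writing into a caller-supplied span keeps the helper allocation-free, which matters if the goal is to measure an upper bound rather than the cost of the harness itself.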

Folder Structure

What should the folder structure look like for some rough starting-point code? I am pretty busy but am hoping to carve out a little time to write basic "connect, do a query, get a result" code, and I would like to avoid having to rename and refolder everything after I have made it.

I am proposing a folder under the "datalab" folder:

datalab/SqlServer.Core

and then a solution file in that folder with the same name, and then two subfolders under that:

/src
/test

For now that is all I would really need, with the "projects" mostly under src.

Guidance on an Apache Arrow layer

Hi

I'm working off a comment by yzorg regarding integration of Apache Arrow into this project, and the answer was that this would be implemented at a higher layer than driver level.

I was after some pointers, really. My aim here is to intercept the ODBC driver's storage mechanism and store the data in "feather" format instead (a SIMD memory-optimized file format).

Can you recommend:

  • materials to skill up on the current ODBC driver
  • a good place to try to insert what I'm doing in the current ODBC driver
  • other example projects that do a similar thing?

Thanks!

Woodstar experiment summary

First, thank you for being so patient waiting for news here; we probably should have provided more timely updates on the project state. This was partially a result of us simply not knowing what was going to happen with Woodstar and SqlClient (e.g. the recently-started SqlClientX - see below), and partially a result of us simply being very overloaded with other things.

Woodstar's original idea was an exploratory, greenfield SQL Server (TDS) driver; the goal was to use modern, high-performance .NET techniques, liberated from SqlClient's technical debt, and to see where that would lead in terms of performance. Specifically, we were interested in seeing what kind of performance gains we would see on the TechEmpower Fortunes benchmark, compared to SqlClient. There was no clear future for Woodstar as an actually supported product that's usable in production - it was purely a technical experiment.

The main work actually done was initial experimentation/prototyping by @NinoFloris (a core contributor on Npgsql) and myself; we built a minimal TDS client that could support TechEmpower Fortunes, and nothing more; for example, parameters were not yet supported, nor were many other features. The experiment was async-only, did not implement ADO.NET, and used System.IO.Pipelines for I/O. The prototype source code is available on this repo, as-is: it really is just an exploratory prototype, nothing more.

For the very simple TechEmpower Fortunes scenario, the prototype did not provide meaningful performance improvements over SqlClient. This does not mean that SqlClient has no performance issues: it certainly does (see this discussion) - just not in the very narrow TechEmpower Fortunes usage scenario. Our exploration did yield some valuable conclusions; two important ones are the following:

  • We gained some interesting insights around TDS and its processing that impact both the client and the server side, and so we engaged internally with the SQL Server org. This has been a positive engagement and various things are happening behind the scenes.
  • System.IO.Pipelines (with SequenceReader) works great when parsing relatively large payloads; but in a client-side database driver scenario, the user repeatedly calls in to parse very small values in the resultset (e.g. an int). In that kind of usage, reinstantiating a SequenceReader (ref struct) each time is too much overhead. Similarly, continuously slicing ReadOnlySequence for each tiny value was costly, so there was no good way for us to cheaply store the current position.
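
To make the second point more concrete, here is a minimal sketch of the access pattern it describes, next to a plain span-and-offset read for comparison. All names are hypothetical and not taken from the prototype; the sketch assumes a little-endian 4-byte int, and the span-based variant is only an illustrative contrast that assumes the data is already contiguous, not something the experiment is stated to have adopted.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

// Hypothetical helpers, not part of the Woodstar prototype.
public static class SmallValueReads
{
    // SequenceReader<T> is a ref struct, so it cannot be kept in a field between
    // user calls; each call has to rebuild it and then slice the sequence to
    // remember how far we got.
    public static int ReadInt32ViaSequenceReader(ref ReadOnlySequence<byte> remaining)
    {
        var reader = new SequenceReader<byte>(remaining);   // re-created per value
        if (!reader.TryReadLittleEndian(out int value))
            throw new InvalidOperationException("Not enough data buffered.");

        remaining = remaining.Slice(reader.Position);       // slice per value
        return value;
    }

    // Contrast: with a contiguous buffer, the current position is just an int.
    public static int ReadInt32ViaSpan(ReadOnlySpan<byte> buffer, ref int offset)
    {
        int value = BinaryPrimitives.ReadInt32LittleEndian(buffer.Slice(offset, sizeof(int)));
        offset += sizeof(int);
        return value;
    }
}
```

Both methods return the same values for the same bytes; the difference is how much work is needed per call just to track the read position, which is the cost that dominates when the values are as small as a single int.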

Further work on Woodstar did not continue, simply because other, more important things were prioritized over it. However, the lessons learned from the experiment were quite valuable; they have been shared with relevant parties internally and feature in discussions with the SqlClient team.

On the SqlClient side, the SqlClientX effort has recently begun - this is a project to reimplement the I/O layer and pooling implementation inside SqlClient, allowing users to opt into the new experimental implementation, with the intent of eventually switching to it as the default. In a way, SqlClientX is the spiritual successor to Woodstar; although SqlClientX is not a greenfield driver, since it's being done within SqlClient and must respect backwards compat, the goals of the two projects are the same - to arrive at a modern, efficient SQL Server driver without all the technical debt, one that is able to evolve safely and quickly. The future of SqlClientX is also much clearer, since it is owned and maintained by the SqlClient team itself, whereas Woodstar was purely an experiment with no clear path to becoming a supported product at any point.

For now, we will be archiving this GitHub repo, as work in this area is not happening here.

Use pipelines?

There has been some discussion about whether use of data pipelines is the way to go for highly async, highly performant binary communication such as Tabular Data Stream (TDS) to and from SQL Server.

@davidfowl Thoughts? (I believe you discussed this with @roji already.)

@JamesNK We were wondering what gRPC uses?

Another Tds client

4 years ago I did an almost complete rewrite of SqlClient for fun and to do some performance tests. Bulk insert was fast, but for small queries the improvement was small. Most of the time was spent in SQL Server and in communication.
Async was implemented incorrectly, and probably much more besides, but a lot could be simplified.

The repo can be found here.

The old repo was based on a data reader and writer. I skipped that part and did the reading and writing directly to a POCO. I think this should also be possible with EF Core. There is some mapping between SQL fields and POCO properties, and there are some value converters.

Directly converting JSON strings from the input stream will skip the copy part.

Progress?

It was mentioned on an EF livestream that progress was being made, but the only commit is from 2021. Is there code being worked on? If so, can we see it in case we can help out?
