microsoft / service-fabric-appdrtool

The Service Fabric Application Disaster Recovery Tool (SFAppDRTool) is a disaster recovery tool for Service Fabric clusters that lets users recover their primary cluster's data in the event of a disaster. It keeps a copy of the primary cluster's data on a secondary cluster by periodically restoring the primary cluster's backups there.

License: MIT License

C# 66.80% PowerShell 3.95% HTML 16.38% CSS 0.55% JavaScript 12.32%

service-fabric-appdrtool's Introduction

Service Fabric Application Disaster Recovery Tool

The Service Fabric Application Disaster Recovery Tool is a disaster recovery tool for Service Fabric applications that lets users recover data from their primary cluster in the event of a disaster. It uses the periodic backup and restore feature to back up application data on the primary cluster and periodically restore it on a secondary cluster.

Getting Started

Ensure that you have set up your Service Fabric clusters and that the backup restore service is enabled on both the primary and secondary clusters. Ensure that an appropriate backup policy is applied to the desired application on your primary cluster so that it satisfies the recovery point objective (RPO) of your disaster recovery requirements.
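
As a rough way to reason about whether a backup policy satisfies a given RPO, the sketch below adds the backup interval to the tool's restore period (5 minutes by default, see Configuration). This is a simplified model, not something the tool computes: it assumes a fixed-interval backup schedule and ignores backup upload and restore duration. The function names are illustrative only.

```python
from datetime import timedelta


def worst_case_staleness(backup_interval: timedelta,
                         restore_period: timedelta) -> timedelta:
    """Rough upper bound on how far the secondary can lag the primary.

    A write made just after one backup waits up to `backup_interval` for the
    next backup, and that backup waits up to `restore_period` before the tool
    notices it. Transfer and restore time are ignored (assumption).
    """
    return backup_interval + restore_period


def meets_rpo(backup_interval: timedelta,
              restore_period: timedelta,
              rpo: timedelta) -> bool:
    """True if the simplified worst-case lag stays within the RPO."""
    return worst_case_staleness(backup_interval, restore_period) <= rpo


if __name__ == "__main__":
    interval = timedelta(minutes=10)   # hypothetical backup policy interval
    period = timedelta(minutes=5)      # default restore period from this README
    print(worst_case_staleness(interval, period))
    print(meets_rpo(interval, period, rpo=timedelta(minutes=15)))
```

Under this model, tightening either the backup interval or the restore period reduces the worst-case lag by the same amount.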

Deploy Service Fabric Application Disaster Recovery Tool

You need to deploy the Service Fabric Application Disaster Recovery Tool on a Service Fabric cluster. The application can be deployed on any Service Fabric cluster; it does not have to be the primary or secondary cluster. To deploy the application, you first need to generate an application package. The following steps describe how to generate and deploy it:

  1. Clone this repo.
  2. Update the configuration as described in the Configuration section.
  3. Build it using Visual Studio.
  4. Deploy the generated application package to the target Service Fabric cluster.
  5. Ensure that port 8080 is open on the corresponding load balancer and mapped to port 8080 on the backend pool.

After the application deployment completes successfully, open a web browser and navigate to https://<cluster url>:8080, where you can find the application landing page.
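
If the landing page does not load, it can help to first check that port 8080 is reachable through the load balancer. The following is a minimal sketch using only the Python standard library; the host name is a placeholder, and the check confirms TCP reachability only, not that the certificate or the page itself is healthy.

```python
import socket


def landing_page_url(cluster_host: str, port: int = 8080) -> str:
    """Build the landing page URL described above for a given cluster host."""
    return f"https://{cluster_host}:{port}"


def port_is_reachable(host: str, port: int = 8080, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    host = "mycluster.example.com"  # placeholder, replace with your cluster URL
    print(landing_page_url(host))
    print("reachable" if port_is_reachable(host) else "not reachable")
```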

Please see the USAGEGUIDE for instructions on using the tool.

Configuration

Backup policy credentials are encrypted using an X.509 certificate. Specify the thumbprint of that certificate in PolicyStorageCertThumbprint in ApplicationParameters/Cloud.xml for cloud deployments, and similarly in Local1Node.xml / Local5Node.xml for local deployments. Also specify the thumbprint in the 'HttpsCert' EndpointCertificate under the Certificates tag in ApplicationManifest.xml, to be used for the HTTPS connection.
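
The parameter files can be edited by hand, but a small script avoids typos in the thumbprint. The sketch below assumes the files use the standard Service Fabric application parameters layout (Parameter elements with Name and Value attributes in the fabric XML namespace); check this against the files in the repo before relying on it. The helper names and the sample application name are illustrative, not part of the tool.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by Service Fabric manifest and parameter files.
FABRIC_NS = "http://schemas.microsoft.com/2011/01/fabric"


def normalize_thumbprint(raw: str) -> str:
    """Strip whitespace and the invisible U+200E mark, then validate.

    Thumbprints copied from the Windows certificate dialog often carry
    spaces or a hidden left-to-right mark. A SHA-1 thumbprint is 40 hex digits.
    """
    cleaned = re.sub(r"[\s\u200e]", "", raw).upper()
    if not re.fullmatch(r"[0-9A-F]{40}", cleaned):
        raise ValueError("expected a 40-character hexadecimal thumbprint")
    return cleaned


def set_parameter(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with the named Parameter's Value attribute replaced."""
    ET.register_namespace("", FABRIC_NS)  # keep the default namespace on output
    root = ET.fromstring(xml_text)
    for param in root.iter(f"{{{FABRIC_NS}}}Parameter"):
        if param.get("Name") == name:
            param.set("Value", value)
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"parameter {name!r} not found")


if __name__ == "__main__":
    # Hypothetical file content; the real Cloud.xml has more parameters.
    sample = (
        '<Application xmlns="http://schemas.microsoft.com/2011/01/fabric" '
        'Name="fabric:/Example">'
        '<Parameters>'
        '<Parameter Name="PolicyStorageCertThumbprint" Value="" />'
        '</Parameters>'
        '</Application>'
    )
    thumbprint = normalize_thumbprint("ab cd " * 10)  # dummy value
    print(set_parameter(sample, "PolicyStorageCertThumbprint", thumbprint))
```

The same helper can be pointed at Local1Node.xml or Local5Node.xml, since the text above says they take the same parameter.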

If the encryption certificate needs to be rolled over, ensure that you roll out the new certificate while the previous certificate is still installed on the machine.

Restore of data to the application on the secondary cluster is attempted periodically, every 5 minutes. Each attempt checks whether a new backup is available from the primary cluster and, if so, restores it on the secondary cluster. The period can be changed through RestoreDataFrequencyPeriod in ApplicationParameters/Cloud.xml for cloud deployments, and similarly in Local1Node.xml / Local5Node.xml for local deployments.
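
The check-then-restore cycle described above can be pictured with the following sketch. It is written in Python for illustration and is not the tool's implementation (the tool is written in C#). The callables `latest_backup_id` and `restore` are hypothetical stand-ins for "ask the primary cluster for its newest backup" and "restore that backup on the secondary cluster".

```python
import time
from typing import Callable, Optional


def restore_tick(latest_backup_id: Callable[[], Optional[str]],
                 last_restored_id: Optional[str],
                 restore: Callable[[str], None]) -> Optional[str]:
    """One pass: restore only when a backup not yet restored is available."""
    newest = latest_backup_id()
    if newest is None or newest == last_restored_id:
        return last_restored_id  # nothing new, leave the secondary as-is
    restore(newest)
    return newest


def run_restore_loop(latest_backup_id: Callable[[], Optional[str]],
                     restore: Callable[[str], None],
                     period_seconds: float = 300.0,  # 5 minutes, the documented default
                     max_ticks: Optional[int] = None,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Run restore_tick every period_seconds; max_ticks bounds the loop for testing."""
    last: Optional[str] = None
    ticks = 0
    while max_ticks is None or ticks < max_ticks:
        last = restore_tick(latest_backup_id, last, restore)
        ticks += 1
        if max_ticks is None or ticks < max_ticks:
            sleep(period_seconds)
    return last


if __name__ == "__main__":
    backups = iter(["b1", "b1", "b2"])  # simulated primary-side backup ids
    restored = []
    run_restore_loop(lambda: next(backups), restored.append,
                     period_seconds=0, max_ticks=3)
    print(restored)  # the repeated "b1" is skipped
```

A shorter period narrows the gap between a backup landing and its restore, at the cost of more frequent checks against the primary cluster.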

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

service-fabric-appdrtool's People

Contributors

hrushib, microsoft-github-policy-service[bot], microsoftopensource, msftgits, vshan

service-fabric-appdrtool's Issues

Web Interface Configuration Improvements

This is a very useful tool, but the Web Interface is quite difficult to use and almost scared me off the first time.

  1. Configuration errors result in a generic error message, while the actual error is logged on the server hosting the primary partition. It should be quite easy to surface this back to the UI, and doing so would help tremendously in getting the configuration right.
  2. After a configuration error, hitting the back button does not preserve the input. I have to type in everything all over again.
  3. Screens jump a lot, giving the app an overall unpolished look.

I'll allocate some time to help fix this, but I want to get this issue out in the open in case someone else gets here first :)

Restore Service crashes and recovers often

I see the following logs in my production cluster. They appear to be harmless, as the service recovers, but are concerning nonetheless.

Application: RestoreService.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.NullReferenceException
   at RestoreService.RestoreService+<OnTimerTick>d__7.MoveNext()

Exception Info: System.AggregateException
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean)
   at System.Threading.Tasks.Task.Wait(Int32, System.Threading.CancellationToken)
   at System.Threading.Tasks.Task.Wait()
   at RestoreService.RestoreService.TimerTickCallback(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()

and

Application: RestoreService.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Fabric.FabricNotPrimaryException
   at System.Fabric.Store.TStore`5[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].ThrowIfNotWritable(Int64)
   at System.Fabric.Store.TStore`5+<AddOrUpdateAsync>d__236[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at Microsoft.ServiceFabric.Data.Collections.DistributedDictionary`2+<AddOrUpdateAsync>d__97[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089],[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
   at System.Runtime.CompilerServices.TaskAwaiter.GetResult()
   at RestoreService.RestoreService+<OnTimerTick>d__7.MoveNext()

Exception Info: System.AggregateException
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean)
   at System.Threading.Tasks.Task.Wait(Int32, System.Threading.CancellationToken)
   at System.Threading.Tasks.Task.Wait()
   at RestoreService.RestoreService.TimerTickCallback(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()
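
Both traces show an exception escaping the timer callback (TimerTickCallback waits on OnTimerTick) and terminating the process. One common way to keep a periodic job alive is to catch and log failures inside the callback so the next tick can retry. The sketch below illustrates that pattern in Python; it is not the tool's code or a confirmed fix, and `NotPrimaryError` is a hypothetical stand-in for the FabricNotPrimaryException seen above.

```python
import logging
from typing import Callable

log = logging.getLogger("restore")


class NotPrimaryError(Exception):
    """Stand-in for a 'this replica is no longer primary' failure."""


def guarded_tick(tick: Callable[[], None]) -> bool:
    """Run one tick and report success; never let an exception escape."""
    try:
        tick()
        return True
    except NotPrimaryError:
        # Expected during failover: another replica will take over the work.
        log.info("replica is not primary; skipping this tick")
    except Exception:
        # Unexpected failure: record the stack trace and retry on the next tick.
        log.exception("restore tick failed; will retry on next tick")
    return False


if __name__ == "__main__":
    def failing_tick() -> None:
        raise NotPrimaryError()

    print(guarded_tick(lambda: None))
    print(guarded_tick(failing_tick))
```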
