Build Internally vs Buy? A Fundamental Decision in Machine Learning Operations (MLOps)

The decision whether to build your own data science or machine learning platform, or pay for an existing solution, is not an easy one. It can be a large investment and will have long-lasting ramifications for your organization’s data science capabilities and agility. 

We hope by the end of this post you will see why buying an existing platform as a service can save your organization significant time, money and resources in the long-run. If you have data science or machine learning (ML) experiments you want to run, you probably already know that time is of the essence in today’s competitive industries - the sooner you can get started, the better. 

The prospect of building your own platform is a daunting one. For computationally intensive algorithms, you’ll want to be able to distribute your program onto multiple compute resources so it can operate in parallel for quicker turn-around times and faster time to production. This means setting up and deploying to a Cloud environment or your own on-premise compute cluster. A robust platform includes several key parts:

  • Hardware resources including the latest high-speed CPUs and GPUs that you will need to spin up and spin down as needed
  • Integration with popular data science and ML libraries / tools
  • Frontend - Web UI or CLI for uploading and running experiments
  • Monitoring and instrumentation so you can see what’s happening during and after your model runs
  • Version control and reproducibility - integration with GitHub for tracking and experiment reproducibility
  • API interfaces for retrieving and interacting with models
  • Maintenance of the compute cluster including updating software to maintain a secure and stable environment

When you consider all the requirements of building a complete machine learning platform, it’s easy to see that it can be a very time-consuming endeavor. Before building your own platform, one must consider all the costs involved: headcount (engineering salaries), resources (hardware / networking), time to market (amount of time your organization will be delayed waiting for the platform to be ready), and total cost of ownership (particularly maintenance and support).

Headcount Costs

The skills required to build this software are beyond that of most data scientists - you’ll likely need a team of software engineers. Building the platform and working out all the bugs and quirks could take many months. Considering the high cost of software engineering time, your organization might be better served having the engineering team focused on more critical needs rather than reinventing the wheel - ie, developing a machine learning platform that is already available as a service. 

Depending on the completeness of the platform you wish to build, it could easily take a team of 10+ full-time software engineers three months or more to build and test. The engineers will need to be experienced in many different parts of the technology stack - Cloud platforms, frontend, databases, systems / networking, as well as tooling, documentation, and testing.

Resource Costs

Running your own computational cluster, or paying a Cloud provider for compute instances and storage, has significant resource costs associated with it. Careful monitoring is needed to ensure you efficiently use these resources and don’t waste them. Within organizations it’s quite common for data scientists or engineers to spin up compute instances for a temporary experiment and forget to stop them or clean up after completion. This results in wasted resources and unnecessary costs. 

By using a well-designed machine learning platform you’ll have intelligent cluster management so that unused resources are monitored and shutdown. Machine learning and data science loads are often bursty - you may need a lot of resources for a short period of time. The platform can provide many compute resources simultaneously while handling the cleanup afterward automatically. Furthermore, it’s often possible to save on resource costs by using spot instances. By supporting multiple Cloud platforms and instance types, you can also optimize cost by choosing machine types best suited for your workload.

Time Costs

While it may not be a concrete cost, it’s important to consider the time it would take for your organization to spin up its own machine learning platform. If it takes three months or more (quite possibly a year), during that time you’re unable to efficiently execute experiments and are falling behind competitors who are already running and learning from their data science or machine learning programs. 

Having your own platform could be more of a hindrance than a competitive advantage - due to the wasted time maintaining and fixing the platform rather than innovating on the data science or machine learning results that you need. If what matters most is getting started quickly, then buying the services of an existing platform will make more sense.

Total Cost of Ownership

When considering the costs involved in setting up a machine learning platform it’s also important to consider the total cost of ownership. After the platform is implemented, it will still require maintenance and support to keep up with changing hardware and software, and operational bugs that emerge during use. Administration, training, and new feature requests also figure into the total cost of ownership. 

Spell - A Machine Learning Platform

Spell provides a ready-to-go platform for machine learning and data science in the Cloud or bare metal. Spell is built for team collaboration and makes it easy to deploy and scale projects to compute resources. It includes monitoring and metrics, and integration with GitHub for reproducibility. 

Spell workspaces allow you to collaborate with team members on shared Jupyter notebooks. Models are stored in a single shared file system, and made easily deployed as a REST API. 

If you choose to use Spell, it could dramatically help your team’s productivity - try out Spell by requesting a demo today at www.Spell.run/demo

Ready to Get Started?

Create an account in minutes or connect with our team to learn how Spell can accelerate your business.