Today's MLOps Engineers come from diverse backgrounds. Some were formerly Data Engineers (most, actually); others were Data Scientists, Cloud Engineers and even Software Engineers. What does it take to be an MLOps Engineer? If you ask this question, you will get a different answer almost every time. Why is that, you ask? It is simply because MLOps can easily lean towards any one side (cloud engineering, data engineering or software engineering), as long as what you are doing enables the smooth running of a machine learning pipeline in production. To make it even more variable, companies have different expectations of an MLOps Engineer (some frankly absurd, especially for ‘juniors’). At the end of the day, it boils down to personal strengths and interests.
What exactly do you want to major in as an MLOps Engineer? You should ask yourself that question before starting down this career path. The answer will give you a way in and enable you to branch out into other aspects of MLOps later. Ideally, there is no such thing as a “Junior MLOps Engineer”, because from day one you are expected to understand, own and manage a broad range of things. In the same way, there is no such thing as a “Full Stack MLOps Engineer”, because the role itself is full stack.
I would also like to dispel the notion that MLOps is new. I think the field should mature now and make the expectations and standards of the practice fairly straightforward. Without rambling further, let's get straight to it.
Software Engineering & Programming - One
This is the one thing most practitioners lack. It is no shock that most “Data Scientists”, “ML Engineers” and others involved in developing machine learning models come largely from the research space. Most have backgrounds in science and analytical subjects like Physics and Mathematics. They probably began their ML careers using tools like R and MATLAB; some got to use basic Python too. I myself have a Physics background and began my career as a Data Scientist for a Clinical Data Management company, managing clinical trial studies.
My point is that, whilst this group of people is great at building models, they often do not have traditional software engineering experience. Ideally, this is the gap you are brought in to fill.
Whilst it is nice to be able to build the fancy LLMs, chatbots and everything else that is all the rave at the moment, a proper understanding of software engineering concepts, and maturity in using them, is critical.
First things first: pick a language and dive deep into it. It could be Java, Python, Rust, etc. Understand the principles that guide the design of software in that language. Understand error handling, learn to write clean code, get to grips with concurrency early, and learn how to make your code faster and how to debug and test it so you catch bugs early. Do some LeetCode questions; understand recursion, iteration and their tradeoffs. This often forms the basis of a strong MLOps Engineer.
Now go further and understand the design patterns commonly used in that language: what to look out for and what to leverage.
It is largely about gaining an understanding of software in its raw form, without any of the fancy model add-ons.
Hey 👀, also understand Git and version control; GitHub is a good place to start.
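To make the recursion versus iteration tradeoff concrete, here is a small Python sketch. The function names are my own, made up for illustration. Both functions compute the same sum, but the recursive one uses one stack frame per step, so in Python it fails on large inputs where the loop does not.

```python
import sys


def sum_to_recursive(n: int) -> int:
    """Sum the integers 1..n with recursion: short, but one stack frame per step."""
    if n <= 0:
        return 0
    return n + sum_to_recursive(n - 1)


def sum_to_iterative(n: int) -> int:
    """Sum the integers 1..n with a loop: the stack depth stays constant."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


if __name__ == "__main__":
    # For small inputs the two versions agree.
    print(sum_to_recursive(100), sum_to_iterative(100))

    # A large input is fine for the loop but exceeds Python's recursion limit.
    big = sys.getrecursionlimit() * 10
    print(sum_to_iterative(big))
    try:
        sum_to_recursive(big)
    except RecursionError:
        print("the recursive version hit the recursion limit")
```

Neither style is always better. Recursion often reads more naturally for tree-shaped problems, while a loop is usually the safer choice when the depth grows with the size of the input.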
Cloud Engineering - Two
After you have gained a good command of software engineering, specialising in a language of your choice, do not be surprised that the second most important step is gaining a strong understanding of a cloud platform. Most companies now leverage the cloud for their workflows and largely host their services there. No company is going to ask you to set up and run their K8s cluster or ML model on your local machine (yes, even if it is an M3 🤧). Pick a cloud service: GCP, Azure or AWS. I would not specifically recommend any yet; they all provide transferable skills.
Now, most cloud platforms offer SDKs, and many offer a CDK or an equivalent. An SDK helps you build software that can interact with services on the cloud (like S3 using boto3 for AWS), whilst a CDK helps you orchestrate the cloud services themselves, largely using the “cloud’s language”; think of it like building with Lego bricks, where the bricks are the available cloud services. Since you already have a good understanding of programming and software engineering, it will be easy for you to understand these kits (SDKs and CDKs) and use them effectively to build services on the cloud. Sometimes there is not sufficient documentation for what you are trying to build, and you will have to rely on your own intuition. If you understand how software works, this will be easy for you.
Moreover, you should know your way around the console: how to manage services, where things are, how to access logs, how to recover files and so on. I began my cloud journey with AWS but did not do too much with it at the time. I went on to learn Azure on my own whilst working at a startup, and eventually got back to AWS at my current place of work. It was easy for me to get accustomed to AWS again since I already had a good understanding of one cloud service.
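As a rough idea of what working with an SDK looks like, here is a minimal sketch that lists the objects in an S3 bucket with boto3. It assumes boto3 is installed and AWS credentials are configured. The helper names `make_s3_client` and `iter_object_keys` are mine, not part of boto3; the only boto3 calls used are `boto3.client("s3")` and `list_objects_v2`.

```python
from typing import Any, Dict, Iterator


def make_s3_client() -> Any:
    """Create a real S3 client. Needs `pip install boto3` and AWS credentials."""
    import boto3  # imported lazily so the rest of the sketch runs without it

    return boto3.client("s3")


def iter_object_keys(s3_client: Any, bucket: str, prefix: str = "") -> Iterator[str]:
    """Yield every object key under `prefix`, following pagination tokens."""
    kwargs: Dict[str, str] = {"Bucket": bucket, "Prefix": prefix}
    while True:
        response = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when no objects match, hence the default.
        for obj in response.get("Contents", []):
            yield obj["Key"]
        if not response.get("IsTruncated"):
            break
        # S3 returns results in pages; ask for the next page.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
```

With real credentials, `iter_object_keys(make_s3_client(), "my-example-bucket", prefix="models/")` would yield the matching keys; the bucket name and prefix are placeholders. Passing the client in as an argument is a deliberate choice: it lets you test the function with a fake client instead of calling AWS.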
System Design and Communication - Three
I will keep this short and sweet. Can you draw a clear diagram, a pictorial representation, of a scalable solution or workflow you propose to implement? If not, you need to learn that quickly. This also ties in with communication. Your ability to design things and explain them using the right tools and visual aids can make you stand out quickly. You need to understand design patterns, microservices and their possible tradeoffs, and be able to explain why one solution is better suited than another. Excalidraw is one useful tool for creating visual aids.
Containerisation & Infrastructure-as-Code - Four
The last thing I will recommend for anyone trying to get on the MLOps path is getting an understanding of how containerised applications work, why they are useful and how to manage them. You can simply go to the Docker website and get some training on building images and deploying them as containers. That will serve you well when it is time to use more complex tools like Kubernetes to manage large clusters of containerised applications.
IaC, or Infrastructure-as-Code as it is commonly called, enables you to track the infrastructure you have built on the cloud. Terraform is a good place to start, and a CI/CD tool like Jenkins can later apply your changes automatically. Try to spin up a couple of services on the cloud using Terraform, then build up from there. If you understand how Git and version control work, the idea behind this should be easy to grasp. You want to be able to track the state of “your Cloud” and ensure it is consistent with what is expected.
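To illustrate the idea of tracking state, here is a toy Python sketch that compares the infrastructure you declared with what is actually running and lists the changes needed to reconcile them. This is not how Terraform is implemented, and every resource name and setting below is invented; it only shows the "desired versus actual" comparison that IaC tools are built around.

```python
from typing import Dict, List

# Resource name -> settings. All names and values are invented for illustration.
State = Dict[str, Dict[str, object]]


def plan_changes(desired: State, actual: State) -> List[str]:
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(f"create {name}")   # declared but not running
        elif name not in desired:
            actions.append(f"destroy {name}")  # running but never declared
        elif desired[name] != actual[name]:
            actions.append(f"update {name}")   # settings have drifted
    return actions


desired = {
    "bucket/training-data": {"versioning": True},
    "vm/inference": {"size": "small"},
}
actual = {
    "vm/inference": {"size": "large"},  # resized by hand in the console
    "vm/forgotten-experiment": {"size": "large"},
}

for action in plan_changes(desired, actual):
    print(action)
```

The "destroy" case is the one that saves you money: a resource that is running but appears nowhere in your code is usually something somebody forgot to turn off.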
I once asked one of my MLOps mentors how he was able to do so much stuff, and frankly his response was simply “doing so much stuff”. So I recommend you keep building stuff, and with time you will naturally get better in every area you aim and need to operate in.
Most importantly, yes, most importantly: please do not spin up resources and leave them running whilst not in use 👀!
Let me know your thoughts in the comment section below.
Thank You!