Feature Store Design Pattern

Write-up on the benefit of the feature store design pattern, an approach MLOps Engineers should be familiar with.

May 28, 2024

When building solutions for machine learning problems, choosing a design pattern is a crucial decision. Just as when building a house, you need to be sure which architecture is most practical for its intended use. The feature store is one of several design patterns you would consider when building the framework for a machine learning solution.

A feature store is a version-controlled repository for feature datasets that decouples feature engineering from feature usage.

If you work in a small organisation running one or two pipelines, a feature store is likely unnecessary. It becomes valuable in larger organisations, where multiple teams use the same datasets for different use cases, e.g.:

A pipeline that determines how long a tyre should last on track.
Another pipeline that determines the cost of tyres for a season.

Both use cases require the tyre type feature, though each may use it somewhat differently. To save time on pipeline development, you want teams to have a central location where they can pick up the transformed features they need instead of each writing the same transformation function independently. A feature store not only saves time, it also ensures consistency and reusability.
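As a rough illustration of that central location, here is a minimal in-memory sketch. The names `FeatureStore`, `register`, `compute` and `normalise_tyre_type` are hypothetical and do not come from any particular library; a real feature store would also persist and serve the computed values.

```python
from typing import Callable, Dict, Tuple


class FeatureStore:
    """Toy in-memory store keyed by (feature name, version). Hypothetical API."""

    def __init__(self) -> None:
        self._transforms: Dict[Tuple[str, int], Callable[[str], str]] = {}

    def register(self, name: str, version: int, transform: Callable[[str], str]) -> None:
        """Publish a transformation once so that every team can reuse it."""
        key = (name, version)
        if key in self._transforms:
            # Versions are immutable: a changed transformation needs a new version.
            raise ValueError(f"{name} v{version} is already registered")
        self._transforms[key] = transform

    def compute(self, name: str, version: int, raw_value: str) -> str:
        """Apply the registered transformation to a raw value."""
        return self._transforms[(name, version)](raw_value)


def normalise_tyre_type(raw: str) -> str:
    """Turn free-text input such as ' Soft Tyres' into the label 'soft'."""
    return raw.strip().lower().replace(" tyres", "")


store = FeatureStore()
store.register("tyre_type", 1, normalise_tyre_type)

# Two independent pipelines reuse the same registered transformation.
lifespan_input = store.compute("tyre_type", 1, " Soft Tyres")
cost_input = store.compute("tyre_type", 1, "soft tyres")
```

Both pipelines receive the same cleaned label without either team writing its own version of the cleaning logic.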

Remnants of Verstappen's tyres following the brake issues that caused one of his retirements - Source: Formula1

It is also important that the input features used to train your model have the same form as those used when serving the model. If ‘soft tyres’ was encoded as [0,1,0] during training, you want it encoded the same way when making a prediction, and not passed in as the raw string ‘soft tyres’. Using a feature store helps to ensure that the transformations applied to training and serving data are identical, tracked and stored in one location. This is incredibly useful in many machine learning use cases.
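A small sketch of this idea follows, where one encoding function is shared by the training and serving paths. The category order in `TYRE_CATEGORIES` is an assumption, chosen only so that ‘soft’ maps to [0,1,0] as in the example above, and the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed category order, picked so that 'soft' encodes to [0, 1, 0].
TYRE_CATEGORIES: Tuple[str, ...] = ("hard", "soft", "medium")


def encode_tyre_type(label: str) -> List[int]:
    """One-hot encode a tyre label using the single shared category order."""
    if label not in TYRE_CATEGORIES:
        raise ValueError(f"unknown tyre type: {label!r}")
    return [1 if label == category else 0 for category in TYRE_CATEGORIES]


def build_training_rows(labels: List[str]) -> List[List[int]]:
    """Training path: encode a batch of historical labels."""
    return [encode_tyre_type(label) for label in labels]


def build_serving_row(label: str) -> List[int]:
    """Serving path: encode one incoming label with the same function."""
    return encode_tyre_type(label)


training_rows = build_training_rows(["soft", "hard"])
serving_row = build_serving_row("soft")
```

Because both paths call `encode_tyre_type`, the model sees ‘soft’ in the same form at prediction time as it did during training.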

Programming Section

To give you more rounded knowledge as an MLOps Engineer, I will be adding Python LeetCode questions and their solutions. The aim is to sharpen your Python skills and your understanding of algorithms and data structures.

In this question, you are asked to find a ‘peak element’, which is simply one that is strictly greater than its neighbours.

Given a 0-indexed integer array `nums`, find a peak element and return its index. If the array contains multiple peaks, return the index of any of the peaks.
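To pin down the definition before optimising, here is a straightforward linear-scan sketch. The helper names `is_peak` and `find_peak_linear` are illustrative, and the code assumes that a missing neighbour at either end of the array counts as smaller than any element.

```python
from typing import List


def is_peak(nums: List[int], i: int) -> bool:
    """Return True if nums[i] is strictly greater than its existing neighbours.

    Assumption: a missing neighbour at either end counts as smaller.
    """
    left_ok = i == 0 or nums[i] > nums[i - 1]
    right_ok = i == len(nums) - 1 or nums[i] > nums[i + 1]
    return left_ok and right_ok


def find_peak_linear(nums: List[int]) -> int:
    """Check every index in turn; this takes O(n) time."""
    for i in range(len(nums)):
        if is_peak(nums, i):
            return i
    raise ValueError("no peak found")


example = [1, 2, 1, 3, 5, 6, 4]
first_peak = find_peak_linear(example)
```

This baseline is easy to reason about and is handy for checking a faster solution against.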

Solution:

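from typing import List


class Solution:
    def findPeakElement(self, nums: List[int]) -> int:
        # Search window: a peak always lies somewhere in nums[left..right].
        left, right = 0, len(nums) - 1

        while left < right:
            # Midpoint computed this way never reaches `right`, so mid + 1 is valid.
            mid = left + (right - left) // 2

            if nums[mid] > nums[mid + 1]:
                # Values fall after mid: a peak is at mid or to its left.
                right = mid
            else:
                # Values rise after mid: a peak is to the right of mid.
                left = mid + 1

        # left == right: the window has shrunk to a single index, the peak.
        return left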

Explanation

This solution uses a `binary search algorithm` that runs in O(log n) time where n is the number of elements in the array.

Think of looking up a word in a printed dictionary. Rather than reading every page from the start, you open it near the middle, decide which half the word must be in, and ignore the other half. This is essentially the concept of the `binary search algorithm`. Each comparison discards half of the remaining candidates, so instead of taking O(n) time to find the element being looked for, it takes O(log n) time at most.
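To make the O(log n) claim concrete, the sketch below counts how many times the loop runs. `find_peak_counting` is a hypothetical variant of the solution above with a counter added. The strictly increasing input is just a convenient test case.

```python
import math
from typing import List, Tuple


def find_peak_counting(nums: List[int]) -> Tuple[int, int]:
    """Return (peak index, number of loop iterations)."""
    left, right = 0, len(nums) - 1
    iterations = 0

    while left < right:
        iterations += 1
        mid = left + (right - left) // 2
        if nums[mid] > nums[mid + 1]:
            right = mid
        else:
            left = mid + 1

    return left, iterations


nums = list(range(1024))  # strictly increasing, so the peak is the last index
peak_index, iterations = find_peak_counting(nums)
upper_bound = math.ceil(math.log2(len(nums)))
```

For this 1024-element array the loop runs 10 times, which matches log2(1024) = 10, whereas a linear scan would have to walk all the way to the last index.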

The code initializes two pointers, `left` and `right`, representing the start and end indices of the array. The loop runs while `left` is less than `right`. This condition works because every iteration either moves the left pointer forward or the right pointer backward, so the window between them always shrinks. When the two pointers meet, the window contains a single index and there is nothing left to compare, so the loop stops.

Within the `while` loop, we calculate the middle index and halve the search space in each iteration by checking whether the element at that index is greater than the one to its right. If it is, a peak must lie at `mid` or to its left, so we move `right` to `mid`. Otherwise the values are rising, so a peak must lie to the right of `mid` and we move `left` to `mid + 1`. The function keeps narrowing the window this way until both pointers are equal; at that point the loop condition is no longer met and you can return either the `left` or the `right` pointer.

If the mid element is greater than the element immediately to its right, we keep checking the left partition, including the mid element itself.

In the same way, if the mid element is less than the element immediately to its right, we keep checking the right partition.
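The two cases above can be followed step by step with a small tracing sketch. `trace_find_peak` is a hypothetical helper that records the pointers at each iteration along with which partition is kept. The input array is just an example.

```python
from typing import List, Tuple

Step = Tuple[int, int, int, str]  # (left, right, mid, partition kept)


def trace_find_peak(nums: List[int]) -> Tuple[int, List[Step]]:
    """Run the binary search and log each decision."""
    left, right = 0, len(nums) - 1
    steps: List[Step] = []

    while left < right:
        mid = left + (right - left) // 2
        if nums[mid] > nums[mid + 1]:
            steps.append((left, right, mid, "left"))
            right = mid
        else:
            steps.append((left, right, mid, "right"))
            left = mid + 1

    return left, steps


peak_index, steps = trace_find_peak([1, 2, 1, 3, 5, 6, 4])
for left, right, mid, kept in steps:
    print(f"left={left} right={right} mid={mid} -> keep {kept} partition")
print(f"peak index: {peak_index}")
```

For this array the search keeps the right partition, then the left, then the right again, and finishes at index 5, where the value 6 is greater than both of its neighbours.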

I hope you have learnt a thing or two here. Feel free to leave comments or ask questions below, and have a good week!
