layout | title | parent |
---|---|---|
default | Feathr FAQ | Feathr Concepts |
This page addresses the most frequently asked questions we receive from end users.
A feature store is typically needed when your data is organized around entities/keys (e.g. you are trying to model the behavior of a user, an account, or a specific item). It is likely not necessary for tasks such as regular image recognition.
In Feathr, each feature is typically associated with a specific key. Keys are also referred to as `Entities` in many other feature stores, signifying that a feature is connected to a specific entity. For instance, if you are creating a recommendation system, you may have `f_item_sales_1_week` for item sales, and `f_user_location` for historical user purchasing data. In this case, `f_item_sales_1_week` should be linked to `item_id`, and `f_user_location` should be associated with `user_id`. This is because each feature represents something specific to a particular `Entity` (item or user, in this case).
When querying features, you'll need to specify the key if you're seeking item-related features, such as `f_item_sales_1_week`. The exception to this rule is for features defined in `INPUT_CONTEXT`, which generally do not require keys as they're directly computed from observation data and aren't typically reused.
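The link between features and keys can be pictured with a small plain-Python sketch. This is not the Feathr API: the `FEATURE_KEYS` table, the `feature_values` store, the `lookup` helper, and all values are hypothetical, used only to illustrate why a keyed feature cannot be queried without its key.

```python
# Illustrative only: plain Python, not the Feathr API.
# Each feature is tied to the entity (key) it describes.
FEATURE_KEYS = {
    "f_item_sales_1_week": "item_id",
    "f_user_location": "user_id",
}

# Hypothetical feature values, stored per feature and per key value.
feature_values = {
    "f_item_sales_1_week": {"item_1": 120, "item_2": 45},
    "f_user_location": {"user_a": "Seattle"},
}


def lookup(feature_name, key_column, key_value):
    """Return a feature value, checking that the caller used the right key."""
    expected = FEATURE_KEYS[feature_name]
    if key_column != expected:
        raise ValueError(
            f"{feature_name} is keyed by {expected}, not {key_column}"
        )
    # Returns None when no value exists for this key value.
    return feature_values[feature_name].get(key_value)


print(lookup("f_item_sales_1_week", "item_id", "item_1"))
```

Asking for `f_item_sales_1_week` with `user_id` raises an error in this sketch, which mirrors the idea that an item feature only makes sense for an item key.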
What is the role of Feature Anchors in Purview? Are we actually generating feature values to display as a view?
`Feature Anchors` can be likened to "views" in standard SQL terms. They don't store feature values; instead, an anchor is a collection of features, with individual `Features` being the columns in the view. In other words, anchors are essentially feature groupings, not stored values.
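The "view" analogy can be sketched in plain Python. The `Feature` and `Anchor` classes below are hypothetical stand-ins, not Feathr's own classes, and the source row is made up; the point is only that the anchor holds definitions and produces values on demand.

```python
# Illustrative only: plain Python, not the Feathr API.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Feature:
    name: str
    transform: Callable[[dict], object]  # computes the value from one source row


@dataclass
class Anchor:
    """Groups features over one source; holds definitions, not values."""
    name: str
    source_rows: List[dict]
    features: List[Feature]

    def evaluate(self) -> List[Dict[str, object]]:
        # Values are produced on demand, like selecting from a SQL view.
        return [
            {f.name: f.transform(row) for f in self.features}
            for row in self.source_rows
        ]


# Hypothetical source data and feature definitions.
rows = [{"item_id": "item_1", "price": 10.0, "units": 3}]
anchor = Anchor(
    name="item_anchor",
    source_rows=rows,
    features=[
        Feature("f_item_price", lambda r: r["price"]),
        Feature("f_item_revenue", lambda r: r["price"] * r["units"]),
    ],
)
print(anchor.evaluate())
```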
`key_column` is a map to the source table, while `full_name` is used for reference.
Yes, although it may lead to errors if the source table splits.
DerivedFeatures are features calculated from other features. They could be computed from anchored features or other derived features.
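As a rough sketch of the idea, the plain-Python functions below compute one derived feature from anchored features and a second one from the first. The feature names and values are hypothetical, and this does not use Feathr's own classes.

```python
# Illustrative only: plain Python, not the Feathr API.
# Hypothetical anchored feature values for a single key.
anchored = {"f_trip_distance": 12.0, "f_trip_time_hours": 0.5}


def f_trip_speed(features):
    """Derived from two anchored features."""
    return features["f_trip_distance"] / features["f_trip_time_hours"]


def f_is_fast_trip(features):
    """Derived from another derived feature."""
    return f_trip_speed(features) > 20.0


print(f_trip_speed(anchored), f_is_fast_trip(anchored))
```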
This function joins observation data with a feature list.
`FeatureAnchor` is implicitly used. The method uses anchors and features created using `build_features`.
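Conceptually, joining observation data with a feature list resembles a left join on the key column. The sketch below is plain Python with a hypothetical `join_features` helper and made-up data; it omits any time-based matching and is not how Feathr implements the join.

```python
# Illustrative only: plain Python, not the Feathr API.
# Hypothetical observation rows and precomputed feature values per key.
observations = [
    {"user_id": "user_a", "label": 1},
    {"user_id": "user_b", "label": 0},
]
feature_table = {"user_a": {"f_user_location": "Seattle"}}


def join_features(observations, feature_table, key_column, feature_names):
    """Attach the requested features to each observation row by key."""
    joined = []
    for row in observations:
        values = feature_table.get(row[key_column], {})
        out = dict(row)  # copy so the observation rows stay unchanged
        for name in feature_names:
            out[name] = values.get(name)  # None when the key has no value
        joined.append(out)
    return joined


result = join_features(observations, feature_table, "user_id", ["f_user_location"])
print(result)
```

Every observation row is kept; rows whose key has no feature value simply carry `None` for that feature.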
Yes, multiple queries may be initiated simultaneously.
This function registers the features that were built with `client.build()` or features from configuration files.
Only for online data.
Yes, the online backfill API is specifically for the latest feature.
Yes, this is referred to as a "Feathr Anchor".
Is it possible to modify a feature retrieved from the Registry (Purview) and update it in the registry using Feathr Client?
This would involve using the `get_features_from_registry` function of `FeathrClient` for a specific project name, viewing the feature code, and modifying the feature.
This becomes important when using features from the registry in the consumption flow, as the user must have access to all source data files before the feature can be utilized. This can be challenging, particularly in our data lake and DDS setup.
Currently, it seems that only one function can be passed, but this is subject to change based on specific requirements.
What does the error message "java.lang.RuntimeException: The 0th field 'key0' of input row cannot be null" mean?
This error message signifies that some rows in the input data lack a key. Users should add a filter to the source to exclude these rows as they cannot be utilized.
Keys do not have to be unique for input data, but they cannot be null. The output dataset is a key-value map with the key specified, and the value is the combination of all requested features for each key. This process can be viewed as grouping or bucketing.
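The two points above, dropping rows with a null key and grouping the remaining values by key, can be sketched in plain Python. The rows and the `f_purchase` column are hypothetical, and in practice the filter would be applied on the source itself rather than on an in-memory list.

```python
# Illustrative only: plain Python, not the Feathr API.
from collections import defaultdict

# Hypothetical input rows; the key column may repeat but must not be null.
rows = [
    {"key0": "user_a", "f_purchase": 10},
    {"key0": None, "f_purchase": 7},  # would trigger the null-key error
    {"key0": "user_a", "f_purchase": 5},
    {"key0": "user_b", "f_purchase": 3},
]

# Exclude rows that lack a key, since they cannot be used.
valid_rows = [r for r in rows if r["key0"] is not None]

# Group the remaining values by key (bucketing).
by_key = defaultdict(list)
for r in valid_rows:
    by_key[r["key0"]].append(r["f_purchase"])

print(dict(by_key))
```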
If I'm going to materialize the feature data into offline storage, how can I read that through the Feathr API?
The offline store is simply a standard table. Currently, the offline store consists of Parquet/Avro files on HDFS, so users can read the table in offline storage directly, without the Feathr API.