Features and labels in Featureform

Define entities, features, labels, and aggregate windows in Featureform.

Features and labels are the core semantic objects in Featureform. They describe what you want to predict, how feature values are keyed, and how those values should be computed over time.

Entities

An entity represents the business object that features belong to, such as a user, merchant, account, device, or product.

import featureform as ff

@ff.entity
class User:
    pass

Attach features and labels to the entity class so Featureform can reason about keys and lineage.

Define features

Use the builder-style API to define features from registered datasets or transformations:

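@ff.entity
class User:
    avg_transaction_amount = (
        ff.Feature()
        .from_dataset(
            user_transaction_features,  # dataset or transformation registered earlier
            entity="user_id",  # column holding the entity key
            values="avg_transaction_amount",  # column holding the feature value
            timestamp="latest_transaction",  # column holding the event time
        )
    )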

Aggregate features

Aggregate features create time-windowed feature values from event data:

from datetime import timedelta

@ff.entity
class User:
    transaction_count = (
        ff.Feature()
        .from_dataset(
            transactions,
            entity="user_id",
            values="amount",
            timestamp="timestamp",
        )
        .aggregate(
            function=ff.AggregateFunction.COUNT,
            windows=[timedelta(days=7), timedelta(days=30)],
        )
    )

Access a specific window by indexing the feature with the matching timedelta:

count_7d = User.transaction_count[timedelta(days=7)]
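
To make the window semantics concrete, here is a minimal plain-Python sketch of what a trailing COUNT over several windows computes. It does not use Featureform and is not how Featureform evaluates aggregates. The event rows, the `windowed_counts` helper, and the boundary rule (window start exclusive, `as_of` inclusive) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical event rows: (user_id, amount, event_time).
EVENTS = [
    ("u1", 20.0, datetime(2024, 1, 1)),
    ("u1", 35.0, datetime(2024, 1, 20)),
    ("u1", 12.5, datetime(2024, 1, 28)),
    ("u2", 80.0, datetime(2024, 1, 27)),
]


def windowed_counts(events, user_id, as_of, windows):
    """Count a user's events in each trailing window ending at `as_of`.

    Assumed boundary rule: an event is counted when
    as_of - window < event_time <= as_of.
    """
    counts = {}
    for window in windows:
        start = as_of - window
        counts[window] = sum(
            1
            for uid, _amount, ts in events
            if uid == user_id and start < ts <= as_of
        )
    return counts


counts = windowed_counts(
    EVENTS,
    user_id="u1",
    as_of=datetime(2024, 1, 30),
    windows=[timedelta(days=7), timedelta(days=30)],
)

# Results are keyed by timedelta, mirroring the indexing style shown above.
count_7d = counts[timedelta(days=7)]
count_30d = counts[timedelta(days=30)]
```

Because `timedelta` objects compare and hash by duration, `timedelta(weeks=1)` finds the same entry as `timedelta(days=7)` in this sketch. Whether Featureform treats equivalent durations the same way is not stated here, so indexing with the exact value used in the definition is the safer habit.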

Define labels

Labels represent the target you want to predict:

@ff.entity
class User:
    is_fraud = (
        ff.Label()
        .from_dataset(transactions)
        .value("is_fraud")
        .timestamp("timestamp")
        .entity("user", "user_id")
        .description("Binary fraud classification label")
    )

Point-in-time correctness

Timestamps are central to correct ML training and serving workflows. When your feature and label definitions include the relevant event timestamp, Featureform can align each historical example with the feature values that were available at that time, instead of leaking future information into the training data.

Use timestamps consistently when:

  • defining event-derived features
  • defining labels with historical outcomes
  • creating aggregate windows
  • building training sets from time-varying data
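
The idea behind point-in-time alignment can be sketched in plain Python as an "as of" lookup: for each label, take the most recent feature value whose timestamp is not later than the label's timestamp. This is a conceptual illustration under assumed data, not Featureform's implementation. `FEATURE_HISTORY`, `LABELS`, and `value_as_of` are hypothetical names.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history for one user, sorted by timestamp:
# (timestamp, avg_transaction_amount).
FEATURE_HISTORY = [
    (datetime(2024, 1, 1), 20.0),
    (datetime(2024, 1, 20), 27.5),
    (datetime(2024, 2, 5), 40.0),
]

# Hypothetical label events for the same user: (timestamp, is_fraud).
LABELS = [
    (datetime(2023, 12, 25), False),
    (datetime(2024, 1, 25), False),
    (datetime(2024, 2, 10), True),
]


def value_as_of(history, when):
    """Return the latest feature value with timestamp <= when, or None."""
    times = [ts for ts, _value in history]
    # bisect_right gives the number of entries with timestamp <= when.
    idx = bisect_right(times, when)
    return history[idx - 1][1] if idx > 0 else None


# Each training row pairs a label with the feature value known at label time.
training_rows = [
    (value_as_of(FEATURE_HISTORY, ts), label) for ts, label in LABELS
]
```

In this sketch the first label precedes every feature value, so its feature is `None` rather than a value borrowed from the future. The label on 2024-01-25 sees 27.5, not the later 40.0.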

Best practices

  • Model entities after real business keys that your applications already use.
  • Prefer the builder-style feature API for new work.
  • Add timestamps wherever feature values change over time.
  • Keep feature names stable and descriptive so they can be reused across training sets and feature views.