Data modeling is what makes data usable

Raw data is stored. Modeled data is understood.

The misconception

Most teams think of data modeling as a technical step.

Something done after collection.
Something handled by engineers.
Something optional.

The assumption:

If the data is in BigQuery, it’s ready to use.

That’s not what actually happens.

What data modeling actually is

Data modeling is the system layer that defines structure.

Not just moving data.

Defining how it behaves.

This includes:

  • shaping tables into consistent formats
  • defining relationships between entities
  • standardizing metrics and dimensions
  • enforcing reusable logic

This is where data becomes predictable.

Structure defines system behavior.
Modeling is where that structure is created.

Why raw data isn’t usable

Raw data reflects how systems collect information—not how it should be interpreted.

In systems like Google Analytics 4 exports:

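  • data is nested (event parameters arrive as repeated key-value records)
  • events are inconsistent (the parameters present vary from one event to the next)
  • logic is implicit (sessions must be derived from events, not read from a table)
  • relationships are undefined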

This makes it difficult to:

  • write reliable queries
  • define consistent metrics
  • compare results over time

If you query raw data directly, you’re rebuilding logic every time.

Without defined context, every query becomes an interpretation.
And interpretation is where inconsistency begins.

This is where systems begin to drift.
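As a rough illustration of what "rebuilding logic every time" means, the sketch below flattens one nested event into a flat row. The input is a simplified dictionary that imitates the repeated key-value shape of event_params in the GA4 export. The helper names (param_value, flatten_event) and the sample values are assumptions made for this example, not part of any library.

```python
from typing import Any

# Typed value fields used by GA4-style event parameters.
VALUE_FIELDS = ("string_value", "int_value", "float_value", "double_value")


def param_value(value: dict[str, Any]) -> Any:
    """Return the first populated typed field of a parameter value."""
    for field in VALUE_FIELDS:
        if value.get(field) is not None:
            return value[field]
    return None


def flatten_event(raw: dict[str, Any]) -> dict[str, Any]:
    """Turn one nested raw event into a flat row, one column per parameter."""
    row = {
        "event_name": raw["event_name"],
        "user_pseudo_id": raw["user_pseudo_id"],
        "event_timestamp": raw["event_timestamp"],
    }
    for param in raw.get("event_params", []):
        row[param["key"]] = param_value(param["value"])
    return row


# Hypothetical sample event, shaped like a simplified export row.
raw_event = {
    "event_name": "page_view",
    "user_pseudo_id": "u1",
    "event_timestamp": 1700000000000000,
    "event_params": [
        {"key": "page_location",
         "value": {"string_value": "https://example.com/pricing", "int_value": None}},
        {"key": "ga_session_id",
         "value": {"string_value": None, "int_value": 42}},
    ],
}

flat = flatten_event(raw_event)
print(flat)
```

Every query against the raw shape has to repeat this unpacking in some form. A model does it once and exposes the flat result.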

For a deeper breakdown, see How GA4 BigQuery Export Changes Everything.

What modeling actually does

Modeling introduces structure into the system.

Not as a step—but as a foundation.

1. Defines entities

Instead of raw events, you work with:

  • users
  • sessions
  • transactions

These become stable units of analysis.
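A minimal sketch of deriving one such entity, sessions, from raw events. It assumes a session ends after 30 minutes of inactivity. That threshold, the Session fields, and the build_sessions helper are illustrative choices, not a definition taken from any particular tool.

```python
from dataclasses import dataclass

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold


@dataclass
class Session:
    user_id: str
    start: int        # unix seconds of the first event
    end: int          # unix seconds of the last event
    event_count: int


def build_sessions(events: list[tuple[str, int]]) -> list[Session]:
    """Group (user_id, unix_seconds) events into per-user sessions."""
    sessions: list[Session] = []
    latest: dict[str, Session] = {}  # most recent session per user
    for user_id, ts in sorted(events):
        current = latest.get(user_id)
        if current is not None and ts - current.end <= SESSION_GAP_SECONDS:
            # Close enough to the previous event: extend the open session.
            current.end = ts
            current.event_count += 1
        else:
            # First event for this user, or the gap was too long.
            current = Session(user_id, ts, ts, 1)
            latest[user_id] = current
            sessions.append(current)
    return sessions


events = [("u1", 0), ("u1", 600), ("u1", 4000), ("u2", 100)]
sessions = build_sessions(events)
for s in sessions:
    print(s)
```

Once the rule lives in one place, every downstream query counts sessions the same way instead of choosing its own threshold.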

2. Standardizes logic

Metrics are defined once—not recreated in every query.

This ensures:

  • consistency across reports
  • alignment across teams
  • repeatable outputs
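One way to picture "defined once" is a small metric registry that every report calls into. The sketch below is hypothetical: the conversion_rate definition, the METRICS dictionary, and the compute helper are assumptions for illustration, and a real system would more likely keep this logic in the warehouse or a semantic layer.

```python
from typing import Callable

Row = dict[str, int]


def conversion_rate(rows: list[Row]) -> float:
    """Shared definition: total purchases divided by total sessions."""
    sessions = sum(r["sessions"] for r in rows)
    purchases = sum(r["purchases"] for r in rows)
    return purchases / sessions if sessions else 0.0


# Single place where metric names map to their definitions.
METRICS: dict[str, Callable[[list[Row]], float]] = {
    "conversion_rate": conversion_rate,
}


def compute(metric_name: str, rows: list[Row]) -> float:
    """Look up a metric by name so callers never re-implement it."""
    return METRICS[metric_name](rows)


daily = [
    {"sessions": 200, "purchases": 10},
    {"sessions": 300, "purchases": 15},
]

# Two different consumers ask for the same metric by name.
dashboard_value = compute("conversion_rate", daily)
email_report_value = compute("conversion_rate", daily)
print(dashboard_value, email_report_value)
```

Because both consumers resolve the metric by name, changing the definition changes it everywhere at once.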

For where this logic should live, see Where Logic Belongs in a Data Estate.

3. Creates relationships

Modeling defines how data connects.

Without it:

  • joins are inconsistent
  • results conflict

With it:

  • relationships are predictable
  • queries become stable
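A sketch of what a declared relationship can look like in practice: transactions join to users on user_id, and the join refuses to run if the key is duplicated or missing. The table shapes and the join_transactions_to_users helper are assumptions for this example.

```python
from typing import Any

Record = dict[str, Any]


def join_transactions_to_users(
    transactions: list[Record], users: list[Record]
) -> list[Record]:
    """Join each transaction to exactly one user on user_id."""
    users_by_id = {u["user_id"]: u for u in users}
    if len(users_by_id) != len(users):
        # A duplicated key would silently multiply or drop rows.
        raise ValueError("user_id is not unique in users")

    joined = []
    for txn in transactions:
        user = users_by_id.get(txn["user_id"])
        if user is None:
            raise ValueError(f"unknown user_id: {txn['user_id']}")
        joined.append({**txn, "country": user["country"]})
    return joined


users = [
    {"user_id": "u1", "country": "US"},
    {"user_id": "u2", "country": "DE"},
]
transactions = [
    {"user_id": "u1", "revenue": 20.0},
    {"user_id": "u2", "revenue": 5.0},
    {"user_id": "u1", "revenue": 7.5},
]

joined = join_transactions_to_users(transactions, users)
print(joined)
```

The checks are the point: the join either preserves the row count and revenue total or fails loudly, so two reports cannot disagree because of how they joined.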

4. Simplifies querying

Queries no longer reconstruct logic.

They express intent.

This is what allows systems to scale.

AI reduces the effort required to query data.
It does not reduce the responsibility of structuring it.

Modeling and AI

AI does not model your data.
It queries it.

AI does not understand your data.
It relies on how your system defines it.

If your data is not modeled:

  • structure must be inferred
  • queries vary unpredictably
  • outputs become inconsistent

This is why modeling is not optional.

It defines what AI is able to interpret.

For how this appears at the interface, see Conversational analytics.

Modeling vs pipelines

This is where the two most often get confused.

Pipelines move data.

Modeling defines it.

You can have:

  • perfect pipelines
  • automated ingestion

And still have unusable data.

Because:

movement is not structure

For a deeper breakdown, see Data Pipelines vs Data Systems.

Where modeling fails

Modeling doesn’t break.

It drifts.

1. Inconsistent definitions

Metrics change over time.

Different queries produce different answers.

2. Logic duplication

The same logic exists in multiple places.

Each version behaves differently.

3. Fragile queries

Small changes break outputs.

Because structure isn’t enforced.
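Enforcing structure can be as simple as checking rows against an expected schema before anything downstream reads them. The sketch below is a minimal, hypothetical version: the EXPECTED_SCHEMA columns and the validate_row helper are assumptions for illustration.

```python
from typing import Any

# Assumed contract for a modeled transactions table.
EXPECTED_SCHEMA: dict[str, type] = {
    "user_id": str,
    "revenue": float,
    "currency": str,
}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of schema violations; an empty list means the row is valid."""
    errors = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in row:
            errors.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            actual = type(row[column]).__name__
            errors.append(f"{column}: expected {expected_type.__name__}, got {actual}")
    # Sorted so the report is deterministic.
    for column in sorted(row.keys() - EXPECTED_SCHEMA.keys()):
        errors.append(f"unexpected column: {column}")
    return errors


good_row = {"user_id": "u1", "revenue": 12.5, "currency": "USD"}
bad_row = {"user_id": "u1", "revenue": "12.50", "currency": "USD"}

print(validate_row(good_row))
print(validate_row(bad_row))
```

With a check like this at the boundary, an upstream change that turns revenue into a string is reported at load time instead of surfacing later as a broken dashboard.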

4. Misaligned reporting

Dashboards disagree.

Not because tools are wrong—but because structure is missing.

If this is happening:

the issue isn’t your reports
it’s your model

For how this propagates, see Why AI Analytics Fails.

Where this fits in your system

Data modeling sits between:

  • data collection
  • and data interpretation

It is part of the processing layer of your data estate.

It does not:

  • collect data
  • define meaning

It provides the structure that makes both usable.

It connects directly to:

  • semantic layer (meaning)
  • data agents (execution)
  • conversational analytics (interface)

What this enables (when it works)

Modeling doesn’t add capability.
It enables reliability.

  • consistent metrics
  • reusable logic
  • stable querying
  • trustworthy outputs

This is what allows:

  • reporting to align
  • AI to function
  • decisions to reflect reality

The value of AI in analytics is not better answers.
It is faster access to answers—if the system is correct.

Connection to AI-ready data

AI-ready data is not possible without modeling.

Because AI depends on:

  • predictable schemas
  • consistent logic
  • defined structure

If modeling is missing:

AI is forced to interpret raw data

And that interpretation is inconsistent.

What to do next

If the same question produces different answers, the issue isn’t the query.

It’s the structure.

Modeling is what makes AI usable

See AI-ready data

Evaluate your system

See Evaluate

Final principle

Raw data reflects activity.

Modeled data reflects structure.

And without structure:

the same question is not guaranteed to return the same answer twice.

Doug McCaffrey
Designs and maintains analytics systems that remain reliable over time.
