Skip to main content

Modeling tags from the Responses API (and avoiding row explosion)

Guide on how to avoid duplicate response fields when using data from the Responses API

Updated this week

The Responses API returns several many-to-many fields (like tags, key_topics, citations, and competitors).

If you flatten these arrays the wrong way in your warehouse or BI tool, you can accidentally create one row per combination (for example, every combination of tag × competitor), which inflates counts and breaks your metrics.

This guide walks through how to model tags so you don’t end up with a row for every combination.


The core issue: many-to-many fields

Each response from the /responses API includes arrays like:

{
"id": 123,
"prompt_id": 456,
"tags": ["Evergreen", "Consideration"],
"competitors": [
{ "name": "Brand A", "present": true },
{ "name": "Brand B", "present": true }
],
"citations": [
{ "url": "https://example.com/one" },
{ "url": "https://example.com/two" }
],
"..."
}

If you explode/UNNEST multiple arrays at once, you end up with:

  • 2 tags × 2 competitors × 2 citations

  • = 8 rows for a single response

Any metric that just does COUNT(*) or SUM() on those rows will be inflated.


Principle #1: Only explode one array at a time

When you need a view “by tag”, only explode tags:

-- Good: one row per response × tag
WITH response_tags AS (
SELECT
r.id AS response_id,
t AS tag
FROM responses r
CROSS JOIN UNNEST(r.tags) AS t
)
SELECT
tag,
COUNT(DISTINCT response_id) AS responses
FROM response_tags
GROUP BY tag;

Avoid exploding tags and competitors in the same SELECT:

-- Avoid: this creates one row per (response × tag × competitor)
SELECT
r.id,
t AS tag,
c.name AS competitor_name
FROM responses r
CROSS JOIN UNNEST(r.tags) AS t
CROSS JOIN UNNEST(r.competitors) AS c;

You can still work with both, but do it in separate steps (see next principle).


Principle #2: Build separate “bridge” tables

A clean pattern is to keep one base fact table plus separate “bridge” tables for each many-to-many field:

  • responses (one row per response)

  • response_tags (one row per response × tag)

  • response_competitors (one row per response × competitor)

  • response_citations (one row per response × citation), if needed

Example modeling:

-- Base fact: one row per response
CREATE TABLE responses AS
SELECT
id,
created_at,
prompt_id,
platform,
brand_present,
brand_sentiment,
brand_position,
response_text
FROM raw_responses;

-- Bridge: tags
CREATE TABLE response_tags AS
SELECT
r.id AS response_id,
t AS tag
FROM raw_responses r
CROSS JOIN UNNEST(r.tags) AS t;

-- Bridge: competitors
CREATE TABLE response_competitors AS
SELECT
r.id AS response_id,
c.name AS competitor_name,
c.present,
c.position,
c.sentiment
FROM raw_responses r
CROSS JOIN UNNEST(r.competitors) AS c;

From there:

  • Use responses for most metrics (brand presence, sentiment, etc.)

  • Join to one bridge table at a time depending on the slice you need

Example: brand presence by tag:

WITH base AS (
SELECT
r.id,
r.brand_present
FROM responses r
),
by_tag AS (
SELECT
t.tag,
COUNT(DISTINCT b.id) AS responses,
AVG(CASE WHEN b.brand_present THEN 1 ELSE 0 END) AS brand_presence_pct
FROM base b
JOIN response_tags t
ON t.response_id = b.id
GROUP BY t.tag
)
SELECT *
FROM by_tag;

Notice we:

  • Aggregate at the response level first, then

  • Join tags, and

  • Use COUNT(DISTINCT) to avoid double counting.


Principle #3: Pre-aggregate before joining bridge tables

If you need a view like brand presence by tag and competitor, don’t explode both arrays at once.

Instead:

  1. Aggregate brand metrics at the response level

  2. Join to tags or competitors at the level you care about

  3. If you truly need “tag × competitor”, be explicit and use COUNT(DISTINCT response_id)

Example: brand presence by competitor (ignoring tags):

WITH base AS (
SELECT
id,
brand_present
FROM responses
),
by_competitor AS (
SELECT
c.competitor_name,
COUNT(DISTINCT b.id) AS responses,
AVG(CASE WHEN b.brand_present THEN 1 ELSE 0 END) AS brand_presence_pct
FROM base b
JOIN response_competitors c
ON c.response_id = b.id
GROUP BY c.competitor_name
)
SELECT *
FROM by_competitor;

If you absolutely need tag × competitor:

WITH base AS (
SELECT
id,
brand_present
FROM responses
),
by_tag AS (
SELECT
response_id,
tag
FROM response_tags
),
by_competitor AS (
SELECT
response_id,
competitor_name
FROM response_competitors
)
SELECT
t.tag,
c.competitor_name,
COUNT(DISTINCT b.id) AS responses,
AVG(CASE WHEN b.brand_present THEN 1 ELSE 0 END) AS brand_presence_pct
FROM base b
JOIN by_tag t
ON t.response_id = b.id
JOIN by_competitor c
ON c.response_id = b.id
GROUP BY
t.tag,
c.competitor_name;

The key is COUNT(DISTINCT b.id) rather than COUNT(*).


Principle #4: In BI tools, use distinct counts + one exploded field

In tools like Looker Studio, Power BI, or Tableau:

  • Build one data source that already follows the pattern above (base fact + bridge)

  • When creating charts:

    • Only use one exploded dimension (tags or competitors) in a given dataset

    • Use COUNT_DISTINCT(response_id) for counts instead of plain COUNT(*)

    • If you see metrics jump when you add a second dimension, you’re probably hitting the combination problem


Quick checklist

When working with /responses:

✅ Keep a base responses table (one row per response)

✅ Create 1 bridge table per array (response_tags, response_competitors, etc.)

✅ Use COUNT(DISTINCT response_id) when joining bridge tables

✅ In BI tools, only explode one many-to-many dimension per dataset

❌ Don’t explode tags, competitors, and citations in a single step

❌ Don’t sum metrics over the fully exploded combination grid

Did this answer your question?