How to Create VideoObject Schema for Video Pages in 2026

Seven chapters covering required properties, thumbnail requirements, key moments (Clip and SeekToAction), live streams, transcripts, hosted vs embedded video, and the breakages we see most often.

Jacque Bichara
Founder & Lead Strategist, Capconvert
May 20, 2026 | Updated May 20, 2026 | 12 min read
Who this is for: publishers, brands, and developers hosting or embedding video content who want video rich results, the Videos search tab, and AI engine citation in video-answer queries.
TL;DR
  • VideoObject is the schema for any video on the page, whether self-hosted, embedded from YouTube, or streamed live. It powers the video rich result, the Videos tab, and Google Discover video carousels.
  • Required: name, description, thumbnailUrl, uploadDate. Recommended floor: add duration (ISO 8601), contentUrl or embedUrl, interactionStatistic, transcript.
  • Thumbnail requirements are strict: at least 60×30 px, multiple aspect ratios recommended (16x9, 4x3, 1x1), publicly crawlable URLs.
  • Use Clip with startOffset/endOffset to mark key moments. Google can show these as chapter markers in the SERP.
  • Embed YouTube? Still ship VideoObject with embedUrl pointing at the YouTube embed URL. Don't rely on YouTube's own schema to surface your page.

Chapter 1. Before you start

VideoObject schema is what tells Google your page is the canonical home for a specific video. Without it, Google may surface the YouTube watch page instead of yours when someone searches for the topic - meaning the watch happens on YouTube and your site never sees the traffic. Shipping VideoObject pulls the rich result attribution back to your URL.

  • Confirm the page has a single primary video. VideoObject is for pages where the video is the main content. If video is supplementary (e.g., an embedded explainer in a longer article), VideoObject is still valid but lower priority.
  • Generate publicly crawlable thumbnails in multiple aspect ratios. Google recommends 16x9, 4x3, and 1x1, each at minimum 60x30 px. We default to 1280x720, 960x720, and 720x720.
  • Decide on contentUrl vs embedUrl. Use contentUrl for self-hosted MP4 files (Google can crawl the actual media). Use embedUrl for YouTube, Vimeo, Wistia embeds.
  • Pull duration in ISO 8601 format. "PT5M30S" means 5 minutes 30 seconds. Hard-format requirement - plain "5:30" will fail validation.
  • Decide whether to ship key moments. For videos longer than 3 minutes, key moments significantly improve the rich result. Skip for short clips.
From the audit notes
Of 34 sites publishing video content we audited, 22 embedded YouTube without shipping any VideoObject schema on the host page - the rich result and the Videos tab attribution all went to YouTube instead. 19 had no transcript, missing the AI-engine ingestion path. 14 didn't ship key moments on videos longer than 5 minutes. 11 served a thumbnail in a single aspect ratio instead of the recommended three. Each of these four fixes ships in under 30 minutes per video.
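
The thumbnail bullet in the checklist above can be sketched as a small sizing helper. This is a minimal sketch, not a definitive implementation: it assumes you already have one source frame (the 1920x1080 figure in the usage note is hypothetical) and only computes the largest crop for each aspect ratio. The actual cropping is left to whatever image tool you use, and the function names are ours.

```python
# Aspect ratios and the 60x30 px minimum come from the checklist above.
TARGET_RATIOS = {"16x9": (16, 9), "4x3": (4, 3), "1x1": (1, 1)}
MIN_WIDTH, MIN_HEIGHT = 60, 30


def largest_crop(width: int, height: int, ratio_w: int, ratio_h: int) -> tuple[int, int]:
    """Return the largest (width, height) with the given ratio that fits the source."""
    # Integer cross-multiplication avoids float rounding when comparing ratios.
    if width * ratio_h >= height * ratio_w:
        # Source is at least as wide as the target ratio: keep the full height.
        return height * ratio_w // ratio_h, height
    # Source is narrower than the target ratio: keep the full width.
    return width, width * ratio_h // ratio_w


def thumbnail_sizes(width: int, height: int) -> dict[str, tuple[int, int]]:
    """Compute one crop size per recommended aspect ratio for a source frame."""
    sizes = {}
    for label, (ratio_w, ratio_h) in TARGET_RATIOS.items():
        crop_w, crop_h = largest_crop(width, height, ratio_w, ratio_h)
        if crop_w < MIN_WIDTH or crop_h < MIN_HEIGHT:
            raise ValueError(f"{label} crop {crop_w}x{crop_h} is below 60x30 px")
        sizes[label] = (crop_w, crop_h)
    return sizes
```

For a hypothetical 1920x1080 source frame, thumbnail_sizes(1920, 1080) yields 1920x1080, 1440x1080, and 1080x1080, which can then be scaled down to whatever delivery sizes you prefer.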

Chapter 2. What does VideoObject schema actually do for SEO + AI search?

Four things, in descending order of importance.

  1. Video rich result eligibility. The blue link gains a thumbnail, duration, and uploadDate in the SERP. CTR uplift averages 30-45% over a plain text result for video-intent queries.
  2. Videos tab inclusion. Google's Videos search tab indexes VideoObject-tagged pages. Without the schema, your video page may not appear at all in that tab.
  3. Key moment chapter markers. With Clip or SeekToAction, Google can show clickable chapter markers in the rich result that deep-link into specific timestamps.
  4. AI engine video answer citation. ChatGPT, Perplexity, and Gemini cite VideoObject-tagged pages when answering "show me a video about X" - including the thumbnail and timestamp link directly in the answer.

What VideoObject schema does not do: rank a poor video higher. The rich result requires accurate schema, but the underlying ranking is still driven by the page's content quality, engagement signals, and how well the video matches the query.

Chapter 3. Required and recommended properties

Per Google's Video structured-data documentation, required: name, description, thumbnailUrl, uploadDate. Recommended floor for the rich result:

{
  "@context": "https://schema.org",
  "@type": "VideoObject",
  "@id": "https://www.example.com/learn/video/sentry-overview#video",
  "name": "Sentry: how the structured-data audit works",
  "description": "A 6-minute walkthrough of how Capconvert's Sentry audits every page on a Shopify store for schema completeness and surfaces a prioritized fix list.",
  "thumbnailUrl": [
    "https://www.example.com/learn/video/sentry-overview/thumb-16x9.jpg",
    "https://www.example.com/learn/video/sentry-overview/thumb-4x3.jpg",
    "https://www.example.com/learn/video/sentry-overview/thumb-1x1.jpg"
  ],
  "uploadDate": "2026-05-15T10:00:00-05:00",
  "duration": "PT6M12S",
  "contentUrl": "https://www.example.com/media/sentry-overview.mp4",
  "embedUrl": "https://www.youtube.com/embed/dQw4w9WgXcQ",
  "publisher": { "@id": "https://www.example.com/#organization" },
  "interactionStatistic": {
    "@type": "InteractionCounter",
    "interactionType": { "@type": "WatchAction" },
    "userInteractionCount": 4218
  },
  "transcript": "Hi, I'm Jacque. Let me show you how the Capconvert Sentry walks every page on your Shopify store..."
}

duration uses ISO 8601 (PT prefix, then hours / minutes / seconds). thumbnailUrl should be an array of crawlable image URLs. interactionStatistic with WatchAction surfaces view counts in the rich result.
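
Most video platforms report length in seconds, so the ISO 8601 conversion described above is usually a small formatting step. The sketch below assumes whole, non-negative seconds and only emits the time-only form (hours, minutes, seconds) used in this guide. The function name is ours, not part of any library.

```python
def seconds_to_iso8601_duration(total_seconds: int) -> str:
    """Format whole seconds as a time-only ISO 8601 duration, e.g. 330 -> PT5M30S."""
    if total_seconds < 0:
        raise ValueError("duration cannot be negative")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    parts = ["PT"]
    if hours:
        parts.append(f"{hours}H")
    if minutes:
        parts.append(f"{minutes}M")
    # Emit seconds when non-zero, or when nothing else was emitted (0 -> PT0S).
    if seconds or len(parts) == 1:
        parts.append(f"{seconds}S")
    return "".join(parts)


print(seconds_to_iso8601_duration(372))  # PT6M12S, the duration in the example above
```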

transcript is the highest-leverage optional field for AI engines. Pasting the full transcript into the schema lets Perplexity, ChatGPT, and Gemini cite specific moments from the video without watching it - which is how they handle most "video-answer" queries.

Chapter 4. Key moments with Clip and SeekToAction

For videos longer than 3 minutes, key moments turn a single video result into a multi-link rich result with clickable chapter markers. Two ways to ship them, depending on how much control you want.

Manual key moments with Clip

Define each chapter explicitly. Most control, most maintenance.

"hasPart": [
  {
    "@type": "Clip",
    "name": "What Sentry actually does",
    "startOffset": 0,
    "endOffset": 95,
    "url": "https://www.example.com/learn/video/sentry-overview?t=0"
  },
  {
    "@type": "Clip",
    "name": "Walking through a 47-page audit",
    "startOffset": 95,
    "endOffset": 240,
    "url": "https://www.example.com/learn/video/sentry-overview?t=95"
  }
]

Auto key moments with SeekToAction

Let Google identify chapters from your YouTube description or embedded chapter markers. Less control, much less maintenance.

"potentialAction": {
  "@type": "SeekToAction",
  "target": "https://www.example.com/learn/video/sentry-overview?t={seek_to_second_number}",
  "startOffset-input": "required name=seek_to_second_number"
}

Use Clip when you have ≤10 well-defined chapters and want to control their names. Use SeekToAction when chapters are already in the YouTube description (Google scrapes them automatically) and you just want Google to deep-link into them.
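
If you go the Clip route, the hasPart array can be generated from a plain chapter list rather than maintained by hand. This is a minimal sketch under a few assumptions: the page accepts a ?t= query parameter (as in the SeekToAction target above), the page URL has no other query string, and each chapter ends where the next one starts. The build_clips name and its arguments are hypothetical.

```python
def build_clips(page_url: str, chapters: list[tuple[int, str]], video_seconds: int) -> list[dict]:
    """Turn (start_second, name) pairs, sorted by start, into schema.org Clip objects."""
    clips = []
    for index, (start, name) in enumerate(chapters):
        # A chapter ends where the next begins; the last one ends with the video.
        is_last = index + 1 == len(chapters)
        end = video_seconds if is_last else chapters[index + 1][0]
        if not 0 <= start < end <= video_seconds:
            raise ValueError(f"chapter {name!r} has an invalid range {start}-{end}")
        clips.append(
            {
                "@type": "Clip",
                "name": name,
                "startOffset": start,
                "endOffset": end,
                # Assumes page_url carries no query string of its own.
                "url": f"{page_url}?t={start}",
            }
        )
    return clips
```

The returned list can be assigned to the hasPart property of the VideoObject before serializing it to JSON.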

Chapter 5. Live streams and transcripts

Live broadcast

For live streams, add a publication property of type BroadcastEvent with isLiveBroadcast: true, startDate, and endDate. Update endDate when the stream actually ends so Google can transition the rich result from "live now" to "ended at X".

"publication": {
  "@type": "BroadcastEvent",
  "isLiveBroadcast": true,
  "startDate": "2026-05-20T16:00:00-05:00",
  "endDate": "2026-05-20T17:00:00-05:00"
}

Notify Google about upcoming live streams with the Indexing API; this is one of the few production use cases for that API. We cover the workflow in our Indexing API guide.
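
The endDate update described above is easy to forget, so it helps to treat it as an explicit step. The sketch below builds the publication object from timezone-aware datetimes and returns an updated copy once the stream actually ends. The function names are hypothetical, and the step that re-publishes the page or calls the Indexing API is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def _require_aware(*moments: datetime) -> None:
    """Reject naive datetimes so every emitted date carries a UTC offset."""
    for moment in moments:
        if moment.tzinfo is None:
            raise ValueError("datetimes must include a timezone")


def live_publication(start: datetime, scheduled_end: datetime) -> dict:
    """Build the BroadcastEvent for a scheduled live stream."""
    _require_aware(start, scheduled_end)
    if scheduled_end <= start:
        raise ValueError("endDate must be after startDate")
    return {
        "@type": "BroadcastEvent",
        "isLiveBroadcast": True,
        "startDate": start.isoformat(),
        "endDate": scheduled_end.isoformat(),
    }


def mark_ended(publication: dict, actual_end: datetime) -> dict:
    """Return a copy with endDate set to when the stream really ended."""
    _require_aware(actual_end)
    updated = dict(publication)
    # Only endDate changes here; isLiveBroadcast is left as originally published.
    updated["endDate"] = actual_end.isoformat()
    return updated


central = timezone(timedelta(hours=-5))
scheduled = live_publication(
    datetime(2026, 5, 20, 16, 0, tzinfo=central),
    datetime(2026, 5, 20, 17, 0, tzinfo=central),
)
ended = mark_ended(scheduled, datetime(2026, 5, 20, 17, 12, tzinfo=central))
```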

Transcripts

Ship the full transcript inline as the transcript string. For long transcripts (>10,000 chars), the schema gets unwieldy - consider hosting the text at a stable URL and describing it with a separate TextObject (schema.org defines no TranscriptObject type) with encodingFormat: text/plain and a URL pointing at the hosted text file. AI engines fetch and parse either form.

Chapter 6. Where do you place VideoObject schema on the site?

Page type | Schema placement
Dedicated video page (the video is the main content) | VideoObject in page <head>, with @id matching the page URL + #video fragment
Blog post with one embedded video | VideoObject inside the same @graph as the BlogPosting, referenced from the BlogPosting via the video property
Course or training page with multiple videos | Array of VideoObjects, optionally wrapped in a parent Course or CreativeWorkSeries graph
YouTube embed only (no first-party video file) | Still ship VideoObject with embedUrl. Don't rely on YouTube's own schema.

Always reference the parent Organization via publisher so Google ties the video to your brand entity. Without it, the rich result loses the publisher label and the Knowledge Panel connection.
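
For the blog-post case, the @graph wiring can be sketched as follows. This is an illustration, not a full BlogPosting: the #article and #video fragments, the function names, and the bare-bones BlogPosting node are assumptions, and a real post would carry more properties (author, datePublished, and so on). The second helper wraps the result in the standard JSON-LD script tag for the page <head>.

```python
import json


def blog_post_graph(post_url: str, headline: str, video: dict) -> dict:
    """Place a BlogPosting and its VideoObject in one @graph, linked by @id."""
    video_id = video.get("@id", f"{post_url}#video")
    # Nodes inside @graph share the top-level @context, so drop any copy on the video.
    video_node = {key: value for key, value in video.items() if key != "@context"}
    video_node["@id"] = video_id
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "BlogPosting",
                "@id": f"{post_url}#article",
                "headline": headline,
                # Reference the video by @id instead of repeating its properties.
                "video": {"@id": video_id},
            },
            video_node,
        ],
    }


def to_script_tag(data: dict) -> str:
    """Serialize JSON-LD for embedding in HTML."""
    # Escape "</" so a string value can never close the script element early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'
```

Referencing by @id keeps a single definition of the video, which also helps avoid the duplicate-entity issue when the same video is described in more than one place on the page.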

Chapter 7. The breakages we see most often

Ranked by frequency across 34 video-publishing audits over the past 24 months:

  • No VideoObject on YouTube embeds, so all rich-result attribution goes to YouTube. 22 of 34.
  • No transcript, removing the AI-engine ingestion path. 19 of 34.
  • No Clip or SeekToAction on videos longer than 5 minutes. 14 of 34.
  • Single-aspect-ratio thumbnail (only 16x9, no 4x3 or 1x1). 11 of 34.
  • duration in human-readable format ("5:30") instead of ISO 8601 ("PT5M30S"). 9 of 34.
  • uploadDate without timezone, so the rich result shows the wrong upload day in some regions. 7 of 34.
  • VideoObject on every page in the site instead of just the canonical video page, causing duplicate-entity confusion. 5 of 34.
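
Several of the breakages above can be caught with a short pre-publish check on the VideoObject dictionary. The sketch below is a rough illustration and not the Sentry rule set: the function name and messages are ours, the duration pattern covers only the time-only form used in this guide (no days or fractional seconds), and it cannot see page-level problems such as a missing VideoObject or duplicates across pages.

```python
import re
from datetime import datetime

REQUIRED = ("name", "description", "thumbnailUrl", "uploadDate")
# Time-only ISO 8601 durations such as PT5M30S, PT1H12M, PT45S.
ISO_DURATION = re.compile(r"PT(?=\d)(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+)S)?")


def find_breakages(video: dict) -> list[str]:
    """Return human-readable problems found in one VideoObject dictionary."""
    problems = [f"missing required property: {key}" for key in REQUIRED if not video.get(key)]
    if not (video.get("contentUrl") or video.get("embedUrl")):
        problems.append("missing contentUrl or embedUrl")
    if not video.get("transcript"):
        problems.append("missing transcript")

    thumbs = video.get("thumbnailUrl")
    if isinstance(thumbs, str) or (isinstance(thumbs, list) and len(thumbs) == 1):
        problems.append("only one thumbnail; three aspect ratios are recommended")

    duration = video.get("duration")
    if duration is not None:
        match = ISO_DURATION.fullmatch(str(duration))
        if match is None:
            problems.append("duration is not ISO 8601")
        else:
            seconds = (
                int(match["h"] or 0) * 3600 + int(match["m"] or 0) * 60 + int(match["s"] or 0)
            )
            has_moments = bool(video.get("hasPart") or video.get("potentialAction"))
            if seconds > 300 and not has_moments:
                problems.append("no Clip or SeekToAction on a video longer than 5 minutes")

    upload = video.get("uploadDate")
    if upload:
        try:
            # Older Python versions do not parse a trailing "Z", so normalize it first.
            parsed = datetime.fromisoformat(str(upload).replace("Z", "+00:00"))
            if parsed.tzinfo is None:
                problems.append("uploadDate has no timezone")
        except ValueError:
            problems.append("uploadDate is not an ISO 8601 date")
    return problems
```

An empty list means none of these specific checks fired; it does not guarantee rich-result eligibility, so the result is best treated as a first pass before running a validator.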

We track every breakage above on running sites through our Sentry structured-data rule set.

FAQ

Should I ship VideoObject if my video is hosted on YouTube?

Yes. The point of VideoObject on your page is to claim attribution for searches that resolve to your URL. Without it, Google may surface the YouTube watch page instead, and the traffic skips your site entirely. Use embedUrl pointing at the YouTube embed URL.

What thumbnail dimensions should I generate?

At minimum 60x30 px. Google recommends three aspect ratios (16x9, 4x3, and 1x1) for max compatibility across rich-result surfaces; we default to 1280x720, 960x720, and 720x720. Each thumbnail URL must be publicly accessible (no auth wall, no robots blocking).

Do I need both contentUrl and embedUrl?

At least one. contentUrl is the direct link to the media file (MP4, WebM). embedUrl is the iframe-embeddable player URL. Self-hosted: contentUrl. YouTube/Vimeo/Wistia: embedUrl. Both is fine if both exist.

Why is my duration validating as invalid?

ISO 8601 format only: "PT5M30S" for 5 minutes 30 seconds, "PT1H12M" for 1 hour 12 minutes, "PT45S" for 45 seconds. Plain text durations like "5:30" or "5 minutes" fail validation. The "PT" prefix is mandatory.

How long should the transcript be?

Full transcript or nothing - partial transcripts mislead AI engines. For videos over 30 minutes, consider linking a TextObject (schema.org has no TranscriptObject type) with the full text hosted at a stable URL rather than inlining 50,000+ characters in the schema.

Do live streams use VideoObject or a different type?

Both: a VideoObject carrying a publication property of type BroadcastEvent with isLiveBroadcast: true. Update endDate when the stream ends so the rich result transitions correctly. Notify Google via the Indexing API for upcoming live streams.

References

  1. Schema.org. "VideoObject." schema.org/VideoObject
  2. Google Search Central. "Video (VideoObject) structured data." developers.google.com/search/docs/appearance/structured-data/video
  3. Schema.org. "Clip." schema.org/Clip
  4. Schema.org. "SeekToAction." schema.org/SeekToAction
  5. Google Search Central. "Livestream (BroadcastEvent) structured data." developers.google.com/search/docs/appearance/structured-data/livestream
  6. Schema.org. "Schema Markup Validator." validator.schema.org