# Extract Website Metadata with an API: Title, Description, OG Tags

Every website carries metadata in its HTML: titles, descriptions, Open Graph tags, favicons, Twitter cards, and more. Extracting this metadata is essential for building **link previews**, **SEO analysis tools**, **competitive intelligence dashboards**, and **content aggregators**.

In this guide, we'll show you how to use the **ToolCenter Metadata API** to extract structured metadata from any URL, with practical examples and real-world use cases.

## What Is Website Metadata?

Website metadata is information embedded in a page's HTML `<head>` section that describes the page's content. It includes:

### Standard HTML Meta Tags

```html
<title>Page Title</title>
<meta name="description" content="Page description">
```

### Open Graph Tags (Facebook/LinkedIn)

```html
<meta property="og:title" content="Page Title">
<meta property="og:description" content="Page description">
<meta property="og:image" content="https://example.com/og-image.jpg">
<meta property="og:url" content="https://example.com">
<meta property="og:type" content="website">
```

### Twitter Card Tags

```html
<meta name="twitter:card" content="summary_large_image">
<meta name="twitter:title" content="Page Title">
<meta name="twitter:description" content="Page description">
<meta name="twitter:image" content="https://example.com/twitter-image.jpg">
```

### Other Metadata

```html
<link rel="canonical" href="https://example.com/">
<link rel="icon" href="https://example.com/favicon.ico">
<meta name="robots" content="index, follow">
<meta name="viewport" content="width=device-width, initial-scale=1">
```

## Why Extract Metadata with an API?

### The DIY Challenge

You could scrape metadata yourself with libraries like BeautifulSoup or Cheerio, but you'll quickly run into problems:

- **JavaScript-rendered pages**: Many modern sites render metadata client-side; simple HTML parsing misses it
- **Redirects and canonical URLs**: Following redirect chains correctly is tricky
- **Rate limiting and blocking**: Sites block scrapers; APIs handle this with rotating proxies
- **Character encoding**: UTF-8, ISO-8859-1, and other encodings need proper handling
- **Malformed HTML**: Real-world HTML is messy; robust parsing is non-trivial

### The API Advantage

The ToolCenter Metadata API handles all of these challenges:

- Renders JavaScript-heavy pages with a real browser
- Follows redirects and resolves canonical URLs
- Handles rate limiting and retries automatically
- Returns clean, structured JSON
- Extracts favicons, OG tags, Twitter cards, and more

## Use Cases

### 1. Link Preview Generation

Build rich link previews like Slack, Discord, or iMessage:

```
┌─────────────────────────────────┐
│ 🌐 example.com                  │
│                                 │
│ Example Domain                  │
│ This domain is for use in       │
│ illustrative examples...        │
│                                 │
│ [Preview Image]                 │
└─────────────────────────────────┘
```

### 2. SEO Analysis

Build an SEO audit tool that checks:

- Does the page have a title? Is it the right length (50-60 chars)?
- Is there a meta description? Is it 150-160 characters?
- Are OG tags properly configured?
- Is there a canonical URL?
- What's the favicon?

### 3. Competitive Analysis

Monitor competitors' pages for:

- Title and description changes (A/B testing detection)
- New OG images (marketing campaign tracking)
- Schema markup changes
- Technology stack detection

### 4. Content Aggregation

Build RSS-like feeds from websites that don't offer RSS:

- Extract titles and descriptions from article pages
- Pull OG images for visual feeds
- Get author and publication dates

### 5. Bookmark Managers

Create rich bookmarks with automatically extracted metadata: title, description, favicon, and preview image.

## ToolCenter Metadata API

### API Endpoint

```
POST https://toolcenter.dev/api/v1/metadata
```

### Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | URL to extract metadata from |

### Response Format

```json
{
  "url": "https://example.com",
  "canonical_url": "https://example.com/",
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "author": null,
  "favicon": "https://example.com/favicon.ico",
  "og": {
    "title": "Example Domain",
    "description": "This domain is for illustrative examples.",
    "image": "https://example.com/og-image.jpg",
    "url": "https://example.com",
    "type": "website",
    "site_name": "Example"
  },
  "twitter": {
    "card": "summary_large_image",
    "title": "Example Domain",
    "description": "This domain is for illustrative examples.",
    "image": "https://example.com/twitter-image.jpg",
    "site": "@example"
  },
  "meta_tags": {
    "viewport": "width=device-width, initial-scale=1",
    "theme-color": "#ffffff",
    "robots": "index, follow"
  },
  "links": {
    "canonical": "https://example.com/",
    "icon": "https://example.com/favicon.ico",
    "apple-touch-icon": "https://example.com/apple-touch-icon.png"
  }
}
```

## Code Examples

### cURL

```bash
curl -X POST "https://toolcenter.dev/api/v1/metadata" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com"}' | jq .
```

### Node.js: Link Preview Builder

```javascript
const ToolCenter = require('devtoolbox-sdk');

const client = new ToolCenter('YOUR_API_KEY');

async function buildLinkPreview(url) {
  const metadata = await client.metadata(url);

  // Prefer Open Graph values, then fall back to standard tags
  return {
    title: metadata.og?.title || metadata.title || 'Untitled',
    description: metadata.og?.description || metadata.description || '',
    image: metadata.og?.image || metadata.twitter?.image || null,
    favicon: metadata.favicon || null,
    siteName: metadata.og?.site_name || new URL(url).hostname,
    url: metadata.canonical_url || url,
  };
}

// Generate a link preview. Top-level await is not available in
// CommonJS modules, so the promise is consumed with .then().
buildLinkPreview('https://github.com')
  .then(preview => console.log(preview))
  .catch(console.error);
// {
//   title: "GitHub: Let's build from here",
//   description: "GitHub is where over 100 million developers...",
//   image: "https://github.githubassets.com/images/modules/site/social-cards/...",
//   favicon: "https://github.githubassets.com/favicons/favicon.svg",
//   siteName: "GitHub",
//   url: "https://github.com"
// }
```

### Python: SEO Analyzer

```python
from devtoolbox import ToolCenter

client = ToolCenter("YOUR_API_KEY")


def analyze_seo(url: str) -> dict:
    """Analyze a URL's SEO metadata and return a report."""
    meta = client.metadata(url=url)
    issues = []
    score = 100

    # Check title ("or" also covers fields returned as null)
    title = meta.get("title") or ""
    if not title:
        issues.append("❌ Missing page title")
        score -= 20
    elif len(title) > 60:
        issues.append(f"⚠️ Title too long ({len(title)} chars, max 60)")
        score -= 5
    elif len(title) < 30:
        issues.append(f"⚠️ Title too short ({len(title)} chars, min 30)")
        score -= 5

    # Check description
    desc = meta.get("description") or ""
    if not desc:
        issues.append("❌ Missing meta description")
        score -= 15
    elif len(desc) > 160:
        issues.append(f"⚠️ Description too long ({len(desc)} chars, max 160)")
        score -= 5

    # Check OG tags
    og = meta.get("og") or {}
    if not og.get("title"):
        issues.append("⚠️ Missing og:title")
        score -= 10
    if not og.get("description"):
        issues.append("⚠️ Missing og:description")
        score -= 10
    if not og.get("image"):
        issues.append("❌ Missing og:image (social shares will look bad)")
        score -= 15

    # Check Twitter card
    twitter = meta.get("twitter") or {}
    if not twitter.get("card"):
        issues.append("⚠️ Missing twitter:card")
        score -= 5

    # Check favicon
    if not meta.get("favicon"):
        issues.append("⚠️ Missing favicon")
        score -= 5

    return {
        "url": url,
        "score": max(0, score),
        "title": title,
        "description": desc,
        "issues": issues,
        "og_complete": bool(
            og.get("title") and og.get("description") and og.get("image")
        ),
    }


# Run analysis
report = analyze_seo("https://example.com")
print(f"SEO Score: {report['score']}/100")
for issue in report["issues"]:
    print(f"  {issue}")
```

### PHP: Bookmark Manager

```php
use ToolCenter\Client;

$client = new Client('YOUR_API_KEY');

// Named functions cannot capture variables with `use`,
// so the client is passed in as an argument.
function createBookmark(Client $client, string $url): array
{
    $metadata = $client->metadata($url);

    return [
        'url' => $url,
        'title' => $metadata['og']['title'] ?? $metadata['title'] ?? 'Untitled',
        'description' => $metadata['og']['description'] ?? $metadata['description'] ?? '',
        'image' => $metadata['og']['image'] ?? $metadata['twitter']['image'] ?? null,
        'favicon' => $metadata['favicon'] ?? null,
        'site_name' => $metadata['og']['site_name'] ?? parse_url($url, PHP_URL_HOST),
        'saved_at' => date('Y-m-d H:i:s'),
    ];
}

// Save a bookmark
$bookmark = createBookmark($client, 'https://github.com');
echo "Saved: {$bookmark['title']}\n";

// Store in database
// DB::table('bookmarks')->insert($bookmark);
```

## Bulk Metadata Extraction

Extract metadata from multiple URLs at once:

```javascript
const ToolCenter = require('devtoolbox-sdk');

const client = new ToolCenter('YOUR_API_KEY');

const urls = [
  'https://github.com',
  'https://stackoverflow.com',
  'https://dev.to',
  'https://hackernews.com',
];

// Wrapped in an async function because top-level await
// is not available in CommonJS modules.
async function main() {
  const results = await client.bulk('metadata',
    urls.map(url => ({ url }))
  );

  results.forEach(result => {
    console.log(`${result.title} - ${result.og?.description || 'No description'}`);
  });
}

main().catch(console.error);
```

## Best Practices

### 1. Cache Results

Metadata doesn't change often. Cache results for 24-48 hours to reduce API calls:

```javascript
const CACHE_TTL_MS = 24 * 60 * 60 * 1000; // 24 hours
const cache = new Map();

async function getMetadata(url) {
  const cached = cache.get(url);
  if (cached && Date.now() - cached.timestamp < CACHE_TTL_MS) {
    return cached.data;
  }

  const data = await client.metadata(url);
  cache.set(url, { data, timestamp: Date.now() });
  return data;
}
```

### 2. Handle Missing Data Gracefully

Not all websites have complete metadata. Always provide fallbacks:

```javascript
const title = metadata.og?.title || metadata.title || new URL(url).hostname;
const image = metadata.og?.image || metadata.twitter?.image || '/default-preview.png';
```

### 3. Validate URLs

Always validate and sanitize URLs before passing them to the API:

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that include a host."""
    try:
        result = urlparse(url)
        return all([result.scheme in ('http', 'https'), result.netloc])
    except ValueError:
        # urlparse raises ValueError for malformed input such as a bad IPv6 host
        return False
```

### 4. Respect Rate Limits

Use bulk endpoints for multiple URLs instead of individual requests. This is faster and more efficient.
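
As a rough illustration of the batching advice above, the sketch below removes duplicate URLs and splits the list into fixed-size batches, so that each batch can go out as one bulk request. The `fetch_bulk` callable is a hypothetical stand-in for whichever bulk call you use, and it is assumed to return one result per URL in the same order. The default batch size of 25 is also an assumption; this guide does not state a per-request limit, so check the API documentation for the real value.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def batched(urls: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield de-duplicated URLs in lists of at most `size`, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    unique = list(dict.fromkeys(urls))  # dict keys keep first-seen order
    for start in range(0, len(unique), size):
        yield unique[start:start + size]


def extract_all(
    urls: Iterable[str],
    fetch_bulk: Callable[[List[str]], List[dict]],  # hypothetical bulk call
    batch_size: int = 25,  # assumed limit, not taken from the ToolCenter docs
) -> Dict[str, dict]:
    """Call `fetch_bulk` once per batch and map each URL to its result."""
    results: Dict[str, dict] = {}
    for batch in batched(urls, batch_size):
        # Assumes results come back in the same order as the batch
        for url, item in zip(batch, fetch_bulk(batch)):
            results[url] = item
    return results
```

De-duplicating before batching means a URL that appears twice in the input costs only one request slot, which matters on the smaller plans.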
## Pricing

Metadata extraction is included in all ToolCenter plans at no extra charge:

| Plan | Price | Monthly Requests |
|---|---|---|
| Free | $0 | 100 |
| Starter | $9/mo | 5,000 |
| Pro | $29/mo | 25,000 |
| Business | $79/mo | 100,000 |

## Conclusion

Extracting website metadata is a fundamental building block for many developer tools, including link previews, SEO analyzers, bookmark managers, and content aggregators. The **ToolCenter Metadata API** makes it simple with a single endpoint that returns clean, structured JSON.

Combined with ToolCenter's screenshot, PDF, QR code, and OG image tools, you have everything you need to build powerful web automation workflows.

**[Extract Metadata Now →](https://toolcenter.dev)**

---

*Check the [API documentation](https://toolcenter.dev/docs) for detailed endpoint references and response schemas.*