Regex for Data Extraction and Transformation: Lesson 6.2
Extracting emails, phone numbers, and URLs from unstructured text
Extract vs. validate distinction, word-boundary use, overlapping matches, deduplication, running multiple patterns over the same text
Extraction Is Looser Than Validation
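Validation checks whether a string IS a value: the pattern is anchored with ^ and $, so the entire string must match. Extraction finds every occurrence of a value INSIDE a larger string: the ^ and $ anchors are dropped, the g flag collects all matches, and where a pattern needs a hard edge it uses a word boundary (\b) rather than the start or end of the whole string.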
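A minimal sketch of the distinction, assuming Node.js 14 or later (for the ?? operator): the lesson's email body is compiled twice, once anchored for validation and once with the g flag for extraction. The names EMAIL_CORE, EMAIL_VALIDATE, EMAIL_EXTRACT, isEmail, and extractEmails are hypothetical, chosen only for this example.

```javascript
// Hypothetical names: EMAIL_CORE, EMAIL_VALIDATE, EMAIL_EXTRACT, isEmail, extractEmails.
// The unanchored email body, shared by both regexes.
const EMAIL_CORE = String.raw`[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}`;

// Validation: ^...$ forces the pattern to cover the whole string. No g flag,
// because a g regex reused with .test() carries lastIndex between calls.
const EMAIL_VALIDATE = new RegExp(`^(?:${EMAIL_CORE})$`);

// Extraction: no anchors, g flag so match() returns every occurrence.
const EMAIL_EXTRACT = new RegExp(EMAIL_CORE, 'g');

const isEmail = (s) => EMAIL_VALIDATE.test(s);
// match() returns null when nothing matches, so fall back to an empty array.
const extractEmails = (s) => s.match(EMAIL_EXTRACT) ?? [];

console.log(isEmail('sales@example.com'));          // true
console.log(isEmail('Contact sales@example.com'));  // false: extra text around the address
console.log(extractEmails('Contact sales@example.com or support@example.org'));
// → ['sales@example.com', 'support@example.org']
```

The same character classes serve both jobs; only the anchors and the flag change, which is why an extraction pattern is "looser": it accepts a value wherever it appears.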
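```javascript
const text = 'Contact sales@example.com or call 415-555-0000. See https://example.com';
// No ^/$ anchors plus the g flag: return every match; match() returns null on no match, hence ?? [].
const emails = text.match(/[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/g) ?? [];
// \b before an optional '(' fails at ' (' (both non-word chars), which would drop the '(';
// a separate branch for '(415)' keeps '(415) 555-0000' whole.
const phones = text.match(/(?:\(\d{3}\)|\b\d{3})[\s.\-]?\d{3}[\s.\-]?\d{4}\b/g) ?? [];
const urls = text.match(/https?:\/\/[\w.\-]+(?:\/[^\s]*)?/g) ?? [];
// emails: ['sales@example.com'], phones: ['415-555-0000'], urls: ['https://example.com']
```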
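The lesson's remaining topics (running several patterns over the same text, overlapping matches, and deduplication) can be handled in one pass. The sketch below is one possible approach, not the lesson's reference code. PATTERNS, dedupKey, and extractAll are hypothetical names; the phone regex is a variant with a separate branch for a parenthesized area code; and the overlap policy (earliest start wins, longer match on ties) and the normalization rules are assumptions chosen for illustration.

```javascript
// A minimal sketch (hypothetical names: PATTERNS, dedupKey, extractAll).
// Every regex needs the g flag: matchAll() throws a TypeError without it.
const PATTERNS = {
  email: /[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}/g,
  // Phone variant with its own branch for '(415)', so the '(' is not dropped.
  phone: /(?:\(\d{3}\)|\b\d{3})[\s.\-]?\d{3}[\s.\-]?\d{4}\b/g,
  url: /https?:\/\/[\w.\-]+(?:\/[^\s]*)?/g,
};

// Normalize before comparing: phones by digits only, everything else lowercased
// (a simplification: URL paths and email local parts can be case-sensitive).
const dedupKey = (type, value) =>
  type === 'phone' ? `phone:${value.replace(/\D/g, '')}` : `${type}:${value.toLowerCase()}`;

function extractAll(text) {
  // 1. Run every pattern over the same text, recording where each match sits.
  const hits = [];
  for (const [type, re] of Object.entries(PATTERNS)) {
    for (const m of text.matchAll(re)) {
      hits.push({ type, value: m[0], start: m.index, end: m.index + m[0].length });
    }
  }
  // 2. Order by position; when two matches start together, the longer comes first.
  hits.sort((a, b) => a.start - b.start || b.end - a.end);

  // 3. Drop matches that overlap one already kept, then drop repeated values.
  const kept = [];
  const seen = new Set();
  let lastEnd = -1;
  for (const h of hits) {
    if (h.start < lastEnd) continue; // e.g. an email inside a URL's query string
    lastEnd = h.end;
    const key = dedupKey(h.type, h.value);
    if (seen.has(key)) continue; // same value already extracted
    seen.add(key);
    kept.push(h);
  }
  return kept;
}

const sample =
  'Call (415) 555-0000 or 415.555.0000. Mail sales@example.com or SALES@example.com. ' +
  'Form: https://example.com/contact?to=sales@example.com';
console.log(extractAll(sample).map((h) => `${h.type}: ${h.value}`));
```

For this sample the result keeps three entries: the phone (415) 555-0000 (the dotted 415.555.0000 normalizes to the same digits and is dropped), sales@example.com once (SALES@example.com lowercases to the same key), and the full URL. The email inside the URL's query string is discarded because its span overlaps the URL. A per-type priority order is an equally reasonable overlap policy when one kind of match should always win.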
