Aggregation Pipeline · Lesson 4.5

How to write MongoDB aggregation pipelines for real analytics

$facet multi-pipeline, $out and $merge to persist results, allowDiskUse for large datasets, aggregation pipeline optimization, pipeline explain

$facet - multiple sub-pipelines in one query

$facet runs the same input documents through several independent aggregation sub-pipelines and returns all of their results in one query response. This suits analytics dashboards that need several different metrics from the same collection without multiple database round-trips. The result is a single document with one array field per sub-pipeline, so the combined output must fit within the 16 MB BSON document limit.

const dashboard = await db.collection('orders').aggregate([{
  $facet: {
    byStatus: [
      { $group: { _id: '$status', count: { $sum: 1 }, total: { $sum: '$total' } } }
    ],
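    revenueByMonth: [
      // Group by year and month so the same month in different years stays separate
      { $group: {
        _id: { year: { $year: '$createdAt' }, month: { $month: '$createdAt' } },
        revenue: { $sum: '$total' }
      }},
      { $sort: { '_id.year': 1, '_id.month': 1 } }
    ],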
    topCustomers: [
      { $group: { _id: '$userId', spent: { $sum: '$total' } } },
      { $sort: { spent: -1 } }, { $limit: 5 }
    ]
  }
}]).toArray()
console.log(dashboard[0]) // { byStatus: [...], revenueByMonth: [...], topCustomers: [...] }
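
One caveat worth sketching: per the MongoDB documentation, $facet and its sub-pipelines cannot use indexes, so filtering is best done in a $match stage placed before $facet. The helper below is a minimal illustration rather than part of the lesson's own code. The function name buildDashboardPipeline and its date-range parameters are assumptions; the field names (createdAt, status, total, userId) follow the example above.

```javascript
// Hypothetical helper: builds a dashboard pipeline restricted to a date range.
// Filtering before $facet lets the $match use an index on createdAt, if one exists.
function buildDashboardPipeline(from, to) {
  return [
    { $match: { createdAt: { $gte: from, $lt: to } } },
    {
      $facet: {
        byStatus: [
          { $group: { _id: '$status', count: { $sum: 1 }, total: { $sum: '$total' } } }
        ],
        topCustomers: [
          { $group: { _id: '$userId', spent: { $sum: '$total' } } },
          { $sort: { spent: -1 } },
          { $limit: 5 }
        ]
      }
    }
  ]
}

// Usage sketch (assumes `db` is a connected Db handle from the Node.js driver):
// const [dashboard] = await db.collection('orders')
//   .aggregate(buildDashboardPipeline(new Date('2024-01-01'), new Date('2025-01-01')))
//   .toArray()
```

Keeping the pipeline in a plain function also makes it easy to inspect or unit-test without a database connection.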

Persisting pipeline results to a collection

// $out: replace the entire target collection (must be the final stage)
{ $out: 'monthly_revenue_cache' }

// $merge: upsert results into an existing collection (must be the final stage)
{ $merge: { into: 'reports', on: '_id', whenMatched: 'replace', whenNotMatched: 'insert' } }
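
As a rough sketch of how these fragments fit into a complete pipeline, the following builds a monthly revenue rollup and ends with either $out or $merge. The function name buildRevenueRollup, its mode parameter, and the year/month grouping key are illustrative assumptions; the target collection names come from the snippets above.

```javascript
// Hypothetical builder: monthly revenue rollup that persists its output.
// mode 'replace' -> $out   (rewrites the whole target collection)
// mode 'upsert'  -> $merge (replaces documents with a matching _id, inserts the rest)
function buildRevenueRollup(mode) {
  const stages = [
    { $group: {
      _id: { year: { $year: '$createdAt' }, month: { $month: '$createdAt' } },
      revenue: { $sum: '$total' }
    } }
  ]
  const persist = mode === 'replace'
    ? { $out: 'monthly_revenue_cache' }
    : { $merge: { into: 'reports', on: '_id', whenMatched: 'replace', whenNotMatched: 'insert' } }
  return [...stages, persist] // the persisting stage goes last
}

// Usage sketch (assumes `db` is a connected Db handle from the Node.js driver).
// The cursor yields no documents; iterating it with toArray() is what runs the pipeline:
// await db.collection('orders').aggregate(buildRevenueRollup('upsert')).toArray()
```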

Large dataset processing

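// Each pipeline stage may use up to 100 MB of RAM. Before MongoDB 6.0, a stage that
// exceeds the limit fails unless allowDiskUse is set; from 6.0, stages spill to disk by default.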
db.collection('events').aggregate(pipeline, { allowDiskUse: true })
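
To see how the server plans a heavy pipeline before running it for real, the aggregation cursor's explain output can be inspected. The helper below is a sketch under assumptions: the function name is hypothetical, `collection` is assumed to be a Node.js driver Collection, and the shape of the explain document varies by server version, so it is returned as-is rather than parsed.

```javascript
// Hypothetical helper: returns the explain plan for a pipeline with disk spilling enabled.
// 'queryPlanner' (the default here) reports the plan without executing the pipeline;
// 'executionStats' executes it and adds runtime detail.
async function explainLargeAggregation(collection, pipeline, verbosity = 'queryPlanner') {
  const cursor = collection.aggregate(pipeline, { allowDiskUse: true })
  return cursor.explain(verbosity)
}

// Usage sketch (assumes `db` is a connected Db handle and `pipeline` is an array of stages):
// const plan = await explainLargeAggregation(db.collection('events'), pipeline)
// console.dir(plan, { depth: null })
```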

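Use $out for scheduled report generation and $merge for incremental updates to existing reporting collections. The two differ in how they behave on failure. $out builds its results in a temporary collection and swaps it in only when the pipeline completes, so a failed run leaves the existing target collection untouched. $merge writes documents as it goes and does not roll back writes completed before an error, so a failed run can leave the target partially updated.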
