Industrial‑first labelling, tied to your SOPs

Generic labels miss context. We encode your definitions of defects, safe states, and pass/fail criteria, then bind expert feedback to exact clips and timestamps.

SOP‑aligned ontology

Shared taxonomy for parts, tools, defects, and outcomes—version‑controlled.

Expert‑in‑the‑loop

Supervisors can annotate live or review passively; disagreements are resolved via a documented playbook.

Export‑ready

COCO / YOLO / Pascal VOC, CSV, JSONL; video event spans and transcripts included.

Label types we support

Bounding boxes

Detect parts, tools, and PPE states in images/video.

Polygons / masks

Instance/semantic segmentation for surface defects and zones.

Keypoints

Pose/landmarks (e.g., hand placement, alignment markers).

Temporal events

Start/stop, interventions, pass/fail, near‑miss windows.

Sequences

Step ordering and compliance with SOP checklists.

3D / point clouds

Annotations for depth/LiDAR where applicable.

OCR & transcripts

Read gauges and screens; align audio transcripts with frames.

Multimodal

Join video, sensor logs, and operator comments.
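Temporal events and sequences are typically exported as JSONL, one event span per line. A minimal sketch of what a single record might look like (file name, field names, and values here are illustrative, not a fixed schema):

```python
import json

# Hypothetical temporal-event record: one SOP step with a pass/fail outcome.
event = {
    "video": "line3_cam2_2024-05-01.mp4",  # illustrative source clip
    "event_type": "torque_check",          # class from the SOP-aligned ontology
    "start_s": 42.8,                       # span start, seconds from video start
    "end_s": 51.2,                         # span end
    "outcome": "pass",                     # pass/fail per SOP criteria
    "annotator": "expert_07",
    "comment": "Torque wrench clicked twice; within spec.",
}

line = json.dumps(event)  # one JSONL line per event
print(line)
```

Operator comments and transcript snippets attach to the same record, which is what lets expert feedback bind to exact clips and timestamps.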

Our process

From raw footage to a reliable dataset, fast.

1) Intake & ontology

Collect SOPs, defect catalogs, and prior examples; define label schema and classes.

2) Guideline authoring

Create visual playbooks with positive/negative examples and edge‑case rules.

3) Pilot & calibration

Label a small slice, measure agreement, refine definitions until stable.

4) Production labelling

Trained annotators label at scale; disagreements routed to experts.

5) QA & gold tests

Inject gold items, run double‑blind checks, and compute agreement metrics.

6) Handoff to training

Export datasets + docs; support model training and error analysis.

Quality assurance

Multi‑pass review

1st pass label → 2nd pass review → lead auditor sign‑off on edge cases.

Agreement metrics

Inter‑annotator agreement and error heatmaps guide retraining.

Gold‑set governance

Seeded checks with precision/recall scored per labeller and class.

We maintain versioned guidelines and changelogs so labels remain consistent over time.
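As a concrete example of an agreement metric, Cohen's kappa compares the observed agreement between two annotators against the agreement expected by chance from their label frequencies. A minimal sketch (the labels below are illustrative):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(a) == len(b) and a
    n = len(a)
    # Observed agreement: fraction of items both annotators labelled identically.
    po = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: sum over classes of the product of marginal frequencies.
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (po - pe) / (1 - pe)

ann1 = ["defect", "ok", "ok", "defect", "ok", "ok"]
ann2 = ["defect", "ok", "defect", "defect", "ok", "ok"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.667
```

A kappa trending toward an agreed target during pilot is a common gate before scaling to production labelling.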

What you get

Labelling guidelines

Illustrated rules with examples, edge cases, and decision trees.

Ontology & schema

Classes, attributes, and relationships—version‑controlled.

Labelled dataset slice

Images/video/audio with annotations + timestamps; transcripts when relevant.

QA report

Agreement, precision/recall on gold items, and error analysis.

Exports & tooling files

COCO, YOLO, Pascal VOC, CSV, JSONL; export scripts if needed.
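For reference, a COCO detection export is a single JSON file with three top-level arrays linked by IDs. A minimal sketch with illustrative values (file names and classes are placeholders):

```python
import json

# Minimal COCO-format detection export (values are illustrative).
coco = {
    "images": [
        {"id": 1, "file_name": "frame_000042.jpg", "width": 1920, "height": 1080}
    ],
    "categories": [
        {"id": 1, "name": "scratch", "supercategory": "surface_defect"}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,       # links to images[].id
            "category_id": 1,    # links to categories[].id
            "bbox": [812.0, 430.0, 96.0, 40.0],  # [x, y, width, height] in pixels
            "area": 3840.0,
            "iscrowd": 0,
        }
    ],
}

with open("annotations.json", "w") as f:
    json.dump(coco, f, indent=2)
```

YOLO and Pascal VOC carry the same boxes in per-image text/XML files instead; export scripts simply re-shape one structure into another.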

Data room setup

Folder structure, permissions, and retention policy aligned to your rules.

How we measure

Throughput

Frames/hour or events/hour by labeller, normalized by class difficulty.

Agreement

Inter‑annotator agreement on a rolling window; trend to target before scale‑up.

Quality

Precision/recall on gold items and auditor spot‑checks.

Latency

Capture‑to‑label and label‑to‑train cycle times.

Coverage

Edge‑case representation and class balance over time.

Cost per unit

Transparent pricing per asset/hour with QA overhead visible.
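Scoring gold items reduces to per-class counting of true positives, false positives, and false negatives for each labeller. A sketch with illustrative labels:

```python
def precision_recall(gold, pred, cls):
    """Per-class precision/recall of one labeller against seeded gold items."""
    tp = sum(g == cls and p == cls for g, p in zip(gold, pred))
    fp = sum(g != cls and p == cls for g, p in zip(gold, pred))
    fn = sum(g == cls and p != cls for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

gold = ["defect", "ok", "defect", "ok", "defect"]
pred = ["defect", "defect", "ok", "ok", "defect"]  # one labeller's answers
p, r = precision_recall(gold, pred, "defect")
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.67
```

Repeating this per labeller and class is what produces the error heatmaps that guide retraining.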

Security & IP

  • Deployment: on‑prem or your private VPC; data never leaves your control.
  • Access control & audit: least‑privilege roles, audit logs, and SSO.
  • Privacy: redaction zones, blur for faces/badges/screens; configurable retention.
  • Ownership: labels and resulting models are your IP; no cross‑customer training.

We align to your compliance requirements and document controls during onboarding.

Tooling & integrations

Use your stack or ours

We can label in your environment or provide a managed stack with dashboards and exports.

Formats & pipelines

COCO / YOLO / Pascal VOC / CSV / JSONL; simple scripts to push to your training jobs.

We also support linking operator comments and transcripts directly to labeled moments for RLHF workflows.

FAQ

Minimum dataset size?

We can start with a small pilot slice (e.g., a few hours of video or a few thousand frames) to stabilize guidelines before scaling.

Who owns the labels?

You do. All annotations, guidelines, and derived datasets are your IP.

Can you label audio and logs?

Yes—transcripts can be aligned to frames; machine logs can be joined for multimodal labels.

Pricing?

Transparent per‑asset or per‑hour rates with QA tiers; we’ll scope during the free consult.

Book Free Consultation