Computer Vision

Computer vision gives machines the ability to extract meaningful information from images, video, and other visual data. In enterprise settings, it turns manual inspection, monitoring, and analysis tasks into automated, consistent, scalable processes.

What it is

Computer vision encompasses any task where a system needs to understand visual input: recognising objects, detecting defects, reading text from images, tracking movement, measuring dimensions, classifying scenes, and segmenting images into meaningful regions.

For a non-technical audience, the simplest way to understand it is this: anywhere a person currently needs to look at something and make a judgement based on what they see, computer vision can potentially do the same thing faster, more consistently, and at scale.

How it works

Modern computer vision is powered by deep neural networks (particularly convolutional neural networks and vision transformers) trained on large datasets of labelled images. The network learns to extract visual features at multiple levels of abstraction, from edges and textures through to complex objects and scenes.
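
The lowest level of that hierarchy, edge detection, can be sketched without any library. The example below is illustrative only: the 3x3 kernel is hand-written (a Sobel-style filter) rather than learned, and the function name, kernel, and toy image are assumptions made for this sketch. In a trained network, many such filters are learned from data and stacked in layers.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the response map.

    This is the cross-correlation that deep learning libraries call "convolution".
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hand-written filter that responds where brightness rises from left to right.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Toy 5x5 greyscale image: dark on the left (0), bright on the right (9).
IMAGE = [
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
    [0, 0, 9, 9, 9],
]

response = convolve2d(IMAGE, VERTICAL_EDGE)
for row in response:
    print(row)  # each row is [36, 36, 0]: strong near the edge, zero in the flat region
```

The response is large where the window straddles the dark-to-bright boundary and zero where the image is uniform, which is the sense in which a filter "detects" an edge.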

For specific applications, models are typically fine-tuned on domain-specific imagery (product photos, medical scans, satellite images, manufacturing line footage) to achieve the accuracy required for production use.
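
One common form of fine-tuning keeps a pretrained feature extractor frozen and trains only a small new classifier on domain images. The sketch below shows that idea in miniature, under loud assumptions: `frozen_features` is a hypothetical stand-in for a pretrained network (it computes two hand-picked statistics), the "images" are tiny grids of brightness values, and the labels are invented for illustration. A real project would use a deep learning framework and far more data.

```python
import math


def frozen_features(image):
    """Hypothetical stand-in for a pretrained backbone. It is never updated.

    Returns [mean brightness, largest jump between horizontally adjacent pixels].
    """
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    max_jump = max(abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1))
    return [mean, max_jump]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(features, labels, steps=1000, lr=1.0):
    """Fit a logistic-regression head with full-batch gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for k, xi in enumerate(x):
                grad_w[k] += error * xi / n
            grad_b += error / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


def predict(image, weights, bias):
    """Return 1 (defect) or 0 (good) for a single image."""
    x = frozen_features(image)
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5 else 0


# Invented domain data: uniform surfaces are good (0), a dark scratch is a defect (1).
domain_images = [
    [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]],
    [[0.6, 0.6, 0.6], [0.6, 0.6, 0.6]],
    [[0.5, 0.0, 0.5], [0.5, 0.5, 0.5]],
    [[0.6, 0.6, 0.1], [0.6, 0.6, 0.6]],
]
domain_labels = [0, 0, 1, 1]

# Only the head's weights are learned; the feature extractor stays fixed.
weights, bias = train_head([frozen_features(img) for img in domain_images], domain_labels)
predictions = [predict(img, weights, bias) for img in domain_images]
```

Training only the head needs relatively few labelled examples, which is why this variant is often a first step before updating more of the network.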

Where it creates real value

Computer vision is most valuable where visual inspection is currently manual, inconsistent, or a bottleneck. Practical examples include quality control and defect detection in manufacturing, document digitisation and data extraction from physical records, security and surveillance with automated alerting, inventory management through visual counting and tracking, medical image analysis to support clinical decision-making, and remote monitoring of infrastructure, equipment, or environments.

Where it is commonly misapplied

Computer vision requires high-quality, representative training data. When the training images do not reflect real-world conditions (different lighting, angles, wear, or variation), performance can degrade sharply in production.
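
A very rough way to spot one kind of mismatch, a lighting change, is to compare average brightness between training images and images captured in production. The sketch below is a minimal illustration, not a complete drift check: the function names, the toy images, and the idea of measuring the shift in training standard deviations are all assumptions made for this example, and a brightness check says nothing about angles, wear, or other variation.

```python
from statistics import mean, pstdev


def mean_brightness(image):
    """Average pixel value of one greyscale image given as a list of rows."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def brightness_shift(train_images, production_images):
    """How many training standard deviations the production mean brightness has moved."""
    train = [mean_brightness(img) for img in train_images]
    prod = [mean_brightness(img) for img in production_images]
    spread = pstdev(train) or 1e-9  # avoid dividing by zero if training images are identical
    return abs(mean(prod) - mean(train)) / spread


def flat(value):
    """Toy 2x2 image with uniform brightness (illustration only)."""
    return [[value, value], [value, value]]


train_images = [flat(v) for v in (90, 100, 110, 120)]  # collected in daylight
night_images = [flat(40), flat(50)]                    # production at night
day_images = [flat(100), flat(110)]                    # production in daylight

night_shift = brightness_shift(train_images, night_images)
day_shift = brightness_shift(train_images, day_images)
```

A large value for `night_shift` alongside a small `day_shift` would suggest that the training set never covered night-time lighting.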

It is also misapplied when the accuracy threshold is unrealistic (no vision system is 100% accurate, so the consequences of errors must be designed for), when edge cases are critical but rare (the long tail of unusual situations is where most failures occur), or when the problem could be solved more reliably with a sensor, barcode, or structured data approach.
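
One way to design for errors is to act automatically only on high-confidence predictions and send everything else to a person. The sketch below assumes the model reports a confidence score between 0 and 1 with each prediction. The function name, the 0.95 threshold, and the sample predictions are hypothetical, and a suitable threshold depends on the cost of each kind of mistake.

```python
def triage(predictions, accept_threshold=0.95):
    """Split (item_id, label, confidence) tuples into automatic decisions and a review queue."""
    automatic, review_queue = [], []
    for item_id, label, confidence in predictions:
        if confidence >= accept_threshold:
            automatic.append((item_id, label))
        else:
            # Keep the confidence so a reviewer can see how unsure the model was.
            review_queue.append((item_id, label, confidence))
    return automatic, review_queue


# Invented model outputs for three inspected parts.
predictions = [
    ("part-001", "ok", 0.99),
    ("part-002", "defect", 0.97),
    ("part-003", "ok", 0.62),
]

automatic, review_queue = triage(predictions)
print(len(automatic), "handled automatically;", len(review_queue), "sent for human review")
```

Raising the threshold sends more items to review and lets fewer errors through automatically. Lowering it does the opposite.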

How it relates to architectural decisions

Computer vision raises architectural questions about edge versus cloud processing (where inference happens, especially for video), data volume and storage (visual data is large and grows quickly), real-time requirements (inspection at manufacturing line speed demands low latency), camera and sensor integration (the input pipeline is as important as the model), and annotation infrastructure (labelling training data is often the most expensive and time-consuming part of a computer vision project).
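
Two of these questions, data volume and latency, can be estimated with simple arithmetic before any model is built. The sketch below is a back-of-envelope calculation under stated assumptions: the camera count, bitrate, line speed, and per-stage timings are invented figures, and the function names are hypothetical.

```python
def daily_storage_gb(cameras, bitrate_mbps, hours_per_day=24):
    """Storage per day in gigabytes (1 GB = 1000 MB) for compressed video streams."""
    seconds = hours_per_day * 3600
    megabits = cameras * bitrate_mbps * seconds
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes


def per_item_budget_ms(items_per_minute):
    """Time available to handle each item if the line is not to be slowed down."""
    return 60_000 / items_per_minute


def fits_budget(stage_times_ms, items_per_minute):
    """True if capture, inference, and any other stages fit within the per-item budget."""
    return sum(stage_times_ms) <= per_item_budget_ms(items_per_minute)


# Assumed figures: 20 cameras at 4 Mbit/s, recording around the clock.
storage = daily_storage_gb(cameras=20, bitrate_mbps=4)
print(storage, "GB per day")

# Assumed figures: 600 items per minute on the line.
budget = per_item_budget_ms(600)
print(budget, "ms per item")

# Assumed stage timings in milliseconds: capture, network round trip, inference.
edge_ok = fits_budget([10, 0, 40], items_per_minute=600)
cloud_ok = fits_budget([10, 80, 40], items_per_minute=600)
```

With these assumed numbers the line allows 100 ms per item, so adding an 80 ms network round trip pushes the cloud option over budget while the edge option fits. Real figures would need to be measured.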

How it connects to other disciplines

Computer vision is built on deep learning, is frequently combined with NLP in multimodal systems (for example, describing what an image contains in natural language), and benefits from MLOps for model deployment and monitoring. Responsible AI is relevant where vision systems are used in surveillance, identification, or any context with privacy implications.
