PNG: The Lossless Graphics Standard

A complete technical deep dive into the PNG file format: compression mechanisms, color handling, transparency, interlacing, and hands-on FFmpeg manipulation techniques.

Overview and History

Portable Network Graphics (PNG) is a lossless raster image format that became a W3C Recommendation in 1996. It was created as a patent-free alternative to GIF, whose LZW compression was covered by patents at the time, and it combines DEFLATE compression with rich metadata support.

Key Features

Lossless compression preserves all pixel data
Bit depths: 1, 2, 4, 8, or 16 bits per sample (up to 64 bits per pixel for 16-bit RGBA)
Full per-pixel alpha transparency (0–255 at 8 bits, 0–65,535 at 16 bits)
Multiple color models (grayscale, RGB, indexed)
Adam7 progressive interlacing
Embedded gamma and color profile data

Typical Use Cases

Web graphics and UI elements
Screenshots without artifacts
Logos, icons, and rasterized vector art
Scientific and medical imaging
Technical diagrams and infographics
Archival and long-term storage

PNG File Structure and Chunks

A PNG file consists of an 8-byte signature followed by a series of chunks. Each chunk contains metadata or image data with built-in error detection via CRC (Cyclic Redundancy Check).

File Signature

Every PNG file begins with an invariant 8-byte signature:

Decimal:  137  80  78  71  13  10  26  10
Hex:       89  50  4E  47  0D  0A  1A  0A
ASCII:      ‰   P   N   G  CR  LF SUB  LF

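This signature serves multiple purposes. It identifies the file as PNG, and it catches common transmission errors: the first byte (0x89) has its high bit set, so a 7-bit channel corrupts it; the CR LF pair and the final LF reveal line-ending conversion by text-mode transfers; and the SUB byte (Ctrl-Z) stops the file from being dumped to the screen on systems that treat it as end-of-file. The letters "PNG" remain readable in hex editors.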
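The checks the signature enables can be sketched in a few lines of Python. The function names and the diagnostic messages are illustrative, not part of any library; the damage heuristics are assumptions about how the two most common corruptions would look.

```python
PNG_SIGNATURE = bytes([137, 80, 78, 71, 13, 10, 26, 10])

def has_png_signature(data: bytes) -> bool:
    """Return True if data starts with the 8-byte PNG signature."""
    return data[:8] == PNG_SIGNATURE

def diagnose_signature(data: bytes) -> str:
    """Guess what went wrong when the signature does not match."""
    head = data[:8]
    if head == PNG_SIGNATURE:
        return "ok"
    # 0x89 with the high bit stripped becomes 0x09.
    if head[:4] == b"\x09PNG":
        return "high bit stripped (7-bit channel?)"
    # A text-mode transfer typically rewrites the CR LF pair.
    if head[:4] == b"\x89PNG" and head[4:6] != b"\r\n":
        return "line endings altered (text-mode transfer?)"
    return "not a PNG"
```

For example, `diagnose_signature(b"\x89PNG\n\x1a\n")` reports altered line endings, because CR LF was collapsed to LF.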

Chunk Structure

All data following the signature is organized into chunks. Each chunk has a fixed 12-byte overhead plus variable data:

┌─────────────────────────┐
│ Length (4 bytes)        │ Big-endian unsigned 32-bit integer
├─────────────────────────┤
│ Chunk Type (4 bytes)    │ ASCII characters (e.g., "IHDR", "IDAT")
├─────────────────────────┤
│ Chunk Data (variable)   │ 0 to (2³¹ − 1) bytes
├─────────────────────────┤
│ CRC (4 bytes)           │ CRC32 of type + data
└─────────────────────────┘
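As a minimal sketch of this layout, the following Python walks the chunks of an in-memory PNG and verifies each CRC. `make_chunk` and `iter_chunks` are hypothetical helper names; only `struct` and `zlib` from the standard library are used, and no bounds checking for truncated files is attempted.

```python
import struct
import zlib

def make_chunk(chunk_type: bytes, data: bytes = b"") -> bytes:
    """Serialize one chunk: length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

def iter_chunks(png: bytes):
    """Yield (type, data, crc_ok) for every chunk after the 8-byte signature."""
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack_from(">I", png, pos)
        chunk_type = png[pos + 4 : pos + 8]
        data = png[pos + 8 : pos + 8 + length]
        (stored_crc,) = struct.unpack_from(">I", png, pos + 8 + length)
        crc_ok = (zlib.crc32(chunk_type + data) & 0xFFFFFFFF) == stored_crc
        yield chunk_type.decode("ascii"), data, crc_ok
        pos += 12 + length  # 4 length + 4 type + 4 CRC + data
```

Note that the length field counts only the data bytes, while the CRC covers the type and the data but not the length.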

Critical Chunks

PNG defines certain chunks as "critical": a decoder that encounters a critical chunk it does not recognize cannot display the image reliably and must treat this as an error. Critical chunk names begin with an uppercase letter.

IHDR (Image Header)

Critical, must be first. Contains 13 bytes:

Width (4 bytes) – pixels
Height (4 bytes) – pixels
Bit Depth (1 byte) – 1, 2, 4, 8, or 16
Color Type (1 byte) – 0, 2, 3, 4, or 6
Compression (1 byte) – always 0 (deflate)
Filter (1 byte) – always 0 (adaptive)
Interlace (1 byte) – 0 (none) or 1 (Adam7)
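The 13-byte layout above maps directly onto a `struct` format string. This is a sketch only: the `IHDR` tuple and `parse_ihdr` are illustrative names, and no validation of the field values is performed.

```python
import struct
from typing import NamedTuple

class IHDR(NamedTuple):
    width: int
    height: int
    bit_depth: int
    color_type: int
    compression: int
    filter_method: int
    interlace: int

def parse_ihdr(data: bytes) -> IHDR:
    """Unpack the IHDR data field: two big-endian uint32 plus five bytes."""
    if len(data) != 13:
        raise ValueError("IHDR data must be exactly 13 bytes")
    return IHDR(*struct.unpack(">IIBBBBB", data))
```

Calling `parse_ihdr` on the data field of the first chunk returns a named tuple, so fields can be read as `header.width` or `header.color_type`.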

IDAT (Image Data)

Critical, contains compressed pixel data. May span multiple chunks, whose data fields are concatenated into a single zlib stream. Data format:

Zlib header (2 bytes)
Deflated scanlines
Adler-32 checksum (4 bytes)
Each scanline (before deflation): filter type (1 byte) + filtered pixel bytes

IEND (Image Trailer)

Critical, must be last. Zero-length chunk marking end of file. CRC value: 0xAE426082.
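Because IEND carries no data, its CRC covers only the four type bytes and is therefore the same in every PNG file. A quick check with Python's standard `zlib` module (the variable names are illustrative):

```python
import struct
import zlib

# CRC-32 over the chunk type alone, since the data field is empty.
iend_crc = zlib.crc32(b"IEND") & 0xFFFFFFFF

# The complete 12-byte IEND chunk: zero length, type, CRC.
iend_chunk = struct.pack(">I", 0) + b"IEND" + struct.pack(">I", iend_crc)
print(hex(iend_crc), iend_chunk.hex())
```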

PLTE (Palette)

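Critical (note the uppercase first letter). Required for color type 3, optional for types 2 and 6, and not allowed for types 0 and 4. Holds 1–256 color entries of 3 bytes each (RGB).

The chunk length must be a multiple of 3. In indexed-color images, each pixel value is an index into this palette.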

Ancillary Chunks

Optional metadata chunks (lowercase first letter = ancillary). Unknown ancillary chunks can be safely ignored.

tRNS (Transparency)

For color types 0, 2, 3: transparency information. Grayscale/RGB specify transparent color; indexed-color lists alpha values per palette entry.

gAMA (Gamma)

4-byte unsigned integer holding the image's encoding gamma × 100,000. Helps keep brightness consistent across displays. Typical: 45455 (1/2.2, matching a display gamma of ≈ 2.2 as in sRGB), 100000 (linear).

cHRM (Chromaticities)

CIE 1931 color space coordinates for RGB primaries and white point. 32 bytes total.

sRGB (sRGB Profile)

1-byte rendering intent (0–3). If present, overrides gAMA/cHRM. Ensures web-standard color.

iCCP (ICC Profile)

Embedded ICC color profile for precise device-independent color. May be large; takes precedence over gAMA and cHRM in decoders that support it.

tIME (Timestamp)

7-byte last-modification time in UTC: year (2 bytes), then month, day, hour, minute, second (1 byte each).

pHYs (Physical Pixel)

Pixels per unit (X, Y, unit). Unit: 0 = unknown (only the pixel aspect ratio is defined), 1 = metre (multiply pixels per metre by 0.0254 to get DPI).
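The DPI conversion is simple arithmetic: one inch is 0.0254 m, so DPI = pixels per metre × 0.0254. A small sketch, assuming the 9-byte pHYs layout described above (two big-endian uint32 values and a unit byte); the function names are hypothetical.

```python
import struct

METRES_PER_INCH = 0.0254

def parse_phys(data: bytes):
    """Return (x_pixels_per_unit, y_pixels_per_unit, unit) from pHYs data."""
    if len(data) != 9:
        raise ValueError("pHYs data must be exactly 9 bytes")
    return struct.unpack(">IIB", data)

def phys_to_dpi(data: bytes):
    """Return (x_dpi, y_dpi), or None when the unit is unknown (0)."""
    x_ppu, y_ppu, unit = parse_phys(data)
    if unit != 1:
        return None  # only an aspect ratio is defined
    return x_ppu * METRES_PER_INCH, y_ppu * METRES_PER_INCH
```

For instance, the commonly seen value 3780 pixels per metre works out to about 96 DPI, and 2835 to about 72 DPI.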

tEXt (Text Data)

Uncompressed text key-value pairs (author, description, copyright, etc.).

zTXt (Compressed Text)

Deflate-compressed text. Useful for large metadata or comments.

Color Types and Bit Depths

PNG supports 5 color types, each with valid bit depth combinations. The color type byte in IHDR determines the image's color model.

Color Type	Description	Bit Depths	Bytes/Pixel
0	Grayscale	1, 2, 4, 8, 16	1/8 to 2
2	Truecolor (RGB)	8, 16	3 to 6
3	Indexed-color (palette)	1, 2, 4, 8	1/8 to 1
4	Grayscale + Alpha	8, 16	2 to 4
6	Truecolor + Alpha (RGBA)	8, 16	4 to 8
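The table translates into a small lookup that validates a color type and bit depth pair and computes how many bytes one scanline occupies before filtering. This is a sketch with illustrative names; sub-byte pixels are packed, so the bit count is rounded up to whole bytes.

```python
# Samples per pixel for each color type.
CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}

# Bit depths allowed for each color type.
VALID_DEPTHS = {
    0: (1, 2, 4, 8, 16),
    2: (8, 16),
    3: (1, 2, 4, 8),
    4: (8, 16),
    6: (8, 16),
}

def scanline_bytes(width: int, color_type: int, bit_depth: int) -> int:
    """Bytes per unfiltered scanline, excluding the filter-type byte."""
    if bit_depth not in VALID_DEPTHS.get(color_type, ()):
        raise ValueError(f"invalid combination: type {color_type}, depth {bit_depth}")
    bits = width * CHANNELS[color_type] * bit_depth
    return (bits + 7) // 8  # round up to a whole byte
```

An 800-pixel-wide 8-bit RGBA row takes 3,200 bytes, while a 10-pixel-wide 1-bit grayscale row takes only 2.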

Bit Depth Details

Bit depth defines the number of bits per sample (grayscale or color channel):

1-bit – 2 colors (black and white). Ideal for line art, binary masks.
2-bit – 4 colors. Rarely used; old computer graphics.
4-bit – 16 colors (or 16 gray levels). Historical graphics.
8-bit – 256 colors (or 256 gray levels). Web-standard for indexed images; standard for grayscale.
16-bit – 65,536 levels per channel. Professional photography, scientific imaging; larger files.

PNG Compression Mechanism

PNG employs a two-stage compression pipeline: filtering (prediction) followed by DEFLATE (LZ77 + Huffman). This hybrid approach achieves excellent compression without information loss.

Stage 1: Filtering (Prediction)

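Before compression, each scanline (pixel row) is preprocessed with one of 5 filter algorithms. Filters operate on bytes rather than pixels: each byte is predicted from the corresponding byte of the pixel to the left, above, or upper-left, and the difference is stored modulo 256. The filter type byte is prepended to each filtered scanline before compression. Filters exploit spatial correlation (adjacent pixels are often similar), turning the data into small, repetitive values that DEFLATE compresses well.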

Filter 0: None

No transformation; raw pixel values stored. Useful when pixels are already uncorrelated (e.g., random noise, high-frequency content).

Output[x] = Original[x]

Filter 1: Sub

Subtracts the corresponding byte of the pixel to the left, where bpp is the number of bytes per pixel (at least 1). Effective for gradual horizontal transitions and photographs.

Output[x] = Original[x] − Original[x − bpp]

Filter 2: Up

Subtracts the pixel above. Exploits vertical correlation; works well for vertical gradients.

Output[x] = Original[x] − Original_above[x]

Filter 3: Average

Subtracts the average of left and above neighbors. Balances horizontal and vertical correlation.

Output[x] = Original[x] − ⌊(Left + Up) / 2⌋

Filter 4: Paeth

Paeth predictor: computes the estimate p = Left + Up − UpLeft and predicts with whichever of the three neighbors is closest to p. The most elaborate filter; often the best choice for graphics.

Output[x] = Original[x] − PaethPredictor(Left, Up, UpLeft)
Predictor = the v ∈ {Left, Up, UpLeft} that minimizes |p − v|, with ties broken in the order Left, Up, UpLeft
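The five filters can be sketched in Python as follows. This is an illustrative implementation, not code from any particular library: `bpp` is the number of bytes per complete pixel (minimum 1), `prev` is the unfiltered previous scanline (all zeros for the first row), and all arithmetic is modulo 256.

```python
def paeth_predictor(left: int, up: int, up_left: int) -> int:
    """Pick the neighbor closest to p = left + up - up_left."""
    p = left + up - up_left
    pa, pb, pc = abs(p - left), abs(p - up), abs(p - up_left)
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return up
    return up_left

def _predict(filter_type: int, a: int, b: int, c: int) -> int:
    # a = left, b = up, c = upper-left
    if filter_type == 0:
        return 0
    if filter_type == 1:
        return a
    if filter_type == 2:
        return b
    if filter_type == 3:
        return (a + b) // 2
    if filter_type == 4:
        return paeth_predictor(a, b, c)
    raise ValueError("filter type must be 0-4")

def filter_scanline(filter_type: int, row: bytes, prev: bytes, bpp: int) -> bytes:
    """Apply one filter to a raw scanline."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        out[i] = (x - _predict(filter_type, a, prev[i], c)) & 0xFF
    return bytes(out)

def unfilter_scanline(filter_type: int, filtered: bytes, prev: bytes, bpp: int) -> bytes:
    """Invert filter_scanline; predictions use already-reconstructed bytes."""
    row = bytearray(len(filtered))
    for i, x in enumerate(filtered):
        a = row[i - bpp] if i >= bpp else 0
        c = prev[i - bpp] if i >= bpp else 0
        row[i] = (x + _predict(filter_type, a, prev[i], c)) & 0xFF
    return bytes(row)
```

On a horizontal ramp such as `bytes([10, 20, 30, 40])` with `bpp=1`, the Sub filter produces `10, 10, 10, 10`, which is exactly the kind of repetitive output DEFLATE handles well.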

Stage 2: DEFLATE Compression

After filtering, the entire preprocessed image data (all scanlines concatenated) is compressed with DEFLATE. DEFLATE combines:

DEFLATE Algorithm

LZ77 (Lempel-Ziv) – A sliding window algorithm that replaces repeated sequences with back-references (offset, length pairs). Typical window: 32 KB. If a sequence appears multiple times, only store distance to previous occurrence + length.
Huffman Coding – Assigns variable-length binary codes to symbols: frequent values get short codes, rare values get long codes. Reduces average bits per symbol.
Zlib Wrapper – PNG IDAT chunks use zlib format: 2-byte header (method, flags) + compressed data + 4-byte Adler-32 checksum.
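The zlib wrapper can be inspected with Python's standard `zlib` module. The sketch below compresses one tiny filtered scanline, decodes the two header bytes, reads the trailing Adler-32, and shows that a stream split across several IDAT payloads must be rejoined before inflating. The sample data and the split point are arbitrary choices for illustration.

```python
import struct
import zlib

# One scanline: filter type 0 followed by 16 grayscale bytes.
scanlines = bytes([0]) + bytes(range(16))
stream = zlib.compress(scanlines, 9)

cmf, flg = stream[0], stream[1]
method = cmf & 0x0F             # 8 means DEFLATE
window_bits = (cmf >> 4) + 8    # 15 means a 32 KB window
header_ok = (cmf * 256 + flg) % 31 == 0  # header check required by zlib

# The last four bytes are the big-endian Adler-32 of the uncompressed data.
(adler,) = struct.unpack(">I", stream[-4:])

# An encoder may cut the stream anywhere; a decoder joins the IDAT data first.
idat_payloads = [stream[:5], stream[5:]]
restored = zlib.decompress(b"".join(idat_payloads))
```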

Encoder Optimization

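PNG encoders and dedicated optimizers (e.g., libpng-based tools, ImageMagick, OptiPNG, pngcrush) use multiple strategies to minimize output size:

Choose a filter for each scanline, either by trial compression or with a cheap heuristic such as the minimum sum of absolute differences suggested in the PNG specification.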
Optimize DEFLATE parameters (compression level, block size) based on image type (photo vs. graphics).
For indexed-color images, analyze palette and remap colors to minimize entropy.
Bit-depth reduction (e.g., 8-bit → 4-bit) if image contains ≤ 16 unique colors.
Strip ancillary chunks not needed for display (metadata, gamma, color profiles) for smaller files.
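Per-scanline filter selection is often done with a heuristic rather than by compressing every candidate. The sketch below implements the minimum-sum-of-absolute-differences heuristic: each filtered byte is read as a signed value and the filter with the smallest total magnitude wins. All names are illustrative, and real encoders may weight or override this choice.

```python
def _paeth(a: int, b: int, c: int) -> int:
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    return b if pb <= pc else c

def apply_filter(ft: int, row: bytes, prev: bytes, bpp: int) -> bytes:
    """Filter one scanline with filter type ft (0-4)."""
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0   # left
        b = prev[i]                           # up
        c = prev[i - bpp] if i >= bpp else 0  # upper-left
        pred = (0, a, b, (a + b) // 2, _paeth(a, b, c))[ft]
        out[i] = (x - pred) & 0xFF
    return bytes(out)

def signed_cost(filtered: bytes) -> int:
    """Sum of magnitudes, treating each byte as a signed 8-bit value."""
    return sum(v if v < 128 else 256 - v for v in filtered)

def choose_filter(row: bytes, prev: bytes, bpp: int) -> int:
    """Return the filter type with the lowest cost (lowest type wins ties)."""
    return min(range(5), key=lambda ft: signed_cost(apply_filter(ft, row, prev, bpp)))
```

A horizontal ramp selects Sub (type 1), while a row identical to the one above it selects Up (type 2), since its filtered output is all zeros.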

Adam7 Interlacing

PNG supports the Adam7 interlacing algorithm, enabling progressive display. Instead of storing pixels left-to-right, top-to-bottom, Adam7 rearranges them into 7 passes. After the first pass, a low-resolution preview is visible; each pass refines the image.

Adam7 Passes

Each pass targets specific pixel positions in an 8×8 repeating pattern:

Pass	X Start	X Step	Y Start	Y Step	Resolution
1	0	8	0	8	1/8 × 1/8
2	4	8	0	8	2/8 × 1/8
3	0	4	4	8	2/8 × 2/8
4	2	4	0	4	4/8 × 2/8
5	0	2	2	4	4/8 × 4/8
6	1	2	0	2	8/8 × 4/8
7	0	1	1	2	8/8 × 8/8
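The start and step values in the table are enough to compute the dimensions of each pass for any image size. A minimal sketch, with the table hard-coded as `ADAM7` and hypothetical function names:

```python
# (x_start, x_step, y_start, y_step) for passes 1-7.
ADAM7 = [
    (0, 8, 0, 8),
    (4, 8, 0, 8),
    (0, 4, 4, 8),
    (2, 4, 0, 4),
    (0, 2, 2, 4),
    (1, 2, 0, 2),
    (0, 1, 1, 2),
]

def pass_sizes(width: int, height: int):
    """Return a list of (pass_width, pass_height) for the seven passes."""
    sizes = []
    for x0, dx, y0, dy in ADAM7:
        # Count positions x0, x0 + dx, ... that fall inside the image.
        w = max(0, (width - x0 + dx - 1) // dx)
        h = max(0, (height - y0 + dy - 1) // dy)
        sizes.append((w, h))
    return sizes

def cumulative_fraction(width: int, height: int):
    """Fraction of all pixels delivered after each pass."""
    total = width * height
    done, fractions = 0, []
    for w, h in pass_sizes(width, height):
        done += w * h
        fractions.append(done / total)
    return fractions
```

For an 8×8 block the passes carry 1, 1, 2, 4, 8, 16, and 32 pixels, so the first pass delivers 1/64 of the image and the last pass alone delivers half of it.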

Progressive Display Benefits

Pass 1 alone provides a 1/8 × 1/8 resolution preview from just 1/64 (≈ 1.6%) of the pixels, available after only a small fraction of the file has arrived.
Passes 1–3 yield 2/8 × 2/8 resolution (a quarter-scale preview) from 4/64 (6.25%) of the pixels.
Passes 1–4 reach 4/8 × 2/8 resolution from 8/64 (12.5%) of the pixels. By this point, the user sees the general image structure.
All 7 passes complete the full resolution. On slow connections, the image refines incrementally instead of leaving a blank area.

Trade-offs

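Adam7 interlacing usually increases file size, often by 5–10% or more: pixels within a pass are farther apart and therefore less correlated, so filtering is less effective, and each pass carries its own filter-type bytes. Use interlacing when progressive display over slow connections matters; disable it for local viewing or archival.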

Transparency and Alpha Channels

PNG supports two complementary transparency mechanisms: embedded alpha channels (for color types 4 and 6) and the tRNS chunk (for types 0, 2, 3).

Alpha Channel (Types 4 & 6)

Color types 4 and 6 include a full alpha channel: an 8-bit (or 16-bit) value per pixel, where 0 = fully transparent and the maximum (255, or 65,535 at 16 bits) = fully opaque. Intermediate values enable anti-aliased edges. PNG alpha is not premultiplied.

Type 4 (Grayscale+Alpha) – Each pixel: grayscale (8 or 16 bits) + alpha (same bit depth). Useful for masks and overlays.
Type 6 (RGBA) – Each pixel: red + green + blue + alpha (8 or 16 bits each). Standard for web graphics with transparency.

tRNS Chunk (Types 0, 2, 3)

For images without built-in alpha, tRNS specifies either a single "transparent" color (types 0 and 2) or per-palette-entry alpha (type 3):

Type 0 (Grayscale) – tRNS contains a 2-byte grayscale value; pixels matching this value are transparent.
Type 2 (RGB) – tRNS contains three 2-byte samples (R, G, B); pixels matching (r, g, b) are transparent.
Type 3 (Indexed) – tRNS lists an alpha value (0–255) for each palette entry; enables gradual transparency on paletted images.
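For indexed images, combining PLTE and tRNS yields an RGBA palette. The sketch below assumes the raw data fields of both chunks are already available as bytes; `palette_to_rgba` is a hypothetical helper. tRNS may list fewer entries than the palette, in which case the remaining entries are fully opaque.

```python
def palette_to_rgba(plte: bytes, trns: bytes = b""):
    """Merge PLTE (RGB triples) and tRNS (alpha bytes) into RGBA tuples."""
    if len(plte) % 3 != 0:
        raise ValueError("PLTE length must be a multiple of 3")
    entries = []
    for i in range(len(plte) // 3):
        r, g, b = plte[3 * i : 3 * i + 3]
        # Palette entries beyond the end of tRNS default to opaque.
        a = trns[i] if i < len(trns) else 255
        entries.append((r, g, b, a))
    return entries
```

Encoders often sort the palette so that transparent entries come first, which keeps the tRNS chunk as short as possible.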

Alpha Composition

When rendering PNG with alpha, viewers composite the image over a background using the standard formula:

Output = (Foreground × Alpha) + (Background × (1 − Alpha))

PNG does not mandate a background color (the optional bKGD chunk only suggests one); each viewer chooses, typically white or the page background. To guarantee a consistent appearance, images destined for the web are sometimes pre-composited against white or a specific background color, which sacrifices the alpha channel.
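The compositing formula can be written out per channel. In this sketch alpha is normalized by the maximum sample value (255 for 8-bit, 65,535 for 16-bit); `composite_over` is an illustrative name, and the blend is done directly on stored sample values without any gamma handling.

```python
def composite_over(fg, bg, alpha: int, max_value: int = 255):
    """Blend a foreground pixel over a background pixel, channel by channel."""
    a = alpha / max_value  # normalize alpha to the range 0.0-1.0
    return tuple(round(f * a + b * (1 - a)) for f, b in zip(fg, bg))
```

Pre-compositing against white, as described above, amounts to calling `composite_over(pixel_rgb, (255, 255, 255), pixel_alpha)` for every pixel and discarding the alpha channel afterwards.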

Gamma Correction and Color Spaces

PNG stores color-critical metadata to help images display consistently across different devices. Gamma information, color space definitions, and embedded profiles describe how stored sample values map to light intensity and to a device-independent color space.

Gamma Basics

Gamma (γ) is a non-linear mapping between pixel values and display brightness:

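Display_Intensity = Input_Value ^ γ

Both quantities are normalized to the range 0–1. A gamma of 2.2 approximates sRGB (the typical web and Windows default). macOS historically used 1.8. Cameras encode images with the inverse curve (exponent ≈ 1/γ) so that the result looks correct on the target display; the stored samples are therefore not linear light.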

gAMA Chunk

The gAMA chunk stores a single 4-byte unsigned integer: the encoding exponent (1 / display γ) × 100,000:

45455 – 1/2.2, the usual approximation for sRGB content (display γ ≈ 2.2)
55556 – 1/1.8, matching the historical Mac display gamma of 1.8
100000 – Linear (no gamma encoding)
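The stored integers follow from simple arithmetic, assuming the chunk holds the reciprocal of the intended display gamma scaled by 100,000. A sketch with hypothetical helper names:

```python
import struct

def encode_gama(display_gamma: float) -> bytes:
    """Build the 4-byte gAMA data field for a given display gamma."""
    return struct.pack(">I", round(100000 / display_gamma))

def decode_gama(data: bytes) -> float:
    """Return the display gamma implied by a gAMA data field."""
    (scaled,) = struct.unpack(">I", data)
    return 100000 / scaled
```

For example, `encode_gama(2.2)` stores 45455 and `encode_gama(1.8)` stores 55556, while a linear image stores exactly 100000.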

sRGB Chunk

If present, sRGB chunk (1 byte) overrides gAMA and cHRM, indicating the image is in standard sRGB color space. Rendering intent:

0 – Perceptual
1 – Relative colorimetric
2 – Saturation
3 – Absolute colorimetric

cHRM Chunk (Chromaticities)

Defines the RGB color space by specifying CIE 1931 chromaticity coordinates for the three primaries and white point. Each value is 4 bytes (× 100,000):

sRGB Primaries:
Red:       (0.64, 0.33)     → (64000, 33000)
Green:     (0.30, 0.60)     → (30000, 60000)
Blue:      (0.15, 0.06)     → (15000, 6000)
White D65: (0.3127, 0.329)  → (31270, 32900)
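Packing these values into the 32-byte cHRM data field is a matter of scaling and ordering. The sketch assumes the chunk stores the white point first, followed by red, green, and blue, each as an (x, y) pair of big-endian 32-bit integers; `encode_chrm` is an illustrative name.

```python
import struct

def encode_chrm(white, red, green, blue) -> bytes:
    """Scale eight chromaticity coordinates by 100,000 and pack them."""
    values = [round(v * 100000) for pair in (white, red, green, blue) for v in pair]
    return struct.pack(">8I", *values)

# sRGB primaries and D65 white point from the table above.
srgb_chrm = encode_chrm(
    white=(0.3127, 0.329),
    red=(0.64, 0.33),
    green=(0.30, 0.60),
    blue=(0.15, 0.06),
)
```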

ICC Profile (iCCP Chunk)

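For maximum color accuracy, PNG can embed a full ICC color profile, a binary blob describing device-specific color transformations. It takes precedence over gAMA and cHRM; the iCCP and sRGB chunks should not both be present. Simple RGB profiles are typically a few kilobytes, while print profiles can be considerably larger. Used in professional photography and print workflows.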

PNG Manipulation and Experiments via FFmpeg

FFmpeg is a versatile multimedia toolkit that can read, decode, filter, encode, and manipulate PNG files. Below are comprehensive techniques for exploring and experimenting with PNGs.

Basic Information and Inspection

Inspect PNG metadata and properties

ffprobe -v error -show_streams -show_format image.png

Outputs detailed metadata: dimensions, pixel format (color type), bit depth, color space, frame rate (for APNG), etc. in a structured format.

JSON output for scripting

ffprobe -v error -print_format json -show_format -show_streams image.png | jq

Parses metadata as JSON; useful for programmatic workflows.

View detailed stream information

ffmpeg -i image.png 2>&1 | grep -E "Stream|Duration|pixel|kb/s"

FFmpeg prints color model, resolution, and estimated bitrate on stderr.

Format Conversion and Encoding

Convert PNG to JPEG

ffmpeg -i image.png -q:v 2 output.jpg

Lossy conversion. For FFmpeg's JPEG encoder, -q:v ranges 2–31 (2 = highest quality, 31 = lowest); it is a quantizer scale, not a quality percentage.

Convert PNG to WebP

ffmpeg -i image.png -c:v libwebp -q:v 80 output.webp

WebP often produces smaller files than JPEG or PNG at comparable quality. Here -q:v is a 0–100 quality value (higher = better). Use -lossless 1 for lossless WebP.

Convert PNG to AVIF (next-gen)

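ffmpeg -i image.png -c:v libaom-av1 -crf 32 -still-picture 1 output.avif

AVIF offers excellent compression. libaom-av1 is controlled with -crf rather than -q:v; lower -crf = better quality (range 0–63). Requires a recent FFmpeg (5.1 or later) built with libaom.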

Convert to different color types

# Convert to grayscale
ffmpeg -i image.png -pix_fmt gray output_gray.png

# Convert to RGB (force truecolor)
ffmpeg -i image.png -pix_fmt rgb24 output_rgb.png

# Convert to RGBA (add alpha channel)
ffmpeg -i image.png -pix_fmt rgba output_rgba.png

# Convert to indexed (paletted) with a palette generated from the image itself
ffmpeg -i image.png -vf "split[a][b];[a]palettegen[p];[b][p]paletteuse" output_indexed.png

Bit Depth Manipulation

Convert to 8-bit per channel

ffmpeg -i input_16bit.png -pix_fmt rgb24 output_8bit.png

Reduces file size significantly (rgb48 → rgb24).

Expand to 16-bit per channel

ffmpeg -i input_8bit.png -pix_fmt rgb48be output_16bit.png

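Increases sample precision without adding new detail; useful as a working format in professional workflows. PNG stores 16-bit samples big-endian, so FFmpeg's PNG encoder takes rgb48be regardless of the CPU; if rgb48le is requested, FFmpeg converts it before encoding.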

Grayscale bit depth variants

# 16-bit grayscale
ffmpeg -i input.png -pix_fmt gray16be output_16bit_gray.png

# 8-bit grayscale
ffmpeg -i input.png -pix_fmt gray output_8bit_gray.png

Resizing and Scaling

Scale to fixed dimensions

ffmpeg -i input.png -vf "scale=800:600" output.png

Resizes to exactly 800×600. May distort if aspect ratio differs.

Scale with aspect ratio preservation

ffmpeg -i input.png -vf "scale=800:600:force_original_aspect_ratio=decrease" output.png

Preserves aspect; reduces dimensions to fit within 800×600 box.

Scale by percentage

ffmpeg -i input.png -vf "scale=iw*0.5:ih*0.5" output_50percent.png

iw = input width, ih = input height.

Cropping and Padding

Crop a rectangular region

ffmpeg -i input.png -vf "crop=400:300:100:50" output.png

Crop 400×300 pixels starting at (x=100, y=50).

Pad with color (add border)

ffmpeg -i input.png -vf "pad=1000:1000:(ow-iw)/2:(oh-ih)/2:black" output.png

Pads to 1000×1000, centering original image on black background. Use white, 0x00FF00 (green), or 0xFFFFFF.

Color and Brightness Adjustments

Adjust brightness and contrast

ffmpeg -i input.png -vf "eq=brightness=0.1:contrast=1.2" output.png

brightness range: −1 to 1 (−1 = pitch black, 0 = original, 1 = overexposed). contrast: 1 = original.

Saturation adjustment

ffmpeg -i input.png -vf "hue=s=1.5" output.png

s=1.5 boosts saturation by 50%; s=0 = grayscale.

Rotate hue (shift colors)

ffmpeg -i input.png -vf "hue=h=120" output.png

Rotates all hues by 120°. h=180 maps each hue to its complement (not a photographic negative); h=90 rotates by 90°.

Gamma correction

ffmpeg -i input.png -vf "eq=gamma=2.2" output.png

Apply gamma curve. Values > 1 brighten; < 1 darken.

Filtering and Effects

Blur

ffmpeg -i input.png -vf "boxblur=5" output.png

Box blur with 5-pixel radius. For Gaussian blur: -vf "gblur=sigma=2".

Sharpen

ffmpeg -i input.png -vf "unsharp=5:5:1.5" output.png

Unsharp mask with 5×5 kernel and 1.5× strength.

Edge detection

ffmpeg -i input.png -vf "sobel" output.png

Sobel edge detector. Related filters: prewitt, roberts, kirsch, scharr, and edgedetect (Canny-style).

Denoise

ffmpeg -i input_noisy.png -vf "nlmeans=s=5:p=7:r=15" output_denoised.png

Non-local means denoise. Parameters: s (strength), p (patch size), r (research window size); p and r must be odd.

Transparency and Alpha Manipulation

Add alpha channel if missing

ffmpeg -i input.png -pix_fmt rgba output_rgba.png

Converts RGB → RGBA; opaque alpha (255) everywhere.

Make specific color transparent

ffmpeg -i input.png -vf "colorkey=0x00FF00:0.1:0.5" output.png

Key out green (0x00FF00). Similarity tolerance: 0.1; blend: 0.5.

Adjust alpha channel opacity

ffmpeg -i input_rgba.png -vf "format=rgba,colorchannelmixer=aa=0.5" output.png

Multiplies the alpha channel by 0.5 (more transparent). For an arbitrary curve, use the lut filter with an alpha expression, e.g. lut=a='val*0.5'.

Composite PNG over background

ffmpeg -i background.png -i foreground_alpha.png -filter_complex "overlay=(W-w)/2:(H-h)/2" output.png

Places foreground (centered) over background, respecting alpha.

Batch Processing and Frame Extraction

Extract frames from video as PNG sequence

ffmpeg -i input.mp4 -vf "fps=fps=1" frames_%04d.png

Extracts 1 frame per second: frames_0001.png, frames_0002.png, etc.

Extract specific frame at timestamp

ffmpeg -ss 00:01:30 -i input.mp4 -vframes 1 frame_at_1m30s.png

Grabs frame at 1 minute 30 seconds.

Create video from PNG sequence

ffmpeg -framerate 24 -i frames_%04d.png -c:v libx264 -pix_fmt yuv420p output.mp4

Combines PNGs into H.264 video at 24 fps.

Metadata and Embedding

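View PNG metadata

ffprobe -v error -show_entries frame_tags image.png

Lists the text chunks that FFmpeg exposes as frame tags. Color-related fields (color space, transfer characteristics) appear in the stream section printed by -show_streams.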

Copy PNG with new metadata (via FFmpeg)

ffmpeg -i input.png -metadata title="My Image" -metadata comment="Example" -codec copy output.png

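Copies the PNG data without re-encoding (lossless). Caution: with -codec copy the original chunks pass through byte for byte, so the -metadata tags are generally not written as tEXt chunks. Verify the result with ffprobe, and use a dedicated metadata tool (for example ExifTool) when the tags must be embedded.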

Strip all metadata

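ffmpeg -i input.png -map_metadata -1 output.png

Re-encodes the image (still lossless for the pixels) without carrying over text metadata. With -codec copy the original chunks would pass through unchanged, so nothing would be stripped. The encoder may still write a few ancillary chunks of its own, such as pHYs or color information.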

Combining Multiple PNGs

Horizontal concat (side-by-side)

ffmpeg -i left.png -i right.png -filter_complex "hstack" output.png

Places two PNGs side by side. Both inputs must have the same height.

Vertical concat (stacked)

ffmpeg -i top.png -i bottom.png -filter_complex "vstack" output.png

Stacks two PNGs vertically. Both inputs must have the same width.

Grid layout (montage)

ffmpeg -i image1.png -i image2.png -i image3.png -i image4.png   -filter_complex "
    [0:v][1:v]hstack=inputs=2[top];
    [2:v][3:v]hstack=inputs=2[bottom];
    [top][bottom]vstack=inputs=2[v]
  " -map "[v]" output.png

Arranges four PNGs in a 2×2 grid.

Advanced: APNG (Animated PNG)

Create APNG from PNG sequence

ffmpeg -framerate 10 -i frame_%04d.png -plays 0 -c:v apng output.apng

Generates an APNG that loops forever (-plays 0). The apng encoder is built into FFmpeg; no external APNG library is required.

Extract frames from APNG

ffmpeg -i animated.apng frame_%04d.png

Extracts each animation frame as individual PNG files.

Complex Filter Graph Example

Here's an advanced example combining multiple operations:

ffmpeg -i input.png   -vf "
    scale=1200:800:force_original_aspect_ratio=decrease,
    pad=1200:800:(ow-iw)/2:(oh-ih)/2:white,
    eq=brightness=0.05:contrast=1.1,
    unsharp=5:5:1.0
  "   -c:v png output_processed.png

This workflow: (1) scales respecting aspect ratio, (2) pads to 1200×800 with white borders, (3) adjusts brightness/contrast, (4) sharpens, (5) encodes as PNG.

PNG vs. Other Formats

Property	PNG	JPEG	WebP	GIF
Compression	Lossless	Lossy	Lossy/Lossless	Lossless (indexed)
Bits per Pixel	1–64	8 or 24 (typical)	24 or 32	1–8 (indexed)
Transparency	Full alpha	None	Full alpha	1-bit (indexed)
Animation	APNG	None	WebP anim	Native
Metadata	Rich (chunks)	EXIF/IPTC	EXIF/XMP	Limited
Color Profiles	ICC, sRGB, gamma	EXIF/ICC	ICC	None
Artifacts	None (lossless)	Banding, blockiness	Minimal (lossy)	Color banding
Typical Use	Web graphics, UI, screenshots	Photos (lossy)	Modern web (smaller files)	Animated memes/loops

PNG Deep Dive Summary

File Structure – Signature + chunks (IHDR, IDAT, IEND, plus ancillary). Each chunk: length, type, data, CRC.
Compression – Two-stage: filtering (5 types: None, Sub, Up, Average, Paeth) then DEFLATE (LZ77 + Huffman).
Color & Bit Depth – 5 color types (0, 2, 3, 4, 6); 1–16 bits per sample, up to 64 bits per pixel. Type 6 (RGBA) is standard for web graphics with transparency.
Transparency – Full alpha channel (types 4, 6), or a single transparent color or per-palette alpha via the tRNS chunk.
Interlacing – Adam7 enables progressive display (7 passes; the first pass delivers a 1/8 × 1/8 preview from 1/64 of the pixels).
Color Management – gAMA, cHRM, sRGB, ICC profiles ensure display consistency.
FFmpeg – Powerful tool for conversion, scaling, filtering, metadata manipulation, frame extraction, APNG creation, and batch processing.