
Building A Video Editor That Never Leaves the Browser

A demo and overview of building a client-only video editor with EDL, WebGL effects, and object segmentation.


I recently spent some time building a small video editor that runs entirely in the browser. No backend processing, no uploads, no servers doing the heavy lifting. The goal was simple: see how far modern web APIs, WASM, and WebGL can be pushed for real video editing workflows, even if the end result is still a prototype.

What started as a weekend experiment quickly turned into a surprisingly capable editor for stitching and trimming clips, swapping their order on a timeline, and even running computer vision effects per frame.

Deciding on the UI library

I work with Material UI on a daily basis, so for this project I wanted to try something different. I looked at a few options, including Chakra UI, Ant Design, and Mantine, but in the end I went with shadcn built on top of Radix UI. The main reason was its clean, sleek look and how little it gets in your way. The components feel well thought out, accessible by default, and easy to adjust when you need something slightly custom. You can check out the components for yourself here.

I was a bit overwhelmed at first by how many Tailwind utility classes end up inline in the markup. It reminded me of how I used to work with MUI’s styled API, which tends to keep styling more contained and tidy. In practice, though, the utility classes work very well: splitting components into smaller pieces is enough to keep the code readable and to reduce the cognitive load for developers.

For example, below is a comparison of defining a styled button between MUI and Radix + shadcn + Tailwind.

muiButton.tsx
import { styled } from '@mui/material/styles';
import Button from '@mui/material/Button';

const PrimaryButton = styled(Button)(({ theme }) => ({
  padding: '8px 16px',
  borderRadius: 8,
  textTransform: 'none',
  fontWeight: 600,
  backgroundColor: theme.palette.primary.main,
  '&:hover': {
    backgroundColor: theme.palette.primary.dark,
  },
  '&.Mui-disabled': {
    backgroundColor: theme.palette.action.disabledBackground,
    color: theme.palette.action.disabled,
  },
}));

export function SaveButton() {
  return <PrimaryButton>Save</PrimaryButton>;
}
radixButton.tsx
// components/ui/primary-button.tsx
import * as React from 'react';
import { cn } from '@/lib/utils';

type PrimaryButtonProps = React.ButtonHTMLAttributes<HTMLButtonElement>;

export const PrimaryButton = React.forwardRef<
  HTMLButtonElement,
  PrimaryButtonProps
>(({ className, ...props }, ref) => {
  return (
    <button
      ref={ref}
      className={cn(
        'px-4 py-2 rounded-md font-semibold normal-case transition-colors',
        'bg-primary text-primary-foreground',
        'hover:bg-primary/90',
        'disabled:bg-muted disabled:text-muted-foreground disabled:cursor-not-allowed',
        className
      )}
      {...props}
    />
  );
});

Non-destructive editing on the client

At the core of the editor is a non-destructive editing model. Video files are never modified during editing. Every cut, trim, and move is stored as metadata in an Edit Decision List (EDL). Playback uses native HTML5 video seeking, so interacting with the videos feels instant.

A clip on the timeline is just a reference into a source video:

export interface VideoSource {
  id: string;
  file: File;
  objectUrl: string;
  durationInSeconds: number;
  fileName: string;
  widthInPixels?: number;
  heightInPixels?: number;
  /** Detected frame rate (fps). Falls back to 30 if detection fails. */
  frameRate: number;
}

interface TimelineClip {
  id: string;
  sourceId: string;
  /** The start time within the source video (where playback begins in the original file) */
  sourceInPointSeconds: number;
  /** The end time within the source video (where playback ends in the original file) */
  sourceOutPointSeconds: number;
  /** The position on the timeline where this clip starts (independent of the source video's timing) */
  timelinePositionSeconds: number;
  trimmed: TrimmedRegions;
}

On the frontend, this means instant feedback and no expensive recomputation. In a real production setup, the approach becomes crucial: the same EDL can be sent to a backend service where native ffmpeg renders the final MP4 with full codec control.
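To make the EDL model concrete, here is a minimal sketch of how a timeline position can be resolved to a seek position in a source file. The clip shape is simplified from the interfaces above, and resolveTimelineTime is a hypothetical helper, not the project’s actual code:

// Sketch of EDL resolution: map a position on the timeline to a seek
// position inside the underlying source file. Nothing is re-encoded;
// playback just seeks the original video element.
interface Clip {
  sourceId: string;
  sourceInPointSeconds: number;
  sourceOutPointSeconds: number;
  timelinePositionSeconds: number;
}

export function resolveTimelineTime(
  clips: Clip[],
  timelineSeconds: number,
): { sourceId: string; sourceSeconds: number } | undefined {
  for (const clip of clips) {
    const clipDuration = clip.sourceOutPointSeconds - clip.sourceInPointSeconds;
    const offset = timelineSeconds - clip.timelinePositionSeconds;
    // Does the requested time fall inside this clip's span on the timeline?
    if (offset >= 0 && offset < clipDuration) {
      return {
        sourceId: clip.sourceId,
        sourceSeconds: clip.sourceInPointSeconds + offset,
      };
    }
  }
  return undefined; // gap on the timeline
}

Because the lookup only does arithmetic over metadata, cutting or reordering clips is just an array update, and undo is trivially cheap.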

State management and undo for free

For state management, I chose Zustand. Its selector-based subscriptions keep re-renders narrowly scoped, which is critical for keeping a timeline UI responsive.

One particularly nice surprise was combining Zustand with zundo. Undo and redo support came almost for free, with very little extra code. This made it easy to experiment with timeline actions without worrying about breaking state, and it fits naturally with the way Zustand stores are structured.

timelineStore.ts
import { create } from 'zustand';
import { temporal } from 'zundo';

export const useTimelineStore = create<TimelineStore>()(
  temporal(
    (...args) => ({
      ...createClipSlice(...args),
      ...createColorSlice(...args),
    }),
    {
      // Track both clips and colorRegions for undo/redo
      partialize: (state) => ({
        clips: state.clips,
        colorRegions: state.colorRegions.map((r) => ({
          id: r.id,
          clipId: r.clipId,
          isProcessing: r.isProcessing,
        })),
      }),
      equality: shallowEqualColorRegions,
      limit: 50,
    },
  ),
);
example.ts
import { useCallback } from 'react';
import { useTimelineStore } from './timelineStore';

const Component = () => {
  const handleUndo = useCallback(() => {
    useTimelineStore.temporal.getState().undo();
  }, []);

  return <button onClick={handleUndo}>Undo</button>;
};

WebGL and real-time color isolation

For visual effects, I implemented color isolation using WebGL. The isolation pass runs in a fragment shader and performs well enough to keep scrubbing smooth. One interesting observation is how forgiving WebGL can be for this kind of workload: as long as you stay disciplined with texture sizes and avoid unnecessary passes, the browser GPU pipeline holds up well even under frequent seeks.
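The shader itself boils down to a per-pixel decision: keep pixels close to the target color, desaturate the rest. Here is that logic as a CPU-side TypeScript sketch over ImageData-style RGBA pixels (hypothetical helper name; the real pass runs in a GLSL fragment shader on the GPU):

// CPU sketch of the color-isolation pass: pixels within `tolerance` of the
// target RGB keep their color, everything else becomes grayscale luminance.
export function isolateColor(
  pixels: Uint8ClampedArray, // RGBA, as in ImageData.data
  target: [number, number, number],
  tolerance: number,
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(pixels);
  for (let i = 0; i < out.length; i += 4) {
    const dr = out[i] - target[0];
    const dg = out[i + 1] - target[1];
    const db = out[i + 2] - target[2];
    const distance = Math.sqrt(dr * dr + dg * dg + db * db);
    if (distance > tolerance) {
      // Rec. 601 luma weights, the same desaturation a shader would apply
      const luma = 0.299 * out[i] + 0.587 * out[i + 1] + 0.114 * out[i + 2];
      out[i] = out[i + 1] = out[i + 2] = luma;
    }
  }
  return out;
}

On the GPU the same comparison runs per fragment, which is why scrubbing stays smooth: the cost is one texture sample and a distance check per pixel.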

Tracking, segmentation, and per-frame processing

The more experimental part of the project focuses on object tracking and segmentation in the browser. It starts with a simple user click: from that click, a SAM mask is generated for the selected object. For every subsequent frame, MediaPipe is used to detect all objects in the scene. Each detected bounding box is then compared against the original one using IoU, and the box with the highest overlap is selected as the same object. That bounding box becomes the input for generating the next segmentation mask.

In practice, this approach works very well. Once the object is selected, the mask stays stable across frames. It’s a surprisingly robust setup for something running fully in the browser. If this were moved server-side, the whole flow could be simplified by deferring to SAM v2, which already includes built-in object tracking.
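The matching step above is plain IoU over bounding boxes. A minimal sketch (box format and helper names are illustrative, not MediaPipe's actual types):

// Sketch of the IoU matching step: given the tracked box from the previous
// frame and the detections in the current frame, pick the detection with
// the highest overlap.
interface Box {
  x: number; // left
  y: number; // top
  width: number;
  height: number;
}

export function iou(a: Box, b: Box): number {
  const x1 = Math.max(a.x, b.x);
  const y1 = Math.max(a.y, b.y);
  const x2 = Math.min(a.x + a.width, b.x + b.width);
  const y2 = Math.min(a.y + a.height, b.y + b.height);
  const intersection = Math.max(0, x2 - x1) * Math.max(0, y2 - y1);
  const union = a.width * a.height + b.width * b.height - intersection;
  return union > 0 ? intersection / union : 0;
}

export function matchTrackedBox(previous: Box, detections: Box[]): Box | undefined {
  let best: Box | undefined;
  let bestIou = 0;
  for (const detection of detections) {
    const overlap = iou(previous, detection);
    if (overlap > bestIou) {
      bestIou = overlap;
      best = detection;
    }
  }
  return best; // undefined when nothing overlaps at all
}

The winning box is then fed to SAM as the prompt for the next frame's mask.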

Export: choosing the boring option on purpose

To handle video export, I explored a few different approaches, each with clear upsides and downsides:

  • ffmpeg compiled to WASM
    Pros: Extremely powerful, full control over codecs, formats, and rendering logic. Matches what you would do on the server.
    Cons: Heavy to load, slow in practice, and complex to integrate and maintain in a browser environment.
  • WebCodecs with an MP4 muxer
    Pros: Modern API, efficient encoding path, and the ability to produce MP4 directly in the browser.
    Cons: Very complex to implement, fragile, and limited to a small set of browsers, which makes it hard to rely on.
  • MediaRecorder
    Pros: Simple to use, stable, and widely supported across browsers. Easy to integrate and maintain.
    Cons: Limited to WebM output and capped at playback speed during export.

I ultimately chose MediaRecorder. While it has its own limitations, the reliability and simplicity made it the best fit for this project. Since it records exactly what is being played back, it does not require re-running color isolation, reapplying EDL logic, or managing a separate rendering pipeline. If this were to evolve into a production tool, the natural next step would be to send the EDL to a backend and let native ffmpeg handle the final render.
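For reference, the MediaRecorder path really is this small. Below is a sketch assuming the preview is drawn to a canvas; pickMimeType and recordCanvas are hypothetical helpers, not the project's actual code (in the browser you would pass MediaRecorder.isTypeSupported as the predicate):

// Pick the first container/codec the browser can record. The predicate is
// injected so the choice is testable outside a browser.
export function pickMimeType(
  candidates: string[],
  isSupported: (type: string) => boolean,
): string | undefined {
  return candidates.find(isSupported);
}

// Browser-only sketch: record whatever is being drawn to the canvas, in
// real time, for `durationMs` milliseconds.
export function recordCanvas(
  canvas: HTMLCanvasElement,
  mimeType: string,
  durationMs: number,
): Promise<Blob> {
  return new Promise((resolve) => {
    const stream = canvas.captureStream(30); // capture at ~30 fps
    const recorder = new MediaRecorder(stream, { mimeType });
    const chunks: Blob[] = [];
    recorder.ondataavailable = (e) => chunks.push(e.data);
    recorder.onstop = () => resolve(new Blob(chunks, { type: mimeType }));
    recorder.start();
    setTimeout(() => recorder.stop(), durationMs);
  });
}

Because the recording captures the rendered canvas directly, every effect and EDL decision is baked in for free, at the cost of exporting in real time.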

Final thoughts

Overall, this was a genuinely fun project and a good reminder of how far the modern web has come. With the right architecture in place, it’s possible to build an editor that feels fast, looks clean, and even runs fairly advanced ML models without ever leaving the browser.

shadcn and Radix made the UI enjoyable to work with, Tailwind helped keep iteration quick, Zustand together with zundo made state changes feel safe, and WebGL combined with MediaPipe and SAM enabled effects that would have seemed unrealistic on the frontend not that long ago.

Feel free to browse the entire code on my GitHub.

Thank you!