
© 2025 ExtraTech Bootcamps. All rights reserved.


This project is a sub-project of BSDFlow

Kaltura

Documents

Mentored by: Kaltura

Documents - A multi-tenant, dynamic management system for entities, events, workflows, and real-time operations.

React
NestJS
.NET Core
Kafka
PostgreSQL
Redis
WebSocket
SignalR
Microservices
LLM Integration

Description

Documents is a sub-project of BSDFlow, a full-scale, microservices-based management platform supporting dynamic entities, groups, processes, and events. The platform includes a multi-tenant table architecture, generic Kafka handlers with correlation-based async RPC, Redis caching, real-time updates via WebSocket and SignalR, LLM integration, Excel watchers, validation built on the Strategy pattern, RBAC permissions, GIS mapping, smart search using PostgreSQL full-text search with GIN and trigram indexes, and a complete load-testing engine. It enables organizations to define custom workflows, attach documents, track participants, and manage system-wide behaviors across distributed services.
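As an illustration of the correlation-based async RPC mentioned above, here is a minimal sketch. It is not taken from the BSDFlow codebase: the class name `CorrelationRpcClient`, the `Send` callback, and the message shape are all hypothetical, and the Kafka producer and consumer are replaced by a plain function so the example stays self-contained. The idea is that each request carries a unique correlation ID, and the reply handler uses that ID to resolve the matching pending promise.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical message envelope; a real system would carry this in Kafka headers or the payload.
interface RpcMessage {
  correlationId: string;
  payload: unknown;
}

// Stand-in for a Kafka producer: the caller supplies how a message is actually sent.
type Send = (topic: string, message: RpcMessage) => void;

interface PendingEntry {
  resolve: (value: unknown) => void;
  timer: ReturnType<typeof setTimeout>;
}

class CorrelationRpcClient {
  private readonly pending = new Map<string, PendingEntry>();
  private readonly send: Send;
  private readonly timeoutMs: number;

  constructor(send: Send, timeoutMs = 5000) {
    this.send = send;
    this.timeoutMs = timeoutMs;
  }

  // Sends a request and returns a promise that settles when the matching reply arrives.
  request(topic: string, payload: unknown): Promise<unknown> {
    const correlationId = randomUUID();
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(correlationId);
        reject(new Error(`RPC timed out: ${correlationId}`));
      }, this.timeoutMs);
      this.pending.set(correlationId, { resolve, timer });
      this.send(topic, { correlationId, payload });
    });
  }

  // Called by the reply consumer; returns false for unknown or already-settled IDs.
  handleReply(reply: RpcMessage): boolean {
    const entry = this.pending.get(reply.correlationId);
    if (!entry) return false;
    clearTimeout(entry.timer);
    this.pending.delete(reply.correlationId);
    entry.resolve(reply.payload);
    return true;
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

In a real deployment the `send` callback would publish to a request topic and a consumer on a reply topic would call `handleReply`; the timeout keeps the pending map from growing when a reply never arrives.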

Mentors


Sunny Simantov

Creating a wonderful OTT system

Kaltura

Team Members

Cohort: Backend Bootcamp 2025 (Backend)

Shevi N.

Responsibilities:

  • End‑to‑End OCR Domain Design & Implementation

    Led the full design and implementation of the OCR domain inside BSDFlow – a large‑scale, microservices‑based management platform built in collaboration with Kaltura and KamaTech.

    I designed a production‑ready OCR microservice in NestJS (TypeScript) that processes documents end‑to‑end, from upload to extracted structured text.

    The service supports multiple file types including scanned PDFs, image files (JPG/PNG), and Office documents, and integrates Tesseract OCR and LibreOffice for document conversion and text extraction.

    I focused on clean architecture, separation of concerns, and extensibility, so that new OCR providers or file pipelines can be added with minimal changes.

  • Smart “Quality Gate” Decision Engine (Local vs Cloud OCR)

    Designed and implemented a smart quality‑gate pipeline that automatically decides whether to use embedded text, local OCR, or delegate to a cloud OCR provider.

    The pipeline measures text coverage, confidence, and extraction quality, and applies clear heuristics to decide when the result is good enough and when a fallback is required.

    This reduced unnecessary cloud calls, improved accuracy, and created a resilient, cost‑aware OCR flow.

    The design uses Strategy and Adapter patterns so that each OCR provider and file type is handled through a clean, testable abstraction.

  • Integration with API Gateway, Caching & Observability

    Integrated the OCR and document flows behind a central API Gateway, including request routing, health checks, structured error handling, and consistent API contracts for the frontend.

    Implemented Redis‑based caching using file fingerprints (SHA‑256 hashes) to avoid reprocessing identical documents, significantly improving performance under load.

    Added observability hooks with Prometheus metrics and structured logging, enabling monitoring of OCR paths, fallback rates, and system health for debugging and production readiness.

  • Testing, Docker & Reproducible Development Environment

    Backed the service with Jest unit and integration tests, covering the OCR pipeline, quality gate, and provider adapters.

    Built a full Docker & Docker Compose environment so the entire stack (OCR, Gateway, Redis, PostgreSQL, Kafka, monitoring) can be spun up locally in a consistent and reproducible way.

    This enabled fast onboarding, reliable demos, and realistic production‑like testing for the whole team.

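The quality-gate idea described above can be sketched as follows. This is an assumption-laden illustration, not the project's actual code: the names `OcrProvider`, `OcrResult`, `GateThresholds`, and `runQualityGate`, and the threshold values used in the usage note, are hypothetical. Providers are tried cheapest first (embedded text, then local OCR, then cloud), and the first result that passes the gate wins. The calls are synchronous here for brevity, whereas real OCR calls would be asynchronous.

```typescript
// Hypothetical result shape: confidence and coverage are fractions in [0, 1].
interface OcrResult {
  provider: string;
  text: string;
  confidence: number;
  coverage: number;
}

// Strategy interface: each extraction path (embedded text, local, cloud) implements it.
interface OcrProvider {
  readonly name: string;
  extract(file: Uint8Array): OcrResult;
}

interface GateThresholds {
  minConfidence: number;
  minCoverage: number;
}

// The gate itself: a result is "good enough" when both metrics clear their thresholds.
function passesGate(result: OcrResult, thresholds: GateThresholds): boolean {
  return (
    result.confidence >= thresholds.minConfidence &&
    result.coverage >= thresholds.minCoverage
  );
}

// Tries providers in order and stops at the first result that passes the gate.
// If none passes, returns the highest-confidence result seen.
function runQualityGate(
  file: Uint8Array,
  providers: OcrProvider[],
  thresholds: GateThresholds,
): OcrResult {
  if (providers.length === 0) {
    throw new Error("no OCR providers configured");
  }
  let best: OcrResult | undefined;
  for (const provider of providers) {
    const result = provider.extract(file);
    if (passesGate(result, thresholds)) return result;
    if (best === undefined || result.confidence > best.confidence) best = result;
  }
  return best as OcrResult;
}
```

With providers ordered `[embedded, local, cloud]` and thresholds such as `{ minConfidence: 0.8, minCoverage: 0.9 }`, the cloud provider is only called when both cheaper paths fall short, which is how a gate of this kind can cut unnecessary cloud calls.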

...and more contributions not listed here
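The fingerprint-based caching described in the responsibilities above can be sketched like this. The `KeyValueCache` interface is a hypothetical, synchronous stand-in for a Redis client (real Redis calls are asynchronous), and `extractWithCache` and the `ocr:` key prefix are illustrative names, not the project's API. The SHA-256 hashing uses Node's built-in `crypto` module.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal cache interface; a Map satisfies it for local experiments.
interface KeyValueCache {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Content fingerprint: identical bytes always produce the same 64-character hex digest.
function fingerprint(content: Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Returns cached text when the same bytes were processed before; otherwise runs OCR once.
function extractWithCache(
  content: Uint8Array,
  cache: KeyValueCache,
  runOcr: (content: Uint8Array) => string,
): string {
  const key = `ocr:${fingerprint(content)}`; // key prefix is an illustrative assumption
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const text = runOcr(content);
  cache.set(key, text);
  return text;
}
```

Keying on the content hash rather than the file name means a re-upload of the same document under a different name still hits the cache.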

Pnina S.

Responsibilities:

  • Backend development in C# / .NET as part of the Documents team

  • Developed a backend service that communicates with a Worker, which sends CUD (create, update, delete) commands

  • Designed and implemented data models using Entity Framework Code First

  • Created database tables to store and manage document-related data

  • Implemented a validation mechanism in the service layer

  • Extensive use of interfaces, abstraction, and polymorphism

  • Worked with a clean, layered architecture and maintainable code

...and more contributions not listed here

Ruth P.

Responsibilities:

  • Designed and implemented backend services in NestJS and C# (.NET) to manage the full document lifecycle: upload, storage, retrieval and deletion.

  • Integrated MinIO object storage with PostgreSQL for document metadata, permissions and controlled access to documents.

  • Developed an internal React-based UI for teams to browse, search, filter and manage documents.

  • Built event-driven workflows with Kafka to publish document events (uploaded / updated / deleted) and trigger asynchronous processing.

  • Collaborated in an Agile team environment, including code reviews, Git-based workflow and close mentorship from senior engineers.

...and more contributions not listed here
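The document events mentioned above (uploaded / updated / deleted) could be modeled roughly as follows. This is only a sketch: the event field names, the `documents.events` topic, and the `toMessage` and `summarize` helpers are hypothetical and not taken from the project. It shows a discriminated union for the three event types and a mapping to a keyed message of the kind a Kafka producer would send.

```typescript
// Hypothetical event shapes; the real payloads are not documented on this page.
type DocumentEvent =
  | { type: "document.uploaded"; documentId: string; tenantId: string; sizeBytes: number }
  | { type: "document.updated"; documentId: string; tenantId: string; changedFields: string[] }
  | { type: "document.deleted"; documentId: string; tenantId: string };

// Generic outbound message; a Kafka client would accept an equivalent key/value pair.
interface OutboundMessage {
  topic: string;
  key: string;
  value: string;
}

// Uses the document ID as the message key so all events for one document share a key.
function toMessage(event: DocumentEvent, occurredAt: Date): OutboundMessage {
  return {
    topic: "documents.events", // illustrative topic name
    key: event.documentId,
    value: JSON.stringify({ ...event, occurredAt: occurredAt.toISOString() }),
  };
}

// Exhaustive switch: the compiler flags this function if a new event type is added.
function summarize(event: DocumentEvent): string {
  switch (event.type) {
    case "document.uploaded":
      return `uploaded ${event.documentId} (${event.sizeBytes} bytes)`;
    case "document.updated":
      return `updated ${event.documentId}: ${event.changedFields.join(", ")}`;
    case "document.deleted":
      return `deleted ${event.documentId}`;
  }
}
```

Keying by document ID is a common choice because Kafka's default partitioning sends messages with the same key to the same partition, which preserves per-document ordering for consumers.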
