Bio
Analytical thinker with strong attention to detail and excellent organizational skills. Works well in teams, takes full ownership of tasks, and is deeply committed to their success. Performs well under pressure, shows initiative, and delivers high-quality results with dedication and persistence. Eager to grow professionally and contribute to impactful, challenging projects.
Skills
Bootcamp Project
AI-powered optimization engine for next-generation computing
Mentored by: Next Silicon
Embedded Systems Bootcamp 2025
Responsibilities:
- Developed full GPU integration for new SYCL operators (SET, ARANGE)
Implemented SYCL kernels, connected them to the GGML operator system, and integrated them into the llama.cpp graph builder, including validation on real GPU hardware.
- Optimized SET and ARANGE operators with full benchmarking
Performed kernel-level GPU optimizations, ran controlled before/after benchmarks, and achieved significant speedups, including a 4× performance improvement for ARANGE.
- Opened PRs and merged new operators into upstream
Handled maintainer reviews, refined code design, updated documentation, ensured CI stability, and successfully merged both operators into ggml / llama.cpp.
- 🔗 Merged PR — SET: [LINKTOSET_PR]
- 🔗 Merged PR — ARANGE: [LINKTOARANGE_PR]
- Designed and implemented the Sparse-K algorithm to reduce complexity
Developed a mechanism that reduces Attention complexity from O(n²) to O(n·k) by dynamically selecting the Top-K most relevant tokens per query.
- 🔗 Sparse-K PR / branch: [LINKTOSPARSEKPROR_BRANCH]
- Built a dynamic Sparse Mask generator during graph construction
Implemented buildsparsekmask, which generates the mask using only existing GGML operators, without modifying model weights or architecture.
- Integrated Sparse-K into Flash Attention based on maintainer feedback
Adjusted the design so Sparse-K is computed inside Flash Attention during graph build, following reviewer guidelines for clean, backend-consistent integration.
- Enabled full Sparse-K usage across all Attention layers
Ensured the Sparse-K mask is automatically applied in every Attention layer, with no additional per-layer code required.
- Conducted performance and accuracy evaluations vs. baseline
Ran full Prompt Evaluation and Decode benchmarks, performed profiling and comparisons, and validated a 2.3× speedup with no accuracy degradation.
- Validated backend compatibility using existing GGML operators (e.g., CUDA)
Since Sparse-K relies exclusively on existing GGML operators, any supported backend (such as CUDA) can execute it natively. Ran HPC jobs to confirm correctness and efficiency.
- Converted Hugging Face models to GGUF and embedded Sparse-K parameters
Downloaded models, performed GGUF conversion and quantization, and added Sparse-K metadata fields so all Sparse-K settings are loaded directly from the model, without environment variables.
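The Top-K selection behind Sparse-K can be sketched as follows. This is a minimal NumPy illustration of the idea (keep the k highest-scoring key positions per query and mask out the rest), not the actual GGML/llama.cpp implementation; the function name, tensor shapes, and the `k` parameter here are assumptions for the sketch:

```python
import numpy as np

def sparse_k_mask(scores: np.ndarray, k: int) -> np.ndarray:
    """Build an additive attention mask keeping only the top-k
    highest-scoring key positions per query row.

    scores: (n_queries, n_keys) raw attention scores
    returns: 0.0 for kept positions, -inf for dropped ones
    """
    n_queries, n_keys = scores.shape
    k = min(k, n_keys)
    # Indices of the k largest scores in each row (order within the
    # top-k does not matter, so argpartition is sufficient).
    topk_idx = np.argpartition(scores, -k, axis=-1)[:, -k:]
    mask = np.full_like(scores, -np.inf)
    rows = np.arange(n_queries)[:, None]
    mask[rows, topk_idx] = 0.0  # keep top-k, drop everything else
    return mask

# Adding the mask before softmax zeroes out attention to dropped keys,
# so each query attends to at most k tokens: O(n*k) instead of O(n^2).
scores = np.array([[3.0, 1.0, 2.0, 0.5],
                   [0.1, 4.0, 0.2, 3.5]])
masked = scores + sparse_k_mask(scores, k=2)
```

In the project itself this masking is expressed with existing GGML operators during graph construction, so every backend that supports those operators can run it unchanged.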

Additional Projects
Photo Printing Management System:
• Developed a photo-printing management system using .NET Core, MVC, Entity Framework, and SQL Server.
• Implemented UI, order processing, and image handling based on clean architecture principles.
• Integrated PDF invoice emailing using iText, GemBox, and Gmail SMTP, exceeding project requirements.
• Participated in daily stand-ups, sprint planning, and code reviews.
Online Store Project:
• Developed a responsive e-commerce web application using React, Redux, and React Router,
including dynamic navigation, a shopping cart, and checkout workflows.
• Built a modern UI with MUI and Bootstrap, and integrated additional libraries such as jsPDF, along with
features that extended beyond the original project requirements.
• All implementation was written manually, without AI code-generation tools.
• Deployed on Vercel as an independent initiative to enhance the project and improve the user experience.
English Level
Fluent