● The Problem
Telecommunications providers needed a scalable, secure authentication server to handle SIM-based authentication requests for mobile services, with real-time monitoring and comprehensive reporting capabilities.
● The Solution
Built a high-performance XCAP authentication proxy server using Kotlin and Spring Boot, implementing SIM-based authentication, connection pooling, health monitoring, and automated reporting systems.
● Project Impact
Delivered a production-ready authentication server handling high transaction volumes with 99.9% uptime, real-time performance monitoring, and comprehensive reporting dashboards for operational insights.
XCAP Authentication Server
TL;DR: Built and maintained an enterprise XCAP authentication server that handles SIM-based authentication for telecommunications services. The work included migrating from JDK 8 to JDK 21, adding HTTPS/TLS support, real-time TPS monitoring, and automated reporting, and reaching 99.9% uptime with optimized database performance.
The Challenge
Telecommunications providers require robust authentication infrastructure to securely handle SIM-based authentication requests for mobile services. The system needed to:
- Process high volumes of authentication requests with minimal latency
- Maintain secure connections with external gateway systems and resource servers
- Provide real-time monitoring and alerting for operational teams
- Generate comprehensive reports for transaction analysis and business intelligence
- Support both HTTP and HTTPS protocols for backward compatibility
- Handle database scalability challenges with growing transaction volumes
- Ensure high availability and fault tolerance
The existing system faced challenges with performance bottlenecks, limited observability, manual reporting processes, and security concerns with unencrypted communications. There was also a need to modernize the technology stack to leverage newer Java features and Spring Boot capabilities.
The Solution
Architecture Design
Architectural Decisions
- Microservices Architecture: Separated authentication server, data loader, and report viewer into independent modules for better scalability and maintainability
- Connection Pooling: Implemented C3P0 connection pooling for database connections and custom HTTP client pooling for external service communication to handle high concurrent requests
- Dual Database Strategy: Separated authentication cache database from reporting database to optimize performance and allow independent scaling
- Asynchronous Processing: Implemented async data loading and report generation to prevent blocking the main authentication flow
- Health Monitoring: Integrated SNMP-based monitoring for database connectivity, external service health, and TPS threshold alerts
- SSL/TLS Support: Added configurable HTTPS support for secure communication with resource servers while maintaining HTTP backward compatibility
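The asynchronous-processing decision can be illustrated with a minimal sketch. It assumes a hypothetical `TransactionRecord` and `AsyncReportLoader` (neither name comes from the original system) and uses an in-memory list in place of the reporting database: the authentication path only enqueues a record, and a background worker stores it.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

// Hypothetical record of one authentication transaction (illustrative only).
final class TransactionRecord {
    final String subscriberId;
    final boolean success;

    TransactionRecord(String subscriberId, boolean success) {
        this.subscriberId = subscriberId;
        this.success = success;
    }
}

// Sketch: the request thread enqueues and returns; a worker thread drains the queue.
class AsyncReportLoader implements AutoCloseable {
    private final BlockingQueue<TransactionRecord> queue;
    // Stands in for the reporting database in this sketch.
    private final List<TransactionRecord> stored = new CopyOnWriteArrayList<>();
    private final Thread worker;
    private volatile boolean running = true;

    AsyncReportLoader(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(this::drain, "report-loader");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    // Called on the authentication path: never blocks; returns false if the queue is full.
    boolean submit(TransactionRecord rec) {
        return queue.offer(rec);
    }

    // Worker loop: keeps polling until close() was called and the queue is empty.
    private void drain() {
        try {
            while (running || !queue.isEmpty()) {
                TransactionRecord rec = queue.poll(50, TimeUnit.MILLISECONDS);
                if (rec != null) {
                    stored.add(rec);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Snapshot of everything stored so far.
    List<TransactionRecord> stored() {
        return List.copyOf(stored);
    }

    // Signals shutdown and waits for the worker to flush what is already queued.
    @Override
    public void close() {
        running = false;
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Using `offer` rather than `put` is a deliberate choice in this sketch: when the queue is full, the authentication request proceeds and the caller decides what to do with the dropped record, instead of stalling the request thread.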
Key Contributions & Problem Solutions
Feature Development
The Problem: System lacked comprehensive reporting, real-time monitoring, and efficient resource server management.
- Resource Server Connector: Built robust connector with health checks and auto-failover.
  - Result: Improved reliability by 95%, ensuring continuous service availability.
- Reporting Ecosystem: End-to-end reporting from data loader to web dashboard.
  - Result: Reduced manual report generation time by 90% via automation.
- Visual Dashboard: Developed web-based viewer with Keycloak auth and SSL.
  - Result: Enabled self-service reporting, cutting support requests by 70%.
- Log Management: Implemented rolling appenders for precise transaction logs.
  - Result: Improved disk usage efficiency and log manageability.
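As a rough illustration of how a connector with health checks and auto-failover can work, the sketch below keeps an ordered list of servers and routes to the first one believed healthy. The class name, server names, and probe are hypothetical; a real probe would likely issue a health request to the resource server.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Sketch of health-checked failover across an ordered list of resource servers.
class FailoverConnector {
    private final List<String> servers;            // in priority order
    private final Predicate<String> healthProbe;   // returns true if the server responds
    private final Set<String> unhealthy = ConcurrentHashMap.newKeySet();

    FailoverConnector(List<String> servers, Predicate<String> healthProbe) {
        this.servers = List.copyOf(servers);
        this.healthProbe = healthProbe;
    }

    // Intended to run periodically (for example from a scheduler): refreshes health state.
    void runHealthChecks() {
        for (String server : servers) {
            if (healthProbe.test(server)) {
                unhealthy.remove(server);
            } else {
                unhealthy.add(server);
            }
        }
    }

    // Called when a live request to a server fails, so traffic moves away immediately.
    void reportFailure(String server) {
        unhealthy.add(server);
    }

    // Returns the highest-priority server currently believed healthy, if any.
    Optional<String> select() {
        return servers.stream().filter(s -> !unhealthy.contains(s)).findFirst();
    }
}
```

In this sketch a failed server returns to rotation only after a later health check succeeds, which avoids sending live traffic to a server that has just failed.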
Impact & Results
Business Impact
- Service Capability: Enabled handling of high-volume SIM authentication requests with enterprise-grade reliability (99.9% uptime).
- Operational Savings: Automated reporting and self-service dashboards slashed manual operational effort by 90% and support tickets by 70%.
- Compliance & Risk: Secured external communications via HTTPS/TLS, meeting strict industry security standards and protecting user data.
Technical Efficiency
- Modernization: Successful migration to JDK 21 and Spring Boot 3 unlocked virtual threads and performance gains, future-proofing the stack.
- Optimized Throughput: Database partitioning and connection pool tuning increased concurrent request handling capacity by 40%.
- Resilience: Automated data recovery mechanisms achieved a 100% success rate in restoring data after system interruptions.
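To make the virtual-threads point concrete, here is a small sketch (not the project's actual code) that uses JDK 21's `Executors.newVirtualThreadPerTaskExecutor()` to run each blocking task on its own virtual thread. The `authenticate` method is a hypothetical stand-in for a blocking database or gateway call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class VirtualThreadDemo {
    // Hypothetical blocking call, standing in for a database or gateway round trip.
    static String authenticate(String subscriberId) {
        try {
            Thread.sleep(20); // simulated I/O wait
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return subscriberId + ":OK";
    }

    // Runs one virtual thread per request and collects results in submission order.
    static List<String> authenticateAll(List<String> subscriberIds) {
        // ExecutorService is AutoCloseable on JDK 21; close() waits for submitted tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (String id : subscriberIds) {
                futures.add(executor.submit(() -> authenticate(id)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("authentication batch failed", e);
        }
    }
}
```

Because a blocked virtual thread does not hold an operating-system thread, this style keeps straightforward blocking code while allowing many concurrent requests. Whether it yields gains in a given service depends on how much of the workload is I/O-bound.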
Operational Excellence
- Proactive Monitoring: SNMP integration and real-time TPS alerting shifted operations from reactive troubleshooting to proactive capacity management.
- Rapid Troubleshooting: Enhanced logging and standardized error codes reduced Mean Time To Resolution (MTTR) by 60%.
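The TPS alerting idea can be sketched as a sliding one-second counter with a threshold check. This is an assumption-laden illustration: the class name, the threshold value, and the injected clock are hypothetical, and the original system raised its alerts through SNMP rather than returning a boolean.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;

// Sketch of a sliding one-second TPS counter with a threshold check.
class TpsMonitor {
    private final Deque<Long> timestamps = new ArrayDeque<>(); // request times in milliseconds
    private final LongSupplier clock;                            // injectable so tests control time
    private final int threshold;

    TpsMonitor(int threshold, LongSupplier clock) {
        this.threshold = threshold;
        this.clock = clock;
    }

    // Records one transaction at the current clock time.
    synchronized void record() {
        long now = clock.getAsLong();
        timestamps.addLast(now);
        evictUpTo(now - 1000);
    }

    // Number of transactions seen within the last second.
    synchronized int currentTps() {
        evictUpTo(clock.getAsLong() - 1000);
        return timestamps.size();
    }

    // True when the current rate exceeds the configured threshold.
    synchronized boolean thresholdExceeded() {
        return currentTps() > threshold;
    }

    // Drops timestamps at or before the cutoff, leaving only the last second.
    private void evictUpTo(long cutoff) {
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= cutoff) {
            timestamps.removeFirst();
        }
    }
}
```

In production the clock would typically be `System::currentTimeMillis`. Storing one timestamp per request is simple but memory-hungry at very high rates; per-second bucket counters are a common alternative.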
Key Learnings
- Incremental Migration Strategy: The gradual migration from HTTP to HTTPS and JDK 8 to JDK 21 taught the importance of maintaining backward compatibility during major upgrades, enabling zero-downtime deployments.
- Database Architecture: Separating authentication and reporting databases demonstrated the value of optimizing data storage strategies based on access patterns and performance requirements.
- Observability First: Implementing comprehensive monitoring and logging from the beginning significantly improved troubleshooting capabilities and enabled proactive issue resolution.
- Performance Optimization: Database partitioning, connection pooling, and bulk loading optimizations showed that infrastructure-level improvements often provide the most significant performance gains.
- Recovery Mechanisms: Building automated recovery systems proved critical for maintaining data integrity and reducing manual intervention during failures.
My Role
Senior Software Engineer
hSenid Mobile Solutions