| 1 | +# IRT Ruby Performance Benchmarks |
| 2 | + |
| 3 | +This directory contains comprehensive performance benchmarks for the IRT Ruby gem, helping users understand the computational characteristics and scaling behavior of the different IRT models. |
| 4 | + |
| 5 | +## Available Benchmarks |
| 6 | + |
| 7 | +### 1. Performance Benchmark (`performance_benchmark.rb`) |
| 8 | + |
| 9 | +**Purpose**: Comprehensive performance analysis across different dataset sizes and model types. |
| 10 | + |
| 11 | +**What it measures**: |
| 12 | +- Throughput (iterations per second) for Rasch, 2PL, and 3PL models |
| 13 | +- Memory usage analysis (allocated/retained objects and memory) |
| 14 | +- Scaling behavior analysis (how performance changes with dataset size) |
| 15 | +- Impact of missing data strategies on performance |
| 16 | + |
| 17 | +**Dataset sizes tested**: |
| 18 | +- Tiny: 10 people × 5 items (50 data points) |
| 19 | +- Small: 50 people × 20 items (1,000 data points) |
| 20 | +- Medium: 100 people × 50 items (5,000 data points) |
| 21 | +- Large: 200 people × 100 items (20,000 data points) |
| 22 | +- XLarge: 500 people × 200 items (100,000 data points) |
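
To make these sizes concrete, the sketch below generates a random binary response matrix for each configuration and prints the resulting data-point count. The `SIZES` hash and the `random_responses` helper are illustrative names, not taken from the benchmark scripts, and the Rasch-style simulation is only one plausible way to produce test data.

```ruby
# Hypothetical size table mirroring the list above (not the script's own constant).
SIZES = {
  tiny:   [10, 5],
  small:  [50, 20],
  medium: [100, 50],
  large:  [200, 100],
  xlarge: [500, 200]
}.freeze

# Build a people x items matrix of 0/1 responses using a Rasch-style
# probability: P(correct) = 1 / (1 + exp(-(ability - difficulty))).
def random_responses(people, items, rng: Random.new(42))
  abilities    = Array.new(people) { rng.rand * 4 - 2 } # uniform in [-2, 2)
  difficulties = Array.new(items)  { rng.rand * 4 - 2 }
  abilities.map do |theta|
    difficulties.map do |b|
      p_correct = 1.0 / (1.0 + Math.exp(-(theta - b)))
      rng.rand < p_correct ? 1 : 0
    end
  end
end

SIZES.each do |name, (people, items)|
  data = random_responses(people, items)
  puts format("%-7s %3d x %3d = %6d data points", name, people, items, data.sum(&:size))
end
```

The fixed seed keeps the generated matrices reproducible between runs, which matters when comparing timings.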
| 23 | + |
| 24 | +### 2. Convergence Benchmark (`convergence_benchmark.rb`) |
| 25 | + |
| 26 | +**Purpose**: Detailed analysis of convergence behavior and optimization characteristics. |
| 27 | + |
| 28 | +**What it measures**: |
| 29 | +- Impact of tolerance settings on convergence time and success rate |
| 30 | +- Learning rate optimization analysis |
| 31 | +- Dataset characteristics impact on convergence |
| 32 | +- Missing data pattern effects on convergence |
| 33 | + |
| 34 | +**Key insights provided**: |
| 35 | +- Optimal hyperparameter settings for different scenarios |
| 36 | +- Convergence reliability across different conditions |
| 37 | +- Trade-offs between speed and accuracy |
| 38 | + |
| 39 | +## Running the Benchmarks |
| 40 | + |
| 41 | +### Prerequisites |
| 42 | + |
| 43 | +Install benchmark dependencies: |
| 44 | +```bash |
| 45 | +bundle install |
| 46 | +``` |
| 47 | + |
| 48 | +### Running Individual Benchmarks |
| 49 | + |
| 50 | +```bash |
| 51 | +# Full performance benchmark suite (takes 5-10 minutes) |
| 52 | +ruby benchmarks/performance_benchmark.rb |
| 53 | + |
| 54 | +# Convergence analysis (takes 3-5 minutes) |
| 55 | +ruby benchmarks/convergence_benchmark.rb |
| 56 | +``` |
| 57 | + |
| 58 | +### Running All Benchmarks |
| 59 | + |
| 60 | +```bash |
| 61 | +# Run both benchmark suites |
| 62 | +ruby benchmarks/performance_benchmark.rb && ruby benchmarks/convergence_benchmark.rb |
| 63 | +``` |
| 64 | + |
| 65 | +## Understanding the Results |
| 66 | + |
| 67 | +### Performance Benchmark Output |
| 68 | + |
| 69 | +1. **Iterations per Second (IPS)**: Higher is better |
| 70 | + - Shows relative speed between Rasch, 2PL, and 3PL models |
| 71 | + - Includes confidence intervals and comparison ratios |
| 72 | + |
| 73 | +2. **Memory Usage**: |
| 74 | + - Total allocated: Memory used during computation |
| 75 | + - Total retained: Memory still held after computation |
| 76 | + - Object counts: Number of Ruby objects created |
| 77 | + |
| 78 | +3. **Scaling Analysis**: |
| 79 | + - Estimates empirical scaling as O(n^x), where n is the dataset size |
| 80 | + - Helps predict performance for larger datasets |
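
One common way to obtain the exponent x in O(n^x) is a least-squares fit on log-transformed sizes and timings. The sketch below assumes that approach; the benchmark script may compute it differently. The `scaling_exponent` helper and the timings are made up for illustration.

```ruby
# Estimate x in time ~ c * n^x by least squares on (log n, log time).
def scaling_exponent(sizes, times)
  xs = sizes.map { |n| Math.log(n) }
  ys = times.map { |t| Math.log(t) }
  mean_x = xs.sum / xs.size
  mean_y = ys.sum / ys.size
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var = xs.sum { |x| (x - mean_x)**2 }
  cov / var
end

# Made-up timings (seconds) that grow linearly with the number of data points.
sizes = [50, 1_000, 5_000, 20_000, 100_000]
times = sizes.map { |n| 2.0e-6 * n }

x = scaling_exponent(sizes, times)
puts format("estimated exponent: %.2f", x)

# Extrapolate from the largest measured run to a hypothetical 1,000,000-point dataset.
predicted = times.last * (1_000_000.0 / sizes.last)**x
puts format("predicted time: %.2f s", predicted)
```

An exponent near 1 suggests roughly linear scaling. Extrapolating far beyond the measured range is unreliable, since memory pressure can change the curve.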
| 81 | + |
| 82 | +### Convergence Benchmark Output |
| 83 | + |
| 84 | +1. **Convergence Rate**: Percentage of runs that converged within tolerance |
| 85 | +2. **Average Iterations**: Typical number of iterations needed |
| 86 | +3. **Time**: Wall-clock time to convergence |
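
As a rough illustration of how these three figures relate, the sketch below summarizes a handful of made-up runs. The `Run` struct and `summarize` helper are hypothetical, and averaging over converged runs only is an assumption, not necessarily what the benchmark script does.

```ruby
# Hypothetical record of one fit: whether it converged, iterations used, seconds taken.
Run = Struct.new(:converged, :iterations, :seconds)

# Summarize runs; averages are taken over converged runs only (an assumption).
def summarize(runs)
  converged = runs.select(&:converged)
  {
    convergence_rate: 100.0 * converged.size / runs.size,
    avg_iterations: converged.empty? ? nil : converged.sum(&:iterations).to_f / converged.size,
    avg_seconds: converged.empty? ? nil : converged.sum(&:seconds) / converged.size
  }
end

# Made-up results; the third run hit the iteration limit without converging.
runs = [
  Run.new(true, 120, 0.031),
  Run.new(true, 98, 0.027),
  Run.new(false, 500, 0.140),
  Run.new(true, 143, 0.036)
]

summary = summarize(runs)
puts format("converged: %.0f%%, avg iterations: %.1f, avg time: %.3f s",
            summary[:convergence_rate], summary[:avg_iterations], summary[:avg_seconds])
```

Excluding non-converged runs from the averages keeps runs that stopped at the iteration limit from inflating the iteration and time figures.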
| 87 | + |
| 88 | +## Interpreting Results for Your Use Case |
| 89 | + |
| 90 | +### For Educational Assessment (typical: 100-1000 students, 20-100 items) |
| 91 | +- Focus on Medium to Large dataset results |
| 92 | +- The Rasch model is typically fastest; 3PL is slowest but most flexible |
| 93 | +- Missing data strategies have < 10% performance impact |
| 94 | + |
| 95 | +### For Psychological Testing (typical: 50-500 participants, 10-50 items) |
| 96 | +- Focus on Small to Medium dataset results |
| 97 | +- All models should complete in < 1 second |
| 98 | +- Consider convergence reliability for different tolerance settings |
| 99 | + |
| 100 | +### For Large-Scale Analysis (1000+ participants) |
| 101 | +- Review XLarge dataset results and scaling analysis |
| 102 | +- Consider batching or parallel processing for very large datasets |
| 103 | +- Monitor memory usage to avoid system limits |
| 104 | + |
| 105 | +## Customizing Benchmarks |
| 106 | + |
| 107 | +You can modify the benchmark scripts to test your specific scenarios: |
| 108 | + |
| 109 | +1. **Custom Dataset Sizes**: Edit the `DATASET_CONFIGS` array |
| 110 | +2. **Different Hyperparameters**: Modify the tolerance and learning-rate configurations |
| 111 | +3. **Specific Missing Data Patterns**: Adjust missing data generation |
| 112 | +4. **Model-Specific Tests**: Focus on particular IRT models |
| 113 | + |
| 114 | +## Performance Tips |
| 115 | + |
| 116 | +Based on benchmark results: |
| 117 | + |
| 118 | +1. **Choose the Right Model**: Rasch is fastest; use 2PL/3PL only when needed |
| 119 | +2. **Optimize Tolerance**: `1e-5` is typically a good balance of speed and accuracy |
| 120 | +3. **Adjust Learning Rate**: Start with `0.01`; increase it if convergence is slow |
| 121 | +4. **Handle Missing Data**: The `:ignore` strategy is typically fastest |
| 122 | +5. **Consider Iteration Limits**: 100-500 iterations are usually sufficient |
| 123 | + |
| 124 | +## Comparing with Other IRT Libraries |
| 125 | + |
| 126 | +These benchmarks can help you compare IRT Ruby against other implementations. Key metrics to compare: |
| 127 | + |
| 128 | +- Time per data point processed |
| 129 | +- Memory efficiency |
| 130 | +- Convergence reliability |
| 131 | +- Scaling behavior with dataset size |
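
For the first metric, a library-neutral measurement can be sketched as follows. The `seconds_per_data_point` helper is a hypothetical name, and the block body is a stand-in workload to be replaced with a model fit from whichever library is being compared.

```ruby
# Time a block with a monotonic clock and normalize by the number of responses.
def seconds_per_data_point(people, items)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  elapsed / (people * items)
end

people = 100
items = 50
data = Array.new(people) { Array.new(items) { rand(2) } }

# Stand-in workload; swap in a model fit from the library under test.
per_point = seconds_per_data_point(people, items) { data.map(&:sum) }
puts format("%.3e seconds per data point", per_point)
```

Normalizing by people × items makes results from differently sized datasets comparable, although a single timed run is noisy and repeated runs give a steadier estimate.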
| 132 | + |
| 133 | +--- |
| 134 | + |
| 135 | +*Note: Benchmark results will vary based on your hardware. Run the benchmarks in your target deployment environment for the most accurate performance estimates.* |