Commit f7fcf8a

feat: add performance benchmarks (#13)
1 parent 6fe9a42 commit f7fcf8a

7 files changed: +633 -12 lines changed

Gemfile

Lines changed: 4 additions & 0 deletions
@@ -9,3 +9,7 @@ gem "rake", "~> 13.0"
 gem "rspec", "~> 3.0"
 
 gem "rubocop", "~> 1.21"
+
+# Performance benchmarking
+gem "benchmark-ips", "~> 2.0"
+gem "memory_profiler", "~> 1.0"
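
The `memory_profiler` gem added here is what the benchmark documentation relies on for allocated and retained object counts. As a rough, standard-library-only illustration of what "allocated objects" means, the sketch below counts allocations around a block with `GC.stat`. The helper `count_allocations` is hypothetical: it is not part of this commit or of either gem, and it is a much coarser measure than a full memory profile.

```ruby
# Hypothetical helper (not part of this commit): returns the number of Ruby
# objects allocated while the block runs, using the cumulative counter
# exposed by GC.stat.
def count_allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Illustrative workload: one array plus 1,000 small strings.
allocated = count_allocations { Array.new(1_000) { |i| "item #{i}" } }
puts "objects allocated: #{allocated}"
```

The count includes every object created inside the block, so it is an upper-level signal only; it says nothing about which objects are retained afterwards.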

README.md

Lines changed: 25 additions & 1 deletion
@@ -125,9 +125,33 @@ For 2PL and 3PL:
 
 This prevents extreme or invalid parameter estimates.
 
+## Performance Benchmarks
+
+IRT Ruby includes comprehensive performance benchmarks to help you understand the computational characteristics of different models:
+
+```bash
+# Run all benchmarks (takes 8-15 minutes)
+bundle exec rake benchmark:all
+
+# Quick performance check (2-3 minutes)
+bundle exec rake benchmark:quick
+
+# Individual benchmark suites
+bundle exec rake benchmark:performance
+bundle exec rake benchmark:convergence
+```
+
+The benchmarks test:
+- **Performance**: Execution speed across dataset sizes (50 to 100,000 data points)
+- **Memory Usage**: Object allocation and memory efficiency
+- **Scaling**: How computational complexity grows with data size
+- **Convergence**: Optimization behavior under different conditions
+
+See `benchmarks/README.md` for detailed information about interpreting results.
+
 ## Development
 
-After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake test` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
+After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
 
 To install this gem onto your local machine, run `bundle exec rake install`. To release a new version, update the version number in `version.rb`, and then run `bundle exec rake release`, which will create a git tag for the version, push git commits and the created tag, and push the `.gem` file to [rubygems.org](https://rubygems.org).

Rakefile

Lines changed: 25 additions & 0 deletions
@@ -10,3 +10,28 @@ require "rubocop/rake_task"
 RuboCop::RakeTask.new
 
 task default: %i[spec rubocop]
+
+# Benchmark tasks
+namespace :benchmark do
+  desc "Run performance benchmarks"
+  task :performance do
+    ruby "benchmarks/performance_benchmark.rb"
+  end
+
+  desc "Run convergence analysis benchmarks"
+  task :convergence do
+    ruby "benchmarks/convergence_benchmark.rb"
+  end
+
+  desc "Run all benchmarks"
+  task all: %i[performance convergence] do
+    puts "All benchmarks completed!"
+  end
+
+  desc "Run quick benchmarks (reduced dataset sizes)"
+  task :quick do
+    puts "Running quick performance benchmark..."
+    ENV["QUICK_BENCHMARK"] = "1"
+    ruby "benchmarks/performance_benchmark.rb"
+  end
+end
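
The `:quick` task only sets `QUICK_BENCHMARK=1` before launching the script; how `performance_benchmark.rb` reacts to it is not shown in this part of the diff. A minimal sketch of one way a script could honor the flag follows. The hash layout of `DATASET_CONFIGS`, the helper `configs_for`, and the "keep the three smallest sizes" rule are all assumptions for illustration; only the constant's name and the dataset dimensions come from `benchmarks/README.md`.

```ruby
# Hypothetical sketch: the real performance_benchmark.rb is not shown here,
# so this structure and the quick-mode rule are assumptions.
DATASET_CONFIGS = [
  { name: "Tiny",   people: 10,  items: 5 },
  { name: "Small",  people: 50,  items: 20 },
  { name: "Medium", people: 100, items: 50 },
  { name: "Large",  people: 200, items: 100 },
  { name: "XLarge", people: 500, items: 200 }
].freeze

# Returns the configs to run; quick mode keeps only the smaller datasets.
# Accepts any Hash-like environment so it can be exercised without ENV.
def configs_for(env = ENV)
  quick = env["QUICK_BENCHMARK"] == "1"
  quick ? DATASET_CONFIGS.first(3) : DATASET_CONFIGS
end

configs_for.each do |cfg|
  puts format("%-6s %d data points", cfg[:name], cfg[:people] * cfg[:items])
end
```

Because the Rake `ruby` helper starts a child process that inherits the parent's environment, assigning `ENV["QUICK_BENCHMARK"]` inside the task is enough for the script to see it.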

benchmarks/README.md

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
+# IRT Ruby Performance Benchmarks
+
+This directory contains comprehensive performance benchmarks for the IRT Ruby gem, helping users understand the computational characteristics and scaling behavior of the different IRT models.
+
+## Available Benchmarks
+
+### 1. Performance Benchmark (`performance_benchmark.rb`)
+
+**Purpose**: Comprehensive performance analysis across different dataset sizes and model types.
+
+**What it measures**:
+- Execution time (iterations per second) for Rasch, 2PL, and 3PL models
+- Memory usage analysis (allocated/retained objects and memory)
+- Scaling behavior analysis (how performance changes with dataset size)
+- Impact of missing data strategies on performance
+
+**Dataset sizes tested**:
+- Tiny: 10 people × 5 items (50 data points)
+- Small: 50 people × 20 items (1,000 data points)
+- Medium: 100 people × 50 items (5,000 data points)
+- Large: 200 people × 100 items (20,000 data points)
+- XLarge: 500 people × 200 items (100,000 data points)
+
+### 2. Convergence Benchmark (`convergence_benchmark.rb`)
+
+**Purpose**: Detailed analysis of convergence behavior and optimization characteristics.
+
+**What it measures**:
+- Impact of tolerance settings on convergence time and success rate
+- Learning rate optimization analysis
+- Dataset characteristics impact on convergence
+- Missing data pattern effects on convergence
+
+**Key insights provided**:
+- Optimal hyperparameter settings for different scenarios
+- Convergence reliability across different conditions
+- Trade-offs between speed and accuracy
+
+## Running the Benchmarks
+
+### Prerequisites
+
+Install benchmark dependencies:
+```bash
+bundle install
+```
+
+### Running Individual Benchmarks
+
+```bash
+# Full performance benchmark suite (takes 5-10 minutes)
+ruby benchmarks/performance_benchmark.rb
+
+# Convergence analysis (takes 3-5 minutes)
+ruby benchmarks/convergence_benchmark.rb
+```
+
+### Running All Benchmarks
+
+```bash
+# Run both benchmark suites
+ruby benchmarks/performance_benchmark.rb && ruby benchmarks/convergence_benchmark.rb
+```
+
+## Understanding the Results
+
+### Performance Benchmark Output
+
+1. **Iterations per Second (IPS)**: Higher is better
+   - Shows relative speed between Rasch, 2PL, and 3PL models
+   - Includes confidence intervals and comparison ratios
+
+2. **Memory Usage**:
+   - Total allocated: Memory used during computation
+   - Total retained: Memory still held after computation
+   - Object counts: Number of Ruby objects created
+
+3. **Scaling Analysis**:
+   - Shows computational complexity (O(n^x))
+   - Helps predict performance for larger datasets
+
+### Convergence Benchmark Output
+
+1. **Convergence Rate**: Percentage of runs that converged within tolerance
+2. **Average Iterations**: Typical number of iterations needed
+3. **Time**: Wall-clock time to convergence
+
+## Interpreting Results for Your Use Case
+
+### For Educational Assessment (typical: 100-1000 students, 20-100 items)
+- Focus on Medium to Large dataset results
+- Rasch model typically fastest, 3PL slowest but most flexible
+- Missing data strategies have < 10% performance impact
+
+### For Psychological Testing (typical: 50-500 participants, 10-50 items)
+- Focus on Small to Medium dataset results
+- All models should complete in < 1 second
+- Consider convergence reliability for different tolerance settings
+
+### For Large-Scale Analysis (1000+ participants)
+- Review XLarge dataset results and scaling analysis
+- Consider batching or parallel processing for very large datasets
+- Monitor memory usage to avoid system limits
+
+## Customizing Benchmarks
+
+You can modify the benchmark scripts to test your specific scenarios:
+
+1. **Custom Dataset Sizes**: Edit `DATASET_CONFIGS` array
+2. **Different Hyperparameters**: Modify tolerance, learning rate configs
+3. **Specific Missing Data Patterns**: Adjust missing data generation
+4. **Model-Specific Tests**: Focus on particular IRT models
+
+## Performance Tips
+
+Based on benchmark results:
+
+1. **Choose the Right Model**: Rasch is fastest, use 2PL/3PL only when needed
+2. **Optimize Tolerance**: `1e-5` typically good balance of speed/accuracy
+3. **Adjust Learning Rate**: Start with `0.01`, increase for faster convergence
+4. **Handle Missing Data**: `:ignore` strategy typically fastest
+5. **Consider Iteration Limits**: 100-500 iterations usually sufficient
+
+## Comparing with Other IRT Libraries
+
+These benchmarks can help you compare IRT Ruby against other implementations. Key metrics to compare:
+
+- Time per data point processed
+- Memory efficiency
+- Convergence reliability
+- Scaling behavior with dataset size
+
+---
+
+*Note: Benchmark results will vary based on your hardware. Run benchmarks on your target deployment environment for most accurate performance estimates.*
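
The "Scaling Analysis" output in this file reports complexity as O(n^x). One common way to estimate such an exponent, sketched here as an assumption rather than as the method the benchmark script actually uses, is a least-squares fit of log(time) against log(n): if time ≈ c · n^x, the slope of that line is x. The function name `scaling_exponent` is hypothetical, and the timings are synthetic values generated from a linear formula, not measured results.

```ruby
# Estimates x in time ≈ c * n^x as the least-squares slope of
# log(time) against log(n). Hypothetical helper, not from this commit.
def scaling_exponent(sizes, times)
  xs = sizes.map { |n| Math.log(n) }
  ys = times.map { |t| Math.log(t) }
  mean_x = xs.sum / xs.size
  mean_y = ys.sum / ys.size
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var = xs.sum { |x| (x - mean_x)**2 }
  cov / var
end

# Data-point counts taken from the dataset sizes listed in this README.
sizes = [50, 1_000, 5_000, 20_000, 100_000]
# Synthetic timings that grow linearly with n (placeholders, not measurements).
times = sizes.map { |n| 2.0e-6 * n }

puts format("estimated exponent: %.2f", scaling_exponent(sizes, times))
```

An exponent near 1 suggests roughly linear growth in the number of data points, while a value near 2 suggests quadratic growth; real timings are noisy, so the fit is only indicative.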
