For the last few months I have been working on a web application that measures the popularity of words in internet media, calculates popularity values, and compares different words. It's similar to Google Trends, except that here the information is time-critical, with live updates every 5 minutes.
Implementing the crawler was a bit unusual because words in different HTML tags carry different values, but with the Hpricot HTML parser it took only a few lines of code. From time to time, when new media were added and the crawler hit a problem specific to one site, I applied a quick fix and after a restart everything ran well again. That was the case until recently, when I started getting "stack level too deep (SystemStackError)" on some of the recently added media in the application.
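To make the idea of per-tag values concrete, here is a minimal sketch in plain Ruby. It is not the crawler's actual code: the weights, the TAG_WEIGHTS table and the weighted_word_scores helper are all made up for illustration, and the input is assumed to be a list of [tag, text] pairs such as an HTML parser might produce.

```ruby
# Hypothetical weights; the real values used by the crawler are not shown here.
TAG_WEIGHTS = { "title" => 5, "h1" => 3, "p" => 1 }.freeze
DEFAULT_WEIGHT = 1

# fragments: array of [tag_name, text] pairs, as an HTML parser might yield them.
# Returns a hash mapping each lowercased word to its accumulated weight.
def weighted_word_scores(fragments)
  scores = Hash.new(0)
  fragments.each do |tag, text|
    weight = TAG_WEIGHTS.fetch(tag, DEFAULT_WEIGHT)
    text.downcase.scan(/[[:alpha:]]+/) { |word| scores[word] += weight }
  end
  scores
end

fragments = [
  ["title", "Ruby stack error"],
  ["p", "Ruby crawler hits a stack error"]
]
p weighted_word_scores(fragments)
```

A word in the title counts five times as much as the same word in a paragraph here, so "ruby" ends up with a higher score than "crawler".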
Since I was sure there was no endless recursion in the code (other sites were working fine), I did a bit of research and found that newer versions of Ruby (the production server used Ruby 1.8.6) might solve the stack level problem. But after installing Ruby 1.8.7 with Ruby Version Manager on the production server, I kept getting the same error.
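It helps to remember that the error does not require endless recursion. A finite but deep recursion is enough, as this small sketch shows. The nest and try_depth methods are hypothetical names for illustration, and the depth at which the stack overflows depends on the Ruby version and on the stack limits in effect, so I only compare a very small depth with a very large one.

```ruby
# Recurses exactly `depth` times and then returns; the recursion is finite.
def nest(depth)
  depth.zero? ? 0 : 1 + nest(depth - 1)
end

# Runs nest and reports whether the stack was large enough for that depth.
def try_depth(depth)
  nest(depth)
  :ok
rescue SystemStackError
  :stack_level_too_deep
end

puts "depth 100: #{try_depth(100)}"
puts "depth 10_000_000: #{try_depth(10_000_000)}"
```

The same code that works for shallow input fails for deep input, which is consistent with a crawler that handles most sites and breaks only on a few.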
Then I thought it might be a bug or issue in Hpricot, which might recurse more deeply than Nokogiri would, but after a quick rewrite using the Nokogiri parser I still got the same "stack level too deep (SystemStackError)" exception for the same media.
The next possible solutions that came to mind were to write a custom HTML parser (which would be time-consuming, given all the debugging of invalid HTML tags), or to change the stack limit, if that was possible at all. I was getting the error only on the production server with its 64-bit architecture, and not on the development machine with its 32-bit architecture.
Finally, I found out about the ulimit command, which provides control over the resources available to the shell and to the processes started by it, on systems that allow such control.
You can see the current limits with 'ulimit -a':
dalibor@kreator:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
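The stack limit can also be read from inside a Ruby process with Process.getrlimit, which is handy for checking what limit the crawler itself actually runs with. This is only a small sketch: format_limit is a hypothetical helper, and note that getrlimit reports bytes while ulimit reports kbytes.

```ruby
# Converts a limit in bytes to the units used by `ulimit -s`.
def format_limit(bytes)
  bytes == Process::RLIM_INFINITY ? "unlimited" : "#{bytes / 1024} kbytes"
end

# getrlimit returns the soft (current) and hard (maximum) limits in bytes.
soft, hard = Process.getrlimit(Process::RLIMIT_STACK)
puts "stack size soft limit: #{format_limit(soft)}"
puts "stack size hard limit: #{format_limit(hard)}"
```

The soft limit is the value that 'ulimit -s' shows by default; the hard limit is the ceiling up to which an unprivileged user may raise it.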
You can change the stack size with the 'ulimit -s' command:
ulimit -s 16384
After doubling the stack size to 16384 kbytes, I stopped getting the "stack level too deep (SystemStackError)" exception.
If you add the above line to the .bashrc file on the server, the stack size will be changed every time you ssh to that machine, and all processes started from that shell will run with the new limit.
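If you would rather raise the limit for a single process than for every login shell, one option is to start that process from Ruby with an rlimit option. This is a sketch of an alternative I did not use on the server, and it needs Ruby 1.9 or newer for Process.spawn. The raised_stack_limit helper and the child command are hypothetical; the child here only prints the limit it was started with, where a real setup would launch the crawler instead.

```ruby
require "rbconfig"

# Same value as `ulimit -s 16384`, expressed in bytes.
DESIRED_STACK_BYTES = 16_384 * 1024

# Returns [soft, hard] for the child: the desired soft limit, capped at the
# current hard limit, which an unprivileged process cannot exceed.
def raised_stack_limit(desired)
  _soft, hard = Process.getrlimit(Process::RLIMIT_STACK)
  [[desired, hard].min, hard]
end

# Hypothetical child command: reports its own soft stack limit in kbytes.
child = [RbConfig.ruby, "-e",
         "puts Process.getrlimit(Process::RLIMIT_STACK).first / 1024"]

pid = Process.spawn(*child, rlimit_stack: raised_stack_limit(DESIRED_STACK_BYTES))
Process.wait(pid)
```

When the hard limit allows it, the child reports 16384, while the parent process and every other shell on the machine keep their original limit.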
In the end I solved the problem, and hopefully my experience will save you from spending a whole day experimenting with different potential solutions if you run into a similar problem.