Over the last few months I worked on a web application that measures the popularity of words in internet media, calculates popularity values, and compares the values of different words. It’s similar to Google Trends, except that here the information is time-critical, with live updates every 5 minutes.
Implementing the crawler was a bit unusual because words in different HTML tags carry different values, but with the Hpricot HTML parser it took just a few lines of code. From time to time, when new media were added, the crawler would hit a problem specific to that site; it was easily fixed and everything ran well again. All was well until recently, when I started getting
stack level too deep (SystemStackError) on some of the recently added media in the application.
I was sure that there was no endless recursion in the code (other sites were parsed without problems). After a bit of research I found out that newer versions of Ruby (the production server used Ruby 1.8.6) might solve the problem with the stack level. But after installing Ruby 1.8.7 with Ruby Version Manager on the production server, I kept getting the same error.
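To illustrate that this error does not require endless recursion, here is a minimal sketch. The `nesting_depth` and `safe_nesting_depth` methods are hypothetical and are not part of the crawler; they only show that a recursion that would terminate can still exhaust the stack when the input is deep enough. The exact depth at which this happens depends on the Ruby version, the platform, and the configured limits.

```ruby
# Hypothetical helper: a finite recursion that uses one stack frame per level.
def nesting_depth(levels)
  return 0 if levels.zero?
  1 + nesting_depth(levels - 1)
end

# Wrap the call so that a stack overflow is reported instead of
# terminating the program.
def safe_nesting_depth(levels)
  nesting_depth(levels)
rescue SystemStackError => e
  "failed: #{e.message}"
end

puts safe_nesting_depth(100)         # shallow input: the recursion completes
puts safe_nesting_depth(10_000_000)  # deep input: expected to exhaust the stack
```

The second call is finite too; it simply needs more stack than the process is allowed to use.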
Then I thought it might be a bug in Hpricot, or that Hpricot recursed more deeply than Nokogiri would, but after a quick rewrite using the Nokogiri parser it still raised the same
stack level too deep (SystemStackError) exception for the same media.
The next possible solutions that came to mind were to write a custom HTML parser (which would be time-consuming because of debugging invalid HTML tags) or to change the stack limit, if that was even possible. I was getting the error only on the production server, with its 64-bit architecture, and not on my development machine, with its 32-bit architecture.
Finally, I found out about the ulimit command, which provides control over the resources available to the shell and to the processes it starts, on systems that allow such control.
You can see the current limits with
dalibor@kreator:~$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
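The same stack limit can also be read from inside a Ruby process, which is handy for checking what the crawler actually runs with. This is a small sketch assuming a Unix-like system; `describe_limit` is a hypothetical helper name. `Process.getrlimit` reports the limit in bytes, while ulimit prints it in kbytes.

```ruby
# Hypothetical helper: format a limit in bytes the way ulimit prints it.
def describe_limit(bytes)
  return "unlimited" if bytes == Process::RLIM_INFINITY
  "#{bytes / 1024} kbytes"
end

# getrlimit returns the soft (current) and hard (maximum) limit.
soft, hard = Process.getrlimit(Process::RLIMIT_STACK)

puts "stack size soft limit: #{describe_limit(soft)}"
puts "stack size hard limit: #{describe_limit(hard)}"
```

The soft limit is the value that ulimit -s shows and changes by default.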
You can change the stack size with the ulimit -s command:
ulimit -s 16384
After doubling the stack size from 8192 to 16384 kbytes, I stopped getting the
stack level too deep (SystemStackError) exception.
If you add the above line to the .bashrc file on the server, the stack size will be raised every time you ssh to that machine, and all processes started from that shell will run with the new limit.
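If the crawler is not started from a shell that has read .bashrc, one alternative is to set the limit when launching the process from Ruby. This is only a sketch of that idea, not what I used: `capture_with_stack_limit`, `DESIRED_STACK_BYTES` and `CHILD_CODE` are hypothetical names, and the child here just prints its own limit instead of running a crawler. It relies on the `rlimit_stack` option of `Process.spawn`, which `Open3.capture2` passes through.

```ruby
require "open3"
require "rbconfig"

# 16384 kbytes, the same value as "ulimit -s 16384" (assumed target).
DESIRED_STACK_BYTES = 16_384 * 1024

# Stand-in for the real work: the child prints its own soft stack limit.
CHILD_CODE = "print Process.getrlimit(Process::RLIMIT_STACK).first"

# Run a command with the given soft stack limit and capture its output.
# The soft limit is capped at the hard limit, which is left unchanged.
def capture_with_stack_limit(bytes, *command)
  _soft, hard = Process.getrlimit(Process::RLIMIT_STACK)
  soft = [bytes, hard].min
  Open3.capture2(*command, rlimit_stack: [soft, hard])
end

output, status = capture_with_stack_limit(
  DESIRED_STACK_BYTES, RbConfig.ruby, "-e", CHILD_CODE
)
puts "child exit status: #{status.exitstatus}, stack soft limit: #{output} bytes"
```

Only the child process gets the new limit; the parent keeps whatever it was started with.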
In the end, that simple change solved the problem. Hopefully it will save you from spending a whole day experimenting with different potential solutions if you run into a similar problem. :)