And a happy new year!

I wish you all my best wishes for this new year.
I hope 2013 will be as exciting as 2012 for the MySQL community.

Thank you for spending time on MySQL[Plus] in 2012, I hope to have more and more readers in 2013.

If you would like to read (or read back) my posts published in 2012, here is the summary :

And all the most useful slides of 2012 are also on [Plus] :  http://www.mysqlplus.net/mysql-docs/

Don’t forget to follow me on twitter for the latest news about the MySQL ecosystem :


And if you didn’t discover MyXplain before, please, take a look : http://myxplain.net

Happy new year again on [Plus]

Temporary file behavior… (and how lsof save my life)

I would like to share this story based on a true event about the temporary files behavior in MySQL.

MONyog reports this error to my already full mailbox several times a day :


 

1 – Catch the query (if you can) !

 

I don’t have access to the client logs but I would like to know which query is involved in this error.

Let me explain how I can retrieve informations about this query with MONyog and a very simple shell loop :

  •  Enable the query sniffer in MONyog (based on processlist) : Edit server -> Advanced settings -> Sniffer settings

 

  • Monitor your MySQL TMPDIR directory with this simple shell loop :
  • [ Use this command to retrieve the MySQL temporary directory : show variables like ‘tmpdir’; ]
  • [ In this case : tmpdir=/database/tmp ]
while [ 1 = 1 ]; do { date;ls -artlh /database/tmp ; df -h /database/tmp;
  lsof | grep mysql | grep /database/tmp;sleep 1; }; done
 
With this loop, I can follow in real time the informations about the files created in the MySQL temporary directory and it will be very useful to find how exactly the error happens (see below).

With the MONyog query sniffer, I found the query involved by comparing the two emails sent by MONyog (Error message above and the emails about long queries).
The explain plan of this query was as follow :

 

And finaly, the real error was :
ERROR 3 (HY000): Error writing file '/database/tmp/MYakhJC5' (Errcode: 28)
 Error (Code 3): Error writing file '/database/tmp/MYakhJC5' (Errcode: 28)
 Error (Code 1028): Sort aborted
# perror 28
 OS error code  28:  No space left on device

 

2 – No space left on device, really ?!

 

Let see the filesystem definition for /database/tmp with the df command :
And here is a list view of the files in the directory when the error occured :
 total 2,3G
 drwxrwx--- 11 mysql mysql 4,0K 16 févr. 10:37 ..
 -rw-rw----  1 mysql mysql    6  6 avril 15:39 mysql.pid
 srwxrwxrwx  1 mysql mysql    0  6 avril 15:39 mysql.sock
 -rw-rw----  1 mysql mysql 1,0K 12 avril 15:41 #sql_7237_0.MYI
 -rw-rw----  1 mysql mysql 2,3G 12 avril 15:42 #sql_7237_0.MYD
 -rw-rw----  1 mysql mysql 1,0K 12 avril 15:42 #sql_7237_5.MYI
 -rw-rw----  1 mysql mysql    0 12 avril 15:42 #sql_7237_5.MYD
 drwxrwx---  2 mysql mysql 4,0K 12 avril 15:43 .
The /database/tmp filesystem has a size of 6,5Go and there was a total of 2,3Go of files when the error occured !

 

3 – “Was passiert ?!”

 

The lsof command was my very good friend to discover what happened.
Let see the trace of my tiny loop when the error occured (lsof part only) :

 

 

I can see a 2,3Go temporary file but there are two other temporary files which continue to grow until the end of the world :-(
And these two (deleted) files appears to be the cause of my issue.

 

4 – What next ?

 

The killer question : why did you have only 6.5 GB for your temporary space ?
Answer : Why not :-)

You have to consider this behavior to set your temporary space properly, specially if this space is a ramdisk filesystem.

Tschüs !

PS : I let the experts explains why these two additional files are created (hint : look at the Exta column)