<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jurriaanpersyn.com</title>
	<atom:link href="http://www.jurriaanpersyn.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.jurriaanpersyn.com</link>
	<description></description>
	<lastBuildDate>Sun, 11 Jul 2010 10:33:41 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Free Steeman</title>
		<link>http://www.jurriaanpersyn.com/archives/2010/07/11/free-steeman/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2010/07/11/free-steeman/#comments</comments>
		<pubDate>Sun, 11 Jul 2010 10:33:41 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[familie]]></category>
		<category><![CDATA[family]]></category>
		<category><![CDATA[free steeman]]></category>
		<category><![CDATA[pepe]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=388</guid>
		<description><![CDATA[A text I read at the mass for my grandfather, written by my mom.
“Yes, but … I&#8217;d rather you didn&#8217;t all come at once,” you told us. Afraid of those long, lonely days during your stay in the hospital.
Sorry, Dad. This time we can&#8217;t grant your wish. This is our last chance to [...]]]></description>
			<content:encoded><![CDATA[<p>A text I read at the mass for my grandfather, written by my mom.</p>
<blockquote><p>“Yes, but … I&#8217;d rather you didn&#8217;t all come at once,” you told us. Afraid of those long, lonely days during your stay in the hospital.</p>
<p>Sorry, Dad. This time we can&#8217;t grant your wish. This is our last chance to say goodbye, together with all your loved ones.<br />
Saying goodbye in gratitude. I truly owe you that. After all, I received so much from you, I learned so much from you.</p>
<p>We have always known you as a caring husband, father and grandfather. We were your most precious possession, and you let us feel that very often. For years you stood ready every day to look after Jurriaan when we had already left for work early in the morning. You never said “no, no time” when we asked you to come and do yet another job around our house. I still have to smile at your concern whenever one of us had so much as a harmless cold. “You will have the doctor come, won&#8217;t you?” you would respond. You, who at the time seldom or never went to the doctor yourself.<br />
At times you put yourself aside entirely for the sake of your family. And I have seen others like that in our family.</p>
<p>In recent years you took on the care of our mom. Quite unexpectedly you were given an entirely new task. The household came to rest on your shoulders.<br />
Like so many men of your generation, at first you couldn&#8217;t even fry an egg. But curious and active as you were, you set to work. You almost seemed to have the ambition of becoming a chef when you asked for cookbooks as a New Year&#8217;s gift. It was lovely to see you enjoy your new knowledge and skill. You showed us that we are never too old to learn.</p>
<p>I also owe my love of nature to you. You must have been just about the first to teach me the difference between the song of a thrush and that of a blackbird.<br />
For you, being connected with nature was a way of life. I made it my profession.<br />
Every bird I hear on our travels and during our walks will keep the memory of you alive.<br />
You also felt that nature wasn&#8217;t there only to be enjoyed. “A person had to eat too, right?” Today I&#8217;m willing to admit it &#8230; you are right about that.</p>
<p>When I sat together with my children this week, our weekly visits to exhibitions came up as well. You loved beautiful things, visual art as well as music. You taught us how to look, and you let us share in your admiration and emotion. That emotion and enthusiasm you left to us in your own work. Nature, of course, and nude women were your favorite subjects. The latter to the mild annoyance of our mom.<br />
Your paintings and drawings will be a lasting reminder of your special way of looking at the world.</p>
<p>To me you were a true “wise old man”. You grew more tolerant and more open-minded over the years. As you got older, I heard less and less of: “What will other people say about that?” Everyone was allowed to be who they were. The choices we made might not have been yours, but you kept supporting us. Through thick and thin. Even though you were undoubtedly worried about how they would turn out.</p>
<p>Our children had a pépé who could join in conversations about their favorite TV shows, who listened to and enjoyed their music. When we rewatch the episodes of “In de gloria” for the umpteenth time, we will surely hear you laughing in the background.</p>
<p>As children we sometimes cursed you for your frugality &#8211; in our eyes it was stinginess. No trips, no expensive clothes, no restaurant visits. For you that was self-evident, since you were the only breadwinner in the house. A morning at the fishing pond, a meal at the Donk during the summer holidays or letting loose at the annual fair were the only extravagances you allowed yourself. You show us that happiness is mostly found in simple things.</p>
<p>Over the past year I came to experience your perseverance above all. You kept fighting, even in difficult moments. Barely 2 weeks ago you were still asking for a small fitness machine to train your muscles. After all, you were determined to work in your garden again, soon.<br />
Sadly, there is no “soon” for you anymore.<br />
You still so badly wanted to go on.<br />
You still wanted to accomplish so much.<br />
You still wanted to care for so much.</p>
<p>Dear Dad, I hope that all of us together can carry on a piece of your dreams and life&#8217;s work.</p>
<p>Rest now.</p>
<p>Roos</p></blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2010/07/11/free-steeman/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to memcached</title>
		<link>http://www.jurriaanpersyn.com/archives/2010/05/27/introduction-to-memcached/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2010/05/27/introduction-to-memcached/#comments</comments>
		<pubDate>Thu, 27 May 2010 21:52:32 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[tech]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[ikdoeict]]></category>
		<category><![CDATA[invalidation]]></category>
		<category><![CDATA[kaho st. lieven]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[php]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=354</guid>
		<description><![CDATA[These are the slides from a talk I gave earlier this week for students of the professional bachelor in ICT course at KaHo St. Lieven. I wanted to give a clear and simple introduction to the memcached service, as I think it&#8217;s an invaluable tool in today&#8217;s web development.

]]></description>
			<content:encoded><![CDATA[<p>These are the slides from a talk I gave earlier this week for students of the <a href="http://www.ikdoeict.be/en">professional bachelor in ICT course</a> at <a href="http://www.kahosl.be">KaHo St. Lieven</a>. I wanted to give a clear and simple introduction to the memcached service, as I think it&#8217;s an invaluable tool in today&#8217;s web development.</p>
<div style="width:710px" id="__ss_4296041"><object id="__sse4296041" width="710" height="568"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=introductiontomemcached-kahost-lieven-may25th2010-100525142530-phpapp02&#038;stripped_title=introduction-to-memcached" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse4296041" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=introductiontomemcached-kahost-lieven-may25th2010-100525142530-phpapp02&#038;stripped_title=introduction-to-memcached" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="710" height="568"></embed></object></div>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2010/05/27/introduction-to-memcached/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Netlog Team Event 2010 &#8211; Madesimo, Italy</title>
		<link>http://www.jurriaanpersyn.com/archives/2010/05/25/netlog-team-event-2010-madesimo-italy/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2010/05/25/netlog-team-event-2010-madesimo-italy/#comments</comments>
		<pubDate>Tue, 25 May 2010 08:51:03 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[italy]]></category>
		<category><![CDATA[madesimo]]></category>
		<category><![CDATA[netlog]]></category>
		<category><![CDATA[snowboarding]]></category>
		<category><![CDATA[team event]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=351</guid>
		<description><![CDATA[
A compilation of videos from 3 days of snow fun with the Netlog Team in Madesimo, Italy. March 2010.
]]></description>
			<content:encoded><![CDATA[<p><object width="710" height="533"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=11130662&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=11130662&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="710" height="533"></embed></object></p>
<p>A compilation of videos from 3 days of snow fun with the <a href="http://beta.nl.netlog.com/groups/teamevent2010">Netlog Team in Madesimo</a>, Italy. March 2010.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2010/05/25/netlog-team-event-2010-madesimo-italy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Developing Social Games in the Cloud</title>
		<link>http://www.jurriaanpersyn.com/archives/2010/05/25/developing-social-games-in-the-cloud/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2010/05/25/developing-social-games-in-the-cloud/#comments</comments>
		<pubDate>Tue, 25 May 2010 08:48:17 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[tech]]></category>
		<category><![CDATA[amazon aws]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[gatcha]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[presentation]]></category>
		<category><![CDATA[social games]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=345</guid>
		<description><![CDATA[These are the slides from a presentation I gave with Pieter De Schepper at a PHP Benelux meeting last month on building Gatcha, the social gaming platform I&#8217;m currently working on, on top of Amazon Web Services.

]]></description>
			<content:encoded><![CDATA[<p>These are the slides from a presentation I gave with <a href="http://nl.netlog.com/Pieter">Pieter De Schepper</a> at a <a href="http://phpbenelux.eu/en/2010-meeting-ghent">PHP Benelux</a> meeting last month on building <a href="http://www.gatcha.com/">Gatcha</a>, the social gaming platform I&#8217;m currently working on, on top of <a href="http://aws.amazon.com/">Amazon Web Services</a>.</p>
<div style="width:710px" id="__ss_3901151"><object id="__sse3901151" width="710" height="568"><param name="movie" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=developingsocialgamesinthecloud-slideshare-100429072040-phpapp02&#038;stripped_title=developing-social-games-in-the-cloud" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed name="__sse3901151" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=developingsocialgamesinthecloud-slideshare-100429072040-phpapp02&#038;stripped_title=developing-social-games-in-the-cloud" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="710" height="568"></embed></object></div>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2010/05/25/developing-social-games-in-the-cloud/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Les 2 Alpes 2010</title>
		<link>http://www.jurriaanpersyn.com/archives/2010/03/10/les-2-alpes-2010/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2010/03/10/les-2-alpes-2010/#comments</comments>
		<pubDate>Wed, 10 Mar 2010 08:45:32 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[linkdump]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=342</guid>
		<description><![CDATA[
]]></description>
			<content:encoded><![CDATA[<p><object width="710" height="398"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=9822035&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color= 00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=9822035&amp;server=vimeo.com&amp;show_title=0&amp;show_byline=0&amp;show_portrait=0&amp;color= 00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="710" height="398"></embed></object></p>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2010/03/10/les-2-alpes-2010/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>RE: End of an era</title>
		<link>http://www.jurriaanpersyn.com/archives/2010/02/02/re-end-of-an-era/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2010/02/02/re-end-of-an-era/#comments</comments>
		<pubDate>Tue, 02 Feb 2010 10:32:08 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[tech]]></category>
		<category><![CDATA[adobe]]></category>
		<category><![CDATA[flash]]></category>
		<category><![CDATA[html5]]></category>
		<category><![CDATA[ipad]]></category>
		<category><![CDATA[iphone]]></category>
		<category><![CDATA[macosx]]></category>
		<category><![CDATA[platform]]></category>
		<category><![CDATA[web standards]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=332</guid>
		<description><![CDATA[Disclaimer: This post wasn&#8217;t planned. In fact, it&#8217;s a comment on &#8216;End of an era&#8217;, a blog post by my good friend Lennart, that got a bit out of hand. So much so that it no longer qualified as a comment. Reading that post &#8211; and its references &#8211; first might help in trying to follow [...]]]></description>
			<content:encoded><![CDATA[<p><small><em>Disclaimer: This post wasn&#8217;t planned. In fact, it&#8217;s a comment on &#8216;<a href="http://lensco.be/2010/02/01/end-of-an-era/">End of an era</a>&#8217;, a blog post by my good friend Lennart, that got a bit out of hand. So much so that it no longer qualified as a comment. Reading that post &#8211; and its references &#8211; first might help you follow what I&#8217;m on about.</em></small></p>
<p>I get the impression there&#8217;s a whole bunch of people hoping to bring <em>Flash&#8217;s coffin to the grave</em>, sooner rather than later. Here are the arguments I&#8217;m hearing:</p>
<ul>
<li>Closed technologies suck.</li>
<li>Flash websites suck.</li>
<li>Flash is for ads. <em>And ads suck, obviously.</em></li>
<li>The web has innovated. We don&#8217;t need Flash.</li>
</ul>
<p>All of this is true.<br />
In some way.<br />
But the conclusion that Lennart, and many other <em>evangelists</em> with him, draw is &#8211; in my humble opinion &#8211; wrong, stupid and ignorant.</p>
<h3 style="margin-top: 25px;">Closed technologies suck.</h3>
<p>The car you drive, the stereo you use, the snowboard you ride, hell, even the OS and phone the Apple fans use (that includes me!), are all &#8211; to some extent &#8211; closed technologies. This post is typed on a machine that <em>totally depends on one multinational&#8217;s benevolence</em>. iTunes is closed software, right? But I still totally love it as my day-to-day media player.<br />
I totally support open standards and open technologies &#8211; I make my living from them &#8211; but since when is that a synonym for thinking there&#8217;s no room, or even need (!), for closed technology?</p>
<h3 style="margin-top: 25px;">Flash websites suck.</h3>
<p>Last time Google indexed the web it probably found a gazillion <abbr title="Plain Old Semantic HTML">POSH</abbr> websites that are hard to navigate. Be it on the PC, iPhone or iPad; tiny click areas and unreadable color combinations aren&#8217;t an Adobe patent.<br />
Apple tries to control the quality of iPhone OS apps by whitelisting apps in the App Store, making sure awful apps don&#8217;t get distributed. I can understand that policy. But then why do they allow every single website, as sucky as it might be?</p>
<p>There&#8217;s both crap and good stuff for all platforms. Since when do we blame the platform itself?</p>
<h3 style="margin-top: 25px;">Flash is for ads. <em>And ads suck, obviously.</em></h3>
<p>Why is Flash used so much as an advertising technology? Because of its support and capabilities. From <a href="http://farukat.es/journal/2010/02/385-so-long-and-thanks-for-all-the-flash">Farukat.es</a>:</p>
<blockquote><p>
&#8220;Ever heard of an SVG blocker? A CSS3 blocker? They don’t exist because they’re not considered necessary; these technologies are open, but more importantly, they don’t get abused and they don&#8217;t create terrible user experiences.&#8221;
</p></blockquote>
<p>Is he joking? Not possible to create terrible user experiences with CSS3 and SVG?! What is annoying? Overlays that are hard to close, too much animation, ugly colors, autoplay of videos and sound, fake UI elements that don&#8217;t do what you expected them to do. Which one of those annoyances is only possible with Flash and not with CSS3 and SVG?</p>
<p>The only reason I see for the non-existence of SVG or CSS3 blockers is that ad agencies don&#8217;t use those technologies. Yet. And they don&#8217;t use &#8217;em because not every user&#8217;s browser supports them properly, in a performant way. Yet.</p>
<p>Ad agencies use annoying ads because they want to make money, just the way salesmen can use annoying techniques. Be it on the phone or at your doorstep. Be it with Flash or with open technologies.</p>
<h3 style="margin-top: 25px;">The web has innovated. We don&#8217;t need Flash.</h3>
<p>Oh, hallelujah, the awesome stuff that you can do with open standards! Gradients, masks, reflections, animations, transforms, transitions, speedier JavaScript, local storage; hooray!<br />
No really, I love all this, it gets me excited. I&#8217;m a web developer, and I&#8217;m now handed new toys to improve my projects. I love the web, I love open standards, and I love all these new possibilities and the direction it&#8217;s going in. I&#8217;ve even advocated against the use of Flash for several parts of projects I was involved in. Either because better alternatives are (now) available or because more people in the company I work for could then implement and maintain that piece of code.</p>
<p>But, let&#8217;s face it; most of this isn&#8217;t innovation, it&#8217;s catching up with what was already possible years ago. Yes, for some use cases Flash has lost its relevance and HTML5 now offers better alternatives. But don&#8217;t forget Flash has also helped innovate the web, and has brought new possibilities and new ideas to this platform. (On a side note: Adobe has even offered us &#8211; web developers &#8211; the ability to easily create desktop apps through Adobe AIR. While it sure isn&#8217;t the best platform to develop a Mac OS X app on (hello there, Cocoa developers!), I still think that&#8217;s a good thing. I&#8217;m using AIR apps on a daily basis.)</p>
<p>It is stupid to rule out Adobe&#8217;s engineers continuing to try to innovate the web. It is stupid to deny access to content that&#8217;s only available in Flash right now. <em>(You don&#8217;t get denied access to an HTML site that has more than 5 animated gifs, or &#8211; god forbid &#8211; tables-for-layout, either.)</em> It is stupid not to use the potential of all those smart Flash developers out there who do know a thing or two about usability, and not to let them experiment on those new devices to create Plain Old Good Apps. <em>(Disclaimer: I&#8217;m sitting in the room with three of them.)</em></p>
<p>I&#8217;m not saying Apple should support Flash on their devices. The iPhone and the iPad are theirs; if they decide not to support Flash, that&#8217;s their choice. (Closed technology, right?)<br />
In fact, personally I haven&#8217;t missed it all that much on my iPhone. And maybe Flash isn&#8217;t even ready for these platforms. Is it fast enough? Can the OS support everything Flash needs? I&#8217;d rather have a good implementation of Flash than a slow, buggy one.</p>
<p>I just don&#8217;t understand the shortsightedness of some of these evangelists, who seem to be on a crusade against the availability of Flash on those platforms, or on any platform in fact. Invest your time in something else. Create good apps, support standards, teach best practices, talk about guidelines, experiment with the new possibilities, but &#8211; hell no &#8211; don&#8217;t try to start a battle against platforms that aren&#8217;t yours. It makes you look grumpy. Really.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2010/02/02/re-end-of-an-era/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>No Worries!</title>
		<link>http://www.jurriaanpersyn.com/archives/2010/01/17/no-worries/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2010/01/17/no-worries/#comments</comments>
		<pubDate>Sun, 17 Jan 2010 11:18:16 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[australia]]></category>
		<category><![CDATA[new zealand]]></category>
		<category><![CDATA[noworries]]></category>
		<category><![CDATA[sabbatical]]></category>
		<category><![CDATA[trip]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=325</guid>
		<description><![CDATA[From the end of July this year, my girlfriend and I will be leaving for a trip to Oceania and Southeast Asia. The nine-month trip will mainly take us to Australia and New Zealand. If things become even quieter over here than they already were, you might want to head over to noworries.jurriaanpersyn.com [...]]]></description>
			<content:encoded><![CDATA[<p>From the end of July this year, my girlfriend and I will be leaving for a trip to Oceania and Southeast Asia. The nine-month trip will mainly take us to Australia and New Zealand. If things become even quieter over here than they already were, you might want to head over to <a href="http://noworries.jurriaanpersyn.com/">noworries.jurriaanpersyn.com</a> for posts about the upcoming trip and travel reports while we&#8217;re there <em>(in Dutch)</em>.</p>
<p>I&#8217;m very much looking forward to leaving with my girlfriend Laura on what will most definitely become one of the greatest adventures of our lives, and surely the biggest so far.</p>
<div style="text-align: center;"><img src="http://farm1.static.flickr.com/48/143762891_b4de356eb9.jpg" /><br /><em>Photo by <a href="http://www.flickr.com/photos/pierre_pouliquin/143762891/">Pierre Pouliquin</a></em></div>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2010/01/17/no-worries/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>California 2009</title>
		<link>http://www.jurriaanpersyn.com/archives/2009/11/12/california-2009/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2009/11/12/california-2009/#comments</comments>
		<pubDate>Thu, 12 Nov 2009 10:34:23 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[life]]></category>
		<category><![CDATA[2009]]></category>
		<category><![CDATA[california]]></category>
		<category><![CDATA[death valley national park]]></category>
		<category><![CDATA[joshua tree national park]]></category>
		<category><![CDATA[jurriaan]]></category>
		<category><![CDATA[laura]]></category>
		<category><![CDATA[los angeles]]></category>
		<category><![CDATA[san francisco]]></category>
		<category><![CDATA[southwest usa]]></category>
		<category><![CDATA[vacation]]></category>
		<category><![CDATA[yosemite national park]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=302</guid>
		<description><![CDATA[Travel report and photos from our vacation in California. Summer 2009.



California 2009 from Jurriaan Persyn on Flickr.
]]></description>
			<content:encoded><![CDATA[<p>Travel report and photos from our vacation in California. Summer 2009.</p>
<p><object width="710" height="400"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=7519819&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=7519819&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="710" height="400"></embed></object></p>
<p><object width="710" height="400"><param name="allowfullscreen" value="true" /><param name="allowscriptaccess" value="always" /><param name="movie" value="http://vimeo.com/moogaloop.swf?clip_id=7533016&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" /><embed src="http://vimeo.com/moogaloop.swf?clip_id=7533016&amp;server=vimeo.com&amp;show_title=1&amp;show_byline=1&amp;show_portrait=0&amp;color=00adef&amp;fullscreen=1" type="application/x-shockwave-flash" allowfullscreen="true" allowscriptaccess="always" width="710" height="400"></embed></object></p>
<p><object width="710" height="532"><param name="flashvars" value="offsite=true&#038;lang=en-us&#038;page_show_url=%2Fphotos%2Foemebamo%2Fsets%2F72157622024019363%2Fshow%2F&#038;page_show_back_url=%2Fphotos%2Foemebamo%2Fsets%2F72157622024019363%2F&#038;set_id=72157622024019363&#038;jump_to="></param><param name="movie" value="http://www.flickr.com/apps/slideshow/show.swf?v=71649"></param><param name="allowFullScreen" value="true"></param><embed type="application/x-shockwave-flash" src="http://www.flickr.com/apps/slideshow/show.swf?v=71649" allowFullScreen="true" flashvars="offsite=true&#038;lang=en-us&#038;page_show_url=%2Fphotos%2Foemebamo%2Fsets%2F72157622024019363%2Fshow%2F&#038;page_show_back_url=%2Fphotos%2Foemebamo%2Fsets%2F72157622024019363%2F&#038;set_id=72157622024019363&#038;jump_to=" width="710" height="532"></embed></object></p>
<p><a href="http://www.flickr.com/photos/oemebamo/sets/72157622024019363/show/">California 2009</a> from <a href="http://www.flickr.com/photos/oemebamo">Jurriaan Persyn</a> on <a href="http://www.flickr.com/">Flickr</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2009/11/12/california-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Netlog Developer Day, April 2nd, Brussels</title>
		<link>http://www.jurriaanpersyn.com/archives/2009/03/05/netlog-developer-day-april-2nd-brussels/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2009/03/05/netlog-developer-day-april-2nd-brussels/#comments</comments>
		<pubDate>Thu, 05 Mar 2009 16:02:40 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[tech]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[api]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[developer]]></category>
		<category><![CDATA[gaming]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[monetization]]></category>
		<category><![CDATA[netlog]]></category>
		<category><![CDATA[opensocial]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[platform]]></category>
		<category><![CDATA[scalability]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=283</guid>
		<description><![CDATA[A month from now, the first Netlog Developer Day will be held. At this one-day, free-admission event we&#8217;ll be discussing, among other things, the OpenSocial API, the gaming platform and monetization possibilities on Netlog.

	

OpenSocial defines a common API for social applications across multiple websites. With standard JavaScript and HTML, developers can create apps that access a [...]]]></description>
			<content:encoded><![CDATA[<p>A month from now, the first Netlog Developer Day will be held. At this one-day, free-admission event we&#8217;ll be discussing, among other things, the <a href="http://en.netlog.com/go/developer/opensocial">OpenSocial API</a>, the <a href="http://en.netlog.com/go/about/blog/blogid=2978346">gaming platform</a> and <a href="http://en.netlog.com/go/developer/documentation/article=opensocialcreditsextension">monetization possibilities</a> on Netlog.</p>
<div style="text-align: center; padding: 10px;">
	<a href="http://nl.netlog.com/go/about/amiando=417031" style="border: 0; background: none; padding: 0; margin: 0;"><img src="http://v.netlogstatic.com/v5.00/1365//s/i/misc/about/amiando_bbday.png" style="border: 0;" /></a>
</div>
<p><a href="http://code.google.com/apis/opensocial/">OpenSocial</a> defines a common API for social applications across multiple websites. With standard JavaScript and HTML, developers can create apps that access a social network&#8217;s friends and update feeds.<br />
Netlog is one of the social networks supporting this standard. So, if you&#8217;re interested in developing social applications and <em>while-doing-that</em> would like to learn a little more about Netlog, this event is for you. Topics will also include performance and scaling, giving you an insight into how a site like Netlog is built.</p>
<p>In the afternoon there will be a codelab, where Google and Netlog developers will help you experiment with the available APIs.<br />
In parallel with the codelab, there will also be a &#8220;Brand Integration Day&#8221;, where creative agencies and media buyers are invited to learn all about leveraging Netlog&#8217;s Brand Integration Platform, and key customers will share their experience with Netlog.</p>
<p>A lot has happened during the past months, and we continue to work on improving our platform – for users and partners alike. We’d be thrilled to see you on April 2 to learn what we’ve done and share some of your thoughts.</p>
<p>The Developer Day kicks off at 9 am on April 2nd at Kinepolis in Brussels. Interested? <a href="http://nl.netlog.com/go/about/amiando=417031"><strong>Register here</strong></a>.</p>
<p>Keep an eye on the <a href="http://en.netlog.com/go/developer/blog/blogid=3125097#blog">Netlog Developer blog</a> for more info.</p>
<p><br style="clear: both;" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2009/03/05/netlog-developer-day-april-2nd-brussels/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Database Sharding at Netlog, with MySQL and PHP</title>
		<link>http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/</link>
		<comments>http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#comments</comments>
		<pubDate>Thu, 12 Feb 2009 13:03:02 +0000</pubDate>
		<dc:creator>oemebamo</dc:creator>
				<category><![CDATA[tech]]></category>
		<category><![CDATA[work]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[database]]></category>
		<category><![CDATA[federation]]></category>
		<category><![CDATA[high performance]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[netlog]]></category>
		<category><![CDATA[parallel processing]]></category>
		<category><![CDATA[partitioning]]></category>
		<category><![CDATA[php]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[sharding]]></category>
		<category><![CDATA[sphinx]]></category>

		<guid isPermaLink="false">http://www.jurriaanpersyn.com/?p=274</guid>
		<description><![CDATA[
This article accompanies the slides from a presentation on database sharding. Sharding is a technique used for horizontal scaling of databases we are using at Netlog. If you're interested in high performance, scalability, MySQL, php, caching, partitioning, [...]]]></description>
			<content:encoded><![CDATA[<style>
.slide {
width: 500px; 
margin: 0 auto;
margin-bottom: 10px;
}
.slide p {
margin: 0; 
display: block; 
padding: 5px; 
color: grey; 
text-align: right;
font-style: italic;
}
</style>
<p>This article accompanies the <a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#slides">slides</a> from a presentation on database sharding. Sharding is a technique for scaling databases horizontally, and one we use at Netlog. If you're interested in high performance, scalability, MySQL, PHP, caching, partitioning, Sphinx, federation or Netlog, read on ...</p>
<p>This presentation was given on the second day of <a href="http://www.fosdem.org/2009/">FOSDEM 2009</a> in Brussels. FOSDEM is an annual conference on open source software that attracts about <em>5000 hackers</em>. I was invited by <a href="http://www.krisbuytaert.be/blog/">Kris Buytaert</a> and <a href="http://friendfeed.com/lenzgr">Lenz Grimmer</a> to give a talk in the <a href="http://forge.mysql.com/wiki/FOSDEM_2009">MySQL Dev Room</a>. The talk was based on an <a href="http://www.slideshare.net/oemebamo/database-sharding-at-netlog-presentation">earlier talk</a> I gave at <a href="http://barcampgent2.wikispaces.com/">BarcampGent 2</a>.</p>
<h3>Overview</h3>
<ul>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#whoami">Who am I?</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#aboutnetlog">What is Netlog?</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#history">A history of scaling database systems</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#hittinglimits">Hitting limits</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#shardingbasics">Sharding basics</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#shardingschemes">Sharding schemes</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#implications">Implications</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#solutions">Existing solutions</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#implementation">Implementation</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#tacklingproblems">Tackling the problems</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#finalthoughts">Final thoughts</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#slides">Slides</a></li>
<li><a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#resources">Resources</a></li>
</ul>
<p><span id="more-274"></span></p>
<h3 id="whoami">Who am I?</h3>
<p>Currently I am a Lead Web Developer at Netlog, working with PHP, MySQL and frontend technologies to develop and improve the features of our social network. I've been doing this for 3 years now. For this paper it is important to mention that I am neither a DBA nor a sys-admin, so I approach the problem of scaling databases from an application / developer point of view.<br />Of course, the solutions presented here are the result of a lot of effort from the Development and IT Services Department at Netlog.</p>
<h3 id="aboutnetlog">What is Netlog?</h3>
<p>For those of you who are unfamiliar with Netlog, here is a short overview of who and what we are, and especially where we come from in terms of userbase and growth. It puts the scalability story in perspective. At the moment we have over 40 million active members, resulting in over 50 million unique visitors per month. This adds up to 5+ billion page views per month and 6 billion online minutes per month. We're active in 26 languages and 30+ countries, with our 5 most active countries being Italy, Belgium, Turkey, Switzerland and Germany. <em>(If you're interested in more info about the company, check our <a href="http://netlog.com/go/about">About Pages</a> and <a href="http://netlog.com/go/register">sign up for an account</a>.)</em></p>
<div style="text-align: center;"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/europe.png" /><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/africa-asia.png" /></div>
<p>In terms of database statistics, this type of usage results in, among other things, huge amounts of data to store (eg. 100+ million friendships for nl.netlog.com). The nature of our application (lots of interaction) results in a very write-heavy app (with a read-write ratio of about 1.4 to 1). A typical database, before sharding, handled an average of 3000+ queries per second during peak time (15h - 22h local time, for nl.netlog.com).<br />Of course, not every application has to meet these requirements, and different applications require different scaling strategies. Nevertheless, we wouldn't have thought (or hoped) to be where we are today when we started off 7 years ago as a college student project. We are convinced that we can give you further insight into scalability and share some valuable suggestions.<br />Below is a graph of our growth in the last year.</p>
<div style="text-align: center;"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/growthstats.png" style="width: 100%;" /></div>
<p>This growth has of course resulted in several performance issues. The bottleneck for us has often been the database layer, because this layer is the only layer in the web stack that isn't stateless. The interactions and dependencies in a relational database system make scaling horizontally less straightforward.</p>
<p>Netlog is (being) built and runs on open source software such as <a href="http://www.php.net">PHP</a>, <a href="http://www.mysql.com">MySQL</a>, <a href="http://www.apache.org">Apache</a>, <a href="http://www.debian.org">Debian</a>, <a href="http://www.danga.com/memcached/">Memcached</a>, <a href="http://www.sphinxsearch.com/">Sphinx</a>, <a href="http://www.lighttpd.net/">Lighttpd</a>, <a href="http://www.squid-cache.org/">Squid</a>, and many more. Our solutions for scaling databases are also built on these technologies. That's why we want to give something back by documenting and sharing our story.</p>
<h3 id="history">A history of scaling database systems</h3>
<p>Like every hobby project, Netlog (then <a href="http://web.archive.org/web/20001018021036/http://asl.to/">asl.to</a>, "your internet passport") started off, more than 7 years ago, with a single database instance on a - probably virtual - server in a shared hosting environment. As traffic grew and load increased, we moved to a separate server, and eventually to a split setup for MySQL and PHP (<a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#databasesetup1">database setup 1</a>).</p>
<div id="databasesetup1" class="slide"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/master.png" />
<p>Database Setup 1: Master (W)</p>
</div>
<p>The next step was introducing new databases configured as "slaves" of the "master" database. Because a single MySQL server couldn't serve all the requests from our application, we distributed the read and write traffic to separate hosts. Setting up a slave is pretty easy through MySQL's replication features. What happens in a master-slave configuration is that you direct all write-queries (INSERT/UPDATE/DELETE) to the master database and all (or most) read queries to one or more slave databases. Slave databases are typically kept in sync with the master by reading the binlog files of the master and replaying all write-queries (<a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#databasesetup2">database setup 2</a>).<br />Problems to tackle for this set-up include increased complexity for your DBA-team (that needs to monitor multiple servers), and the possibility of "replication lag": your slaves might get out-of-sync with the master database (because of locking read-queries, downtime, inferior hardware, etc.), resulting in out-of-date results being returned when querying the slave databases.<br />Real-time results aren't required in every situation, but you'll have situations where you have to force some read-queries to your master database to ensure data integrity. Otherwise you will end up with the painful consequences of (possible) race conditions.</p>
<div id="databasesetup2" class="slide">
<img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/master-slave.png" />
<p>Database Setup 2: Master (W) + Slaves (R)</p>
</div>
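<p>The read/write split described above could be sketched roughly as follows. The class name <code>ReplicatedRouter</code>, the host names and the <code>$forceMaster</code> flag are hypothetical and not Netlog's actual database class; the sketch only decides which host a query should go to and opens no connection.</p>
<pre><code class="php">&lt;?php
declare(strict_types=1);

/**
 * Hypothetical sketch: picks a host for a query in a master-slave set-up.
 * Host names are placeholders; no connection is actually opened.
 */
final class ReplicatedRouter
{
    /** @var string */
    private $master;
    /** @var string[] */
    private $slaves;
    /** @var int */
    private $next = 0;

    public function __construct(string $master, array $slaves)
    {
        $this-&gt;master = $master;
        $this-&gt;slaves = array_values($slaves);
    }

    public function hostFor(string $sql, bool $forceMaster = false): string
    {
        // Anything that is not a plain SELECT counts as a write here.
        $isRead = preg_match('/^\s*SELECT\b/i', $sql) === 1;

        // Writes, and reads that cannot tolerate replication lag, go to the master.
        if (!$isRead || $forceMaster || $this-&gt;slaves === []) {
            return $this-&gt;master;
        }

        // Spread the remaining reads over the slaves, round-robin.
        $host = $this-&gt;slaves[$this-&gt;next % count($this-&gt;slaves)];
        $this-&gt;next++;

        return $host;
    }
}

$router = new ReplicatedRouter('db-master', ['db-slave-1', 'db-slave-2']);
echo $router-&gt;hostFor('SELECT nickname FROM USERS WHERE userid = 26'), "\n";       // db-slave-1
echo $router-&gt;hostFor('UPDATE USERS SET nickname = "x" WHERE userid = 26'), "\n";  // db-master
echo $router-&gt;hostFor('SELECT nickname FROM USERS WHERE userid = 26', true), "\n"; // db-master
</code></pre>
<p>The third call shows the "force this read to the master" escape hatch for queries that must not see stale data.</p>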
<p>A good idea for the master-slave set-up is to introduce roles for your slaves. Typically you might assign all search, backend and/or statistics related queries to a "search-slave", where you don't care that much about replication lag, since real-time results are seldom required for those kinds of use cases.</p>
<p>This system works especially well for read-heavy applications. Say you've got a server load of 100% and a read/write ratio of 4/1: your master server will be executing SELECT-queries 80% of the time. If you add a single slave to this set-up, the SELECT capacity doubles and you can handle twice the amount of SELECT-queries.<br />But in a write-heavy application, or a situation where your master database is executing write-queries 90% of the time, you'll only win another 10% capacity by adding another slave, since your slaves will be busy syncing with their master for about 90% of the time. The problem here is that you're only distributing read traffic and no write traffic. In fact you're replicating the write traffic. So the efficiency of a master-slave setup is limited, and you end up with lots of identical copies of your data.</p>
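<p>The arithmetic behind these two scenarios can be made explicit with a small helper. This is a deliberately simplified model that assumes every server replays all writes and spends the rest of its time on reads; the function name <code>readCapacity</code> is made up for this illustration.</p>
<pre><code class="php">&lt;?php
declare(strict_types=1);

/**
 * Simplified model: every server (master or slave) spends $writeShare of its
 * time applying writes, so only the remainder is available for SELECT-queries.
 * Returns the total read capacity, expressed in "servers' worth" of time.
 */
function readCapacity(float $writeShare, int $servers): float
{
    return $servers * (1.0 - $writeShare);
}

// Read-heavy (4/1 read/write ratio, so 20% writes): a slave doubles read capacity.
printf("%.1f -&gt; %.1f\n", readCapacity(0.2, 1), readCapacity(0.2, 2)); // 0.8 -&gt; 1.6

// Write-heavy (90% writes): a slave only adds another 10%.
printf("%.1f -&gt; %.1f\n", readCapacity(0.9, 1), readCapacity(0.9, 2)); // 0.1 -&gt; 0.2
</code></pre>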
<p>At this point you'll have to start thinking about distributing write traffic. The more heavily your application relies on write traffic, the sooner you'll have to deal with this. A simple and straightforward first step is to start partitioning your application on feature-level. This process is called vertical partitioning.<br /> In your application you identify features (and by that MySQL tables) that can more or less exist on separate servers. If you have tables that are unrelated and don't require JOINs, why not put them on separate servers? For Netlog we have been able to put most of the tables containing details about the items of a user (eg. photos, blogs, videos, polls, ...) on separate servers. By replicating some important tables (eg. a table with userids, nicknames, etc.) to all separate partitions, you can still access and JOIN with those tables if you need to.<br /> In <a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#databasesetup3">database setup 3</a>, you see an example where we don't bother our master database anymore for friends or messages related queries. The write and read queries for these features go directly to the database responsible for that feature. These feature-specific hosts are still configured as slaves of the "TOP" master database, because that way we can replicate a few of those really important tables.<br /> A good idea here is to split up the tables for <a href="http://en.wikipedia.org/wiki/OLAP">OLAP</a> use cases (data warehouses) from <a href="http://en.wikipedia.org/wiki/OLTP">OLTP</a> use cases (front-end, real-time features), since these require a different approach and have different needs regarding speed and uptime anyway.</p>
<div id="databasesetup3" class="slide"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/verticalpartitioning.png" />
<p>Database Setup 3: Vertical Partitioning</p>
</div>
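<p>In code, vertical partitioning can boil down to a lookup from table name to host. The table and host names below are placeholders chosen for this sketch, not Netlog's real configuration.</p>
<pre><code class="php">&lt;?php
declare(strict_types=1);

/**
 * Hypothetical table-to-host map for vertical partitioning.
 * Table and host names are placeholders chosen for this sketch.
 */
const PARTITIONS = [
    'FRIENDS'  =&gt; 'db-friends',
    'MESSAGES' =&gt; 'db-messages',
    'PHOTOS'   =&gt; 'db-photos',
];

// Tables that are not partitioned out stay on the "TOP" master.
const TOP_MASTER = 'db-top';

function hostForTable(string $table): string
{
    return PARTITIONS[strtoupper($table)] ?? TOP_MASTER;
}

echo hostForTable('friends'), "\n"; // db-friends
echo hostForTable('USERS'), "\n";   // db-top
</code></pre>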
<p>What we did in setup 2 can easily be repeated for each of the vertically partitioned features. If any of your databases has trouble keeping up with the traffic requirements, configure a slave for that database and distribute the read and write traffic. This way you create a tree of databases that replicates some tables through the whole system, plus a database class responsible for distributing the right queries to the right databases (<a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#databasesetup4">database setup 4</a>).</p>
<div id="databasesetup4" class="slide"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/verticalpartitioning-slave.png" />
<p>Database Setup 4: Vertical Partitioning / Replication Tree</p>
</div>
<p>If necessary, you might dive deeper into your application and find more features to partition. Unfortunately this will become harder and harder, because with every feature you split up, you again lose some JOIN-functionality you might want or need. And, sometimes, you're even stuck with a single table that grows beyond what a single database host can easily manage. The first feature to hit this single-table-on-a-single-database limit was a table with friendships between our users. This table grew so rapidly that the performance and uptime of the host responsible for this feature weren't guaranteed anymore, no matter how many slaves we added to it. Of course you can always choose to scale up, instead of scaling out, by buying boxes with an incredibly insane hardware setup, but apart from being nice to have, they're expensive and they'll still hit limits if you continue growing.<br /> This approach to scaling has a limit (<a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#databasesetup5">database setup 5</a>) and if you hit that limit, you have to rethink your scaling strategy.</p>
<div id="databasesetup5" class="slide"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/verticalpartitioning-overload.png" />
<p>Database Setup 5: Hitting Limits</p>
</div>
<h3 id="hittinglimits">So, what's next?</h3>
<p>What could we do now? Vertical partitioning has helped us a great deal, but we are stuck. Does master-to-master replication help? Will a cluster set-up help? Not really; these systems are designed for high availability and high performance. They work by replicating data and don't offer solutions for distributing write traffic.<br /> What about caching? Oh, how can we forget about caching! Of course, caching will help a great deal in lowering the load on your database servers. (The read/write-ratio mentioned earlier would be completely different if we did no caching.) But the same problem remains: caching will lower the read traffic on your databases, but doesn't offer a solution for write traffic. Caching will delay the moment your database is only returning "1040 Too many connections" errors, but no matter how good your caching strategy is, it can't prevent your visitor metrics going nuts at some point.</p>
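<p>A minimal cache-aside sketch makes the point about caching concrete. A plain array stands in for Memcached and two counters stand in for the database, so the class and its names are purely illustrative.</p>
<pre><code class="php">&lt;?php
declare(strict_types=1);

/**
 * Cache-aside sketch. A plain array stands in for Memcached and two counters
 * stand in for the database, so the example runs without any server.
 */
final class CachedProfileStore
{
    /** @var array&lt;int, string&gt; */
    private $cache = [];
    /** @var array&lt;int, string&gt; */
    private $database = [];
    /** @var int */
    public $dbReads = 0;
    /** @var int */
    public $dbWrites = 0;

    public function getNickname(int $userID): ?string
    {
        if (array_key_exists($userID, $this-&gt;cache)) {
            return $this-&gt;cache[$userID]; // cache hit: no database traffic
        }
        $this-&gt;dbReads++;
        $nickname = $this-&gt;database[$userID] ?? null;
        if ($nickname !== null) {
            $this-&gt;cache[$userID] = $nickname;
        }
        return $nickname;
    }

    public function setNickname(int $userID, string $nickname): void
    {
        $this-&gt;dbWrites++;                     // every write still hits the database
        $this-&gt;database[$userID] = $nickname;
        unset($this-&gt;cache[$userID]);          // invalidate the stale cache entry
    }
}

$store = new CachedProfileStore();
$store-&gt;setNickname(26, 'oemebamo');
$store-&gt;getNickname(26);
$store-&gt;getNickname(26);
$store-&gt;setNickname(26, 'jurriaan');
$store-&gt;getNickname(26);
$store-&gt;getNickname(26);
echo "reads: {$store-&gt;dbReads}, writes: {$store-&gt;dbWrites}\n"; // reads: 2, writes: 2
</code></pre>
<p>Four lookups cost only two database reads, but both writes reach the database regardless of the cache.</p>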
<h3 id="shardingbasics">The Holy Grail!</h3>
<p>You can't split a table vertically, but can you easily split it horizontally? Sharding a table means putting several groups of its records in separate places (physically or not). You cut your data into arbitrarily sized pieces / fragments / shards and distribute them over several database hosts. Instead of putting all 100+ million friendship records on 1 big expensive machine, put 10 million friendships on each of 10 smaller and cheaper machines.</p>
<div id="theholygrail" class="slide"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/theholygrail.png" /></div>
<p>Sharding, or horizontal partitioning, is a term that was already in active use in 1996 in the MMO (Massively Multiplayer Online) games world. If you're searching for info on sharding, you'll see it's a technique used by, among others, <a href="http://www.scribd.com/doc/35222/Flickr-Architecture-Presentation">Flickr</a>, <a href="http://highscalability.com/livejournal-architecture">LiveJournal</a>, <a href="http://www.slideshare.net/guest0e6d5e/sharding-architectures">Sun</a> and Netlog.</p>
<div id="shardingfederationmodulo" class="slide"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/sharding-federation-modulo.png" />
<p>Sharding a photos table over 10 servers with a modulo partitioning scheme</p>
</div>
<p>In the image above you see an example of splitting up a photos-table over 10 different servers. The algorithm that decides where your data goes, and where you can access it, is for example a modulo function on the userid of the owner of that photo. If you know the owner of a photo, you then know where to fetch the photo's other details, fetch its comments, etc.<br /> Let's have a look at another simple example.</p>
<ul>
<li>Use case: a simple blog site.</li>
<li>We've got a table with blog posts, with these columns: postid, title, message, dateadd, authorid
</li>
<li>authorid is a FK (foreign key) to a users table</li>
<li>We shard the blog posts table (because our authors have been very productive writers) over 2 databases.</li>
<li>Posts from authors with an even authorid go to database 1.</li>
<li>Posts from authors with an odd authorid go to database 2.</li>
<li>Query: "Give me the blog messages from the author with id 26."</li>
</ul>
<p>In a non-sharded environment, somewhere in your application, you'd find code that looks like this:</p>
<div class="syntax_hilite"><span class="langName">PHP:</span>
<div id="php-4">
<div class="php">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$db</span> = DB::<span style="color:#006600;">getInstance</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#FF9933; font-style:italic;">// fetch a database instance</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$db</span>-&gt;<span style="color:#006600;">prepare</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#FF0000;">"SELECT title, message FROM BLOG_MESSAGES WHERE authorid = {userID}"</span><span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#FF9933; font-style:italic;">// prepare a query</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$db</span>-&gt;<span style="color:#006600;">assignInt</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#FF0000;">'userID'</span>, <span style="color:#0000FF;">$userID</span><span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#FF9933; font-style:italic;">// assign query variables</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$db</span>-&gt;<span style="color:#006600;">execute</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#FF9933; font-style:italic;">// execute the query</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$results</span> = <span style="color:#0000FF;">$db</span>-&gt;<span style="color:#006600;">getResults</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#FF9933; font-style:italic;">// fetch an array of results </span></div>
</li>
</ol>
</div>
</div>
</div>
<p>In this example we first fetch an instance of our database class that connects to our database. We then prepare a query, assign the variables (here the id of the author, $userID), execute the query and fetch the resultset. If we introduce sharding based on the author's $userID, the database on which we need to execute this query depends on that $userID (whether or not it is an even number). One way to handle this is to put the logic of "which user is on which database" into our database class and pass that $userID to the class. You could end up with something like this: you pass the $userID to the DB::getInstance() function, which then returns an object with the connection details based on the result of $userID % 2:</p>
<div class="syntax_hilite"><span class="langName">PHP:</span>
<div id="php-5">
<div class="php">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$db</span> = DB::<span style="color:#006600;">getInstance</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#0000FF;">$userID</span><span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#FF9933; font-style:italic;">// fetch a database instance, specific for this user</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$db</span>-&gt;<span style="color:#006600;">prepare</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#FF0000;">"SELECT title, message FROM BLOG_MESSAGES WHERE authorid = {userID}"</span><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$db</span>-&gt;<span style="color:#006600;">assignInt</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#FF0000;">'userID'</span>, <span style="color:#0000FF;">$userID</span><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$db</span>-&gt;<span style="color:#006600;">execute</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>; </div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$results</span> = <span style="color:#0000FF;">$db</span>-&gt;<span style="color:#006600;">getResults</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>; </div>
</li>
</ol>
</div>
</div>
</div>
<p>Instead of passing the $userID as a parameter to your DB-class, you could try to parse it from the prepared query you supply to your class, or you could calculate which DB connection you need at a different level, but the key concept remains the same: you need to pass some extra information to your database class so it knows where to execute the query. That is one of the most challenging requirements that has to be met for successful sharding.</p>
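<p>A minimal sketch of what such a <code>DB::getInstance($userID)</code> could look like for the two-database blog example. The host names are placeholders and the class only resolves connection details (it opens no connection), so treat it as an illustration under those assumptions rather than Netlog's actual database class.</p>
<pre><code class="php">&lt;?php
declare(strict_types=1);

/**
 * Hypothetical sharding-aware DB class for the blog example:
 * even author ids live on database 1, odd author ids on database 2.
 */
final class DB
{
    // Index = $userID % 2. Host names are placeholders.
    private const HOSTS = [
        0 =&gt; 'blogdb1.example.com', // even ids
        1 =&gt; 'blogdb2.example.com', // odd ids
    ];

    /** @var array&lt;int, DB&gt; one instance per shard */
    private static $instances = [];

    /** @var string */
    public $host;

    private function __construct(string $host)
    {
        $this-&gt;host = $host;
    }

    public static function getInstance(int $userID): DB
    {
        if ($userID &lt; 0) {
            throw new InvalidArgumentException('User ids are expected to be non-negative.');
        }

        $shard = $userID % 2;
        if (!isset(self::$instances[$shard])) {
            self::$instances[$shard] = new DB(self::HOSTS[$shard]);
        }

        return self::$instances[$shard];
    }
}

echo DB::getInstance(26)-&gt;host, "\n"; // blogdb1.example.com
echo DB::getInstance(7)-&gt;host, "\n";  // blogdb2.example.com
</code></pre>
<p>Keeping one instance per shard means two authors on the same shard share a single object, and in a real class a single connection.</p>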
<h3 id="shardingschemes">How to shard your data?</h3>
<p>When you want to split up your data two questions spring to mind: which property of the data (which column of the table) will I use to make the decisions on where the data should go? And what will the algorithm be? Let's call the first one the "sharding/partitioning key", and the second one the "sharding/partitioning scheme".</p>
<p>Which sharding key will be used is basically a decision that depends on the nature of your application, or the way you'll want to access your data. In the blog example, if you display overviews of blog messages per author, it's a good idea to shard on the author's $userID. If your site's navigation is through archives per month or per category, it might be smarter to shard on publication date or $categoryID. <em>(If your application requires both approaches it might even be a good idea to set up a dual system with sharding on both keys.)</em></p>
<p>What you can do with the "shard key" to find its corresponding shard basically falls into 4 categories: </p>
<ul>
<li><strong>Vertical Partitioning</strong>: Splitting up your data on feature/table level can be seen as a kind of sharding, where the "shard key" is eg. the table name. As mentioned earlier, this way of sharding is pretty straightforward to implement and has a relatively low impact on the application as a whole.</li>
<li><strong>Range-based Partitioning</strong>: In range-based partitioning you split up your data according to several ranges. Blog posts from 2000 and earlier go to database 1, blog posts from the new millennium go to the other database. This approach is typical for logging or other time-based data. Other examples of range-based partitioning could include federating users according to the first digit of their postal code.</li>
<li><strong>Key or Hash based Partitioning</strong>: The modulo-function used in the photos example is a way of partitioning your data based on hashing or other mathematical functions of the key. In the simple example of a modulo function you can use your number of shards for the modulo-operation. Of course, changing your number of shards would mean rebalancing your data. This might be a slow process. A way to solve this is to use a consistent hashing mechanism, or to choose the initial number of shards well and work with "virtual shards".</li>
<li><strong>Directory based Partitioning</strong>: The last and most flexible scheme is where you have a directory lookup for each of the possible values of your shard key, mapped to a certain shard's id. This makes it possible to move all data from a certain shard key (eg. a certain user) from shard to shard, by altering the directory. A directory could on the other hand introduce more overhead or be a SPOF (Single Point Of Failure).</li>
</ul>
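<p>Three of the schemes above can be sketched as plain functions. The shard numbers, the cut-off year and the directory contents are assumptions made for this illustration.</p>
<pre><code class="php">&lt;?php
declare(strict_types=1);

/** Range-based: posts from 2000 and earlier on shard 1, later posts on shard 2. */
function shardByRange(int $year): int
{
    return $year &lt;= 2000 ? 1 : 2;
}

/** Key/hash-based: a modulo on the (non-negative) user id, shards numbered from 0. */
function shardByModulo(int $userID, int $shardCount): int
{
    if ($userID &lt; 0 || $shardCount &lt; 1) {
        throw new InvalidArgumentException('Expected a non-negative id and at least one shard.');
    }
    return $userID % $shardCount;
}

/**
 * Directory-based: an explicit lookup from shard key to shard id.
 * In a real system this lookup would live in a database or cache.
 *
 * @param array&lt;int, int&gt; $directory userID =&gt; shard id
 */
function shardByDirectory(int $userID, array $directory): int
{
    if (!isset($directory[$userID])) {
        throw new OutOfBoundsException("No shard registered for user $userID.");
    }
    return $directory[$userID];
}

$directory = [26 =&gt; 3, 27 =&gt; 1];

echo shardByRange(1999), "\n";               // 1
echo shardByModulo(26, 10), "\n";            // 6
echo shardByDirectory(26, $directory), "\n"; // 3

// Moving user 26 to another shard only requires a directory update
// (after the user's data has been copied over).
$directory[26] = 5;
echo shardByDirectory(26, $directory), "\n"; // 5
</code></pre>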
<p>As shown in the blog example, you need to know your "shard key" before you can actually execute your query on the right database server. It means that the nature of your queries and application determines the way of partitioning your data. The demanded flexibility, the projected growth and the nature of your data will be other factors helping you decide on what scheme to use.<br /> You also want to choose your keys and scheme so the data is optimally balanced over the databases and the load on each of the servers in the pool is equal.</p>
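<p>One way to keep room for growth is the "virtual shards" idea mentioned under key-based partitioning: hash keys into a fixed, generous number of virtual shards and keep a small map from virtual shard to physical host. The count of 16 and the host names below are assumptions for this sketch.</p>
<pre><code class="php">&lt;?php
declare(strict_types=1);

// Fixed for the lifetime of the system; chosen generously up front (assumption).
const VIRTUAL_SHARDS = 16;

/**
 * Spread the virtual shards evenly over the given hosts.
 *
 * @param string[] $hosts
 * @return array&lt;int, string&gt; virtual shard =&gt; host
 */
function assignVirtualShards(array $hosts): array
{
    $hosts = array_values($hosts);
    $map = [];
    for ($v = 0; $v &lt; VIRTUAL_SHARDS; $v++) {
        $map[$v] = $hosts[$v % count($hosts)];
    }
    return $map;
}

/**
 * Assumes a non-negative user id.
 *
 * @param array&lt;int, string&gt; $map
 */
function hostForUser(int $userID, array $map): string
{
    // The key-to-virtual-shard step never changes, only the map does.
    return $map[$userID % VIRTUAL_SHARDS];
}

$twoHosts  = assignVirtualShards(['db1', 'db2']);
$fourHosts = assignVirtualShards(['db1', 'db2', 'db3', 'db4']);

echo hostForUser(26, $twoHosts), "\n";  // db1
echo hostForUser(26, $fourHosts), "\n"; // db3
</code></pre>
<p>Growing from two to four hosts then means moving whole virtual shards to new machines and updating the map, instead of rehashing every key.</p>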
<p>The end result of sharding your data should be that you have distributed write-queries to different independent databases, and that you end up with a system of more, but cheaper, machines that each hold a smaller amount of the data and thus can process queries faster on smaller tables.<br /> If you succeed, you're online again. Users appreciate it and your DBA is happy, because each of the machines in the setup now has less load and crashes less, so there is no tossing and turning through the nights. <em>(Smaller tables mean faster queries, and that includes maintenance or ALTER-queries, which again helps in keeping your DBA and developers happy.)</em></p>
<h3 id="implications">If there's a Holy Grail, there's a Killer Rabbit</h3>
<div id="killerrabbit" class="slide" style="margin-top: 15px;"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/killerrabbit.png" />
<p>Photo from <a href="http://www.flickr.com/photos/kt">The Rocketeer</a>. (Creative Commons Licensed)</p>
</div>
<p>Of course, sharding isn't the silver bullet of horizontal database scaling that will easily solve all your problems. Introducing sharding in your application comes with a significant cost of development. Here are some of its implications:</p>
<ul>
<li><strong>No cross-shard SQL queries</strong>: If you ever want to fetch data that (possibly) resides on different shards, you won't be able to do this with a JOIN on SQL-level. If you shard on $userID, a JOIN with data from the same user is possible. However, once you fetch results for several users from a single shard, the resultset will probably be incomplete. The key here is to design your application so there's no need for cross-shard queries. Other solutions could be the introduction of parallel querying on application level, but then of course you lose the aspect of distributing your database traffic. Depending on the use case, this could be a problem or not (eg. parallel querying for backend purposes is not as crazy as it may sound).<br /> Other options could be to denormalize your data and make some of the needed info available in several tables on several shards. You could duplicate the nickname of the author of a comment in the comments table to avoid having to do an extra query for that nickname. (The shard where you fetch the comment from might be different from the shard where you'll find the nickname of the author.)<br /> If you have a table with guestbook messages that you want to shard, but need to fetch lists of messages both by guestbook owner userid and by message poster userid, you could denormalize it by putting your messages (or references) on both the owner's and the poster's shard.</li>
<li><strong>Data consistency and referential integrity</strong>: Since data from the same "table" resides on several stand-alone database servers, it becomes impossible to enforce foreign keys, generate globally unique auto_increment values or execute cross-shard transactions. This means you have to deal with enforcing integrity on the application level, and you might eventually end up spending a significant amount of your development time on check and fix routines.<br /> One way to reduce the integrity problems is to fake transactions across databases by starting a database transaction on two servers and only committing each of them once you know both servers are up. There will still be a delay in between the two commits of the transaction (which can then again cause problems), but it is one step closer to keeping your data healthy.</li>
<li><strong>Balancing shards</strong>: If you shard on $userID, your sharding system might be(come) unbalanced because of power users versus inactive users. Not all hardware in your setup might have the same specs. And what if you add more shards, how will you be able to rebalance your setup? <br />Keeping the load on every database equal might take some effort. The choice of partitioning scheme is very important at this point. A directory based approach is the most flexible, but introduces overhead and a possible SPOF.</li>
<li><strong>Is your network ready?</strong> Your application servers will now possibly fetch and store data on several different servers in one request, so your network topology and configuration settings have to be ready. Will you keep connections open for the full page render? Or will you close the connection to your database after every query?</li>
<li><strong>Your backup strategy will be different</strong>: Your actual data is fragmented over different servers, which affects your backup strategy.</li>
</ul>
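<p>The "fake transaction" idea from the list above, combined with the denormalized guestbook write, can be sketched roughly as follows. This is only an illustration under assumptions: two in-memory SQLite databases (through PDO) stand in for two MySQL shard servers, and writeOnBoth(), makeShard() and countMessages() are hypothetical helpers, not part of our own database class.</p>

```php
<?php
// Sketch only: two in-memory SQLite databases stand in for two shard servers.
// writeOnBoth(), makeShard() and countMessages() are hypothetical helpers.

function writeOnBoth(PDO $first, PDO $second, callable $writeFirst, callable $writeSecond): bool
{
    $first->beginTransaction();
    $second->beginTransaction();
    try {
        $writeFirst($first);
        $writeSecond($second);
        // Both servers accepted the writes; commit one after the other.
        // A crash between these two commits still leaves the data inconsistent.
        $first->commit();
        $second->commit();
        return true;
    } catch (Exception $e) {
        // Undo whatever has not been committed yet.
        if ($first->inTransaction()) {
            $first->rollBack();
        }
        if ($second->inTransaction()) {
            $second->rollBack();
        }
        return false;
    }
}

function makeShard(): PDO
{
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE guestbook (ownerID INTEGER, posterID INTEGER, message TEXT NOT NULL)');
    return $pdo;
}

function countMessages(PDO $pdo): int
{
    return (int) $pdo->query('SELECT COUNT(*) FROM guestbook')->fetchColumn();
}

$ownerShard = makeShard();
$posterShard = makeShard();

$insert = function (PDO $pdo) {
    $pdo->prepare('INSERT INTO guestbook VALUES (?, ?, ?)')->execute([26, 42, 'hello']);
};
$broken = function (PDO $pdo) {
    // Violates the NOT NULL constraint, so PDO throws.
    $pdo->prepare('INSERT INTO guestbook VALUES (?, ?, ?)')->execute([26, 42, null]);
};

$ok = writeOnBoth($ownerShard, $posterShard, $insert, $insert);      // stored on both shards
$failed = writeOnBoth($ownerShard, $posterShard, $insert, $broken);  // rolled back on both shards
```

<p>The second call fails on the poster's shard, so the insert that already succeeded on the owner's shard is rolled back as well. The window between the two commits remains, which is why check and fix routines are still needed.</p>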
<h3 id="solutions">Existing solutions?</h3>
<p>At the moment Netlog is the 67<sup>th</sup> most visited website in the world, according to Alexa's ranking. This means that there are at least 66 other websites out there probably facing problems similar to ours. 16 of the 20 most popular websites are powered by MySQL, so we are definitely not alone, are we?<br /> Let's have a look at some of the existing technologies that implement or are somehow related to sharding and scaling database systems, and let's see which ones could be interesting for Netlog.</p>
<p><strong><a href="http://www.mysql.com/products/database/cluster/">MySQL Cluster</a></strong> is one of the technologies you could think would solve similar problems. The truth is that a database cluster is helpful when it comes to high availability and performance, but it's not designed for the distribution of writes.</p>
<p><strong><a href="http://dev.mysql.com/doc/refman/5.1/en/partitioning.html">MySQL Partitioning</a></strong> is another relatively new feature in MySQL that allows for horizontal splitting of large tables into smaller and more performant pieces. The physical storage of these partitions is limited to a single database server though, which makes it irrelevant once a single table outgrows the capacity of a single database server.</p>
<p><strong><a href="http://www.hscale.org">HSCALE</a></strong> and <strong><a href="http://spockproxy.sourceforge.net/">Spock Proxy</a></strong>, which both build on <strong><a href="http://forge.mysql.com/wiki/MySQL_Proxy">MySQL Proxy</a></strong>, are two other projects that help in sharding your data. MySQL Proxy introduces <a href="http://en.wikipedia.org/wiki/Lua_(programming_language)">Lua</a> as an extra programming language to instruct the proxy (e.g. for finding the right shard for a query). At the time we needed a solution for sharding, neither of these projects seemed to support directory based sharding the way we wanted.</p>
<p><strong><a href="http://www.hivedb.org/">HiveDB</a></strong> is a sharding framework for MySQL written in Java. It requires the Java Virtual Machine, and its php interface is still in its infancy. Being a Java solution makes it less interesting for us, since we prefer the technology we are experts in and our application is written in: php.</p>
<p>Other technologies that aren't MySQL or php related include <a href="http://hypertable.org/">HyperTable</a> (HQL), <a href="http://hadoop.apache.org/hbase/">HBase</a>, <a href="http://labs.google.com/papers/bigtable.html">BigTable</a>, <a href="http://www.hibernate.org/414.html">Hibernate Shards</a> (*shivers*), <a href="http://www.sqlalchemy.org/">SQLAlchemy</a> (for Python), <a href="http://www.oracle.com/technology/products/database/clustering/index.html">Oracle RAC</a>, etc ... The memcached SQL-functions or storage engine for MySQL is also a related project that we could mention here.</p>
<p>None of these projects really seemed to come in line with our requirements. But what exactly are they?</p>
<ul>
<li><strong>Flexible for the hardware department.</strong><br />We project growth and want the sharding system to be flexible. Knowing that our traffic will increase, we need to be able to add more shards quickly. A growing amount of data requires a proportional growth in hardware. For this reason we opt for a directory based partitioning scheme.</li>
<li><strong>No massive rewrite.</strong><br />We can't introduce a whole new database layer or incompatible abstraction layer. We want to keep on using our database class as we are doing now and only implement sharding for those features that really require that amount of scaling. That's why we've opted for a solution that builds on what we have and allows for incremental implementation. We also wanted to be able to use the sharding API without the data having to be physically sharded, so the development and IT departments can independently decide when to do their part of the job.</li>
<li><strong>Support for multiple sharding keys.</strong><br /> Most of our data will probably be sharded on $userID, but we want the system to be flexible so we can implement other keys and/or sharding schemes too.</li>
<li><strong>Easy to understand.</strong><br />We can't expect each and every one of our developers to know everything about scalability and performance. Even if this was the case, the API to access and store data in a sharded environment should make it transparent to them, so they don't have to care about performance and can focus on what's really fun to do: developing and improving on features.<br />So, it's best if the API is a php API which makes it easy for them to use in the rest of our application.</li>
</ul>
<h3 id="implementation">Sharding Implementation at Netlog</h3>
<p>So, what did we come up with? An in-house solution, written 100% in php. The implementation is mostly middleware between application logic and the database class. We've got a complete caching layer built in (using memcached). Since our site is mainly built around profiles, most of the data is sharded on $userID.</p>
<p>In this system we are using the shard scheme <a href="/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/#shard-database-host-structure">below</a>, where a shard is identified by a unique number ($shardID) that also serves as a prefix for the tables in the sharding system. Several shards (groups of tables) sit together in a "shard database", and several of those databases (not instances) are on a certain "shard database host".<br /> So a host has more than one shard. This allows us to move shards as a whole, or databases as a whole, to help in balancing all the servers in the pool, and it allows us to play with the number of shards in a database and the number of shards on a server to find the right balance between table size and open files for that server.<br />When we started using this system in production we had 4000 shards on 40 hosts. Today we've got 80 hosts in the pool.</p>
<div id="shard-database-host-structure" class="slide"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/shard-database-host-structure.png" />
<p>Shards live in databases, databases live on hosts</p>
</div>
<p>From the php side there are two parts to the implementation. The first is a series of management and maintenance related functions that allow us to add, edit and delete shards, databases and hosts in the system, together with a lookup system. The second is a series of classes that provides an API consisting of a database access layer and a caching layer.</p>
<h4>The Sharding Management Directory</h4>
<p>The directory or lookup system is in fact a single MySQL table translating shard keys to $shardIDs. Typically these are $userID-$shardID combinations. This is a single table with as many records as there are users on Netlog. With only IDs saved in that table it's still manageable and can be kept very performant through master-to-master-replication, memcached and/or a cluster set-up.<br /> Next to that there's a series of configuration files that translate $shardIDs to actual database connection details. These configuration files allow us to flag certain shards as not available for read and/or write queries. (Which is interesting for maintenance purposes or when a host goes down.)</p>
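<p>As a rough illustration of that two-stage lookup, here is a minimal sketch. The arrays stand in for the directory table and the configuration files; the host names, the array layout, the prefix format and the locateShard() helper are all made up for this example and are not our actual configuration.</p>

```php
<?php
// Sketch only: arrays stand in for the directory table and the config files.
// All names, hosts and the array layout are hypothetical.

$directory = [26 => 5, 27 => 9];   // $userID => $shardID

$shardConfig = [
    5 => ['host' => 'sharddb01.example', 'database' => 'sharddb_001', 'read' => true, 'write' => true],
    9 => ['host' => 'sharddb02.example', 'database' => 'sharddb_002', 'read' => true, 'write' => false],
];

function locateShard(array $directory, array $shardConfig, int $userID, string $mode): array
{
    if (!isset($directory[$userID])) {
        throw new OutOfBoundsException("No shard known for user $userID");
    }
    $shardID = $directory[$userID];
    $config = $shardConfig[$shardID];
    if (!$config[$mode]) {
        // The shard is flagged, e.g. for maintenance or because its host is down.
        throw new RuntimeException("Shard $shardID is flagged as unavailable for $mode queries");
    }
    return [
        'shardID' => $shardID,
        'host' => $config['host'],
        'database' => $config['database'],
        // The $shardID doubles as table prefix; the exact format is an assumption.
        'tablePrefix' => $shardID . '_',
    ];
}
```

<p>Calling locateShard($directory, $shardConfig, 26, 'read') returns the connection details for shard 5, while a 'write' lookup for user 27 throws because shard 9 is flagged as read only.</p>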
<h4>The Sharded Tables API</h4>
<p><em>Note: The API we implemented allows for handling more than the typical case I'll discuss next, and also allows for several caching modes and strategies based on the nature and use of its application.</em></p>
<p>Most records and data in the shard system have both a $userID field and an $itemID field. This $itemID is a $photoID for tables related to photos or $videoID for tables related to videos. (You get the picture ...) The $itemID is sometimes an auto_increment value, or a foreign key and part of a combined primary key with $userID. Each $itemID is thus unique per $userID, and not globally unique, because that would be hard to enforce in a distributed system. </p>
<p><em>(If you use an auto_increment value in a combined key in MySQL, this value is always a MAX()+1 value, and not an internally stored value. So if you add a new item, delete it again, and insert another record, the auto_increment value of that last insert will be the same as the previously inserted and deleted record. Something to keep in mind ...)</em></p>
<p>If we want to access data stored in the sharding system we typically create an object representing a table+$userID combination. The API provides all the basic CRUD (Create/Read/Update/Delete) functionalities typically needed in our web app. If we go back to the first example of fetching blog messages by a certain author we come to the following scenario:</p>
<p>Query: Give me the blog messages from author with id 26.</p>
<ol>
<li>Where is user 26?<br />User 26 is on shard 5.</li>
<li>On shard 5; Give me all the $blogIDs ($itemIDs) of user 26.<br />That user's $blogIDs are: array(10,12,30);</li>
<li>On shard 5; Give me all details about the items array(10,12,30) of user 26.<br />Those items are: array(array('title' => "foo", 'message' => "bar"), array('title' => "milk", 'message' => "cow"));</li>
</ol>
<p>In this process step 1 is executed on a different server (directory db) than steps 2 and 3 (shard 5). Steps 2 and 3 could easily be combined into one query, but there's a reason why we don't do it, which I'll explain when discussing our caching strategy.<br />It's important to note that the functionality behind step 2 allows for adding WHERE, ORDER and LIMIT clauses so you can fetch only the records you need in the order you need.</p>
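<p>The three steps can be mimicked in a few lines of php. This is a simplified sketch, not our API: nested arrays stand in for the directory database and the shard, fetchBlogMessages() is a hypothetical function, and the third record is invented filler data.</p>

```php
<?php
// Sketch only: nested arrays stand in for the directory database and the shards.
// fetchBlogMessages() and the data layout are hypothetical.

$directory = [26 => 5];   // step 1 data: $userID => $shardID

// Layout: $shards[$shardID]['blogs'][$userID][$blogID] => record
$shards = [
    5 => [
        'blogs' => [
            26 => [
                10 => ['title' => 'foo', 'message' => 'bar'],
                12 => ['title' => 'milk', 'message' => 'cow'],
                30 => ['title' => 'baz', 'message' => 'qux'],   // invented filler
            ],
        ],
    ],
];

function fetchBlogMessages(array $directory, array $shards, int $userID, int $limit): array
{
    // Step 1: where is the user? (directory lookup)
    $shardID = $directory[$userID];

    // Step 2: list query, IDs only. A real query could add WHERE/ORDER/LIMIT here.
    $blogIDs = array_slice(array_keys($shards[$shardID]['blogs'][$userID]), 0, $limit);

    // Step 3: details query for exactly those IDs.
    $details = [];
    foreach ($blogIDs as $blogID) {
        $details[$blogID] = $shards[$shardID]['blogs'][$userID][$blogID];
    }
    return $details;
}

$messages = fetchBlogMessages($directory, $shards, 26, 2);
```

<p>With a limit of 2, only the items with $blogID 10 and 12 are fetched in step 3.</p>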
<p><em>(One could argue that for the example given here and the way we are using MySQL here, it's not needed to have a relational database and you could try to use simpler database systems. While that could be the case, there are still advantages in using MySQL for cases where you're bypassing this API. It's not that bad to have all your data in the same format either, sharded or not. The possible overhead of still using MySQL hasn't been a bottleneck for us so far, but it is certainly something we might consider improving on.)</em></p>
<h4>Shard Management</h4>
<p>To keep the servers in the sharding system balanced we are monitoring several parameters such as number of users, filesize of tables and databases, amount of read and write queries, cpu load, etc. Based on those stats we can make decisions to move shards to new or different servers, or even to move users from one shard to another.<br />Move operations of single users can be done completely transparently and online without that user experiencing downtime. We do this by monitoring write queries. If we start a move operation for a user, we start copying his data to the destination shard. When a write query is executed for that user, we abort the move process, clean up and try again later. So a move of a user will be successful if the user himself/herself isn't active at that time, or if no other user is interacting with him/her (for features in the sharding system).<br /> Moving a complete shard or database at a time is a more drastic approach to balancing the load of servers and requires some downtime which we can keep to a minimum by configuring shards as read only / using a master-slave setup during the switch, etc.</p>
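<p>The copy, watch for writes, abort or switch logic of a single-user move could look roughly like this. It is a sketch under assumptions: the write monitor is modelled as a per-user write counter, arrays stand in for the shards, and moveUser() with its $duringCopy hook is hypothetical and only exists to make the example testable.</p>

```php
<?php
// Sketch only: the write monitor is modelled as a per-user write counter.
// moveUser() and the array layout are hypothetical.

function moveUser(array &$shards, array &$directory, array &$writeCounts, int $userID, int $toShard, ?callable $duringCopy = null): bool
{
    $fromShard = $directory[$userID];
    $writesAtStart = $writeCounts[$userID] ?? 0;

    // Copy the user's rows to the destination shard.
    $shards[$toShard][$userID] = $shards[$fromShard][$userID];

    // Hook that simulates activity happening while the copy runs.
    if ($duringCopy !== null) {
        $duringCopy();
    }

    if (($writeCounts[$userID] ?? 0) !== $writesAtStart) {
        // A write arrived during the copy: abort, clean up, retry later.
        unset($shards[$toShard][$userID]);
        return false;
    }

    // No writes seen: switch the directory entry, then drop the old copy.
    $directory[$userID] = $toShard;
    unset($shards[$fromShard][$userID]);
    return true;
}

$shards = [5 => [26 => ['blog 10', 'blog 12']], 6 => []];
$directory = [26 => 5];
$writeCounts = [];

// First attempt: a write query for user 26 arrives mid-copy, so the move aborts.
$first = moveUser($shards, $directory, $writeCounts, 26, 6, function () use (&$writeCounts) {
    $writeCounts[26] = ($writeCounts[26] ?? 0) + 1;
});

// Second attempt: the user is idle, so the move goes through.
$second = moveUser($shards, $directory, $writeCounts, 26, 6);
```

<p>A real implementation also has to close the small gap between the last write check and the directory switch, which this sketch ignores.</p>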
<p>Inherent to this system is that if one database goes down, only the users (or interactions with the users) on that database are affected. We (can) improve the availability of shards by introducing clusters, master-master setups or master-slave setups for each shard, but the chances of a shard database server being in trouble are slim because of the minor load on shard db's compared to the pre-sharding-era.</p>
<h3 id="tacklingproblems">Tackling the problems</h3>
<p>The difficulties of sharding are partially tackled by implementations with these 3 technologies: Memcached, parallel processing and Sphinx.</p>
<h4>Memcached</h4>
<p>"memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load." By putting a memory caching layer in between our application logic and the SQL-queries to our shard database we are able to get results much, much faster. This caching layer also allows us to do some of the cross-shard data fetching, previously thought impossible on SQL-level.</p>
<p>For those unfamiliar with memcached, below is a very simple and stripped-down example of Memcached usage where we try to fetch a key-value pair from the cache and if it's not found we compute the value and store it into the system so a subsequent call to the function will instantly return the cached value without stressing the database.</p>
<div class="syntax_hilite"><span class="langName">PHP:</span>
<div id="php-6">
<div class="php">
<ol>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#000000; font-weight:bold;">function</span> isObamaPresident<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$memcache</span> = <span style="color:#000000; font-weight:bold;">new</span> Memcache<span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$result</span> = <span style="color:#0000FF;">$memcache</span>-&gt;<span style="color:#006600;">get</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#FF0000;">'isobamapresident'</span><span style="color:#006600; font-weight:bold;">&#41;</span>; <span style="color:#FF9933; font-style:italic;">// fetch</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#616100;">if</span> <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#0000FF;">$result</span> === <span style="color:#000000; font-weight:bold;">false</span><span style="color:#006600; font-weight:bold;">&#41;</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#123;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#FF9933; font-style:italic;">// do some database heavy stuff</span></div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$db</span> = DB::<span style="color:#006600;">getInstance</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$votes</span> = <span style="color:#0000FF;">$db</span>-&gt;<span style="color:#006600;">prepare</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#FF0000;">"SELECT COUNT(*) FROM VOTES WHERE vote = 'OBAMA'"</span><span style="color:#006600; font-weight:bold;">&#41;</span>-&gt;<span style="color:#006600;">execute</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$result</span> = <span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#0000FF;">$votes</span> &gt; <span style="color:#006600; font-weight:bold;">&#40;</span>USA_CITIZEN_COUNT / <span style="color:#CC66CC;color:#800000;">2</span><span style="color:#006600; font-weight:bold;">&#41;</span><span style="color:#006600; font-weight:bold;">&#41;</span> ? <span style="color:#FF0000;">'Sure is!'</span> : <span style="color:#FF0000;">'Nope.'</span>; <span style="color:#FF9933; font-style:italic;">// well, ideally</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#0000FF;">$memcache</span>-&gt;<span style="color:#006600;">set</span><span style="color:#006600; font-weight:bold;">&#40;</span><span style="color:#FF0000;">'isobamapresident'</span>, <span style="color:#0000FF;">$result</span>, <span style="color:#CC66CC;color:#800000;">0</span><span style="color:#006600; font-weight:bold;">&#41;</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#125;</span></div>
</li>
<li style="font-family: 'Courier New', Courier, monospace; color: black; font-weight: normal; font-style: normal;color:#3A6A8B;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#616100;">return</span> <span style="color:#0000FF;">$result</span>;</div>
</li>
<li style="font-weight: bold;color:#26536A;">
<div style="font-family: 'Courier New', Courier, monospace; font-weight: normal;"><span style="color:#006600; font-weight:bold;">&#125;</span> </div>
</li>
</ol>
</div>
</div>
</div>
<p>Memcached is being used in several ways and on several levels in our application code. For sharding the main ones are:</p>
<ul>
<li>Each $userID to $shardID call is cached. This cache has a hit ratio of about 100% because every time this mapping changes we can update the cache with the new value and store it in the cache without a TTL (Time To Live).</li>
<li>Each record in sharded tables can be cached as an array. The key of the cache is typically tablename + $userID + $itemID. Every time we update or insert an "item" we can also store the given values into the caching layer, making for a theoretical hit-ratio of again 100%.</li>
<li>The results of "list" and "count" queries in the sharding system are cached as arrays of $itemIDs or numbers with the key of the cache being the tablename + $userID (+ WHERE/ORDER/LIMIT-clauses) and a revision number.</li>
</ul>
<p>The revision numbers for the "list" and "count" caches are themselves cached numbers that are unique for each tablename + $userID combination. These numbers are then used in the keys of "list" and "count" caches, and are bumped whenever a write query for that tablename + $userID combination is executed. The revision number is in fact a timestamp that is set to "time()" when updated or when it wasn't found in cache. This way we can ensure that all data fetched from cache always reflects the latest update.<br />If, with this in mind, we again return to the blog example, we get the following scenario.</p>
<p>Query: Give me the blog messages from author with id 26.</p>
<ol>
<li>Where is user 26?<br />The result of this query is almost always available in memcached.</li>
<li>On shard 5; Give me all the $blogIDs ($itemIDs) of user 26.<br />The result of this query is found in cache if it has been requested before since the last time an update to the BLOGS-table for user 26 was done.</li>
<li>On shard 5; Give me all details about the items array(10,12,30) of user 26.<br />The results for this query are almost always found in cache because of the big hit-ratio for this type of cache. When fetching multiple items we make sure to do a multi-get request to optimize traffic from and to Memcached.</li>
</ol>
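<p>The revision-number scheme behind step 2 can be sketched as follows. This is an assumption-laden illustration rather than our actual class: a plain array stands in for memcached, the ListCache class and its key format are hypothetical, and a simple counter replaces the time()-based revision number so the example behaves deterministically.</p>

```php
<?php
// Sketch only: a plain array stands in for memcached, and a counter stands in
// for the time()-based revision number so the example is deterministic.

class ListCache
{
    private array $cache = [];
    private int $clock = 0;
    public int $dbQueries = 0;

    private function revision(string $table, int $userID): int
    {
        $key = "rev:$table:$userID";
        if (!isset($this->cache[$key])) {
            $this->cache[$key] = ++$this->clock;   // the article uses time() here
        }
        return $this->cache[$key];
    }

    // Call after every write query for this table + $userID combination.
    public function bump(string $table, int $userID): void
    {
        $this->cache["rev:$table:$userID"] = ++$this->clock;
    }

    public function getList(string $table, int $userID, string $clauses, callable $query): array
    {
        // The revision number is part of the key, so a bump makes old lists unreachable.
        $key = "list:$table:$userID:" . md5($clauses) . ':' . $this->revision($table, $userID);
        if (!isset($this->cache[$key])) {
            $this->dbQueries++;
            $this->cache[$key] = $query();
        }
        return $this->cache[$key];
    }
}

$blogIDs = [10, 12, 30];
$cache = new ListCache();
$query = function () use (&$blogIDs) {
    return $blogIDs;   // stands in for the list query on the shard
};

$cache->getList('BLOGS', 26, 'ORDER BY blogID', $query);   // miss: runs the query
$cache->getList('BLOGS', 26, 'ORDER BY blogID', $query);   // hit: served from cache
$blogIDs[] = 31;                                            // a write query ...
$cache->bump('BLOGS', 26);                                  // ... bumps the revision
$fresh = $cache->getList('BLOGS', 26, 'ORDER BY blogID', $query);
```

<p>Nothing is ever deleted explicitly in this scheme. Lists stored under an old revision number are simply never requested again.</p>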
<p>Because of this caching strategy the two separate queries (list query + details query), which seemed a stupid idea at first, result in better performance. If we hadn't split this up into two queries and had instead cached the list of items with all their details (message + title + ...) in Memcached, we'd store many more copies of each record's properties.</p>
<p>There is an interesting performance tweak we added to the "list" caches. Let's say we request a first page of comments (1-20): we actually query for the first 100 items, store that list of 100 in cache and then only return the requested slice of that result. A likely follow-up call to the second page (21-40) will then be fetched from cache. So the window we ask from the database is different than the window requested by the app.</p>
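<p>A minimal sketch of that windowing trick is shown here. The window size of 100 comes from the description above; the arrays standing in for the comments table and for memcached, the cache key format and the fetchPage() function are assumptions, and the sketch assumes the page size divides the window size evenly.</p>

```php
<?php
// Sketch only: an array stands in for the comments table and for memcached.
// The window size of 100 comes from the text; everything else is hypothetical.

const WINDOW = 100;

$comments = range(1, 250);   // pretend comment IDs, already in display order
$cache = [];
$dbQueries = 0;

function fetchPage(int $page, int $perPage, array $comments, array &$cache, int &$dbQueries): array
{
    $offset = ($page - 1) * $perPage;
    // Align the database window to a multiple of WINDOW items.
    $windowStart = intdiv($offset, WINDOW) * WINDOW;
    $key = "comments:window:$windowStart";

    if (!isset($cache[$key])) {
        $dbQueries++;   // stands in for a SELECT with LIMIT $windowStart, 100
        $cache[$key] = array_slice($comments, $windowStart, WINDOW);
    }
    // Return only the slice the caller asked for.
    return array_slice($cache[$key], $offset - $windowStart, $perPage);
}

$page1 = fetchPage(1, 20, $comments, $cache, $dbQueries);   // queries the "database"
$page2 = fetchPage(2, 20, $comments, $cache, $dbQueries);   // served from the cached window
```

<p>Only the request for page 6 (items 101-120) would trigger a second database query, because it falls outside the first window.</p>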
<p>For features where caching race conditions might be a problem for data consistency, or for use cases where caching each record separately would be overhead (e.g. because the records are only inserted and selected and used for 1 type of query), or for use cases where we do JOINs and more advanced SQL-queries, we use different caching modes and/or different API-calls.</p>
<p>This whole API requires quite some php processing that we are now doing on application level, where previously this was all handled and optimized by the MySQL server itself. Memory usage and processing time on php-level scale a lot better than databases though, so this is less of an issue.</p>
<h4>Parallel processing</h4>
<p>It is not strange to fetch data stored on different shards in one go, because most data is probably available from memory. If we fetch a friends of friends list, one way to do this could be to fetch your own friends, loop over them and fetch their friends, and then process those results to get a list of people your friends know, but you don't know yet. <br />The number of actual database queries needed for this will be small, and even so, the queries are simple and superfast. Problems start to occur if we are processing this for users who have a couple of hundred friends each. For this we've implemented a system for splitting up certain big tasks into several smaller ones we can process in parallel.<br />This parallel processing in php is done by doing several web requests to our php server farm that each process a small part of the task. It is actually faster to process 10 smaller tasks simultaneously than to do the whole thing at once. The overhead of the extra web requests and the cpu cycles it takes to split up the task and combine the results is irrelevant compared to the gain.</p>
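<p>The split and combine part of such a task can be sketched as follows. Note the big simplification: each chunk is handled here by an ordinary function call, one after the other, where the set-up described above would send every chunk as a separate web request to the server farm. The friends data, friendsOfChunk() and friendsOfFriends() are made up for this example.</p>

```php
<?php
// Sketch only: each chunk is processed by a plain function call here; in the
// set-up described above every chunk would be a separate web request.

$friends = [
    1 => [2, 3, 4],
    2 => [1, 5],
    3 => [1, 5, 6],
    4 => [1, 7],
];

// Worker: collect the friends of one chunk of users.
function friendsOfChunk(array $friends, array $userIDs): array
{
    $found = [];
    foreach ($userIDs as $userID) {
        foreach ($friends[$userID] ?? [] as $friendID) {
            $found[$friendID] = true;
        }
    }
    return array_keys($found);
}

function friendsOfFriends(array $friends, int $me, int $chunkSize): array
{
    $mine = $friends[$me] ?? [];

    // Split: one small task per chunk of my friends.
    $partials = [];
    foreach (array_chunk($mine, $chunkSize) as $chunk) {
        $partials[] = friendsOfChunk($friends, $chunk);
    }

    // Combine: merge, de-duplicate, drop myself and people I already know.
    $merged = array_unique(array_merge([], ...$partials));
    $result = array_values(array_diff($merged, $mine, [$me]));
    sort($result);
    return $result;
}

$suggestions = friendsOfFriends($friends, 1, 2);
```

<p>For user 1 the work is split into two chunks (friends 2 and 3, then friend 4), and the combined result contains users 5, 6 and 7.</p>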
<h4>Using Sphinx</h4>
<p>Other typical queries that become impossible for sharded data are overview queries. Say you'd like a page of all the latest photos uploaded by all users. If you had your users' photos distributed over a hundred databases, you'd have to query each, and then process all of those results. Doing that for several features would not be justifiable, so most of our "Explore" pages (where you browse through and discover content from the community) are served from a different system.<br /> Sphinx is a free and open source SQL full-text search engine. We use it for more than your average input field + search button search engine. In fact a list of most viewed videos of the day can also be a query result from Sphinx. For most of the data on these overview pages it's not a problem if the data isn't real time. So it's possible to retrieve those results from indexes that are regularly built from the data on each shard and then combined.</p>
<p>For a full overview of how we use Sphinx (and how we got there), I encourage you to have a look at the presentation of my colleague Jayme Rotsaert, "<a href="http://www.slideshare.net/_jayme/scaling-optimizing-search-on-netlog-presentation">Scaling and optimizing search on Netlog</a>", who's put a lot of effort into using Sphinx.</p>
<h3 id="finalthoughts">Final thoughts</h3>
<p>If there are only two things I could say about sharding, it'd be these two quotes:</p>
<ul>
<li>"Don't do it, if you don't need to!" (37signals.com)</li>
<li>"Shard early and often!" (startuplessonslearned.blogspot.com)</li>
</ul>
<p>Sounds like saying two opposite things? Well, yes and no.</p>
<p>You don't want to introduce sharding in your architecture, because it definitely complicates your set-up and the maintenance of your server farm. There are more things to monitor and more things that can go wrong. <br /> Today, there is no out-of-the-box solution that works for every set-up, technology and/or use case. Existing tool support is poor, and we had to build quite some custom code to make it possible.<br /> Because you split up your data, you lose some of the features you've grown to like from relational databases.<br /> If you can do with simpler solutions (better hardware, more hardware, server tweaking and tuning, vertical partitioning, sql query optimization, ...) that require less development cost, why invest lots of effort in sharding?</p>
<p>On the other hand, when your visitor statistics really start blowing through the roof, it is a good direction to go. After all, it worked for us.<br /> The hardest part about implementing sharding, has been to (re)structure and (re)design the application so that for every access to your data layer, you know the relevant "shard key". If you query details about a blog message, and blog messages are sharded on the author's userid, you have to know that userid before you can access/edit the blog's title.<br /> Designing your application with this in mind ("What are the possible keys and schemes I could use to shard?"), will definitely help you to implement sharding more easily and incrementally at the moment you might need to. </p>
<p>In our current set-up not everything is sharded. That's not a problem though. We focus on those features that require this scaling strategy, and we don't spend time on <em>premature optimization</em>.<br /> Today, we're spending less ca$h on expensive machines, we've got a system that is available, it can handle the traffic and it scales.</p>
<div id="conclusion" class="slide"><img src="http://www.jurriaanpersyn.com/projects/netlog/sharding/conclusion.png" /></div>
<h3 id="slides">Presentation</h3>
<div style="width: 500px; text-align: right; margin: 0 auto;" id="__ss_1004297"><object style="margin:0px" width="500" height="417"><param name="movie" value="http://static.slideshare.net/swf/ssplayer2.swf?doc=database-sharding-at-netlog-final-1234116512031629-2&#038;stripped_title=database-sharding-at-netlog" /><param name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/><embed src="http://static.slideshare.net/swf/ssplayer2.swf?doc=database-sharding-at-netlog-final-1234116512031629-2&#038;stripped_title=database-sharding-at-netlog" type="application/x-shockwave-flash" allowscriptaccess="always" allowfullscreen="true" width="500" height="417"></embed></object>
<div>View more <a href="http://www.slideshare.net/">presentations</a> from <a href="http://www.slideshare.net/oemebamo">Jurriaan Persyn</a>.<br /> <em>(tags: <a href="http://slideshare.net/tag/fosdem2009">fosdem2009</a> <a href="http://slideshare.net/tag/fosdem">fosdem</a>)</em></div>
</div>
<h3 id="resources">Resources</h3>
<ul>
<li>the great development and IT services team at Netlog</li>
<li><a href="http://www.netlog.com/go/developer">www.netlog.com/go/developer</a></li>
<li><a href="http://www.37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding">www.37signals.com/svn/posts/1509-mr-moore-gets-to-punt-on-sharding</a></li>
<li><a href="http://www.addsimplicity.com/adding_simplicity_an_engi/2008/08/shard-lessons.html">www.addsimplicity.com/adding_simplicity_an_engi/2008/08/shard-lessons.html</a></li>
<li><a href="http://www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day">www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day</a></li>
<li><a href="http://startuplessonslearned.blogspot.com/2009/01/sharding-for-startups.html">startuplessonslearned.blogspot.com/2009/01/sharding-for-startups.html</a></li>
<li><a href="http://www.codefutures.com/weblog/database-sharding">www.codefutures.com/weblog/database-sharding</a></li>
<li><a href="http://www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSchemes.aspx">www.25hoursaday.com/weblog/2009/01/16/BuildingScalableDatabasesProsAndConsOfVariousDatabaseShardingSchemes.aspx</a></li>
<li><a href="http://highscalability.com">highscalability.com</a></li>
<li><a href="http://dev.mysql.com/doc/refman/5.1/en/partitioning.html">dev.mysql.com/doc/refman/5.1/en/partitioning.html</a></li>
<li><a href="http://www.hibernate.org/414.html">www.hibernate.org/414.html</a></li>
<li><a href="http://en.wikipedia.org/wiki/SQLAlchemy">en.wikipedia.org/wiki/SQLAlchemy</a></li>
<li><a href="http://spockproxy.sourceforge.net">spockproxy.sourceforge.net</a></li>
<li><a href="http://www.scribd.com/doc/3865300/Scaling-Web-Sites-by-Sharding-and-Replication">www.scribd.com/doc/3865300/Scaling-Web-Sites-by-Sharding-and-Replication</a></li>
<li><a href="http://oracle2mysql.wordpress.com/2007/08/23/scale-out-notes-on-sharding-unique-keys-foreign-keys">oracle2mysql.wordpress.com/2007/08/23/scale-out-notes-on-sharding-unique-keys-foreign-keys</a></li>
<li><a href="http://www.flickr.com/photos/kt">www.flickr.com/photos/kt</a></li>
<li><a href="http://oreilly.com/catalog/9780596101718/">High Performance MySQL, Second Edition (O'Reilly)</a></li>
</ul>
<p>For further questions or remarks, feel free to contact me at <a href="mailto:jurriaan@netlog.com">jurriaan@netlog.com</a> and subscribe to my blog at <a href="http://www.jurriaanpersyn.com">www.jurriaanpersyn.com</a> and the Netlog developer blog at <a href="http://www.netlog.com/go/developer/blog">www.netlog.com/go/developer/blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/feed/</wfw:commentRss>
		<slash:comments>32</slash:comments>
		</item>
	</channel>
</rss>

