Paris Trip Notes – Day 3

Sun 10-1 | Mon 10-2 | Tue 10-3 | Wed 10-4 | Thu 10-5 | Fri 10-6

Today is Wednesday, October 4th, 2017.  I spent almost every minute of yesterday in the office – but it’s really great to be here.  After dinner, a colleague, an Account Executive named Alexandre, invited me to dine with his wife and daughter at their home near La Defense.

Alexandre has a daughter, about 2 years old, who is absolutely adorable.  Alex’s wife is from Italy and she prepared a fantastic meal including a great salad and pasta carbonara.  Afterwards, we enjoyed some of France’s amazing fromage (cheese).  I feel a bit bad… perhaps rude – I have so much preparation for meetings tomorrow that I needed to leave right after eating.

Today, we’ll have a very important meeting with a customer here in La Defense so I’ve been spending a majority of my time prepping for that… not much Paris tourism today.

Here’s a list of things that jumped out at me as I walked the streets, met and engaged with the people of France.

  1. Smokers – A lot of people smoke cigarettes.  There’s not the same anti-smoking sentiment that exists in the States.  For example, the restaurant I ate at yesterday had ashtrays on the tables.  Of the 5 or so tables on the terrace (an enclosed area – but outside of the main restaurant), 3 were occupied by people smoking cigarettes.
  2. Driving – There are WAY more super tiny cars… and a lot of taxis.  Uber works here, btw – but Lyft, not so much.  Drivers are aggressive – but not combative.  For example, on my way to the office we drive down Avenue Charles-de-Gaulle.  There are side streams to the street and it’s possible to go straight down the center lanes – or switch to the side feeder streams of the road.  Many of the drivers switch back and forth trying to get where they’re going faster… nothing new there – right?  But in the States, or at least in NY, NJ, PA – you’d invariably run into an aggressive, road-raging driver that wants to punch you in the face.  Here, although everyone is moving at high rates of speed and cutting in/out – there didn’t appear to be any rage.  About 50% of the people on the road ride sleek scooters or motorcycles.  They ride them in and out of traffic… They’re allowed to weave and to go between cars here – just as in California.  I don’t think my Heritage Softail Custom would much fit in here, what with the loud exhaust and the fat/wide body.
  3. Bread Aroma – At what appear to be random times of day, the streets are filled with the heavenly scent of bread baking… reminds me of home, Sarcone’s.  I tasted my first baguettes last night – really délicieux.
  4. Stylish – A majority of the people I see (and I realize this could be because of the area of France I am in) dress really well.  Men keep their hair cut well, neatly scruffed beards.  Short / tight-ish suit pants are a major thing on guys.  Slender sport coats.
  5. Slender – It would appear that there are fewer really overweight people than in the States.
  6. Tiny Cafe – The coffee here is REALLY good but there’s not enough of it.  When I do get it, it’s in tiny cups.  And people drink it slowly… relaxed.  And when I try to go to the street to buy more – the shops are closed – or I need to sit down at a cafe to get some.  GIVE ME A COFFEE SHOP WITH TO-GO COFFEE, s’il vous plait!
  7. Tiny Food – Meals are smaller… could this be related to observation #6?
  8. Direct – People are direct… to the point… but most I’ve met are friendly – this, btw, breaks a stereotype I’ve heard before that most French people are rude.  Not so, in my limited experience.
  9. Subway Doors – The subways have doors that guard access to the tracks when the train is not present.  This, depressingly, I was told, is to prevent people from leaping in front of the trains.  (You can’t really see the doors in this shot – but they slide open when the train arrives.)
  10. ʒigabytes – The word for GB… Gigabytes in French is pronounced with a soft G… as in regime, or genre.  The sound is denoted /ʒ/ when writing phonetically.

Word(s) of the day:

French | English
Pouvez-vous m’aider, s’il vous plaît? | Will you help me, please?
Où sont les toilettes? | Where is the bathroom?

Franco-Fact:

Got a crush on that hot person that just passed away?   In France you can marry a dead person – under French law, in exceptional cases you can marry posthumously, as long as you can also prove that the deceased had the intention of marrying while alive and you receive permission from the French president.  The most recent approved case was in 2017, when the partner of a gay policeman gunned down on Paris’s Champs-Elysees by a jihadist was granted permission to marry his partner posthumously.

Photo of the Day

Paris Trip Notes – Day 2

Today is Tuesday, October 3rd, 2017.  My first full day in Paris.  I’ll be spending most of it in the MongoDB office preparing for meetings.

Last night, I ventured out to the Eiffel Tower.  I stayed late at the office, so by the time I got out and made my way to the tower, most of the shops and cafes had closed.  I was famished… I walked around for about 30 minutes trying to find a place to grab a bite but only found a grocery store.  I’m ashamed to admit my first dinner in Paris looked a lot like the dinner I have at home on the go.  Not quite le grand repas I imagined… but tomorrow is another day.

Today, I’m excited to be digging into a few new opportunities at work.  The team here in Paris seem extremely focused and while most are new to MongoDB, they bring a wealth of experience and talent to the table.

Where in the world?

Here’s a map showing the location of the MongoDB office.  It’s just outside the city of Paris.

Word(s) of the day:

The funny thing about learning a new language is that it’s not necessarily difficult to memorize the new words you need in order to have a conversation.  No – it’s actually having the confidence to use these new words out there in the real world.  I find myself knowing the words I want to use but not being absolutely certain about my pronunciation so I hesitate.  The great thing I found so far about the French is that if you show an effort… they seem to appreciate it and take pity on you.

French | English | Notes
Où est le métro? | Where is the subway? | I’ll be staying a few kilometers from the MongoDB office, and I imagine I’ll be taking the metro each day, so I think this will come in handy.

Franco-Fact(s):

The French Army was the first to use camouflage in 1915 (World War I) – the word camouflage came from the French verb ‘to make up for the stage’.  Guns and vehicles were painted by artists called camoufleurs.

Photo(s) of the Day:

Not an interesting mix of photos – not much time to sight-see just yet… hopefully we get some time later today.

Paris Trip Notes – Day 1

Today, I fly to France for a week of work.  What better way to capture memories of the trip than to create a blog post for each day?

Word(s) of the day:

French | English | Notes
Bonjour! | Hello | Greeting used for the daytime. When it gets later, you may use Bonsoir. Bon means good. Jour means day. Soir means evening.
Parlez-vous anglais? | Do you speak English? | Pretty important when you’re in a country whose language you don’t speak. I may also whip out the Je ne parle pas français, which means, I don’t speak French.

Franco-fact

France became a republic in 1792 as a result of the French Revolution against centuries of royal rule.  The Revolution started with the storming of the Bastille fortress on July 14th, 1789.  This event is celebrated every year all over France and is referred to as Bastille Day.

What is the Bastille, you ask?  (That’s what I asked.)

The Bastille was a political prison that was built in the late 1300’s to house criminals and enemies of the French state.

There were approximately 1000 revolutionaries that stormed the Bastille on that day and they were mostly craftsmen and store owners who lived in Paris.

The revolutionaries were members of a French social class called the Third Estate.   The First Estate was the clergy, the Second Estate was the nobility.

The reason they stormed the Bastille was due in large part to massive famine and extremely high bread prices… Hold up… Bread?  They rioted and overthrew the government because of bread?

Yep.  As it turns out, in the late 1700’s, the average French citizen’s diet was made up primarily of bread and soup.  According to Smithsonian.com – prior to 1788 the average French wage earner spent half their income on bread.  Then, in 1788 and 1789, the grain crops failed and the price of bread shot up to over 88 percent of the average wage earner’s income.

Apps I used for this trip

I don’t speak French… I should probably say that right out of the gate. So – I thought it might be good to get an app to help me learn the basics. There are PLENTY. I really focused on the reviews. I tried several free apps, along with downloading several podcasts, but became frustrated with the quality and approach. Eventually I settled on 2 that I really feel are valuable.

SpeakEasy French

This app is from a company called PocketGlow. More information is available from http://pocketglow.com/sf.  

SpeakEasy French Navigation

There are two versions of this app… obviously, a free version and a paid version.  The free version did a great job for a very small number of phrases and words.  I liked the interface so I sprang for the paid version.  At $3.99, I must say it’s more than I usually spend so quickly – without more investigation – but it hasn’t let me down.

The interface lets you navigate starting with categories of words, such as communication, emergency, borders/customs, etc.  I like this because the amount of time I have to memorize is limited.  Therefore, I’m more likely to need a reference in the moment.

Duolingo

Duolingo feels a bit childish… but I have to admit, it’s effective.  The repetition and multifaceted nature of the learning methods are very effective.

Not sure how much I’m retaining – but it feels like it’s working… will keep you posted on progress.

Photo of the Day

Mike in the Airport
Waiting for the first leg of the flight. From PHL to JFK, JFK to CDG

Paris Trip Notes – Day minus 1

Today is September 30th, 2017.  T-minus 16 hours and counting until I fly to Paris, France.  Not much planning or packing left to do.  I feel fortunate to be able to make the trip.  I’ll be visiting colleagues from MongoDB’s Paris Office and will be lending a hand with some customer meetings for the week.

I thought it might be interesting to share details of the trip with my friends and family through this blog… so here’s the plan:

I’ll be in France for 5 days, and each day I’ll create a blog post and an accompanying video.  Each post will have some vocabulary words I’m trying to learn, some facts I’m trying to learn about France, a photo or two and a bit about the plan for the day and what I’ve experienced.

Like what you see?  Let me know in the comments – or on social media.  Want to know something about Paris, France – or want me to take a photo… let me know!

Word(s) of the day:

French | English | Notes
Excusez-moi, où est ___? | Excuse me, where is ___? | Definitely will have a need to find my way around… I’m thinking this one will pay off quickly.
Où se trouvent les toilettes? | Where is the bathroom? | And what would be more important than finding a restroom?

Franco-fact:

France is the world’s most popular tourist destination – some 83.7 million visitors arrived in France, according to the World Tourism Organization report published in 2014, making it the world’s most-visited country.

Photo of the Day

Ok – not a photo I took – but it’s a google street view of the MongoDB office in Paris.  See you in about 24 hours.

Sizing MongoDB: An exercise in knowing the unknowable

Into the Unknown: How many servers do I need?  How many CPUs, how much memory?  How fast should the storage be?  Do I need to shard?

As part of my job as a Solutions Architect, I’m asked to help provide guidance and recommendations for sizing infrastructure that will run MongoDB databases.  In nearly every case, I feel like Nostradamus.  That is to say, I feel like I’m being asked to predict the future.

In this article, I’ll talk about the process I use to get as close to comfortable with a prediction as possible – essentially, to know the unknowable.

Charting the Unknown

Let’s start out with some basic MongoDB Sizing Theory.  In order to adequately size a MongoDB deployment, you need to understand the following:

  1. Total Amount of Data that will be stored.
  2. Frequently accessed documents
  3. Read / Write Profile of the application
  4. Indexes that will be leveraged by the application to read data efficiently

These four key elements will help you build what is known as the Working Set.  The Working Set is the frequently accessed data, plus the indexes, that you will try to fit into RAM.

Wait a minute…, how can it be unknowable?  How is it possible that I’m not able to know my performance requirements?

Ok – this may be an exaggeration, or at least a bit of hyperbole, but if you’ve ever completed an exercise in MongoDB sizing for a live production application, you’ll completely agree or at least understand.

The reason I chose the word “unknowable” is because it’s literally impossible to know every possible data point required to ensure that your server resource meets or exceeds the requirements 100% of the time.  This is because most application environments are not closed.  They are changeable and in many cases we are at the mercy of an unpredictable user population.

The best we can hope for is close.  The rest, we will leave up to the flexible, scalable architecture that MongoDB brings to the table.

When it comes right down to it, there are a lot of things we know… or at least can predict with pretty good accuracy when it comes to an application running in production.  Let’s start with the data.  Here’s where we employ good discovery technique.

To understand how MongoDB will perform, you must understand the following elements:

  • Data Volume – How much data will our application manage and store?
  • Application Read Profile – How will the application access this data?
  • Application Write Profile – How will the data be updated or written?

Data Volume

How much data will you be storing in MongoDB?  How many databases?  How many collections in each database?  How many documents in each collection?  What size will the average document be in each of these collections?

This requires a knowledge of your applications, of course.  What data will the applications be managing?  Let’s start with an example.  People and Cars are elements of data to which everyone can relate.  Let’s imagine we’re writing an application that helps us keep track of a group of people (our users) and their inventory of cars.

To start, let’s look at the projected number of users of our application: How many users’ car inventories will we be managing with our application and database?  Assume we’re going large scale and we expect to take on approximately 1 billion users.  Each user will own and manage approximately 2-3 automobiles and a few service records for each car.

The better we understand our app – as data modelers, the better chance we have of deploying resources in support of the database that will match the application requirements.  Therefore, let’s dig a bit deeper to understand the data volumes and let’s look at the documents.  What do the People documents and the Cars documents actually look like?  In its most simple form, our document model may look something like the following.

PeopleCars: Avg Doc Size: 1024 bytes, # of Docs: 1b

In this example, I’m expressing the relationship between people and their cars through embedding.  This leaves us with a single collection for People and their Cars.  In reality, you may require a more diverse mix – so let’s include a linked example collection for service records.  Imagine for our purposes that each person will have on average 10 service records.

Service: Avg Doc Size: 350 bytes, # of Docs: 10b
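
As a rough illustration of the embedded and linked shapes described above, here is a hypothetical pair of documents written as Python dictionaries.  Every field name and value (`_id`, `name`, `cars`, `vin`, `person_id` and so on) is an assumption made up for this sketch; only the idea of embedding cars inside a person and linking service records back to that person comes from the description.

```python
# Hypothetical document shapes; all field names and values are illustrative assumptions.

# One document in the PeopleCars collection: the cars are embedded in the person.
person = {
    "_id": 1001,
    "name": "Pat Example",
    "cars": [
        {"vin": "VIN-0001", "make": "ExampleMake", "model": "ExampleModel", "year": 2015},
        {"vin": "VIN-0002", "make": "ExampleMake", "model": "OtherModel", "year": 2011},
    ],
}

# One document in the Service collection: linked to the person (and car) by reference.
service_record = {
    "_id": 5001,
    "person_id": person["_id"],  # link back to the PeopleCars document
    "vin": "VIN-0001",           # which embedded car was serviced
    "date": "2017-09-30",
    "description": "Oil change",
}

# Resolve the link: find the embedded car that this service record refers to.
linked_car = next(car for car in person["cars"] if car["vin"] == service_record["vin"])
print(linked_car["model"])
```

The trade-off in this sketch is the usual one: embedded cars come back with a single read of the person, while the linked service records can grow without inflating the person document.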

Here’s what our architecture might look like visually:

Let’s do the math:  (1b users * 1024 bytes) + (10b service records * 350 bytes) =

1,024,000,000,000 + 3,500,000,000,000 = 4,524,000,000,000 bytes = 4.524TB
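
The same arithmetic can be kept in a small script so the estimate is easy to rerun when the inputs change.  This is only a sketch: the variable and function names are mine, and it assumes decimal units (1TB = 10^12 bytes), as the numbers above do.

```python
# Inputs taken from the example above; the names are illustrative.
PEOPLE_DOCS = 1_000_000_000      # 1b PeopleCars documents
PEOPLE_DOC_BYTES = 1024          # average PeopleCars document size
SERVICE_DOCS = 10_000_000_000    # 10b Service documents
SERVICE_DOC_BYTES = 350          # average Service document size


def collection_bytes(doc_count, avg_doc_bytes):
    """Estimated raw size of a collection: document count times average size."""
    return doc_count * avg_doc_bytes


total_bytes = (collection_bytes(PEOPLE_DOCS, PEOPLE_DOC_BYTES)
               + collection_bytes(SERVICE_DOCS, SERVICE_DOC_BYTES))

# Decimal terabytes (1TB = 10**12 bytes), matching the arithmetic in the text.
print(f"{total_bytes:,} bytes = {total_bytes / 10**12:.3f}TB")
```

Note that this counts raw document bytes only; it makes no allowance for compression or storage overhead.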

Given the estimated users, their cars and their service records, we’ll be storing approximately 4.5TB of data in our database.   As stated previously, the goal in sizing MongoDB is to fit the working set into RAM.  So, to get an accurate assessment of the working set – we need to know how much of that 4.5TB will be accessed on a regular basis.

In many sizing exercises, we’re asked to estimate this.  Let’s assume at any given time during any given day, approximately 30-40% of our user population is actively logged in and reviewing their data.  Taking the midpoint of 35%, that would mean that we would have 35% * 1b users * 1024 bytes (user documents) plus 35% * 10b service docs * 350 bytes or…

(.35 * 1b * 1024) = 358,400,000,000
plus
(.35 * 10b * 350) = 1,225,000,000,000
----------------------------------------
equals
1,583,400,000,000 bytes or approximately 1.6TB

The last bit of information we need to understand is the indexes we’ll maintain so that the application can swiftly and efficiently access the data.  Let’s assume that for each People and each Service Record document we’ll create an index entry that’s approximately 100 bytes in size… or 11b * 100 bytes = 1.1TB

So our total working set will consist of 1.6TB of frequently accessed data and 1.1TB of index for a total working set size of 2.7TB.
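
Here is a minimal sketch of the working set estimate, assuming the 35% midpoint for active users and the 100-byte average index size per document used above.  The names are illustrative, and integer arithmetic is used to keep the byte counts exact.

```python
ACTIVE_PERCENT = 35                                   # midpoint of the 30-40% estimate
PEOPLE_DOCS, PEOPLE_DOC_BYTES = 1_000_000_000, 1024
SERVICE_DOCS, SERVICE_DOC_BYTES = 10_000_000_000, 350
INDEX_BYTES_PER_DOC = 100                             # assumed average index size per document

# Frequently accessed ("hot") data: the active share of each collection.
hot_people = PEOPLE_DOCS * PEOPLE_DOC_BYTES * ACTIVE_PERCENT // 100
hot_service = SERVICE_DOCS * SERVICE_DOC_BYTES * ACTIVE_PERCENT // 100
hot_data = hot_people + hot_service

# Indexes cover every document, not only the active ones.
index_bytes = (PEOPLE_DOCS + SERVICE_DOCS) * INDEX_BYTES_PER_DOC

working_set = hot_data + index_bytes
print(f"hot data:    {hot_data / 10**12:.2f}TB")
print(f"indexes:     {index_bytes / 10**12:.2f}TB")
print(f"working set: {working_set / 10**12:.1f}TB")
```

Without the intermediate rounding the total comes to about 2.68TB, which rounds to the same 2.7TB.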

Application Read / Write Profile

Your application is going to be writing, updating, reading and deleting your MongoDB data.  Each of these activities is going to consume resource from the servers on which MongoDB is running.  Therefore, to ensure that the performance of MongoDB is going to be acceptable, we should really understand the nature of these actions.

How many reads?  How frequent?

Understanding how many reads, and what data you’ll be accessing as well as how frequently you’ll be reading is critical to ensure that your databases have enough memory to store these frequently accessed documents.

In many cases, knowing this will require estimation.  Here is where we’ll attempt to know the unknown.

In our example application, assume we’ll have an active user population of anywhere between 30% and 40% of the total number of users in our database.  Taking 35% of 1 billion users gives us 350,000,000 active users.  Let’s finish out the math.  With 350m users, MongoDB will be regularly accessing 350m user documents, each approximately 1k in size.  Additionally, each user will likely be accessing their service records.  So let’s assume each of the 350m users has 3 cars with at least 5 service records per car, and that each user accesses the system, causing the application to fetch their People document (350m in total) and all of their Service Record documents (3 cars * 5 service records, at 350 bytes each).

350m users * 3 cars * 5 service records * 350 bytes = 1,837,500,000,000 bytes or approximately 1.84TB
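
The read estimate can be sketched the same way.  The inputs below are the ones used in this section (350m active users, 3 cars per user, 5 service records per car); the names are illustrative.

```python
ACTIVE_USERS = 350_000_000
CARS_PER_USER = 3
SERVICE_RECORDS_PER_CAR = 5
PEOPLE_DOC_BYTES = 1024
SERVICE_DOC_BYTES = 350

# Bytes touched if every active user loads their own People document...
people_read_bytes = ACTIVE_USERS * PEOPLE_DOC_BYTES

# ...and every service record for every one of their cars.
service_read_bytes = (ACTIVE_USERS * CARS_PER_USER
                      * SERVICE_RECORDS_PER_CAR * SERVICE_DOC_BYTES)

print(f"People documents: {people_read_bytes / 10**12:.2f}TB")
print(f"Service records:  {service_read_bytes / 10**12:.2f}TB")
```

Changing the cars-per-user or records-per-car assumptions moves the service record figure linearly, so it is worth rerunning with your own low and high estimates.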

How many writes?

As important as reads are, so too is understanding how many writes, and what the size and frequency will be.  This will probably be the most important factor that will determine the disk IOPS rating you will need to support your use case.

If we continue our imaginary example, you can probably guess that the application, as I’ve described it will not provide a great deal of write workload.  People looking at their car inventories, and reviewing their service records doesn’t exactly sound like a high bandwidth, low-latency requirement.

However, it will be in our best interest to do the math to ensure our infrastructure can support our workload.

Let’s ask some questions.  Regardless of the actual details of your application, the questions are always the same.  What is the data?  How often will it change?  How does this change impact the total data stored?

In our example case the questions will be as follows:

  • How often will users be added?
    • 1m users per day
    • With 1m user additions, we’ll be looking at a daily incremental storage requirement of 1m * 1024bytes or 1GB.  This incremental value is likely negligible for most disk subsystems.
  • How often will service records be added?
    • 10m service updates per day
    • With 10m service updates, we’ll need to support a daily incremental storage requirement of 10m * 350bytes or 3.5GB.  Again – not monumental.

With both people and service records, we’re going to need to ensure that our infrastructure can support a write profile of at least 5GB per day.  The next logical question to ask is WHEN are these updates completed?  Based on what we know about our data and our application, the users will most likely come in at random periods – but let’s say we don’t want to make any assumptions and we want to understand what kind of load this will place on our disks.
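
To get a feel for what that daily volume means per second, here is a rough sketch.  The daily figures come from the list above; the 8-hour busy window is purely an assumption of mine to show how concentrating the writes raises the rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60

NEW_USERS_PER_DAY = 1_000_000
USER_DOC_BYTES = 1024
SERVICE_WRITES_PER_DAY = 10_000_000
SERVICE_DOC_BYTES = 350

daily_bytes = (NEW_USERS_PER_DAY * USER_DOC_BYTES
               + SERVICE_WRITES_PER_DAY * SERVICE_DOC_BYTES)
daily_writes = NEW_USERS_PER_DAY + SERVICE_WRITES_PER_DAY

# Average rates if the writes were spread evenly across the whole day.
avg_bytes_per_sec = daily_bytes / SECONDS_PER_DAY
avg_writes_per_sec = daily_writes / SECONDS_PER_DAY

# Assumption for illustration: all writes land within an 8-hour busy window.
BUSY_HOURS = 8
busy_writes_per_sec = daily_writes / (BUSY_HOURS * 60 * 60)

print(f"daily volume: {daily_bytes / 10**9:.2f}GB")
print(f"average: {avg_writes_per_sec:.0f} writes/sec, {avg_bytes_per_sec / 1000:.0f}KB/sec")
print(f"busy window: {busy_writes_per_sec:.0f} writes/sec")
```

The exact total is about 4.5GB per day, so the 5GB figure leaves a little headroom.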

We typically measure write performance in terms of IOPS – Input/Output Operations Per Second.  To understand how much data we’ll be able to move in terms of IOPS, consider the following:

IOPS * TransferSizeInBytes = BytesPerSec

Let’s take a look at what modern disk subsystems can accomplish in terms of IOPS.

  • HDDs: Small reads – 175 IOPS, Small writes – 280 IOPS
  • Flash SSDs: Small reads – 1075 IOPS (6x), Small writes – 21 IOPS (0.1x)
  • DRAM SSDs: Small reads – 4091 IOPS (23x), Small writes – 4184 IOPS (14x)

For this exercise, we’ll assume the fastest disks available for random, small write workloads: DRAM SSDs.
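
As a quick sketch of the IOPS formula, the snippet below converts the small-write figures listed above into bytes per second.  The 4096-byte transfer size is an assumption for illustration only; substitute whatever your storage and workload actually use.

```python
def bytes_per_sec(iops, transfer_size_bytes):
    """IOPS * TransferSizeInBytes = BytesPerSec."""
    return iops * transfer_size_bytes


TRANSFER_SIZE_BYTES = 4096  # assumed transfer size, not a measured value

# Small-write IOPS figures from the list above.
small_write_iops = {"HDD": 280, "Flash SSD": 21, "DRAM SSD": 4184}

for device, iops in small_write_iops.items():
    rate = bytes_per_sec(iops, TRANSFER_SIZE_BYTES)
    print(f"{device}: {iops} IOPS * {TRANSFER_SIZE_BYTES} bytes = {rate:,} bytes/sec")
```

Under that assumption the DRAM SSD figure works out to roughly 17MB per second of small random writes.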

To Shard or Not to Shard

In order for us to determine whether or not we will need to shard, or partition our database, we need to figure out whether or not we’ll be able to provision a server with enough RAM to support our working set.

Do you have servers with in excess of 2.7TB of RAM?  Probably not.  Then let’s take a look at sharding.

What is sharding?
Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput.

The most common goal of sharding is to store and manipulate a larger amount of data at a greater throughput than that which a single server can manage.  (You may also shard, or partition your data to accomplish data locality or residency using zone-based sharding… but we’re going to leave that for another article.)

To determine the total number of partitions we’ll roughly divide the total required data size from our working set by the amount of memory available in each server we’ll use for a partition.

If you’re fortunate enough to be ordering server hardware prior to deployment of your application, place your server order so that each server has the most RAM you can afford.  This will limit the number of shards and enable you to scale in the future should it be required.

For the sake of this exercise, let’s assume our standard server profile is equipped with 256GB of RAM.  In order to safely fit our working set into memory, we would want to partition the data in such a way that we created (2.7TB/256GB) or 11 partitions (rounded up, of course.)
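
The partition count is a ceiling division, sketched below with decimal units (1TB = 10^12 bytes, 1GB = 10^9 bytes) to match the figures above.  The `usable_fraction` parameter is a hypothetical knob I’ve added for the case where only part of each server’s RAM can hold the working set; the default of 1.0 reproduces the calculation in the text.

```python
import math

TB = 10**12
GB = 10**9


def shards_needed(working_set_bytes, ram_per_server_bytes, usable_fraction=1.0):
    """Number of partitions so that each one's share of the working set fits in RAM.

    usable_fraction is a hypothetical allowance for RAM that is not available
    to hold the working set (operating system, connections and so on).
    """
    usable_bytes = ram_per_server_bytes * usable_fraction
    return math.ceil(working_set_bytes / usable_bytes)


working_set = 27 * TB // 10  # 2.7TB
print(shards_needed(working_set, 256 * GB))                       # all RAM usable
print(shards_needed(working_set, 256 * GB, usable_fraction=0.5))  # half the RAM usable
```

With all of the RAM counted this gives 11 partitions; if only half of each server’s RAM were available for the working set, the same calculation gives 22.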

In future articles, we’ll discuss in further detail the process of determining exactly how to partition or shard your data.

Conclusion

In summary, we’ve answered the question, how do we go about sizing for a MongoDB deployment – or – how do I go about coming to know the unknowable?  We looked at the data, and the access patterns of that data.  We worked through an example and found that there are really no shortcuts – we must understand the data and how it will be manipulated and managed.

Lastly, we came to a conclusion – an educated guess about the number of servers, and the amount of RAM that will be required for each.  I want to stress that Sizing MongoDB is part art, and part science.  You can rarely, if ever, get all of the facts, so to bridge the path of uncertainty we use educated guesses and then we test… we search for empirical data to support our hypotheses and we test again.  You will do yourself a great disservice if you try to size your MongoDB deployment and you neglect this fact.  You must test your sizing predictions and adjust where you see deviations in the patterns associated with your application – or your test harness.

If you have a challenge or a project in front of you where you need to deploy server resource for a new MongoDB deployment, let me know.  Reach out via the contact page, or hit me up on LinkedIn and let me know how you’re making out with your project.