Get to Know Cassandra

Chris / March 12, 2013 in 

Last week we kicked off our Meet Cassandra series – with a short introduction to meet and greet. Today, we’ll begin getting to know her by immediately get hands on with data modeling. Download Cassandra 1.2.2 and follow the below screencast to get it running.

Data modeling with Cassandra (C*) is dramatically different than with RDBMS. With RDBMS, a normalized model is created without considering the exact queries it will service, as SQL can return just about anything through joins. With C*, the data model is designed for the specific queries it will service; as new queries are introduced, the schema is adjusted accordingly. C* has no concept of joins, relationships, or foreign keys. Instead, a separate table is typically leveraged per query, and data required by multiple tables is denormalized across those tables.

/wp-content/uploads/2013/03/CassandraTwitterPic.jpgLet’s consider a simple “Twitter-clone” application, focusing on two queries:

  1. What information does a user have in his/her profile?
  2. What tweets has a user posted?

Let’s make a model for these queries with Cassandra Query Language (CQL) from the cqlsh prompt where our screencast ended. While CQL mimics SQL, don’t let it lure you into thinking it has all the querying power of SQL – it truly contains just a subset. We need a Keyspace to house our tables. Create one with:

CREATE KEYSPACE demo WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
USE demo;

Don’t worry about the specifics of the above commands for now. Next, let’s create our tables:

CREATE TABLE users (
email varchar,
bio varchar,
birthday timestamp,
active boolean,
PRIMARY KEY (email)
);
CREATE TABLE tweets (
email varchar,
time_posted timestamp,
tweet varchar,
PRIMARY KEY (email, time_posted)
);

Our users table is keyed by user email, and has columns for bio, birthday, etc. We have specified a variety of data types for columns, (varchar, timestamp, …); these are but a subset all the supported types. The tweets table is keyed by both the email of the user who posted the tweet and the time the tweet was posted, and contains a column for the tweet message itself. Now let’s insert some data into the users table and read it back. Inserts are similar to SQL:

INSERT INTO users (email, bio, birthday, active) VALUES
('john.doe@bti360.com', 'BTI360 Teammate', 516513600000, true);

In case you’re wondering, timestamp fields are specified as milliseconds since epoch, hence the big numbers. Exit cqlsh with quit; then add in some sample data from this file with cqlsh --file <file>. You might see some errors regarding keyspaces/tables already created, ignore them. Re-enter cqlsh, and run SELECT * FROM users; and SELECT * FROM tweets; We should see something similar to:

USERS

emailactivebiobirthday

new.user@gmail.com

True

null

null

john.doe@bti360.com

True

BTI360 Teammate

1986-05-15 00:00:00-0400

bob@yahoo.com

False

Hi.

null

TWEETS

emailtime_postedtweet

new.user@gmail.com

2013-02-09 13:38:55-0500

What’s a tweet?

john.doe@bti360.com

2013-02-09 13:30:01-0500

VT FTW

john.doe@bti360.com

2013-02-09 13:43:48-0500

Go orange.

Our select statement for users show nulls where values were omitted for fields, and tweets shows new rows for each tweet for a user. In reality, cqlsh is lying to us. It is presenting the data in a Relational-esque view. Those nulls don’t actually exist, and neither do the extra rows.

Check out our next blog entry when we go behind the scenes and dive deeper into data modeling with Cassandra to see what sets her apart!

Previous

Have You Met Cassandra?

Next

Diving Deeper with Cassandra

Close Form

Enjoy our Blog?

Then stay up-to-date with our latest posts delivered right to your inbox.

  • This field is for validation purposes and should be left unchanged.

Or catch us on social media

Stay in Touch

Whether we’re honing our craft, hanging out with our team, or volunteering in the community, we invite you to keep tabs on us.

  • This field is for validation purposes and should be left unchanged.