The Cassandra data model can be difficult to understand at first: some terms resemble those used in the relational world but carry a different meaning here, while others are entirely new.
A keyspace is the container for tables in a Cassandra data model. A table is the container for an ordered collection of rows. Rows are made of a primary key plus an ordered set of columns, themselves made of name/value pairs.
There is no need to store a value for every column each time a new row is stored. Cassandra can hold wide rows with a very large number of columns (up to millions), and it can equally hold many rows with a smaller set of columns.
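The sparse nature of rows can be pictured with a small in-memory sketch. This is only a toy model, assuming a table is a mapping from primary key to the columns that were actually written; the names `insert_row` and `read_column` are hypothetical helpers, not part of any Cassandra API, and the sketch says nothing about how data is laid out on disk.

```python
# Toy model: a table maps a primary key to {column name: value}.
# Only the columns supplied at write time are stored for a row.
table = {}


def insert_row(table, primary_key, **columns):
    """Store (or update) only the columns that were supplied."""
    table.setdefault(primary_key, {}).update(columns)


def read_column(table, primary_key, column):
    """Return the stored value, or None if the column was never written."""
    return table.get(primary_key, {}).get(column)


insert_row(table, "alice", email="alice@example.com", city="Lyon")
insert_row(table, "bob", email="bob@example.com")  # no city supplied

print(read_column(table, "alice", "city"))  # Lyon
print(read_column(table, "bob", "city"))    # None
```

The row for "bob" simply has no entry for the missing column; nothing is reserved for it.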
The primary key is a composite made of a partition key plus an optional set of clustering columns. The partition key determines the nodes on which rows are stored, and it can consist of multiple columns. The clustering columns control how data is sorted within a partition. Cassandra also supports static columns, which store data that is not part of the primary key but is shared by every row in a partition.
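The roles of the partition key, clustering columns, and static columns can be sketched as follows. This is a simplified illustration under stated assumptions: the three node names are invented, the SHA-256 hash taken modulo the node count is a stand-in for a real partitioner and token ring, and `Partition`, `node_for`, and `insert` are hypothetical names.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key):
    """Map a partition key (a tuple of column values) to a node.

    Assumption: hash modulo node count stands in for a real partitioner.
    """
    digest = hashlib.sha256(repr(partition_key).encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class Partition:
    """All rows that share one partition key."""

    def __init__(self):
        self.static = {}  # static columns: one value shared by every row
        self.rows = []    # (clustering_key, columns), kept sorted

    def upsert(self, clustering_key, columns):
        # A row with the same clustering key is replaced, not duplicated.
        self.rows = [r for r in self.rows if r[0] != clustering_key]
        self.rows.append((clustering_key, columns))
        self.rows.sort(key=lambda r: r[0])  # clustering order


partitions = {}


def insert(partition_key, clustering_key, **columns):
    """Write a row and return the node that would own its partition."""
    part = partitions.setdefault(partition_key, Partition())
    part.upsert(clustering_key, columns)
    return node_for(partition_key)


# Composite partition key (sensor, day); clustering column = reading time.
key = ("sensor-1", "2024-05-01")
insert(key, "10:30", temperature=21.5)
insert(key, "09:15", temperature=19.0)
insert(key, "12:00", temperature=23.1)
partitions[key].static["location"] = "roof"

print([ck for ck, _ in partitions[key].rows])  # ['09:15', '10:30', '12:00']
```

Every row with the same partition key maps to the same node, and rows come back in clustering order regardless of the order in which they were written.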
When a column is created, a data type is defined to constrain the values stored in that column. Data types include character and numeric types, collections, and user-defined types. Three types of collections can be defined: sets, lists, and maps. A column value also carries two other attributes: a timestamp and a time-to-live. A timestamp is generated for a column value each time it is created or updated, and is used to resolve conflicting changes to the value. The time-to-live (TTL) indicates how long to keep the value.
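How timestamps and TTLs might interact can be sketched with a small model of a single column value. This is a deliberately simplified illustration: the `Cell` class and `reconcile` function are hypothetical, timestamps are taken to be microseconds and TTLs seconds, expiry is computed directly from the write timestamp, and ties between equal timestamps are resolved arbitrarily in favor of the first argument.

```python
class Cell:
    """One column value with its write timestamp and optional TTL."""

    def __init__(self, value, timestamp, ttl=None):
        self.value = value
        self.timestamp = timestamp  # microseconds
        self.ttl = ttl              # seconds; None means no expiry

    def is_live(self, now):
        """True while the value has not outlived its TTL (now in microseconds)."""
        if self.ttl is None:
            return True
        return now < self.timestamp + self.ttl * 1_000_000


def reconcile(a, b):
    """Resolve two conflicting versions: the higher timestamp wins."""
    return a if a.timestamp >= b.timestamp else b


old = Cell("alice@old.example", timestamp=1_000_000)
new = Cell("alice@new.example", timestamp=2_000_000)
print(reconcile(old, new).value)  # alice@new.example

session = Cell("token", timestamp=5_000_000, ttl=60)
print(session.is_live(now=10_000_000))  # True  (5 s after the write)
print(session.is_live(now=70_000_000))  # False (65 s after the write)
```

The order in which the two versions arrive does not matter; only their timestamps decide which value survives.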
A secondary index is an index on a column that is not part of the primary key. Since Cassandra partitions data across multiple nodes, each node must maintain its own secondary index covering only the rows it stores. Therefore, secondary indexes are not recommended on columns with high cardinality or very low cardinality, or on columns that are frequently updated or deleted.
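The consequence of node-local indexes can be sketched with a toy cluster. This is an illustration under assumptions, not Cassandra's implementation: the `Node` class, the CRC32-based placement in `owner`, and the `country` column are all invented for the example. It shows that a lookup by indexed value, without a partition key, has to consult the local index of every node.

```python
import zlib
from collections import defaultdict


class Node:
    """A node holding some rows plus a local index over one column."""

    def __init__(self, name):
        self.name = name
        self.rows = {}                 # primary key -> columns
        self.index = defaultdict(set)  # indexed value -> local primary keys

    def insert(self, key, columns, indexed_column):
        old = self.rows.get(key)
        if old is not None and indexed_column in old:
            # Every update of an indexed value also costs an index update.
            self.index[old[indexed_column]].discard(key)
        self.rows[key] = columns
        if indexed_column in columns:
            self.index[columns[indexed_column]].add(key)


nodes = [Node("node-a"), Node("node-b"), Node("node-c")]


def owner(key):
    """Assumed placement: CRC32 of the key modulo the node count."""
    return nodes[zlib.crc32(key.encode("utf-8")) % len(nodes)]


def insert(key, **columns):
    owner(key).insert(key, columns, indexed_column="country")


def find_by_country(country):
    """Query by indexed value: each node's local index is consulted."""
    hits, contacted = [], 0
    for node in nodes:
        contacted += 1
        hits.extend(node.index.get(country, ()))
    return sorted(hits), contacted


insert("u1", name="Alice", country="FR")
insert("u2", name="Bob", country="US")
insert("u3", name="Chloe", country="FR")

print(find_by_country("FR"))  # (['u1', 'u3'], 3)
```

In this model a single lookup touches all three nodes, and each change to an indexed value triggers extra index maintenance, which gives some intuition for the cardinality and update-frequency caveats.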
Joins cannot be performed at the database level. If a join is needed, it must either be performed at the application level or, preferably, be avoided by adapting the data model to include a denormalized table that represents the join result.
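The two options can be contrasted in a short sketch. The tables are modeled as plain Python structures, and the names `users`, `orders`, `orders_by_user`, `join_in_application`, and `record_order` are hypothetical; the point is only to show where the work happens in each approach.

```python
# Two "tables" that would be joined in a relational database.
users = {"u1": {"name": "Alice"}, "u2": {"name": "Bob"}}
orders = [
    {"order_id": "o1", "user_id": "u1", "total": 30},
    {"order_id": "o2", "user_id": "u2", "total": 12},
    {"order_id": "o3", "user_id": "u1", "total": 7},
]


def join_in_application(user_id):
    """Option 1: read both tables and combine the results in the client."""
    user = users[user_id]
    return [
        {"name": user["name"], **order}
        for order in orders
        if order["user_id"] == user_id
    ]


# Option 2: a denormalized table keyed by user, filled at write time.
orders_by_user = {}


def record_order(order):
    """Duplicate the user's name into the order row when it is written."""
    row = {"name": users[order["user_id"]]["name"], **order}
    orders_by_user.setdefault(order["user_id"], []).append(row)


for order in orders:
    record_order(order)

# Reading the denormalized table needs a single lookup, no join.
print(orders_by_user["u1"] == join_in_application("u1"))  # True
```

The denormalized table trades extra writes and duplicated data for a read that needs only one lookup by key.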