Earthquake – Facebook makes its own datacenter open-source!!!

When you run slowly, you never know how the giants will change the world!
Facebook may change the game again; see the Chinese version of the news.

The original news is on the Facebook site: Open Compute Resources.
Facebook's Open Compute site!!!

Catch the wave!!! The most interesting topic!!!


Cassandra Client

  1. For a quick introduction to the Cassandra data model: Cassandra Data Model of Max Version
  2. CLI API exercise. Note 1 – End every command or statement you enter into the CLI with a semicolon before hitting the return key. If you forget, the CLI echoes an ellipsis ( . . . ), which indicates that it expects more input, such as a semicolon, or names and values in other cases.
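The semicolon rule in Note 1 can be pictured with a small Python sketch. This is a toy reader, not the real cassandra-cli code, and the function name read_statements is made up for illustration. It simply buffers lines until one ends with a semicolon.

```python
def read_statements(lines):
    """Group input lines into statements terminated by ';'.

    Toy model of the CLI behaviour described above: input is buffered
    until a line ends with a semicolon. Anything left unterminated is
    returned separately as 'pending'.
    """
    statements = []
    buffer = []
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        buffer.append(stripped)
        if stripped.endswith(";"):
            statements.append(" ".join(buffer))
            buffer = []
    pending = " ".join(buffer)
    return statements, pending


typed = [
    "create column family users with comparator = UTF8Type",
    "and column_metadata = [{column_name: password, validation_class:",
    "UTF8Type}];",
    "use twissandra",
]
done, pending = read_statements(typed)
print(len(done))  # number of complete statements
print(pending)    # input still waiting for its semicolon
```

The last line has no semicolon, so it stays pending, which is the situation where the CLI shows the ellipsis prompt.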

Connect to “Test Cluster”

$ ./cassandra-cli -host localhost -port 9160

Create KeySpace

[default@unknown] create keyspace twissandra with replication_factor=1 and placement_strategy='org.apache.cassandra.locator.SimpleStrategy';
More on keyspaces can be found at the following link – Cluster about Key Spacing

Creating a Column Family

[default@unknown] use twissandra;
Authenticated to keyspace: twissandra
[default@twissandra] create column family users with comparator = UTF8Type
...     and column_metadata = [{column_name: password, validation_class:
...     UTF8Type}];
[cc792bbc-5ecb-11e0-9ff3-f038a2a19b21] <= output schema of users

Similar commands to create the column families for Twissandra tweets, friends, followers, userline and timeline would look like the following:

[default@twissandra] create column family tweets with comparator = UTF8Type and column_metadata = [{column_name: body, validation_class:UTF8Type}, {column_name: username, validation_class: UTF8Type}];
[5385025d-5ecc-11e0-9ff3-f038a2a19b21] <= output schema of tweets

[default@twissandra] create column family friends with comparator = UTF8Type;
[61c0f5ee-5ecc-11e0-9ff3-f038a2a19b21]<= output schema of friends

[default@twissandra] create column family followers with comparator = UTF8Type;
[6d62ddaf-5ecc-11e0-9ff3-f038a2a19b21]<= output schema of followers

[default@twissandra] create column family userline with comparator = LongType and default_validation_class = TimeUUIDType;
[78bac420-5ecc-11e0-9ff3-f038a2a19b21]<=output schema of userline

[default@twissandra] create column family timeline with comparator = LongType and default_validation_class = TimeUUIDType;
[8504d2c1-5ecc-11e0-9ff3-f038a2a19b21]<=output schema of timeline
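The comparator chosen above decides how column names are sorted inside a row. Here is a rough sketch in plain Python (not Cassandra code; the helper names sort_utf8 and sort_long are invented) of the difference between a text comparator such as UTF8Type and a numeric one such as LongType. Presumably userline and timeline use LongType so that numerically named columns come back in numeric order.

```python
def sort_utf8(names):
    """Order column names the way a text comparator would (by UTF-8 bytes)."""
    return sorted(names, key=lambda n: n.encode("utf-8"))


def sort_long(names):
    """Order column names numerically, as a long-integer comparator would."""
    return sorted(names, key=int)


column_names = ["9", "10", "100", "2"]
print(sort_utf8(column_names))  # text order: ['10', '100', '2', '9']
print(sort_long(column_names))  # numeric order: ['2', '9', '10', '100']
```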

Inserting and Retrieving Columns

[default@twissandra] set users['jsmith']['password']='ch@ngem3';
Value inserted.
[default@twissandra] get users['jsmith'];
=> (column=password, value=ch@ngem3, timestamp=1301929424925000)
Returned 1 results.
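To picture what the set and get commands above do, here is a toy in-memory model in Python. It is only a sketch under my own assumptions, not the Cassandra storage engine: the class ToyColumnFamily is made up, and it keeps each column as a (value, timestamp) pair, letting the write with the newer timestamp win.

```python
import time


class ToyColumnFamily:
    """In-memory stand-in for a column family: row key -> {column: (value, ts)}."""

    def __init__(self):
        self.rows = {}

    def set(self, key, column, value, timestamp=None):
        """Store a column unless an existing value has a newer timestamp."""
        if timestamp is None:
            # Microseconds since the epoch, the unit seen in the CLI output
            timestamp = int(time.time() * 1000000)
        row = self.rows.setdefault(key, {})
        current = row.get(column)
        if current is None or timestamp >= current[1]:
            row[column] = (value, timestamp)

    def get(self, key):
        """Return all columns of a row as {column: (value, timestamp)}."""
        return dict(self.rows.get(key, {}))


users = ToyColumnFamily()
users.set("jsmith", "password", "ch@ngem3", timestamp=1301929424925000)
# A write carrying an older timestamp does not replace the newer value
users.set("jsmith", "password", "stale", timestamp=1301929424000000)
print(users.get("jsmith"))
```

This is why every column in the CLI output carries a timestamp next to its value.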

Note 2 – For all CLI write and read operations such as these example commands, the consistency level is ONE. Different consistency levels are not available with the CLI, though all levels are available when writing/reading programmatically.
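Why the other consistency levels matter can be shown with a bit of arithmetic. The sketch below is plain Python with made-up function names, covering only the levels ONE, QUORUM and ALL. It applies the usual overlap rule: a read is guaranteed to see the latest write when the replicas read plus the replicas written exceed the replication factor.

```python
def quorum(replication_factor):
    """Number of replicas that make up a QUORUM."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Replicas that must respond for a given consistency level name."""
    required = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return required[level]


def read_sees_latest_write(write_level, read_level, replication_factor):
    """True when the read and write replica sets must overlap (R + W > N)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return r + w > replication_factor


print(read_sees_latest_write("ONE", "ONE", 1))        # True: only one replica exists
print(read_sees_latest_write("ONE", "ONE", 3))        # False
print(read_sees_latest_write("QUORUM", "QUORUM", 3))  # True
```

With replication_factor=1, as in the twissandra keyspace above, level ONE is already enough.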

Cassandra Cluster

1, Installing Cassandra on a Windows host

The configuration location is read from the environment variable CASSANDRA_HOME, so on either Linux or Windows, set CASSANDRA_HOME in the environment of the running process accordingly.
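As a small illustration of that lookup, here is a Python sketch that resolves a configuration folder from CASSANDRA_HOME. It assumes the configuration lives in a conf subfolder of CASSANDRA_HOME. The function name and the /opt/cassandra path are hypothetical.

```python
import os


def cassandra_conf_dir(environ=None):
    """Return the conf folder under CASSANDRA_HOME, or raise if it is unset."""
    environ = os.environ if environ is None else environ
    home = environ.get("CASSANDRA_HOME")
    if not home:
        raise RuntimeError("CASSANDRA_HOME is not set")
    # Assumption: configuration files sit in <CASSANDRA_HOME>/conf
    return os.path.join(home, "conf")


# Hypothetical install location, passed in explicitly for the demo
print(cassandra_conf_dir({"CASSANDRA_HOME": "/opt/cassandra"}))
```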

2, Cluster Configuration

* token calculation for newly added nodes
python version [in python 3.2]

def tokens(nodes):
    # Evenly spaced tokens over the 0..2**127 ring; // keeps exact integers in Python 3
    for x in range(nodes):
        print(2**127 // nodes * x)

Java version – BigInteger [stored in virtual ubuntu system]

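import java.math.BigInteger;

public class PythonCal {
    public static void main(String[] args) {
        int iCodes = Integer.parseInt(args[0]);
        // Token spacing: 2^127 divided evenly among the nodes
        BigInteger t = BigInteger.valueOf(2).pow(127).divide(BigInteger.valueOf(iCodes));
        for (int i = 0; i < iCodes; i++) {
            // Same result as the python version: spacing * node index
            System.out.println(t.multiply(BigInteger.valueOf(i)));
        }
    }
}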

* seeds – IP address of the initial token server
* listen_address and rpc_address will be filled with

3, load balance – $CASSANDRA_HOME/bin/nodetool -h -p 8080 ring
Current status

Starting NodeTool
Address  Status  State   Load      Owns    Token
                                           122378085476254521922132626725931733214
         Up      Normal  10.88 KB  28.07%  0
         Up      Normal  6.54 KB   71.93%  122378085476254521922132626725931733214

Next two steps:
1, Load balancing for the cluster – Cluster operations
Refer to Ban,Black’s presentation –

The final solution is based on the development wiki – Cluster Maintenance
* $CASSANDRA_HOME\bin\nodetool -h -p
* Autobootstrap & automatic token assignment
* $CASSANDRA_HOME\bin\nodetool -h -p loadbalance [ -> result as follows ]

Starting NodeTool
Address  Status  State   Load      Owns    Token
                                           122378085476254521922132626725931733214
         Up      Normal  15.27 KB  50.00%  37307493746019906056288974867989680350
         Up      Normal  10.9 KB   50.00%  122378085476254521922132626725931733214

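Pay attention to the token of the node whose initial_token was 0: it is now 37307493746019906056288974867989680350, and each node owns 50.00% of the ring.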
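The Owns column in the two ring listings can be re-derived from the tokens alone. The Python sketch below is my own helper, not part of nodetool. It assumes the same 2**127 token space as the token calculation earlier, and that each node owns the range from the previous token up to its own token, wrapping around the ring.

```python
RING_SIZE = 2 ** 127  # token space, as in the token calculation earlier


def ownership(tokens):
    """Map each token to the percentage of the ring its node owns.

    A node owns the range from the previous token (exclusive) up to its
    own token (inclusive), wrapping around the end of the ring.
    """
    ordered = sorted(tokens)
    shares = {}
    for i, token in enumerate(ordered):
        previous = ordered[i - 1]  # for i == 0 this wraps to the last token
        # A span of 0 only happens with a single node, which owns the whole ring
        span = (token - previous) % RING_SIZE or RING_SIZE
        shares[token] = 100 * span / RING_SIZE
    return shares


before = ownership([0, 122378085476254521922132626725931733214])
after = ownership([37307493746019906056288974867989680350,
                   122378085476254521922132626725931733214])
for token, percent in list(before.items()) + list(after.items()):
    print("%6.2f%%  %d" % (percent, token))
```

The first pair comes out at roughly 28.07% and 71.93%, the second at 50% each, which matches both nodetool listings.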

* $CASSANDRA_HOME\bin\nodetool -h -p netstat [ -> previous : streams – in/out stream for ring ]

Starting NodeTool
Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name Active Pending Completed
Commands n/a 0 1
Responses n/a 0 2476

Regarding the ring’s maintenance, refer to the section – Bootstrap & node token in ring

2, Next Steps: Cassandra Clients
* Java: Hector Client API
Hector provides Java developers with features lacking in Thrift, including connection pooling, JMX integration, failover and extensive logging.
* Python: Pycassa Client API
Pycassa is a Python client API with features such as connection pooling, SuperColumn support, and a method to map existing classes to Cassandra column families.
* PHP: Phpcassa Client API
Phpcassa is a PHP client API with features such as connection pooling, a method for counting rows, and support for secondary indexes.

mongoDB 1.8.0 via git source

For comparison with Cassandra: mongoDB is a NoSQL database built on JSON-style document storage. The main mongoDB release is implemented in C++. Feature list:
1, Document-oriented storage » JSON-style documents with dynamic schemas
Schema design runs on top of the BSON format.
2, Full Index Support » Indexes enhance query performance, often dramatically.
3, Replication & High Availability » Master-Slave Replication & Replica Sets
4, Auto-Sharding » scales horizontally via an auto-sharding architecture.
5, Querying » fast queries based on Query Expression Objects, with drivers for many languages – Java, Perl, PHP, Python, Ruby, C# etc
6, Fast In-Place Updates » MongoDB supports atomic, in-place updates as well as more traditional updates for replacing an entire document.
7, Map/Reduce » batch processing of data and aggregation operations.
8, GridFS » specification for storing large files in MongoDB.
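Two items in the list above, dynamic schemas and Map/Reduce, are easy to picture with plain Python dicts. This is only an illustration and does not use the MongoDB driver. The sample documents and the helper names map_tags and reduce_counts are invented.

```python
from collections import defaultdict

# One "collection" holding documents with different fields (dynamic schema)
posts = [
    {"author": "alice", "title": "Cassandra notes", "tags": ["nosql", "cassandra"]},
    {"author": "bob", "title": "mongoDB build", "tags": ["nosql", "mongodb"], "votes": 3},
    {"author": "alice", "title": "Token ring"},  # no tags field at all
]


def map_tags(doc):
    """Map step: emit a (tag, 1) pair for every tag in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


emitted = [pair for doc in posts for pair in map_tags(doc)]
tag_counts = reduce_counts(emitted)
print(tag_counts)
```

No schema had to be declared up front: the third document simply has no tags and contributes nothing to the count.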

As a very simple start, I checked out the mongoDB source via Git into my virtual ubuntu 10.10 system [/home/app/08_mongoDB], using the 1.8.0 stable version.
1, The build reference can be found at the following link: Building for Linux
2, Before you try to start mongoDB, please check Quickstart with Unix.
Unlike a traditional database installation, the main db file location must be created manually, with access rights for the Linux user that runs mongoDB.
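The manual step of creating the db file location could be scripted roughly as follows. This Python sketch only creates a folder and checks that the current user can write to it. The function name is made up, and the demo uses a temporary folder rather than a real data path, which you would choose yourself.

```python
import os
import tempfile


def ensure_data_dir(path):
    """Create the data folder if needed and confirm the current user can write to it."""
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError("no write access to %s" % path)
    return path


# Demonstrated on a temporary location; a real setup would pass the chosen db path
demo_path = os.path.join(tempfile.mkdtemp(), "data", "db")
print(ensure_data_dir(demo_path))
```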

In principle, as the source code is available, /home/app/08_mongoDB/mongo/version_20110320.txt can be considered the version check-up documentation and a good example of a distributed database implementation.

Take care! I prefer Cassandra as my main open source example. mongoDB will be a good reference when it comes to the topic of how multiple languages are supported.

Cassandra Development – NoSQL Database

Cassandra 0.7.3 Wiki
Key features
1, Scalable – capacity can be scaled without downtime; better agility thanks to the schema-less data model
2, Reliable – P2P cluster nodes, with support for cross-data center replication
3, Durable – configurable write behaviour and fsync; once a write has completed, no data is lost even on hardware failure
4, Analytics without ETL – hadoop jobs
5, Performant – tunable consistency levels, multiple cache tuning options

Start and Stop Cassandra
1, Start Cassandra via command line
sh bin/cassandra -f [-p pidfile]
-p records the process id in the given pidfile under the current $CASSANDRA_HOME folder; run kill $(cat pidfile) to stop the Cassandra server
RunningCassandra Run Wiki Configuration
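The kill $(cat pidfile) step can also be done from a script. Below is a minimal Python sketch for a POSIX system. The function name stop_from_pidfile is my own, and the demo stops a throwaway sleeping process rather than a real Cassandra server.

```python
import os
import signal
import subprocess
import sys
import tempfile


def stop_from_pidfile(pidfile):
    """Read a process id from pidfile and send it SIGTERM, like kill $(cat pidfile)."""
    with open(pidfile) as handle:
        pid = int(handle.read().strip())
    os.kill(pid, signal.SIGTERM)
    return pid


# Demo: a sleeping child process stands in for the server
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
with tempfile.NamedTemporaryFile("w", suffix=".pid", delete=False) as handle:
    handle.write(str(child.pid))
stopped_pid = stop_from_pidfile(handle.name)
child.wait(timeout=10)
print(stopped_pid == child.pid)
```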

2, Mx4j library couldn’t be found in classpath
The classpath here is $CASSANDRA_HOME\lib; you can download the MX4J library into that folder.
JMX is a Java API for service monitoring and administration.

Cassandra Installation and Overview

1, Cassandra official site – a branch of the Apache open source site
* Download the 0.7.3 source code onto an ubuntu linux system [another system is also OK, as Cassandra is a java-based application].
The official getting-started wiki link – Getting Started.

* Installation steps
JVM preparation – JDK 1.6.0_24 has been installed
ANT preparation – Apache Ant(TM) version 1.8.2 compiled on December 20 2010 has been installed [just the binary version, in order to simplify installation 🙂 ]

  • /etc/profile is modified as follows –

* Unzip apache-cassandra-0.7.3-src.tar.gz into your target Cassandra folder and run ant, which loads the build.xml file
* The Cassandra source code will be compiled and output into the build folder
Tips: build.xml will download the required libraries into [your source location]\build\lib if they aren’t available. The source code will be compiled into bytecode in [your source location]\build\class.

Next steps for installation:
* Cassandra startup/stop
* git preparation for further usage

2, Overview of Cassandra development infrastructure
Index of the development center – developer guideline
On the official Cassandra site, please check the video/slides link – cassandra-explained
More presentations on Cassandra – Eric Evans.
More articles on Cassandra – research presentations

3, Wiki introduction of Cassandra and Eric Evans
Chinese introduction of Cassandra
Eric Evans works as a Software Developer at the Apache Software Foundation. His roles include: Debian Developer, Apache Cassandra committer, and System Architect at The Rackspace Cloud.

Start with Cassandra

Recently I got in touch with the SAP In-Memory Database, playing with SQLScript, R, L and the JDBC driver.
Of course, the most exhausting experience was building up the HANA environment.

Though SAP provides sufficient detail in the BluePrint, I’m still wondering how to work this out with actual coding. You know, that’s the “bad” habit of a programmer….

I finally decided to do something after going through the CSDN link – Megastore of Google.

As the most popular distributed NoSQL database inherited from Google’s BigTable, the open source project Cassandra will be a very good entry point for learning the how-to of NoSQL (mixed-mode) databases. I guess, to some degree, SAP also learnt something from such ideas.

The “distributed database” section of the In-Memory database material mentions this a little.

Anyway, the journey starts…