Saturday 17 September 2016

BIG DATA ADVANCED ANALYTICS : ELASTICSEARCH FOR HADOOP

This post discusses Hadoop and Elasticsearch.
Let me commence with a brief introduction to Hadoop and Elasticsearch (as many beginners may not know about Elasticsearch).

Elasticsearch :
Elasticsearch is a great tool for document indexing and powerful full-text search. Its JSON-based domain-specific query language (DSL) is simple and powerful (a minimal query example follows the list below). Elastic's ELK analytics stack is gaining momentum in web analytics use cases for these reasons:
  • It is very easy to get a toy instance of Elasticsearch running with a small sample dataset.
  • Application developers are more comfortable maintaining a second Elasticsearch instance than a completely new technology stack like Hadoop.
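To make the DSL concrete, here is a minimal sketch in Java that sends a match query to a local Elasticsearch instance over its REST API, using only the JDK's HTTP support. The index name "logs" and the field "message" are hypothetical placeholders, and the instance is assumed to be listening on localhost:9200.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class SimpleEsQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical index "logs"; assumes Elasticsearch runs on localhost:9200
        URL url = new URL("http://localhost:9200/logs/_search");
        String queryDsl = "{ \"query\": { \"match\": { \"message\": \"error\" } } }";

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(queryDsl.getBytes(StandardCharsets.UTF_8));
        }

        // Print the raw JSON response containing the matching documents
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}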

What about Hadoop?
HDFS separates data from state in its node architecture, using one over-arching master node (the NameNode) that manages state for the entire cluster and several daughter nodes (DataNodes) that store only data. These data nodes execute commands from their master node, which logs all operations in an edit log. This allows a replica master to quickly recreate the state of the system without needing to talk to another master node during failover. This makes the system extremely fault tolerant and prevents the split-brain scenario that causes data loss amongst masters that must communicate with each other to restore state.
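To see the client's view of this architecture, here is a small, hedged sketch using the HDFS Java API: the NameNode resolves the file's metadata while the actual bytes are streamed from the DataNodes. The path /data/sample.txt is hypothetical, and the configuration is assumed to point at your cluster (for example via core-site.xml).

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsRead {
    public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS points at the cluster's NameNode
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file path; the NameNode maps it to blocks on the data nodes
        Path path = new Path("/data/sample.txt");
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(path)))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}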

ELASTICSEARCH FOR HADOOP: 

Implementing a Hadoop instance as the backbone of an analytics system has a steep learning curve, but it's well worth the effort. In the end, you'll be much better off for its rock-solid data ingestion and broad compatibility with a number of third-party analytics tools, including Elasticsearch. There are several advantages when it comes to running Elasticsearch on Hadoop, namely:

  • Speedy Search with Big Data Analytics.
  • Seamlessly Move Data between Elasticsearch and Hadoop.
  • Visualize HDFS Data in Real-Time with Kibana.
  • Sub-second Search Queries and Analytics on Hadoop Data.
  • Enhanced Security, Including Basic HTTP Authentication.
  • Works with Any Flavor of Hadoop Distribution.         
Hadoop also has a broad ecosystem of tools that support bulk uploading and ingestion of data, along with SQL engines to support the full querying power you expect from a standard database. On the other hand, it can be argued that standing up Hadoop, Zookeeper, and a Kafka ingestion agent requires as much domain specific knowledge as Elasticsearch. Thus, the raw power and stability of Hadoop comes at the price of heavy setup and maintenance costs.
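One of the points above is moving data seamlessly between Elasticsearch and Hadoop; the elasticsearch-hadoop connector does this through a handful of job settings. The sketch below shows a MapReduce (mapred API) job configuration based on my recollection of the connector's documented keys; the index/type "weblogs/entries" is a made-up example, so check the class and property names against your es-hadoop version.

import org.apache.hadoop.mapred.JobConf;
// EsOutputFormat ships with the elasticsearch-hadoop connector
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class EsHadoopJobConfig {
    public static JobConf configure() {
        JobConf conf = new JobConf();
        // Where the Elasticsearch nodes live (assumption: a local single-node cluster)
        conf.set("es.nodes", "localhost:9200");
        // Target index/type that the job's output documents are written to
        conf.set("es.resource", "weblogs/entries");
        // Emit documents straight from the job's output records
        conf.setOutputFormat(EsOutputFormat.class);
        // Speculative execution is usually disabled when writing to an external system
        conf.setSpeculativeExecution(false);
        return conf;
    }
}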

Wednesday 24 August 2016

Evolutionary algorithm to tackle big-data clustering (in a nutshell)

Introduction:


Evolutionary algorithms belong to the field of evolutionary computation, which is concerned with computational methods inspired by the process and mechanisms of biological evolution. The process of evolution by means of natural selection (descent with modification) was proposed by Darwin. Evolutionary algorithms investigate computational systems that resemble simplified versions of the processes and mechanisms of evolution, with the aim of achieving the effects of those processes and mechanisms, namely the development of adaptive systems. I will provide a brief introduction to genetic algorithms and their application to the big data clustering problem.


Genetic algorithm: 

Genetic Algorithms (GAs) are adaptive heuristic search algorithms based on the evolutionary ideas of natural selection and genetics. The following steps are involved in a genetic algorithm (a minimal Java sketch follows the list):




  • Initialization: the initial population typically contains several hundred or thousands of possible solutions. Often it is generated randomly, allowing the entire range of possible solutions (the search space) to be covered. A good practice is to seed the population with known near-optimal solutions.
  • Selection: a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected.
  • Genetic operators: the next step is to generate a second-generation population of solutions from those selected, through a combination of genetic operators: crossover (also called recombination) and mutation. By producing a "child" solution using crossover and mutation, a new solution is created that typically shares many of the characteristics of its "parents".
  • Termination: these steps are repeated until a solution is found that satisfies minimum criteria or a fixed number of generations is reached.
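Here is the minimal Java sketch promised above, wiring the four steps together on a toy problem (maximising the number of 1-bits in a chromosome). The population size, chromosome encoding, fitness function and operator choices are all placeholders that would be problem specific in practice.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SimpleGeneticAlgorithm {
    static final int POP_SIZE = 100, CHROMOSOME_LEN = 32, GENERATIONS = 200;
    static final double MUTATION_RATE = 0.01;
    static final Random RNG = new Random();

    public static void main(String[] args) {
        // Initialization: a random population of candidate solutions
        List<boolean[]> population = new ArrayList<>();
        for (int i = 0; i < POP_SIZE; i++) {
            boolean[] chromosome = new boolean[CHROMOSOME_LEN];
            for (int g = 0; g < CHROMOSOME_LEN; g++) chromosome[g] = RNG.nextBoolean();
            population.add(chromosome);
        }

        // Termination: stop after a fixed number of generations
        for (int gen = 0; gen < GENERATIONS; gen++) {
            List<boolean[]> next = new ArrayList<>();
            while (next.size() < POP_SIZE) {
                boolean[] p1 = select(population);   // Selection
                boolean[] p2 = select(population);
                next.add(mutate(crossover(p1, p2))); // Genetic operators
            }
            population = next;
        }

        int best = 0;
        for (boolean[] c : population) best = Math.max(best, fitness(c));
        System.out.println("best fitness = " + best);
    }

    // Fitness: count of 1-bits (a stand-in for a real objective such as cluster quality)
    static int fitness(boolean[] c) {
        int f = 0;
        for (boolean b : c) if (b) f++;
        return f;
    }

    // Tournament selection of size 2: the fitter of two random individuals wins
    static boolean[] select(List<boolean[]> pop) {
        boolean[] a = pop.get(RNG.nextInt(pop.size()));
        boolean[] b = pop.get(RNG.nextInt(pop.size()));
        return fitness(a) >= fitness(b) ? a : b;
    }

    // Single-point crossover of two parents
    static boolean[] crossover(boolean[] p1, boolean[] p2) {
        int cut = RNG.nextInt(CHROMOSOME_LEN);
        boolean[] child = new boolean[CHROMOSOME_LEN];
        for (int i = 0; i < CHROMOSOME_LEN; i++) child[i] = i < cut ? p1[i] : p2[i];
        return child;
    }

    // Bit-flip mutation with a small per-gene probability
    static boolean[] mutate(boolean[] c) {
        for (int i = 0; i < c.length; i++) if (RNG.nextDouble() < MUTATION_RATE) c[i] = !c[i];
        return c;
    }
}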

Employing the genetic algorithm on a big data problem:

Clustering is one of the important data mining problems, especially for big data analysis.
The goal of data clustering is to organize a set of n objects into k clusters such that objects in the same cluster are more similar to each other than to objects in different clusters. Clustering is one of the most popular tools for data exploration and data organization and has been widely used in almost every scientific discipline that collects data. The data clustering problem raises two main issues: (i) how to define pairwise similarity between objects, and (ii) how to efficiently cluster hundreds of millions of objects. Usually k-means clustering is used, but it has drawbacks when it comes to big data clustering. Here we will see how genetic algorithms may come in handy for big data clustering.
  • Initialization: for evaluating fitness you can use the Davies-Bouldin index (https://en.wikipedia.org/wiki/Davies%E2%80%93Bouldin_index); a sketch of this index follows the list. After determining the fitness value of each candidate solution, the viable solutions are collected into the initial population, keeping those with the best fitness.
  • Selection: for selection you can use the tournament selection procedure (https://en.wikipedia.org/wiki/Tournament_selection).
  • Genetic operators: crossover can be implemented with the help of red-black trees (https://en.wikipedia.org/wiki/Red%E2%80%93black_tree), which are heavily used in algorithm design for optimization and for creating efficient buckets for data storage. Crossover between disjoint sets can then reveal hidden relationships between them, so crossover is accomplished by recombining disjoint sets.
  • Termination: the newly generated population replaces the older population. This population in turn forms a newer population using the mating and selection procedures. The whole procedure is repeated until the termination condition is met.
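Below is the sketch of the Davies-Bouldin index mentioned in the initialization step, written as plain Java so it can serve as a GA fitness measure; a lower value indicates more compact, better-separated clusters, so the GA would minimise it (or maximise, say, 1 / (1 + DB)). The array layout (points, labels, k) is an assumption made for illustration.

public class DaviesBouldin {

    public static double index(double[][] points, int[] labels, int k) {
        int d = points[0].length;
        double[][] centroids = new double[k][d];
        int[] sizes = new int[k];

        // Centroid of each cluster
        for (int i = 0; i < points.length; i++) {
            sizes[labels[i]]++;
            for (int j = 0; j < d; j++) centroids[labels[i]][j] += points[i][j];
        }
        for (int c = 0; c < k; c++)
            for (int j = 0; j < d; j++) centroids[c][j] /= sizes[c];

        // Scatter: average distance of a cluster's points to its centroid
        double[] scatter = new double[k];
        for (int i = 0; i < points.length; i++)
            scatter[labels[i]] += distance(points[i], centroids[labels[i]]);
        for (int c = 0; c < k; c++) scatter[c] /= sizes[c];

        // DB index: mean over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio
        double db = 0.0;
        for (int i = 0; i < k; i++) {
            double worst = 0.0;
            for (int j = 0; j < k; j++) {
                if (i == j) continue;
                double ratio = (scatter[i] + scatter[j]) / distance(centroids[i], centroids[j]);
                worst = Math.max(worst, ratio);
            }
            db += worst;
        }
        return db / k;
    }

    // Euclidean distance between two points
    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }
}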
The AnyScale Learning For All (ALFA) Group at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) aims to solve the most challenging big-data problems: questions that go beyond the scope of typical analytics. ALFA applies the latest machine learning and evolutionary computing concepts to target very complex problems that involve high dimensionality. http://news.mit.edu/2015/una-may-oreilly-evolutionary-approaches-big-data-problems-0114
  

Sunday 10 July 2016

NoSQL : Essentials and Short tutorial on Mongodb - Part 2

In this post, I am going to present a short and precise tutorial on Mongodb, its important commands and its advanced features.

Documents and Collections 

Unlike an RDBMS, Mongodb has no rows, columns, tables or joins. Instead, Mongodb employs documents and collections. Documents are name/value pairs and can store arrays of values; to get a clear picture, think of documents as rows and collections as tables when compared to a traditional database. Collections store documents. Now let me create a document:
vehicle = {
    name: "uden",
    type: "four-wheeler",
    company: "ferrai"
}
The example below shows how to create the same document using the MongoDB Java driver:
// Uses DBObject/BasicDBObject from the MongoDB Java driver
List<String> colours = Arrays.asList("yellow", "red");   // documents can store arrays of values
DBObject vehicle = new BasicDBObject("name", "uden")
        .append("type", "four-wheeler")
        .append("company", "ferrai")
        .append("colours", colours);
Documents can also contain nested documents, that is, a document stored as the value of a field inside another document.
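As a small illustration of nesting with the Java driver used above, the sketch below embeds one document as the value of the company field; the extra country field is made up purely for this example.

import com.mongodb.BasicDBObject;
import com.mongodb.DBObject;

public class NestedDocumentExample {
    public static void main(String[] args) {
        // The value of "company" is itself a document rather than a plain string
        DBObject maker = new BasicDBObject("name", "ferrai").append("country", "Italy");
        DBObject vehicle = new BasicDBObject("name", "uden")
                .append("type", "four-wheeler")
                .append("company", maker);
        System.out.println(vehicle);   // prints the document as JSON-like text
    }
}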

Performing Insert, Update, Delete and Query

Let us see how to create a database and perform insert, update, read and delete operations. Creating a document was discussed in the previous section. Creating a database is done by using
use udendb, where udendb is the database name I will use in this tutorial; you can give the database whatever name you desire.
Now let us insert a document into the database using the insert() command:
db.col.insert(vehicle), where col is the collection name (Mongodb creates the collection automatically by default).
You can also add several documents at once by passing them as an array. After inserting a document into the database, you can check the databases by using
show dbs
There is also a default database known as test, which stores collections if you don't wish to create a database.
Update: In order to update values, the update() method is used. Consider the following example:
db.col.update({'company':'ferrai'},{$set:{'company':'mahindra'}},{multi:false})
here col is the collection name, and I have updated the company from ferrai to mahindra. If you want to apply the update to multiple documents, you need to set multi to true; in my example multi is false since I only want to update a single document.
Query: To query documents, you can use the pretty() method along with find(), which displays results in a structured way, for example:
db.col.find().pretty()
You can also query according to criteria or conditions, using AND (denoted by a , (comma) between the values) and OR (denoted by $or):
db.col.find({"colour":"yellow", $or:[{"company":"ferrai"},{"company":"mahindra"}]}).pretty()
Here the results will display the yellow vehicles whose company is either ferrai or mahindra.
Delete: Deleting is a simple task which can be achieved by using
remove(), which removes all documents in the collection, or
remove(criteria), which removes only the documents matching the criteria given within the parentheses. (A Java-driver sketch of these CRUD operations follows below.)
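Here is the Java-driver sketch of these CRUD operations referred to above, assuming the legacy MongoDB Java driver (the 2.x-style DB/DBCollection API) and a local server on the default port; the database and collection names match the shell examples.

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.mongodb.MongoClient;

public class CrudExample {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        try {
            DB db = client.getDB("udendb");
            DBCollection col = db.getCollection("col");

            // insert
            DBObject vehicle = new BasicDBObject("name", "uden")
                    .append("type", "four-wheeler")
                    .append("company", "ferrai");
            col.insert(vehicle);

            // update: change the company field on the matching document
            col.update(new BasicDBObject("company", "ferrai"),
                       new BasicDBObject("$set", new BasicDBObject("company", "mahindra")));

            // query
            DBCursor cursor = col.find(new BasicDBObject("company", "mahindra"));
            while (cursor.hasNext()) {
                System.out.println(cursor.next());
            }
            cursor.close();

            // delete
            col.remove(new BasicDBObject("company", "mahindra"));
        } finally {
            client.close();
        }
    }
}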


Using GridFS

GridFS is a file storage mechanism within Mongodb used for storing and retrieving files such as images, videos, etc. It is mainly used to store files larger than 16 MB.

fs.files is used to store the file metadata.

fs.chunks is used to store the chunks, where each chunk is given an ObjectId.

Now I will show you how to add an image file using GridFS,
mongofiles.exe -d gridfs put image.jpg
This command is run using mongofiles.exe in the bin directory; gridfs is the database name and the put command stores the image file in the gridfs database.
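The same upload can be done programmatically. The sketch below uses the legacy Java driver's com.mongodb.gridfs classes to store image.jpg and read it back; the file paths are placeholders, and the API names should be checked against your driver version.

import java.io.File;

import com.mongodb.DB;
import com.mongodb.MongoClient;
import com.mongodb.gridfs.GridFS;
import com.mongodb.gridfs.GridFSDBFile;
import com.mongodb.gridfs.GridFSInputFile;

public class GridFsExample {
    public static void main(String[] args) throws Exception {
        MongoClient client = new MongoClient("localhost", 27017);
        try {
            DB db = client.getDB("gridfs");                 // same database name as above
            GridFS bucket = new GridFS(db, "fs");           // backed by fs.files and fs.chunks

            // Store a local image file in GridFS
            GridFSInputFile stored = bucket.createFile(new File("image.jpg"));
            stored.save();

            // Read it back and write it to a new local file
            GridFSDBFile found = bucket.findOne("image.jpg");
            found.writeTo("copy-of-image.jpg");
        } finally {
            client.close();
        }
    }
}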

Mongodb Mapreduce 
MapReduce is a large-scale data processing tool which is also supported by Mongodb. The command syntax is:
db.collection.mapReduce(
    function() { emit(key, value); },                  // map function
    function(key, values) { return reduceFunction; },  // reduce function
    {
        out: collection,
        query: document,
        sort: document,
        limit: number
    }
)
Let me explain with an example. I am going to collect all the cars which are yellow in colour, group them under the ferrai company and then count the number of cars manufactured by ferrai. Consider the document created in the first section (vehicle). The mapReduce call would be:
db.vehicle.mapReduce(
    function() { emit(this.company, 1); },
    function(key, values) { return Array.sum(values); },
    { query: { colour: "yellow" }, out: "total_ferrai_cars" }
)
The result will be:
{ "result": "total_ferrai_cars",
  "counts": { "input": 18, "emit": 3, "reduce": 16, "output": 2 } }
The result shows that 18 documents matched the query (yellow), 3 key/value pairs were emitted according to the key, and finally the reduce function grouped values with the same key into 2. Hence there are two cars manufactured by ferrai.

Mongodb Text search
Mongodb text search enables you to search for specified words. Let me show you how to search text with the help of an example.
Consider the vehicle document created in the first section. I want to search for the word ferrai in the company field. First you need to create a text index using this command:
db.vehicle.ensureIndex({company:"text"})
Now we can search the text using:
db.vehicle.find({$text:{$search:"ferrai"}})


Conclusion: I covered the basic commands and some of Mongodb's advanced features; I have provided links below for resources if you want more detailed information.

  • For more on create, read, update and bulk write: https://docs.mongodb.com/manual/crud/
  • To download Mongodb http://www.mongodb.org/downloads

Monday 13 June 2016

NoSQL : Essentials and Short tutorial on Mongodb - Part 1

This post will provide insights into NoSQL, and as a continuation, in the next post I will provide a short and precise tutorial on Mongodb.

About Databases: Before diving into NoSQL, let me discuss the importance of databases. Databases are fundamental systems found in every organisation; they provide storage, retrieval, manipulation and analysis of data. Databases are vital to business because they radically enhance the advantage that data offers. In other words, a database converts data into meaningful information. In my view, storing data isn't hard to achieve, but deriving value or information from stored data is a tedious process, and optimal solutions are hard to find.


RDBMS: The relational model was proposed in E.F. Codd's 1970 paper "A relational model of data for large shared data banks", which made data modeling and application programming much easier. The relational model is well suited to client-server programming.


NoSQL: NoSQL encompasses modern techniques such as simple design, enabling the storage of enormous amounts of data. NoSQL has become a popular architecture for handling large data volumes because it can be more efficient with regard to the processing power required to handle large files, while relational databases simply don't perform well unless they are given structured data. NoSQL databases are new enough that many database engineers will have some difficulty handling them; hence the emergence of NoSQL databases such as Mongodb and Neo4j, which make things easier and provide developers with agility and flexibility. In an RDBMS, the developer needs to design the data schema from the outset, SQL queries are then run against the database, and if the application or database undergoes any change, such as an update, the developer must be contacted again. Most NoSQL databases are open source and come with built-in communities. According to Daniel Doubrovkine, Art.sy's head of engineering, NoSQL databases like MongoDB are simple to start with and get more complex over time. The syntax of NoSQL differs from SQL and needs some training for new users; however, NoSQL provides plenty of online forums and documentation. Migrating to NoSQL can be done by writing a bunch of SELECT * FROM statements against the database, loading the data into your NoSQL document [or key/value, column, graph] model using the language of your choice, and rewriting the statements as NoSQL operations such as insert() and find().
Personal user information, social graphs, geo-location data, user-generated content and machine logging data are just a few examples where the data has been increasing exponentially; SQL is not well suited to these types of data. The CAP theorem is the guiding principle when you talk about NoSQL databases, or in fact when designing any distributed system.
The CAP theorem states that there are three basic requirements: consistency, availability and partition tolerance.



Practically speaking, it is impossible to satisfy all three requirements at once; a distributed system can only guarantee two of them.

NoSQL Categories:
Each of these categories has its own limitations and attributes.

Key-Value: key-value stores are designed to handle large amounts of data, stored in the form of hash tables, and the values can be strings, JSON, etc. For example, "ANIMAL" could be the key for the value "Lion". Key-value stores satisfy the availability and partition-tolerance requirements of the CAP theorem.
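As a rough in-process analogy (not a distributed store), a plain Java map shows the access pattern of a key-value database: everything is looked up by key, and the value can be an opaque string or a JSON blob.

import java.util.HashMap;
import java.util.Map;

public class KeyValueExample {
    public static void main(String[] args) {
        // The key-value idea in miniature: look-ups go by key only
        Map<String, String> store = new HashMap<>();
        store.put("ANIMAL", "Lion");
        store.put("CITY", "{\"name\": \"Oslo\", \"country\": \"Norway\"}");  // values can be JSON strings
        System.out.println(store.get("ANIMAL"));   // Lion
    }
}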

Column-oriented databases: store data in columns, and every column is treated individually. Examples: SimpleDB, Cassandra, Bigtable.

Document-oriented databases: collections of documents, with the data stored inside these documents. Examples: Mongodb, CouchDB.

Graph databases: store data as a graph, where each node represents an entity and each edge represents a relationship between two entities or nodes. Examples: OrientDB, Neo4J.








Saturday 4 June 2016

DEVELOPMENTS AND PROGRESS IN APACHE HADOOP HDFS

About HDFS:
HDFS is a Java-based file system that provides scalable and reliable data storage, designed to span large clusters of commodity servers; in other words, it is a distributed, Java-based file system for storing large volumes of data. HDFS forms the data management layer along with YARN. HDFS has demonstrated production scalability of up to 200 PB of storage and a single cluster of 4500 servers, supporting close to a billion files and blocks. By distributing storage and computation across many servers, the combined storage resource can grow linearly with demand while remaining economical at every amount of storage. HDFS has the following features, which ensure high availability, fault tolerance, scalability and efficient storage of data.

Rack awareness: takes a node's physical location into account when scheduling tasks.

Standby NameNode: the main component for providing redundancy and high availability.

Less data movement: processing of tasks takes place in the physical node where the data resides, so data movement is reduced, increasing the aggregate bandwidth.


Features of HDFS


PROGRESS IN HDFS:

  • HDFS version 2.3.0 provides centralized cache management, heterogeneous storage, an OpenStack Swift implementation and HTTP support.
  • Version 2.4.0 provides metadata compatibility and rolling upgrades (allowing individual HDFS daemons to be upgraded).
  • Version 2.5.0 provides incremental data copy and extended attributes for accessing metadata.

Monday 2 May 2016

Apache Spark as a service -The essentials and overview of services by IBM and Databricks

Spark provides the flexibility you need to succeed, focusing on simplifying your time to deployment, making your business users self-sufficient, and accelerating your time to value. Below I discuss what Spark as a service has to offer.

Introduction about Spark 
Spark is an open source analytics engine used for rapid, large-scale data processing in real time. Spark provides iterative and interactive processing and can be regarded as an alternative to MapReduce. Apache Spark can process data from data repositories including the Hadoop Distributed File System (HDFS) and Amazon Simple Storage Service (S3), and it can process unstructured data too. The main advantage is that Spark supports both in-memory and disk-based processing. Spark can be used from high-level languages such as Java, as well as Scala and R, which makes it popular among data scientists for building analytical applications. Spark uses memory to its full advantage: recently read data is kept in memory, allowing faster query execution.
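A small Java sketch of that in-memory behaviour: the RDD is cached after the first read, so the second query is served from memory instead of re-reading the files. The HDFS path is a made-up placeholder, and local[*] is used only so the example is self-contained.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class CacheExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("cache-example").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Hypothetical HDFS path; textFile also accepts local and S3 paths
        JavaRDD<String> logs = sc.textFile("hdfs:///data/weblogs/*.log");

        // Keep the RDD in memory so that repeated queries avoid re-reading from disk
        logs.cache();

        long total = logs.count();                                           // first pass: reads the files
        long errors = logs.filter(line -> line.contains("ERROR")).count();   // served from memory
        System.out.println(total + " lines, " + errors + " errors");

        sc.stop();
    }
}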

Spark vs Mapreduce 

  • SQL queries are executed much faster in Spark than Mapreduce.
  • Spark runs on hundreds of nodes, whereas MapReduce can run on thousands of nodes.
  • MapReduce is ideal for batch processing; Spark is ideal for real-time processing since it uses in-memory storage and processing.
Spark SQL 
Spark SQL is a component of the Spark big data framework which allows SQL queries on data. It can query structured, columnar and tabular data in Spark, and it can be integrated with other Hadoop-based systems such as Hive, allowing interaction with HDFS.
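A hedged Spark SQL sketch in Java, assuming Spark 2.x where SparkSession is the entry point; the JSON file of vehicle records and its columns are invented for illustration.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSqlExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("spark-sql-example")
                .master("local[*]")
                .getOrCreate();

        // Hypothetical JSON file of vehicle records (one JSON object per line)
        Dataset<Row> vehicles = spark.read().json("hdfs:///data/vehicles.json");
        vehicles.createOrReplaceTempView("vehicles");

        // Run a SQL query directly against the registered view
        Dataset<Row> yellow = spark.sql(
                "SELECT company, COUNT(*) AS n FROM vehicles WHERE colour = 'yellow' GROUP BY company");
        yellow.show();

        spark.stop();
    }
}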

Spark's MLlib 
MLlib contains functionality such as statistics, regression, classification and filtering, which is needed for machine learning and real-time analysis.

Spark's Flagship
Spark's flagship streaming capability converts big data streams into discretized data streams, and analysis and processing take place on these streams using the Spark Streaming module.
Spark GraphX
This component provides processing of big data graphs, such as changing edges and vertices in a graph dataset.

Now let us see which factors contribute to Spark as a service, in other words why we can consider Spark as a service.

Spark-as-a-service 
Spark is provided as a cloud-based service by IBM and Databricks because of its advantages over other processing frameworks such as MapReduce (which I discussed earlier). Companies like Databricks allow faster deployment of Spark; Databricks eases processes such as cluster building and configuration, while process monitoring and resource monitoring are taken care of by Databricks. IBM's Spark-as-a-service offers new APIs, tackles unstructured data, and is delivered as IBM Analytics for Apache Spark on IBM Bluemix. IBM DataCap Insight Cloud services deliver data science based on external data about events and people. The company has said that the services don't require deep knowledge of big data analysis. Whenever we come across IBM and cognitive computing we think of Watson, but this IBM DataCap Insight offering provides a way to handle unstructured data (i.e. machine-unfriendly data).



Saturday 2 April 2016

Best ways to tackle Big data in R

Big data means millions of records that can be processed using R. Even though R provides a few packages to support big data, extra effort is needed. MapReduce algorithms can be created in R for analysing data (refer: The Art of R Programming - O'Reilly Media).
The original data set could also grow and lead to a bigger object during the analysis process.

I have listed out best ways to handle big data in R.

  • Divide and conquer 
Large datasets can be divided into smaller subsets, and each of these smaller subsets is then worked on. Faster solutions and different strategies can be obtained through parallel processing of these smaller subsets of data.

  • Memory and hardware
Every data object in R is stored in memory. Therefore, for better performance, machines should have a higher memory capacity; 64-bit machines, which can address up to 8 TB, are best suited to working with R. Another way is to make use of packages such as "ff" and "ffbase", which do not store the data in memory. "ScaleR" provides a variety of algorithms for analysing data.

  • Incorporating programming languages like Java or C++
Sometimes components of an R program can be integrated with languages such as Java or C++ for better and more efficient performance. rJava combines R and Java and is regarded as a connection package (refer: Advanced R development by Hadley Wickham). Renjin is an open source project which consists of an altered R interpreter running within the JVM. Oracle R also uses an R interpreter with a variety of mathematical functions and libraries.


Saturday 26 March 2016

Difference between Arrays and Arraylists and Best ways to print int array,byte array,array of strings,two dimensional array and also array of array in Java

Arrays are data structures which act as containers holding a collection of variables of the same type, in other words, a fixed-size collection of elements of the same data type. The variables stored in an array occupy contiguous memory locations. Before the advent of ArrayLists, arrays were used by programmers to store large numbers of variables.
Let us first see the differences between ArrayLists and arrays:

  • ArrayLists are dynamic in size: an ArrayList can grow when new elements are added, while arrays are fixed in size.
  • Arrays can hold both primitives and objects, whereas ArrayLists hold only objects.
  • Loops such as for loops are used to iterate over arrays, while iterators can also be used with ArrayLists.
  • ArrayLists ensure type safety through generics, whereas arrays are homogeneous.
  • Elements are added to arrays with the assignment operator, whereas the add() method is used to add elements to ArrayLists.
  • Arrays can be multi-dimensional, whereas ArrayLists are single-dimensional.
In order to construct an array, you use new, then the type of the array, and then specify the size of that array using square brackets. For example:
new int[6] // here the type of the array is int and the size is 6


Now let us see the different ways to print an int array, a byte array, an array of strings, a two-dimensional array and an array of arrays.
  • Printing int array :
We can use Arrays.toString(int[] array), for example:
int [] odd = {3,5,7};
System.out.println("odd numbers are " + Arrays.toString(odd));

  • Printing byte array :
Converting a String into a byte array is commonly employed in Java Cryptography Extension (JCE) encryption. To convert a byte array back into a String, create a String object and pass the byte array to it. For example:
String example="hello";
byte[] bytes = example.getBytes();
String s = new String(bytes);
System.out.println("text decrypted:"+s);
  • Print an array of strings :
There are two ways to print an array of strings. One way is to use a for loop, for example:
String[] cars = new String[3];
cars[0]="Ferrari";
cars[1]="maruthi";
cars[2]="Datsun";
for (int i = 0; i < cars.length; i++)
{
System.out.println(cars[i]);
}
Another way is to use Arrays.toString().
  • Printing two dimensional array:
Here we use Arrays.deepToString(), for example:
String [][] greetings={{"hi","good morning "},{"hello"," good evening "}};
System.out.println("greetings: " + Arrays.deepToString(greetings));

  • Printing array of array:
An array of the same type can be stored inside another array. Arrays.deepToString() is used to print an array of arrays, for example:

String []arr1=new String [] {"hi","hello"};
String []arr2=new String [] {"how are you"};
String [][] arrayOfArray=new String[][]{arr1,arr2};
System.out.println(Arrays.deepToString(arrayOfArray));


Friday 29 January 2016

Difference between arguments and parameters in Java

A parameter is defined in the method header, whereas an argument is the actual value passed to the method at run time.
public class ParametersAndArguments {

    public static int divide(int a, int b) { // a, b are parameters here
        return a / b;
    }

    public static void main(String[] args) {
        int a = 10, b = 5;
        int x = divide(a, b); // a, b are arguments here
        System.out.println(x);
    }

}
Here a and b are the formal parameters declared in the method's header, whereas the a and b passed at the point of invocation are the arguments.

Wednesday 27 January 2016

Possible reasons for outOfMemory error and run time and compile time errors in java

java.lang.OutOfMemoryError: Java heap space: a Java application is only allowed to use a limited amount of memory. The JVM memory is divided into two regions, namely the permanent generation and the heap space, and the sizes of these regions are set by the JVM. An OutOfMemoryError occurs when you try to add more data to the heap space while it is full.
Reasons :
  • Memory leaks: when you don't manage memory yourself, the JVM uses garbage collection (GC): unused objects are cleared from memory and the space is made available again; in other words, the JVM automatically checks for unused objects and removes them. A memory leak occurs when the GC cannot treat objects as unused (because they are still referenced even though they are no longer needed) and so cannot remove them; thus the usage of the Java heap space increases indefinitely (a minimal illustration follows).
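A minimal illustration of such a leak: objects that stay reachable (here, through a static list that is never cleared) can never be collected, so the heap grows until java.lang.OutOfMemoryError: Java heap space is thrown.

import java.util.ArrayList;
import java.util.List;

public class LeakDemo {
    // Objects referenced from this static list stay reachable forever,
    // so the garbage collector can never reclaim them
    private static final List<byte[]> CACHE = new ArrayList<>();

    public static void main(String[] args) {
        while (true) {
            CACHE.add(new byte[1024 * 1024]);   // grows by ~1 MB per iteration, never removed
        }
        // eventually: java.lang.OutOfMemoryError: Java heap space
    }
}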


Compile-time errors occur when the code does not follow Java's syntactic and semantic rules, for example:
  • a class tries to extend more than one class
  • incorrect overloading or overriding
  • referring to an out-of-scope variable
  • an inner class has the same name as its enclosing class
  • a class is not abstract but contains abstract methods
  • a private member of class A is referenced by another class B
  • creating an instance of an abstract class
  • changing the value of a final member
  • two classes or instances have the same name
  • missing brackets
  • missing semicolons
  • access to private fields in other classes
  • missing classes on the classpath (at compile time)
Runtime errors, for example (a short illustration follows the list):
  • using variables that are actually null (may cause a NullPointerException)
  • using illegal indexes on arrays
  • accessing resources that are currently unavailable (missing files, ...)
  • missing classes on the classpath (at runtime)
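A short illustration of the first two runtime errors; the code compiles cleanly, and the failures only appear when it runs.

public class RuntimeErrorDemo {
    public static void main(String[] args) {
        String text = null;
        int[] numbers = new int[3];

        // Compiles fine, but fails at run time:
        System.out.println(numbers[5]);     // ArrayIndexOutOfBoundsException: illegal index
        System.out.println(text.length());  // NullPointerException: the variable is actually null
    }
}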

Thursday 21 January 2016

Hash table in java

A hashtable is an efficient implementation of an associative array: a data structure that stores key/value pairs and is searched by key. The key can be a floating-point value, a string, another array, or a structure. A hashtable has two parts, a key set and a value set; you need a way to represent keys so that each key maps to the appropriate value, plus a hash function that fits your needs. The hash function depends on your criteria and could be almost anything.
The C language does not provide keyed arrays; the only way to access an element is by its index number.
Eg: students[66];
I have provided the hashtable code in Java (using open addressing with linear probing):
public class HashEntry {
      private int key;
      private int value;

      HashEntry(int key, int value) {
            this.key = key;
            this.value = value;
      }     

      public int getKey() {
            return key;
      }

      public int getValue() {
            return value;
      }
}

public class HashMap {
      private final static int TABLE_SIZE = 100;

      HashEntry[] table;

      HashMap() {
            table = new HashEntry[TABLE_SIZE];
            for (int i = 0; i < TABLE_SIZE; i++)
                  table[i] = null;
      }

      // Returns the value for the given key, or -1 if the key is not present
      public int get(int key) {
            int hash = (key % TABLE_SIZE);
            // Linear probing: step forward until the key or an empty slot is found
            while (table[hash] != null && table[hash].getKey() != key)
                  hash = (hash + 1) % TABLE_SIZE;
            if (table[hash] == null)
                  return -1;
            else
                  return table[hash].getValue();
      }

      // Inserts or overwrites the entry for the given key
      public void put(int key, int value) {
            int hash = (key % TABLE_SIZE);
            // Linear probing to resolve collisions (assumes the table never fills up)
            while (table[hash] != null && table[hash].getKey() != key)
                  hash = (hash + 1) % TABLE_SIZE;
            table[hash] = new HashEntry(key, value);
      }
}
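A short usage example of the HashMap class defined above (note that it shadows java.util.HashMap, so it should live alongside the code above): key 101 collides with key 1 because 101 % 100 == 1, and linear probing moves it to the next slot.

public class HashMapDemo {
    public static void main(String[] args) {
        HashMap map = new HashMap();   // the HashMap class defined above, not java.util.HashMap
        map.put(1, 100);
        map.put(101, 200);             // collides with key 1 (101 % 100 == 1), probes to the next slot
        System.out.println(map.get(1));    // 100
        System.out.println(map.get(101));  // 200
        System.out.println(map.get(7));    // -1 (not present)
    }
}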