The Internet Is an Ever-Larger Space in the Life of Mankind

Table of Contents

Introduction

Data Mining

Association Rule Mining

Hadoop

Literature Survey

Objectives

Conclusion and future scope

The internet now occupies a greater part of the human experience and has become a prominent part of almost every sector in the world. Among its advantages are the ability to communicate quickly and to transfer information through multiple channels.

The internet has become a tool for communicating information and knowledge and for exchanging ideas. Most people use social networking sites to communicate with each other and share information. A social network is an interconnected network of people connected by interpersonal relationships.

Individuals exchange a great deal of data, such as pictures and videos, and the data generated in this way is called social network data. These data are used to identify various aspects of society. Data mining involves looking at data from different perspectives to uncover hidden truths. One of its significant tasks, which helps in the discovery of associations, correlations, statistically relevant patterns, causality, and emerging patterns in social networks, is known as Association Rule Mining.

Introduction

Earlier, people used to communicate either verbally or non-verbally. Non-verbal communication took place through writing letters, publishing in newspapers, making drafts, and so on. This type of communication was limited in both reach and speed.

The internet, often described as a network of networks, allowed non-verbal communication to become far more widespread. It is now a prominent part of almost every sector in the world.

The internet’s main benefit is its ability to quickly communicate and transfer information via multiple channels. Over time, the need to gather information for sharing, contribution, and impact grew. This eventually gave rise to the desire to collect, analyze, and channel large data sets quickly and systematically. People in the knowledge society have become familiar with data creation, storage, retrieval, and presentation.

The internet is not just a way to learn; it is also used for communication. Millions of people use it to communicate their thoughts and share information, and most connect with others through social networking sites and blogs. The rise of social networking has been remarkable worldwide: among the many social networking sites, such as Facebook and Twitter, Facebook alone had over 1.44 billion users in 2015.

Social sites have seen a dramatic increase in popularity. Twitter, for instance, has become a popular social networking site because of its unique features, such as tweets: short text messages that are quick to write and useful for gathering information. Millions of tweets are generated every day and are used to support decision making.

A social network is essentially a group of people connected through interpersonal relationships, and social network data is the information generated by people using social media. When analyzed, these user-generated data reveal many aspects of the socializing group. Social Network Analysis (SNA) is a method for achieving this: a way to map and measure relationships, and a key tool for displaying the various aspects of the socializing group.
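As a minimal illustration of the kind of measurement SNA performs, the sketch below computes degree centrality over a toy friendship graph in Python; the names and edges are hypothetical, not data from any study cited here.

```python
from collections import defaultdict

# Toy edge list: each pair is a mutual friendship (hypothetical data).
edges = [("ann", "bob"), ("ann", "cara"), ("bob", "cara"), ("cara", "dev")]

# Build an undirected adjacency map.
adjacency = defaultdict(set)
for u, v in edges:
    adjacency[u].add(v)
    adjacency[v].add(u)

# Degree centrality: a node's degree divided by the maximum possible degree.
n = len(adjacency)
centrality = {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}
print(centrality)  # "cara" touches most of the network, so she scores highest
```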

Data Mining

Many data points from different social network sites are stored in files and other repositories. Analyzing and interpreting these data yields a great deal of interesting information that can be used for further decision making. Data mining, also known as the Knowledge Discovery Process [4], is the process of discovering hidden insights by analyzing data from different perspectives and finding patterns in large data sets. Data is extracted from the dataset and remolded into usable knowledge.

Association Rule Mining

An important task of data mining that helps to discover associations, correlations, and causality in social networks is called Association Rule Mining.

Frequent itemset mining is another important technique that plays an integral part in many data mining tasks, all of which aim to find interesting patterns in databases: sequences, classifiers, clusters, association rules, and correlations. Mining association rules is among the most difficult of these problems. One of data mining's most primitive tasks is to recognize sets of items, products, or other entities that are often present together in a database.

As an example, the association rule {bread, potatoes} -> sandwich would indicate that customers who buy bread and potatoes together are also likely to buy a sandwich. The fraction of transactions containing all three items is the rule's support, and the fraction of bread-and-potato transactions that also contain a sandwich is its confidence. This knowledge can be used to help make decisions. Now imagine a social network that shares user-generated text documents, for example through discussion forums or blogs. It would make sense to identify the most common words used in discourse on a given topic, and it would also be beneficial to determine which words are frequently used together. For example, in a discussion on the 'American Election', the frequent use of 'Economy' demonstrates that the economy is a central aspect of political life.
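To make support and confidence concrete, here is a hedged sketch that computes both measures for the {bread, potatoes} -> sandwich rule over a handful of made-up transactions.

```python
# Hypothetical transactions; each is a set of purchased items.
transactions = [
    {"bread", "potatoes", "sandwich"},
    {"bread", "potatoes", "sandwich", "milk"},
    {"bread", "potatoes"},
    {"bread", "milk"},
    {"sandwich", "milk"},
]

antecedent = {"bread", "potatoes"}
consequent = {"sandwich"}

# Support: fraction of transactions containing antecedent and consequent together.
both = sum(1 for t in transactions if antecedent | consequent <= t)
support = both / len(transactions)

# Confidence: of the transactions containing the antecedent,
# the fraction that also contain the consequent.
ante = sum(1 for t in transactions if antecedent <= t)
confidence = both / ante

print(f"support={support:.2f}, confidence={confidence:.2f}")  # 0.40, 0.67
```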

A frequent itemset of length one can indicate a central topic, while a frequent itemset of length two can indicate the importance of co-occurring factors. A frequent itemset mining algorithm can thus reveal the central topic of a discussion and the patterns of word usage within discussion threads and blogs. Due to the increasing amount of data from social networks, it has become difficult to analyze this data on one machine. The Apriori algorithm [6], one of the most commonly used methods to extract frequent itemsets from transactional databases, is not efficient at handling this growing data on its own. The MapReduce framework [7], a programming model for cloud computing, can be used to solve this problem.
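For reference, here is a hedged, single-machine sketch of the Apriori idea: grow candidate itemsets level by level and keep only those that meet a minimum support count. It illustrates the general technique, not the exact variant evaluated in the surveyed papers.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets whose support count is at least min_support."""
    # Level 1: start from the individual items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        # Count support for each candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Generate size-(k+1) candidates by joining surviving sets; the
        # Apriori property says every subset of a frequent set is frequent.
        keys = list(survivors)
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return frequent

txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(apriori(txns, min_support=3))
```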

Hadoop

Hadoop is an open-source platform under the Apache v2 license that offers both the analytical capabilities and the computational power to handle large amounts of data. The Hadoop framework is designed so that users can store and process big data across distributed computers.

It can scale from a single server to thousands of machines. Rather than relying on hardware for high availability, the Apache Hadoop software library splits data into manageable chunks, replicates them, and distributes them across the cluster nodes, so that each user's data can be processed quickly and reliably.
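To make the processing model concrete, below is a sketch in the style of Hadoop Streaming, which lets Python scripts act as the map and reduce phases of a job. The script name and the invocation shown in the comments are assumptions for illustration only.

```python
import sys

def mapper():
    # Each input line is one transaction: items separated by spaces.
    for line in sys.stdin:
        for item in line.split():
            print(f"{item}\t1")  # emit (item, 1) records

def reducer():
    # Hadoop Streaming delivers mapper output sorted by key,
    # so equal items arrive on consecutive lines.
    current, total = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = key, 0
        total += int(value)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    # The same script serves as either side of the job, e.g. (illustrative):
    #   -mapper "python itemcount.py map" -reducer "python itemcount.py reduce"
    mapper() if sys.argv[1:] == ["map"] else reducer()
```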

Literature Survey

The set of methods for discovering relationships between variables within large databases is known as Association Rule Mining. Rakesh Agrawal introduced it in order to find regularities among products in large-scale transaction data recorded by point-of-sale (POS) systems.

For example, if sales data shows that customers who buy bread and tomatoes often buy mayonnaise as well, a supermarket might promote a sandwich offer combining these items. Such information is useful for making decisions.

T. Karthikeyan & N. Ravikumar conclude their paper after reviewing the data and observing. They suggested that the algorithm could have been improved to decrease execution time, reduce complexity, and increase accuracy. The authors concluded that a more efficient algorithm is required with reduced I/O by reducing database scanning as part of the association rules mining process.

Their paper provides a theoretical overview of several existing algorithms for association rule mining, beginning with a brief survey of the research, proceeding to the core concepts, and discussing the pros and cons of each method before drawing an inference.

Ramakrishnan Srikant and Rakesh Agrawal suggested using a seed set of itemsets to generate candidate itemsets, whose actual support is counted pass by pass until no new large itemsets are found. The resulting algorithms, called Apriori and AprioriTid, are used to find association rules between items in large sales transaction databases.

J. Han, J. Pei, and Y. Yin created an FP-tree-based mining method called FP-growth to extract frequent patterns based on the fragment-growth concept. The problem was addressed in three ways. First, they designed the FP-tree data structure, in which only frequent items are represented, compressing the database. Second, they created an FP-tree-based pattern-growth method that surveys a tree's conditional pattern base, builds its conditional FP-tree, and performs mining recursively on that tree. Third, a divide-and-conquer technique was used instead of a bottom-up search method.
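A minimal sketch of the FP-tree construction step described above, assuming the standard two-pass scheme: count items once, discard the infrequent ones, and insert each transaction in descending frequency order so that common prefixes share nodes. This is an illustration, not the authors' implementation.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 1, {}

def build_fp_tree(transactions, min_support):
    # First pass: count items, keep only the frequent ones.
    counts = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    # Second pass: insert each transaction, frequent items only,
    # ordered by descending global count so prefixes overlap.
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-counts[i], i))
        node = root
        for item in items:
            if item in node.children:
                node.children[item].count += 1
            else:
                node.children[item] = Node(item, node)
            node = node.children[item]
    return root
```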

S. Cong and J. Han developed a new strategy to mine frequent itemsets on terabyte-scale cluster systems, focusing on the idea that a sampling-based framework can be used for parallel data mining.

Their algorithm incorporated these data mining ideas while also taking the processor's performance and memory hierarchy into account. The result was reported as the fastest sequential algorithm of its kind, one that could be extended in parallel and use all the resources available to it.

P. V. Sander & W. Fang & K. K. Lau created a new data mining narration that utilized the new-generation GPUs (Graphic Processing Units) called GPUMiner. The massively multi-threaded SIMD architecture (Single Instructions, Multiple-Data), was the basis of the system. GPU miner was composed of three components. They included buffer manager and CPU storage. These stored data and I/O transfer were handled between the Graphical Processing Unit.

Two FP-tree-based techniques, a cache-conscious FP-array and a lock-free dataset tiling parallelization, were suggested in "Optimization of Frequent Itemset Mining on Multiple-Core Processors".

C. Aykanat and E. Ozkural developed a top-down method to divide the frequent itemset mining task: a vertex separator partitions the graph so that each part can be mined and distributed independently. This scheme was the basis for two new mining algorithms, which handle the items corresponding to the separator differently: one replicates the separator's work, while the other recomputes it.

Association rule algorithms based on the MapReduce model have also been studied, since algorithm performance on a single machine can be hampered by limited CPU and memory resources. Building on the MapReduce framework of J. Dean and S. Ghemawat, an enhanced Apriori algorithm can handle large datasets using many nodes on the Hadoop platform, and can be applied to larger and more complex problems.

Jongwook Woo & Yuhang Xu presented a Market Basket Analysis pair algorithm (key,value), which can be executed via Map/Reduce.

This algorithm uses a joining technique to produce paired items. To avoid double counting, each transaction must first be sorted alphabetically.
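A hedged sketch of the pairs idea: sort each transaction so every pair has one canonical order, then emit each 2-combination as a (pair, 1) record for the reduce phase to sum. The function name is illustrative, not taken from the paper.

```python
from itertools import combinations

def map_pairs(transaction):
    """Emit (item_pair, 1) records for one market-basket transaction."""
    # Sorting first guarantees a canonical order inside each pair, which
    # prevents (bread, milk) and (milk, bread) from being counted separately.
    items = sorted(set(transaction))
    return [((a, b), 1) for a, b in combinations(items, 2)]

print(map_pairs(["milk", "bread", "milk", "eggs"]))
# [(('bread', 'eggs'), 1), (('bread', 'milk'), 1), (('eggs', 'milk'), 1)]
```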

Zahra Farzanyar and Nick Cercone proposed a new method for mining frequent itemsets using the Map/Reduce framework [20], which was then applied to social network data. Their improved Map/Reduce Apriori algorithm reduces both the number of partial frequent itemsets and the processing time.

Nirupma Tatvari and Anutha Shaharam presented a survey on mining association rules with genetic algorithms (GAs). The GA-based technique for mining association rules proved extremely robust, although major modifications were needed to reduce the complexity of the distributed computing algorithms.

In the paper presented by K.P. Hanirex and Kaliyamurthie, frequent itemsets are identified more easily by using a genetic algorithm. The initial population is created from random transactions, and the algorithm then transforms the population continuously through fitness evaluation, selection, and replacement.

Gaurav Dubey and Arvind Jaiswal proposed an algorithm for optimizing association rules in which the population is transformed repeatedly through four steps: fitness evaluation determines how fit each individual is; selection chooses individuals from the current population; genetic operators create new offspring (individuals); and replacement swaps the new individuals in, often for their parents. A generic skeleton of this loop is sketched below.
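The loop these papers describe maps onto a generic genetic-algorithm skeleton. The sketch below is one hedged rendering with a toy bit-string fitness function; it is not the algorithm from any of the surveyed papers.

```python
import random

def evolve(population, fitness, generations=100, mutation_rate=0.1):
    """Generic GA loop: evaluate, select, recombine, replace."""
    for _ in range(generations):
        # Fitness evaluation: score every individual.
        scored = sorted(population, key=fitness, reverse=True)
        # Selection: keep the fitter half as the parent pool.
        parents = scored[: len(scored) // 2]
        # Crossover: splice two parents to create each offspring.
        offspring = []
        while len(offspring) < len(population) - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]
            # Mutation: occasionally flip one gene (here, one bit).
            if random.random() < mutation_rate:
                i = random.randrange(len(child))
                child[i] = 1 - child[i]
            offspring.append(child)
        # Replacement: parents plus offspring form the next generation.
        population = parents + offspring
    return max(population, key=fitness)

# Toy usage: maximize the number of 1-bits in a length-12 string.
pop = [[random.randint(0, 1) for _ in range(12)] for _ in range(20)]
print(evolve(pop, fitness=sum))
```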

Mohit K. Gupta and Geeta Sikka worked on a Multi-objective Genetic Algorithm used to automate the extraction of association rules from large datasets. Multiple quality measures, including interestingness, confidence, support, and comprehensibility, were optimized simultaneously.

Reza Sheibani and Amir Ebrahimzadeh proposed an improved cluster-based association rule mining algorithm (ICBAR) capable of exploring large itemsets. It reduces the number of candidates by comparing the data only against partial cluster tables.

Objectives

To build a network from the details contained in social network data.

To create a parallel programming model using the MapReduce framework.

To find frequent itemsets from social network data using the Apriori algorithm on the MapReduce framework, and to optimize the resulting rules with a genetic algorithm.

Conclusion and future scope

Social networking is growing at an incredible rate, and these sites contain a great deal of data, which makes data mining very useful. The developed system is quick because it supports parallel processing. The EAMRGA algorithm helps to locate association rules, and a genetic algorithm is used for optimization, to find relevant and optimized association rules. Experimental work showed that the algorithm's efficiency was 39% higher and the rules' accuracy 25% higher.

In future work, the data to be handled will range into terabytes, and it must be processed quickly and efficiently. Either a parallel or a hierarchical approach can be taken for this job, making large-scale use of Hadoop's features.
