- Written by Fajar Purnama
- Category: Uncategorised
- Hits: 20
Table of Contents
- Digital Currency
- Bitcoin to Other Coins
- Correlation to Our Lives
Bitcoin The First Cryptocurrency
The easiest way to start understanding cryptocurrency is to understand Bitcoin. Bitcoin is the first cryptocurrency, introduced by Satoshi Nakamoto in 2008. Ideally, Bitcoin is open, borderless, censorship resistant, unconfiscatable, resistant to distortion and manipulation, distributed, decentralized, and pseudo-anonymous. As a disclaimer, this concept is idealized and in reality may not be perfect. This book also contains no technical explanation of cryptography or the other technologies behind cryptocurrency, because it is intended for users only. Instead, only illustrations or parables are provided, and they may not be fully accurate.
Maybe you have heard about the recent Wirecard case, where more than a billion dollars went missing, or previous scandals such as WorldCom's multi-billion-dollar accounting fraud, Enron's hidden debt, or even the Charles Ponzi scheme back in the old days. If not, you have most probably heard of financial corruption in your country or local area. I still remember the Century Bank scandal from my teen days in Indonesia. Nobody knows where the money went, and the rich people who put their money there lost all their savings. I remember seeing on the news that a once rich woman now had to work as a laborer on construction sites. Who knows whether others had to work as maids or as slaves after losing their savings. I firmly believe in the law of conservation of energy, where energy does not disappear but is transferred. Obviously, the money did not disappear; someone must have taken it. With blockchain technology, transactions can be securely recorded in detail, which makes these kinds of scandals much harder to hide.
Open and Transparent
Again, Bitcoin is P2P: as long as there is a peer node nearby, you can connect to the network even if the Internet is censored. Authorities can always try blocking every node, but good luck blocking the new nodes that emerge daily. If you have used any Bitcoin wallet, you have probably wondered why it gives so many warnings not to make a mistake when entering the receiver's address. That is because transactions are irreversible, not only to prevent distortion and manipulation but also to prevent censorship. If a transaction were reversible, authorities could easily demand that your transaction be reversed if they did not like it.
Bitcoin to Other Coins
Bitcoin maximalists may say that everything other than Bitcoin is a scam coin and that only Bitcoin is the truth, but in my opinion that thinking hides one of the beauties of Bitcoin. The beauty of Bitcoin is that it is open source, so anyone can reuse and modify the code. If anyone wants to build something different or just does not agree with some function of Bitcoin, they can freely create another coin and take a different path instead of fighting to change Bitcoin. Isn't such fighting the same as war? Then let the people choose which coins they prefer. The freedom to choose is one of the beautiful contributions of Bitcoin.
There are many other coins, just as there are many companies out there, and you would need a whole team to research them all. You can legitimately get rich by investing in altcoins because the concept is the same: you invest in good things before anybody else knows about them. For example, I bought $70 worth of Statera when I saw their post and read that it is a deflationary coin in DeFi. Seeing that the price was still steady, I estimated that it was still early. Eventually my Statera was worth over $200, and I sold $70 worth to recover my capital, so I am now in profit. However, altcoins are also the most dangerous investment I know, because new things have a high risk of not surviving. For example, I bought almost $100 of Inmax as a random gamble; for years they have not launched their exchange, and my $100 plummeted to $1, meaning that I completely lost the gamble. Also beware of scams: anyone today can make their own token and could even name it Bitcoin. If you buy such a token named Bitcoin, it cannot be used on the Bitcoin network because they are not the same. Therefore, always do your research first.
Government and Other Private Blockchain
Governments, banks, and companies say they are interested in implementing blockchain. You have often heard them say yes to blockchain but no to Bitcoin, or that blockchain has value but Bitcoin does not. What do they mean? They like the blockchain and the distributed system, but they do not like the decentralization, openness, censorship resistance, unconfiscatability, and privacy. They are in control of the global financial system, and implementing Bitcoin would mean giving up that control, such as their ability to print currency, to distribute it to whomever they want, and to enforce monetary policies.
Correlation to Our Lives
Most of us were probably born with fiat currency, in simple terms cash, accepted as money, which is a tool to communicate value. With money you can simply buy anything, and most of us believe that money is our primary necessity, which is not true. Money is just a tool; what we can get with money is our true necessity. If you cannot understand that, then you lack a history lesson or logic. Ask yourself a question: did money always exist in the past? The answer is no. If you went back in time and gave people dollars, they would think you were crazy. Why would they give you goods for a piece of paper?
The oldest form of trading is barter. I need water and you need food, so I trade some of my food with you to get some water. However, barter has problems of scale, practicality, and divisibility. I have food but not everybody needs much food, you have clean water but not everybody needs much water, and someone else has clothes but not everybody needs many clothes. Suppose I need clothes and find someone who has them and also needs food. I then have to negotiate how much food to give and how many clothes that person will give. This is very impractical, so people began to demand a single unit that can measure the value of every item, or an item called money that can buy anything.
People began experimenting with salt, sugar, crops, shells, and other commodities as currencies, but only one type has been accepted throughout history, and that is precious metals, mainly gold and silver. Gold has the property of immutability, which means no one today can create gold; you have to mine it. This makes gold scarce, that is, limited in supply. Gold also does not deteriorate: the gold you have now will remain the same almost forever, which makes it a good commodity to store. Gold is divisible, so items can be valued by weight of gold; for example, a meal is worth a few milligrams (mg) of gold. People began to create gold coins, which made trading much more practical than before.
In my opinion, gold did well as a currency for average people, but it was not practical enough to be used on a national scale. For example, it is very heavy to carry for massive transactions, not to mention costly, and there is a risk of being raided or of otherwise losing the physical gold. Dividing gold is still not easy for regular people because it requires smithing, which means there is a limit to its divisibility. Say that I carry a few mg of gold but only want to buy one candy; usually I cannot, and instead have to purchase many candies or other items.
This is where paper money comes in. Instead of carrying heavy gold, we trust banks to store our gold and receive certificates or coupons, each of which represents an amount of gold. That is the good dollar I knew, where each dollar could be exchanged for a fixed amount of gold. Paper money is easier to carry and easily divisible, and it could be converted back to gold. It is also practical enough to be used as a medium of exchange on a national scale. Then came banking, the digital age, and online transactions, and you know the rest.
Before proceeding, let us take a look at some other history of currency debasement. Most of the information comes from Guide To Investing in Gold & Silver by Michael Maloney, and I strongly recommend watching his Hidden Secrets of Money episodes.
This book does not emphasize cryptocurrency as a solution to the problem of financial value. You can find gold bugs who agree with Bitcoin activists about the problems of the current financial system but do not agree with Bitcoin, and especially other cryptocurrencies, mainly because they have no physical form, among many other reasons. As this book stated previously, if the problem is only financial value, there are other solutions that have proven historically effective. Even though Bitcoin and some cryptocurrencies have had the best performance in recent years, regular people cannot handle the short-term volatility. However, beyond tackling financial value, Bitcoin and other cryptocurrencies were created for a larger purpose.
The last thing that they will try to do is to forcibly control market prices in their fiat currency. This happened a few years later in the United States, after President Nixon ended the convertibility of the dollar to gold. The worst case was near the end of the Roman Empire, when a law was issued forcing citizens to work and continue the family business at controlled prices, with violations punishable by death.
Bitcoin and other cryptocurrencies can be banned by regulation but technically cannot be stopped or censored. They technically cannot be confiscated; the only way is to persuade, pressure, or socially engineer the owners into handing them over themselves. Both the supply function and the distribution are algorithmically and mathematically defined, which ideally makes them neutral and not controlled by any single entity. Most are also open and transparent.
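To illustrate what an algorithmically defined supply function means, here is a minimal Python sketch of Bitcoin's block reward schedule. The numbers (a starting reward of 50 coins that halves every 210,000 blocks) are Bitcoin's commonly documented parameters rather than figures taken from this book, and the function names are made up for this illustration.

```python
# Illustrative sketch only: constants are Bitcoin's widely documented
# parameters; the function names are hypothetical.
SATOSHIS_PER_COIN = 100_000_000
INITIAL_REWARD = 50 * SATOSHIS_PER_COIN  # reward of the earliest blocks
HALVING_INTERVAL = 210_000               # blocks between reward halvings


def block_reward(height: int) -> int:
    """New coins (in satoshis) created by the block at the given height."""
    halvings = height // HALVING_INTERVAL
    # Integer right shift halves the reward once per elapsed interval.
    return INITIAL_REWARD >> halvings


def total_supply() -> int:
    """Sum the rewards of every interval until the reward rounds down to zero."""
    total = 0
    halvings = 0
    while (INITIAL_REWARD >> halvings) > 0:
        total += HALVING_INTERVAL * (INITIAL_REWARD >> halvings)
        halvings += 1
    return total


if __name__ == "__main__":
    print("Reward at block 0:", block_reward(0) / SATOSHIS_PER_COIN)
    print("Reward at block 210000:", block_reward(210_000) / SATOSHIS_PER_COIN)
    print("Maximum supply in coins:", total_supply() / SATOSHIS_PER_COIN)
```

Because the schedule is a fixed formula that every node can recompute, no single party decides how many coins exist; the total stays just under 21 million coins.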
Table of Contents
- Enter With a Wallet
- Custodial Wallet
- Semi-Custodial Wallet
- Non-Custodial Wallet
- Hot Wallet
- Cold Wallet
- Getting Your First Coin
Enter With a Wallet
Backup and Secure Seed Phrase and/or Private Keys
Sending and Receiving Coins
The necessary functions of a wallet have been discussed. Explore the other functions yourself. I may cover interesting functions in separate articles.
Other Hot Wallets Types
Getting Your First Coin
Now that you have a medium to receive your coins, whether a custodial wallet, non-custodial wallet, hot wallet, cold wallet, or several of them, you are ready to fill it with coins. There are only three ways to get coins: transacting with someone, mining, and creating one by becoming a developer who invents your own coin. This section recommends that new users get coins directly from someone. If you want to get them using your bank account, skip to the next chapter; if you want to mine, skip to a later chapter; and if you want to become a developer, skip to the next book. The last message of this section is to get yourself some cryptocurrency in order to participate in the ecosystem, because without it your options are limited. Start with something small, or more accurately, an amount you are willing to lose or are comfortable with. If you go in big out of greed, expecting to get rich quick, you may not be able to handle it mentally, because the market is very volatile: even if the price skyrockets in the future, it may drop by more than half before that happens.
Buy From Someone Trustful
Non-Custodial Credit Card Service
- This is a thesis submitted to the Graduate School of Science and Technology, Computer Science and Electrical Engineering, Kumamoto University, Japan, in September 2017 in partial fulfillment of the requirements for the degree of Master of Engineering. It was not published, so the copyright remains with me, "Fajar Purnama", the main author; I have the authority to repost it anywhere, and I claim full responsibility, detached from Kumamoto University. Therefore, I hereby declare it licensed under a customized CC-BY-SA, under which you are also allowed to sell my contents on the condition that you mention that the free and open version is available here. In summary, the mention must contain the keywords "free" and "open" and the location, such as the link to this content.
- The presentation is available at Slide Share.
- The source code is available at Github.
Below are the publications reused in this thesis that do not require copyright clearance:
- Hand Carry Data Collecting Through Questionnaire and Quiz Alike Using Mini-computer Raspberry Pi 
Below are the publications reused in this thesis that require copyright clearance:
The continuous advance of electronics and information communication technologies (ICT) has greatly influenced every aspect of life; this thesis discusses the education aspect. Electronics and ICT have been incorporated into the learning and teaching process, giving birth to electronic learning (e-learning). Within it is a well-known form called the online course, whose essence is the ability to deliver courses at a distance, with flexibility in place and time. However, one simple condition must be met in order to implement online courses: sufficient ICT infrastructure. Unfortunately, not all regions meet this condition, which limits the accessibility of online courses. Besides improving the ICT infrastructure, a distributed learning management system (LMS) was proposed as an alternative, but the next issue is maintenance or synchronization, which in this case means keeping the learning contents up to date. Two problems are highlighted in this thesis: the inability to perform synchronization in regions with severe network connectivity, and duplicate data transfer during synchronization.
To overcome the synchronization problem in regions with severe network connectivity, the solution is to utilize hand carry servers. Implementing hand carry servers in a distributed LMS grants mobility to its servers. The proposed concept is to have the hand carry server physically seek network connectivity to perform online synchronization and afterwards return to its original location. The hand carry server proved to be portable due to its small size, light weight, and low power consumption, where a power bank is enough to supply it for a whole day. However, it has resource limitations in terms of central processing unit (CPU) and random access memory (RAM), which limit its performance.
To overcome duplicate data transfer during synchronization, incremental synchronization was utilized instead of full synchronization. This thesis also introduces a new approach called dump and upload based synchronization, which overcomes the obstacles of different LMSs and LMS versions faced by dynamic content synchronization.
Table of Contents
- Portable Distributed LMS
- Distributed LMS Synchronization
- Learning Content Sharing
- Full Synchronization versus Incremental Synchronization
- Dump and Upload Based Synchronization
- Conclusion and Future Work
List of Figures
- In Chapter 1:
- Illustration of e-learning showing many electronic devices to be used (images from openclipart).
- Illustration of the difference between conventional course and online course. While a conventional course is restricted by place and time, an online course can be anywhere and anytime (images from openclipart).
- In Chapter 2:
- Illustration of the main benefit of a distributed system using the ICT penetration map of Indonesia in 2012, where greener regions have good network connectivity and redder regions the opposite. (a) People in redder regions will have difficulty accessing the central server. (b) On the other hand, people will have no difficulty accessing it if there are servers in their local regions.
- Illustration of using a hand carry computer device to gather information that other users enter from their own computer devices.
- Time consumption of the survey process from preparation and responding to post survey. (a) For the paper based method, preparation consists of question typing and question printing, responding consists of question distribution, question answering, and response collection, and post survey consists of response insertion. (b) For the hand carry server method, preparation consists of question typing with web delays, and responding consists of server connection and question answering with web delay; the advantage of this method is that no post survey is needed, since the responses are already inserted automatically.
- Data in the form of bar graphs and pie charts were shown the instant the hand carry server received the responses. Only 4 of the 30 item results are shown here since it is too much to show all.
- Illustration of moving hand carry servers, which have to move to a location with network connectivity to synchronize with the main server and return to their original location after finishing.
- Implementation illustration of hand carry server on distributed LMS in Indonesia. (a) Servers on more red areas have difficulty on their network connectivity. (b) Replacing those servers with hand carry servers renders them to be physically mobile and able to search for network connectivity.
- Resource usage during the survey attempted by 30 users, showing mostly over 80% CPU usage and around 700MB of RAM usage.
- Stress testing illustration using the Funkload software application, which generates up to 100 virtual users to stress the hand carry server (images from openclipart).
- Stress testing showing increasing response time to increasing number of virtual users and increasing number of questionnaire items, (a) average response time while (b) maximum response time.
- In Chapter 3:
- Illustration of full synchronization of learning contents in courses. The initial stage is learning content sharing, where 100 megabytes (MB) of course contents is shared. The next stage is an update where there is 800MB of new data but the whole 900MB is transferred, of which 100MB is duplicate data. On the next update there is 100MB of new data but the whole 1GB is transferred, of which 900MB is duplicate data.
- Incremental synchronization, which differs from Figure 3.1 in that the duplicate data are filtered out.
- Dynamic content synchronization model for Moodle. The course packer converts both Moodle tables into synchronization tables. Then the synchronizer checks for inconsistency between the two tables and applies the difference between both synchronization tables to the slave's synchronization table. Finally the synchronization table is reconverted into a Moodle table, and that is how it is synchronized.
- The dump and upload based synchronization model. Both servers’ LMS will dump/export the desired learning contents (in this case packed into a course) into archives/files. The synchronizer will perform differential synchronization between the two archives. After synchronization the archives will be imported/uploaded into the servers’ LMS, updating the learning contents.
- Screenshot of Moodle’s export feature, (a) showed options like include accounts, and (b) showed learning contents to choose to export.
- The first step is to generate a signature of the archive on the slave and send it to the master. The signature is used on the master's archive to generate a delta/patch, which can also be called the difference, and this is sent to the slave. The slave applies that delta/patch to its archive and produces an archive identical to the one on the master.
- Assume two archives, where the outdated archive on the slave has only the second topic and the latest archive on the master has all three topics. Here, for example, the outdated archive is divided into three blocks, and three sets of checksums are obtained and bundled into a signature. The signature is then sent to the master.
- Illustration of identifying the difference. (a) The three sets of checksums are compared in a rolling manner with blocks of the new archive. Blocks identical to the first and second sets of checksums are found and their locations recorded, while no matching block is found for the third set of checksums, which is marked for deletion. (b) The delta is generated on the master, containing instructions to rearrange identical blocks, delete unmatched blocks, and append new blocks, and is then sent to and applied on the slave.
- After the delta/patch is applied, slave will have identical archive to master.
- Implementation of some download manager techniques in rsync algorithm based synchronization. The delta is split into pieces and retrieved by the client. The integrity of the pieces is checked using a checksum, here MD5, and inconsistent pieces are downloaded again. In the end the pieces are merged. This can also be implemented on the uplink side when sending the signature.
- Test result showing the relationship between block size, signature, and delta. When the block size increases, the signature size decreases, while the delta size increases. The full file is the size of the file to be downloaded without the differential method, in other words using full synchronization. The transmission cost of incremental synchronization is the sum of signature and delta, which in this case is optimal at a block size of 512 bytes.
- Network traffic generated in the four scenarios of the experiment. Full synchronization generates the most network traffic, shown in blue bars. The orange and yellow bars are the network traffic of incremental synchronization, which depends on the size of the contents to be updated and is lower than that of full synchronization. The green bars show incremental synchronization executed when there is no update, and the results are very low and tolerable.
List of Tables
- In Chapter 1:
- In Chapter 2:
- In Chapter 3:
- Size of course contents of the same course on different LMS, showing sizes when it contains one, two, and three topics.
- Detailed experiment result of Figure 3.12 showing the size of signature and delta during incremental synchronization scenarios on each LMS.
- Experiment result of delta size compared to ideal size, and percentage of duplicate eliminated was formulated from these data.
Electronics and Information Communication Technology (ICT) have made many tasks more convenient, including delivering education. Many have incorporated electronics into their learning and teaching processes. Examples include teachers using laptops and projectors to present their materials, students browsing the Internet to search for information, and both using email, chat, or social networking services to communicate. These kinds of activities are generally called electronic learning (e-learning), as illustrated in Figure 1.1.
This thesis, however, does not discuss e-learning broadly, but rather a category within e-learning called the online course. It uses electronic ICT devices with which information exchange can be done remotely. Information is delivered as electrical signals at high speed over a network, preferably the Internet, with computer devices acting as end devices, or as transmitters and receivers. Simply put, computer devices connected to the Internet are all that is needed to participate in an online course from anywhere at any time, as illustrated in Figure 1.2.
Online courses are now being highlighted by many parties, who see them as one solution to the uneven distribution of education. Put straightforwardly, not everyone has access to good quality education, and some have no access at all; with online courses, people can receive education without going to school. Knowing this, our peers tried to implement online courses in their universities, one in Indonesia and the other in Myanmar. Another peer already has online courses well established in Mongolia and is now moving to massive open online courses (MOOC). Unlike private online courses, which are only for a university's students, a MOOC is open to anyone without discrimination. In the United States (US), MOOCs are also used to scout for potential students. For example, the Massachusetts Institute of Technology (MIT) found a genius Mongolian high school student who aced its Circuits and Electronics MOOC with a perfect score, then admitted him as a freshman. In summary, many people see a bright future in utilizing online courses in education.
Despite all the benefits of online courses, there are still problems preventing many people from enjoying them. The problem is the lack of accessibility to online courses due to insufficient ICT infrastructure. In other words, some people have network connectivity issues, especially in developing countries. In a random survey by Kusumo et al. of students in Indonesia, 60% agreed that Internet connection is still problematic. The e-readiness survey by Monmon et al. at Yangon Technological University and Mandalay Technological University in Myanmar showed lower Likert scale scores for the students' and teachers' perception of the ICT network compared to other items. Today the world's Internet penetration is still around 50%, indicating that only half of the world's population can access online courses. Even for people who have access, the quality of that access may be questionable, which can lead to dissatisfaction in accessing online courses.
The obvious solution to the accessibility issue is to improve the ICT infrastructure; however, this takes a long time. Therefore another method was implemented: a distributed system rather than a centralized system. The concept is to have people access the service in their local area, which is closer, rather than in the central area, which is farther away. Some references describe this as the third generation of content management system (CMS), though this work is more about the learning contents of a learning management system (LMS) than the general contents of a CMS.
With distributed LMS as the solution to the lack of accessibility of online courses, the next problem is the one discussed in this thesis: synchronization, which means keeping the learning contents up to date. This can also be described as the maintenance of the learning contents. Specifically, two problems are highlighted in this thesis, as follows:
- The lack of network connectivity for synchronization. Usually synchronization is set to be done online, where the servers synchronize with one another in order to keep the learning contents at their latest version. In that case, synchronization is not possible where there is no network connectivity.
- Duplicate data transfer during synchronization. By default, full synchronization is used, where the learning contents are usually bundled into courses. Commonly, when the contents of a course are revised on the LMS, the whole contents of the course are distributed to the other servers, including previously distributed contents (duplicate data). In this case there will be much redundant data, which adds more burden to the network.
This thesis provides two main solutions for the two problems:
- For the first problem of no network connectivity, the solution is to provide a portability function to the distributed LMS. Put straightforwardly, this enables the servers to move to other locations where there is network connectivity in order to synchronize, and to return to their original location after synchronizing.
- For the second problem of duplicate data, the solution is to utilize incremental synchronization through a continuous differential synchronization technique. The new contents are identified before synchronization and only the new contents are distributed, leaving out the redundant data.
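The difference between the two synchronization modes can be made concrete with a short Python sketch. It reuses the sizes from the full synchronization illustration in the List of Figures (100MB shared first, then 800MB and 100MB of new data). The function name is hypothetical, and the incremental figure is the ideal case that ignores signature and protocol overhead.

```python
def transfer_costs(new_data_mb):
    """Return (full_mb, incremental_mb, duplicate_mb) for each synchronization round.

    Idealized model: full synchronization resends everything stored so far,
    while incremental synchronization sends only the newly added data.
    """
    rounds = []
    stored_mb = 0
    for new_mb in new_data_mb:
        stored_mb += new_mb
        full_mb = stored_mb           # whole course is transferred again
        incremental_mb = new_mb       # only the new contents are transferred
        rounds.append((full_mb, incremental_mb, full_mb - incremental_mb))
    return rounds


if __name__ == "__main__":
    # Initial sharing of 100MB, then updates of 800MB and 100MB.
    for full, incremental, duplicate in transfer_costs([100, 800, 100]):
        print(f"full={full}MB incremental={incremental}MB duplicate={duplicate}MB")
```

Over the three rounds full synchronization moves 2000MB in total while the ideal incremental case moves 1000MB, so half of the traffic in this example is duplicate data.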
The significance is discussed in detail in further sections, but in general it can be stated as follows:
- Possibility of flexible synchronization in regions with severe network connectivity by mobilizing the servers of distributed LMS. It can also be pictured as widening the network coverage.
- Lower network cost can be achieved from incremental synchronization.
The objective of this research is to enable online synchronization of distributed LMS in regions with almost no network connectivity and to reduce redundant data transfer during synchronization.
- Introduced a novel concept of integrating hand carry servers into distributed LMS, which makes it mobile or portable and thus able to perform synchronization in regions with severe network connectivity. This thesis also demonstrated the portability of hand carry servers by conducting a survey simulation and, on the other hand, showed their limitations through stress testing.
- Though the novelty of incremental synchronization in distributed LMS was already claimed, this thesis showed a different approach called dump and upload based synchronization. Its advantages are that a single software application is compatible with most LMSs and benefits from the features of each LMS, for example its privacy and security features, which automatically make the synchronization private and secure, and, on Moodle, the possibility of partial synchronization due to the division of course contents into small blocks. Another advantage is that this approach supports bidirectional synchronization.
Each method may have limitations, which are discussed in detail in their respective sections; the general limitations of this research are as follows:
- The system was only tested in the laboratory and has not yet been implemented in real running online courses. The experiments were done on the author's virtual machines, the laboratory's local area network (LAN), and free public clouds owned by the author.
- Only one hand carry server was used in the actual experiment, and the discussed expansion to more than one is still a concept derived from the experiment.
- This thesis' dump and upload based incremental synchronization is novel in its concept but not in its software application, since it only makes use of existing software applications: the export and import features in the LMS to dump the learning contents, and the rdiff application, based on rsync, to identify the difference between dumps.
- The course experimented on is the author's self-created course, which was never delivered; in short, it is not an actual running course.
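The signature, delta, and patch steps that rsync based tools such as rdiff follow can be sketched in Python as below. This is a simplified illustration, not rdiff's actual implementation: it uses fixed-size blocks and recomputes an MD5 checksum at every byte offset instead of a fast rolling checksum, and the function names and the toy archives are assumptions made for this example.

```python
import hashlib


def signature(old: bytes, block_size: int) -> dict:
    """On the slave: map the checksum of each full block of the old archive to its index."""
    sig = {}
    for offset in range(0, len(old) - block_size + 1, block_size):
        block = old[offset:offset + block_size]
        sig.setdefault(hashlib.md5(block).hexdigest(), offset // block_size)
    return sig


def delta(new: bytes, sig: dict, block_size: int) -> list:
    """On the master: describe the new archive as ('copy', index) or ('data', bytes) steps."""
    steps, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        index = sig.get(hashlib.md5(window).hexdigest())
        if index is not None:
            if literal:  # flush unmatched bytes collected so far
                steps.append(("data", bytes(literal)))
                literal = bytearray()
            steps.append(("copy", index))  # the slave already has this block
            pos += block_size
        else:
            literal.append(new[pos])  # no match: slide the window by one byte
            pos += 1
    if literal:
        steps.append(("data", bytes(literal)))
    return steps


def patch(old: bytes, steps: list, block_size: int) -> bytes:
    """On the slave: rebuild the new archive from old blocks plus the received data."""
    out = bytearray()
    for kind, value in steps:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)


if __name__ == "__main__":
    slave_archive = b"TOPIC-2."                     # outdated: second topic only
    master_archive = b"TOPIC-1.TOPIC-2.TOPIC-3."    # latest: all three topics
    sig = signature(slave_archive, 8)
    steps = delta(master_archive, sig, 8)
    print(steps)
    print(patch(slave_archive, steps, 8) == master_archive)
```

Only the signature travels from slave to master and only the delta travels back, so the block already present on the slave is never retransmitted. A smaller block size finds more matches but produces a larger signature, which is the trade-off measured in the block size experiment.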
1.8 Structure of the thesis
Beyond this section the thesis contains three more chapters:
- Chapter 2 discusses portable distributed LMS, giving in order a brief introduction to distributed LMS, the author's work showing the convenience of the hand carry server, the concept of the hand carry server in distributed LMS, and lastly the hand carry server's limitations.
- Chapter 3 discusses incremental data synchronization, covering in order the story of sharing learning contents, the distinction between full synchronization and differential and incremental synchronization, the previous work on dynamic content synchronization versus the author's work on dump and upload based synchronization, and finally experiments and results showing the percentage of duplicate data eliminated by incremental synchronization.
- Chapter 4 is the conclusion of this thesis, which also discusses future work.
2 Portable Distributed LMS
2.1 Distributed Systems
2.1.1 Partitioned System
Distributed systems are a wide topic with different implementations. One implementation is the partitioned system. For example, an organization's network can have its servers separated, with the database, directory, domain name system (DNS), dynamic host configuration protocol (DHCP), file, web, and other servers each on a separate machine. They are integrated but independent, so that if one service (server) is damaged, the other services are not. A different example is data partitioning, where data are fragmented so that when they are retrieved they have to be gathered and merged. This usually happens in collaboration, where people work on the same project but from different machines.
2.1.2 Replicated System
Another implementation is the replicated system, and this is the one referred to and used in this thesis. The need for a replicated system can be due to traffic bottlenecks, geographically poor network connectivity, or both. One of the most popular implementations is search engines such as Google and Yahoo, which have different server locations assigned to local domains, for example .co.jp for Japan and .co.id for Indonesia. Less well known than search engines are online multiplayer games. The servers of online multiplayer games can reside in many regions such as Asia, Europe, the United States, and China. Some games show the population of each server, indicating whether it is full or not. Players can choose another server when a server has reached its population limit or when they cannot reach the server in their region.
2.2 Distributed Learning Management System
One definition of an LMS is a system that manages learning and teaching, specifically for the online case. The current form of the LMS is a software application. It does not just deliver learning materials to students but computerizes, online, any activity that can happen in a class. Some of these activities are interactions, whether through chat applications or forums like those on social networking services (SNS); assignments, which are now submitted electronically by uploading files through the LMS; and quizzes or examinations, which can be graded automatically or manually. Moreover, an LMS can be accessed from anywhere at any time, and the use of computers, which can perform tasks much faster and more automatically than humans, makes unique applications, data mining, and learning analytics possible. In short, new features are being developed every day. Many LMSs exist today, as listed on Table 2.1, whether open source (free to use and modify, with all the code open), available only on the cloud as software as a service (SAAS), which tends to be freeware or usage only, or proprietary, which tends to be commercial and paid. In the author's surroundings, Moodle is mostly used.
|Open Source||aTutor, Canvas, Chamilo, Claroline, eFront, ILIAS, LAMS, LON-CAPA, Moodle, OLAT, OpenOLAT, Sakai, SWAD, Totara LMS, WeBWorK|
|SAAS/Cloud||Cornerstone OnDemand Inc, Docebo LMS, Google Classroom, Grovo, Halogen Software, Informetica, Inquisiq R3, Kannu, Latitude Learning, Litmos, Talent LMS, Paradiso LMS, TOPYX, TrainCaster LMS, WizIQ LinkStreet|
|Proprietary||Blackboard Learning System, CERTPOINT Systems Inc, Desire2Learn, eCollege, Edmodo, Engrade, WizIQ, GlobalScholar, Glow, HotChalk, Informetica, ITWorx CLG, JoomlaLMS, Kannu, Latitude Learning LLC, Uzity, SAP, Schoology, SSLearn, Spongelab, Skillsoft, EduNxt, SuccessFactors, SumTotal Systems, Taleo, Teachable, Vitalect|
The term distributed LMS means that the replicated servers contain an LMS. Each server is meant to serve online courses. The implementation can be a full replication, where not only learning contents but everything else, including activities, assessments, and interactions, is synchronized. This means students and teachers can freely use any server, preferably the one with the best network connectivity. The other implementation is partial replication, where only non-private data are synchronized, usually only the learning contents. This can happen when there are jurisdictions in which each region is to be handled locally. In other words, contents are provided, but each school and university still owns its servers and asserts local authority. Either way, a distributed system is the solution for bottleneck and connectivity issues. As illustrated on Figure 2.1 for Indonesia, it is better to build and spread more servers than to have one centralized server in the capital city.
2.3 Hand Carry Server in Distributed LMS
After a distributed LMS is established, the contents need to be maintained, or kept up to date, through synchronization. However, the problem is the lack of network connectivity between servers, usually found in remote areas such as schools in villages. It may be easy to build a LAN but difficult to build connections to other servers, or simply an Internet connection, in distant places. In the short term it is only possible to build a very limited (very low speed) connection, over which retrieval of very large contents may seem to take forever. The metaphor is building a server in a jungle, on a remote island, or in a desert, all of which are very isolated. The default solution is offline synchronization, and the author's solution is server mobilization.
2.3.1 Portability of Hand Carry Server
Before discussing the synchronization, this section introduces hand carry servers. In this thesis it is called a hand carry server because the physical hardware is a computer about the size of a human hand that has been configured as a server. Such a computer is also called a mini, pocket size, or portable computer. The example used in this thesis is the Raspberry Pi 2, with the specification on Table 2.2.
|A 900MHz quad-core ARM Cortex-A7 CPU|
|1 Giga Byte (GB) Random Access Memory (RAM)|
|4 Universal Serial Bus (USB) ports|
|40 General Purpose Input Output (GPIO) pins|
|Camera Serial Interface (CSI)|
|Display Serial Interface (DSI)|
|Micro Secure Digital (SD) card slot|
|VideoCore IV 3D graphics core|
|Size of 85.60 mm × 56.5 mm (3.370 in × 2.224 in), not including protruding connectors|
|Weight of 45g|
The portability was demonstrated in one of the author's previous works. It is less related to distributed systems, but it showed an application of the hand carry server to manual labor: that work was a simulation comparing a paper based survey method to a hand carry server survey method. The motivation was the lack of Internet connection for performing online surveys in developing countries, even though most people there own a computing device. Instead of reverting to the paper based method, the participants' personal digital assistants (PDAs) can be utilized by connecting them to the hand carry server to perform a semi-online survey, as illustrated on Figure 2.2.
For the simulation, a MOOC readiness survey consisting of 30 questionnaire items was simulated on 30 participants by a surveyor. The whole survey consists of three stages: preparation, responding, and post survey. In the preparation stage, for the paper based method the surveyor creates the questionnaire items in word processing software and then prints them, while for the hand carry server method the surveyor creates the questionnaire in a web based survey application called LimeSurvey. In the responding stage, for the paper based method the surveyor hands out the papers to the participants and collects them when they have finished responding, while for the hand carry server method the surveyor tells the participants to connect their PDAs to the hand carry server, informs them of the URL of the local survey site, and then waits until the participants submit their results to the hand carry server. Though the results on Figure 2.3 showed no difference in time consumption for the preparation and responding stages, the paper based method tends to impose more labor, such as printing the questionnaires (the time taken multiplies greatly with old printers) and carrying heavy papers if there are a lot of participants. On the other hand, resources are the main issue for the hand carry server, which is discussed in the Limitation of Hand Carry Server section.
However, the advantage was shown in the post survey stage, where the surveyors usually have to input the responses into a database and also handle human errors through verification such as double checking, which seems to be the most stressful and tiring process of the paper based method. The hand carry server method is different: the responses are processed automatically, so there is literally no post survey stage. In fact, results and statistics are instantly visible, which no manual method can outpace. The participants can see the current statistics the moment they submit their responses, as exemplified on Figure 2.4.
The author's work mostly discussed the convenience of computerization, but the important part is the mobility or portability. Back on Figure 2.2, the hand carry server can be carried anywhere (a walking or moving server) and only needs a direct current (DC) power supply of 5 V (volts) and 2 A (amperes); usually a hand carry power bank is enough. The simulation also measured the charge delivered, which was 0.6 Ah (ampere hours) in 39 minutes (the whole duration of the survey, see Figure 2.3), meaning that with a power bank rated at 20,000 mAh (20 Ah) it will last roughly 20 hours. In short, the hand carry server has a low power cost and can last a long time while mobile.
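The runtime estimate can be checked with a short calculation. The sketch below is only illustrative: it assumes the power bank capacity is 20,000 mAh (20 Ah), that the draw measured during the survey stays constant, and that there are no conversion losses. The function names are hypothetical.

```python
def average_current_a(charge_ah: float, minutes: float) -> float:
    """Average current in amperes for a charge (Ah) drawn over a duration (minutes)."""
    return charge_ah / (minutes / 60.0)


def runtime_hours(capacity_ah: float, current_a: float) -> float:
    """Ideal runtime in hours, ignoring conversion losses (an assumption)."""
    return capacity_ah / current_a


# Measured in the simulation: 0.6 Ah delivered in 39 minutes.
current = average_current_a(0.6, 39)
# Assumed power bank capacity: 20 Ah (20,000 mAh).
hours = runtime_hours(20.0, current)

print(f"average current: {current:.2f} A")
print(f"ideal runtime:   {hours:.1f} h")
```

Under these assumptions the average draw is a little under 1 A, and the ideal runtime is slightly above the rough 20 hour figure; a real power bank would deliver somewhat less because of conversion losses.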
2.3.2 Synchronization in Severe Network Connection
Currently, synchronization has to be taken offline when there is no network connectivity, whether it is full or incremental (discussed in the next chapter). An administrator goes to a network-connected location, or directly to the updated server, to retrieve the contents and store them on a storage medium such as a compact disc (CD) or flash drive. The administrator then travels back to the outdated server, inserts the storage medium, and loads the contents. There is a work by Ijtihadie et al. on differential update, where the differential was sent through email and then used to update the contents. It should be possible to put the differentials on a storage medium, which is then inserted into the outdated server to update the contents.
Another way is to move the server to an area with connectivity, have it update, and then return it to its original location. This was actually inspired by Ijtihadie et al., where students download a quiz to their mobile devices, answer it offline at home, and later find an Internet connection to synchronize (automatically upload their answers). This concept was applied to this thesis' work, where the process happens to the hand carry server instead of the mobile device. It is illustrated on Figure 2.5, with people currently carrying the servers. An example of implementation is on Figure 2.6. There are regions in Indonesia that do not have good network connectivity, making it difficult to synchronize with other servers. If those servers are replaced with hand carry servers, they can be physically moved to find network connectivity (they support wired and wireless connections), synchronize, and in the end return to their original location.
Within the distributed LMS, the servers can either be replaced with hand carry servers or be left mounted, with hand carry servers as an addition or support, meaning the hand carry servers travel from server to server. It can be a temporary implementation while no network infrastructure has been built, since it is fast and simple to install, or it can serve to cover network coverage holes, where the hand carry server moves around the uncovered areas.
2.4 Limitation of Hand Carry Server
With its compact size and light weight, the hand carry server has limited resources. The resources responsible for servicing are mainly the central processing unit (CPU) and random access memory (RAM) (the detailed specification is on Table 2.2). As shown on Figure 2.7, the CPU and RAM are already exhausted when 30 participants attempt the survey. These measurement results alone may not show much meaning, but they become meaningful if stress testing is conducted, as in the next subsection.
2.4.2 Stress Testing
Experienced users may understand completely from the resource measurement results alone, but others will have to get hands-on and run a few trials to see how capable this hand carry server actually is. For that reason, stress testing was proposed and conducted. Though it was tested for the survey purpose, the method is applicable to other applications. For the stress testing, a web stress testing software application called FunkLoad was used. Different numbers of virtual users, in increments of 10 up to 100, were generated and attempted the survey on the hand carry server simultaneously, as illustrated on Figure 2.8. This time only response time was measured.
Response time can be referred to as service time, in this case how much time users take to load the questionnaire items and to submit their responses. The service time can also be called queuing time, since some users take a shorter time and others take longer; Figure 2.9 shows the average response time and the maximum response time (the user last in the queue). It shows that the response time increases with the number of users, and also increases with the questionnaire content size, because this affects the number of questionnaire items to be retrieved and the number of responses to be submitted. From these results, the surveyor can decide the target average response time and the tolerable maximum response time. Then the number of simultaneous users and questionnaire items can be determined. The results also showed that the hand carry server reached its limit above 85 concurrent users with 30 questionnaire items, at which point the service stops working and must be restarted.
3 Distributed LMS Synchronization
3.1 Learning Content Sharing
Before going to the main discussion of synchronization, it is better to discuss learning content sharing. Sharing learning contents has become popular ever since MOOCs were introduced. A periodically conducted course, "Moodle on MOOC", teaches students how to use Moodle and advises them to share their finished courses. Making well designed and well written learning contents for an online course from scratch may consume a lot of time; learning content sharing helps other instructors quickly develop their own. Some specialized courses can only be written by experts. Learning content sharing reduces the burden on teachers of creating learning contents for online courses, and the more online courses exist, the more students all over the world get a better chance to access quality education.
Distributed LMS is also another form of learning content sharing, where the learning contents are shared with other servers in other regions. The typical way of sharing learning contents is dump, copy, then upload. Most LMSs have a feature to export course contents into an archive and to import the contents into another server that runs the LMS. The technique for export and import varies between systems, but the concept is to synchronize the directory structure and database. Demand for this feature is so high that it is still being improved; for example, the ability to export a user-defined part of the contents is being developed. Other LMSs that currently lack this feature will have it developed, as stated on their developer forums.
3.2 Full Synchronization versus Incremental Synchronization
3.2.1 Full Synchronization
Synchronization can be defined as similar movements between two or more systems that are temporally aligned, though in this case it is the action of causing a set of data or files to remain identical in more than one location. The data or files are learning contents and private data, although private data are usually excluded. Full synchronization, as defined in this thesis, is the distribution of the whole data, consisting of new data and existing data. Synchronization occurs when new data are present to update the data of other servers. As illustrated on Figure 3.1, full synchronization includes existing, or duplicate, data, which are redundant and only add unnecessary burden to the network. However, full synchronization is more reliable because the full data are always available.
3.2.2 Incremental Synchronization
Ideally, the duplicate data are filtered out and not distributed, for highest efficiency. The conventional way is the recording approach, where the changes made by the authors of the course are recorded. Each change can only be either an addition or a deletion at a certain location. These actions are recorded and sent to other servers, which execute them to achieve identical learning contents; this is similar to a push mechanism, where the main server forces updates on other servers. Accurate changes can be obtained, but errors are unrecoverable because the process is unrepeatable. Another issue is the restriction that no modification may take place on the learning contents of the other servers, meaning the slightest change, corruption, or mutation can render the servers unsynchronizable.
Instead of the recording approach, the calculating approach is more popular due to its repeatable process and fewer restrictions. The approach is to calculate the difference between the new and outdated learning contents. Therefore the process can be repeated, and some changes, corruption, or mutation on either side's learning contents do not prevent the synchronization. One of the origins of the calculating approach is the file differential algorithm developed at Bell Laboratories, known today as the diff utility in Unix. The detailed algorithm may seem complicated, but in summary it consists of extracting the longest common subsequence of lines shared by the two files (more like finding the similarity between the two files); afterwards, the remaining lines of the old file are deleted, while the remaining lines of the new file are added around the longest common subsequence at the correct locations, resulting in the update of the old file. For large files, hashing is involved.
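The calculating approach can be illustrated with a minimal sketch. It uses a plain dynamic-programming longest common subsequence over lines, which is an assumption made for brevity and not the optimized algorithm used by the actual diff utility; the function names are hypothetical.

```python
def lcs_table(old, new):
    """t[i][j] holds the LCS length of old[i:] and new[j:]."""
    t = [[0] * (len(new) + 1) for _ in range(len(old) + 1)]
    for i in range(len(old) - 1, -1, -1):
        for j in range(len(new) - 1, -1, -1):
            if old[i] == new[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def diff(old, new):
    """Return an edit script of (op, line): ' ' keep, '-' delete, '+' add."""
    t = lcs_table(old, new)
    i = j = 0
    script = []
    while i < len(old) and j < len(new):
        if old[i] == new[j]:          # line belongs to the common subsequence
            script.append((" ", old[i]))
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            script.append(("-", old[i]))   # only in the old file: delete
            i += 1
        else:
            script.append(("+", new[j]))   # only in the new file: add
            j += 1
    script.extend(("-", line) for line in old[i:])
    script.extend(("+", line) for line in new[j:])
    return script


def patch(script):
    """Rebuild the new file: keep common lines and additions, drop deletions."""
    return [line for op, line in script if op != "-"]


old_file = ["topic 1", "topic 2", "quiz A", "forum"]
new_file = ["topic 1", "quiz A", "forum", "topic 3"]
script = diff(old_file, new_file)
for op, line in script:
    print(op, line)
```

Because the script is computed from the two files themselves, it can be recomputed at any time, which is what makes this approach repeatable.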
Applying the file differential algorithm to synchronization makes it differential synchronization. Unlike full synchronization, differential synchronization distributes only the new data. Repeating differential synchronization makes it incremental synchronization, which is the repeated distribution of only the new data. In a sense, the synchronization is incremental because only the updates are sent every time. Put another way, to increment means to add up: the learning contents add up with every differential update. Ultimately, duplicate data or learning contents are filtered out, reducing unnecessary burden on the network, as illustrated on Figure 3.2.
3.2.3 Dynamic Content Synchronization on Moodle
The idea of implementing differential synchronization on distributed LMS was started by Usagawa et al. and then continued by Ijtihadie et al. These works are limited to distributed Moodle systems because they focus solely on the Moodle structure. When writing the software application, it is necessary to identify the database tables and directories of the learning contents. The incremental synchronization between two Moodle systems was described as dynamic content synchronization, where the learning contents are constantly being updated. The dynamic synchronization is unidirectional, or simplex in terms of communication model: it is fixed that one Moodle system acts as a master to distribute the updates and another acts as a slave to receive them.
A file differential algorithm was applied to maintain consistency between the master's and the slave's database tables and directories. The database tables and directories are assigned hashes. Information about those hashes is exchanged between master and slave; identical hashes mean those contents should not be changed, while mismatched hashes mean those contents should be updated. Though Ijtihadie et al. developed their own algorithm, stated to be specifically for synchronization of learning contents between LMSs, it is not much different from existing remote differential file synchronization algorithms.
The Moodle tables in the database are converted into synchronization tables, as on Figure 3.3, by means of hashing. Only contents related to the selected course are converted and sorted by the course packer. Privacy was highly regarded, thus private data were filtered out. The purpose is to find inconsistencies in the database between master and slave. As stated in the previous paragraph, hashes are often used to test for inconsistencies: if the hashes differ then the contents are inconsistent, and vice versa. When an inconsistency is found in a certain table, the master sends its table to the slave to replace the slave's table, which then becomes consistent. In the end, the synchronization tables are converted back into Moodle tables. In summary, dynamic content synchronization only takes place on the parts of the database and directories that have changed or are inconsistent.
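The hash comparison between master and slave can be sketched as follows. This is a simplified illustration and not the published implementation: the table names, the row layout, and the choice of MD5 are assumptions, and the real software works on Moodle's actual database tables and directories.

```python
import hashlib


def table_hash(rows):
    """Hash a table's rows in a deterministic order."""
    h = hashlib.md5()
    for row in sorted(rows):
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()


def hash_tables(db):
    """Map each table name to the hash of its contents."""
    return {name: table_hash(rows) for name, rows in db.items()}


def tables_to_update(master_hashes, slave_hashes):
    """Tables whose hashes differ (or are missing on the slave) are inconsistent."""
    return sorted(name for name, digest in master_hashes.items()
                  if slave_hashes.get(name) != digest)


def synchronize(master, slave):
    """Unidirectional update: the master's table replaces the slave's table."""
    stale = tables_to_update(hash_tables(master), hash_tables(slave))
    for name in stale:
        slave[name] = list(master[name])
    return stale


# Hypothetical course data; private tables are assumed to be filtered out already.
master = {"course": [(1, "Computer Network")], "quiz": [(1, "Q1"), (2, "Q2")]}
slave = {"course": [(1, "Computer Network")], "quiz": [(1, "Q1")]}

first = synchronize(master, slave)    # only the inconsistent table is replaced
second = synchronize(master, slave)   # nothing is left to update
print(first, second)
```

Only the hashes need to be exchanged to decide what to send, so tables that are already consistent generate almost no traffic.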
3.3 Dump and Upload Based Synchronization
The dynamic content synchronization software application was written solely for Moodle, and back then for Moodle version 1.9. Later Moodle moved to version 2.0, with major changes to the database and directory structure. The software application had to be changed to suit the new Moodle version, but the concept of synchronization remained the same. Moodle continues to develop and is now at version 3.3, though sadly the dynamic content synchronization software application was discontinued at Moodle version 2.0. The author originally tried to continue the software application but found a better approach, named the dump and upload based synchronization model, shown on Figure 3.4. Unlike dynamic content synchronization, dump and upload based synchronization is bidirectional, but limited to a half duplex communication model. In other words, each server can play both master and slave, but only one role at a time. For example, in the first synchronization one server can play the master while the others are slaves, and in the second synchronization the master can switch to a slave and one of the slaves can switch to a master. Another point is that the synchronization uses a pull mechanism, where the slave checks for and requests updates from the master. It is considered more flexible than the push mechanism, where the master forcefully updates the slaves.
3.3.1 Export and Import Feature
While dynamic content synchronization handles everything from scratch, dump and upload based synchronization utilizes the export and import feature that exists in most LMSs. It is a feature mainly for exporting and importing learning contents categorized into courses, which can also be called course contents. The export feature outputs the course contents' database tables and directories into a structured format. The import feature then reads the format and inserts the data into the correct database tables and directories. Formats may differ from one LMS to another, but the method is most likely the same.
Other features are the export and import of course lists, user accounts, and probably others that are not known to the author or used in this thesis. One of the best export and import features is Moodle's, where the course contents can be split further, while other LMSs have to dump the whole course. This way people can choose to get only the contents they are interested in. This opens a path to partial synchronization, where only specific contents or parts of the course are synchronized. Another advantage is the option to include private data, exclude them, or include them anonymized; in other words, it supports privacy. In summary, the advantage of Moodle's export and import feature over those of other LMSs is the ability to secure private data and to split course contents into blocks or micros, as in the screenshot on Figure 3.5. This thesis highly recommends that the export and import features of other LMSs follow in Moodle's footsteps.
3.3.2 Rsync, a Block Based Remote Differential Algorithm
With the previous subsection having explained that course contents can be dumped using the export and import feature, the next step is to perform remote differential synchronization between the two archives. The author chose not to develop a new algorithm but to use an existing one called rsync. The author also did not write a program to perform rsync but used an existing program based on the rsync library (librsync). What the author did was only to make this program work over the hypertext transfer protocol (HTTP), that is, through web browsers, since LMSs are usually web based (rsync is mostly used over secure shell (SSH)). There are three general steps in performing the rsync algorithm between two archives located on different servers, as on Figure 3.6, detailed as follows:
- The archive to be updated is divided into blocks, and for each block two types of hash, or checksum, are calculated and assigned. The checksums are a weak rolling checksum, for example Adler-32, and a strong checksum, for example BLAKE2 or MD5. The checksums are bundled into a signature and sent to the other server. The user can determine the size of the blocks, which affects the accuracy of finding differences. Figure 3.7 illustrates this step.
- The signature is then applied to the latest version of the archive, on the server that holds it. First, the weak checksum is checked block by block in a rolling manner. Second, if a block's weak checksum is identical, the two strong checksums are compared to verify whether the block is really identical. For blocks with identical checksums, their locations are recorded, while other blocks are regarded as new blocks that should be sent to the server with the outdated archive. For checksums in the signature with no matching block found in the latest archive, the blocks of the outdated archive that generated them are regarded as deleted. Based on all of this information, a delta (patch) is generated, containing instructions to alter the blocks of the outdated archive and the new blocks to be inserted. This step is illustrated on Figure 3.8.
- The delta (patch) is sent to the server with the outdated archive and applied to its archive, constructing an archive identical to the latest version, as on Figure 3.9.
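The three steps above can be sketched in a few functions. This is a simplified illustration and not librsync's implementation: the weak checksum is recomputed at every offset instead of being rolled, MD5 is used as the strong checksum, and the signature and delta are plain Python objects rather than a wire format. All function names are hypothetical.

```python
import hashlib
import zlib


def make_signature(old: bytes, block_size: int):
    """Step 1: weak checksum -> {strong checksum: block index} of the outdated data."""
    sig = {}
    for index, start in enumerate(range(0, len(old), block_size)):
        block = old[start:start + block_size]
        weak = zlib.adler32(block)
        strong = hashlib.md5(block).hexdigest()
        sig.setdefault(weak, {})[strong] = index
    return sig


def make_delta(new: bytes, sig, block_size: int):
    """Step 2: scan the latest data and emit ('copy', index) or ('data', bytes)."""
    delta, literal, pos = [], bytearray(), 0
    while pos < len(new):
        window = new[pos:pos + block_size]
        candidates = sig.get(zlib.adler32(window))      # cheap weak check first
        index = None
        if candidates:                                  # strong check only on a weak hit
            index = candidates.get(hashlib.md5(window).hexdigest())
        if index is not None:
            if literal:
                delta.append(("data", bytes(literal)))
                literal = bytearray()
            delta.append(("copy", index))               # block already on the other side
            pos += len(window)
        else:
            literal.append(new[pos])                    # new byte, must be transmitted
            pos += 1
    if literal:
        delta.append(("data", bytes(literal)))
    return delta


def apply_delta(old: bytes, delta, block_size: int) -> bytes:
    """Step 3: rebuild the latest data from old blocks plus transmitted data."""
    out = bytearray()
    for op, value in delta:
        if op == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)


old = b"The quick brown fox jumps over the lazy dog. " * 20
new = old[:100] + b"INSERTED" + old[100:]
signature = make_signature(old, 16)           # computed on the outdated server
delta = make_delta(new, signature, 16)        # computed on the latest server
rebuilt = apply_delta(old, delta, 16)         # applied on the outdated server
literal_bytes = sum(len(v) for op, v in delta if op == "data")
print(rebuilt == new, literal_bytes, len(new))
```

Blocks of the outdated data that are never referenced by a copy instruction simply do not appear in the rebuilt result, which corresponds to the deleted blocks described in the second step.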
Lastly for this subsection, the implementation should be targeted at regions with poor network connectivity. Although transmitting only the differential rather than the whole contents reduces the transmission cost, it is not the only answer to the network stability issue. One network stability issue is a long cutoff in the middle of transmission, which forces the synchronization process to restart. Another is short cutoffs, which make the transmission discontinuous without requiring a restart; however, frequent short cutoffs can corrupt the transmitted data. To solve this unstable network problem, techniques implemented in most download manager applications should also be implemented in the synchronization's transmission. To support resumable download after the transmission is completely cut off, the transmission data are split into pieces. After a cutoff, the transmission can be continued by detecting how many pieces the client has, then requesting and retrieving the remaining pieces from the server. To prevent data corruption, checksums can be used to verify the integrity of the data, in this case the integrity of the pieces. Finally, Figure 3.6 is modified into Figure 3.10.
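The piece-based resumable transfer can be sketched as below. This is a minimal illustration under assumptions: the network is replaced by direct access to the server's list of pieces, SHA-256 is chosen as the piece checksum, and the function names are hypothetical.

```python
import hashlib


def split_into_pieces(data: bytes, piece_size: int):
    """Split the transmission data into fixed-size pieces."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def build_manifest(pieces):
    """Checksums published by the server, one per piece."""
    return [hashlib.sha256(p).hexdigest() for p in pieces]


def missing_pieces(manifest, received):
    """Indices the client still needs: absent pieces or pieces failing verification."""
    return [i for i, digest in enumerate(manifest)
            if i not in received
            or hashlib.sha256(received[i]).hexdigest() != digest]


def resume(server_pieces, manifest, received):
    """Fetch only the missing or corrupted pieces, then reassemble the data."""
    for i in missing_pieces(manifest, received):
        received[i] = server_pieces[i]      # stands in for a network request
    return b"".join(received[i] for i in range(len(manifest)))


data = bytes(range(256)) * 10               # stands in for a delta to transmit
pieces = split_into_pieces(data, 512)
manifest = build_manifest(pieces)

# After a cutoff the client holds one good piece and one corrupted piece.
received = {0: pieces[0], 1: b"corrupted"}
needed = missing_pieces(manifest, received)
restored = resume(pieces, manifest, received)
print(needed, restored == data)
```

The first piece is not requested again, so a cutoff costs at most the pieces that were incomplete or corrupted rather than the whole transmission.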
3.3.3 Experiment Result and Evaluation
With the dump and upload based synchronization prototype created, an experiment was conducted. The experiment took place on several LMSs at their latest versions, which were Moodle 3.3, Atutor 2.2.2, Chamilo 1.11.4, Dokeos 3.0, Efront 18.104.22.168, and ILIAS 5.2. The purpose was to compare the network traffic of full synchronization and incremental synchronization, and to measure the percentage of duplicate data eliminated. The experiment used the author's own original course contents, which mainly consist of three topics, computer programming, computer network, and penetration testing, each consisting of materials, discussion forums, assignments, and quizzes. A snapshot of one of the topics is provided on Figure 3.3.
There are four scenarios. The first is full synchronization, equivalent to transmitting the whole course content, or a full download from the client side. The second is large content incremental synchronization, where the client has only one of the three topics (for Moodle, for example, it updates from 16.5 MB to 30.5 MB). The third is medium content incremental synchronization, where the client already has two of the three topics (for Moodle, for example, it updates from 28.4 MB to 30.5 MB). In both incremental scenarios the client wants to synchronize with the server in order to have all three topics (update). The fourth is no revision, meaning incremental synchronization while there is no update, to test whether there are bugs in the software application; the desired result is almost no network traffic generated. Table 3.1 shows the course content data size when it has one, two, or three of the topics. The data sizes vary depending on the LMS, but the contents, such as materials, discussion forums, assignments, and quizzes, are almost exactly the same.
|LMS||1 Topic||2 Topics||3 Topics|
|Moodle||16.5 MB||28.4 MB||30.5 MB|
|Atutor||336.5 kB||11.7 MB||13.7 MB|
|Chamilo||8.5 MB||20 MB||22 MB|
|Dokeos||27.4 MB||39 MB||41 MB|
|Efront||16.5 MB||28 MB||30 MB|
|ILIAS||439.3 kB||22.8 MB||26.6 MB|
The experiment used the rdiff utility to perform the rsync algorithm between the latest and outdated archives as the incremental synchronization. Before proceeding, it is wise to examine the effect of block size, which, as the previous subsection states, users are free to define. The test was performed on Moodle's archives from Table 3.1, between the archive with one topic (16.5 MB) and the archive with three topics (30.5 MB). The result is on Figure 3.11, showing the relationship between block size, signature size, and delta size; the total transmission cost is the sum of signature and delta. A larger block size means fewer blocks, so fewer checksum sets are generated and the signature is smaller. However, this means less accurate checking, with matching blocks less likely to be detected, which increases the size of the delta. Figure 3.11 shows the delta reaching the full size of the target archive, meaning that no similar blocks were detected and the whole archive was treated as a totally different archive. Incremental synchronization then becomes heavier than full synchronization. Conversely, a smaller block size provides more accurate detection, which reduces the size of the delta. However, this means more blocks and more checksum sets to be bundled into the signature, and as the figure shows, the signature can grow so large that it costs far more to transmit than full synchronization itself. In conclusion, choosing the right block size is crucial to minimize the sum of signature and delta that makes up the transmission cost; in this case a block size of 512 bytes is optimum.
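The signature side of this trade-off can be estimated with simple arithmetic. The sketch below is only a rough model: it assumes 4 bytes for the weak checksum and 8 bytes for the strong checksum per block, ignores any file header, and treats 16.5 MB as 16,500,000 bytes. The per-block sizes actually written by rdiff may differ, and the delta size cannot be estimated this way because it depends on the contents.

```python
import math


def signature_bytes(archive_bytes: int, block_size: int,
                    weak_bytes: int = 4, strong_bytes: int = 8) -> int:
    """Approximate signature size: one checksum pair per block (assumed sizes)."""
    blocks = math.ceil(archive_bytes / block_size)
    return blocks * (weak_bytes + strong_bytes)


outdated_archive = 16_500_000   # the one-topic Moodle archive, assumed decimal MB
estimates = {b: signature_bytes(outdated_archive, b) for b in (128, 512, 2048, 8192)}
for block_size, size in estimates.items():
    print(f"block size {block_size:>5} bytes -> signature of about {size / 1e6:.2f} MB")
```

Each fourfold increase in block size cuts the estimated signature to roughly a quarter, which is why the uplink shrinks with larger blocks while the delta, and therefore the downlink, tends to grow.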
With the relationship of block size to signature and delta discussed, it was still not possible to proceed with the experiment. Given the two archives' sizes, 30.5 MB for the latest and 16.5 MB for the outdated, the delta should ideally be 14 MB, but it strayed as far as 20 MB. The cause was found to be that the rsync algorithm (rdiff) was executed directly on the archive while it was still compressed. The solution is to uncompress the archive beforehand and execute rdiff recursively on all of its contents, which led the author to turn to a more modified utility called rdiffdir.
The experiment succeeded and produced the results on Figure 3.12. Figure 3.12 already includes uplink and downlink; for incremental synchronization, the uplink is determined by the size of the signature and the downlink by the size of the delta (see Figure 3.6). Detailed data are also provided on Table 3.2. However, the purpose of both Figure 3.12 and Table 3.2 is only to show that incremental synchronization is better than full synchronization, in this case through lower network traffic, and that incremental synchronization is able to detect when there are no updates, in this case through almost no network traffic. The main objective remains to eliminate duplicate data during transmission.
|Signature in Mega Bytes||Delta in Mega Bytes|
The percentage of redundant data eliminated in the incremental synchronization scenarios is shown in Table 3.3. The ideal delta is assumed to be the difference in size between the latest and the outdated archive. The duplicate data is the outdated archive itself, or equivalently the latest archive minus the ideal delta; this is the amount that has to be eliminated. The larger the experiment's delta compared to the ideal delta, the worse the result. The performance of the incremental synchronization is then evaluated as the percentage of duplicate data eliminated: the full latest archive minus the experiment's delta, divided by the duplicate data, and converted to a percentage. For large content synchronization, one LMS, ATutor, had a low result of 51.89% due to the size of the generated archive itself (Table 3.1), which dropped the overall average to 85.30%. Apart from ATutor and ILIAS, the percentage of duplicate data eliminated is above 89%. For medium content synchronization, a very high average of 97.90% is achieved, meaning duplicate data are almost completely eliminated. These results, however, were obtained strictly under the optimal block size configuration (Figure 3.11), where the minimum network traffic, consisting of uplink and downlink (affected by signature and delta size), is desired. There is no benefit in 100% duplicate data elimination if the uplink (signature size) is very large.
|In Mega Bytes||Large Content Synchronization||Medium Content Synchronization|
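The percentage defined above can be written as a short calculation. This is a sketch of the stated definition only; the function name is hypothetical, and the input values are the 30.5 MB and 16.5 MB Moodle archives and the 20 MB delta from the earlier compressed-archive case, reused purely as an illustration rather than as entries of Table 3.3.

```python
def duplicate_eliminated_percent(latest_mb: float, outdated_mb: float, delta_mb: float) -> float:
    """Percentage of duplicate data eliminated, following the definition in the text."""
    ideal_delta = latest_mb - outdated_mb   # assumed ideal: only the new data is sent
    duplicate = latest_mb - ideal_delta     # equals the size of the outdated archive
    eliminated = latest_mb - delta_mb       # data that did not have to be transmitted
    return eliminated / duplicate * 100


# Delta of 20 MB obtained when rdiff ran on the still-compressed archive.
print(round(duplicate_eliminated_percent(30.5, 16.5, 20.0), 2))
# Ideal delta of 14 MB: all duplicate data is eliminated.
print(round(duplicate_eliminated_percent(30.5, 16.5, 14.0), 2))
```

Under this definition a delta equal to the ideal delta gives 100%, and a delta as large as the latest archive gives 0%.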
3.3.4 Advantage of Dump and Upload Based Synchronization
With the dump and upload based incremental synchronization model able to eliminate a very large amount of duplicate data, its advantages over the previous dynamic content synchronization can be discussed:
- Since the model relies on existing utilities, mainly the export and import feature of LMSs, one software application can be compatible with every LMS and every version that has this feature. The reason is that the export and import feature is maintained by the LMS developers themselves, so, unlike with a dynamic content synchronization application, there is no need to worry about structural changes in the LMS. The advantage is really on the developer side: when writing a dynamic content synchronization application, the writer needs to coordinate the database and directories, while for dump and upload based synchronization this is already taken care of by the LMS developers.
- Other benefits can also be obtained from the export and import feature, although they depend on the LMS. For example, Moodle can choose whether or not to include private data, so synchronization can have a flexible privacy option. In other LMSs private data is always filtered out, which leaves no option other than retaining privacy during synchronization. As another example, Moodle can split a course into smaller blocks of learning contents and dump only specific learning contents rather than all of them. The synchronization application can therefore be tuned for partial synchronization, meaning other teachers can get only the parts they are interested in. Unfortunately, this is available only on Moodle; other LMSs have to dump the whole course contents.
- Since the method is dumping, it can easily be tuned for bidirectional synchronization, unlike dynamic content synchronization, which is unidirectional. The incremental synchronization uses the pull concept, where the requesting server only asks for the difference from the target server, while the push concept is usually unidirectional, with the master forcefully updating the slaves. Although dynamic content synchronization is claimed to be unidirectional, the author sees that it is possible to modify that software application to be bidirectional because the differential synchronization method is general; however, it is unknown whether it would be as easy to modify as the dump and upload based synchronization.
4 Conclusion and Future Work
A portable and synchronized distributed LMS was introduced to keep contents up to date in environments with severe network connectivity problems. By replacing the servers with hand carry servers, the servers in regions with severed networks were able to move to find network connectivity for synchronization. The hand carry server proved to be very portable because of its very small size and very light weight. Its power consumption is so low that a power bank made for smartphones is enough to run it for almost a whole day. Though very convenient, it has resource limitations, mainly in CPU and memory, which limit the number of concurrent users. Still, the problem of being unable to perform synchronization in areas with no network connectivity is solved.
The incremental synchronization technique was beneficial for synchronization in a distributed LMS, where it eliminates a very large amount of duplicate data. Though incremental synchronization was already proposed for distributed LMSs in the past, this thesis provides a better approach, which is dump and upload based synchronization. The advantages are that it is compatible with most LMSs and most of their versions, that it is easily tunable for bidirectional synchronization, and that, because it uses LMS features, it can be tuned, for example, to configure privacy settings and to perform partial synchronization.
4.2 Future Work
All of the experiments were done in the lab, and it would be better to conduct a real implementation in the future, especially for the hand carry servers. A possible real implementation is to have drones carry the hand carry servers. Performance is still a problem with hand carry servers and calls for enhancement techniques such as integrating a field programmable gate array (FPGA). For incremental synchronization, only the network issue was discussed, not yet resources such as CPU and memory. Although the synchronization in this thesis is bidirectional, a distributed revision control system needs to be implemented for larger collaborations. The distributed LMS here is a replicated system, but there is a better and more flexible trend, especially for content sharing, which is the message oriented middleware (MOM) system; it would be very interesting to implement in the future.
I would like to give my utmost gratitude to the Almighty, who created me and this world, for His opportunity and permission to walk this path as a scholar and for all His hidden guidance.
The first person I would like to thank is my main supervisor, Prof. Tsuyoshi Usagawa, for giving me this topic, and also Dr. Royyana, who researched this topic before me, for their countless pieces of wise advice in perfecting this research. The professor is also the one who gave me the opportunity to enroll in this Master's program at the Graduate School of Science and Technology, Kumamoto University. It was also through his recommendation that I received the Ministry of Education, Culture, Sports, Science and Technology (MEXT) scholarship from Japan. Not to forget his invitation to join his laboratory, and the facilities and comfort that he provided. I would also like to thank him for all the opportunities he gave me to join many conferences, such as in Tokyo, Myanmar, and Hong Kong.
Then I would like to thank the Japanese government for giving me this MEXT scholarship, thanks to which I never had to worry about finances. Instead, I could focus on my studies, research, planning my goals for the future, and helping other people. I also would like to thank my other supervisors, Prof. Kenichi Sugitani and Prof. Kohichi Ogata, for evaluating my research and my thesis.
Next I would like to thank my parents, my family, and my previous university, Udayana University, for not only raising me and allowing me, but also pushing me to continue my studies. I would like to thank my project team, Hendarmawan and Muhammad Bagus Andra, whose work with me on hand carry servers contributed to forming this thesis. I also thank my project team and my friends in the laboratory, Alvin Fungai, Elphas Lisalitsa, Irwansyah, Raphael Masson, and Chen Zheng Yang, who were mostly on my side and even contributed to some degree to all my research. Like my friends at my previous university, with whom I have now parted ways, they often spent the night with me in the laboratory and are friends whom I can trust with my life.
I would like to thank the Indonesian community, Japanese friends, and other international friends who helped me with life here, for example by finding an apartment for me, but mostly for their friendliness. Lastly, I thank all the others who helped me and whom I cannot mention one by one, whether known or unknown, seen or unseen. To all these people, I hope we can continue to work together in the future.
- M. Kelly, “openclipart-libreoffice,” (2017), [computer software] Available: http://www.openclipart.org. [Accessed 27 June 2017].
- S. Paturusi, Y. Chisaki, and T. Usagawa, “Assessing lecturers and students readiness for e-learning: A preliminary study at national university in north sulawesi indonesia,” GSTF Journal on Education (JEd), vol. 2, no. 2, pp. 18, (2015), doi: 10.5176/2345-7163_2.2.50
- Monmon. T, Thanda. W, May. Z. O, and T. Usagawa, “Students E-readiness for E-learning at Two Major Technological Universities in Myanmar,” In Seventh International Conference on Science and Engineering, pp. 299-303, (2016), Yangon, Myanmar.
- O. Sukhbaatar, L. Choimaa, and T. Usagawa, “Evaluation of Students’ e-Learning Readiness in National University of Mongolia,” Educational Technology (ET) Technical Report on Collaborative Support, etc., pp. 37-40 (2017). Soka University: Institute of Electronics, Information and Communication Engineers (IEICE).
- E. Randall, “Mongolian Teen Aces an MIT Online Course, Then Gets Into MIT,” [online] Available: http://www.bostonmagazine.com/news/blog/2013/09/13/mongolian-teen-aces-mit-online-course-gets-mit. [Accessed 27 June 2017].
- N. S. A. M. Kusumo, F. B. Kurniawan, and N. I. Putri, “Learning obstacle faced by indonesian students,” in The Eighth International Conference on eLearning for Knowledge-Based Society, Thailand, Feb. (2012), [online] Available: http://elearningap.com/eLAP2011/Proceedings/paper25.pdf. [Accessed 27 June 2017].
- Miniwatts Marketing Group, “Internet World Stats Usage and Population Statistics,” [online] Available: http://www.internetworldstats.com/stats.htm. [Accessed 27 June 2017].
- Q. Li, R. W. H. Lau, T. K. Shih, and F. W. B. Li, “Technology supports for distributed and collaborative learning over the internet,” ACM Transactions on Internet Technology (TOIT) Journal, vol. 8, issue 2, no. 5, pp, (2008), doi: 10.1145/1323651.1323656.
- F. Purnama, and T. Usagawa, “Incremental Synchronization Implementation on Survey using Hand Carry Server Raspberry Pi,” Educational Technology (ET) Technical Report on Collaborative Support, etc., pp. 21-24 (2017). Soka University: Institute of Electronics, Information and Communication Engineers (IEICE).
- F. Purnama, M. Andra, Hendarmawan, T. Usagawa, and M. Iida, “Hand Carry Data Collecting Through Questionnaire and Quiz Alike Using Mini-computer Raspberry Pi,” International Mobile Learning Festival (IMLF), pp. 18-32 (2017), [online] Available: http://imlf.mobi/publications/IMLF2017Proceedings.pdf. [Accessed 27 June 2017].
- R. M. Ijtihadie, B. C. Hidayanto, A. Affandi, Y. Chisaki, and T. Usagawa, “Dynamic content synchronization between learning management systems over limited bandwidth network,” Human-centric Computing and Information Sciences, vol. 2, no. 1, pp. 117, (2012), doi: 10.1186/2192-1962-2-17
- F. Purnama, T. Usagawa, R. Ijtihadie, and Linawati, “Rsync and Rdiff implementation on Moodle’s backup and restore feature for course synchronization over the network,” IEEE Region 10 Symposium (TENSYMP), pp. 24-29 (2016). Bali: IEEE, doi: 10.1109/TENCONSpring.2016.7519372.
- The World Bank Group. Mobile cellular subscriptions (per 100 people). (2017, March 06). Retrieved from http://data.worldbank.org/indicator/IT.CEL.SETS.P2.
- R. M. Ijtihadie, Y. Chisaki, T. Usagawa, B. C. Hidayanto, and A. Affandi, “E-mail Based Updates Delivery in Unidirectional Content Synchronization among Learning Management Systems Over Limited Bandwidth Environment,” IEEE Region 10 Conference (TENCON), pp. 211-215, (2011), doi: 10.1109/TENCON.2011.6129094.
- R. M. Ijtihadie, Y. Chisaki, T. Usagawa, B. C. Hidayanto, and A. Affandi, “Offline web application and quiz synchronization for e-learning activity for mobile browser,” 2010 IEEE Region 10 Conference (TENCON), pp. 2402-2405, (2010), doi: 10.1109/TENCON.2010.5685899.
- M. Cooch, H. Foster, and E. Costello, “Our mooc with moodle,” Position papers for European cooperation on MOOCs, EADTU, (2015).
- J. W. Hunt, and M. D. McIlroy, “An algorithm for differential file comparison,” Computing Science Technical Report, (1976). New Jersey: Bell Laboratories, [online] Available: https://www.cs.dartmouth.edu/~doug/diff.pdf. [Accessed 27 June 2017].
- T. Usagawa, A. Affandi, B. C. Hidayanto, M. Rumbayan, T. Ishimura, and Y. Chisaki, “Dynamic synchronization of learning contents among distributed moodle systems,” JSET, pp. 1011-1012, (2009).
- T. Usagawa, M. Yamaguchi, Y. Chisaki, R. M. Ijtihadie, and A. Affandi, “Dynamic synchronization of learning contents of distributed learning management systems over band limited network contents sharing between distributed moodle 2.0 series,” in International Conference on Information Technology Based Higher Education and Training (ITHET), (2013). Antalya, doi: 10.1109/ITHET.2013.6671058
- A. Tridgell and P. Mackerras, “The rsync algorithm,” The Australian National University, Canberra ACT 0200, Australia, Tech. Rep. TR-CS-96-05, (1996), [online] Available: https://openresearch-repository.anu.edu.au/handle/1885/40765. [Accessed 27 June 2017].
- Written by Fajar Purnama
- This is a dissertation submitted to the Graduate School of Science and Technology, Computer Science and Electrical Engineering, Kumamoto University, Japan, in September 2020 in partial fulfillment of the requirements for the degree of Doctor of Philosophy. It was not published, so the copyright remains with me, "Fajar Purnama", the main author; I have the authority to repost it anywhere and I claim full responsibility, detached from Kumamoto University. Except for contents marked with copyright (©), I hereby license it under a customized CC-BY-SA where you are also allowed to sell my contents, on the condition that you mention that the free and open version is available here. In summary, the mention must contain the keywords "free" and "open" and the location, such as the link to this content.
- The presentation is available at Slide Share.
- The source code is available at Github.
Declaration of Authorship
I, Fajar PURNAMA, declare that this thesis titled “Development of a Lossy Online Mouse Tracking Method for Capturing User Interaction with Web Browser Content” and the work presented in it are my own. This thesis is based on a few of my publications, and I hereby confirm that I have permission to reuse them:
- For my journal paper titled "Implementation of real-time online mouse tracking on overseas quiz session" (Purnama et al., 2020b), the copyright was transferred to Springer Science+Business Media, LLC, part of Springer Nature but the authors and I have been granted full permission to reuse the accepted version of the journal paper.
- My journal paper titled "Using real-time online preprocessed mouse tracking for lower storage and transmission costs" (Purnama and Usagawa, 2020) is open access under Creative Commons (CC-BY), where anyone can reuse the whole material.
- For my proceeding paper titled "Rsync and Rdiff implementation on Moodle’s backup and restore feature for course synchronization over the network" (Purnama, Usagawa, et al. 2016), the copyright was transferred to IEEE, but the authors and I do not need formal permission to reuse the accepted version of the proceeding paper.
- For my technical report titled "Incremental Synchronization Implementation on Survey using Hand Carry Server Raspberry Pi" (PURNAMA and USAGAWA 2017), the copyright was transferred to IEICE but the authors and I have been granted full permission to reuse the published version of the report paper (IEICE, 2015).
- For my proceeding paper titled "Demonstration on Extending The Pageview Feature to Page Section Based: Towards Identifying Reading Patterns of Users" (Purnama, Fungai, and Usagawa 2016), the copyright was not transferred, thus the copyright remains with the authors.
- More detailed information is available in Appendix B.
Though people are confined to their houses due to COVID-19, they are forced to continue their activities online. The demand for tools to monitor these activities is increasing, for example to make sure that students read the materials and that examinees do not cheat during online examinations. Unfortunately, conventional web logs cannot monitor those kinds of activities. One monitoring tool is mouse tracking, which tracks the actions of the mouse cursor, including clicks, movements, and scrolls, and thereby covers the majority of online users’ interactions with browser contents. Though mouse tracking is promising, very few have implemented it because (1) previous mouse tracking tools require desktop installation, which is bothersome for users, and (2) there are rumors that mouse tracking generates big data, such as the saying that a swipe from left to right generates a megabyte of data. This thesis tackles those problems by building a mouse tracking server application that is easily installable and does not require users to install any application other than the web browser. The application was implemented in an overseas quiz session between the National University of Mongolia and Kumamoto University, where the amount of data generated was also investigated. This thesis also contributes a lossy online mouse tracking method that can greatly reduce the amount of data generated. Finally, some visualizations of the mouse tracking data are shown, and possible applications such as online examination cheating prevention and forced reading of terms of service are discussed.
My first gratitude goes to my supervisor, Prof. Tsuyoshi Usagawa, for taking care of me for five years, from the start of my Master’s program until the end of my Doctoral program. His deeds are almost immeasurable because without him, Kumamoto University, and the Ministry of Education, Culture, Sports, Science and Technology of Japan, what have so far been the best five years of my life may not have been possible. I would like to thank my reviewers, Prof. Kohichi Ogata, Prof. Kenichi Sugitani, Prof. Masahiko Nishimoto, and Prof. Masayoshi Aritsugi, for their time in reviewing this thesis. I greatly thank my friend Alvin Fungai as the co-founder of this topic; without him, the topic of this thesis would have been different and I might have been late in finishing it, because I tried other topics and found them much more difficult or simply not suited to me. The critical development phase of this research was thanks to the Computer Algorithm class by Prof. Masayoshi Aritsugi and all of its participating members, including Hendarmawan, Hamidullah Sokout, Alhafiz Akbar Maulana, and Sari Dewi; it was in those moments that I decided on this topic for my Doctoral thesis. The implementation and data were thanks to Dr. Otgontsetseg Sukhbaatar, Prof. Lodoiravsal Choimaa, and the students of the School of Engineering and Applied Sciences, National University of Mongolia; without them, this topic might not have made it to two international journal publications, and this thesis might not have been completed. Lastly, I would like to thank my mother Linawati, my father Teddy Junianto, and Ni Nyoman Sri Indrawati for their daily support.
Table of Contents
- Online Mouse Tracking Implementation and Investigation
Online Mouse Tracking Resource Saving Methods
- Existing Methods
- Real-Time Online Mouse Tracking
- Lossy Online Mouse Tracking
- The Depth Levels of Logs
- Conclusion and Future Work
List of Figures
- In Chapter 1:
- In Chapter 2:
- Mouse Tracking Illustration.
- DOM representation of Table 2.1 (Purnama and Usagawa, 2020). The html tag is the parent with head, body, and footer tag as the children. Head has a child tag title, body has a child tag p, and footer has a child tag p.
- Mouse Tracking Chrome extension.
- Mouse Tracking Plugin on Moodle.
- Online Mouse Tracking Framework.
- Moodle Plugin Install.
- P2P real-time mouse tracking experiment.
- A plot of data rate generated by a user based on the events generated per second ©(Purnama et al., 2020b). The horizontal axis represents the events per second or frequency in hertz (Hz) and the vertical axis represents the data rate in kilobytes per second. The different colored lines represent the number of variables included (refer to Table 2.2).
- Overseas real-time online mouse tracking implementation.
- Moodle Log.
- Moodle Grade.
- Screenshot of mouse tracking data of students from National University of Mongolia who attempted a quiz session on a Moodle server at Kumamoto University ©(Purnama et al., 2020b).
- Total query/rows/events generated by each students during mouse tracking implementation between National University of Mongolia and Kumamoto University and its estimated total data transmission size ©(Purnama et al., 2020b). The horizontal axis represents individual students, primary vertical axis is the query/rows/events, and secondary vertical axis is the estimated data transmission size.
- In Chapter 3:
- Data rate during mouse tracking implementation between National University of Mongolia and Kumamoto University. The horizontal axis represents 10 minute interval time and the vertical axis represents the data rate in kilobytes per second. The yellow horizontal line shows the average and the vertical lines shows the minimum and maximum during their respective interval ©(Purnama et al., 2020b).
- Flowchart of mouse tracking ©(Purnama et al., 2020b): offline (left), online (middle), real-time and online (right).
- Illustration of bottleneck network in regular online mouse tracking and real-time online mouse tracking as a solution ©(Purnama et al., 2020b).
- Whole page vs region of interest vs default mouse tracking illustration. The left scroll illustrates summarized event amount that summarizes the number of events occurring on the whole page; the middle scroll illustrates ROI tracking that summarizes the number of events occurring in defined areas, and the right scroll illustrates default mouse tracking that records every event and the precise point where it occurs, forming a trajectory.
- Three Types of Mouse Tracking Flowchart. The left flowchart is default mouse tracking, the middle flowchart is summarized event amount, and the right flowchart is region of interest mouse tracking (Purnama and Usagawa, 2020).
- In Purnama and Usagawa, 2020 the simulation is based on Figure 2.10. In this thesis, the server is changed to single board computer Raspberry Pi 3. The reason is to support regions with limited connectivity in Figure 3.7.
- Even though the ownership of computer and mobile devices increase drastically, the pace of Internet penetration may not be as fast. Those who are in limited connectivity region may not be able to enjoy online quizzes, let alone mouse tracking. Therefore Purnama et al., 2017 offers a hand carry server solution where the students’ computer devices can connect to the teachers’ single board computer server that runs quiz and mouse and touch tracking.
- The total script running time of three mouse tracking demo sessions by the author. The horizontal axis is the mouse tracking method. The data in order are from Mozilla Firefox, Microsoft Edge, and Google Chrome. The vertical axis is the total running time in milliseconds. Among the three browsers Mozilla Firefox performs faster than Microsoft Edge and Internet Explorer performs faster than Google Chrome for this work u, 2020.
- CPU and RAM usage and data rate comparison between default mouse tracking, summarized event amount, and ROI mouse tracking.
- Suppose there are two quiz sessions like the one in this thesis. The teacher has to synchronize the data twice: after the first session and after the second session. Although the human mind knows that it is better to update, computers today still do not operate that way. Even the default copy on most people's desktops still copies the whole data and replaces the old, as shown on the left. Today, a separate application must be used to perform incremental synchronization, shown on the right, which is able to calculate the difference between the old and new data ©(PURNAMA and USAGAWA, 2017).
- A detailed illustration of the rsync algorithm procedure. In summary, the steps are splitting the data into blocks, scanning for relocated blocks, and scanning for blocks that do not exist, which are either new blocks to be added or unused blocks to be deleted. Finally, relocation, addition, and deletion are executed based on the information obtained from the scanning (Purnama, 2017).
- In Chapter 4:
- Six levels of web logs, in order from shallowest to deepest, are Internet, websites, categories, web pages, area, and coordinates.
- Six levels of educational data, in order from shallowest to deepest, are Internet, academies, courses, course contents, area, and coordinates.
- Web Log vs Eye Tracking.
- Inactive Query Time Domain.
- An exam detector that tracks unwanted activities of participants such as mouse leaving the exam, tab and meta button to leave the exam, and other events indicating exam leaving.
- Mouse Tracking Heatmap.
- Mouse activity heatmap in quiz page locations in time series. The horizontal axis represents 10 minute time intervals and the vertical axis represents quiz page locations. For the heatmap, green is close to minimum activity, yellow is close to the second quartile, and red is close to maximum activity.
- Mouse activity heatmap in quiz page locations for each student. The horizontal axis represents quiz page locations and the vertical axis represents the anonymized students. For the heatmap, green is close to minimum activity, yellow is close to the second quartile, and red is close to maximum activity.
- Grade Heatmap.
- Illustration of force reading based on the duration of the mouse cursor stays in an area. The left example shows that the mouse cursor did not stay long enough in each area and tells the user to read everything, the middle example shows that the mouse cursor did not stay long enough in middle area and tells the user to complete reading middle area, and the right example shows satisfaction in user’s reading.
- Left Click Visualization.
List of Tables
- In Chapter 1:
- In Chapter 2:
- A web page code in simple HTML that contains html, head, title, body, p, and footer tags (Purnama and Usagawa, 2020)
- The data generated of one click posted to the server ©(Purnama et al., 2020b). The rows before the last row are the types of information, and the last row shows the data rate of the submitted post (Purnama et al., 2020a).
- Comparison of mouse tracking data size to daily pageview (monthlypageview), Moodle log and grades, Nasa server log 1995 (nasadata1995), Open University learning analytics dataset (openuniversitydata), and HarvardX Person-Course 2013 (DVN/26147_2014) ©(Purnama et al., 2020b).
- In Chapter 3:
- In Chapter 4:
Thanks to the development of information and communication technology (ICT), humanity lives in convenience. It is no longer necessary to spend much effort to seek information. In the past, people needed to travel to libraries to find books, buy newspapers to get the latest news, gather in a community to hear the latest rumors, or even start a pilgrimage to find a master. Nowadays, most information is available on the Internet. With ownership of portable computer devices that can connect to the Internet from anywhere becoming mainstream, anyone can search for the information they desire (Dentzel, 2013).
The Internet is not only a massive open source of information where anyone can publish, but also a tool for remote activities. People can interact with each other without meeting, through text, voice, or video messages, regardless of time and place. More people no longer go to shops but order items through online shopping. In some countries like Indonesia, applications have been developed for ordering a variety of services online (Azzuhri et al., 2018), such as meal delivery, house cleaners, therapists, etc.
Due to the recent COVID-19 pandemic that occurred in early February 2020, most regions are in lockdown, where people are to stay away from each other (mostly asked to stay at home) to prevent the spread of infection. Even schools are closing: most governments around the world have temporarily closed educational institutions in an attempt to contain the spread of the COVID-19 pandemic (UNESCO, 2020). All forms of activities are recommended to be done online, including educational activities, where courses are switched from face to face to online. The basics of an online course as known today are materials provided online, an online text discussion forum, a feature to submit assignments online, online quiz sessions (Linawati, Wirastuti, and Sukadarmika, 2017), and features to analyze and evaluate students' performance. For interactivity, people prefer to join live streaming videos, webinars, online game sessions, interactive online programming, etc.
Unfortunately, conventional web analytics does not measure up to how teachers examine or analyze students during face to face private tutoring. Teachers are normally able to examine students' attention, emotion, and motivation while studying in real time, but conventional web analytics does not provide such features for online education. This is especially true for a very crucial educational activity, which is examination. Security is very tight for face to face examinations to prevent dishonest behavior, but this is not true for online examinations today. This is why most educational institutes implement blended learning (Paturusi, Chisaki, and Usagawa, 2012), which mixes face to face and online courses, rather than a fully online course. This applies to anything online, not only education; for example, during shopping, shop owners are able to identify the interests of their customers face to face and act accordingly. As the simplest example, people can see face to face whether someone is skimming or paying close attention while reading. In online reading, people normally cannot know whether the viewer is actually reading the materials or not. One crucial example is detecting whether agreements or terms of service are read. Most people scroll down and accept the terms of service without actually reading them.
The lack of data for online analytics can actually be solved by eye tracking, mouse tracking, and other online monitoring techniques in real time. Although these techniques were introduced in the early 20th century, they are still rarely implemented. One of the main reasons is the huge amount of data generated by these techniques, which is too much for most administrators and analysts to handle (Leiva and Huang, 2015). This connects to the next reason: previous applications only suit academia and not wide implementation. For eye tracking, the hardware is intrusive, and users usually have to wear goggles. Non-intrusive devices exist, but they are most likely expensive. Mouse tracking is non-intrusive and costs nothing because a mouse is available by default on every computer and no additional hardware is needed. However, previous applications are only suited to the laboratory, where they are installed offline on each computer and not online. This thesis tackles that problem.
- There are almost no applications to monitor crucial online activities such as examinations.
- Although there are rumors of huge data generated by mouse tracking, there are almost no published facts or investigations.
- The rumors already discourage the development of mouse tracking applications for public use, and most of today's mouse tracking applications are only suitable for academia and laboratories.
- The amount of data generated is proportional to the resources required for implementation, thus methods for reducing data generation are necessary.
- Create an online mouse tracking application that is easily implementable.
- Investigate the data generated and resource usage of the online mouse tracking application.
- Implement methods to reduce the data generated and resource usage of the online mouse tracking application.
- Use mouse tracking data to capture users' interaction with the web browser content and design a monitoring tool for crucial online activities which are examinations and passage reading.
This thesis proposes a new preprocessing based on demand method specifically for online mouse tracking. It is a method that allows implementers to determine the data they need before implementation. Among those data, the geometrical data (x and y mouse coordinates) are the largest portion generated. Most of the time, implementers do not need all of the data. Therefore, data generation and resource usage can be reduced if they choose the regions of interest beforehand. In short, summarizing the coordinates into areas reduces the data generated, which in turn reduces the resource usage.
- Created an open source real-time online mouse tracking application that can be implemented on any website and browser.
- Investigated the data generated and resource usage of the real-time online mouse tracking application.
- Proposed a novel preprocessing based on demand method specifically for mouse tracking that reduces the data generation and resource usage.
- Implemented the mouse tracking application online and obtained mouse tracking data.
- Visualized the mouse tracking data and derived information that is usually underivable from conventional web logs and educational data.
- Designed a possible software implementation for monitoring online reading and examination.
1.6 Benefit and Significance
- Mouse tracking is one of the missing keys for anything that is implemented fully online.
- Anyone can benefit from the open source real-time online mouse tracking application in this thesis to implement or further develop online mouse tracking.
- The mouse tracking data generation and resource usage investigation can help companies and other parties to plan before implementing online mouse tracking.
- The methods presented to reduce mouse tracking data generation and resource usage give people in limited connectivity areas the opportunity to utilize online mouse tracking.
1.7 Thesis Structure
Other than the introduction, this thesis contains four more chapters. The second chapter, online mouse tracking implementation and investigation, discusses the implementation of online mouse tracking on any website and browser and the amount of data generated. The third chapter, online mouse tracking resource usage reduction methods, discusses known methods, real-time implementation, and the novel method of preprocessing based on demand. The fourth chapter, the depth levels of web logs and educational data, emphasizes mouse tracking logs as deeper level data than conventional educational data logs. The last chapter is conclusion and future work.
2 Online Mouse Tracking Implementation and Investigation
2.1 System Overview
2.1.1 Mouse Tracking in Web Development
The core of mouse tracking in web development is the Document Object Model (DOM), which is an Application Programming Interface (API) for Hypertext Markup Language (HTML) and Extensible Markup Language (XML) documents. It defines the logical structure of documents and the way a document is accessed and manipulated. Suppose a simple HTML page with the code in Table 2.1; its DOM structure can be represented as in Figure 2.2. With the Document Object Model, programmers can build documents, navigate their structure, and add, modify, or delete elements and content. Anything found in an HTML or XML document can be accessed, changed, deleted, or added using the Document Object Model, with a few exceptions. DOM is designed to be used with any programming language. Currently, it provides language bindings for Java and ECMAScript (an industry-standard scripting language based on JavaScript (JS) and JScript) (Wood et al., 1998).
Table 2.1 A web page code in simple HTML that contains html, head, title, body, p, and footer tags (Purnama and Usagawa, 2020)
The implementation of mouse tracking is based on DOM events, specifically mouse, touch, and User Interface (UI) events, which are actions that occur as a result of the user's mouse actions or as a result of a state change of the user interface or of elements of a DOM tree (Pixley et al., 2000). In this thesis jQuery is used to access the DOM API and receive information that is related to mouse, touch, and UI events. The following list shows the mouse events utilized in this thesis:
- Mousedown: when any one of the mouse buttons is pressed (usually the left, middle, or right button)
- Mouseup: when a pressed mouse button is released
- Mousemove: when the mouse cursor moves
- Mouseleave: when the mouse leaves an element (used here only to indicate temporarily leaving the webpage)
- Mouseenter: when the mouse enters an element (used here only to indicate returning to the webpage)
- Scroll: when the webpage scrolls
- Touchstart: when the screen of a device is touched
- Touchend: when a touch from touchstart is removed
- Touchmove: when a touch is moving
- Touchcancel: when a touch is interrupted
- Resize: when the webpage is zoomed in or out
There are many DOM events that are not implemented by the application in this thesis. They may be implemented in the future if they are found to be useful. For now, the following DOM events other than mouse events are worth considering and are implemented:
- Beforeunload: when the webpage is about to close
- Resize: when the webpage is zoomed in or out
- keypress: when a keyboard key is pressed
- cut: when the user attempts to cut content
- copy: when the user attempts to copy content
- paste: when the user attempts to paste content
- dblclick: when a double click is performed
- contextmenu: when a right click menu is called
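As a minimal sketch of how the events listed above could be captured, the snippet below registers one listener per event name. It uses the plain DOM `addEventListener` call rather than jQuery, and the helper name `attachTracker`, the record fields, and the choice of a single target are assumptions made for illustration, not the thesis application's actual code. The standard DOM name `contextmenu` is used for the right click menu event.

```javascript
// Event names taken from the two lists above (standard DOM spellings).
const TRACKED_EVENTS = [
  "mousedown", "mouseup", "mousemove", "mouseleave", "mouseenter",
  "scroll", "touchstart", "touchend", "touchmove", "touchcancel",
  "resize", "beforeunload", "keypress", "cut", "copy", "paste",
  "dblclick", "contextmenu",
];

// attachTracker is a hypothetical helper: it registers one listener per
// event name on any EventTarget and forwards a small record to `onEvent`.
function attachTracker(target, onEvent) {
  const handlers = TRACKED_EVENTS.map((name) => {
    const handler = (e) =>
      onEvent({
        type: e.type,
        // pageX/pageY are only present on mouse events; null otherwise.
        x: typeof e.pageX === "number" ? e.pageX : null,
        y: typeof e.pageY === "number" ? e.pageY : null,
      });
    target.addEventListener(name, handler);
    return [name, handler];
  });
  // The returned function removes every listener again.
  return () => handlers.forEach(([n, h]) => target.removeEventListener(n, h));
}
```

In a browser, something like `attachTracker(document, console.log)` would be a starting point, although which target each event should be bound to (`window`, `document`, or a specific element) depends on the page.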
After implementing the DOM events, the information is processed by adding important labels. The first labels are time information, such as the date of the received event and the duration, calculated as the difference between the current and the previously received event. The second labels are place information, such as the category, page, post, course, or course content; if that information is not available, the default is the Uniform Resource Locator (URL). More in-depth place information is the area or section of the page, and the deepest of all is the coordinates on the page. The third label is the identity label, if available and permitted, such as the name, email address, IP address, and location of the user.
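The labeling step described above can be sketched as a small pure function. The function name `labelEvent`, the field names, and the `identityPermitted` flag are all hypothetical and chosen only for illustration; the actual application may structure its records differently.

```javascript
// labelEvent is a hypothetical helper that attaches the three kinds of
// labels (time, place, identity) to a raw event record.
function labelEvent(raw, context, previousTimeMs, nowMs) {
  return {
    ...raw,
    // Time labels: date of the event and duration since the previous one.
    date: new Date(nowMs).toISOString(),
    durationMs: previousTimeMs === null ? 0 : nowMs - previousTimeMs,
    // Place label: a named place if known, otherwise fall back to the URL.
    place: context.course || context.page || context.url,
    // Identity label: only attached when available and permitted.
    user: context.identityPermitted && context.user ? context.user : null,
  };
}
```

Passing the timestamps in as arguments, instead of reading the clock inside the function, keeps the sketch deterministic and easy to replay against recorded logs.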
2.1.2 Online Mouse Tracking System
The author developed an online mouse tracking application implementable on any website, with the code available as open source on GitHub (Purnama, 2019). It is written in HTML, Cascading Style Sheets (CSS), JS, jQuery, and PHP. The mouse tracking code can be implemented either on the client side, shown in Figure 2.3, or on the server side, shown in Figure 2.4. The difference is that the client side can capture anything, including all the web pages that the user visits, while the server side can only capture the events that happen on the server's website.
Figure 2.5 shows a more detailed server side implementation. The mouse, touch, and UI DOM events in the previous subsection are written in JS and jQuery and are placed on the presentation side, which is the website, along with the HTML and CSS. The steps of online mouse tracking in Figure 2.5 are:
- The browser attempts to visit the website by requesting HTML, CSS, and JS. If the mouse tracking is written as a server application, then the code is in the JS section, otherwise it is directly installed on the client. The code is written in jQuery.
- The HTML, CSS, and JS are sent to the client.
- The browser renders the page by processing the HTML and CSS.
- JS, together with its jQuery library, is categorized as a client side programming language. It runs in the browser's background, so in this case the mouse tracking runs in the background.
- What differentiates offline and online mouse tracking is the location where the mouse tracking log is stored. Offline mouse tracking stores the logs on the client while online mouse tracking stores the logs on an online server. When storing the mouse tracking log online, the client side sends the log using the Hypertext Transfer Protocol (HTTP) POST method.
- The server processes the received log, usually using a server side programming language such as PHP.
- The log can be stored as a file, in a database, or in any form of storage.
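The transmission step in the list above can be sketched as follows. The helper `buildLogRequest`, the endpoint path `/mousetrack/log.php`, and the JSON body format are assumptions for illustration only; the thesis application uses jQuery and PHP and may encode its logs differently.

```javascript
// buildLogRequest is a hypothetical helper that prepares an HTTP POST
// request for one log record. The endpoint path is a made-up placeholder.
function buildLogRequest(record, endpoint = "/mousetrack/log.php") {
  return {
    url: endpoint,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // The record is serialized once; the server decodes it again.
      body: JSON.stringify(record),
    },
  };
}
```

In a browser the result could be handed to the standard Fetch API, for example `const req = buildLogRequest(record); fetch(req.url, req.options);`, after which the server side script decodes the body and stores it.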
For the server application, the advantage is that the client does not need to install an additional application: users just browse the website and mouse tracking runs automatically. The disadvantage is that it cannot track outside of the website, although it can still tell whether users are leaving the page or not. The server hardware depends on the number of users that the administrators want to handle; the hardware specification used in this thesis is discussed in the next section. For the software, a standard web server is enough, such as a server equipped with Apache2, PHP, and MySQL. The author made the installation easy: all that is needed is to download the code and install it. In this thesis, the mouse tracking server application was implemented on a Learning Management System (LMS) called Moodle, which is used to handle online courses. The mouse tracking code was rearranged as a Moodle plugin, where the author made a block plugin and a theme plugin for Moodle, as shown in Figure 2.4. For usage, choose only one form of the plugin, either block or theme. The installation, shown in Figure 2.6, is also easy: download the plugin, upload it to Moodle, and install.
2.1.3 Privacy Policies
Privacy policies should be disclosed to users during any form of data gathering. The European Union (EU) is stricter, requiring that cookie policies be separated from privacy policies. Disclosing privacy policies not only complies with laws and regulations but builds trust with users as well (PrivacyPolicies.com, 2020).
Based on how mouse tracking is executed, illustrated in more detail in Figure 2.5, users actually have full control over the mouse tracking process and can stop it at any time, but they are usually unaware of it because mouse tracking runs in the background. They would have to thoroughly inspect the background processes to see the running mouse tracking, and most users do not attempt this because they do not feel bothered by the process. This is the reason why mouse tracking is considered non-intrusive.
2.2 Network Data Transmitted by One Click
Leiva and Huang, 2015 stated that a mouse swipe from left to right can generate hundreds of cursor coordinates and that a minute of mouse activity can generate 1 MB (megabyte) of data. Huang, White, and Dumais, 2011 conducted massive scale mouse tracking on Microsoft’s Bing search engine, but in the middle of the experiment they had to reduce the sampling rate because the data size was simply too much. Those two references are the only scientific records found that report the problem of huge data generated by mouse tracking. This shows that the data generated and the resource usage have not been formally investigated. Therefore, an implementation followed by an investigation was conducted by Purnama et al., 2020b.
2.2.1 Peer to Peer Experiment
The one click Peer-to-Peer (P2P) experiment measures the amount of data transmitted from the client to the server when the user performs one click, as shown in Figure 2.8. This experiment greatly helps the investigation because the result can be used to predict the data cost mathematically. However, the result depends on the application, and as time passes people may find ways to reduce the data.
The online mouse tracking application was installed on the author’s Moodle server. The resource costs were then measured. The data rate of the network was measured using a tool called Wireshark. The server is an Ubuntu 18.04 Long Term Support (LTS) server equipped with an Intel(R) Core(TM) i7-6800K Central Processing Unit (CPU) @ 3.40 gigahertz (GHz) (with SSE4.2), 32 gigabytes (GB) of DDR4 Random Access Memory (RAM), 10 terabytes (TB) of hard drive storage, and an allocated 2 megabytes per second (MBps) network.
2.2.2 Data Generation Estimation for Implementation Plan
The result in Table 2.2 shows that one click generates around 3-4 kilobytes (kB) of transmission data. In other words, the mouse tracking application generates around 3-4 kB when one event occurs. The size depends on the metadata; in this case the size greatly increases when the date and URL are included because they contain many characters.
|data rate (kB)||3.11||3.14||3.14||3.2||3.2||3.22||3.25||3.29||3.43||3.56||3.64||3.72|
If the administrator can estimate the number of users and the average number of events generated by each user, then the administrator can estimate the amount of data to be generated. Rheem, Verma, and Becker, 2018 state that a very high activity is around 70 events per second. Based on Figure 2.9, a worst case scenario should therefore be expected in which a single user generates a data rate of 210-280 kilobytes per second (kBps).
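The estimate above is a simple product, which a planning script could compute as in the sketch below. The function name `estimateDataRateKBps` and its parameters are hypothetical; the figures of 70 events per second and 3-4 kB per event are the ones quoted above.

```javascript
// estimateDataRateKBps is a hypothetical planning helper.
// users: number of concurrent users
// eventsPerSecond: events generated by one user per second
// kBPerEvent: transmitted size of one event in kB
function estimateDataRateKBps(users, eventsPerSecond, kBPerEvent) {
  return users * eventsPerSecond * kBPerEvent;
}

// Worst case for a single user, using the figures quoted above.
const worstCaseLow = estimateDataRateKBps(1, 70, 3);  // 210 kBps
const worstCaseHigh = estimateDataRateKBps(1, 70, 4); // 280 kBps
```

The same function scales linearly with the number of users, so an administrator could substitute an expected class size and a more typical event rate to plan the required bandwidth.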
2.3 Overseas Online Mouse Tracking Implementation
2.3.1 Quiz Details
An online quiz session was conducted on the 3rd of January 2019 between approximately 12:00 and 14:30 Japan standard time. There were 2 sessions, with each session lasting approximately an hour and including 20 and 21 students (41 total students participating) from the School of Engineering and Applied Sciences, National University of Mongolia accessing the Moodle server at the Human Interface and Cyber Communication Laboratory, Kumamoto University. The map illustration is shown on Figure 2.10.
The quiz is a part of the mid-term exam of the Microprocessor and Interfacing Techniques course for sophomore and junior year students in the Department of Electronics and Communication Engineering, National University of Mongolia. The quiz is on https://md.hicc.cs.kumamoto-u.ac.jp. Figure 2.11 shows a screenshot of the Moodle log file and Figure 2.12 shows a screenshot of the students' grades for the quiz session. The detailed anonymous log files are published in Mendeley Data (Purnama et al., 2020a). The internet protocol (IP) addresses of the students, for example “22.214.171.124”, can be traced by geo-location to Mongolia, and “https://md.hicc.cs.kumamoto-u.ac.jp”, which nslookup resolves to “126.96.36.199”, originates from Japan.
2.3.2 Amount of Data Generated
A screenshot of the mouse tracking log can be seen in Figure 2.13. Based on the data shared in Mendeley (Purnama et al., 2020a), the majority of the events are mouse movements and scrolls. That is because each change in either the mouse cursor or the scroll position is captured. Rapid mouse movements or scrolls generate a large amount of data, and how much depends on the capabilities of the computer. Theoretically, if the mouse cursor travels a distance of 1000 pixels then the number of mouse movement events generated is 1000, and if the scroll distance from top to bottom is 1000 pixels then the number of scroll events generated is 1000. In short, the capturing of geometrical data, which are the x and y coordinates of the mouse cursor and scroll, is the cause of the huge data generation. The effect is also multiplied by the number of labels attached, such as the identity of the user who performed the events, the place, and the time the events occurred. Just removing the URL label can save a lot of data space.
During the quiz session, Figure 2.14 shows that a single student is capable of generating a total of over 20000 events, which is over 80 megabytes (MB) of transmission data. This means that the student had to upload 80 MB of data at the end of the student's mouse tracking session on each page. According to Ookla, 2020 the global average network speed is 9.3 MBps downlink and 3.9 MBps uplink. This means there exist countries with an average network speed below that. Although it is nowadays common for university size institutions to have a network speed over 100 MBps, those resources are usually already allocated for many things. For example, the author's laboratory was only given a 2 MBps network speed, meaning the mouse tracking session can flood the network. This explains why administrators are reluctant to implement online mouse tracking. Imagine how much data can be generated if online mouse tracking is implemented by the whole university daily and full time.
The amount of mouse tracking data was almost incomparable to page views and other conventional web analytics. Table 2.3 shows that the Moodle log and grades of the quiz session were only a few hundred kilobytes while the mouse tracking log was already over a hundred megabytes. The table also shows other logs that required a long duration and many users to reach the amount of data that the mouse tracking log has. While a few hard drives are enough to store conventional web and educational logs, many more hard drives are needed to store mouse tracking logs.
|Log||Duration||Users||Size|
|Daily Pageview City Archive||2 Month||-||13 kB|
|Moodle Log and Grades||3h 30min||41||191 kB|
|Mouse Tracking||3h 30min||41||122 MB|
|Nasa Server Log 1995||23 days||-||153 MB|
|Open University Learning Analytics||1 Year||32593||442 MB|
|HarvardX Person-Course 2013||1 Year||301609||33.8 MB|
3 Online Mouse Tracking Resource Saving Methods
It is unfortunate that the resource usage of online mouse tracking is too much for regular people to implement daily and full time, leaving it only for special occasions such as examinations. The ones who can implement online mouse tracking daily and full time are big institutions such as Amazon and Google. Therefore, this chapter discusses the novel method of this thesis to reduce the resource usage of online mouse tracking.
3.1 Existing Methods
Existing methods to reduce mouse tracking data transmission are common sense and popular methods, most of which were discussed by Purnama et al., 2020b. They are:
- Redundant data reduction, which is mostly about reducing metadata such as shortening the date format, shortening the URL, avoiding duplicate or repetitive data, and excluding information deemed unnecessary.
- Sampling rate reduction, which adds a delay to the event capturing. The default is to capture immediately, that is, every time the mouse cursor or scroll moves even by only one pixel. With sampling rate reduction, there are pauses in the capturing process, for example every 50 milliseconds, 1 second, or 2 seconds, where the longer the interval the greater the data reduction but at the cost of data resolution.
- Adaptive sampling, where the application does not capture if the mouse cursor and scroll are idle, unlike usual eye tracking where the eye gazes are captured at a fixed interval even though the gaze's position does not change.
- Compression methods which were researched by Leiva and Huang, 2015 and Martín-Albo et al., 2016.
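Two of the methods listed above, sampling rate reduction and adaptive sampling, can be sketched together as a small filter. The helper name `createSampler` and the sample shape `{ t, x, y }` (time in milliseconds and cursor position) are assumptions for illustration, not the implementation used by the cited works.

```javascript
// createSampler is a hypothetical helper combining two ideas:
// - sampling rate reduction: keep at most one sample per `intervalMs`
// - adaptive sampling: drop a sample whose position equals the last kept one
function createSampler(intervalMs) {
  let lastTime = -Infinity;
  let lastX = null;
  let lastY = null;
  return function shouldKeep(sample) {
    const moved = sample.x !== lastX || sample.y !== lastY;
    const due = sample.t - lastTime >= intervalMs;
    if (!moved || !due) return false;
    // Remember the sample that was kept for the next comparison.
    lastTime = sample.t;
    lastX = sample.x;
    lastY = sample.y;
    return true;
  };
}
```

A tracker would call `shouldKeep` on every incoming sample and forward only those for which it returns `true`; a larger `intervalMs` trades data resolution for a lower data rate, as noted in the list.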
3.2 Real-Time Online Mouse Tracking
The conventional data transmission method is to transmit the data as a single package at the end of each mouse tracking session. Based on Figure 2.14, this conventional transmission method floods the 2 MBps network. The author anticipated this and implemented a real-time transmission method (Purnama et al., 2020b), which avoided the frequent flooding of the 2 MBps network and reduced the data rate to an average of around 100 kBps. Although the average data rate is 100 kBps, Figure 3.1 shows many spikes where the difference between the average and the maximum is large, which indicates that there were moments of high activity. The highest spike is around 800 kBps. The spikes point not only upward but downward as well, which indicates that there were also moments of low activity. Overall, the standard deviation is high because activity alternated between high and low, thus precise data usage can be difficult to predict.
The difference between offline mouse tracking, online mouse tracking, and real-time online mouse tracking is illustrated in Figure 3.2. While offline mouse tracking stores the data on each of the users' computers, online mouse tracking transmits the data to the server. While conventional online mouse tracking stacks the data until the end of every session before transmitting it as a single package, real-time online mouse tracking transmits the data immediately after each event occurs. Real-time online mouse tracking helps reduce the probability of a bottleneck, as illustrated in Figure 3.3. This helps to balance the transmission load.
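The contrast between conventional and real-time transmission can be sketched with two small classes. Both class names and the `send` callback (any function that transmits an array of event records in one request) are hypothetical and only illustrate the difference in when data leaves the client.

```javascript
// Conventional online mouse tracking: hold events until the session ends.
class BatchTransmitter {
  constructor(send) {
    this.send = send;
    this.buffer = [];
  }
  record(event) {
    this.buffer.push(event);
  }
  endSession() {
    // One large request carrying the whole session.
    this.send(this.buffer);
    this.buffer = [];
  }
}

// Real-time online mouse tracking: transmit each event as it occurs.
class RealTimeTransmitter {
  constructor(send) {
    this.send = send;
  }
  record(event) {
    this.send([event]);
  }
  endSession() {
    // Nothing is left to send at the end of the session.
  }
}
```

With the same events, the batch version issues one large request at the end of the session while the real-time version issues many small ones, which spreads the load over time instead of concentrating it in a single burst.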
3.3 Lossy Online Mouse Tracking
3.3.1 Three Mouse Tracking Preprocessing and Transmission Methods
At the end of Chapter 2, it was shown that capturing the geometrical data, which are the x and y coordinates of the events, and time stamping each event are the largest contributions to the data size. If the geometrical data can be reduced then the data size can be reduced as well. Based on many examples of mouse tracking data analysis, there are three possible cases, illustrated in Figure 3.4:
- Default mouse tracking, which uses all of the geometrical data, that is, when and where every event occurred at each coordinate. An example of a data visualization that can be generated from default mouse tracking is mouse trajectories, and if the time is recorded as well, a video replay of the mouse trajectory can be generated.
- Summarized event amount, which uses no geometrical data: only the event amounts are captured, without knowing when and where they occurred. Currently only the duration and the amounts of mouse clicks, mouse movements, mouse scrolls, zooms, and keys typed in each session are captured, sacrificing the position and time information of these events.
- Region of interest (ROI) mouse tracking, which uses only selected geometrical data, where the coordinates are summarized into selected areas. In other words, the mouse tracking is no longer able to identify the coordinates but only obtains the activity heatmap of each area. Currently the duration and the amounts of mouse clicks, mouse movements, mouse scrolls, zooms, and keys typed in each session are captured for the header, footer, navigation menu, and each quiz question area, sacrificing the exact coordinate information of each event. This method is a continuation of previous work by Purnama et al., 2016 and Purnama, Fungai, and Usagawa, 2016.
By knowing the geometrical data that the analyzers want, the storage and transmission cost can be reduced by applying preprocessing and modifying the transmission method as shown in Figure 3.5. The default is real-time online mouse tracking, where the event information is sent to the server at the moment it occurs. For the summarized event amount, only the amounts of events are recorded, excluding the place and time of occurrence. Updating the event amount in real time is discouraged because that costs data on the network. Instead, it is best to utilize the conventional transmission method where the final event amount value is sent only once at the end of each session (refer to the non-real-time online mouse tracking transmission in Figure 3.2). Unfortunately, there is still a potential problem with this conventional transmission method: if the user ends the session in haste, there may not be enough time to transfer the mouse tracking data from the client to the server, and the data may be lost. For ROI mouse tracking, the event amounts are accumulated while the mouse cursor is still within a specific area. When the mouse cursor moves to a new area, the event amount information of the previous area is sent to the server, and the process repeats. There is still a limit in determining and labelling web page areas. Usually, this is done manually by the analyzers, but doing so consumes a lot of labor and time. It is possible to determine and label areas automatically using DOM offset properties, but not in a smart way because it depends on the layout of the web page. After the areas are determined for ROI mouse tracking, the transmission method is a hybrid of conventional and real-time: the mouse cursor enters an area and the event amounts accumulate, then the result is transmitted after the mouse cursor leaves the area, and the process repeats upon entering a new area.
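The hybrid transmission described above for ROI mouse tracking can be sketched as an accumulator that flushes whenever the cursor changes area. The helper names `areaOf` and `createRoiTracker`, the rectangle format, and the `"blank"` fallback are assumptions for illustration; durations, which the thesis also records per area, are left out to keep the sketch short.

```javascript
// Areas are axis-aligned rectangles; names and sizes are illustrative.
function areaOf(x, y, areas) {
  const hit = areas.find(
    (a) =>
      x >= a.left && x < a.left + a.width &&
      y >= a.top && y < a.top + a.height
  );
  return hit ? hit.name : "blank";
}

// createRoiTracker is a hypothetical helper: it counts events while the
// cursor stays in one area and calls `send` once when the cursor leaves it.
function createRoiTracker(areas, send) {
  let current = null;
  let counts = {};
  function flush() {
    if (current !== null) send({ area: current, counts });
    counts = {};
  }
  return {
    record(event) {
      const area = areaOf(event.x, event.y, areas);
      if (area !== current) {
        flush(); // transmit the summary of the area that was just left
        current = area;
      }
      counts[event.type] = (counts[event.type] || 0) + 1;
    },
    end() {
      flush(); // transmit the last area at the end of the session
      current = null;
    },
  };
}
```

Each call to `send` corresponds to one query to the server, so the number of queries depends on how often the cursor crosses an area boundary rather than on how many events occur.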
3.3.2 Three Mouse Tracking Preprocessing and Transmission Simulations
Since the author did not have another mouse tracking experiment opportunity, a simulation was conducted based on the previous mouse tracking experiment, as shown in Figure 3.6. It is possible to replay the scenario because the date of each event during the mouse tracking session was captured. However, there was a limitation at that time: about half of the students were using a different time zone format, which was difficult to simulate, so those students were excluded, leaving only 23 students.
Additionally, in this thesis the author simulated the mouse tracking on a Raspberry Pi 3 single board computer, with those in limited connectivity regions in mind, where the mouse tracking quiz session is conducted locally as illustrated in Figure 3.7. It is also interesting to see how well the Raspberry Pi 3 can handle the mouse tracking simulation in terms of CPU and RAM.
Five mouse tracking simulations are performed on a quiz page with a size or dimension of 1920x1080 pixels:
- Default mouse tracking simulation without changes in the original mouse tracking data.
- ROI mouse tracking where the coordinates are summarized into certain areas for each user. The summarizing follows the flow of time: a query based on the summarized coordinates is generated every time a user leaves an area, rather than one total summary for each area. More information can be found in the Appendix.
- ROI mouse tracking 1 where the coordinates are summarized into 50 areas which consists of header, title, quiz navigation, navigation, administration, footer, each quiz flags, each quiz questions, each quiz answers, and blank areas.
- ROI mouse tracking 2 where the coordinates are summarized into 35 areas where the quiz questions and answers each are summarized or combined.
- ROI mouse tracking 3 where the coordinates are summarized into 20 areas where each of the quiz flags is summarized or combined into its respective quiz area.
- Summarized event amount mouse tracking simulation where the data is transformed by summarizing the event amounts of each user into a query and sending the queries based on the end session time of each user.
3.3.3 Three Mouse Tracking Preprocessing and Transmission Results
The result, shown in Table 3.1, is that a great reduction in data size is achieved by sacrificing some geometrical data for ROI mouse tracking and all geometrical data for the summarized event amount. Surprisingly, on the user side, the total script execution time in the browser was also reduced, as shown in Figure 3.8. The transmission cost was also reduced, as shown by the reduced data rate in Figure 3.9, which is paralleled by the server's CPU and RAM usage.
|Method||Queries||Data Size|
|Default Mouse Tracking||286510||∼100 MB|
|ROI Mouse Tracking 1||28048||∼7.7 MB|
|ROI Mouse Tracking 2||19061||∼5.3 MB|
|ROI Mouse Tracking 3||17880||∼4.9 MB|
|Summarized Event Amount||23||∼16 kB|
The Raspberry Pi's CPU is not strong enough to handle the default mouse tracking simulation of around 20 users, where the CPU often reaches 100% usage. Even the RAM usage is abnormally high, at over hundreds of MB. However, it is able to handle the ROI mouse tracking and summarized event amount methods. This shows how useful the data reduction methods are.
Among the three mouse tracking methods, the summarized event amount method gives the maximum resource reduction because all the geometrical and time data are excluded; equivalently, the whole page is treated as a single area. Theoretically, the number of queries is reduced to one per mouse tracking session. ROI mouse tracking does not necessarily always result in a large resource reduction like the result in this thesis. Theoretically, it depends on how the web page is divided into areas. The fewer the divisions, the larger each area and the larger the resource reduction, and vice versa. With more divisions, the areas become smaller and the resource usage becomes larger; if the areas keep being divided, they eventually become as small as single coordinates, which is the same as default mouse tracking.
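The relation between area division and the number of queries can be illustrated with a toy calculation. The helper `countRoiQueries` and the square grid of `cellSize` pixels are assumptions made only for this sketch; real ROI areas follow the page layout rather than a uniform grid.

```javascript
// countRoiQueries is a hypothetical helper: it divides the page into square
// cells of `cellSize` pixels and counts how many transmissions a trajectory
// would trigger (one per area left, plus one at the end of the session).
function countRoiQueries(points, cellSize) {
  if (points.length === 0) return 0;
  const cellOf = (p) =>
    `${Math.floor(p.x / cellSize)},${Math.floor(p.y / cellSize)}`;
  let queries = 1; // final flush at the end of the session
  for (let i = 1; i < points.length; i++) {
    if (cellOf(points[i]) !== cellOf(points[i - 1])) queries++;
  }
  return queries;
}

// A straight horizontal swipe of 1000 one-pixel steps.
const swipe = Array.from({ length: 1000 }, (_, i) => ({ x: i, y: 0 }));

console.log(countRoiQueries(swipe, 1000)); // 1: whole swipe in one area
console.log(countRoiQueries(swipe, 100));  // 10: ten areas crossed
console.log(countRoiQueries(swipe, 1));    // 1000: same as default tracking
```

With one large cell the swipe behaves like the summarized event amount method, and with one-pixel cells every step triggers a query, which matches the two extremes described above.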
3.3.4 Synchronization for Hand Carry Server Quiz
The teacher may decide to conduct the quiz locally using a hand carry server, illustrated in Figure 3.7, for various limited connectivity reasons such as an expensive or unstable Internet connection. If the log data is only for the teacher to use, then all is well, but if it is for institutional use, the teacher may have to synchronize the data to the institution's server. It is wise to use the incremental synchronization method illustrated in Figure 3.10 to reduce data, especially for large data like mouse tracking logs.
There are two ways to perform incremental synchronization. The first is to store the data using Structured Query Language (SQL), which is mostly used in database applications. SQL stores the data in the form of tables, and updating is just sending new rows from the teacher's database to the institution's database. Most log data grow in a unidirectional, incremental (addition only) fashion, which is why SQL is mostly used. However, if the update is more than just incremental, such as a correction that involves deletion and modification, then it is more complicated for SQL to handle (Purnama, Usagawa, Ijtihadie, et al., 2016). The most popular algorithm to handle this kind of update is the rsync algorithm, illustrated in Figure 3.11. An example use case is when a teacher forgets to exclude private data when privacy is a concern and accidentally uploads it to the server. In this case, the teacher would want to remove the private data in each query, where rsync can save resource cost. This is less likely to occur, though. A more realistic case is a teacher who needs to update their quiz contents from the server, where the update is made of additions, deletions, and relocations.
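The first, addition only, form of incremental synchronization can be sketched as follows. The helpers `newRowsSince` and `synchronize` and the assumption that every log row carries an increasing numeric `id` are hypothetical; the sketch does not cover deletions or modifications, which is where an rsync style algorithm would be needed.

```javascript
// newRowsSince is a hypothetical helper for append-only log tables: only
// rows with an id greater than the last synchronized one are selected.
function newRowsSince(localRows, lastSyncedId) {
  return localRows.filter((row) => row.id > lastSyncedId);
}

// synchronize returns the rows that would be uploaded (delta) and the
// resulting content of the institution's table (merged).
function synchronize(localRows, remoteRows) {
  const lastId = remoteRows.reduce((max, row) => Math.max(max, row.id), 0);
  const delta = newRowsSince(localRows, lastId);
  return { delta, merged: remoteRows.concat(delta) };
}
```

Only the `delta` rows need to travel over the network, so a teacher who has already synchronized most of a log uploads only what was recorded since then.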
4 The Depth Levels of Logs
Back in Chapter 1, it was emphasized that conventional web logs and educational data are limited in the information that can be derived from them. Mostly, it was about how those conventional logs could not capture the users' or students' behavior online. Eye and mouse tracking solve that problem by capturing how the students interact. It took some time for the author to understand and conceptualize the meaning behind those repeated statements about what conventional log data cannot tell while eye and mouse tracking logs can. It turns out that the depth levels of those logs are different, where eye and mouse tracking logs belong to a deeper level than conventional logs.
This thesis defines six depth levels of web logs from the browser content point of view, shown in Figure 4.1. Most analyzers do not know that there are deeper levels of logs. Most tools do not generate data at a deeper level than web page level logs. The web log depth levels converted to educational data are illustrated in Figure 4.2. Most educational tools only generate logs up to the course content level, which are mostly how many times the students attempted the activity and what grade they received. This chapter discusses the three deepest log levels and explains how mouse tracking belongs to the deepest log level.
4.1 Web page / Course Content Level Logs
4.1.1 Conventional Web Logs and Educational Data
Conventional web logs reach only as deep as the web page level. They are mainly page views, which show that a web page from a certain website and category has been viewed (Bluehost, 2016). Additional metadata can be attached to the page view:
- "Who": the identity of the viewer can be determined if the viewer registers on the website, or provides an identity in the browser and gives permission to be identified; otherwise, the internet protocol (IP) address of the viewer can be captured.
- "Where" can be the link of the web page, or the location of the web server and the viewer if they are identifiable.
- "When" is usually the date and time at which the page view or any other action occurred. From these, the duration can also be calculated.
- "What" is usually the action of the viewer as labeled by the analyzer. If the web page is a reading content, then the viewer's action is labeled as reading. If it is an audio content, then the action is labeled as listening. If it is a video content, then the action is labeled as watching. If it is a forum, then the action is labeled as discussing, and so on.
As page views reach only the third deepest log level, there is a limit to how much they can tell no matter how hard they are analyzed. For example, a page view cannot tell how a user is reading a content, such as whether the user is skimming or reading in detail. The limit is that page views cannot capture activities that occur in a specific area of the web page. In education, there are four popular logs used by teachers: materials the students read, assignments submitted, topics discussed in forums, and quiz or exam grades. Unfortunately, just like conventional web logs, conventional educational data can only tell what activities the students are doing and for how long, but cannot tell how the students attempt those activities, as emphasized in Figure 4.3. In other words, they can identify to a certain extent what, when, where, and who, but cannot go deeper and identify how the viewer interacts with the contents (Purnama et al., 2016) (Purnama, Fungai, and Usagawa, 2016).
4.1.2 Amount of Interactions
Although the summarized event count of mouse tracking sits at the depth level of web pages or course contents, it is still not widely known by analyzers. DOM events can reveal many other interactions that users perform on the web page. The simplest is knowing how much interaction the user performs, such as the number of clicks, touches, mouse movements, scrolls, zoom-ins and zoom-outs, copies and pastes, key presses, and so on. Table 4.1 shows that the Mongolian students attempting the quiz session took on average 1368 seconds and performed on average 175 left clicks, 8 middle clicks, 11004 mouse movements, and 4158 scrolls.
Knowing the number of DOM events that occur on a web page may hint at whether the web page fulfills its purpose. For example, a web page designed based on game theory is bound to be interactive, so few events such as clicks and movements may show that the users are not engaged with the web page, whereas if the web page is designed for reading and there are many events, then something is probably wrong. The author expected a high number of DOM events from the students because they were attempting a quiz, where they need to perform many clicks to choose answers and many movements to read the questions carefully and perhaps review some questions. If there is no problem with the web page, then there can be problems with the users. A study by Rodrigues et al. (2013) showed that a high number of events generated by a user can indicate that the user is stressed. Theoretically, there should be a common baseline for how many events a user generates within a certain duration.
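As a rough illustration of summarizing the amount of interaction, the sketch below counts events per type and computes an event rate. The event log format (a list of timestamp and event type pairs), the sample values, and the function names `summarize` and `event_rate` are assumptions made for this example, not the format used by the mouse tracking application.

```python
from collections import Counter

# Hypothetical event log: (seconds since page load, DOM event type).
events = [
    (0.5, "mousemove"), (0.9, "mousemove"), (1.4, "click"),
    (2.0, "scroll"), (2.1, "scroll"), (3.0, "mousemove"), (4.0, "click"),
]

def summarize(events):
    """Return the number of occurrences of each event type."""
    return Counter(event_type for _, event_type in events)

def event_rate(events):
    """Return events per second over the span from the first to the last event."""
    times = [t for t, _ in events]
    duration = max(times) - min(times)
    return len(events) / duration if duration > 0 else 0.0

counts = summarize(events)
rate = event_rate(events)
```

A rate like this could serve as the kind of baseline mentioned above: a page or a user whose rate is far from the usual value would be worth a closer look.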
4.1.3 Web Page or Course Content Inactivity
Web page or course content inactivity is another DOM mouse event feature that analyzers often do not know about. With page views, the duration of a visit to a web page can be counted, but it cannot tell whether the users were actually on the web page the whole time, because they can open another tab and leave the previous one open. With mouse DOM events, it is possible to distinguish the active and inactive time of users within a web page. Inactivity is indicated when the mouse cursor leaves the web page, for example to open another tab or do other activities, and when the mouse cursor re-enters the web page, the status becomes active again.
In Table 4.1, the number of inactivity queries of each student is provided, and in Figure 4.4, the inactivity is plotted in the time domain. They show that the students did not always stay on the quiz page, which opens the possibility that they were seeking information from outside sources to answer the quiz better, such as searching for answers in search engines or messaging friends online. The number of inactivities could be exaggerated due to system limitations, for example a slow mouse leave generates more inactivity queries than a fast mouse leave. However, the system design still ensures that no inactivity queries are generated if the mouse does not leave the quiz area.
Aside from capturing inactivity, capturing highlight, copy, cut, and paste events can help in detecting dishonest behavior. An alarm system can be developed to inform the examiners when such events occur. For important exams such as certifications, stricter systems can be implemented, such as immediately failing the test when the mouse cursor leaves the exam, as illustrated in Figure 4.5.
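The distinction between active and inactive time can be sketched as below. This is a simplified illustration under stated assumptions: the log is a list of timestamp and event type pairs, the event names `mouseleave` and `mouseenter` stand for the cursor leaving and re-entering the page, the user is assumed to be active when the session starts, and the function name `split_active_inactive` is hypothetical.

```python
def split_active_inactive(events, start, end):
    """Split the session [start, end] into active and inactive seconds.

    The user is assumed to be active at `start`. Repeated leave or enter
    events that do not change the state are ignored.
    """
    active = inactive = 0.0
    is_active = True
    last = start
    for t, event_type in sorted(events):
        if event_type == "mouseleave" and is_active:
            active += t - last
            is_active, last = False, t
        elif event_type == "mouseenter" and not is_active:
            inactive += t - last
            is_active, last = True, t
    # Account for the time between the last state change and the session end.
    if is_active:
        active += end - last
    else:
        inactive += end - last
    return active, inactive

# Hypothetical two-minute session with two departures from the page.
events = [(30.0, "mouseleave"), (45.0, "mouseenter"),
          (100.0, "mouseleave"), (110.0, "mouseenter")]
active, inactive = split_active_inactive(events, start=0.0, end=120.0)
```

Ignoring repeated leave events is one possible way to avoid counting the same departure several times.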
4.2 Area Level Logs
Area level logs are logs showing activities within areas of the web page or course contents. They can be obtained by capturing one or a combination of the mouse cursor position, the touch location, the scroll bar position, or the eyeball position, together with the date and time of the events that occur at those positions. The ROI mouse tracking provides this kind of information. The amount of activity in each area in this thesis is based on the total number of events.
The most popular analysis of area level logs is heatmap visualization. Many indications can be derived from heatmaps. For example, high activity or a long duration in an area may indicate that users are interested in the area. If not, then they may have trouble with the area: trouble understanding the content, questions that are too difficult (for example, in Figure 4.6 question three receives the most attention, which may indicate difficulty), or design problems that force users to spend unnecessary effort to capture the information. On the other hand, low activity or a short duration in an area may indicate that the users are not interested, that the design is not good enough to capture the users' attention, or that the question in the quiz is simply too easy.
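A minimal sketch of turning event positions into per-area activity is shown below. The three rectangles are copied from the quiz areas table in Appendix A (x1, x2, y1, y2); treating the bounds as inclusive, the sample points, and the function name `area_of` are assumptions made for the example.

```python
from collections import Counter

# A few rectangles from the quiz areas table: name -> (x1, x2, y1, y2).
AREAS = {
    "Header": (0, 1920, 0, 64),
    "Quiz1 Question": (529, 1900, 291, 341),
    "Quiz1 Answers": (529, 1900, 342, 570),
}

def area_of(x, y):
    """Return the name of the area containing (x, y), or "Blank Areas" if none does."""
    for name, (x1, x2, y1, y2) in AREAS.items():
        if x1 <= x <= x2 and y1 <= y <= y2:  # inclusive bounds are an assumption
            return name
    return "Blank Areas"

# Hypothetical event positions in document coordinates.
points = [(100, 10), (600, 300), (700, 400), (800, 500), (10, 200)]
activity = Counter(area_of(x, y) for x, y in points)
```

The resulting counts per area are the kind of totals on which an activity heatmap can be based.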
Figure 4.7 shows an even more detailed heatmap where the visualization is split into 10 minute intervals. At a glance it can be seen that the high activity times are the 30th, 90th, and 160th minutes, that the students took a break at the 130th minute, and that they finished at the 230th minute. Another interesting observation is that they did not bother much with the last question, perhaps because it was too easy or because they just wanted to finish quickly when they were too tired.
Figure 4.8 shows another detailed heatmap of the amount of activity performed by each student in each area. The heatmaps vary and do not show much similarity between students; however, there are some common patterns. Question 13 shows high activity across students, and the grade/score distribution in Figure 4.9 shows that many students got the answer wrong, which may be common evidence that the question was so difficult for them that they had to put more effort into it. An opposite case is question 6, where activity is low but many students got the answer wrong, which can lead the analyzer to wonder whether the question is a trick question. A similar case, with strong agreement between Figure 4.8 and Figure 4.9, is that students performed very little activity on the last question but most of them got the answer wrong. Unlike question 6, it may not be a trick question but a difficult question, because its score allocation is high. There may be two possibilities: the first is that the students ran out of time and, since it was the last question, answered randomly; the second is that the students were lazy and/or tired by the time they reached the difficult last question, so they answered randomly just to finish the quiz quickly, giving up on the last question.
Those indications can be useful in many ways. For example, if the indications show that users are not paying attention to areas that the content creators intended to emphasize, then the design needs fixing or the content needs revision. In education, the heatmap can be useful for profiling the students. It can then be followed by a guidance system that automatically detects the students' interests and guides them in many ways, such as linking to related resources, suggesting career paths, grouping them with relevant communities, etc. The profile can also be used in a stricter way, where the teacher gives the students an assignment to read a passage and the system detects whether the students have paid sufficient attention to the passage or not.
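Splitting area activity into time intervals can be sketched as follows. The input format (pairs of timestamp and area name), the sample values, and the function name `bin_activity` are assumptions made for the example; the `interval` parameter is expressed in the same unit as the timestamps.

```python
from collections import defaultdict

def bin_activity(events, interval):
    """Count events per (time bin, area).

    Bin k covers timestamps in [k * interval, (k + 1) * interval).
    """
    table = defaultdict(int)
    for t, area in events:
        table[(int(t // interval), area)] += 1
    return dict(table)

# Hypothetical events: (minutes since start, area name).
events = [(5, "Quiz1"), (8, "Quiz1"), (12, "Quiz2"), (19, "Quiz2"), (25, "Quiz1")]
bins = bin_activity(events, interval=10)
```

Each bin can then be drawn as its own heatmap, which makes it possible to see when the activity in an area happened and not only how much of it there was.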
Additionally, some analyzers count the number of times the mouse enters and leaves an area, which is known as the mouse flow. In quiz sessions, it is normal to find many mouse flows because students tend to review or revisit questions, whether to double check or because they previously skipped them. On the other hand, for a website that is meant to guide or share information, many mouse flows may indicate problems with the website, such as users being confused in finding the information they need and thus searching tirelessly (Hsu, Chang, and Liu, 2018).
A possible application is forced reading, illustrated in Figure 4.10, for example making sure that students read the agreement to be tracked before an exam or that users read the terms of service. The administrator can configure variables such as the reading duration, the amount of activity, and the areas. Simply put, if the user has not read the area enough, then the user cannot pass and must read enough of the defined passage.
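Counting mouse flows can be sketched as below, assuming the log has already been reduced to the sequence of areas in which consecutive events occurred. The sequence and the function name `mouse_flow` are hypothetical; an entry is counted whenever the area differs from the area of the previous event.

```python
from collections import Counter

def mouse_flow(area_sequence):
    """Count how many times the cursor enters each area."""
    entries = Counter()
    previous = None
    for area in area_sequence:
        if area != previous:  # the cursor moved into a different area
            entries[area] += 1
        previous = area
    return entries

# Hypothetical sequence of areas visited by the cursor during a quiz.
sequence = ["Quiz1", "Quiz1", "Quiz2", "Quiz2", "Quiz1", "Quiz3", "Quiz1"]
flow = mouse_flow(sequence)
```

Here the first question is entered three times, which in a quiz would be consistent with the student revisiting it.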
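The check behind such a gate can be sketched as follows. The class `ReadingRule`, its fields, the thresholds, and the function `has_read_enough` are all hypothetical names chosen for the example; they stand for the variables that the administrator would configure.

```python
from dataclasses import dataclass

@dataclass
class ReadingRule:
    """Hypothetical administrator settings for one area that must be read."""
    area: str
    min_seconds: float
    min_events: int

def has_read_enough(rule, seconds_in_area, events_in_area):
    """Return True only if both the duration and the activity thresholds are met."""
    return (seconds_in_area.get(rule.area, 0.0) >= rule.min_seconds
            and events_in_area.get(rule.area, 0) >= rule.min_events)

rule = ReadingRule(area="Agreement", min_seconds=30.0, min_events=20)
skimmed = has_read_enough(rule, {"Agreement": 5.0}, {"Agreement": 3})
read = has_read_enough(rule, {"Agreement": 42.0}, {"Agreement": 57})
```

Requiring both thresholds is a design choice of this sketch; an administrator could equally require only one of them.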
4.3 Coordinate Level Logs
The coordinate level is the deepest log level. The coordinate values can be based on the document, screen, or window perspective. This is the log that the default mouse tracking generates (Purnama et al., 2020b). It is overwhelming but contains the most information, so this is the log that most analyzers should want to keep. Shallower levels such as the area level log can be derived from the coordinate level log, but the derivation is unidirectional: the reverse is not possible (Purnama and Usagawa, 2020). The most popular analysis is to draw a mouse trajectory. If the time at which the mouse cursor lands on each coordinate is recorded, then it is possible to replay what the users did.
An example visualization that can be drawn from the mouse tracking data is the mouse click trajectory in Figure 4.11. It shows a user highlighting a text, which can indicate that the user is paying attention to that text or attempting to copy it, either to save in the user's notes or to paste into a search engine to find more information about it. The number of highlights the students made is also summarized in Table 4.1, which shows that the students who highlighted received either high or low grades, not average grades. The speculation is that the questions they highlighted were too difficult for them, and they either succeeded or failed in finding the answers on other sites. Unfortunately, the copy and paste events were not implemented at that time. In fact, finding this highlighting is what motivated the author to add copy, paste, and other DOM events to the mouse tracking application.
Although mouse tracking logs are part of the deepest log level, there is still a limit to how much the mouse cursor and scroll positions can indicate, because certain events do not necessarily occur at those positions. For example, reading is based on the eye gaze, and typing may occur not far from the mouse and scroll positions but not necessarily exactly at them. None of these logs alone makes the best log; a combination of them does. Combining conventional web logs or educational data with mouse tracking and eye tracking may provide a complete log.
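Preparing a replay from timestamped coordinates can be sketched as below. The sample format (timestamp, x, y), the sample values, and the function name `replay_steps` are assumptions made for the example; the sketch only computes the delays between consecutive positions and does not draw anything.

```python
def replay_steps(samples):
    """Turn (timestamp, x, y) samples into (delay, x, y) steps in time order.

    The delay of each step is the time elapsed since the previous sample.
    """
    ordered = sorted(samples)
    steps = []
    previous_time = ordered[0][0] if ordered else 0.0
    for t, x, y in ordered:
        steps.append((t - previous_time, x, y))
        previous_time = t
    return steps

# Hypothetical samples that arrived out of order.
samples = [(2.0, 640, 310), (0.0, 600, 300), (0.5, 620, 305)]
steps = replay_steps(samples)
```

A player would wait for each delay and then move a cursor marker to the given position, reproducing the trajectory at its original pace.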
5 Conclusion and Future Work
The author wrote an online mouse tracking application suitable for public implementation and implemented it during a quiz session at the Human Interface and Cyber Communication Laboratory, Kumamoto University, on the 3rd of January 2019 between approximately 12:00 and 14:30 Japan Standard Time. The amount of data generated by mouse tracking was investigated during the implementation, and the cause of the huge data generation was found to be the capturing of geometrical data, or coordinates, of each event. Aside from existing solutions to reduce data, this thesis also implemented and discussed a real-time transmission system for mouse tracking data retrieval, which helps distribute the network's burden across the time domain. The main novelty of this thesis is the select-able geometrical online mouse tracking method, for cases where not all the geometrical data are required. The method allows coordinates to be summarized into areas, or deleted if they are not necessary. The results showed a great reduction in storage and transmission costs. However, the method is lossy because the process is irreversible. Rich mouse tracking data were obtained, and in this thesis a new concept of log depth level was discussed with example analyses that include click visualization and activity heatmaps, which help in identifying the interaction between the students and the quiz page.
5.2 Future Work
Real-time transmission is not the best solution. A better method is to upgrade the real-time transmission method by integrating a smart transmission method, where the client can detect the traffic of the network and determine the optimal time for queuing and transmission. Although the select-able geometrical mouse tracking data method works perfectly, there are still problems with execution. If all of the geometrical data are excluded, the most efficient approach is to transmit the data only once, when the user leaves the page. However, the problem lies with the browser, where there is currently no way to force the user to wait until the transmission process finishes, leaving a potential problem of data loss. The problem with ROI tracking is that it cannot perform smart area determination and labelling; normally, these are performed by humans. Therefore, one solution is to develop an artificial intelligence for this matter in the future. Finally, this doctoral thesis is limited to mouse tracking with one type of activity, which is examination. There are various other activities, such as passage reading, e-commerce, entertainment, geo-visualization reading, search engines, social media, etc., which are open for future work.
Appendix A Data
A.1 Quiz Areas
|Area||x1, x2, y1, y2||Area||x1, x2, y1, y2|
|Header||0, 1920, 0, 64||Quiz8 Question||529, 1900, 2453, 2493|
|Title||16, 1904, 150, 270||Quiz8 Answers||529, 1900, 2494, 2730|
|Quiz Navigation||18, 364, 291, 532||Quiz9 Flag||384, 528, 2731, 3242|
|Navigation||16, 366, 551, 1042||Quiz9 Question||529, 1900, 2731, 2831|
|Administration||18, 364, 1062, 1693||Quiz9 Answers||529, 1900, 2832, 3242|
|Quiz1 Flag||384, 528, 291, 570||Quiz10 Flag||384, 528, 3243, 3580|
|Quiz1 Question||529, 1900, 291, 341||Quiz10 Question||529, 1900, 3243, 3343|
|Quiz1 Answers||529, 1900, 342, 570||Quiz10 Answers||529, 1900, 3341, 3580|
|Quiz2 Flag||384, 528, 571, 852||Quiz11 Flag||384, 528, 3581, 3856|
|Quiz2 Question||529, 1900, 571, 621||Quiz11 Question||529, 1900, 3581, 3631|
|Quiz2 Answers||529, 1900, 622, 852||Quiz11 Answers||529, 1900, 3632, 3856|
|Quiz3 Flag||384, 528, 853, 1133||Quiz12 Flag||384, 528, 3857, 4169|
|Quiz3 Question||529, 1900, 853, 903||Quiz12 Question||529, 1900, 3857, 3907|
|Quiz3 Answers||529, 1900, 904, 1133||Quiz12 Answers||529, 1900, 3908, 4169|
|Quiz4 Flag||384, 528, 1134, 1441||Quiz13 Flag||84, 528, 4170, 4746|
|Quiz4 Question||529, 1900, 1134, 1184||Quiz13 Question||529, 1900, 4170, 4520|
|Quiz4 Answers||529, 1900, 1185, 1441||Quiz13 Answers||529, 1900, 4521, 4746|
|Quiz5 Flag||384, 528, 1442, 1748||Quiz14 Flag||384, 528, 4747, 5295|
|Quiz5 Question||529, 1900, 1442, 1492||Quiz14 Question||529, 1900, 4747, 5097|
|Quiz5 Answers||529, 1900, 1493, 1748||Quiz14 Answers||529, 1900, 5098, 5295|
|Quiz6 Flag||384, 528, 1749, 2027||Quiz15 Flag||384, 528, 5296, 5842|
|Quiz6 Question||529, 1900, 1749, 1799||Quiz15 Question||529, 1900, 5296, 5646|
|Quiz6 Answers||529, 1900, 1800, 2027||Quiz15 Answers||529, 1900, 5647, 5842|
|Quiz7 Flag||384, 528, 2028, 2452||Footer||0, 1920, 5939, 6116|
|Quiz8 Flag||384, 528, 2453, 2730||Blank Areas||except listed here|
A.2 Full Quiz Page Heatmap
Appendix B Copyrights
Below are the publications reused in this thesis that do not require copyright clearance:
- Using real-time online preprocessed mouse tracking for lower storage and transmission costs (Purnama and Usagawa, 2020).
- Demonstration on Extending The Pageview Feature to Page Section Based: Towards Identifying Reading Patterns of Users (Purnama, Fungai, and Usagawa, 2016).
Below are the publications reused in this thesis that require copyright clearance, which has been obtained:
- Rsync and Rdiff implementation on Moodle's backup and restore feature for course synchronization over the network (Purnama, Usagawa, Ijtihadie, et al., 2016).
- Incremental Synchronization Implementation on Survey using Hand Carry Server Raspberry Pi (Purnama, 2017).
- Implementation of real-time online mouse tracking on overseas quiz session (Purnama et al., 2020b).
SPRINGER NATURE LICENSE
Sep 09, 2020
This Agreement between Mr. Fajar Purnama ("You") and Springer Nature ("Springer Nature") consists of your license details and the terms and conditions provided by Springer Nature and Copyright Clearance Center.
Jun 19, 2020
Licensed Content Publisher
Licensed Content Publication
Education and Information Technologies
Licensed Content Title
Implementation of real-time online mouse tracking on overseas quiz session
Licensed Content Author
Fajar Purnama et al
Licensed Content Date
Mar 6, 2020
Type of Use
academic/university or research institute
print and electronic
Will you be translating?
50000 or greater
Development of a Lossy Online Mouse Tracking Method for Capturing User Interaction with Web Browser Content
Expected presentation date
Mr. Fajar Purnama
Terms and Conditions
Springer Nature Customer Service Centre GmbH
This agreement sets out the terms and conditions of the licence (the Licence) between you and Springer Nature Customer Service Centre GmbH (the Licensor). By clicking 'accept' and completing the transaction for the material (Licensed Material), you also confirm your acceptance of these terms and conditions.
IN NO EVENT SHALL LICENSOR BE LIABLE TO YOU OR ANY OTHER PARTY OR ANY OTHER PERSON OR FOR ANY SPECIAL, CONSEQUENTIAL, INCIDENTAL OR INDIRECT DAMAGES, HOWEVER CAUSED, ARISING OUT OF OR IN CONNECTION WITH THE DOWNLOADING, VIEWING OR USE OF THE MATERIALS REGARDLESS OF THE FORM OF ACTION, WHETHER FOR BREACH OF CONTRACT, BREACH OF WARRANTY, TORT, NEGLIGENCE, INFRINGEMENT OR OTHERWISE (INCLUDING, WITHOUT LIMITATION, DAMAGES BASED ON LOSS OF PROFITS, DATA, FILES, USE, BUSINESS OPPORTUNITY OR CLAIMS OF THIRD PARTIES), AND
WHETHER OR NOT THE PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. THIS LIMITATION SHALL APPLY NOTWITHSTANDING ANY FAILURE OF ESSENTIAL PURPOSE OF ANY LIMITED REMEDY PROVIDED HEREIN.
Appendix 1 — Acknowledgements:
Reprinted by permission from [the Licensor]: [Journal Publisher (e.g. Nature/Springer/Palgrave)] [JOURNAL NAME] [REFERENCE CITATION (Article name, Author(s) Name), [COPYRIGHT] (year of publication)
Reprinted by permission from [the Licensor]: [Journal Publisher (e.g. Nature/Springer/Palgrave)] [JOURNAL NAME] [REFERENCE CITATION (Article name, Author(s) Name), [COPYRIGHT] (year of publication), advance online publication, day month year (doi: 10.1038/sj.[JOURNAL ACRONYM].)
Adapted/Translated by permission from [the Licensor]: [Journal Publisher (e.g. Nature/Springer/Palgrave)] [JOURNAL NAME] [REFERENCE CITATION (Article name, Author(s) Name), [COPYRIGHT] (year of publication)
Reprinted by permission from The [the Licensor]: on behalf of Cancer Research UK: [Journal Publisher (e.g. Nature/Springer/Palgrave)] [JOURNAL NAME] [REFERENCE CITATION (Article name, Author(s) Name), [COPYRIGHT] (year of publication), advance online publication, day month year (doi: 10.1038/sj.[JOURNAL ACRONYM])
Reprinted/adapted by permission from [the Licensor]: [Book Publisher (e.g. Palgrave Macmillan, Springer etc) [Book Title] by [Book author(s)] [COPYRIGHT] (year of publication)
- Original author's name: Fajar Purnama, Tsuyoshi Usagawa
- Document title: Incremental Synchronization Implementation on Survey using Hand Carry Server Raspberry Pi
- Book or journal title: Technical Report, vol. 117, no. 65, ET2017-4, pp. 21-24, year 2017, month 5.
- Portion: Figure 5
Permission No.: 20GB0052
IEICE hereby grant permission for the use of the material requested above on condition that their requirements are as follows:
- Indication of source (e.g., author's name, document title, name of journal, volume/issue/page number, publication date, etc.)
- Indication of copyright (e.g. "Copyright (c)2016 IEICE")
- Azzuhri, Abdul Adhim et al. (2018). “A Creative, Innovative, and Solutive Transportation for Indonesia with Its Setbacks and How to Tackle Them: A Case Study of the Phenomenal GOJEK”. In: Review of Integrative Business and Economics Research 7, pp. 59–67. URL: http://buscompress.com/uploads/3/4/9/8/34980536/riber_7-s1_sp_h17-051_59-67.pdf
- Bluehost (2016). Web Analytics for Beginners - Presented by Bluehost. Youtube. URL: https://youtu.be/PnVZ7_OA7Qo.
- Dentzel, Zaryn (2013). “How the internet has changed everyday life”. In: BBVA OpenMind: Ch@nge. URL: https://www.bbvaopenmind.com/en/articles/internet-changed-everyday-life/
- Hsu, Ting-Chia, Shao-Chen Chang, and Nan-Cen Liu (2018). “Peer Assessment of Webpage Design: Behavioral Sequential Analysis Based on Eye Tracking Evidence”. In: Journal of Educational Technology & Society 21.2, pp. 305–321. URL: www.jstor.org/stable/26388409
- Huang, Jeff, Ryen W White, and Susan Dumais (2011). “No clicks, no problem: using cursor movements to understand and improve search”. In: Proceedings of the SIGCHI conference on human factors in computing systems. ACM, pp. 1225–1234. DOI: 10.1145/1978942.1979125.
- IEICE (2015). IEICE Provisions on Copyright. IEICE. URL: https://www.ieice.org/eng/copyright/files/copyright.pdf.
- Leiva, Luis A and Jeff Huang (2015). “Building a better mousetrap: Compressing mouse cursor activity for web analytics”. In: Information Processing & Management 51.2, pp. 114–129. DOI: 10.1016/j.ipm.2014.10.005.
- Linawati, Linawati, NMAE Dewi Wirastuti, and Gede Sukadarmika (2017). “Survey on LMS Moodle for Adaptive Online Learning Design”. In: Journal of Electrical, Electronics and Informatics 1.1, pp. 11–16. DOI: 10.24843/JEEI.2017.v01.i01.p03.
- Martín-Albo, Daniel et al. (2016). “Strokes of insight: User intent detection and kinematic compression of mouse cursor trails”. In: Information Processing & Management 52.6, pp. 989–1003. DOI: 10.1016/j.ipm.2016.04.005.
- Ookla (2020). Speedtest Global Index. Ookla, LLC. URL: https://www.speedtest.net/global-index.
- Paturusi, Sary DE, Yoshifumi Chisaki, and Tsuyoshi Usagawa (2012). “Development and Evaluation of the Blended Learning Courses at Sam Ratulangi University in Indonesia.” In: International Journal of e-Education, e-Business, e-Management and e-Learning 2.3, p. 242. URL: http://www.ijeeee.org/Papers/118-CZ02027.pdf
- Pixley, Tom et al. (2000). “Document object model (DOM) level 2 events specification”. In: W3C recommendation, November. URL: https://www.w3.org/TR/2000/REC-DOM-Level-2-Events-20001113/
- PrivacyPolicies.com (2020). Privacy Policies are required by law. PrivacyPolicies.com. URL: https://www.privacypolicies.com.
- Purnama, Fajar (2017). “Portable and Synchronized Distributed Learning Management System in Severe Networked Regions”. MA thesis. Japan: Kumamoto University. URL: https://0fajarpurnama0.github.io/masters/2020/09/05/master-thesis-fajar-purnama
- Purnama, Fajar (2019). 0fajarpurnama0/Real-Time-Online-Mouse-Tracking-Animation. Github. DOI: 10.5281/zenodo.2589338. URL: https://github.com/0fajarpurnama0/RealTime-Online-Mouse-Tracking-Animation.
- Purnama, Fajar, Alvin Fungai, and Tsuyoshi Usagawa (2016). “Demonstration on Extending The Pageview Feature to Page Section Based: Towards Identifying Reading Patterns of Users”. In: 7th International Conference on Science and Engineering. Yangon Technological University, pp. 304–307. URL: https://0fajarpurnama0.github.io/masters/2020/05/25/extending-the-pageview-feature-to-page-section-based
- Purnama, Fajar and Tsuyoshi Usagawa (2017). “Incremental Synchronization Implementation on Survey using Hand Carry Server Raspberry Pi”. In: IEICE technical report 117.65, pp. 21–24. URL: https://0fajarpurnama0.github.io/masters/2020/05/29/rsync-rdiff-moodle-backup-restore
- Purnama, Fajar and Tsuyoshi Usagawa (2020). “Using real-time online preprocessed mouse tracking for lower storage and transmission costs”. In: Journal of Big Data 7, pp. 1–22. DOI: 10.1186/s40537-020-00304-x.
- Purnama, Fajar, Tsuyoshi Usagawa, Royyana M Ijtihadie, et al. (2016). “Rsync and Rdiff implementation on Moodle’s backup and restore feature for course synchronization over the network”. In: 2016 IEEE Region 10 Symposium (TENSYMP). IEEE, pp. 24–29. DOI: 10.1109/TENCONSpring.2016.7519372.
- Purnama, Fajar et al. (2016). “Introductory Work on Section Based Page View of Web Contents: Towards The Idea of How a Page is Viewed”. In: 11th International Student Conference on Advanced Science and Technology (ICAST). Kumamoto University, pp. 9–11. URL: https://0fajarpurnama0.github.io/masters/2020/05/24/introductory-section-based-page-view
- Purnama, Fajar et al. (2017). “Hand Carry Data Collecting Through Questionnaire and Quiz Alike Using Mini-computer Raspberry Pi”. In: Proceedings of the International Mobile Learning Festival 2017: Mobile Learning, Emerging Learning Design & Learning 2.0, pp. 18–32. URL: https://0fajarpurnama0.github.io/masters/2020/05/30/hand-carry-server-survey
- Purnama, Fajar et al. (2020a). Data for: Implementation of Real-Time Online Mouse Tracking Case Study in a Small Online Quiz. Mendeley Data, v3. DOI: https://doi.org/10.17632/vznyfcx9xk.4.
- Purnama, Fajar et al. (2020b). “Implementation of Real-Time Online Mouse Tracking on Overseas Quiz Session From Server Administrator Point of View”. In: Education and Information Technologies (forthcoming), p. 36. DOI: 10.1007/s10639-020-10141-3.
- Rheem, Hansol, Vipin Verma, and D Vaughn Becker (2018). “Use of Mouse-tracking Method to Measure Cognitive Load”. In: Proceedings of the Human Factors and Ergonomics Society Annual Meeting. Vol. 62. SAGE Publications Sage CA: Los Angeles, CA, pp. 1982–1986. DOI: 10.1177/1541931218621449.
- Rodrigues, Manuel et al. (2013). “Keystrokes and clicks: Measuring stress on e-learning students”. In: Management Intelligent Systems. Springer, pp. 119–126. DOI: 10.1007/978-3-319-00569-0_15.
- u-double-u (2020). SPEED-BATTLE statistics and browser comparison: Windows NT 10.0. u-double-u. URL: http://www.speed-battle.com/statistics_e.php.
- UNESCO (2020). COVID-19 Educational Disruption and Response. UNESCO. URL: https://en.unesco.org/covid19/educationresponse.
- Wood, Lauren et al. (1998). “Document object model (dom) level 1 specification”. In: W3C recommendation 1. URL: https://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/