
A Deep dive into Java Agent Implementation of Production Traffic Replication

Posted on Oct 9

Production traffic replication is a technique for recording traffic in a production environment and replaying it in another environment. It is also regarded as one of the most effective approaches to regression testing. Depending on where the recording takes place, recording tools fall into three categories: web server-based recording, application layer-based recording, and network protocol stack-based recording.

Web server-based recording captures and replicates the HTTP requests and responses exchanged between the web server and the client.
Advantages: Supports a diverse range of request formats and protocols.
Disadvantages: High maintenance cost; consumes a significant amount of online resources.

Network protocol stack-based recording listens on network ports directly and records traffic by duplicating data packets.
Advantages: Low impact on the application.
Disadvantages: The implementation is lower-level and more costly. In addition, it cannot be used to test non-idempotent interfaces, because replaying traffic can generate invalid or dirty data and may affect the correctness of business operations.
Representative tools: goReplay, tcpCopy, tcpReplay

Application layer-based recording works inside the application itself. Since Spring Boot is widely used as a Java backend framework, AOP (Aspect-Oriented Programming) can be used to intercept the Controller layer and record traffic.
Advantages: Non-intrusive to the code; relatively quick and simple to implement. In addition to validating the responses returned by the server, it also supports validating data written by the service, including database, message queue, and Redis data, and even runtime in-memory data.
Disadvantages: May consume some online resources and can potentially affect the business.
Representative tools: ngx_http_mirror_module, Java sandbox, AREX

AREX is an automated regression testing platform based on traffic recording and playback.
It uses Java Agent and bytecode enhancement technology to capture, in the production environment, the entry points of real request chains along with the request and response data of their third-party dependencies. Then, in the testing environment, it replays these requests in simulation and verifies the correctness of the entire invocation chain's logic step by step.

AOP (Aspect-Oriented Programming) is a programming paradigm that modularizes cross-cutting concerns such as logging, security, and transaction management. It does this by separating these concerns from the core business logic, so that extra functionality can be added to existing modules without directly modifying their code.

Java Agent is a mechanism in Java that allows the bytecode of classes to be modified dynamically at runtime. It provides a way to instrument and manipulate the behavior of Java applications, including adding aspects or intercepting method calls.

In the context of AOP, a Java Agent can apply AOP principles by intercepting method calls and weaving in additional behavior. It provides the infrastructure needed to implement AOP in Java applications.

As shown in the figure below, a request typically has a chain of calls consisting of an entry point and dependencies that are either synchronous or asynchronous.

The recording process connects the entry and dependency calls through a RecordId to form a complete test case. AREX Agent enhances the bytecode of the entry and dependency calls, intercepts each call as the code executes, records the call's input parameters, return value, and exceptions, and sends them to the storage service.

During playback in the test environment, AREX Agent simulates requests using the real data recorded in the production environment. It determines whether playback is required by checking the playback flag. If playback is required, the actual method invocation is not performed.
Instead, the stored response data is retrieved from the storage service and returned as the response.

Taking the above function as an example:

At the beginning of the function, a decision is made whether to play back or not. If playback is necessary, the recorded data is used as the return result, which is commonly referred to as mocking.

At the end of the function, a decision is made whether to record or not. If recording is necessary, the intermediate data that the application needs to store is saved to the AREX database.

The recording and replay process is complex. Next, we will dive into the challenges and technical details of AREX Agent.

To ensure that AREX Agent's code and dependencies do not conflict with the application code, the two are isolated by different class loaders. As shown in the figure below, AREX Agent defines a custom AgentClassLoader that overrides the findClass method, so that the classes used by AREX Agent are loaded only by AgentClassLoader and do not conflict with the application ClassLoader.

Meanwhile, so that the application ClassLoader can recognize AREX Agent's recording and playback code, AREX Agent injects the bytecode needed for recording and playback into the application ClassLoader through the ByteBuddy ClassInjector, which avoids ClassNotFoundException or NoClassDefFoundError during recording and replay.

When data is recorded and replayed, the entry point of a request and the calls to each dependency are linked together by a RecordId. Multi-threading and the various asynchronous frameworks make it challenging to maintain this continuity. AREX Agent solves the problem of passing the RecordId across threads by enhancing thread behavior, so that the RecordId is carried from one thread to the next.
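
The cross-thread hand-off of the RecordId described above can be illustrated with a minimal sketch. It is not AREX Agent's actual code: RecordContext, ContextRunnable, and ContextPropagationDemo are hypothetical names, and the wrapping is done by hand here, whereas the agent applies it through bytecode enhancement so that application code does not change.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical per-thread holder for the current RecordId (not AREX's real class). */
final class RecordContext {
    private static final ThreadLocal<String> RECORD_ID = new ThreadLocal<>();

    static String get() { return RECORD_ID.get(); }

    static void set(String recordId) {
        if (recordId == null) {
            RECORD_ID.remove();
        } else {
            RECORD_ID.set(recordId);
        }
    }
}

/** Captures the submitter's RecordId at construction and installs it while the task runs. */
final class ContextRunnable implements Runnable {
    private final Runnable delegate;
    private final String capturedRecordId;

    ContextRunnable(Runnable delegate) {
        this.delegate = delegate;
        this.capturedRecordId = RecordContext.get(); // executed on the submitting thread
    }

    @Override
    public void run() {
        String previous = RecordContext.get(); // whatever the worker thread held before
        RecordContext.set(capturedRecordId);
        try {
            delegate.run();
        } finally {
            RecordContext.set(previous); // pooled threads are reused, so always restore
        }
    }
}

final class ContextPropagationDemo {
    /** Runs a task on a worker thread and returns the RecordId that the task observed. */
    static String recordIdSeenByWorker(String recordId, boolean wrap) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicReference<String> seen = new AtomicReference<>();
        RecordContext.set(recordId);
        try {
            Runnable task = () -> seen.set(RecordContext.get());
            Runnable toSubmit = wrap ? new ContextRunnable(task) : task;
            pool.submit(toSubmit).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            RecordContext.set(null);
            pool.shutdown();
        }
        return seen.get();
    }
}
```

Calling ContextPropagationDemo.recordIdSeenByWorker("rec-1", true) returns "rec-1", while the unwrapped variant returns null, because a plain ThreadLocal value is not visible on the worker thread. Restoring the previous value in the finally block matters because pool threads are reused across requests.
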
The supported threads and thread pools are as follows:

Here is a simple example to illustrate the implementation. When java.util.concurrent.ThreadPoolExecutor#execute(Runnable runnable) is invoked, the runnable parameter is wrapped in an AgentRunnableWrapper. When the AgentRunnableWrapper is constructed, the current thread's context is captured. In the run method, the child thread's context is replaced with the captured context, and after execution the child thread's original context is restored. Here is an example code snippet:

The components used by an application may come in multiple versions, and different versions of the same component may be incompatible, for example because of changed package names or added or removed methods. To support multiple versions of a component, AREX Agent needs to identify the correct component version for bytecode enhancement, to avoid enhancing a class twice or enhancing the wrong version.

AREX Agent reads the Name and Version from META-INF/MANIFEST.MF in the component's JAR file and matches the version during class loading, so that the correct version is used for bytecode enhancement.

As shown above, during recording, the code first attempts to retrieve the value associated with the given key from the local cache (localCache.get(key)). If the value is not null, the data was available in the cache at recording time and is returned directly.

However, during playback, the cached data may not be available. If the value retrieved from the cache is null, the data is not present in the cache during playback, and the code must query the database (db.query()) to retrieve the data and return it as the result. In short, the execution flow of a replayed request often differs from the recorded one because the local cache data is inconsistent with what existed at recording time, which results in a low pass rate for replay testing.
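
To make the divergence concrete, here is a small sketch of the cache-then-database lookup described above. CachedLookup and CacheDivergenceDemo are hypothetical names, and the Function passed to the constructor is only a stand-in for db.query(); none of this is taken from AREX itself. With a warm cache (as in production) no database call happens, so none is recorded; with a cold cache (as in the test environment) an extra database call appears for which no recorded response exists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical service illustrating the cache-then-database lookup. */
final class CachedLookup {
    private final Map<String, String> localCache = new HashMap<>();
    private final Function<String, String> db; // stand-in for db.query()
    private int dbQueries = 0;

    CachedLookup(Function<String, String> db) {
        this.db = db;
    }

    /** Simulates a cache that was already populated by earlier production traffic. */
    void warm(String key, String value) {
        localCache.put(key, value);
    }

    String get(String key) {
        String value = localCache.get(key);
        if (value != null) {
            return value; // cache hit: no database dependency is invoked
        }
        dbQueries++; // cache miss: an extra dependency call appears in the chain
        value = db.apply(key);
        localCache.put(key, value);
        return value;
    }

    int dbQueries() {
        return dbQueries;
    }
}

final class CacheDivergenceDemo {
    /** Returns how many database calls a single request makes for the given cache state. */
    static int dbCallsFor(boolean cacheWarm) {
        CachedLookup lookup = new CachedLookup(key -> "value-from-db");
        if (cacheWarm) {
            lookup.warm("user:1", "value-from-cache");
        }
        lookup.get("user:1");
        return lookup.dbQueries();
    }
}
```

Here CacheDivergenceDemo.dbCallsFor(true) yields 0 database calls and dbCallsFor(false) yields 1, which is the kind of mismatch between the recorded and replayed call chains that lowers the replay pass rate.
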
There are a few challenges to solve this problem:

Currently, the solution adopted by AREX Agent is to record only the cache data used in the current request chain. This is achieved by letting the application configure dynamic classes that identify what to record; during playback in the test environment, the cache data is automatically replaced to keep the in-memory data consistent between recording and playback. We are still researching a solution for recording large cache data.

Many business systems are time-sensitive: accessing them at different times can produce different results. If the recording and playback times are inconsistent, playback can fail. Modifying the machine time on the test server is not suitable either, because playback requests are concurrent and many servers do not allow the current time to be changed. Therefore, the current time has to be mocked at the code level.

The currently supported time types are as follows:

public static native long currentTimeMillis() is an intrinsic function. When the JIT compiler optimizes an intrinsic function, it replaces the method's bytecode with internal code, which makes the code enhanced by AREX Agent ineffective. The JDK performs inline operations on System.currentTimeMillis() and System.nanoTime() as follows:

AREX Agent handles this case specially: through application configuration, code that calls System.currentTimeMillis() is replaced with a call to AREX Agent's own method for obtaining the time, which avoids the inline optimization.
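
The code-level time mocking discussed in the article can be sketched as follows. This is an illustration under stated assumptions, not AREX Agent's implementation: ReplayClock and CouponService are hypothetical names, and the call site is rewritten by hand here, whereas the agent rewrites it in bytecode. The recorded time is kept per thread because playback requests run concurrently and each may carry a different recorded timestamp.

```java
/** Hypothetical clock facade; AREX's real class and method names may differ. */
final class ReplayClock {
    // Recorded timestamp for the request replayed on this thread, or unset when not replaying.
    private static final ThreadLocal<Long> RECORDED_MILLIS = new ThreadLocal<>();

    static void startReplay(long recordedMillis) {
        RECORDED_MILLIS.set(recordedMillis);
    }

    static void endReplay() {
        RECORDED_MILLIS.remove();
    }

    /** Replacement for System.currentTimeMillis() at rewritten call sites. */
    static long currentTimeMillis() {
        Long recorded = RECORDED_MILLIS.get();
        return recorded != null ? recorded : System.currentTimeMillis();
    }
}

final class CouponService {
    /** Time-sensitive business logic whose call site uses ReplayClock instead of System. */
    static boolean isExpired(long expiresAtMillis) {
        return ReplayClock.currentTimeMillis() > expiresAtMillis;
    }
}
```

During replay, a caller would invoke ReplayClock.startReplay(recordedMillis) before executing the request and ReplayClock.endReplay() afterwards, so that CouponService.isExpired reaches the same decision it reached at recording time. Outside replay the facade falls through to the real system clock.
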



This post first appeared on VedVyas Articles; please read the original post: here
