ENH: Using msgpack instead of json #1819
Draft
waldbauer-certat wants to merge 45 commits into develop from waldbauer/msgpack-poc
    
              
ghost reviewed 7 times on Apr 1, 2021
    Signed-off-by: Sebastian Waldbauer <[email protected]>
This commit adds license information to a lot of files and adds a .reuse/dep5 file that lists the license information for some folders. The commit also changes the main license in setup.cfg from AGPL-3.0-only to AGPL-3.0-or-later, because only one file has AGPL-3.0-only as its license and multiple files have AGPL-3.0-or-later in the license header. It also removes the cef_logo.png file, as there is no information about its license anywhere to be found. It is now included directly from the website of the European Union. Closes #1633
and add legacy tag to shadowserver caida config
and add legacy tag to darknet config
and add legacy tag to the configs it replaces and update changelog and documentation accordingly
fix mapping use compromised type if the data indicates an active webshell plus add testcases add changelog update bots documentation
enhance mappings add 4/6 agnostic mapping for `Sinkhole-Events` as well document feeds with IPv4 and IPv6 better and shorter
This commit adds a license header or a license file to most of the files, or documents the license in the .reuse/dep5 license file.
Some of the process was automated, first by listing all the files that are not reuse lint compliant:
> reuse lint > ../reuse.lst
This list was then modified to remove metainformation and only list filenames. Also a couple of filenames that need manual modification were removed. Then using git and reuse:
> for file in `cat ../reuse.lst`; do year=`git log --reverse --pretty="format:%ai" $file | head -1 | cut -d "-" -f 1`; author=`git log --reverse --pretty="format:%an" $file|head -1`; reuse addheader --copyright="$author" --year="$year" --license="AGPL-3.0-or-later" --skip-unrecognised $file; done
Then the same process was repeated for files reuse does not recognize, like csv and json files or REQUIREMENTS.txt files.
match with RSIT in the taxonomy intrusions:
compromised -> system-compromise
unauthorized-command -> system-compromise
unauthorized-login -> system-compromise
adapt bots depending on the name
add changelog and news entries, including SQL update statements
merged into information-content-security > unauthorised-information-modification adapt bots depending on the name add changelog and news entries, including SQL update statements
was renamed and marked as deprecated in 2.0.0.beta1 #1404
Compatibility with the deprecated configuration format (before 1.0.0.dev7) was removed. #1404
The deprecated shell scripts
- `update-asn-data`
- `update-geoip-data`
- `update-tor-nodes`
- `update-rfiprisk-data`
have been removed in favor of the built-in update-mechanisms (see the bots' documentation). A crontab file for calling all new update commands can be found in `contrib/cron-jobs/intelmq-update-database`. #1404
Add two n6 images directly to the repository, as they are not displayed on readthedocs otherwise: the other websites hosting the images block loading images if the referer does not match a whitelist, and we can't add a noreferrer HTML attribute in rst either. The only option left is to add the files, which only implies adding the licensing information and the AGPL-3.0 license text as well. Add two illustrations of the flow from n6 to intelmq and vice versa (own work). Some textual improvements in the document itself.
The Aggregate Expert might be used to aggregate events within a given timespan and threshold. Signed-off-by: Sebastian Waldbauer <[email protected]>
Using msgpack instead of json results in faster (de)serialization and less memory usage. Redis is also capable of msgpack within its Lua API, i.e. https://github.com/kengonakajima/lua-msgpack-native.
====== Benchmark =======
JSON median size: 387
MSGPACK median size: 329
------------------------
Diff: 16.20%
JSON
* Serialize: 39286
* Deserialize: 30713
MSGPACK
* Serialize: 23483
* Deserialize: 12602
---------------------
DIFF
* Serialize: 50.35%
* Deserialize: 83.62%
Data extracted from spamhaus-collector
Measurements based on deduplicator-expert
460 events in total processed by deduplicator-expert
Signed-off-by: Sebastian Waldbauer <[email protected]>
Signed-off-by: Sebastian Waldbauer <[email protected]>
NOTE: This is a proof of concept and is still being heavily tested!
Introduction
Msgpack (MessagePack) is a (de)serialization format similar to JSON, but optimized for machine-to-machine (M2M) communication. There are other, arguably better, protocols such as protobuf, FlatBuffers, Cap'n Proto, SBE and so on, but these don't fit IntelMQ very well. Msgpack uses a key-value pattern (like JSON), so there won't be any major change. The real "magic" is in how the data is stored: JSON is very human-readable because it serializes to text, whereas msgpack packs data into a binary format, which results in smaller messages and faster processing (see the benchmark below).
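To illustrate how the binary packing saves space, here is a minimal sketch of a hand-rolled encoder for a small subset of the MessagePack format (short maps, short strings, booleans and small unsigned integers). It is not the library this PR uses, the `pack` helper is hypothetical, and the event field names and values are made up for the example.

```python
import json
import struct


def pack(obj):
    """Encode a tiny subset of MessagePack (illustration only, not a full encoder)."""
    if isinstance(obj, bool):  # must be checked before int: bool is a subclass of int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return struct.pack("B", obj)              # positive fixint: the value itself
        if 0 <= obj <= 0xFF:
            return b"\xcc" + struct.pack("B", obj)    # uint 8
        if 0 <= obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)   # uint 16, big-endian
        if 0 <= obj <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", obj)   # uint 32, big-endian
        raise ValueError("integer out of range for this sketch")
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return struct.pack("B", 0xA0 | len(raw)) + raw   # fixstr: length in the type byte
        if len(raw) <= 0xFF:
            return b"\xd9" + struct.pack("B", len(raw)) + raw  # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("map too large for this sketch")
        out = struct.pack("B", 0x80 | len(obj))       # fixmap: entry count in the type byte
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise TypeError(f"unsupported type: {type(obj).__name__}")


# Illustrative event; the keys and values are assumptions, not taken from the PR.
event = {"feed.name": "example-feed", "source.network": "192.0.2.0/24", "source.asn": 64496}

as_json = json.dumps(event, separators=(",", ":")).encode("utf-8")
as_msgpack = pack(event)
print(len(as_json), "bytes as compact JSON")
print(len(as_msgpack), "bytes as MessagePack")
```

The saving comes from replacing quotes, colons, commas and decimal digits with a single type byte that also carries the length, followed by the raw payload.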
If you want to know more about the format, see the MessagePack specification.
Msgpack itself is available for multiple languages, such as Go, Python, JavaScript, PHP and so on.
In addition, Redis - our internal message queue - is also capable of using msgpack within its Lua API.
What's the goal?
Benchmark
For the benchmark, data was extracted from spamhaus-drop-collector, parsed by spamhaus-drop-parser and measured in deduplicator-expert. 460 events were processed in total.
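A comparable measurement could be sketched roughly as follows. This is not the harness used for the numbers in this PR: the events are synthetic stand-ins with made-up field names, the `bench` helper is hypothetical, and the msgpack part only runs if the third-party `msgpack` package is installed.

```python
import json
import statistics
import timeit

try:
    import msgpack  # third-party package; optional for this sketch
except ImportError:
    msgpack = None

# 460 synthetic events (assumption: the real events came from the Spamhaus feed).
events = [
    {"feed.name": "example-feed", "source.network": f"198.51.{i % 256}.0/24", "counter": i}
    for i in range(460)
]


def bench(dumps, loads, repeat=20):
    """Return (serialize seconds, deserialize seconds, median serialized size in bytes)."""
    blobs = [dumps(event) for event in events]
    serialize = timeit.timeit(lambda: [dumps(event) for event in events], number=repeat)
    deserialize = timeit.timeit(lambda: [loads(blob) for blob in blobs], number=repeat)
    return serialize, deserialize, statistics.median([len(blob) for blob in blobs])


results = {"json": bench(lambda event: json.dumps(event).encode("utf-8"), json.loads)}
if msgpack is not None:
    results["msgpack"] = bench(
        lambda event: msgpack.packb(event, use_bin_type=True),
        lambda blob: msgpack.unpackb(blob, raw=False),
    )

for name, (serialize, deserialize, size) in results.items():
    print(f"{name}: serialize {serialize:.4f}s, deserialize {deserialize:.4f}s, median size {size} B")
```

Timings from such a micro-benchmark depend heavily on the machine and on the shape of the events, so they should only be compared against each other within one run.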
I've tested the bots above and they worked fine with this change. It might break other bots, which I haven't tested yet.
Serialize
Deserialize
To sum up, changing from JSON to msgpack will result in faster (de)serialization and a lower memory footprint.
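The percentages quoted in the benchmark commit message are consistent with a symmetric percent difference, that is, the absolute difference divided by the mean of the two values. This is an inference from the numbers, not something the commit message states, and the `percent_difference` helper is a hypothetical name. The commit message gives no units for the serialize and deserialize figures, so none are assumed here.

```python
def percent_difference(a, b):
    """Symmetric percent difference: |a - b| relative to the mean of a and b."""
    return abs(a - b) / ((a + b) / 2) * 100


# (JSON value, msgpack value) pairs copied from the benchmark commit message.
figures = {
    "median size": (387, 329),
    "serialize": (39286, 23483),
    "deserialize": (30713, 12602),
}

for label, (json_value, msgpack_value) in figures.items():
    print(f"{label}: {percent_difference(json_value, msgpack_value):.2f}%")
# prints 16.20%, 50.35% and 83.62%, matching the figures in the commit message
```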