@saracreates

Add on-the-fly full simulation tagging to FCCAnalyses master.

@saracreates

The next steps are not straightforward.
We need to retrieve the input variables for the tagger. The existing retrieval code in FCCAnalyses was written to match the fast-simulation syntax.

Unfortunately, we cannot simply change the input collection names, because the collections are also laid out differently.

E.g. the retrieval of pfcand_dxy:

  • in fast sim, we have this function which does something like <edm4hep::TrackState> mytrackstate = trackstates.at(recoparticledata.track_begin)
  • in full sim, on the other hand, we would need to do something like <edm4hep::TrackState> mytrackstate = trackstates.at(tracks.trackStates_begin), with tracks being the SiTracks_Refitted collection
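
To make the mismatch concrete, here is a minimal Python sketch of the two lookups. The classes are stand-ins, not the real edm4hep types; the field names follow the two bullets above, and the extra hop from particle to track in full sim is an assumption on my side.

```python
from dataclasses import dataclass


# Stand-in records, NOT the real edm4hep classes. Field names follow the
# comment above (track_begin, trackStates_begin) and are assumptions.
@dataclass
class TrackState:
    D0: float


@dataclass
class RecoParticle:
    track_begin: int  # fast sim: used directly as an index into the track states


@dataclass
class Track:
    trackStates_begin: int  # full sim: index of this track's first track state


def track_state_fast_sim(particle, trackstates):
    """Fast-sim style: the particle's index addresses the track state directly."""
    return trackstates[particle.track_begin]


def track_state_full_sim(particle, tracks, trackstates):
    """Full-sim style: particle -> track -> first track state of that track."""
    track = tracks[particle.track_begin]  # assumed link from particle to track
    return trackstates[track.trackStates_begin]


# Toy full-sim layout: two tracks with four track states each.
trackstates = [TrackState(D0=float(i)) for i in range(8)]
tracks = [Track(trackStates_begin=0), Track(trackStates_begin=4)]
particle = RecoParticle(track_begin=1)

naive = track_state_fast_sim(particle, trackstates)            # wrong entry in full sim
correct = track_state_full_sim(particle, tracks, trackstates)  # first state of track 1
print(naive.D0, correct.D0)
```

With more than one track state per track, the fast-sim style lookup silently returns a state that belongs to a different track, which is why renaming the collections alone is not enough.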

Unfortunately, this issue recurs throughout the code that retrieves the flavor-tagging input variables.

The code in FCCAnalyses also repeats itself quite often and could be shortened. Maybe then, it would be easier to change to full simulation.
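
One possible way to shorten the repeated code, sketched in Python rather than in the actual FCCAnalyses C++: write each variable getter once and pass in the index lookup, so that fast and full sim differ in a single place. All names here (make_getter, first_state_of_track, the D0 and Z0 fields) are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Sequence


def make_getter(resolve_index: Callable[[int], int], field: str):
    """Build a getter; only resolve_index differs between fast and full sim."""

    def getter(track_indices: Sequence[int],
               trackstates: Sequence[Dict[str, float]]) -> List[float]:
        return [trackstates[resolve_index(i)][field] for i in track_indices]

    return getter


# Toy data: eight track states, stored as plain dicts for brevity.
trackstates = [{"D0": float(i), "Z0": float(10 + i)} for i in range(8)]

# Hypothetical full-sim mapping from a track to its first track state.
first_state_of_track = [0, 4]

# Fast-sim style: the index is used as is.
get_d0_fast = make_getter(lambda i: i, "D0")

# Full-sim style: the same getter code, with one extra indirection.
get_d0_full = make_getter(lambda i: first_state_of_track[i], "D0")
get_z0_full = make_getter(lambda i: first_state_of_track[i], "Z0")

print(get_d0_fast([0, 1], trackstates))
print(get_d0_full([0, 1], trackstates))
print(get_z0_full([0, 1], trackstates))
```

The point of the design is that adding full-sim support then means supplying one new lookup instead of editing every getter.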

Maybe there is another workaround that I don't see right now.

@saracreates

I have managed to make it run. As an example, use the histmaker of the Hgamma analysis here.

# clone this branch of FCCAnalyses
source ./setup.sh
fccanalysis build -j 8;

fccanalysis run ../MyFCCAnalyses/ana_ZHgamma/test/histmaker_flavor.py

You will see that it runs through, but the b-jet scores of gammabb (b_tags_sum;1) look wrong:
[image: b_tags_sum distribution for gammabb]

They should peak at 2, and they don't.
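
For context on the expected peak, a small Python illustration. It assumes that b_tags_sum is the per-event sum of the b-tag scores of the two jets, which is my reading of the variable name and not confirmed here; the scores are made up.

```python
def b_tags_sum(jet_b_scores):
    """Sum of the per-jet b-tag scores in one event (assumed definition)."""
    return sum(jet_b_scores)


# Made-up scores, for illustration only.
well_tagged = b_tags_sum([0.97, 0.95])  # both jets confidently tagged as b
poorly_tagged = b_tags_sum([0.30, 0.25])  # tagger unsure about both jets

print(round(well_tagged, 2), round(poorly_tagged, 2))
```

If the tagger works on gammabb events, most events should look like the first case and the distribution should pile up close to 2.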

Next steps to debug:

  • Did I accidentally modify any variable retrieval? I only wanted to change the software part...
  • Is there any difference between the k4MLJetTagger variable retrieval (see code here) and the retrieval here? (There should not be, apart from changing some dummy values (SIP sig) from -9 to -200, which I already did; double check?)
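
A possible way to run the second check, as a Python sketch: dump the input variables for the same jet from both retrievals and list every entry that differs. The function name, the dict layout and the sip_sig key are hypothetical; the pair (-9, -200) comes from the note above, and which code uses which value is not assumed.

```python
import math


def compare_inputs(reference, candidate, dummy_pairs=(), tol=1e-6):
    """Return (variable, index, ref_value, cand_value) for every mismatch.

    dummy_pairs lists (ref_dummy, cand_dummy) placeholder values that are
    treated as equivalent, e.g. (-9.0, -200.0).
    """
    mismatches = []
    for name in sorted(set(reference) | set(candidate)):
        ref_vals = reference.get(name, [])
        cand_vals = candidate.get(name, [])
        if len(ref_vals) != len(cand_vals):
            # Different number of entries: report the lengths, index is None.
            mismatches.append((name, None, len(ref_vals), len(cand_vals)))
            continue
        for i, (r, c) in enumerate(zip(ref_vals, cand_vals)):
            if (r, c) in dummy_pairs:
                continue  # known difference in dummy convention
            if not math.isclose(r, c, rel_tol=tol, abs_tol=tol):
                mismatches.append((name, i, r, c))
    return mismatches


# Toy values for one jet with two constituents (illustration only).
reference = {"pfcand_dxy": [0.01, -0.02], "sip_sig": [1.5, -9.0]}
candidate = {"pfcand_dxy": [0.01, 0.02], "sip_sig": [1.5, -200.0]}

strict = compare_inputs(reference, candidate)
lenient = compare_inputs(reference, candidate, dummy_pairs=[(-9.0, -200.0)])
print(strict)
print(lenient)
```

Running it once without and once with dummy_pairs separates real retrieval differences (here the sign flip in pfcand_dxy) from differences that are only a dummy-value convention.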

General remark: keeping a second, separate retrieval of the input variables here in FCCAnalyses makes the tagger error-prone, because its performance degrades whenever the two retrievals do not match. But as far as I am aware, there is no workaround?

@saracreates

I've implemented the same PID retrieval as used in CLDConfig and implemented the usage of the reco PV.

Unfortunately, the jet clustering definitions in FCCAnalyses and in key4hep/CLDConfig (which the tagger was trained on) differ a lot; see this CLDConfig issue.

Next possible step:

One could take data with jets already clustered in CLDConfig and run the inference in FCCAnalyses on them, to check whether the inference is implemented correctly in FCCAnalyses.

To have both jet clustering and tagging implemented on-the-fly in FCCAnalyses for full simulation, one must first agree on a common jet clustering definition.
