Multiple screens vs single screen

I am designing an online assessment tool in which candidates must be authenticated before they start the assessment. The flow is:

Start assessment -> proctoring guidelines, system check, webcam permission request, candidate declaration, identity image capture (used for proctoring) -> Assessment screen

Here, taking the identity image is a separate workflow with its own set of instructions and CTA. If the image is not acceptable, the candidate is asked to retake it.
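
To make the flow concrete, it could be modelled roughly as a linear sequence of steps with one retry loop on the identity image. This is only a sketch: the step names, their order, and the `nextStep` helper are assumptions made for illustration, not part of an existing implementation.

```typescript
// Hypothetical step names, based on the flow described above.
// The order of the intermediate steps is an assumption.
type Step =
  | "guidelines"
  | "systemCheck"
  | "webcamPermission"
  | "declaration"
  | "identityImage"
  | "assessment";

const ORDER: readonly Step[] = [
  "guidelines",
  "systemCheck",
  "webcamPermission",
  "declaration",
  "identityImage",
  "assessment",
];

// Returns the step that follows `current`.
// `imageAccepted` only matters on the identity image step:
// a rejected image keeps the candidate on that step to retake it.
function nextStep(current: Step, imageAccepted: boolean = true): Step {
  if (current === "identityImage" && !imageAccepted) {
    return "identityImage";
  }
  const i = ORDER.indexOf(current);
  // Stay on the last step once the assessment has been reached.
  return ORDER[Math.min(i + 1, ORDER.length - 1)] ?? current;
}
```

The same sequence could drive either layout: one screen that reveals each step in place, or a separate modal per step.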

Between the Start assessment button and the Assessment screen, should I fit everything on one screen, or should I break it down into two or more modal views? Or is there a better way to do it?