Create a Lexical Analyzer

  • Aim

    To design an online lexical analyzer simulator for the C programming language that detects tokens and classifies them into keywords, identifiers, string constants, special characters and operators, while ignoring tabs, newlines and redundant spaces.

  • Programming Languages Used

    • C
    • HTML
    • CSS
    • JavaScript
  • Theory

    Given a line of code as user input, representing a program snippet, the task is to detect the tokens in it. In a compiler, this phase is called lexical analysis, so this simulator is referred to as a lexical analyzer. The lexical analyzer is the part of the compiler that detects the tokens of a program and passes them to the syntax analyzer. A token is the smallest logical unit of a program and can be of the following types:

    • Keywords/Function Names
    • Identifiers
    • String Constants
    • Special Characters
    • Operators
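
    One way to represent these categories in C is an enumeration plus a small token record that the analyzer could hand on to the syntax analyzer. This is a minimal sketch under stated assumptions, not the simulator's own code: `TokenType`, `Token`, and `format_token` are hypothetical names, and the "Valid"/"Invalid" wording is borrowed from the sample output in the Input Output Examples section.

```c
#include <stdio.h>

/* Hypothetical token categories, one per type listed above. */
typedef enum {
    TOKEN_KEYWORD,          /* keywords / function names */
    TOKEN_IDENTIFIER,
    TOKEN_STRING_CONSTANT,
    TOKEN_SPECIAL_CHAR,
    TOKEN_OPERATOR
} TokenType;

/* One token as it might be passed on to the syntax analyzer (assumed layout). */
typedef struct {
    TokenType type;
    const char *lexeme;     /* the matched text, e.g. "float" */
    int valid;              /* 0 for malformed lexemes such as "1b" */
} Token;

/* Human-readable name of each category. */
static const char *type_name(TokenType t) {
    switch (t) {
    case TOKEN_KEYWORD:         return "Keyword";
    case TOKEN_IDENTIFIER:      return "Identifier";
    case TOKEN_STRING_CONSTANT: return "String Constant";
    case TOKEN_SPECIAL_CHAR:    return "Special Character";
    case TOKEN_OPERATOR:        return "Operator";
    }
    return "Unknown";
}

/* Writes a line such as "Valid Keyword: float" into buf (capacity n).
   Returns the snprintf result, i.e. the length of the full line. */
int format_token(const Token *tok, char *buf, size_t n) {
    return snprintf(buf, n, "%s %s: %s",
                    tok->valid ? "Valid" : "Invalid",
                    type_name(tok->type), tok->lexeme);
}
```

    For example, formatting `{ TOKEN_KEYWORD, "float", 1 }` produces `Valid Keyword: float`, and `{ TOKEN_IDENTIFIER, "1b", 0 }` produces `Invalid Identifier: 1b`.
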
  • Program Logic

    • Traverse the input program snippet character by character
    • Tokenize it, i.e., divide the program into valid tokens
    • Remove tab, newline and other white-space characters
    • Remove comments
    • Remove the remaining parts of the program that exist only to help the reader and are not needed during compilation
    • Place each token in one of the following categories:
      • Keywords: reserved words and function names
      • Identifiers: names of variables
      • String constants: fixed sequences of characters enclosed in double quotes
      • Special characters: 192 of the 256 codes in an 8-bit extended ASCII character set (standard ASCII itself defines only 128 characters)
      • Operators: symbols for operations such as arithmetic and assignment
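
    The program logic above can be sketched as a single left-to-right pass over the input. This is an illustrative sketch, not the repository's implementation: it assumes single-character operators and special characters drawn from the hypothetical `OPERATORS` and `SPECIALS` sets, assumes lexemes shorter than 64 characters, and groups keywords and identifiers together as "words", leaving the split to a keyword-table lookup. `scan`, `Lexeme`, and `LexKind` are hypothetical names.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical coarse categories; WORD is later split into keywords and
   identifiers by a keyword-table lookup (not shown here). */
typedef enum { LEX_WORD, LEX_STRING, LEX_OPERATOR, LEX_SPECIAL } LexKind;

typedef struct {
    LexKind kind;
    char text[64];          /* assumption: lexemes are shorter than 64 chars */
} Lexeme;

/* Assumed single-character operator and special-character sets. */
static const char OPERATORS[] = "+-*/%=<>!&|^~";
static const char SPECIALS[]  = ";,(){}[]#.:?";   /* documents the intent; any
                                                     non-operator symbol is
                                                     treated as special */

/* Copies src[start, end) into out->text, truncating if it is too long. */
static void set_text(Lexeme *out, const char *src, size_t start, size_t end) {
    size_t len = end - start;
    if (len >= sizeof out->text) len = sizeof out->text - 1;
    memcpy(out->text, src + start, len);
    out->text[len] = '\0';
}

/* Scans src once, skipping white space and comments, and stores up to
   max lexemes in out. Returns the number of lexemes stored. */
size_t scan(const char *src, Lexeme *out, size_t max) {
    size_t i = 0, n = 0;
    (void)SPECIALS;
    while (src[i] != '\0' && n < max) {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) {                            /* tabs, newlines, spaces */
            i++;
        } else if (c == '/' && src[i + 1] == '/') {  /* line comment */
            while (src[i] != '\0' && src[i] != '\n') i++;
        } else if (c == '/' && src[i + 1] == '*') {  /* block comment */
            i += 2;
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/')) i++;
            if (src[i] != '\0') i += 2;              /* skip the closing delimiter */
        } else if (isalnum(c) || c == '_') {         /* keyword, identifier or number */
            size_t start = i;
            while (isalnum((unsigned char)src[i]) || src[i] == '_') i++;
            out[n].kind = LEX_WORD;
            set_text(&out[n++], src, start, i);
        } else if (c == '"') {                       /* string constant */
            size_t start = i++;
            while (src[i] != '\0' && src[i] != '"') {
                if (src[i] == '\\' && src[i + 1] != '\0') i++;  /* skip escaped char */
                i++;
            }
            if (src[i] == '"') i++;                  /* include the closing quote */
            out[n].kind = LEX_STRING;
            set_text(&out[n++], src, start, i);
        } else {                                     /* operator or special character */
            out[n].kind = strchr(OPERATORS, c) ? LEX_OPERATOR : LEX_SPECIAL;
            set_text(&out[n++], src, i, i + 1);
            i++;
        }
    }
    return n;
}
```

    For example, scanning `float x = a + 1b; // done` yields seven lexemes (`float`, `x`, `=`, `a`, `+`, `1b`, `;`) and discards the trailing comment. Restricting operators to single characters keeps the sketch short; a full C lexer would need longest-match handling for `==`, `++`, `->` and similar multi-character operators.
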
  • Input/Output Example

    • Input: float x = a + 1b

    • Output:
      All tokens are:-
      Valid Keyword: float
      Valid Identifier: x
      Valid Operator: =
      Valid Identifier: a
      Valid Operator: +
      Invalid Identifier: 1b
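
    The classification of `1b` as an invalid identifier follows from C's identifier rule: an identifier must start with a letter or underscore and continue with letters, digits or underscores. The sketch below, with hypothetical names `classify_word` and `is_identifier_shape`, applies that rule after a keyword lookup. It assumes the input is a name-like lexeme (numeric literals are not handled) and matches only the 32 C89 keywords, so unlike the simulator it does not group function names with keywords.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The 32 reserved words of C89. */
static const char *const KEYWORDS[] = {
    "auto", "break", "case", "char", "const", "continue", "default", "do",
    "double", "else", "enum", "extern", "float", "for", "goto", "if",
    "int", "long", "register", "return", "short", "signed", "sizeof", "static",
    "struct", "switch", "typedef", "union", "unsigned", "void", "volatile", "while"
};

/* Returns 1 if word is a C89 keyword, 0 otherwise. */
static int is_keyword(const char *word) {
    for (size_t i = 0; i < sizeof KEYWORDS / sizeof KEYWORDS[0]; i++)
        if (strcmp(word, KEYWORDS[i]) == 0) return 1;
    return 0;
}

/* C identifier rule: a letter or '_' followed by letters, digits or '_'. */
static int is_identifier_shape(const char *word) {
    if (!(isalpha((unsigned char)word[0]) || word[0] == '_')) return 0;
    for (size_t i = 1; word[i] != '\0'; i++)
        if (!(isalnum((unsigned char)word[i]) || word[i] == '_')) return 0;
    return 1;
}

/* Labels a name-like lexeme using the wording of the sample output. */
const char *classify_word(const char *word) {
    if (is_keyword(word)) return "Valid Keyword";
    if (is_identifier_shape(word)) return "Valid Identifier";
    return "Invalid Identifier";
}
```

    Applied to the sample input, `classify_word` labels `float`, `x`, `a` and `1b` as in the output above; the operators `=` and `+` would be recognized by a separate character check.
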

  • C Code

    The C code is available in the project repository.

  • Online Simulator

    The simulator is available online through the project link.

  • Time and Space Complexity

    • Time Complexity: O(n^2), where n is the number of characters in the input

    • Space Complexity: O(1)

  • Discussion

    We wrote the online simulator in C and integrated it with a website built using HTML, CSS and JavaScript. The simulator targets the C language: it detects tokens and classifies them into keywords, identifiers, string constants, special characters and operators, while ignoring tabs, newlines and redundant spaces.