Saturday, June 27, 2015

C Overview


  • History
    • Dennis Ritchie invented it in the early 1970s, to write Unix with it
    • Previously, Unix was written in assembly -> Bad portability :(
      • Rewrite Unix again for each new processor
    • ANSI (American National Standards Institute) released the first C standard in 1989 (C89)
    • Amendment 1 was added to C89 in 1995
    • C89 with Amendment 1 is the base of C++
    • C89 with Amendment 1 is called the C Subset of C++
    • C99 was released by ISO in 1999 (and later adopted by ANSI)
      • Nearly the same as C89 + some added features (such as variable-length arrays and the restrict pointer qualifier)

  • Middle Level Language
    • High level features:
      • C is like the High Level Language in the following features:
        • Portable (easy to adapt the program written for a platform to another platform)
        • Supports DataTypes
          • A datatype defines
            • A set of values that the variable can store
            • A set of operations that can be performed on that variable
      • C is unlike the High Level Languages in the following features:
        • C does not support run-time error checking, such as "array index out of bounds"
        • C is a weakly typed language
          • Implicit conversion is allowed
          • Implicit conversion happens for arguments that do not match the parameter type.
        • C has few keywords:
          • 32 keywords in C89
          • 5 more keywords in C99
          • BASIC (a high level language) defined 100 keywords
    • Low Level Features:
      • Manipulation of Bits, Bytes and Addresses
Figure: C keywords, from "The Complete C Reference - 4th Edition"
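
The low-level features listed above (manipulation of bits, bytes and addresses) can be sketched in a few lines of C. This is only an illustration: the helper names set_bit, clear_bit, test_bit and first_byte_in_memory are made up for this sketch and do not come from any library.

```c
#include <stdint.h>

/* Hypothetical helpers, named for this sketch only. */

/* Set bit n (0 = least significant) of value. */
uint32_t set_bit(uint32_t value, unsigned n) {
    return value | (UINT32_C(1) << n);
}

/* Clear bit n of value. */
uint32_t clear_bit(uint32_t value, unsigned n) {
    return value & ~(UINT32_C(1) << n);
}

/* Return 1 if bit n of value is set, 0 otherwise. */
int test_bit(uint32_t value, unsigned n) {
    return (int)((value >> n) & 1u);
}

/* Read the byte stored at the lowest address of *p.
   Which byte of the value that is depends on the machine's endianness. */
uint8_t first_byte_in_memory(const uint32_t *p) {
    const unsigned char *bytes = (const unsigned char *)p;
    return bytes[0];
}
```

For example, set_bit(0, 3) gives 8 and clear_bit(0xFF, 0) gives 0xFE. All four helpers assume n is less than 32.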
  • C is a Structured Language
    • The code consists of component blocks (functions - while - if - ...etc )
    • goto usage is either forbidden or discouraged
    • Assembly is not a structured language, because the code contains jumps and branches -> Spaghetti code!

  • C is a programmer's language
    • Unlike BASIC, which was developed for non-programmers to solve simple problems.

  • Compiler Vs Interpreter
    • An interpreter parses one line of code at a time and executes it.
    • A compiler parses the whole program and generates machine code, which is then executed.
    • An interpreter parses a line of code again each time it is executed.
    • A compiler parses each line of code only once.
    • Java was designed for interpretation (its compiled bytecode is run by a virtual machine).
    • C is designed for compilation (C interpreters can be written, but they do not make the best use of C).

Friday, June 26, 2015

ET-STM32F103 ARM Cortex board get started

  • Board ET-STM32F103
  • Based on ARM CORTEX-M3
  • Keil uVision
    • can be used to build the project
    • can be used for debugging, using HW "ULINK".
  • CodeWarrior
    • I was able to build the project using CodeWarrior
    • I opened the debugger and tried to debug offline (needs more investigation)
  • STMicroelectronics Flash loader: used to flash the board with the hex file
    • Connect the usb-to-serial to UART1
    • Switch To Bootloader (from the boot push button on the board)
    • Press reset (from the reset push button on the board)
    • Open the "STMicroelectronics Flash loader"
      • COM3 (check the port from device manager)
      • Baud 115200
      • 8 Bits
      • Parity: even
      • Echo: disabled
      • Timeout: 5

Friday, June 5, 2015

Compiler

Compilation Steps
Very good tutorial
Very good Video tutorial

  • Pre-processing
    • Process the # Directives
  • Lexical Analysis
    • Strips out whitespace and comments
    • Divides the source code into lexemes and outputs a stream of tokens
    • Generates an error on illegal lexemes, for example:
      • 1_x = 1 + 2;    --> error: an identifier cannot start with a digit
  • Syntax Analysis
    • Check the code against grammar, generate errors for wrong grammar.
    • Outputs a parse tree
    • Example of errors:
      • x = 1 + ;       --> can not generate parse tree
      • struct mystruct g,x;
        i = g * x;       --> i is not defined, and g and x cannot be multiplied; however, the syntax analyzer reports no error, because the statement follows the grammar.
  • Semantic Analysis
    • Program symbol table is created, debug info is inserted
    • Checks the code for logical (semantic) errors, such as illegal operand types and undefined identifiers
    • Error Examples:
      • struct mystruct g,x;
        i = g * x;    --> the multiplication operation is illegal for structs, i is not defined
      • warning: variable used before initialization
      • int i = "hi";  --> warning: implicit conversion from pointer to integer
  • Intermediate Code Generation
    • IR (Intermediate Representation) is a machine-independent representation of the code.
    • This keeps the analysis part of the compiler unchanged for different targets, while only the synthesis part changes from target to target.
    • Output: AST (Abstract Syntax Tree) or Pseudo Code.
  • Machine Independent Code Optimization
    • Loop Unrolling
    • Expanding inline functions
    • Dead code removal
  • Code Generation
    • Convert IR code to machine opcodes
  • Machine Dependent Code Optimization
    • Register Allocation
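
As a rough illustration of the loop unrolling mentioned above, the sketch below shows a simple loop next to a version unrolled by hand. It only mimics what an optimizer might do; the function names sum_simple and sum_unrolled are invented for this example, and a real compiler decides for itself whether and how far to unroll.

```c
#include <stddef.h>

/* Original loop: one addition and one loop test per element. */
long sum_simple(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Unrolled by 4 by hand: one loop test per four elements. */
long sum_unrolled(const int *a, size_t n) {
    long total = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        total += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)   /* leftover elements when n is not a multiple of 4 */
        total += a[i];
    return total;
}
```

Both functions return the same sum for the same input; only the number of loop tests differs.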

More Details

  • Lexical Analysis (Tokenizing)
    • Input:
      • The C source code
      • It consists of lexemes
    • Output
      • Tokens
        • a token is a <Token Class, "lexeme"> pair
    • Ex:
      • if (i == j)
            z = 0;
      • This is read by the lexer as the following character stream:
        • if (i == j)\n\tz = 0;
      • Step 1: Divide into lexemes:
        • |if| |(|i| |==| |j|)|\n\t|z| |=| |0|;|
      • Step 2: Identify the "Token Class" of each "lexeme"
        Hence, generate the tokens: <class,"lexeme"> pairs
        • <keyword,"if">
        • <whitespace," ">
        • <PAREN_OPEN,"(">
        • <identifier,"i">
        • <whitespace," ">
        • <operator,"==">
        • <whitespace," ">
        • <identifier,"j">
        • <PAREN_CLOSE,")">
        • <whitespace,"\n\t">
        • <identifier,"z">
        • <whitespace," ">
        • <EQ,"=">
        • <whitespace," ">
        • <integer,"0">
        • <SEMI_COLON,";">
      • Token Classes:
        • Whitespace
          • nonempty sequence of blanks, new lines or tabs
        • Integers
        • Keywords
        • Identifiers
        • Operators
        • Normally, each "punctuation" lexeme has its own class:
          • PAREN_OPEN:    (
          • PAREN_CLOSE:   )
          • SEMI_COLON:    ;
          • EQ:            =
  • Parsing (Syntax Analysis)
    • Input:
      • Sequence of tokens from the lexer
    • Output:
      • Parse Tree
    • Parser does the following:
      • Ensures that the tokens follow the C grammar rules, and reports syntax errors otherwise
      • Generates Parse tree:
      • Intermediate representation:
        • AST: Abstract Syntax Tree 
          • Also a pseudo code can replace AST
          • If the compiler supports different languages on different targets, then this AST or Pseudo code is machine independent
          • AST is like the Parse tree, but removing some unneeded info
    • Example:
      Parse Tree:

      The pic is from Online Courses Compiler Course

      Parse Tree Has some unneeded info:

      The pic is from Online Courses Compiler Course
  • AST (Abstract Syntax Tree):
    Generated as a sort of Intermediate Representation

    The pic is from Online Courses Compiler Course

  • Debug info: maps functions and instructions in the binary program back to the source code

        (binary instruction => item name, item type, file name, line number, ...etc)
  • Symbol table: relates every identifier in the source code to its memory segment and address
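
The lexical analysis steps described above (divide into lexemes, then attach a token class to each) can be sketched as a small hand-written tokenizer. This is a minimal sketch, not how a production compiler does it: the names Token, TokenClass and next_token are made up here, only "if" and "while" are treated as keywords, and only the punctuation used in the example is recognized.

```c
#include <ctype.h>
#include <string.h>

/* Token classes for this sketch; the names follow the notes above. */
typedef enum {
    TOK_WHITESPACE, TOK_KEYWORD, TOK_IDENTIFIER, TOK_INTEGER, TOK_OPERATOR,
    TOK_PAREN_OPEN, TOK_PAREN_CLOSE, TOK_EQ, TOK_SEMI_COLON,
    TOK_ERROR,  /* illegal lexeme */
    TOK_END     /* end of input */
} TokenClass;

typedef struct {
    TokenClass cls;     /* token class */
    const char *start;  /* first character of the lexeme */
    size_t length;      /* number of characters in the lexeme */
} Token;

static int is_word_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Only "if" and "while" are treated as keywords, to keep the sketch short. */
static int is_keyword(const char *s, size_t len) {
    return (len == 2 && strncmp(s, "if", 2) == 0) ||
           (len == 5 && strncmp(s, "while", 5) == 0);
}

/* Return the token that starts at *src and advance *src past its lexeme. */
Token next_token(const char **src) {
    const char *p = *src;
    Token t = { TOK_END, p, 0 };

    if (*p == '\0') {
        return t;
    } else if (isspace((unsigned char)*p)) {
        while (isspace((unsigned char)*p)) p++;
        t.cls = TOK_WHITESPACE;
    } else if (isdigit((unsigned char)*p)) {
        while (isdigit((unsigned char)*p)) p++;
        t.cls = TOK_INTEGER;
        if (is_word_char(*p)) {          /* e.g. 1_x: digits run into a name */
            while (is_word_char(*p)) p++;
            t.cls = TOK_ERROR;
        }
    } else if (is_word_char(*p)) {       /* letter or '_' (digits handled above) */
        while (is_word_char(*p)) p++;
        t.cls = is_keyword(t.start, (size_t)(p - t.start))
                    ? TOK_KEYWORD : TOK_IDENTIFIER;
    } else if (p[0] == '=' && p[1] == '=') {
        p += 2;
        t.cls = TOK_OPERATOR;
    } else {
        switch (*p++) {
        case '(': t.cls = TOK_PAREN_OPEN;  break;
        case ')': t.cls = TOK_PAREN_CLOSE; break;
        case '=': t.cls = TOK_EQ;          break;
        case ';': t.cls = TOK_SEMI_COLON;  break;
        default:  t.cls = TOK_ERROR;       break;
        }
    }
    t.length = (size_t)(p - t.start);
    *src = p;
    return t;
}
```

Calling next_token repeatedly on "if (i == j)\n\tz = 0;" until it returns TOK_END yields the 16 tokens listed in Step 2, and calling it on "1_x" yields a single TOK_ERROR token.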

Compare signed to unsigned

If you have this code:

#include <stdio.h>

int main(void) {
    unsigned int x = 5;
    int y = -3;
    /* y is converted to unsigned int before the comparison */
    if (y > x)
        printf("y is bigger\n\r");
    return 0;
}

What is the output of this?


  • int = signed int
  • when comparing unsigned to signed, the signed operand is converted to unsigned
  • in our case (32-bit int)
    • x = 00000005
    • y = FFFFFFFD   (the 2's complement representation of -3; MSB = sign bit = 1)
    • converting y to unsigned gives 4294967293, so y > x --> true
    • the output: "y is bigger"

What about normal operation?
#include <stdio.h>

int main(void) {
    int x;
    int y = -10;
    unsigned int z = 4;
    x = y + z;   /* y is converted to unsigned int, then added to z */

    printf("%d\n\r", x);
    return 0;
}

// Prints -6
// y is also converted to unsigned and then added to z; the result is still correct, thanks to the 2's complement representation
// y + z = FFFFFFF6 + 00000004 = FFFFFFFA = -6


Ex:

#include <stdio.h>

int main(void) {
    short x;
    long x2;
    long x3;   /* long, like x2 */
    short y = -10;
    unsigned short z = 4;
    x = y + z;
    x2 = y + z;
    x3 = y + (short)z;

    printf("%d, %ld, %ld\n\r", x, x2, x3);
    return 0;
}

On a 16-bit machine (16-bit short and int, 32-bit long):

  • y + z = 0xFFF6 + 0x0004 = 0xFFFA (unsigned: z is promoted to unsigned int, and y is converted to match)
    • x = 0xFFFA
      • Since x is signed, then x = -6
    • x2 = 0x0000FFFA
      • Since x2 is signed, then x2 = +65530
      • No sign extension happened, because (y + z) is unsigned
    • x3 = 0xFFFFFFFA
      • Since x3 is signed, then x3 = -6
      • Sign extension happened, because (y + (short)z) is signed
On a 32-bit machine (32-bit int):
  • y + z = 0xFFFFFFF6 + 0x00000004 = 0xFFFFFFFA (signed: both operands are promoted to int)
    • x = x2 = x3 = -6
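
The 16-bit case above can be reproduced on a machine with a 32-bit int by scaling every type up one step. The sketch below assumes int is exactly 32 bits, so that int32_t and uint32_t behave like short and unsigned short did on the 16-bit machine; the function names widen_mixed and widen_signed are invented for this example.

```c
#include <stdint.h>

/* Assumes int is 32 bits wide.
   y + z is computed as unsigned, so widening to 64 bits
   does not sign-extend (this plays the role of x2). */
int64_t widen_mixed(int32_t y, uint32_t z) {
    return y + z;
}

/* y + (int32_t)z is computed as signed, so widening to 64 bits
   sign-extends (this plays the role of x3). */
int64_t widen_signed(int32_t y, uint32_t z) {
    return y + (int32_t)z;
}
```

With y = -10 and z = 4, widen_mixed returns 4294967290 (0x00000000FFFFFFFA) and widen_signed returns -6, mirroring the +65530 and -6 results above.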