I want to use the output of another module inside an always block. Currently the only way to make this code work is by adding #1 after the pi_in assignment so that enough time has passed to allow Pi to finish.
Relevant part from module pLayer.v:
Pi pi(pi_in,pi_out);
always @(*)
begin
    for(i=0; i<constants.nSBox; i++) begin
        for(j=0; j<8; j++) begin
            x               = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
            pi_in           = 8*i+j;#1; /* wait for pi to finish */
            PermutedBitNo   = pi_out;
            y               = PermutedBitNo>>3;
            tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
        end
    end
    state_out = tmp;
end
Modllue Pi.v
`include "constants.v" 
module Pi(in, out);
input  [31:0]   in;
output [31:0]   out;
reg [31:0] out;
always @* begin
    if (in != constants.nBits-1) begin
        out = (in*constants.nBits/4)%(constants.nBits-1);
    end else begin
        out = constants.nBits-1;
    end
end
endmodule
Delays should not be used in the final implementation, so is there another way without using #1?
In essence i want PermutedBitNo = pi_out to be evaluated only after the Pi module has finished its job with pi_in (=8*i+j) as input. How can i block this line until Pi has finished?
Do i have to use a clock? If that's the case, please give me a hint.
update:
Based on Krouitch suggestions i modified my modules. Here is the updated version:
From pLayer.v:
    Pi pi(.clk (clk),
      .rst (rst),
      .in  (pi_in),
      .out (pi_out));
counter c_i (clk, rst, stp_i, lmt_i, i);
counter c_j (clk, rst, stp_j, lmt_j, j);
always @(posedge clk)
begin
    if (rst) begin
        state_out = 0;
    end else begin
        if (c_j.count == lmt_j) begin
            stp_i = 1;
        end else begin
            stp_i = 0;
        end
        // here, the logic starts
        x               = (state_value[(constants.nSBox-1)-i]>>j) & 1'b1;
        pi_in           = 8*i+j;
        PermutedBitNo   = pi_out;
        y               = PermutedBitNo>>3;
        tmp[(constants.nSBox-1)-y] ^= x<<(PermutedBitNo-8*y);
        // at end
        if (i == lmt_i-1)
            if (j == lmt_j) begin
                state_out = tmp;
            end
    end
end
endmodule
module counter(
  input  wire clk,
  input  wire rst,
  input  wire stp,
  input  wire [32:0] lmt,
  output reg  [32:0] count
);
always@(posedge clk or posedge rst)
    if(rst)
        count <= 0;
    else if (count >= lmt)
        count <= 0;
    else if (stp)
        count <= count + 1;
endmodule
From Pi.v:
always @* begin
    if (rst == 1'b1) begin
        out_comb = 0;
    end
    if (in != constants.nBits-1) begin
        out_comb = (in*constants.nBits/4)%(constants.nBits-1);
    end else begin
        out_comb = constants.nBits-1;
    end
end
always@(posedge clk) begin
    if (rst)
        out <= 0;
    else
        out <= out_comb;
end
				
                        
That's a nice piece of software you have here...
The fact that this language describes hardware is not helping then.
In verilog, what you write will simulate in zero time. it means that your loop on
iandjwill be completely done in zero time too. That is why you see something when you force the loop to wait for 1 time unit with#1.So yes, you have to use a clock.
For your system to work you will have to implement counters for
iandjas I see things.A counter synchronous counter with reset can be written like this:
You specify that you want to sample
pi_outonly whenpi_inis processed. In a digital design it means that you want to wait one clock cycle between the moment when you are sendingpi_inand the moment when you are readingpi_out.The best solution, in my opinion, is to make your
pimodule sequential and then considerpi_outas a register.To do that I would do the following:
Quickly if you use counters for
iandjand this lastpimodule this is what will happen:iandjwill change -->pi_inwill change accordingly at the same time(in simulation)out_combwill be stored inoutand then you will have the new value ofpi_outone clock cycle later thanpi_inEDIT
First of all, when writing (synchronous) processes, I would advise you to deal only with 1 register by process. It will make your code clearer and easier to understand/debug.
Another tip would be to separate combinatorial circuitry from sequential. It will also make you code clearer and understandable.
If I take the example of the counter I wrote previously it would look like :
Here I see why you have one more cycle than expected, it is because you put the combinatorial circuitry that controls your
pimodule in you synchronous process. It means that the following will happen :clkpositive edgeiandjwill be evaluatedpi_inis evaluatedpi_outis capturedSo it makes sense that it takes 2 cycles.
To correct that you should take out of the synchronous process the 'logic' part. As you stated in your commentaries it is logic, so it should not be in the synchronous process.
Hope it helps