# Step 3: Create data generator functionality

The following are several functions used to generate data. The utility function, `marketcap_to_DECIMAL`, is used by the `seed_trades` stored procedure to convert a string value for the market capitalization of a stock into a decimal value.

`DELIMITER //CREATE OR REPLACE FUNCTION marketcap_to_DECIMAL(s VARCHAR(15))  RETURNS DECIMAL(18,2) ASDECLARE  m CHAR(1) = SUBSTR(s, LENGTH(s), 1); -- M or B  raw_v DECIMAL(18,2) = SUBSTR(s, 2, LENGTH(s) - 1);  v DECIMAL(18,2) = NULL;BEGIN  IF m = "B" THEN    v = raw_v * 1000;  ELSE    v = raw_v;  END IF;  RETURN v;END //DELIMITER ;`

The `seed_trades` stored procedure generates rows of seed trade data based on an integer input value. This provides a query to rank the companies by market capitalization limited to the top 200.

The stored procedure generates approximately the same number of trades for each company.

`DELIMITER //CREATE OR REPLACE PROCEDURE seed_trades(num_trades INT) RETURNS INT ASDECLARE  ranked_companies ARRAY(RECORD(symbol CHAR(5), _rank INT));DECLARE  q QUERY(symbol CHAR(5), _rank INT) =    SELECT symbol, rank() OVER (ORDER BY marketcap_to_DECIMAL(market_cap)) AS _rank    FROM company    WHERE LENGTH(symbol) < 5    ORDER BY _rank DESC LIMIT 200;  i INT = 0;  rank_num INT;  next_id INT = 1;  sym CHAR(5);  price_base DECIMAL(18,4);  current_prices ARRAY(INT);  l ARRAY(RECORD(symbol CHAR(5), _rank INT));BEGIN  l = collect(q);  FOR r IN l LOOP    i += 1;    rank_num = r._rank;    sym = r.symbol;    price_base = FLOOR(rand() * 50) + 50;    FOR j IN 1..((rank_num / 10) + RAND() * 10) LOOP      INSERT trade VALUES(        next_id,        sym,        FLOOR(1 + RAND() * 10) * 100, -- shares        price_base,      -- share_price        DATE_ADD(NOW(), INTERVAL RAND() * 6 HOUR)); -- random time during trading day, roughly      next_id += 1;      IF next_id > num_trades THEN RETURN(next_id); END IF;    END LOOP;  END LOOP;  RETURN(next_id);END //DELIMITER ;`

This last stored procedure, `iter_stocks`, generates additional trade events using a random walk to determine the next price for each company. It will store the current price for each company, then add a random number between -1 and 1 to set the new price for each iteration.

Each iteration inserts new records equal to `num_shares`. When `iter_stocks` completes, the total number of records in the `trade` table will be equal to `num_shares` * `iterations` + `num_shares`.

`DELIMITER //CREATE OR REPLACE PROCEDURE iter_stocks(iterations INT) asDECLARE    tickers ARRAY(CHAR(5));    prices ARRAY(DECIMAL(18,4));    last_ids ARRAY(bigINT);    counts ARRAY(INT);    next_id bigINT = 1;    ticker CHAR(5);    price DECIMAL(18,4);    c INT;    rand DECIMAL(18,4);    tickers_q QUERY(t CHAR(5), p DECIMAL(18,4), lid BIGINT, c INT) = SELECT stock_symbol, share_price, MIN(id), COUNT(*) FROM trade GROUP BY stock_symbol;    q ARRAY(RECORD(t CHAR(5), p DECIMAL(18,4), lid bigINT, c INT));    q_count QUERY(c INT) = SELECT COUNT(*) FROM trade;    total_c INT;BEGIN    q = COLLECT(tickers_q);    tickers = CREATE_ARRAY(LENGTH(q));    prices = CREATE_ARRAY(LENGTH(q));    last_ids = CREATE_ARRAY(LENGTH(q));    counts = CREATE_ARRAY(LENGTH(q));    total_c = SCALAR(q_count);    FOR r IN 0..LENGTH(q)-1 LOOP        tickers[r] = q[r].t;        prices[r] = q[r].p;        last_ids[r] = q[r].lid;        counts[r] = q[r].c;    END LOOP;    FOR j IN 0..(iterations-1) LOOP        FOR i IN 0..LENGTH(tickers)-1 LOOP            ticker = tickers[i];            price = prices[i];            next_id = last_ids[i];            c = counts[i];            rand = POW(-1, FLOOR(RAND()*2)) * RAND();            INSERT INTO trade            SELECT id + total_c, stock_symbol, shares, share_price + rand, trade_time FROM trade WHERE stock_symbol = ticker AND id >= next_id;            prices[i] = price + rand;            last_ids[i] = next_id + total_c;        END LOOP;    END LOOP;END //DELIMITER ;`