In SQL Server 2016 there is database called wide-world-importers, where we can simulate data to generate millions of data by calling SQL Server's inbuilt stored procedures.
Is there a way to do the same thing in google-big query? Or is there a way to to simulate data in google-big query as my requirement? Or even is there a place to get sample data sets with millions of records like HR, Finance, Sales, etc for big-query; public data sets in big-query not satisfied my requirement?
just run below as is
in BigQuery Standard SQL
CREATE TEMPORARY FUNCTION gender(num STRING)
RETURNS STRING
LANGUAGE js AS """
var n = parseInt(num);
if (Math.floor(Math.sin(n*n)) == 0) return "f"
return "m";
""";
CREATE TEMPORARY FUNCTION department(num STRING)
RETURNS STRING
LANGUAGE js AS """
var n = parseInt(num);
return Math.floor(Math.sin(n) * 3 + 3)
""";
WITH sequence AS (
SELECT FORMAT("%07d", ROW_NUMBER() OVER()) AS num
FROM UNNEST((SELECT SPLIT(FORMAT("%1000000s", ""),"") AS h FROM (SELECT NULL))) AS pos
)
SELECT
num AS id,
CONCAT("FN ", num) AS first_name,
CONCAT("LN ", num) AS last_name,
gender(num) AS gender,
CONCAT("Department ", department(num)) AS department
FROM sequence
ORDER BY num
be patient as it took few min to run - you can change 1000000 in %1000000s
to lower number (it sets row count in result) to make it fast while playing with it
This is just super simple example of how to generate your own data for free
You can extend this example to whatever specific requirements you have
Endless opportunities here - like controlling distribution over different attributes, having nested data - records / repeated, etc.
You can see Standard SQL Reference if any questions on Syntax or Functions
Also note: I am using here recently introduced Scalar User-Defined Functions
Enjoy and Hope this helped!