Free Mock Data Generator Online

Generate fake names, emails, addresses, phone numbers, dates and more in JSON or CSV.

Mock Data Generation: Why It Matters and How to Use It

Realistic test data is one of the most undervalued assets in software development. Developers frequently test with trivial datasets — one user named "Test User" with email "test@test.com" — that fail to surface real bugs. Production data is too sensitive to use in development, and developers rarely have time to hand-craft representative datasets. Mock data generators fill this gap: they produce plausible, diverse, and privacy-safe data that exercises code paths which trivial test data misses.

Why Realistic Mock Data Matters

Several categories of bugs only appear with realistic data:

Encoding and character set bugs: A name field that always contains "Alice Smith" will never reveal that your app crashes on "José García" or "王小明". Realistic names from diverse cultures surface encoding bugs in UTF-8 handling, database collation, and string truncation.
Layout bugs: Short names never reveal that a 40-character company name overflows a UI card. Realistic variation in string lengths exposes layout issues early.
Edge cases in sorting and pagination: A dataset of 3 users never reveals that your pagination breaks at 100 results, or that sorting by name is case-sensitive when it shouldn't be.
Performance regressions: Loading 10 test records is fast even if your algorithm is O(n²) or your query scans the whole table. Load 10 000 mock records and the slowdown becomes immediately obvious.
Validation bugs: Test data that always matches the happy path never reveals that your email validator rejects valid plus-addressing (user+tag@example.com) or your phone validator fails on international formats.
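
Two of the categories above (encoding and validation) can be reproduced in a few lines. This is a minimal sketch: the truncate_bytes helper and the deliberately naive NAIVE_EMAIL pattern are hypothetical stand-ins for application code, not taken from any library.

```python
import re

# Hypothetical stand-in for a column limit enforced in bytes, not characters.
def truncate_bytes(value: str, limit: int) -> bytes:
    return value.encode("utf-8")[:limit]

def survives_truncation(value: str, limit: int) -> bool:
    """True if the truncated bytes still decode as valid UTF-8."""
    try:
        truncate_bytes(value, limit).decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# Deliberately naive validator that forgets "+" is legal in the local part.
NAIVE_EMAIL = re.compile(r"^[A-Za-z0-9._-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")

for name in ["Alice Smith", "José García", "王小明"]:
    # Only the ASCII-only name survives a 4-byte cut intact.
    print(name, survives_truncation(name, 4))

print(bool(NAIVE_EMAIL.match("user@example.com")))      # accepted
print(bool(NAIVE_EMAIL.match("user+tag@example.com")))  # rejected by the naive pattern
```

With "Alice Smith" as the only test name, neither problem would ever show up.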

Generated Field Types Explained

UUID v4: Randomly generated universally unique identifiers in the format xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx. The 4 in the 13th hexadecimal digit indicates version 4 (random), and y is one of 8, 9, a, or b, which marks the RFC 4122 variant. These are appropriate for use as primary keys in distributed systems where auto-increment integers would cause conflicts between nodes. Generated using the Web Crypto API for cryptographic randomness.
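
For comparison outside the browser, Python's standard uuid module produces the same format. This sketch only illustrates the layout of the string; it is not how this tool generates its values.

```python
import uuid

value = uuid.uuid4()   # random (version 4) UUID
text = str(value)

print(text)
print(text[14])        # version digit: 13th hex digit, index 14 once hyphens are counted
print(text[19])        # variant digit: one of 8, 9, a, b
print(value.version)   # 4
```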

Email addresses: Generated emails use realistic first/last name combinations paired with placeholder domains (example.com, test.dev, sample.org). Of these, only example.com is formally reserved for documentation (RFC 2606); the others are ordinary registrable domains, so the chance of accidentally emailing a real person is low but not zero. If you need a hard guarantee, restrict addresses to the reserved example.com, example.net, and example.org. Plus-addressing (also called subaddressing) is not included by default to avoid triggering validation edge cases; add it manually if you want to test those paths.
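
If you post-process generated emails yourself, something like the following may be enough. It is a sketch under assumptions: make_email and RESERVED_DOMAINS are hypothetical names, and the domain list is limited to the three second-level domains reserved by RFC 2606.

```python
import random
from typing import Optional

# example.com, example.net and example.org are reserved by RFC 2606.
RESERVED_DOMAINS = ["example.com", "example.net", "example.org"]

def make_email(first: str, last: str, rng: random.Random,
               tag: Optional[str] = None) -> str:
    """Build first.last@domain, optionally with a +tag subaddress."""
    local = f"{first}.{last}".lower()
    if tag:
        local = f"{local}+{tag}"
    return f"{local}@{rng.choice(RESERVED_DOMAINS)}"

rng = random.Random(7)
print(make_email("Alice", "Johnson", rng))
print(make_email("Alice", "Johnson", rng, tag="newsletter"))
```

Passing a tag to a fraction of the rows is a cheap way to exercise the plus-addressing path in a validator.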

Phone numbers: Generated in a US format (+1-NXX-NXX-XXXX). If your application handles international numbers, you may want to add country codes and vary the format. Common international formats to test: UK (+44 7700 900xxx), German (+49 30 12345678), Australian (+61 4xx xxx xxx).
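
The US pattern can be sketched as follows, assuming N means a digit from 2 to 9 and X any digit. The us_phone function is a hypothetical helper; its output is random and may coincide with assigned numbers, so treat it as display data only.

```python
import random
import re

def us_phone(rng: random.Random) -> str:
    """Return a number shaped like +1-NXX-NXX-XXXX (N is 2-9, X is 0-9)."""
    area = f"{rng.randint(2, 9)}{rng.randint(0, 99):02d}"
    exchange = f"{rng.randint(2, 9)}{rng.randint(0, 99):02d}"
    line = f"{rng.randint(0, 9999):04d}"
    return f"+1-{area}-{exchange}-{line}"

# Pattern a test could use to check the shape of generated values.
US_PATTERN = re.compile(r"^\+1-[2-9]\d{2}-[2-9]\d{2}-\d{4}$")

rng = random.Random(1)
samples = [us_phone(rng) for _ in range(5)]
print(samples)
print(all(US_PATTERN.match(s) for s in samples))  # True
```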

Dates (ISO 8601): Generated as YYYY-MM-DD strings covering a 10-year range backwards from today. If your application uses Unix timestamps, full ISO 8601 datetime strings, or locale-formatted dates, adjust the format in your seeding script.
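
A seeding script might derive the other formats from the same date along these lines. This is a sketch only: random_date is a hypothetical helper, "10 years" is approximated as 3650 days, and midnight UTC is an arbitrary choice for the time component.

```python
import random
from datetime import date, datetime, time, timedelta, timezone

def random_date(rng: random.Random, today: date, years_back: int = 10) -> date:
    """Pick a date between roughly `years_back` years ago and `today`, inclusive."""
    span_days = 365 * years_back
    return today - timedelta(days=rng.randint(0, span_days))

rng = random.Random(3)
today = date(2024, 6, 1)   # fixed here so the example is repeatable
d = random_date(rng, today)

iso_date = d.isoformat()                                           # YYYY-MM-DD
midnight_utc = datetime.combine(d, time.min, tzinfo=timezone.utc)
iso_datetime = midnight_utc.isoformat()                            # full ISO 8601 datetime
unix_seconds = int(midnight_utc.timestamp())                       # Unix timestamp

print(iso_date, iso_datetime, unix_seconds)
```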

IP addresses (IPv4): Generated as dotted-decimal notation in the valid public range. Useful for testing audit logs, access control, geolocation lookups, and rate limiting code. For IPv6 testing, you'll need to generate those separately — the format is significantly different (eight colon-separated groups of four hex digits, with possible double-colon compression for zero runs).
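
Python's standard ipaddress module can illustrate both points. The rejection loop below is one possible way to stay within public ranges, not necessarily what this tool does, and random_public_ipv4 is a hypothetical name.

```python
import ipaddress
import random

def random_public_ipv4(rng: random.Random) -> str:
    """Draw 32-bit values until one falls outside private and reserved ranges."""
    while True:
        candidate = ipaddress.IPv4Address(rng.getrandbits(32))
        if candidate.is_global:
            return str(candidate)

rng = random.Random(5)
print(random_public_ipv4(rng))

# IPv6: the standard library applies double-colon compression when printing.
v6 = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(v6.compressed)   # 2001:db8::1
print(v6.exploded)     # 2001:0db8:0000:0000:0000:0000:0000:0001
```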

Boolean values: Random 50/50 true/false. If your application logic depends on a certain ratio of active vs. inactive users (for example), generate two separate datasets and combine them, or post-process to set a specific ratio.
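
Post-processing to an exact ratio can be as short as the sketch below. flags_with_ratio is a hypothetical helper; it builds the required number of true and false values and then shuffles them, so the share is exact rather than approximate.

```python
import random

def flags_with_ratio(total: int, true_share: float, rng: random.Random) -> list[bool]:
    """Return `total` booleans with an exact share of True values, in random order."""
    true_count = round(total * true_share)
    flags = [True] * true_count + [False] * (total - true_count)
    rng.shuffle(flags)
    return flags

rng = random.Random(11)
active = flags_with_ratio(1000, 0.8, rng)
print(sum(active), len(active))   # 800 1000
```

Assigning active[i] to the i-th generated record then gives a dataset with 80% active users.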

Generating Mock Data Programmatically

For larger datasets or CI integration, programmatic generation is more practical than a browser tool. The leading library in the JavaScript/TypeScript ecosystem is Faker (originally published on npm as faker, now maintained as @faker-js/faker):

import { faker } from '@faker-js/faker';

function generateUser() {
  return {
    id:        faker.string.uuid(),
    firstName: faker.person.firstName(),
    lastName:  faker.person.lastName(),
    email:     faker.internet.email(),
    phone:     faker.phone.number(),
    company:   faker.company.name(),
    jobTitle:  faker.person.jobTitle(),
    city:      faker.location.city(),
    country:   faker.location.country(),
    createdAt: faker.date.past({ years: 2 }).toISOString(),
  };
}

// Generate 1000 users
const users = faker.helpers.multiple(generateUser, { count: 1000 });
console.log(JSON.stringify(users, null, 2));

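Faker.js supports seeded generation for reproducible datasets. Note that time-relative values such as faker.date.past() are computed from the current date, so they can still differ between runs unless a fixed reference date is supplied (for example via the refDate option):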

faker.seed(42); // same seed → same data every run
const user = generateUser(); // deterministic output

For Python projects, the equivalent library is Faker:

from faker import Faker
import json

fake = Faker()
Faker.seed(42)  # reproducible

users = [
    {
        "id":         str(fake.uuid4()),
        "first_name": fake.first_name(),
        "last_name":  fake.last_name(),
        "email":      fake.email(),
        "phone":      fake.phone_number(),
        "company":    fake.company(),
        "city":       fake.city(),
        "country":    fake.country(),
        "created_at": fake.date_time_this_decade().isoformat(),
    }
    for _ in range(1000)
]
print(json.dumps(users, indent=2))

Database Seeding Strategies

Seed scripts in your repository: Keep a scripts/seed.ts (or seed.py) file that populates a development database with representative data. Run it as part of local setup (npm run db:seed). This ensures every developer starts with the same baseline data.

Fixtures: Store small, curated datasets as JSON or CSV files in test/fixtures/. These are committed to version control and loaded by tests. Unlike randomly generated data, fixtures are deterministic — the same inputs produce the same test results. Use fixtures for unit and integration tests; use generated data for load testing and exploratory testing.

Factory functions: Rather than seeding a database upfront, define factory functions that create records on demand during tests:

// TypeScript / Prisma example
import { faker } from '@faker-js/faker';
import { prisma } from '@/lib/db';

export async function createUser(overrides = {}) {
  return prisma.user.create({
    data: {
      email:    faker.internet.email(),
      name:     faker.person.fullName(),
      password: faker.internet.password({ length: 16 }),
      ...overrides, // allow test-specific values
    },
  });
}

// In a test:
const user = await createUser({ email: 'specific@example.com' });
const admin = await createUser({ role: 'admin' });

Factory functions are more flexible than static fixtures — each test creates exactly the data it needs, with only the relevant fields overridden. Libraries like fishery and factory-bot formalise this pattern.
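
The same pattern can be sketched in plain Python without a database. Everything here is illustrative: the User dataclass stands in for an ORM model, and build_user is a hypothetical factory name.

```python
import itertools
import random
from dataclasses import dataclass

@dataclass
class User:              # hypothetical model, standing in for an ORM entity
    email: str
    name: str
    role: str = "member"

_sequence = itertools.count(1)
_rng = random.Random(0)
_NAMES = ["Alice Johnson", "José García", "Wang Xiaoming"]

def build_user(**overrides) -> User:
    """Create a User with generated defaults; keyword arguments override them."""
    n = next(_sequence)
    defaults = {
        "email": f"user{n}@example.com",   # the sequence keeps emails unique
        "name": _rng.choice(_NAMES),
    }
    return User(**{**defaults, **overrides})

# In a test:
user = build_user(email="specific@example.com")
admin = build_user(role="admin")
print(user)
print(admin)
```

Using a counter for the email avoids collisions on unique columns, which purely random values cannot guarantee.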

Testing With Production-Like Data Volumes

Performance bugs rarely surface at development data volumes. A database query that returns 10 rows in 5ms might take 30 seconds when the table has 10 million rows and the index is missing or the query plan degrades.
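
The effect of a missing index is easy to observe once the table holds enough rows. The sketch below uses an in-memory SQLite database purely as a convenient stand-in (other engines expose their plans differently); the table layout and the plan helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    ((f"user{i}@example.com", f"User {i}") for i in range(100_000)),
)

def plan(sql: str) -> str:
    """Return SQLite's query plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT id FROM users WHERE email = 'user99999@example.com'"
before = plan(query)   # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = plan(query)    # with the index: a search using idx_users_email
print(before)
print(after)
```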

Generating large datasets efficiently:

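-- PostgreSQL: generate_series for bulk inserts
-- (gen_random_uuid() is built in from PostgreSQL 13; earlier versions need the pgcrypto extension)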
INSERT INTO users (id, name, email, created_at)
SELECT
  gen_random_uuid(),
  'User ' || i,
  'user' || i || '@example.com',
  NOW() - (random() * INTERVAL '2 years')
FROM generate_series(1, 1000000) AS s(i);

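# Python: batch inserts for better performance
from faker import Faker
import psycopg2
from psycopg2.extras import execute_values

fake = Faker()
conn = psycopg2.connect("postgresql://localhost/mydb")
cur = conn.cursor()

BATCH_SIZE = 1000
for batch in range(1000):  # 1000 batches x 1000 rows = 1 million rows total
    data = [
        (fake.name(), fake.email(), fake.date_time_this_decade())
        for _ in range(BATCH_SIZE)
    ]
    # execute_values sends each batch as one multi-row INSERT;
    # psycopg2's executemany() is no faster than calling execute() in a loop
    execute_values(
        cur,
        "INSERT INTO users (name, email, created_at) VALUES %s",
        data,
        page_size=BATCH_SIZE,
    )
    conn.commit()
    print(f"Inserted {(batch+1) * BATCH_SIZE} rows")

cur.close()
conn.close()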

Privacy and GDPR Considerations

Using production data in development environments is a significant GDPR and privacy risk. Key requirements:

Data minimisation: Development environments should not have access to more personal data than necessary. Mock data satisfies this requirement entirely — no real personal data is processed.
Anonymisation vs. pseudonymisation: If you must use production data (for example, to reproduce a specific bug), strip the identifying values first. Replace real names and emails with generated values, but preserve the structure and relationships. If the mapping back to the original values can be recovered, this is pseudonymisation, and the result still counts as personal data under GDPR; only if it cannot be recovered is it anonymisation.
Data residency: Mock data generated in the browser never leaves the browser — no GDPR concerns. This tool generates all data locally using JavaScript without any server communication.
Staging environments: Production data in staging is a common GDPR gap. Use the same mock data generation approach for staging as for local development. Staging should look like production in volume and structure, not in actual personal data.
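
One way to replace identifiers while preserving relationships is a keyed hash, sketched below. The names SECRET_KEY, pseudonym, and scrub are hypothetical, and the sample records are invented. Because anyone holding the key can recompute the mapping, the output is pseudonymised, not anonymised.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-secret-kept-outside-the-dataset"   # hypothetical key

def pseudonym(value: str, prefix: str) -> str:
    """Map a real value to a stable placeholder using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, value.lower().encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{prefix}_{digest[:12]}"

def scrub(record: dict) -> dict:
    """Replace direct identifiers and leave the other fields untouched."""
    token = pseudonym(record["email"], "user")
    return {**record, "name": token, "email": f"{token}@example.com"}

orders = [
    {"order_id": 1, "name": "Real Person", "email": "real.person@corp.test", "total": 42.5},
    {"order_id": 2, "name": "Real Person", "email": "Real.Person@corp.test", "total": 13.0},
]
scrubbed = [scrub(o) for o in orders]
print(scrubbed)   # both orders share one placeholder identity
```

Both orders still point at the same customer after scrubbing, so joins and per-user aggregates behave as they did on the original data.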

Output Formats: JSON vs. CSV

JSON is best when: seeding APIs or databases directly, working with JavaScript/TypeScript code, or importing into document databases (MongoDB, Firestore, DynamoDB). JSON preserves types (numbers stay numbers, booleans stay booleans, null is distinct from empty string).

CSV is best when: importing into SQL databases via COPY/LOAD DATA, opening in Excel or Google Sheets, passing to data analysis pipelines (Pandas, R), or sharing with non-developers. Note that CSV doesn't distinguish between types — everything is a string, and your import process must handle type conversion.

Choosing the right format upfront saves a conversion step. If you're seeding a PostgreSQL table, generate CSV and use COPY for the fastest import. If you're writing a Jest seed script, generate JSON and parse it directly.
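
The type difference between the two formats shows up in a simple round trip. This sketch uses only Python's standard json and csv modules; the sample rows are invented for the example.

```python
import csv
import io
import json

rows = [
    {"id": 1, "name": "Alice Johnson", "active": True, "nickname": None},
    {"id": 2, "name": "José García", "active": False, "nickname": ""},
]

# JSON round trip: numbers, booleans and null come back unchanged.
from_json = json.loads(json.dumps(rows, ensure_ascii=False))

# CSV round trip: every cell comes back as a string, and None becomes "".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
buffer.seek(0)
from_csv = list(csv.DictReader(buffer))

print(from_json[0])
print(from_csv[0])
```

After the CSV round trip the null nickname and the empty-string nickname are indistinguishable, which is exactly the conversion the import step has to handle.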

FAQ

Common questions

What types of fake data can I generate?

The generator supports: full name, first name, last name, email address, phone number, street address, city, country, postal code, company name, job title, username, UUID (v4), integer, float, boolean, date, URL, IP address (v4), colour (hex), and lorem ipsum paragraph. You can combine any fields in a single dataset.

How realistic is the generated data?

Names, cities, and companies are drawn from large curated lists of real-world values to produce believable test data. Emails are constructed from the generated name plus a placeholder domain such as example.com. Phone numbers follow common national formats. The goal is plausible-looking data for UI testing and database seeding, not data that passes real-world validation.

What output formats are supported?

You can export as JSON (array of objects) or CSV. JSON is best for seeding APIs and databases. CSV opens directly in Excel and Google Sheets. Future formats (SQL INSERT statements, TypeScript types) are planned.

Can I generate data with a consistent seed?

The generator uses the browser's Math.random(), which is not seeded. If you need reproducible datasets for testing, generate the data once, download the file, and commit it to your test fixtures. Alternatively, use a seeded library like Faker.js in your test setup for programmatic reproducibility.

How many rows can I generate?

Up to 1 000 rows per generation. For datasets larger than ~500 rows, use the Download button rather than copying from the preview — large text areas can be slow to scroll and copy reliably.

Are generated emails valid for sending?

No. Generated emails like alice.johnson@example.com use domains such as example.com, example.org, and test.dev — domains that are reserved or unlikely to have real mailboxes. They are safe for UI testing, database seeding, and demos without accidentally emailing real people.

Can I use the generated data in my application?

Yes — freely. The generated data is random and not bound by any licence. It is intended for development, testing, demos, database seeding, and UI mockups. Do not use generated data to mislead anyone about the existence of real people or organisations.

Is my data sent anywhere?

No. All generation happens locally in your browser using JavaScript. No data is sent to a server. You can use this tool offline — disconnect from the internet and it works identically.