Setting up your Dev Environment
In order to contribute to Great Expectations, you will need the following:
- A GitHub account. This is sufficient if you only want to contribute to the documentation.
- If you want to contribute code, you will also need a working version of Git on your computer. Please refer to the Git setup instructions for your environment. 
- We also recommend going through the SSH key setup process on GitHub for easier authentication. 
Fork and clone the repository
1. Fork the Great Expectations repo
- Go to the Great Expectations repo on GitHub. 
- Click the Fork button in the top right. This will make a copy of the repo in your own GitHub account.
- GitHub will take you to your forked version of the repository. 
2. Clone your fork
- Click the green Clone button and choose the SSH or HTTPS URL depending on your setup.
- Copy the URL and run git clone <url> in your local terminal.
- This will clone the develop branch of the great_expectations repo. Please use develop (not main!) as the starting point for your work.
- Atlassian has a nice tutorial for developing on a fork. 
3. Add the upstream remote
- On your local machine, cd into the great_expectations repo you cloned in the previous step. 
- Run: git remote add upstream git@github.com:great-expectations/great_expectations.git
- This sets up a remote called upstream to track changes in the main Great Expectations repository.
4. Create a feature branch to start working on your changes.
- Example: git checkout -b feature/my-feature-name
- We do not currently follow a strict naming convention for branches. Please pick something clear and self-explanatory, so that it will be easy for others to get the gist of your work. 
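Step 3 fails if it is run a second time, because the upstream remote already exists. The following is a minimal sketch of a re-runnable version, not part of the official workflow: ensure_upstream is a hypothetical helper name, and it assumes git 2.7 or later (for git remote get-url).

```shell
# Hypothetical helper (not part of the Great Expectations repo).
# Adds the "upstream" remote, or updates its URL if the remote already
# exists, so the step can be repeated safely.
ensure_upstream() {
    url="$1"
    if git remote get-url upstream >/dev/null 2>&1; then
        git remote set-url upstream "$url"
    else
        git remote add upstream "$url"
    fi
}

# Example, run from inside your clone:
#   ensure_upstream git@github.com:great-expectations/great_expectations.git
#   git fetch upstream
```

Running git remote -v afterwards should list both origin (your fork) and upstream.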
Install Python dependencies
(Easy version of steps 5-7 below for Mac/Linux users)
Create a virtual environment in your locally cloned repo, use the same version of pip that we use in our CI/CD pipelines (for Python 3.7 - 3.10), and install the fewest dependencies needed for a dev environment (to minimize potential setup headaches).
python3 -m venv ge_dev
source ge_dev/bin/activate
pip install --upgrade pip==21.3.1
pip install -c constraints-dev.txt -e ".[test]"
Note: You may specify other "extras" in the square brackets alongside "test", separated by commas (e.g. -e ".[test,postgresql,trino]"). Allowed extras currently include:
arrow,athena,aws_secrets,azure,azure_secrets,bigquery,dev,dremio,excel,gcp,hive,mssql,mysql,pagerduty,postgresql,redshift,s3,snowflake,spark,sqlalchemy,teradata,test,trino,vertica
Before running pip install, you may need to install some system packages.
- For extras that will install psycopg2-binary (postgresql and redshift), the pg_config executable must already be on the system:
 sudo apt-get install -y libpq-dev
 or
 brew install postgresql
- For extras that will install pyodbc (dremio and mssql), you will need unixodbc:
 sudo apt-get install -y unixodbc-dev
 or
 brew install unixodbc
 Macs with M1 ARM chips may need additional compiler/linker options as well
 export LDFLAGS="-L/opt/homebrew/Cellar/unixodbc/[your version]/lib"
 export CPPFLAGS="-I/opt/homebrew/Cellar/unixodbc/[your version]/include"
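The [your version] placeholder in the exports above can be avoided by asking Homebrew for the install prefix. This is a minimal sketch under assumptions: set_odbc_flags is a hypothetical helper name, and it assumes unixodbc was installed with Homebrew so that brew --prefix unixodbc points at its install directory. Like the exports above, it overwrites any existing LDFLAGS and CPPFLAGS.

```shell
# Hypothetical helper: point the compiler and linker at a unixodbc
# install located under the given prefix directory.
set_odbc_flags() {
    prefix="$1"
    export LDFLAGS="-L${prefix}/lib"
    export CPPFLAGS="-I${prefix}/include"
}

# Example (assumes Homebrew is installed):
#   set_odbc_flags "$(brew --prefix unixodbc)"
```

Run it in the same shell session before pip install so that the build of pyodbc sees the variables.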
Confirm that tests are passing (only against pandas and sqlalchemy with sqlite), without the need for running any Docker containers.
ulimit -n 4096
pytest -v
In your ~/.zshrc or ~/.bashrc file, you will want to add ulimit -n 4096 so that it is already set for future runs. You WILL eventually see many tests failing with OSError: [Errno 24] Too many open files if you do not set it!
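One way to add the ulimit line without creating duplicates on repeated runs is sketched below. ensure_ulimit is a hypothetical helper name; pass it whichever rc file your shell actually reads.

```shell
# Hypothetical helper: append "ulimit -n 4096" to the given rc file
# only if that exact line is not already present.
ensure_ulimit() {
    rc_file="$1"
    line='ulimit -n 4096'
    touch "$rc_file"
    if ! grep -qxF "$line" "$rc_file"; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Example:
#   ensure_ulimit "$HOME/.zshrc"
```

Open a new terminal (or source the rc file) afterwards, and check the active limit with ulimit -n.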
Later on, try setting up the full dev environment (as mentioned in step 6) when you are ready for more robust testing of your custom Expectations!
5. Create a new virtual environment
- Make a new virtual environment (e.g. using virtualenv or conda), name it “great_expectations_dev” or similar. 
- Ex virtualenv: python3 -m venv <path_to_environments_folder>/great_expectations_dev and then source <path_to_environments_folder>/great_expectations_dev/bin/activate
- Ex conda: conda create --name great_expectations_dev and then conda activate great_expectations_dev
This is not required, but highly recommended.
6. Install dependencies from requirements-dev.txt
- pip install -r requirements-dev.txt -c constraints-dev.txt
- macOS users will be able to pip / pip3 install requirements-dev.txt using the above command from within conda, but Windows users working in a conda environment will need to individually install each package listed in requirements-dev.txt.
- This will ensure that you have the right libraries installed in your Python environment.
- Note that you can also substitute requirements-dev-test.txt to install only the requirements needed for testing all backends, and requirements-dev-spark.txt or requirements-dev-sqlalchemy.txt if you would like to add support for Spark or SQLAlchemy tests, respectively. For some database backends, such as MSSQL, additional driver installation may be required in your environment; see below for more information.
 
7. Install great_expectations from your cloned repo
- pip install -e .
- -e will install Great Expectations in “editable” mode. This is not required, but is often very convenient as a developer.
 
(Optional) Configure resources for testing and documentation
Depending on which features of Great Expectations you want to work on, you may want to configure different backends for local testing, such as PostgreSQL and Spark. Also, there are a couple of extra steps if you want to build documentation locally.
If you want to develop against local PostgreSQL:
- To simplify setup, the repository includes a docker-compose file that can stand up a local PostgreSQL container. To use it, you’ll need to have Docker installed.
- Navigate to assets/docker/postgresql in your great_expectations repo and run docker-compose up -d
- Within the same directory, you can run docker-compose ps to verify that the container is running. You should see something like:
 Name                     Command                         State   Ports
 ------------------------------------------------------------------------------------------
 postgresql_travis_db_1   docker-entrypoint.sh postgres   Up      0.0.0.0:5432->5432/tcp
- Once you’re done testing, you can shut down your PostgreSQL container by running docker-compose down from the same directory.
- Caution: If another service is using port 5432, Docker may start the container but silently fail to set up the port. In that case, you will probably see errors like this:
 psycopg2.OperationalError: could not connect to server: Connection refused
 Is the server running on host "localhost" (::1) and accepting
 TCP/IP connections on port 5432?
 could not connect to server: Connection refused
 Is the server running on host "localhost" (127.0.0.1) and accepting
 TCP/IP connections on port 5432?
- Or this:
 sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) FATAL: database "test_ci" does not exist
 (Background on this error at: http://sqlalche.me/e/e3q8)
- Once the local PostgreSQL container is working, the tests against the PostgreSQL backend can be run using the --postgresql flag:
 pytest -v --postgresql
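Because of the port caution above, it can be worth checking the Ports column before running the PostgreSQL tests. The sketch below is only an illustration: port_is_published is a hypothetical helper name, and it assumes docker-compose ps prints rows in the format of the sample output shown earlier (a State of Up followed by a mapping such as 0.0.0.0:5432->5432/tcp).

```shell
# Hypothetical helper: reads `docker-compose ps` output on stdin and
# succeeds only if some row shows an "Up" container that publishes the
# given port on the host (host port mapped to the same container port).
port_is_published() {
    port="$1"
    grep -E "[[:space:]]Up[[:space:]].*:${port}->${port}/tcp" >/dev/null
}

# Example, run from assets/docker/postgresql:
#   docker-compose ps | port_is_published 5432 || echo "port 5432 is not mapped"
```

The same check should work for the MySQL container by passing 3306 instead.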
If you want to develop against local MySQL:
- To simplify setup, the repository includes a docker-compose file that can stand up a local MySQL container. To use it, you’ll need to have Docker installed.
- Navigate to assets/docker/mysql in your great_expectations repo and run docker-compose up -d
- Within the same directory, you can run docker-compose ps to verify that the container is running. You should see something like:
 Name               Command                       State   Ports
 ------------------------------------------------------------------------------------------
 mysql_mysql_db_1   docker-entrypoint.sh mysqld   Up      0.0.0.0:3306->3306/tcp, 33060/tcp
- Once the local MySQL container is working, the tests against the MySQL backend can be run using the --mysql flag:
 pytest -v --mysql
- Once you’re done testing, you can shut down your MySQL container by running docker-compose down from the same directory.
- Caution: If another service is using port 3306, Docker may start the container but silently fail to set up the port. 
If you have an Apple Silicon (M1) Mac, this Docker image does not work.
If you want to develop against local Spark:
- In most cases, pip install -r requirements-dev.txt should set up pyspark for you.
- If you don’t have Java installed, you will probably need to install it and set your PATH or JAVA_HOME environment variables appropriately.
- Official installation instructions are available in the Spark documentation.
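A quick way to confirm the Java prerequisite before running Spark tests is sketched below. java_available is a hypothetical helper name; the check only confirms that a java executable can be found, not that its version is one your pyspark release supports.

```shell
# Hypothetical helper: succeeds if a java executable is reachable,
# either through JAVA_HOME or on the PATH.
java_available() {
    if [ -n "${JAVA_HOME:-}" ] && [ -x "${JAVA_HOME}/bin/java" ]; then
        return 0
    fi
    command -v java >/dev/null 2>&1
}

# Example:
#   java_available || echo "Install Java and set PATH or JAVA_HOME first"
```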
If you want to build documentation locally:
- pip install -r docs/requirements.txt
- To build documentation, the command is cd docs; make html
- Documentation will be generated in docs/build/html/ with index.html as the index page.
- Note: we use autoapi to generate API reference docs, but it’s not compatible with pandas 1.1.0. You’ll need to have pandas 1.0.5 (or an earlier version) installed in order to successfully build docs.
Run tests to confirm that everything is working
- You can run all tests by running pytest in the great_expectations directory root. Please see Testing for testing options and details.
Start coding!
At this point, you have everything you need to start coding!