
import { Container, Row, Col } from 'react-bootstrap';

import SiteCard from '../../components/SiteCard';
import SEOComponent from '../../components/SEOComponent';




const HowTo = ({ url }) => {

    return (
        <>
            <SEOComponent 
                title={'Modernising House Price Indices with Machine Learning'}
                description={'Modernising House Price Indices with Machine Learning'}
                url={url}
                imageUrl={'https://otta.property/blog_images/how_to/England.png'}
            />
            <Container>
                <SiteCard 
                    header={'otta.property index'}
                    title={'Making a modern house price index using Machine Learning'}
                    content={
                        <Container>
                        <Row>
                        <Col md={10} style={{margin: 'auto'}}>
                                <Row>
                                <h2><strong>N.B.: This post is now out of date, but I'm leaving it here for posterity.</strong></h2>
                                <h2>Introduction</h2>
                                <hr/>
                                <p>
                                    It's clear that my approach of relying on the raw data from the Land Registry to create a house price index is not perfect.
                                    What I'm describing below is an attempt at using what I know (data science) to create a more accurate index for property prices in
                                England and Wales.
                                </p>

                                <p>
                                Creating an accurate index is about ensuring consistency in the items being compared over time. 
                                Traditional indexes, such as the Retail Price Index (RPI) or Consumer Price Index (CPI), often use a "basket of goods" approach. 
                                This basket is a carefully selected group of items whose prices are tracked monthly, 
                                ensuring a representative and stable sample. Much has been written about <a href="https://blog.ons.gov.uk/2023/07/19/keeping-it-consistent-how-we-quality-adjust-cpi/">how the ONS quality adjusts the CPI </a> 
                                to ensure that the index remains relevant and accurate.
                                </p>
                                <p>
                                    When it comes to housing transactions, properties differ from month to month and the mix of houses will differ too. 
                                    I've relied on using the sheer volume of data points to overcome this issue, but it's clear 
                                    that to be able to make a direct comparison across indices (and months), we need a better approach.
                                </p>
                                <h2>Defining the problem</h2>
                                <hr/>
                                </Row>
                                <Row style={{textAlign: 'center', margin: '2rem'}}>
                                <h5>We want to directly compare one month's 
                                    property transactions with another.</h5>
                                </Row>
                                <Row>
                                <p>This is challenging using raw data because:</p>
                                <ul>
                                    <li>The mix of houses sold changes monthly.</li>
                                    <li>Houses themselves change over time (renovations, deterioration).</li>
                                    <li>We're looking to compare "like with like" without having the same set of houses each month.</li>
                                    <li>We must account for changing external factors (e.g., location desirability).</li>
                                </ul>
                                <p>
                                We need a method that can handle these variables and still produce a consistent, 
                                reliable index that reflects true market price changes, not just month-to-month changes in the mix of properties sold.
                                </p>
                                </Row>
                                <Row>
                                <h2>The current ONS approach: hedonic regression</h2>
                                <hr/>
                                <p>
                                The Office for National Statistics (ONS) in the UK has been tackling this challenge using hedonic regression since 2016. 
                                This well-established method treats a property as a bundle of price-determining characteristics,
                                 such as location, size, and type of property. Using data on the latest property transactions,
                                  hedonic regression estimates an average price for a property that accounts for differences in each of these characteristics, 
                                  effectively stripping out quality changes to give a modelled price. 
                                  </p>
                                    <p> 
                                  These characteristics are:
                                </p>
                                <ul>
                                    <li>Location</li>
                                    <li>Property type</li>
                                    <li>Number of rooms</li>
                                    <li>Floor area</li>
                                    <li>ACORN area classification</li>
                                    <li>New or old property</li>
                                </ul>
                                <p>
                                The process involves mix-adjustment, where the property market is divided into similar groupings 
                                or 'strata' to compare like-with-like each period. These strata are then combined using weights
                                 to produce published indices.

                                Further detail can be found <a href="https://blog.ons.gov.uk/2023/11/15/on-the-market-how-the-ons-measures-property-prices/">here</a>.
                                </p>
                                </Row>
                                <Row>
                                    <h2>An alternative approach: a machine learning methodology</h2>
                                    <hr/>
                                    <p>
                                    I propose using machine learning (ML) as an answer to the problem described above. Let's think about why ML might be a good fit:
                                    </p>
                                    <ul>
                                    <li style={{marginBottom: '1rem'}}>
                                        ML models deal comfortably with multivariate data. With linear-regression-type models, 
                                        this is arguably very similar to hedonic regression, but tree-based or ensemble models 
                                        can go further and pick out non-linear relationships that more standard equation-based models struggle with.
                                    </li>
                                    <li style={{marginBottom: '1rem'}}>
                                        An ML model will adapt across the months by incorporating factors that might influence a property's price,
                                         even if these factors are not explicitly defined. For example, 
                                         if a new community centre or motorway opens nearby, the resulting change in prices (which are sensitive to location) 
                                         will be detected and the model will be updated to reflect it. The result
                                          is a hyper-local model that adapts to a changing built environment without the manual intervention that the current
                                           hedonic regression approach relies on.
                                    </li>
                                    <li style={{marginBottom: '1rem'}}>
                                        Rather than eliminating the differences between properties (as hedonic regression does), 
                                        an ML model is designed to learn how the variables affect prices. This is <b>extremely</b>{' '}
                                        powerful because it means the model can predict the price of any property type, as 
                                        long as it has been exposed to enough data. In practice, this means the same 'index' month can 
                                        be reused for every month, giving a directly comparable result across many months.
                                    </li>
                                    <li style={{marginBottom: '1rem'}}>
                                    Speed: training an ML model takes seconds on modern hardware. In addition, the weights
                                     described in the hedonic regression approach are learned by the model itself, so no
                                     manual intervention is needed to update them as the market develops. This is a much simpler process than the current approach.
                                    </li>
                                    </ul>
                                    <h2>So let's get it done!</h2>
                                    <hr/>
                                    <h4>Step 1: Data Collection</h4>
                                    <p>We start by merging various datasets to create a comprehensive set for model creation.</p>
                                    <p>
                                        The datasets include:
                                        </p>
                                        <ul>
                                        <li>Postcode data (for latitude and longitude coordinates, mapping a string such as “AB12 3CD” to a pair of floats such as 51.5, -1.5)</li>
                                        <li>EPC data (this includes total floor area, amongst other things)</li>
                                        <li>Prices Paid (the price paid for the property, alongside the property type, freehold/leasehold, old/new, etc.)</li>
                                        </ul>
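                                    <p>
                                    As a rough illustration of this merge, here is a minimal JavaScript sketch. It is not the pipeline used for the index: 
                                    the field names (<code>postcode</code>, <code>addressKey</code>, <code>totalFloorArea</code>) are hypothetical, and it simply assumes that 
                                    EPC and Prices Paid records can be matched on a shared, normalised address key.
                                    </p>
                                    <pre><code>{`// Hypothetical field names throughout; the real source files use different columns.

// Strip spaces and upper-case so "ab12 3cd" and "AB123CD" compare equal.
function normalisePostcode(postcode) {
  return postcode.split(' ').join('').toUpperCase();
}

function mergeDatasets(pricesPaid, postcodes, epcRecords) {
  // Lookup tables: postcode -> coordinates, address key -> EPC record.
  const coordsByPostcode = new Map(
    postcodes.map((p) => [normalisePostcode(p.postcode), { lat: p.lat, long: p.long }])
  );
  const epcByAddress = new Map(epcRecords.map((e) => [e.addressKey, e]));

  const merged = [];
  for (const sale of pricesPaid) {
    const coords = coordsByPostcode.get(normalisePostcode(sale.postcode));
    const epc = epcByAddress.get(sale.addressKey);
    // Inner join: skip sales that cannot be matched in both lookups.
    if (!coords || !epc) continue;
    merged.push({ ...sale, ...coords, floorArea: epc.totalFloorArea });
  }
  return merged;
}

const merged = mergeDatasets(
  [
    { price: 250000, postcode: 'AB12 3CD', addressKey: '1-high-street', propertyType: 'T' },
    { price: 410000, postcode: 'ZZ99 9ZZ', addressKey: '2-low-road', propertyType: 'D' },
  ],
  [{ postcode: 'ab123cd', lat: 51.5, long: -1.5 }],
  [{ addressKey: '1-high-street', totalFloorArea: 84 }]
);
console.log(merged); // only the first sale has both coordinates and an EPC record
`}</code></pre>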
                                    <h4>Step 2: Feature Selection</h4>
                                    <p>
                                    From the datasets described above, we curate a set of columns which we know 
                                    might affect the price of a property (and which are broadly aligned with the hedonic approach).
                                    </p>
                                    <ul>
                                        <li>Property type</li>
                                        <li>Location</li>
                                        <li>Floor area</li>
                                        <li>New or old property</li>
                                        <li>Freehold or leasehold</li>
                                    </ul>

                                    <h4>Step 3: Model Creation</h4>
                                    <p>
                                    We train a model for each month. We split each month's data into 'train' and 'test' datasets. 
                                    The train dataset is 80% of the data, and the remaining 20% is used to check the quality of the fit.
                                    </p>
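                                    <p>
                                    The per-month 80/20 split might look something like the sketch below. The seeded shuffle, the helper names 
                                    (<code>trainTestSplit</code>, <code>splitByMonth</code>) and the <code>month</code> field are assumptions made for illustration; 
                                    any reproducible random split would do the same job.
                                    </p>
                                    <pre><code>{`// Tiny seeded generator (a linear congruential generator) so the split is repeatable.
function seededRandom(seed) {
  let state = seed >>> 0;
  return function () {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296; // value in [0, 1)
  };
}

// Shuffle a copy of the rows (Fisher-Yates), then hold back testFraction of them.
function trainTestSplit(rows, testFraction = 0.2, seed = 42) {
  const shuffled = rows.slice();
  const random = seededRandom(seed);
  for (let i = shuffled.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
  }
  const testSize = Math.round(shuffled.length * testFraction);
  return { train: shuffled.slice(testSize), test: shuffled.slice(0, testSize) };
}

// Group transactions by month, then split each month independently.
function splitByMonth(transactions) {
  const byMonth = new Map();
  for (const t of transactions) {
    if (!byMonth.has(t.month)) byMonth.set(t.month, []);
    byMonth.get(t.month).push(t);
  }
  const splits = new Map();
  for (const [month, rows] of byMonth) {
    splits.set(month, trainTestSplit(rows));
  }
  return splits;
}

// Toy data: ten made-up sales in a single month.
const transactions = [];
for (let i = 0; i < 10; i++) {
  transactions.push({ month: '2024-05', price: 200000 + i * 1000 });
}
const splits = splitByMonth(transactions);
console.log(splits.get('2024-05').train.length, splits.get('2024-05').test.length);
`}</code></pre>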
                                    <p>
                                    A good machine learning model is one which has been trained on a specific subset of data but can then be used in a more general fashion to predict on unseen data.
                                    In our case, we are trying to train a model on how features map to the price of a property. There are various
                                    intermediate steps which must take place (converting some input values from words to numbers) but the aim is to have a set of numbers going in and a single number coming out: the price.
                                    During model training, the model will iteratively minimise the error between the predicted price and the actual price by changing its internal understanding
                                    of how the features affect the price of a property. That's really all there is to ML - it's about minimising the difference
                                    between the predicted and actual values.                                
                                    </p>
                                    <p>
                                    Below, we can see the test dataset's known price plotted against the predicted price.
                                    I've picked examples that show that, on the whole, the modelling approach works well. 
                                    On some occasions we see a low R-squared value, which is basically saying 
                                    “the features alone do not explain all of the variance in the observed price”,
                                    but the comparatively low median % error confirms that predictions are typically 
                                    accurate to within approximately 3%.
                                    </p>
        
                                    <div style={{
                                    display: 'flex',
                                    flexDirection: 'column',
                                    alignItems: 'center',
                                    gap: '20px'  // Adds space between the images
                                    }}>
                                    <h3>Model performance</h3>
                                    <h5>An example month</h5>
                                    <img src="/blog_images/how_to/2006-01_model.png" alt="Known price plotted against predicted price for the 2006-01 model" style={{width: '80%'}}/>
                                    <h5>The most recent month</h5>
                                    <img src="/blog_images/how_to/2024-05_model.png" alt="Known price plotted against predicted price for the 2024-05 model" style={{width: '80%'}}/>
                                    </div>
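                                    <p>
                                    For reference, the two quality measures mentioned above can be computed along these lines. This is a sketch under 
                                    one assumption: that "median % error" means the median of the absolute percentage errors on the test set. 
                                    The function names are illustrative only.
                                    </p>
                                    <pre><code>{`// R-squared: the share of variance in the actual prices explained by the predictions.
function rSquared(actual, predicted) {
  const mean = actual.reduce((sum, y) => sum + y, 0) / actual.length;
  let ssRes = 0; // squared error of the model
  let ssTot = 0; // squared error of always guessing the mean
  for (let i = 0; i < actual.length; i++) {
    ssRes += (actual[i] - predicted[i]) ** 2;
    ssTot += (actual[i] - mean) ** 2;
  }
  return 1 - ssRes / ssTot;
}

// Median absolute percentage error (assumed meaning of "median % error").
function medianAbsPercentError(actual, predicted) {
  const errors = actual
    .map((y, i) => (Math.abs(predicted[i] - y) / y) * 100)
    .sort((a, b) => a - b);
  const mid = Math.floor(errors.length / 2);
  return errors.length % 2 === 1 ? errors[mid] : (errors[mid - 1] + errors[mid]) / 2;
}

// Made-up test-set prices, for illustration only.
const actual = [100000, 200000, 300000];
const predicted = [110000, 190000, 300000];
console.log(rSquared(actual, predicted), medianAbsPercentError(actual, predicted));
`}</code></pre>
                                    <p>
                                    The median is far less sensitive to a handful of badly mispriced properties than R-squared is, which is one way 
                                    a month can show a low R-squared alongside a small median error.
                                    </p>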
                                    <h4>Step 4: Using the model(s) to create an index</h4>
                                    <p>
                                    We select a standard month as a baseline and treat the properties sold in that month as a fixed basket. 
                                    We then loop through the models trained for each month and use each one to predict a price for every property 
                                    in the baseline basket, as if it had been sold in the month that model was trained on. 
                                    Because the basket never changes, the result is a direct comparison from month to month.
                                    </p> 
                                    <img src="/blog_images/how_to/England.png" alt="England" style={{width: '100%'}}/>
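                                    <p>
                                    The loop in Step 4 can be sketched as follows. Two details here are assumptions of this sketch: each month's 
                                    predictions are summarised by their mean, and the index is scaled so that a chosen base month equals 100. 
                                    The "models" are stand-in functions, not trained models.
                                    </p>
                                    <pre><code>{`// modelsByMonth: Map of month -> function(property) returning a predicted price.
// baselineProperties: the fixed basket taken from the baseline month.
function buildIndex(modelsByMonth, baselineProperties, baseMonth) {
  if (!modelsByMonth.has(baseMonth)) {
    throw new Error('No model for base month ' + baseMonth);
  }
  // Price the same basket with every month's model.
  const meanPrice = new Map();
  for (const [month, predict] of modelsByMonth) {
    const total = baselineProperties.reduce((sum, p) => sum + predict(p), 0);
    meanPrice.set(month, total / baselineProperties.length);
  }
  // Scale so the base month reads 100 (an assumed convention).
  const base = meanPrice.get(baseMonth);
  const index = new Map();
  for (const [month, price] of meanPrice) {
    index.set(month, (100 * price) / base);
  }
  return index;
}

// Stand-in models: price per square metre rises from 2000 to 2100.
const basket = [{ floorArea: 80 }, { floorArea: 120 }];
const models = new Map([
  ['2024-04', (p) => p.floorArea * 2000],
  ['2024-05', (p) => p.floorArea * 2100],
]);
const index = buildIndex(models, basket, '2024-04');
console.log(index.get('2024-04'), index.get('2024-05'));
`}</code></pre>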
                                </Row>
                                <Row>
                                    <h2>Conclusion</h2>
                                    <hr/>
                                    <p>
                                    The approach I've outlined above is a quick, relatively simple way to create a house price index. It requires no
                                    manual weights or adjustments and can be used to create an index within minutes of the data being released.
                                    </p>
                                    <p>
                                        I'm not proposing that this is a replacement for the ONS's hedonic regression approach, or a replacement for any
                                        other index. I'm proposing that this provides more choice and a more timely delivery of reliable house price insight data.
                                    </p>
                                    <p>
                                        The current delivery schedule for the ONS's house price index is up to 2 months after the data used to create the index is 
                                        released.
                                    </p>
                                    <p>
                                        The method described above can be used to create an index within minutes of the data being released. I think this is 
                                        a pretty big deal!
                                    </p>
                                </Row> 
                                <Row>
                                
                                <h4>Disclaimer</h4>
                                <p>
                                The method described in this article presents a novel approach to creating house price indexes using machine learning techniques. 
                                To the best of our knowledge, this method has not been previously published or implemented elsewhere.
                                This article is intended to introduce this novel concept while protecting our intellectual property rights. 
                                </p>
                                <p>
                                The ideas, methods, and techniques presented are protected by copyright law and are the property of Inorite Ltd.
                                This statement clearly establishes Inorite Ltd as the copyright holder and reserves all rights to the intellectual property described.
                                Unauthorised use, reproduction, or implementation of these concepts without express written permission is 
                                strictly prohibited and may constitute copyright infringement.
                                Readers are advised that this information is shared for educational and informational purposes only. 
                                Implementation or commercial use of these concepts requires explicit authorisation from the copyright holder.
                                </p>
                                </Row>
                                



                                    

                                  
                                

                      </Col>
                        </Row>
                        </Container>
                    }
                    />
                
                            
            </Container>
        </>
    );
};


export default HowTo;